Quantum Techniques for Chemical Reaction Networks

16 May, 2013

 

The summer before last, I invited Brendan Fong to Singapore to work with me on my new ‘network theory’ project. He quickly came up with a nice new proof of a result about mathematical chemistry. We blogged about it, and I added it to my book, but then he became a grad student at Oxford and got distracted by other kinds of networks—namely, Bayesian networks.

So, we’ve just now finally written up this result as a self-contained paper:

• John Baez and Brendan Fong, Quantum techniques for studying equilibria in chemical reaction networks.

Check it out and let us know if you spot mistakes or stuff that’s not clear!

The idea, in brief, is to use math from quantum field theory to give a somewhat new proof of the Anderson–Craciun–Kurtz theorem.

This remarkable result says that in many cases, we can start with an equilibrium solution of the ‘rate equation’ which describes the behavior of chemical reactions in a deterministic way in the limit of a large numbers of molecules, and get an equilibrium solution of the ‘master equation’ which describes chemical reactions probabilistically for any number of molecules.

The trick, in our approach, is to start with a chemical reaction network, which is something like this:

and use it to write down a Hamiltonian describing the time evolution of the probability that you have various numbers of each kind of molecule: A, B, C, D, E, … Using ideas from quantum mechanics, we can write this Hamiltonian in terms of annihilation and creation operators—even though our problem involves probability theory, not quantum mechanics! Then we can write down the equilibrium solution as a ‘coherent state’. In quantum mechanics, that’s a quantum state that approximates a classical one as well as possible.


All this is part of a larger plan to take tricks from quantum mechanics and apply them to ‘stochastic mechanics’, simply by working with real numbers representing probabilities instead of complex numbers representing amplitudes!

I should add that Brendan’s work on Bayesian networks is also very cool, and I plan to talk about it here and even work it into the grand network theory project I have in mind. But this may take quite a long time, so for now you should read his paper:

• Brendan Fong, Causal theories: a categorical perspective on Bayesian networks.


Game Theory (Part 11)

6 February, 2013

Here’s a game. I flip a fair coin. If it lands heads up, I give you $1. If it lands tails up, I give you nothing.

How much should you pay to play this game?

This is not a mathematics question, because it asks what you “should” do. This could depend on many things that aren’t stated in the question.

Nonetheless, mathematicians have a way they like to answer this question. They do it by computing the so-called ‘expected value’ of your payoff. With probability 1/2 you get 1 dollar; with probability 1/2 you get 0 dollars. So, the expected value is defined to be

\displaystyle{ \frac{1}{2} \times 1 + \frac{1}{2} \times 0 = \frac{1}{2} }

Don’t be fooled by the word ‘expected’: mathematicians use words in funny ways. I’m not saying you should expect to get 1/2 a dollar each time you play this game: obviously you don’t! It means that you get 1/2 a dollar ‘on average’. More precisely: if you play the game lots of times, say a million times, there’s a high probability that you’ll get fairly close to 1/2 a million dollars. (We could make this more precise and prove it, but that would be quite a digression right now.)

So, if you have lots of money and lots of time, you could pay up to 1/2 a dollar to play this game, over and over, and still make money on average. If you pay exactly 1/2 a dollar you won’t make money on average, but you won’t lose it either—on average.

Expected values

Let’s make the idea precise:

Definition. Suppose X is a finite set and p is a probability distribution on that set. Suppose

f: X \to \mathbb{R}

is a function from X to \mathbb{R}. Then the expected value of f with respect to p is defined to be

\displaystyle{ \langle f \rangle = \sum_{i \in X} p_i f(i) }

The idea here is that we are averaging the different values f(i) of the function f, but we count f(i) more when the probability of the event i is bigger. We pronounce \langle f \rangle like this: “the expected value of f“.

Examples

Example 1. Suppose you enter a lottery have a 0.01% chance of winning $100 and a 99.99% chance of winning nothing. What is the expected value of your payoff?

With probability 0.0001 you win $100. With probability .9999 you win zero dollars. So, your expected payoff is

\displaystyle{ 0.0001 \times 100 + .9999 \times 0 = 0.01 }

dollars. So: if you play this game over and over, you expect that on average you will win a penny per game.

But usually you have to pay to enter a lottery! This changes everything. Let’s see how:

Example 2. Suppose you pay $5 to enter a lottery. Suppose you have a 0.01% chance of winning $100 and a 99.99% chance of winning nothing. What is the expected value of your payoff, including your winnings but also the money you paid?

With probability 0.0001 you win $100, but pay $5, so your payoff is $95 in this case. With probability .9999 you win nothing, but pay $5, so your payoff is -$5 in this case. So, your expected payoff is

\displaystyle{ 0.0001 \times 95 - .9999 \times 5 = - 4.99 }

dollars. In simple terms: if we play this game over and over, we expect that on average we will lose $4.99 per play.

Example 3. Suppose you pay $5 to play a game where you
flip a coin 5 times. Suppose the coin is fair and the flips are independent. If the coin lands heads up every time, you win $100. Otherwise you win nothing. What is the expected value of your payoff, including your winnings but also the money you paid?

Since the coin flips are fair and independent, the probability that it lands heads up every time is

\displaystyle{ \frac{1}{2^5} = \frac{1}{32} }

So, when we count the $5 you pay to play, with probability 1/32 your payoff is $95, and with probability (1 – 1/32) = 31/32 your payoff is -$5. The expected value of your payoff is thus

\displaystyle{ \frac{1}{32} \times 95 - \frac{31}{32} \times 5 = -1.875 }

dollars.

Risk aversion and risk tolerance

Soon we’ll start talking about games where players used ‘mixed strategies’, meaning that they randomly make their choices according to some probability distribution. To keep the math simple, we will assume our ‘rational agents’ want to maximize the expected value of their payoff.

But it’s important to remember that life is not really so simple, especially if payoffs are measured in dollars. Rational agents may have good reasons to do something else!

For example, suppose some evil fiend says they’ve kidnapped my wife and they’ll kill her unless I give him a dollar. Suppose I only have 99 cents. But suppose they offer me a chance to play this game: I flip a fair coin, and if it lands heads up, I get $1. If it lands tails up, I get nothing.

How much would I pay to play this game?

Assuming I had no way to call the police, etcetera, I would pay all my 99 cents to play this game. After all, if I don’t play it, my wife will die! But if I do play it, I would at least have a 50% chance of saving her.

The point here is that my happiness, or utility, is not proportional to my amount of money. If I have less than $1, I’m really miserable. If I have $1 or more, I’m much better off.

There are many other reasons why people might be willing to pay more or less to play a game than the expected value of its monetary payoff. Some people are risk tolerant: they are willing to accept higher risks to get a chance at a higher payoffs. Others are risk averse: they would prefer to have a high probability of getting a payoff even if it’s not so big. See:

Risk aversion, Wikipedia.

In class I asked all the students: would you like to play the following game? I’ll flip a fair coin. Then I’ll double your quiz score for today if it comes heads, but give you a zero for your quiz score if it comes up tails.

Suppose your quiz score is Q. If you get heads, I’ll give you Q more points. If you get tails, I’ll take away Q points. So the expected value of the payoff for this game, measured in points, is

\displaystyle{ \frac{1}{2} \times Q - \frac{1}{2} \times Q = 0 }

So, if the expected value is what matters to you, you’ll be right on the brink of wanting to play this game: it doesn’t help you, and it doesn’t hurt you.

But in reality, different people will make different decisions. I polled the students, using our electronic clicker system, and 46% said they wanted to play this game. 54% said they did not.

Then I changed the game. I said that I would roll a fair 6-sided die. If a 6 came up, I would multiply their quiz score by 6. Otherwise I would set their quiz score to zero.

If your quiz score is Q, your payoff if you win will be 5 Q, since I’m multiplying your score by 6. If you lose, your payoff will be -Q. So, the expected value of your payoff is still zero:

\displaystyle{ \frac{1}{6} \times 5Q - \frac{5}{6} \times Q = 0 }

But now the stakes are higher, in a certain sense. You can win more, but it’s less likely.

Only 30% of students wanted to play this new game, while 70% said they would not.

I got the students who wanted to play to hand in slips of paper with their names on them. I put them in a hat and had a student randomly choose one. The winner got to play this game. He rolled a 1. So, his quiz score for the day went to zero.

Ouch!

Here is a famous beggar in San Francisco:


Game Theory (Part 10)

5 February, 2013

Last time we solved some probability puzzles involving coin flips. This time we’ll look at puzzles involving cards.

Permutations

Example 1. How many ways are there to order 3 cards: a jack (J), a queen (Q), and a king (K)?

By order them I mean put one on top, then one in the middle, then one on the bottom. There are three choices for the first card: it can be A, Q, or K. That leaves two choices for what the second card can be, and just one for the third. So, there are

3 \times 2 \times 1 = 6

ways to order the cards.

Example 2. How many ways are there to order all 52 cards in an ordinary deck?

By the same reasoning, the answer is

52 \times 51 \times 50 \times \cdots \times 2 \times 1

This is a huge number. We call it 52 factorial, or 52! for short. I guess the exclamation mark emphasizes how huge this number is. In fact

52! \approx 8.06 \times 10^{67}

This is smaller than the number of atoms in the observable universe, which is about 10^{80}. But it’s much bigger than the number of galaxies in the observable universe, which is about 10^{11}, or even the number of stars in the observable universe, which is roughly 10^{22}. It’s impressive that we can hold such a big number in our hand… in the form of possible ways to order a deck of cards!

A well-shuffled deck

Definition 1. We say a deck is well-shuffled if each of the possible ways of ordering the cards in the deck has the same probability.

Example 3. If a deck of cards is well-shuffled, what’s the probability that it’s in this order?

Since all orders have the same probability, and there are 52! of them, the probability that they’re in any particular order is

\displaystyle{ \frac{1}{52!} }

So, the answer is

\displaystyle{ \frac{1}{52!} \approx 1.24 \times 10^{-68} }

A hand from a well-shuffled deck

Suppose you take the top k cards from a well-shuffled deck of n cards. You’ll get a subset of cards—though card players call this a hand of cards instead of a subset. And, there are n choose k possible hands you could get! Remember from last time:

Definition 2. The binomial coefficient

\displaystyle{ \binom{n}{k} = \frac{n(n-1)(n-2) \cdots (n-k+1)}{k(k-1)(k-2) \cdots 1}}

called n choose k is the number of ways of choosing a subset of k things from a set of n things.

I guess card-players call a set a ‘deck’, and a subset a ‘hand’. But now we can write a cool new formula for n choose k. Just multiply the top and bottom of that big fraction by

\displaystyle{  (n-k)(n-k-1) \cdots 1}

We get

\begin{array}{ccl} \displaystyle{ \binom{n}{k}} &=& \displaystyle{  \frac{n(n-1)(n-2) \cdots 1}{(k(k-1)(k-2) \cdots 1)((n-k)(n-k-1) \cdots 1)} } \\ &=& \displaystyle{ \frac{n!}{k! (n-k)!} } \end{array}

I won’t do it here, but here’s something you can prove using stuff I’ve told you. Suppose you have a well-shuffled deck of n cards and you draw a hand of k cards. Then each of these hands is equally probable!

Using this we can solve lots of puzzles.

Example 4. If you draw a hand of 5 cards from a well-shuffled standard deck, what’s the probability that you get the 10, jack, queen, king and ace of spades?

Since I’m claiming that all hands are equally probable, we just need to count the number of hands, and take the reciprocal of that.

There are

\displaystyle{ \binom{52}{5} = \frac{52 \times 51 \times 50 \times 49 \times 48}{5 \times 4 \times 3 \times 2 \times 1} }

5-card hands drawn from a 52-card deck. So, the probability of getting any particular hand is

\displaystyle{  \frac{1}{\binom{52}{5}} = \frac{5 \times 4 \times 3 \times 2 \times 1}{52 \times 51 \times 50 \times 49 \times 48} }

We can simplify this a bit since 50 is 5 × 10 and 48 is twice 4 × 3 × 2 × 1. So, the probability is

\displaystyle{  \frac{1}{52 \times 51 \times 10 \times 49 \times 2} = \frac{1}{2598960} \approx 3.85 \cdot 10^{-7}}

A royal flush

The hand we just saw:

{10♠, J♠, Q♠, K♠, A♠}

is an example of a ‘royal flush’… the best kind of hand in poker!

Definition 3. A straight is a hand of five cards that can be arranged in a consecutive sequence, for example:

{7♥, 8♣, 9♠, 10♠, J♦}

Definition 4. A straight flush is a straight whose cards are all of the same suit, for example:

{7♣, 8♣, 9♣, 10♣, J♣}

Definition 5. A royal flush is a straight flush where the cards go from 10 to ace, for example:

{10♠, J♠, Q♠, K♠, A♠}

Example 5. If you draw a 5-card hand from a standard deck, what is the probability that it is a royal flush?

We have seen that each 5-card hand has probability

\displaystyle{ \frac{1}{\binom{52}{5}} = \frac{1}{2598960} }

There are just 4 royal flushes, one for each suit. So, the probability of getting a royal flush is

\displaystyle{ \frac{4}{\binom{52}{5}} = \frac{1}{649740} \approx 0.000154\%}

Puzzles

Suppose you have a well-shuffled standard deck of 52 cards, and you draw a hand of 5 cards.

Puzzle 1. What is the probability that the hand is a straight flush?

Puzzle 2. What is the probability that the hand is a straight flush but not a royal flush?

Puzzle 3. What is the probability that the hand is a straight?

Puzzle 4. What is the probability that the hand is a straight but not a straight flush?


Game Theory (Part 9)

5 February, 2013

Last time we talked about independence of a pair of events, but we can easily go on and talk about independence of a longer sequence of events. For example, suppose we have three coins. Suppose:

• the 1st coin has probability p_H of landing heads up and p_T of landing tails up;
• the 2nd coin has probability q_H of landing heads up and q_T of landing tails up;
• the 3rd coin has probability r_H of landing heads up and r_T of landing tails up.

Suppose we flip all of these coins: the 1st, then the 2nd, then the 3rd. What’s the probability that we get this sequence of results:

(H, T, T)

If the coin flips are independent, the probability is just this product:

p_H \, q_T \, r_T

See the pattern? We just multiply the probabilities. And there’s nothing special about coins here, or the number three. We could flip a coin, roll a die, pick a card, and see if it’s raining outside.

For example, what’s the probability that we get heads with our coin, the number 6 on our die, an ace of spades with our cards, and it’s raining? If these events are independent, we just calculate:

the probability that we get heads, times
the probability that we roll a 6, times
the probability that we get an ace of spades, times
the probability that it’s raining outside.

Let’s solve some puzzles using this idea!

Three flips of a fair coin

Example 1. Suppose you have a fair coin: this means it has a 50% chance of landing heads up and a 50% chance of landing tails up. Suppose you flip it three times and these flips are independent. What is the probability that it lands heads up, then tails up, then heads up?

We’re asking about the probability of this event:

(H, T, H)

Since the flips are independent this is

p_{(H,T,H)} = p_H \, p_T \, p_H

Since the coin is fair we have

\displaystyle{ p_H = p_T = \frac{1}{2} }

so

\displaystyle{ p_H p_T p_H = \frac{1}{2} \times \frac{1}{2} \times \frac{1}{2} = \frac{1}{8} }

So the answer is 1/8, or 12.5%.

Example 2. In the same situation, what’s the probability that the coin lands heads up exactly twice?

There are 2 × 2 × 2 = 8 events that can happen:

(H,H,H)
(H,H,T), \; (H,T,H), \; (T,H,H)
(H,T,T), \; (T,H,T), \; (T,T,H)
(T,T,T)

We can work out the probability of each of these events. For example, we’ve already seen that (H,T,H) is

\displaystyle{ p_{(H,T,H)} = p_H p_T p_H = \frac{1}{8} }

since the coin is fair and the flips are independent. In fact, all 8 probabilities work out the same way. We always get 1/8. In other words, each of the 8 events is equally likely!

But we’re interested in the probability that we get exactly two heads. That’s the probability of this subset:

S = \{ (T,H,H), (H,T,H), (H,H,T) \}

Using the rule we saw in Part 7, this probability is

\displaystyle{ p(S) = p_{(T,H,H)} + p_{(H,T,H)} + p_{(H,H,T)} = 3 \times \frac{1}{8} }

So the answer is 3/8, or 37.5%.

I could have done this a lot faster. I could say “there are 8 events that can happen, each equally likely, and three that give us two heads, so the probability is 3/8.” But I wanted to show you how we’re just following rules we’ve already seen!

Three flips of a very unfair coin

Example 3. Now suppose we have an unfair coin with a 90% chance of landing heads up and 10% chance of landing tails up! What’s the probability that if we flip it three times, it lands heads up exactly twice? Again let’s assume the coin flips are independent.

Most of the calculation works exactly the same way, but now our coin has

\displaystyle{ p_H = 0.9, \quad p_T = 0.1 }

We’re interested in the events where the coin comes up heads twice, so we look at this subset:

S = \{ (T,H,H), (H,T,H), (H,H,T) \}

The probability of this subset is

\begin{array}{ccl} p(S) &=& p_{(T,H,H)} + p_{(H,T,H)} + p_{(H,H,T)} \\  &=& p_T \, p_H  \, p_H + p_H \, p_T \, p_H + p_H \, p_H \, p_T \\ &=& 3 p_T p_H^2 \\ &=& 3 \times 0.1 \times 0.9^2 \\ &=& 0.3 \times 0.81 \\ &=& 0.243 \end{array}

So now the probability is just 24.3%.

Six flips of a fair coin

Example 4. Suppose you have a fair coin. Suppose you flip it six times and these flips are independent. What is the probability that it lands heads up exactly twice?

We did a similar problem already, where we flipped the coin three times. Go back and look at that if you forget! The answer to that problem was

\displaystyle{ 3 \times \frac{1}{8} }

Why? Here’s why: there were 3 ways to get two heads when you flipped 3 coins, and each of these events had probability

\displaystyle{ \left(\frac{1}{2}\right)^3 = \frac{1}{8} }

We can do our new problem the same way. Count the number of ways to get two heads when we flip six coins. Then multiply this by

\displaystyle{ \left(\frac{1}{2}\right)^6 = \frac{1}{64} }

The hard part is to count how many ways we can get two heads when we flip six coins. To get good at probabilities, we have to get good at counting. It’s boring to list all the events we’re trying to count:

(H,H,T,T,T,T), (H,T,H,T,T,T), (H,T,T,H,T,T), …

So let’s try to come up with a better idea.

We have to pick 2 out of our 6 flips to be H’s. How many ways are there to do this?

There are 6 ways to pick one of the flips and draw a red H on it, and then 5 ways left over to pick another and draw a blue H on it… letting the rest be T’s. For example:

(T, H, T, T, H, T)

So, we’ve got 6 × 5 = 30 choices. But we don’t really care which H is red and which H is blue—that’s just a trick to help us solve the problem. For example, we don’t want to count

(T, H, T, T, H, T)

as different from

(T, H, T, T, H, T)

So, there aren’t really 30 ways to get two heads. There are only half as many! There are 15 ways.

So, the probability of getting two heads when we flip the coin six times is

\displaystyle{ 15 \times \frac{1}{64} = \frac{15}{64} \approx .234 }

where the squiggle means ‘approximately’. So: about 23.4%.

Binomial coefficients

Now for some jargon, which will help when we do harder problems like this. We say there are 6 choose 2 ways to choose 2 out of 6 things, and we write this as

\displaystyle{ \binom{6}{2} }

This sort of number is called a binomial coefficient.

We’ve just shown that

\displaystyle{ \binom{6}{2}  = \frac{6 \times 5}{2 \times 1} = 15 }

Why write it like this funky fraction: \frac{6 \times 5}{2 \times 1}? Because it’ll help us see the pattern for doing harder problems like this!

Nine flips of a fair coin

If we flip a fair coin 9 times, and the flips are independent, what’s the probability that we get heads exactly 6 times?

This works just like the last problem, only the numbers are bigger. So, I’ll do it faster!

When we flip the coin 9 times there are 2^9 possible events that can happen. Each of these is equally likely if it’s a fair coin and the flips are independent. So each has probability

\displaystyle{  \frac{1}{2^9} }

To get the answer, we need to multiply this by the number of ways we can get heads exactly 6 times. This number is called ’9 choose 6′ or

\displaystyle{ \binom{9}{6}  }

for short. It’s the number of ways we can choose 6 things out of a collection of 9.

So we just need to know: what’s 9 choose 6? We can work this out as before. There are 9 ways to pick one of the flips and draw a red H on it, then 8 ways left to pick another and draw a blue H on it, and 7 ways left to pick a third and draw a orange H on it. That sounds like 9 × 8 × 7.

But we’ve overcounted! After all, we don’t care about the colors. We don’t care about the difference between this:

(T, H, T, T, H, T, T, H, T)

and this:

(T, H, T, T, H, T, T, H, T)

In fact we’ve counted each possibility 6 times! Why six? The first H could be red, green or blue—that’s 3 choices. But then the second H could be either of the two remaining 2 colors… and for the third, we just have 1 choice. So there are 3 × 2 × 1 = 6 ways to permute the colors.

So, the actual number of ways to get 6 heads out of 9 coin flips is

\displaystyle{ \frac{9 \times 8 \times 7}{3 \times 2 \times 1} }

In other words:

\displaystyle{ \binom{9}{6} = \frac{9 \times 8 \times 7}{3 \times 2 \times 1} }

To get the answer to our actual problem, remember we need to multiply 1/2^9 by this. So the answer is

\displaystyle{ \frac{1}{2^9} \times \binom{9}{6} }

If you’re a pure mathematician, you can say you’re done now. But normal people won’t understand this answer, so let’s calculate it out. I hope you know the first ten powers of two: 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024. So:

\displaystyle{ 2^9 = 512 }

I hope you can also do basic arithmetic like this:

\displaystyle{ \binom{9}{6} = \frac{9 \times 8 \times 7}{3 \times 2 \times 1} = 84}

So, the probability of getting 6 heads when you do 9 independent flips of a fair coin is

\displaystyle{ \frac{1}{2^9} \times \binom{9}{6}  = \frac{84}{512} = 0.164025 }

or 16.4025%. I broke down and used a calculator at the last step. We’re becoming serious nerds here.

Okay, that’s enough for now. We’ve been counting how many ways we can get a certain number of heads from a certain number of coin flips. What we’re realy doing is taking a set of coin flips, say n of them, and choosing a subset of k of them to be heads. So, we say

Definition. The binomial coefficient

\displaystyle{ \binom{n}{k} }

called n choose k, is the number of ways of choosing a subset of k things from a set of n things.

We have seen in some examples that

\displaystyle{ \binom{n}{k} = \frac{n(n-1)(n-2) \cdots (n-k+1)}{k(k-1)(k-2) \cdots 1} }

Here there’s a product of k consecutive numbers on top, and k on bottom too. We didn’t prove this is true in general, but it’s not hard to see, using the tricks we’ve used already.


Game Theory (Part 8)

28 January, 2013

Last time we learned some rules for calculating probabilities. But we need a few more rules to get very far.

For example:

We say a coin is fair if it has probability 1/2 of landing heads up and probability 1/2 of landing tails up. What is the probability that if we flip two fair coins, both will land heads up?

Since each coin could land heads up or tails up, there are 4 events to consider here:

(H,H), (H,T),
(T,H), (T,T)

It seems plausible that each should be equally likely. If so, each has probability 1/4. So then the answer to our question would be 1/4.

But this is plausible only because we’re assuming that what one coin does doesn’t affect that the other one does! In other words, we’re assuming the two coin flips are ‘independent’.

If the coins were connected in some sneaky way, maybe each time one landed heads up, the other would land tails up. Then the answer to our question would be zero. Of course this seems silly. But it’s good to be very clear about this issue… because sometimes one event does affect another!

For example, suppose there’s a 5% probability of rain each day in the winter in Riverside. What’s the probability that it rains two days in a row? Remember that 5% is 0.05. So, you might guess the answer is

0.05 \times 0.05 = 0.0025

But this is wrong, because if it rains one day, that increases the probability that it will rain the next day. In other words, these events aren’t independent.

But if two events are independent, there’s an easy way to figure out the probability that they both happen: just multiply their probabilities! For example, if the chance that it will rain today in Riverside is 5% and the chance that it will rain tomorrow in Singapore is 60%, the chance that both these things will happen is

0.05 \times 0.6 = 0.03

or 3%, if these events are independent. I could try to persuade that this is a good rule, and maybe I will… but for now let’s just state it in a general way.

Independence

So, let’s make a precise definition out of all this! Suppose we have two sets of events, X and Y. Remember that X \times Y, the Cartesian product of the sets X and Y, is the set of all ordered pairs (i,j) where i \in X and j \in Y:

X \times Y = \{ (i,j) : \; i \in X, j \in Y \}

So, an event in X \times Y consists of an event in X and an event in Y. For example, if

X = \{ \textrm{rain today}, \textrm{no rain today} \}

and

Y = \{ \textrm{rain tomorrow}, \textrm{no rain tomorrow} \}

then

X \times Y = \begin{array}{l} \{ \textrm{(rain today, rain tomorrow)}, \\ \textrm{(no rain today, rain tomorrow)}, \\   \textrm{(rain today, no rain tomorrow}, \\ \textrm{(no rain today, no rain tomorrow)} \} \end{array}

Now we can define ‘independence’. It’s a rule for getting a probability distribution on X \times Y from probability distributions on X and Y:

Definition. Suppose p is a probability distribution on a set of events X, and q is a probability distribution on a set of events Y. If these events are independent, we use the probability distribution r on X \times Y given by

r_{(i,j)} = p_i q_j

People often call this probability distribution p \times q instead of r.

Examples

Example 1. Suppose we have a fair coin. This means we have a set of events

X = \{H, T \}

and a probability distribution p with

\displaystyle{ p_H = p_T = \frac{1}{2} }

Now suppose we flip it twice. We get a set of four events:

X \times X = \{(H,H), (H,T), (T,H), (T,T)\}

Suppose the two coin flips are independent. Then we describe the pair of coin flips using the probability measure r = p \times p on X \times X, with

\displaystyle{ r_{(H,H)} = p_H p_H = \frac{1}{4} }

\displaystyle{ r_{(H,T)} = p_H p_T = \frac{1}{4} }

\displaystyle{ r_{(T,H)} = p_T p_H = \frac{1}{4} }

\displaystyle{ r_{(T,T)} = p_T p_T = \frac{1}{4} }

So, each of the four events—”heads, heads” and so on—has probability 1/4. This is fairly boring: you should have known this already!

But now we can do a harder example:

Example 2. Suppose we have an unfair coin that has a 60% chance of landing heads up and a 40% chance of landing tails up. Now we have a new probability distribution on X, say q:

\displaystyle{ q_H = .6, \quad q_T = .4 }

Now say we flip this coin twice. What are the probabilities of the four different events that can happen? Let’s assume the two coin flips are independent. This means we should describe the pair of coin flips with a probability measure s = q \times q on X \times X. This tells us the answer to our question. We can work it out:

\displaystyle{ s_{(H,H)} = q_H q_H = 0.6 \times 0.6 = 0.36 }

\displaystyle{ s_{(H,T)} = q_H q_T = 0.6 \times 0.4 = 0.24 }

\displaystyle{ s_{(T,H)} = q_T q_H = 0.4 \times 0.6 = 0.24 }

\displaystyle{ s_{(T,T)} = q_T q_T = 0.4 \times 0.4 = 0.16 }

Puzzle 1. In this situation what is the probability that when we flip the coin twice it comes up heads exactly once?

Puzzle 2. In this situation what is the probability that when we flip the coin twice it comes up heads at least once?

For these puzzles you need to use what I told you in the section on ‘Probabilities of subsets’ near the end of Part 7.

Puzzle 3. Now suppose we have one fair coin and one coin that has a 60% chance of landing heads up. The first one is described by the probability distribution p, while the second is described by q. How likely is it that the first lands heads up and the second lands tails up? We can answer questions like this if the coin flips are independent. We do this by multiplying p and q to get a probability measure t = p \times q on X \times X. Remember the rule for how to do this:

t_{(i,j)} = p_i q_j

where each of i and j can be either H or T.

What are these probabilities:

\displaystyle{ t_{(H,H)} = ? }

\displaystyle{ t_{(H,T)} = ? }

\displaystyle{ t_{(T,H)} = ? }

\displaystyle{ t_{(T,T)} = ? }

Puzzle 4. In this situation what is the probability that exactly one coin lands heads up?

Puzzle 5. In this situation what is the probability that at least one coin lands heads up?

Next time we’ll go a lot further…


Game Theory (Part 7)

26 January, 2013

We need to learn a little probability theory to go further in our work on game theory.

We’ll start with some finite set X of ‘events’. The idea is that these are things that can happen—for example, choices you could make while playing a game. A ‘probability distribution’ on this set assigns to each event a number called a ‘probability’—which says, roughly speaking, how likely that event is. If we’ve got some event i, we’ll call its probability p_i.

For example, suppose we’re interested in whether it will rain today or not. Then we might look at a set of two events:

X = \{\textrm{rain}, \textrm{no rain} \}

If the weatherman says the chance of rain is 20%, then

p_{\textrm{rain} } = 0.2

since 20% is just a fancy way of saying 0.2. The chance of no rain will then be 80%, or 0.8, since the probabilities should add up to 1:

p_{\textrm{no rain}} = 0.8

Let’s make this precise with an official definition:

Definition. Given a finite set X of events, a probability distribution p assigns a real number p_i called a probability to each event i \in X, such that:

1) 0 \le p_i \le 1

and

2) \displaystyle{ \sum_{i \in X} p_i = 1}

Note that this official definition doesn’t say what an event really is, and it doesn’t say what probabilities really mean. But that’s how it should be! As usual with math definitions, the words in boldface could be replaced by any other words and the definition would still do its main job, which is to let us prove theorems involving these words. If we wanted, we could call an event a doohickey, and call a probability a schnoofus. All our theorems would still be true.

Of course we hope our theorems will be useful in real world applications. And in these applications, the probabilities p_i will be some way of measuring ‘how likely’ events are. But it’s actually quite hard to say precisely what probabilities really mean! People have been arguing about this for centuries. So it’s good that we separate this hard task from our definition above, which is quite simple and 100% precise.

Why is it hard to say what probabilities really are? Well, what does it mean to say “the probability of rain is 20%”? Suppose you see a weather report and read this. What does it mean?

A student suggests: “it means that if you looked at a lot of similar days, it would rain on 20% of them.”

Yes, that’s pretty good. But what counts as a “similar day”? How similar does it have to be? Does everyone have to wear the same clothes? No, that probably doesn’t matter, because presumably doesn’t affect the weather. But what does affect the weather? A lot of things! Do all those things have to be exactly the same for it count as similar day.

And what counts as a “lot” of days? How many do we need?

And it won’t rain on exactly 20% of those days. How close do we need to get?

Imagine I have a coin and I claim it lands heads up 50% of the time. Say I flip it 10 times and it lands heads up every time. Does that mean I was wrong? Not necessarily. It’s possible that the coin will do this. It’s just not very probable.

But look: now we’re using the word ‘probable’, which is the word we’re trying to understand! It’s getting sort of circular: we’re saying a coin has a 50% probability of landing heads up if when you flip it a lot of times, it probably lands head up close to 50% of the time. That’s not very helpful if you don’t already have some idea what ‘probability’ means.

For all these reasons, and many more, it’s tricky to say exactly what probabilities really mean. People have made a lot of progress on this question, but we will sidestep it and focus on learning to calculate with probabilities.

If you want to dig in a bit deeper, try this:

Probability interpretations, Wikipedia.

Equally likely events

As I’ve tried to convince you, it can be hard to figure out the probabilities of events. But it’s easy if we assume all the events are equally likely.

Suppose we have a set X consisting of n events. And suppose that all the probabilities p_i are equal: say for some constant c we have

p_i = c

for all i \in X. Then by rule 1) above,

\displaystyle{ 1 = \sum_{i \in X} p_i = \sum_{i \in X} c = n c }

since we’re just adding the number c to itself n times. So,

\displaystyle{  c = \frac{1}{n} }

and thus

\displaystyle{ p_i = \frac{1}{n} }

for all i \in X.

I made this look harder than it really is. I was just trying to show you that it follows from the definitions, not any intuition. But it’s obvious: if you have n events that are equally likely, each one has probability 1/n.

Example 1. Suppose we have a coin that can land either heads up or tails up—let’s ignore the possibility that it lands on its edge! Then

X = \{ H, T\}

If we assume these two events are equally probable, we must have

\displaystyle{ p_H = p_T =  \frac{1}{2} }

Note I said “if we assume” these two events are equally probable. I didn’t say they actually are! Are they? Suppose we take a penny and flip it a zillion times. Will it land heads up almost exactly half a zillion times?

Probably not! The treasury isn’t interested in making pennies that do this. They’re interested in making the head look like Lincoln, and the tail look like the Lincoln national monument:

Or at least they used to. Since the two sides are different, there’s no reason they should have the exact same probability of landing on top.

In fact nobody seems to have measured the difference between heads and tails in probabilities for flipping pennies. For hand-flipped pennies, it seems whatever side that starts on top has a roughly 51% chance of landing on top! But if you spin a penny, it’s much more likely to land tails up:

The coin flip: a fundamentally unfair proposition?, Coding the Wheel.

Example 2. Suppose we have a standard deck of cards, well-shuffled, and assume that when I draw a card from this deck, each card is equally likely to be chosen. What is the probability that I draw the ace of spades?

If there’s no joker in the deck, there are 52 cards, so the answer is 1/52.

Let me remind you how a deck of cards works: I wouldn’t want someone to fail the course because they didn’t ever play cards! Here are the 52 cards in a standard deck. Here’s what they all look like (click to enlarge):

As you can see, they come in 4 kinds, called suits. The suits are:

• clubs: ♣

• spades: ♠

diamonds: ♦

hearts: ♥

Two suits are black and two are red. Each suit has 13 cards in it, for a total of 4 × 13 = 52. The cards in each suit are numbered from 1 to 13, except for four exceptions. They go like this:

A, 2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K

A stands for ‘ace’, J for ‘jack’, Q for ‘queen’ and K for ‘king’.

Probabilities of subsets

If we know a probability distribution on a finite set X, we can define the probability that an event in some subset S \subseteq X will occur. We define this to be

\displaystyle{p(S) = \sum_{i \in S} p_i }

For example, I usually have one of three things for breakfast:

X = \{ \textrm{oatmeal}, \textrm{waffles}, \textrm{eggs} \}

I have an 86% chance of eating oatmeal for breakfast, a 10% chance of eating waffles, and a 4% chance of eating eggs and toast. What’s the probability that I will eat oatmeal or waffles? These choices form the subset

S = \{ \textrm{oatmeal}, \textrm{waffles} \}

and the probability for this subset is

p(S) = p_{\textrm{oatmeal}} + p_{\textrm{waffles}} = 0.86 + 0.1 = 0.96

Here’s an example from cards:

Example 2. Suppose we have a standard deck of cards, well-shuffled, and assume that when I draw a card from this deck, each card is equally likely to be chosen. What is the probability that I draw a card in the suit of hearts?

Since there are 13 cards in the suit of hearts, each with probability 1/52, we add up their probabilities and get

\displaystyle{ 13 \times \frac{1}{52} = \frac{1}{4} }

This should make sense, since there are 4 suits, and as many cards in each suit.

Card tricks

This is just a fun digression. The deck of cards involves some weird numerology. For starters, it has 52 cards. That’s a strange number! Where else have you seen this number?

A student says: “It’s the number of weeks in a year.”

Right! And these 52 cards are grouped in 4 suits. What does the year have 4 of?

A student says: “Seasons!”

Right! And we have 52 = 4 × 13. So what are there 13 of?

A student says: “Weeks in a season!”

Right! I have no idea if this is a coincidence or not. And have you ever added up the values of all the cards in a suit, where we count the ace as 1, and so on? We get

1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12 + 13

And what’s that equal to?

After a long pause, a student says “91.”

Yes, that’s a really strange number. But let’s say we total up the values of all the cards in the deck, not just one suit. What do we get?

A student says “We get 4 × 91… or 364.”

Right. Three-hundred and sixty-four. Almost the number of days in year.

“So add one more: the joker! Then you get 365!”

Right, maybe that’s why they put an extra card called the joker in the deck:

One extra card for one extra day, joker-day… April Fool’s Day! That brings the total up to 365.

Again, I have no idea if this is a coincidence or not. But the people who invented the Tarot deck were pretty weird—they packed it with symbolism—so maybe the ordinary cards were designed this way on purpose too.

Puzzle. What are the prime factors of the number 91? You should know by now… and you should know what they have to do with the calendar!


Petri Net Programming (Part 2)

20 December, 2012

guest post by David A. Tanzer

An introduction to stochastic Petri nets

In the previous article, I explored a simple computational model called Petri nets. They are used to model reaction networks, and have applications in a wide variety of fields, including population ecology, gene regulatory networks, and chemical reaction networks. I presented a simulator program for Petri nets, but it had an important limitation: the model and the simulator contain no notion of the rates of the reactions. But these rates critically determine the character of the dynamics of network.

Here I will introduce the topic of ‘stochastic Petri nets,’ which extends the basic model to include reaction dynamics. Stochastic means random, and it is presumed that there is an underlying random process that drives the reaction events. This topic is rich in both its mathematical foundations and its practical applications. A direct application of the theory yields the rate equation for chemical reactions, which is a cornerstone of chemical reaction theory. The theory also gives algorithms for analyzing and simulating Petri nets.

We are now entering the ‘business’ of software development for applications to science. The business logic here is nothing but math and science itself. Our study of this logic is not an academic exercise that is tangential to the implementation effort. Rather, it is the first phase of a complete software development process for scientific programming applications.

The end goals of this series are to develop working code to analyze and simulate Petri nets, and to apply these tools to informative case studies. But we have some work to do en route, because we need to truly understand the models in order to properly interpret the algorithms. The key questions here are when, why, and to what extent the algorithms give results that are empirically predictive. We will therefore be embarking on some exploratory adventures into the relevant theoretical foundations.

The overarching subject area to which stochastic Petri nets belong has been described as stochastic mechanics in the network theory series here on Azimuth. The theme development here will partly parallel that of the network theory series, but with a different focus, since I am addressing a computationally oriented reader. For an excellent text on the foundations and applications of stochastic mechanics, see:

• Darren Wilkinson, Stochastic Modelling for Systems Biology, Chapman and Hall/CRC Press, Boca Raton, Florida, 2011.

Review of basic Petri nets

A Petri net is a graph with two kinds of nodes: species and transitions. The net is populated with a collection of ‘tokens’ that represent individual entities. Each token is attached to one of the species nodes, and this attachment indicates the type of the token. We may therefore view a species node as a container that holds all of the tokens of a given type.

The transitions represent conversion reactions between the tokens. Each transition is ‘wired’ to a collection of input species-containers, and to a collection of output containers. When it ‘fires’, it removes one token from each input container, and deposits one token to each output container.

Here is the example we gave, for a simplistic model of the formation and dissociation of H2O molecules:

The circles are for species, and the boxes are for transitions.

The transition combine takes in two H tokens and one O token, and outputs one H2O token. The reverse transition is split, which takes in one H2O, and outputs two H’s and one O.

An important application of Petri nets is to the modeling of biochemical reaction networks, which include the gene regulatory networks. Since genes and enzymes are molecules, and their binding interactions are chemical reactions, the Petri net model is directly applicable. For example, consider a transition that inputs one gene G, one enzyme E, and outputs the molecular form G • E in which E is bound to a particular site on G.

Applications of Petri nets may differ widely in terms of the population sizes involved in the model. In general chemistry reactions, the populations are measured in units of moles (where a mole is ‘Avogadro’s number’ 6.022 · 1023 entities). In gene regulatory networks, on the other hand, there may only be a handful of genes and enzymes involved in a reaction.

This difference in scale leads to a qualitative difference in the modelling. With small population sizes, the stochastic effects will predominate, but with large populations, a continuous, deterministic, average-based approximation can be used.

Representing Petri nets by reaction formulas

Petri nets can also be represented by formulas used for chemical reaction networks. Here is the formula for the Petri net shown above:

H2O ↔ H + H + O

or the more compact:

H2O ↔ 2 H + O

The double arrow is a compact designation for two separate reactions, which happen to be opposites of each other.

By the way, this reaction is not physically realistic, because one doesn’t find isolated H and O atoms traveling around and meeting up to form water molecules. This is the actual reaction pair that predominates in water:

2 H2O ↔ OH- + H3O+

Here, a hydrogen nucleus H+, with one unit of positive charge, gets removed from one of the H2O molecules, leaving behind the hydroxide ion OH-. In the same stroke, this H+ gets re-attached to the other H2O molecule, which thereby becomes a hydronium ion, H3O+.

For a more detailed example, consider this reaction chain, which is of concern to the ocean environment:

CO2 + H2O ↔ H2CO3 ↔ H+ + HCO3-

This shows the formation of carbonic acid, namely H2CO3, from water and carbon dioxide. The next reaction represents the splitting of carbonic acid into a hydrogen ion and a negatively charged bicarbonate ion, HCO3-. There is a further reaction, in which a bicarbonate ion further ionizes into an H+ and a doubly negative carbonate ion CO32-. As the diagram indicates, for each of these reactions, a reverse reaction is also present. For a more detailed description of this reaction network, see:

• Stephen E. Bialkowski, Carbon dioxide and carbonic acid.

Increased levels of CO2 in the atmosphere will change the balance of these reactions, leading to a higher concentration of hydrogen ions in the water, i.e., a more acidic ocean. This is of concern because the metabolic processes of aquatic organisms is sensitive to the pH level of the water. The ultimate concern is that entire food chains could be disrupted, if some of the organisms cannot survive in a higher pH environment. See the Wikipedia page on ocean acidification for more information.

Exercise. Draw Petri net diagrams for these reaction networks.

Motivation for the study of Petri net dynamics

The relative rates of the various reactions in a network critically determine the qualitative dynamics of the network as a whole. This is because the reactions are ‘competing’ with each other, and so their relative rates determine the direction in which the state of the system is changing. For instance, if molecules are breaking down faster then they are being formed, then the system is moving towards full dissociation. When the rates are equal, the processes balance out, and the system is in an equilibrium state. Then, there are only temporary fluctuations around the equilibrium conditions.

The rate of the reactions will depend on the number of tokens present in the system. For example, if any of the input tokens are zero, then the transition can’t fire, and so its rate must be zero. More generally, when there are few input tokens available, there will be fewer reaction events, and so the firing rates will be lower.

Given a specification for the rates in a reaction network, we can then pose the following kinds of questions about its dynamics:

• Does the network have an equilibrium state?

• If so, what are the concentrations of the species at equilibrium?

• How quickly does it approach the equilibrium?

• At the equilibrium state, there will still be temporary fluctuations around the equilibrium concentrations. What are the variances of these fluctuations?

• Are there modes in which the network will oscillate between states?

This is the grail we seek.

Aside from actually performing empirical experiments, such questions can be addressed either analytically or through simulation methods. In either case, our first step is to define a theoretical model for the dynamics of a Petri net.

Stochastic Petri nets

A stochastic Petri net (with kinetics) is a Petri net that is augmented with a specification for the reaction dynamics. It is defined by the following:

• An underlying Petri net, which consists of species, transitions, an input map, and an output map. These maps assign to each transition a multiset of species. (Multiset means that duplicates are allowed.) Recall that the state of the net is defined by a marking function, that maps each species to its population count.

• A rate constant that is associated with each transition.

• A kinetic model, that gives the expected firing rate for each transition as a function of the current marking. Normally, this kinetic function will include the rate constant as a multiplicative factor.

A further ‘sanity constraint’ can be put on the kinetic function for a transition: it should give a positive value if and only if all of its inputs are positive.

• A stochastic model, which defines the probability distribution of the time intervals between firing events. This specific distribution of the firing intervals for a transition will be a function of the expected firing rate in the current marking.

This definition is based on the standard treatments found, for example in:

• M. Ajmone Marsan, Stochastic Petri nets: an elementary introduction, in Advances in Petri Nets, Springer, Berlin, 1989, 1–23.

or Wilkinson’s book mentioned above. I have also added an explicit mention of the kinetic model, based on the ‘kinetics’ described in here:

• Martin Feinberg, Lectures on chemical reaction networks.

There is an implied random process that drives the reaction events. A classical random process is given by a container with ‘particles’ that are randomly traveling around, bouncing off the walls, and colliding with each other. This is the general idea behind Brownian motion. It is called a random process because the outcome results from an ‘experiment’ that is not fully determined by the input specification. In this experiment, you pour in the ingredients (particles of different types), set the temperature (the distributions of the velocities), give it a stir, and then see what happens. The outcome consists of the paths taken by each of the particles.

In an important limiting case, the stochastic behavior becomes deterministic, and the population sizes become continuous. To see this, consider a graph of population sizes over time. With larger population sizes, the relative jumps caused by the firing of individual transitions become smaller, and graphs look more like continuous curves. In the limit, we obtain an approximation for high population counts, in which the graphs are continuous curves, and the concentrations are treated as continuous magnitudes. In a similar way, a pitcher of sugar can be approximately viewed as a continuous fluid.

This simplification permits the application of continuous mathematics to study of reaction network processes. It leads to the basic rate equation for reaction networks, which specifies the direction of change of the system as a function of the current state of the system.

In this article we will be exploring this continuous deterministic formulation of Petri nets, under what is known as the mass action kinetics. This kinetics is one implementation of the general specification of a kinetic model, as defined above. This means that it will define the expected firing rate of each transition, in a given marking of the net. The probabilistic variations in the spacing of the reactions—around the mean given by the expected firing rate—is part of the stochastic dynamics, and will be addressed in a subsequent article.

The mass-action kinetics

Under the mass action kinetics, the expected firing rate of a transition is proportional to the product of the concentrations of its input species. For instance, if the reaction were A + C → D, then the firing rate would be proportional to the concentration of A times the concentration of C, and if the reaction were A + A → D, it would be proportional to the square of the concentration of A.

This principle is explained by Feinberg as follows:

For the reaction A+C → D, an occurrence requires that a molecule of A meet a molecule of C in the reaction, and we take the probability of such an encounter to be proportional to the product [of the concentrations of A and C]. Although we do not presume that every such encounter yields a molecule of D, we nevertheless take the occurrence rate of A+C → D to be governed by [the product of the concentrations].

For an in-depth proof of the mass action law, see this article:

• Daniel Gillespie, A rigorous definition of the chemical master equation, 1992.

Note that we can easily pass back and forth between speaking of the population counts for the species, and the concentrations of the species, which is just the population count divided by the total volume V of the system. The mass action law applies to both cases, the only difference being that the constant factors of (1/V) used for concentrations will get absorbed into the rate constants.

The mass action kinetics is a basic law of empirical chemistry. But there are limits to its validity. First, as indicated in the proof in the Gillespie, the mass action law rests on the assumptions that the system is well-stirred and in thermal equilibrium. Further limits are discussed here:

• Georg Job and Regina Ruffler, Physical Chemistry (first five chapters), Section 5.2, 2010.

They write:

…precise measurements show that the relation above is not strictly adhered to. At higher concentrations, values depart quite noticeably from this relation. If we gradually move to lower concentrations, the differences become smaller. The equation here expresses a so-called “limiting law“ which strictly applies only when c → 0.

In practice, this relation serves as a useful approximation up to rather high concentrations. In the case of electrically neutral substances, deviations are only noticeable above 100 mol m−3. For ions, deviations become observable above 1 mol m−3, but they are so small that they are easily neglected if accuracy is not of prime concern.

Why would the mass action kinetics break down at high concentrations? According to the book quoted, it is due to “molecular and ionic interactions.” I haven’t yet found a more detailed explanation, but here is my supposition about what is meant by molecular interactions in this context. Doubling the number of A molecules doubles the number of expected collisions between A and C molecules, but it also reduces the probability that any given A and C molecules that are within reacting distance will actually react. The reaction probability is reduced because the A molecules are ‘competing’ for reactions with the C molecules. With more A molecules, it becomes more likely that a C molecule will simultaneously be within reacting distance of several A molecules; each of these A molecules reduces the probability that the other A molecules will react with the C molecule. This is most pronounced when the concentrations in a gas get high enough that the molecules start to pack together to form a liquid.

The equilibrium relation for a pair of opposite reactions

Suppose we have two opposite reactions:

T: A + B \stackrel{u}{\longrightarrow} C + D

T': C + D \stackrel{v}{\longrightarrow} A + B

Since the reactions have exactly opposite effects on the population sizes, in order for the population sizes to be in a stable equilibrium, the expected firing rates of T and T' must be equal:

\mathrm{rate}(T') = \mathrm{rate}(T)

By mass action kinetics:

\mathrm{rate}(T) = u [A] [B]

\mathrm{rate}(T') = v [C] [D]

where [X] means the concentration of X.

Hence at equilibrium:

u [A] [B] = v [C] [D]

So:

\displaystyle{ \frac{[A][B]}{[C][D]} = \frac{v}{u} = K }

where K is the equilibrium constant for the reaction pair.

Equilibrium solution for the formation and dissociation of a diatomic molecule

Let A be some type of atom, and let D = A2 be the diatomic form of A. Then consider the opposite reactions:

A + A \stackrel{u}{\longrightarrow} D

D \stackrel{v}{\longrightarrow} A + A

From the preceding analysis, at equilibrium the following relation holds:

u [A]^2 = v [D]

Let N(A) and N(B) be the population counts for A and B, and let

N = N(A) + 2 N(D)

be the total number of units of A in the system, whether they be in the form of atoms or diatoms.

The value of N is an invariant property of the system. The reactions cannot change it, because they are just shuffling the units of A from one arrangement to the other. By way of contrast, N(A) is not an invariant quantity.

Dividing this equation by the total volume V, we get:

[N] = [A] + 2 [D]

where [N] is the concentration of the units of A.

Given a fixed value for [N] and the rate constants u and v, we can then solve for the concentrations at equilibrium:

\displaystyle{u [A]^2 = v [D] = v ([N] - [A]) / 2 }

\displaystyle{2 u [A]^2 + v [A] - v [N] = 0 }

\displaystyle{[A] = (-v \pm \sqrt{v^2 + 8 u v [N]}) / 4 u }

Since [A] can’t be negative, only the positive square root is valid.

Here is the solution for the case where u = v = 1:

\displaystyle{[A] = (\sqrt{8 [N] + 1} - 1) / 4 }

\displaystyle{[D] = ([N] - [A]) / 2 }

Conclusion

We’ve covered a lot of ground, starting with the introduction of the stochastic Petri net model, followed by a general discussion of reaction network dynamics, the mass action laws, and calculating equilibrium solutions for simple reaction networks.

We still have a number of topics to cover on our journey into the foundations, before being able to write informed programs to solve problems with stochastic Petri nets. Upcoming topics are (1) the deterministic rate equation for general reaction networks and its application to finding equilibrium solutions, and (2) an exploration of the stochastic dynamics of a Petri net. These are the themes that will support our upcoming software development.


Game Theory for Undergraduates

14 December, 2012

Starting in January I’m teaching an introduction to game theory to students who have taken a year of calculus, a quarter of multivariable calculus, but in general nothing else. The syllabus says this course:

Covers two-person zero-sum games, minimax theorem, and relation to linear programming. Includes nonzero-sum games, Nash equilibrium theorem, bargaining, the core, and the Shapley value. Addresses economic market games.

However, I can do what I want, and I’d like to include some evolutionary game theory. Right now I’m rounding up resources to help me teach this course.

Here are three books that are not really suitable as texts for a course of this sort, but useful nonetheless:

• Andrew M. Colman, Game Theory and its Applications in the Social and Biological Sciences, Routledge, London, 1995.

This describes 2-person and multi-person zero-sum and nonzero-sum games, concepts like ‘Nash equilibrium’, ‘core’ and ‘Shapley value’, their applications, and—especially refreshing—empirical evidence comparing the theory of ideal rational players to what actual organisms (including people) do in the real world.

• J. D. Williams, The Compleat Strategyst, McGraw–Hill, New York, 1966.

This old-fashioned book is chatty and personable. It features tons of zero-sum games described by 2×2, 3×3 and 4×4 matrices, analyzed in loving detail. It’s very limited in scope, but a good supply of examples.

• Lynne Pepall, Dan Richards and George Norman, Industrial Organization: Contemporary Theory and Empirical Applications, Blackwell, Oxford, 2008.

This is a book about industrial organizations, antitrust law, monopolies and oligopolies. But it uses a hefty portion of game theory, especially in the chapters on ‘Static games and Cournot competition’, ‘Price competition’, and ‘Dynamic games and first and second movers’. So, I think I can squeeze some nice examples and ideas out of this book to use in my course.

Over on Google+

I also got a lot of help from a discussion on Google+. (If ever it seems a bit too quiet here, visit me over there!)

I want the students to play games in class, and Lee Worden had some great advice on how to do that effectively:

For actually playing in class, I like the black-card, red-card system:

• Charles A. Holt and Monica Capra, Classroom games: a prisoner’s dilemma.

Students can keep their cards at their seats and use them for a whole series of 2-person or n-person games.

I like the double entendre in the title of Holt and Capra’s paper! I also like their suggestion of letting students play for small amounts of money: this would grab their attention and also make it easier to explain what their objective should be.

I’m also considering letting them play for points that improve their grade. But this might be controversial! Maybe if it only has a small effect on their grade?

(Students often ask, when they do badly in a course, what they can do to improve their grade. Usually I just say “learn the material and get good at solving problems!” But now I could say “let’s play a game. If you win, I’ll add 5 points to your class score. If you lose….”)

Vincent Knight gave a nice long reply:

I teach game theory in our MSc program and can suggest two “games” that can be played in class:

• Your comic suggests it already, the 2/3rds of the average game. I use that in class and play it twice, once before rationalising it and once after. In the meantime I get TAs to put the results in to a google spreadsheet and show the distribution of guesses to the students. The immediate question is: “what would happen if we played again and again”. This brings up ideas of convergence to equilibria.

• The second game I play with students is an Iterated Prisoner’s dilemma. I separate the whole class (40 students) in to 4 teams and play a round robin tournament of 5 rounds. Specifying that the goal is to minimise total “years in prison” (and not the number of duels won). This often throws up a coalition or two at the end which is quite cool.

I don’t only use the above on our MSc students but also at outreach events and I’ve written a series of blog posts about it:

• School kids: http://goo.gl/5u6Ic

• PhD students: http://goo.gl/6rkOt

• Conference delegates: http://goo.gl/JGWM7

• MSc students: http://goo.gl/oHoz0

The slides I use for the outreach event are available here: http://goo.gl/vJVWV. They include some cool videos (that have certainly made the rounds). I use some of that in the class itself.

I’m also in the middle of a teaching certification process called pcutl and my first module portfolio is available here: http://goo.gl/NhJYg. There’s a lot more stuff then you might care about in there but towards the end is a lesson plan as well as a reflection about how the session went with the students. There are some pics of the session (with the students up and playing the game) here: http://goo.gl/wBZwC.

The notes that I use are in the above portfolio but here is my page on game theory which contains the notes I use on the MSc course (which only has the time to go in to normal form games) and also some videos and Sage Mathematical Software System code: http://goo.gl/RXr1k.

Here are 3 videos I put together that I get my students to watch:

• Normal form games and mixed equilibria: http://goo.gl/dBtDK

• Routing games (Pigou’s example): http://goo.gl/807G4

• Cooperative games (Shapley Value): http://goo.gl/Pzf1F

Finally (I really do apologise for the length of this comment), here are some books I recommend:

• Webb’s Game Theory (in my opinion written for mathematicians): http://goo.gl/2M83l

• Osborne’s Introduction to Game Theory (a very nice and easy to read text): http://goo.gl/FXbcd

• Rosenthal’s A Complete Idiot’s Guide to Game Theory (this is more of a bedside read, that could serve as an introduction to game theory for a non mathematician): http://goo.gl/PCs76

I’m actually going to be writing a new game theory course for final year undergraduates next year and will be sharing any resources I put together for that if it’s of interest to anybody :)

And here are some other suggestions I got:

• Peter Morris, Introduction to Game Theory, Springer, Berlin, 1994.

Over on Google+, Joerg Fliege said this “is an excellent book for undergraduate students to start with. I used it myself a couple of years ago for a course in game theory. It is a bit outdated, though, and does not cover repeat games to any depth.”

• K. G. Binmore, Playing for Real: a Text on Game Theory, Oxford U. Press, Oxford, 2007.

Benjamin McKay said: “It has almost no prerequisites, but gets into some serious stuff. I taught game theory once from my own lecture notes, but then I found Binmore’s book and I wish I had used it instead.” A summary says:

This new book is a replacement for Binmore’s previous game theory textbook, Fun and Games. It is a lighthearted introduction to game theory suitable for advanced undergraduate students or beginning graduate students. It aims to answer three questions: What is game theory? How is game theory applied? Why is game theory right? It is the only book that tackles all three questions seriously without getting heavily mathematical.

• Herbert Gintis, Game Theory Evolving: a Problem-Centered Introduction to Modeling Strategic Behavior, Princeton U. Press, Princeton, 2000.

A summary says this book

exposes students to the techniques and applications of game theory through a problems involving human (and even animal) behaviour. This book shows students how to apply game theory to model how people behave in ways that reflect the nature of human sociality and individuality.

Finally, this one is mostly too advanced for my course, but it’s 750 pages and it’s free.

• Noam Nisan, Tim Roughgarde, Eva Tardos and Vijay V. Vazirani, editors, Algorithmic Game Theory, Cambridge U. Press, Cambridge, 2007.

It’s about:

• algorithms for computing equilibria in games and markets,

auction algorithms,

mechanism design (also known
as ‘reverse game theory’, this is the art of designing a game that coaxes the players into becoming good at doing something you want),

the price of anarchy: how the efficiency of a system degrades due to selfish behavior of its agents.

Over on Google+, Adam Smith said:

One suggestion is to get some mechanism design into the course (auctions, VCG, …) and from there into matching. Reasons to do this:

1) Teaching the stable marriage theorem is very fun.

2) This year’s Nobel prize in economics went to two game theorists for their work on matchings and markets.

3) Interesting auctions are everywhere—on Ebay, Google’s ad auctioning system, spectrum distribution, …


Mathematics of the Environment (Part 7)

13 November, 2012

Last time we saw how the ice albedo effect could, in theory, make the Earth’s climate have two stable states. In a very simple model, we saw that a hot Earth might stay hot since it’s dark and absorbs lots of sunlight, while a cold Earth might stay cold—since it’s icy and white and reflects most of the sunlight.

If you haven’t tried it yet, make sure to play around with this program pioneered by Lesley de Cruz and then implemented in Java by Allan Erskine:

Temperature dynamics.

The explanations were all given in Part 6 so I won’t repeat them here!

This week, we’ll see how noise affects this simple climate model. We’ll borrow lots of material from here:

Glyn Adgie and Tim van Beek, Increasing the signal-to-noise ratio with more noise.

And we’ll use software written by these guys together with Allan Erskine and Jim Stuttard. The power of the Azimuth Project knows no bounds!

Stochastic differential equations

The Milankovich cycles are periodic changes in how the Earth orbits the Sun. One question is: can these changes can be responsible for the ice ages? On the first sight it seems impossible, because the changes are simply too small. But it turns out that we can find a dynamical system where a small periodic external force is actually strengthened by random ‘noise’ in the system. This phenomenon has been dubbed ‘stochastic resonance’ and has been proposed as an explanation for the ice ages. It also shows up in many other phenomena:

• Roberto Benzi, Stochastic resonance: from climate to biology.

But to understand it, we need to think a little about stochastic differential equations.

A lot of systems can be described by ordinary differential equations:

\displaystyle{ \frac{d x}{d t} = f(x, t)}

If f is nice, the time evolution of the system will be a nice smooth function x(t), like the trajectory of a thrown stone. But there are situations where we have some kind of noise, a chaotic, fluctuating influence, that we would like to take into account. This could be, for example, turbulence in the air flow around a rocket. Or, in our case, short term fluctuations of the weather of the earth. If we take this into account, we get an equation of the form

\displaystyle{ \frac{d x}{d t} = f(x, t) + w(t) }

where the w is a ‘random function’ which models the noise. Typically this noise is just a simplified way to take into account rapidly changing fine-grained aspects of the system at hand. This way we do not have to explicitly model these aspects, which is often impossible.

White noise

We’ll look at a model of this sort:

\displaystyle{ \frac{d x}{d t} = f(x, t) + w(t) }

where w(t) is ‘white noise’. But what’s that?

Very naively speaking, white noise is a random function that typically looks very wild, like this:

White noise actually has a sound, too: it sounds like this! The idea is that you can take a random function like the one graphed above, and use it to drive the speakers of your computer, to produce sound waves of an equally wild sort. And it sounds like static.

However, all this is naive. Why? The concept of a ‘random function’ is not terribly hard to define, at least if you’ve taken the usual year-long course on real analysis that we force on our grad students at U.C. Riverside: it’s a probability measure on some space of functions. But white noise is a bit too spiky and singular too be a random function: it’s a random distribution.

Distributions were envisioned by Dirac but formalized later by Laurent Schwarz and others. A distribution D(t) is a bit like a function, but often distributions are too nasty to have well-defined values at points! Instead, all that makes sense are expressions like this:

\displaystyle{ \int_\infty^\infty D(t) f(t) \, d t}

where we multiply the distribution by any compactly supported smooth function f(t) and then integrate it. Indeed, we specify a distribution just by saying what all these integrals equal. For example, the Dirac delta \delta(t) is a distribution defined by

\displaystyle{ \int_\infty^\infty \delta(t) f(t) \, d t = f(0) }

If you try to imagine the Dirac delta as a function, you run into a paradox: it should be zero everywhere except at t = 0, but its integral should equal 1. So, if you try to graph it, the region under the graph should be an infinitely skinny infinitely tall spike whose area is 1:

But that’s impossible, at least in the standard framework of mathematics—so such a function does not really exist!

Similarly, white noise w(t) is too spiky to be an honest random function, but if we multiply it by any compactly supported smooth function f(t) and integrate, we get a random variable

\displaystyle{ \int_\infty^\infty w(t) f(t) \, d t }

whose probability distribution is a Gaussian with mean zero and standard deviation equal to

\displaystyle{ \sqrt{ \int_\infty^\infty f(t)^2 \, d t} }

(Note: the word ‘distribution’ has a completely different meaning when it shows up in the phrase probability distribution! I’m assuming you’re comfortable with that meaning already.)

Indeed, the above formulas make sense and are true, not just when f is compactly supported and smooth, but whenever

\displaystyle{ \sqrt{ \int_\infty^\infty f(t)^2 \, d t} } < \infty

If you know about Gaussians and you know about this sort of integral, which shows up all over math and physics, you’ll realize that white noise is an extremely natural concept!

Brownian motion

While white noise is not a random function, its integral

\displaystyle{ W(s) = \int_0^s w(s) \, ds }

turns out to be a well-defined random function. It has a lot of names, including: the Wiener process, Brownian motion, red noise and—as a kind of off-color joke—brown noise.

The capital W here stands for Wiener: that is, Norbert Wiener, the famously absent-minded MIT professor who studied this random function… and also invented cybernetics:

Brownian noise sounds like this.

Puzzle: How does it sound different from white noise, and why?

It looks like this:

Here we are zooming in on closer and closer, while rescaling the vertical axis as well. We see that Brownian noise is self-similar: if W(t) is Brownian noise, so is W(c t)/\sqrt{c} for all c > 0.

More importantly for us, Brownian noise is the solution of this differential equation:

\displaystyle{ \frac{d W}{d t} = w(t) }

where w(t) is white noise. This is essentially true by definition, but making it rigorous takes some work. More fancy stochastic differential equations

\displaystyle{ \frac{d x}{d t} = f(x, t) + w(t) }

take even more work to rigorously formulate and solve. You can read about them here:

Stochastic differential equation, Azimuth Library.

It’s actually much easier to explain the difference equations we use to approximately solve these stochastic differential equation on the computer. Suppose we discretize time into steps like this:

t_i = i \Delta t

where \Delta t > 0 is some fixed number, our ‘time step’. Then we can define

x(t_{i+1}) = x(t_i) + f(x(t_i), t_i) \; \Delta t + w_i

where w_i are independent Gaussian random variables with mean zero and standard deviation

\sqrt{ \Delta t}

The square root in this formula comes from the definition I gave of white noise.

If we use a random number generator to crank out random numbers w_i distributed in this way, we can write a program to work out the numbers x(t_i) if we are given some initial value x(0). And if the time step \Delta t is small enough, we can hope to get a ‘good approximation to the true solution’. Of course defining what we mean by a ‘good approximation’ is tricky here… but I think it’s more important to just plunge in and see what happens, to get a feel for what’s going on here.

Stochastic resonance

Let’s do an example of this equation:

\displaystyle{ \frac{d x}{d t} = f(x, t) + w(t) }

which exhibits ‘stochastic resonance’. Namely:

\displaystyle{ \frac{d x}{d t} = x(t) - x(t)^3 + A \;  \sin(t)  + \sqrt{2D} w(t) }

Here A and D are constants we get to choose. If we set them both to zero we get:

\displaystyle{ \frac{d x}{d t} = x(t) - x(t)^3 }

This has stable equilibrium solutions at x = \pm 1 and an unstable equilibrium in between at x = 0. So, this is a bistable model similar to the one we’ve been studying, but mathematically simpler!

Then we can add an oscillating time-dependent term:

\displaystyle{ \frac{d x}{d t} = x(t) - x(t)^3 + A \;  \sin(t) }

which wiggles the system back and forth. This can make it jump from one equilibrium to another.

And then we can add on noise:

\displaystyle{ \frac{d x}{d t} = x(t) - x(t)^3 + A \;  \sin(t)  + \sqrt{2D} w(t) }

Let’s see what the solutions look like!

In the following graphs, the green curve is A \sin(t),, while the red curve is x(t). Here is a simulation with a low level of noise:

low noise level

As you can see, within the time of the simulation there is no transition from the stable state at 1 to the one at -1. If we were doing a climate model, this would be like the Earth staying in the warm state.

Here is a simulation with a high noise level:

high noise level

The solution x(t) jumps around wildly. By inspecting the graph with your eyes only, you don’t see any pattern in it, do you?

But finally, here is a simulation where the noise level is not too small, and not too big:

high noise level

Here we see the noise helping the solution x(t) hops from around x = -1 to x = 1, or vice versa. The solution x(t) is not at all ‘periodic’—it’s quite random. But still, it tends to hop back and forth thanks to the combination of the sinusoidal term A \sin(t) and the noise.

Glyn Adgie, Allan Erskine, Jim Stuttard and Tim van Beek have created an online model that lets you solve this stochastic differential equation for different values of A and D. You can try it here:

Stochastic resonance example.

You can change the values using the sliders under the graphic and see what happens. You can also choose different ‘random seeds’, which means that the random numbers used in the simulation will be different.

To read more about stochastic resonance, go here:

Stochastic resonance, Azimuth Library.

In future weeks I hope to say more about the actual evidence that stochastic resonance plays a role in our glacial cycles! It would also be great to go back to our climate model from last time and add noise. We’re working on that.


The Mathematical Origin of Irreversibility

8 October, 2012

guest post by Matteo Smerlak

Introduction

Thermodynamical dissipation and adaptive evolution are two faces of the same Markovian coin!

Consider this. The Second Law of Thermodynamics states that the entropy of an isolated thermodynamic system can never decrease; Landauer’s principle maintains that the erasure of information inevitably causes dissipation; Fisher’s fundamental theorem of natural selection asserts that any fitness difference within a population leads to adaptation in an evolution process governed by natural selection. Diverse as they are, these statements have two common characteristics:

1. they express the irreversibility of certain natural phenomena, and

2. the dynamical processes underlying these phenomena involve an element of randomness.

Doesn’t this suggest to you the following question: Could it be that thermal phenomena, forgetful information processing and adaptive evolution are governed by the same stochastic mechanism?

The answer is—yes! The key to this rather profound connection resides in a universal property of Markov processes discovered recently in the context of non-equilibrium statistical mechanics, and known as the ‘fluctuation theorem’. Typically stated in terms of ‘dissipated work’ or ‘entropy production’, this result can be seen as an extension of the Second Law of Thermodynamics to small systems, where thermal fluctuations cannot be neglected. But it is actually much more than this: it is the mathematical underpinning of irreversibility itself, be it thermodynamical, evolutionary, or else. To make this point clear, let me start by giving a general formulation of the fluctuation theorem that makes no reference to physics concepts such as ‘heat’ or ‘work’.

The mathematical fact

Consider a system randomly jumping between states a, b,\dots with (possibly time-dependent) transition rates \gamma_{a b}(t) where a is the state prior to the jump, while b is the state after the jump. I’ll assume that this dynamics defines a (continuous-time) Markov process, namely that the numbers \gamma_{a b} are the matrix entries of an infinitesimal stochastic matrix, which means that its off-diagonal entries are non-negative and that its columns sum up to zero.

Now, each possible history \omega=(\omega_t)_{0\leq t\leq T} of this process can be characterized by the sequence of occupied states a_{j} and by the times \tau_{j} at which the transitions a_{j-1}\longrightarrow a_{j} occur (0\leq j\leq N):

\omega=(\omega_{0}=a_{0}\overset{\tau_{0}}{\longrightarrow} a_{1} \overset{\tau_{1}}{\longrightarrow}\cdots \overset{\tau_{N}}{\longrightarrow} a_{N}=\omega_{T}).

Define the skewness \sigma_{j}(\tau_{j}) of each of these transitions to be the logarithmic ratio of transition rates:

\displaystyle{\sigma_{j}(\tau_{j}):=\ln\frac{\gamma_{a_{j}a_{j-1}}(\tau_{j})}{\gamma_{a_{j-1}a_{j}}(\tau_{j})}}

Also define the self-information of the system in state a at time t by:

i_a(t):= -\ln\pi_{a}(t)

where \pi_{a}(t) is the probability that the system is in state a at time t, given some prescribed initial distribution \pi_{a}(0). This quantity is also sometimes called the surprisal, as it measures the ‘surprise’ of finding out that the system is in state a at time t.

Then the following identity—the detailed fluctuation theorem—holds:

\mathrm{Prob}[\Delta i-\Sigma=-A] = e^{-A}\;\mathrm{Prob}[\Delta i-\Sigma=A]

where

\displaystyle{\Sigma:=\sum_{j}\sigma_{j}(\tau_{j})}

is the cumulative skewness along a trajectory of the system, and

\Delta i= i_{a_N}(T)-i_{a_0}(0)

is the variation of self-information between the end points of this trajectory.

This identity has an immediate consequence: if \langle\,\cdot\,\rangle denotes the average over all realizations of the process, then we have the integral fluctuation theorem:

\langle e^{-\Delta i+\Sigma}\rangle=1,

which, by the convexity of the exponential and Jensen’s inequality, implies:

\langle \Delta i\rangle=\Delta S\geq\langle\Sigma\rangle.

In short: the mean variation of self-information, aka the variation of Shannon entropy

\displaystyle{ S(t):= \sum_{a}\pi_{a}(t)i_a(t) }

is bounded from below by the mean cumulative skewness of the underlying stochastic trajectory.

This is the fundamental mathematical fact underlying irreversibility. To unravel its physical and biological consequences, it suffices to consider the origin and interpretation of the ‘skewness’ term in different contexts. (By the way, people usually call \Sigma the ‘entropy production’ or ‘dissipation function’—but how tautological is that?)

The physical and biological consequences

Consider first the standard stochastic-thermodynamic scenario where a physical system is kept in contact with a thermal reservoir at inverse temperature \beta and undergoes thermally induced transitions between states a, b,\dots. By virtue of the detailed balance condition:

\displaystyle{ e^{-\beta E_{a}(t)}\gamma_{a b}(t)=e^{-\beta E_{b}(t)}\gamma_{b a}(t),}

the skewness \sigma_{j}(\tau_{j}) of each such transition is \beta times the energy difference between the states a_{j} and a_{j-1}, namely the heat received from the reservoir during the transition. Hence, the mean cumulative skewness \langle \Sigma\rangle is nothing but \beta\langle Q\rangle, with Q the total heat received by the system along the process. It follows from the detailed fluctuation theorem that

\langle e^{-\Delta i+\beta Q}\rangle=1

and therefore

\Delta S\geq\beta\langle Q\rangle

which is of course Clausius’ inequality. In a computational context where the control parameter is the entropy variation itself (such as in a bit-erasure protocol, where \Delta S=-\ln 2), this inequality in turn expresses Landauer’s principle: it impossible to decrease the self-information of the system’s state without dissipating a minimal amount of heat into the environment (in this case -Q \geq k T\ln2, the ‘Landauer bound’). More general situations (several types of reservoirs, Maxwell-demon-like feedback controls) can be treated along the same lines, and the various forms of the Second Law derived from the detailed fluctuation theorem.

Now, many would agree that evolutionary dynamics is a wholly different business from thermodynamics; in particular, notions such as ‘heat’ or ‘temperature’ are clearly irrelevant to Darwinian evolution. However, the stochastic framework of Markov processes is relevant to describe the genetic evolution of a population, and this fact alone has important consequences. As a simple example, consider the time evolution of mutant fixations x_{a} in a population, with a ranging over the possible genotypes. In a ‘symmetric mutation scheme’, which I understand is biological parlance for ‘reversible Markov process’, meaning one that obeys detailed balance, the ratio between the a\mapsto b and b\mapsto a transition rates is completely determined by the fitnesses f_{a} and f_b of a and b, according to

\displaystyle{\frac{\gamma_{a b}}{\gamma_{b a}} =\left(\frac{f_{b}}{f_{a}}\right)^{\nu} }

where \nu is a model-dependent function of the effective population size [Sella2005]. Along a given history of mutant fixations, the cumulated skewness \Sigma is therefore given by minus the fitness flux:

\displaystyle{\Phi=\nu\sum_{j}(\ln f_{a_j}-\ln f_{a_{j-1}}).}

The integral fluctuation theorem then becomes the fitness flux theorem:

\displaystyle{ \langle e^{-\Delta i -\Phi}\rangle=1}

discussed recently by Mustonen and Lässig [Mustonen2010] and implying Fisher’s fundamental theorem of natural selection as a special case. (Incidentally, the ‘fitness flux theorem’ derived in this reference is more general than this; for instance, it does not rely on the ‘symmetric mutation scheme’ assumption above.) The ensuing inequality

\langle \Phi\rangle\geq-\Delta S

shows that a positive fitness flux is “an almost universal evolutionary principle of biological systems” [Mustonen2010], with negative contributions limited to time intervals with a systematic loss of adaptation (\Delta S > 0). This statement may well be the closest thing to a version of the Second Law of Thermodynamics applying to evolutionary dynamics.

It is really quite remarkable that thermodynamical dissipation and Darwinian evolution can be reduced to the same stochastic mechanism, and that notions such as ‘fitness flux’ and ‘heat’ can arise as two faces of the same mathematical coin, namely the ‘skewness’ of Markovian transitions. After all, the phenomenon of life is in itself a direct challenge to thermodynamics, isn’t it? When thermal phenomena tend to increase the world’s disorder, life strives to bring about and maintain exquisitely fine spatial and chemical structures—which is why Schrödinger famously proposed to define life as negative entropy. Could there be a more striking confirmation of his intuition—and a reconciliation of evolution and thermodynamics in the same go—than the fundamental inequality of adaptive evolution \langle\Phi\rangle\geq-\Delta S?

Surely the detailed fluctuation theorem for Markov processes has other applications, pertaining neither to thermodynamics nor adaptive evolution. Can you think of any?

Proof of the fluctuation theorem

I am a physicist, but knowing that many readers of John’s blog are mathematicians, I’ll do my best to frame—and prove—the FT as an actual theorem.

Let (\Omega,\mathcal{T},p) be a probability space and (\,\cdot\,)^{\dagger}=\Omega\to \Omega a measurable involution of \Omega. Denote p^{\dagger} the pushforward probability measure through this involution, and

\displaystyle{ R=\ln \frac{d p}{d p^\dagger} }

the logarithm of the corresponding Radon-Nikodym derivative (we assume p^\dagger and p are mutually absolutely continuous). Then the following lemmas are true, with (1)\Rightarrow(2)\Rightarrow(3):

Lemma 1. The detailed fluctuation relation:

\forall A\in\mathbb{R} \quad  p\big(R^{-1}(-A) \big)=e^{-A}p \big(R^{-1}(A) \big)

Lemma 2. The integral fluctuation relation:

\displaystyle{\int_{\Omega} d p(\omega)\,e^{-R(\omega)}=1 }

Lemma 3. The positivity of the Kullback-Leibler divergence:

D(p\,\Vert\, p^{\dagger}):=\int_{\Omega} d p(\omega)\,R(\omega)\geq 0.

These are basic facts which anyone can show: (2)\Rightarrow(3) by Jensen’s inequality, (1)\Rightarrow(2) trivially, and (1) follows from R(\omega^{\dagger})=-R(\omega) and the change of variables theorem, as follows,

\begin{array}{ccl} \displaystyle{ \int_{R^{-1}(-A)} d p(\omega)} &=& \displaystyle{ \int_{R^{-1}(A)}d p^{\dagger}(\omega) } \\ \\ &=& \displaystyle{ \int_{R^{-1}(A)} d p(\omega)\, e^{-R(\omega)} } \\ \\ &=& \displaystyle{ e^{-A} \int_{R^{-1}(A)} d p(\omega)} .\end{array}

But here is the beauty: if

(\Omega,\mathcal{T},p) is actually a Markov process defined over some time interval [0,T] and valued in some (say discrete) state space \Sigma, with the instantaneous probability \pi_{a}(t)=p\big(\{\omega_{t}=a\} \big) of each state a\in\Sigma satisfying the master equation (aka Kolmogorov equation)

\displaystyle{ \frac{d\pi_{a}(t)}{dt}=\sum_{b\neq a}\Big(\gamma_{b a}(t)\pi_{a}(t)-\gamma_{a b}(t)\pi_{b}(t)\Big),}

and

• the dagger involution is time-reversal, that is \omega^{\dagger}_{t}:=\omega_{T-t},

then for a given path

\displaystyle{\omega=(\omega_{0}=a_{0}\overset{\tau_{0}}{\longrightarrow} a_{1} \overset{\tau_{1}}{\longrightarrow}\cdots \overset{\tau_{N}}{\longrightarrow} a_{N}=\omega_{T})\in\Omega}

the logarithmic ratio R(\omega) decomposes into ‘variation of self-information’ and ‘cumulative skewness’ along \omega:

\displaystyle{ R(\omega)=\underbrace{\Big(\ln\pi_{a_0}(0)-\ln\pi_{a_N}(T) \Big)}_{\Delta i(\omega)}-\underbrace{\sum_{j=1}^{N}\ln\frac{\gamma_{a_{j}a_{j-1}}(\tau_{j})}{\gamma_{a_{j-1}a_{j}}(\tau_{j})}}_{\Sigma(\omega)}.}

This is easy to see if one writes the probability of a path explicitly as

\displaystyle{p(\omega)=\pi_{a_{0}}(0)\left[\prod_{j=1}^{N}\phi_{a_{j-1}}(\tau_{j-1},\tau_{j})\gamma_{a_{j-1}a_{j}}(\tau_{j})\right]\phi_{a_{N}}(\tau_{N},T)}

where

\displaystyle{ \phi_{a}(\tau,\tau')=\phi_{a}(\tau',\tau)=\exp\Big(-\sum_{b\neq a}\int_{\tau}^{\tau'}dt\, \gamma_{a b}(t)\Big)}

is the probability that the process remains in the state a between the times \tau and \tau'. It follows from the above lemma that

Theorem. Let (\Omega,\mathcal{T},p) be a Markov process and let i,\Sigma:\Omega\rightarrow \mathbb{R} be defined as above. Then we have

1. The detailed fluctuation theorem:

\forall A\in\mathbb{R}, p\big((\Delta i-\Sigma)^{-1}(-A) \big)=e^{-A}p \big((\Delta i-\Sigma)^{-1}(A) \big)

2. The integral fluctuation theorem:

\int_{\Omega} d p(\omega)\,e^{-\Delta i(\omega)+\Sigma(\omega)}=1

3. The ‘Second Law’ inequality:

\displaystyle{ \Delta S:=\int_{\Omega} d p(\omega)\,\Delta i(\omega)\geq \int_{\Omega} d p(\omega)\,\Sigma(\omega)}

The same theorem can be formulated for other kinds of Markov processes as well, including diffusion processes (in which case it follows from the Girsanov theorem).

References

Landauer’s principle was introduced here:

• [Landauer1961] R. Landauer, Irreversibility and heat generation in the computing process}, IBM Journal of Research and Development 5, (1961) 183–191.

and is now being verified experimentally by various groups worldwide.

The ‘fundamental theorem of natural selection’ was derived by Fisher in his book:

• [Fisher1930] R. Fisher, The Genetical Theory of Natural Selection, Clarendon Press, Oxford, 1930.

His derivation has long been considered obscure, even perhaps wrong, but apparently the theorem is now well accepted. I believe the first Markovian models of genetic evolution appeared here:

• [Fisher1922] R. A. Fisher, On the dominance ratio, Proc. Roy. Soc. Edinb. 42 (1922), 321–341.

• [Wright1931] S. Wright, Evolution in Mendelian populations, Genetics 16 (1931), 97–159.

Fluctuation theorems are reviewed here:

• [Sevick2008] E. Sevick, R. Prabhakar, S. R. Williams, and D. J. Searles, Fluctuation theorems, Ann. Rev. Phys. Chem. 59 (2008), 603–633.

Two of the key ideas for the ‘detailed fluctuation theorem’ discussed here are due to Crooks:

• [Crooks1999] Gavin Crooks, The entropy production fluctuation theorem and the nonequilibrium work relation for free energy differences, Phys. Rev. E 60 (1999), 2721–2726.

who identified (E_{a}(\tau_{j})-E_{a}(\tau_{j-1})) as heat, and Seifert:

• [Seifert2005] Udo Seifert, Entropy production along a stochastic trajectory and an integral fluctuation theorem, Phys. Rev. Lett. 95 (2005), 4.

who understood the relevance of the self-information in this context.

The connection between statistical physics and evolutionary biology is discussed here:

• [Sella2005] G. Sella and A.E. Hirsh, The application of statistical physics to evolutionary biology, Proc. Nat. Acad. Sci. USA 102 (2005), 9541–9546.

and the ‘fitness flux theorem’ is derived in

• [Mustonen2010] V. Mustonen and M. Lässig, Fitness flux and ubiquity of adaptive evolution, Proc. Nat. Acad. Sci. USA 107 (2010), 4248–4253.

Schrödinger’s famous discussion of the physical nature of life was published here:

• [Schrödinger1944] E. Schrödinger, What is Life?, Cambridge University Press, Cambridge, 1944.


Follow

Get every new post delivered to your Inbox.

Join 729 other followers