## Network Theory (Part 4)

Last time I explained the rate equation of a stochastic Petri net. But now let’s get serious: let’s see what’s really stochastic— that is, random—about a stochastic Petri net. For this we need to forget the rate equation (temporarily) and learn about the ‘master equation’. This is where ideas from quantum field theory start showing up!

A Petri net has a bunch of states and a bunch of transitions. Here’s an example we’ve already seen, from chemistry:

The states are in yellow, the transitions in blue. A labelling of our Petri net is a way of putting some number of things in each state. We can draw these things as little black dots:

In this example there are only 0 or 1 things in each state: we’ve got one atom of carbon, one molecule of oxygen, one molecule of sodium hydroxide, one molecule of hydrochloric acid, and nothing else. But in general, we can have any natural number of things in each state.

In a stochastic Petri net, the transitions occur randomly as time passes. For example, as time passes we could see a sequence of transitions like this:

Each time a transition occurs, the number of things in each state changes in an obvious way.

#### The Master Equation

Now, I said the transitions occur ‘randomly’, but that doesn’t mean there’s no rhyme or reason to them! The miracle of probability theory is that it lets us state precise laws about random events. The law governing the random behavior of a stochastic Petri net is called the ‘master equation’.

In a stochastic Petri net, each transition has a rate constant, a nonnegative real number. Roughly speaking, this determines the probability of that transition.

A bit more precisely: suppose we have a Petri net that is labelled in some way at some moment. Then the probability that a given transition occurs in a short time $\Delta t$ is approximately:

• the rate constant for that transition, times

• the time $\Delta t$, times

• the number of ways the transition can occur.

More precisely still: this formula is correct up to terms of order $(\Delta t)^2$. So, taking the limit as $\Delta t \to 0$, we get a differential equation describing precisely how the probability of the Petri net having a given labelling changes with time! And this is the master equation.

Now, you might be impatient to actually see the master equation, but that would be rash. The true master doesn’t need to see the master equation. It sounds like a Zen proverb, but it’s true. The raw beginner in mathematics wants to see the solutions of an equation. The more advanced student is content to prove that the solution exists. But the master is content to prove that the equation exists.

A bit more seriously: what matters is understanding the rules that inevitably lead to some equation: actually writing it down is then straightforward.

And you see, there’s something I haven’t explained yet: “the number of ways the transition can occur”. This involves a bit of counting. Consider, for example, this Petri net:

Suppose there are 10 rabbits and 5 wolves.

• How many ways can the birth transition occur? Since birth takes one rabbit as input, it can occur in 10 ways.

• How many ways can predation occur? Since predation takes one rabbit and one wolf as inputs, it can occur in 10 × 5 = 50 ways.

• How many ways can death occur? Since death takes one wolf as input, it can occur in 5 ways.

Or consider this one:

Suppose there are 10 hydrogen atoms and 5 oxygen atoms. How many ways can they form a water molecule? There are 10 ways to pick the first hydrogen, 9 ways to pick the second hydrogen, and 5 ways to pick the oxygen. So, there are

$10 \times 9 \times 5 = 450$

ways.

Note that we’re treating the hydrogen atoms as distinguishable, so there are $10 \times 9$ ways to pick them, not $\frac{10 \times 9}{2} = \binom{9}{2}$. In general, the number of ways to choose $M$ distinguishable things from a collection of $L$ is the falling power

$L^{\underline{M}} = L \cdot (L - 1) \cdots (L - M + 1)$

where there are $M$ factors in the product, but each is 1 less than the preceding one—hence the term ‘falling’.

Okay, now I’ve given you all the raw ingredients to work out the master equation for any stochastic Petri net. The previous paragraph was a big fat hint. One more nudge and you’re on your own:

Puzzle. Suppose we have a stochastic Petri net with $k$ states and just one transition, whose rate constant is $r$. Suppose the $i$th state appears $m_i$ times as the input of this transition and $n_i$ times as the output. A labelling of this stochastic Petri net is a $k$-tuple of natural numbers $\ell = (\ell_1, \dots, \ell_k)$ saying how many things are in each state. Let $\psi_\ell(t)$ be the probability that the labelling is $\ell$ at time $t$. Then the master equation looks like this:

$\frac{d}{d t} \psi_{\ell}(t) = \sum_{\ell'} H_{\ell \ell'} \psi_{\ell'}(t)$

for some matrix of real numbers $H_{\ell \ell'}.$ What is this matrix?

You can write down a formula for this matrix using what I’ve told you. And then, if you have a stochastic Petri net with more transitions, you can just compute the matrix for each transition using this formula, and add them all up.

Someday I’ll tell you the answer to this puzzle, but I want to get there by a strange route: I want to guess the master equation using ideas from quantum field theory!

#### Some clues

Why?

Well, if we think about a stochastic Petri net whose labelling undergoes random transitions as I’ve described, you’ll see that any possible ‘history’ for the labelling can be drawn in a way that looks like a Feynman diagram. In quantum field theory, Feynman diagrams show how things interact and turn into other things. But that’s what stochastic Petri nets do, too!

For example, if our Petri net looks like this:

then a typical history can be drawn like this:

Some rabbits and wolves come in on top. They undergo some transitions as time passes, and go out on the bottom. The vertical coordinate is time, while the horizontal coordinate doesn’t really mean anything: it just makes the diagram easier to draw.

If we ignore all the artistry that makes it cute, this Feynman diagram is just a graph with states as edges and transitions as vertices. Each transition occurs at a specific time.

We can use these Feynman diagrams to compute the probability that if we start it off with some labelling at time $t_1$, our stochastic Petri net will wind up with some other labelling at time $t_2$. To do this, we just take a sum over Feynman diagrams that start and end with the given labellings. For each Feynman diagram, we integrate over all possible times at which the transitions occur. And what do we integrate? Just the product of the rate constants for those transitions!

That was a bit of a mouthful, and it doesn’t really matter if you followed it in detail. What matters is that it sounds a lot like stuff you learn when you study quantum field theory!

That’s one clue that something cool is going on here. Another is the master equation itself:

$\frac{d}{dt} \psi_{\ell}(t) = \sum_{\ell'} H_{\ell \ell'} \psi_{\ell'}(t)$

This looks a lot like Schrödinger’s equation, the basic equation describing how a quantum system changes with the passage of time.

We can make it look even more like Schrödinger’s equation if we create a vector space with the labellings $\ell$ as a basis. The numbers $\psi_\ell(t)$ will be the components of some vector $\psi(t)$ in this vector space. The numbers $H_{\ell \ell'}$ will be the matrix entries of some operator $H$ on that vector space. And the master equation becomes:

$\frac{d}{d t} \psi(t) = H \psi(t)$

Compare Schrödinger’s equation:

$i \frac{d}{d t} \psi(t) = H \psi(t)$

The only visible difference is that factor of $i$!

But of course this is linked to another big difference: in the master equation $\psi$ describes probabilities, so it’s a vector in a real vector space. In quantum theory $\psi$ describes probability amplitudes, so it’s a vector in a complex Hilbert space.

Apart from this huge difference, everything is a lot like quantum field theory. In particular, our vector space is a lot like the Fock space one sees in quantum field theory. Suppose we have a quantum particle that can be in $k$ different states. Then its Fock space is the Hilbert space we use to describe an arbitrary collection of such particles. It has an orthonormal basis denoted

$| \ell_1 \cdots \ell_k \rangle$

where $\ell_1, \dots, \ell_k$ are natural numbers saying how many particles there are in each state. So, any vector in Fock space looks like this:

$\psi = \sum_{\ell_1, \dots, \ell_k} \; \psi_{\ell_1 , \dots, \ell_k} \, | \ell_1 \cdots \ell_k \rangle$

But if write the whole list $\ell_1, \dots, \ell_k$ simply as $\ell$, this becomes

$\psi = \sum_{\ell} \psi_{\ell} | \ell \rangle$

This is almost like what we’ve been doing with Petri nets!—except I hadn’t gotten around to giving names to the basis vectors.

In quantum field theory class, I learned lots of interesting operators on Fock space: annihilation and creation operators, number operators, and so on. So, when I bumped into this master equation

$\frac{d}{d t} \psi(t) = H \psi(t)$

it seemed natural to take the operator $H$ and write it in terms of these. There was an obvious first guess, which didn’t quite work… but thinking a bit harder eventually led to the right answer. Later, it turned out people had already thought about similar things. So, I want to explain this.

When I first started working on this stuff, I was focused on the difference between collections of indistinguishable things, like bosons or fermions, and collections of distinguishable things, like rabbits or wolves.

But with the benefit of hindsight, it’s even more important to think about the difference between ordinary quantum theory, which is all about probability amplitudes, and the game we’re playing now, which is all about probabilities. So, next time I’ll explain how we need to modify quantum theory so that it’s about probabilities. This will make it easier to guess a nice formula for $H$.

### 44 Responses to Network Theory (Part 4)

1. Typo: “How many ways can death occur. Since death takes one wolf as input, it can occur in 10 ways.”

There are only 5 wolves.

2. John Baez says:

Hi, David. Whoops!

I switched rabbits and wolves halfway through. At first I had 5 rabbits and 10 wolves, but that made me sorry for both the rabbits and wolves, so I switched to 10 rabbits and 5 wolves, but I didn’t fix everything. I’ll fix it now.

(Everyone: David Corfield is a philosopher of mathematics who knows lots of math, a good friend of mine, and a coblogger of mine on the n-Category Café.)

(In case you’re wondering, David, I’ve decided to start introducing people to each other.)

3. And I’m very interested in why there’s a commonality between learning theory and physics, e.g., Bayesian Field Theory. I’m still hoping that ‘changing the rig (semiring)’ will yield some answers. See The Generalized Distributive Law for something on this.

4. John Baez says:

I’ll admit I’m still confused about why we should count ordered pairs of hydrogen atoms instead of unordered pairs when counting the number of ways to make a water molecule out of two H’s and an O. It’s the same darn water molecule either way, no?

On the other hand, if the two H’s were attached to the O in very different ways, I’d feel better about using ordered pairs. And I’d feel even better if instead of atoms we were talking about some process involving large ‘classical’ objects, where quantum considerations play no role.

For example, if I had 10 nails, and I were counting the ways I could take 2 and hammer them into a vertical beam, one near the floor and one near the ceiling, I’d feel quite happy about counting ordered pairs and saying the answer was 10 × 9.

But you’ll notice our stochastic Petri nets can’t tell if the two hydrogen atoms are being attached to the oxygen in the same way or different ways.

Some of this will straighten itself out when one of you solves the Puzzle following the rules I’ve described. Then we check to see whether the master equation you got is consistent, in the limit of large numbers of things in each state, with the rate equation we got last time. That’ll mean we at least have some consistent setup, even if we still want to argue about when it’s applicable.

• John F says:

Hmm. Ok, how about this: If they aren’t distinguishable, they are still exchangeable. So even after one H makes an OH, you don’t know that it didn’t switch with the second H.

5. No time right now, but were you to separate the process into O meets H, then OH meets H, would you expect to have an extra factor in the rate of OH production because they can connect in 2 slots? But then I suppose that would be subsumed in the rates for each transition.

Maybe the rate for the transition as a whole already takes such things into account.

• John Baez says:

David wrote:

No time right now, but were you to separate the process into O meets H, then OH meets H, would you expect to have an extra factor in the rate of OH production because they can connect in 2 slots?

That’s an interesting question. Here are a few random remarks, which don’t quite answer it.

If we’re talking about real-world chemistry, rather than ball-and-stick models of molecules, I don’t think a lone oxygen atom really has two distinct ‘slots’ waiting for hydrogens. It’s more like it has a propensity to attach to a hydrogen, which can come in from any direction and stick… and then the resulting hydroxyl radical OH still has a propensity to attach to another hydrogen, which will then move to form a 104.45° angle to the first.

Now, I mentioned both the following options back in part 3:

but there my focus was not on counting ways things could happen, but on the fact that the first process proceeds at a rate proportional to $[\mathrm{H}]^2 [\mathrm{O}]$, while the second process has two stages proceeding at rates proportional to $[\mathrm{H}] [\mathrm{O}]$ and $[\mathrm{H}] [\mathrm{OH}]$, respectively. In chemistry all the factors here are very small, which makes the second process vastly faster (i.e., more likely per unit time) than the first.

In simple terms, it’s very unlikely for three things to bump into each other at once—at least if we’re talking about atoms or molecules in a gas, or ions in a solution at low concentration.

In a sense this is irrelevant to the more combinatorial considerations we might like to focus on now, but it means that chemistry books hardly ever discuss rate equations for reactions where 3 molecules need to meet simultaneously… so I don’t know what the chemists think about the combinatorial factors!

• John Baez says:

John wrote:

It’s more like it has a propensity to attach to a hydrogen, which can come in from any direction and stick…

Actually I should check that: sad to say, I don’t remember if a lone oxygen atom in its ground state is spherically symmetric! Ugh… I could look it up, but now it’s bed-time.

• John Baez says:

John wrote:

I don’t remember if a lone oxygen atom in its ground state is spherically symmetric! Ugh… I could look it up, but now it’s bed-time.

I’m still confused about this and could use some help, but here’s what I’ve got so far. A lone oxygen atom in its ground state has 2 electrons in the 1s orbital, 2 in the 2s orbital, and 4 in the 2p orbital.

Here you can see the 1s orbital, the 2s orbital and the 2px, 2py and 2pz orbitals:

Each of these 5 orbitals can hold electrons in 2 spin states, for a total of 10. If our atom had 10 electrons it would be neon, a smugly self-satisfied noble gas—but oxygen, with just 8 electrons is hungry for 2 more. That’s why it has ‘valence 2’.

But here’s my question: can we imagine the 4 electrons that oxygen has in the 2p orbital as being quite definitely in the 2px and 2py orbitals, for example? If so, the oxygen atom isn’t spherically symmetric, but each of the 2 ‘slots’ for that are ‘hungry for electrons’ look exactly the same: the 2pz orbital.

To make sure this is right, I want to know the total angular momentum of an oxygen atom in its ground state. The atom will be spherically symmetric, i.e. rotationally invariant, if and only if this angular momentum is zero. So, I’m hoping the angular momentum of oxygen is nonzero in its ground state. But trying to look this up, I’m getting lost in the verbiage of Hund’s rule!

Someone out there must remember their physical chemistry. I never studied much of that… I picked it up on the streets.

• streamfortyseven says:

Try valence shell electron pair repulsion (VSEPR) theory: http://en.wikipedia.org/wiki/VSEPR_theory

My bet is that the lone pairs and the radicals try to get as far apart as possible, which implies that spherical symmetry does not exist, it’s closer to C2 symmetry. H-O-H is tetrahedral, with the H-O-H bond angle at 109.5 degrees, and the lone pair-O-lone pair bond angle also at 109.5 degrees.

• John Baez says:

Streamfortyseven wrote:

Try valence shell electron pair repulsion (VSEPR) theory: http://en.wikipedia.org/wiki/VSEPR_theory

Thanks… but I’m hoping some chemist out there will just take pity and tell me the answer.

My bet is that the lone pairs and the radicals try to get as far apart as possible, which implies that spherical symmetry does not exist…

That’s my current hunch, but quantum mechanics is funky: I’m scared that in reality, the oxygen atom’s ground state is a superposition of all possible rotated versions of some charismatic easy-to-draw non-spherically symmetric state. But if that were true, its total angular momentum would be zero. And somewhere, somebody knows if that’s true. I could just walk over to the chemistry department and ask, but I don’t want to make a fool of myself like that—I’d rather make a fool of myself publicly in front of the whole world.

H-O-H is tetrahedral, with the H-O-H bond angle at 109.5 degrees…

The water molecules where you live must be slightly stretched; Wikipedia says the angle is just 104.45 degrees:

The tetrahedral angle is a bit more:

and that’s exactly what we see in methane (CH4), but water doesn’t have any reason to perfectly match that angle, since it doesn’t have tetrahedral symmetry. Still, it sort of ‘fakes it’ in ice crystals.

…and the lone pair-O-lone pair bond angle also at 109.5 degrees.

I’m not sure what you mean by that, but I’m pretty darn sure that an atom of oxygen all by itself has no features that involve angles like 109.5 or 104.45 degrees. The 104.45 degree angle in water only shows up when you stick two hydrogens onto the oxygen: it’s a complicated angle, with no simple formula for it, arising from the complicated interactions between the oxygen nucleus, the hydrogen nucleus, and the 10 electrons. This is the sort of thing we pay physical chemists to compute.

• streamfortyseven says:

Monatomic oxygen is really reactive, you don’t find it in nature except when adsorbed on a metal surface, so oxygen usually shows up as a diatomic molecule. However, you can use this orbital diagram to figure out what monatomic oxygen looks like:

• Kenneth Malcolm Mackay, Rosemary Ann Mackay, W. Henderson, Introduction to Modern Inorganic Chemistry.

See p. 72.

We have 2 paired electrons in a spherically symmetric 2s orbital, 2 paired electrons in a 2p orbital, $2p_z$ and one electron in each of the remaining 2p orbitals $2p_x$ and $2p_y$ and these electrons are both spin “up” or spin “down”, +1 or -1. (see http://en.wikipedia.org/wiki/Atomic_orbital for graphic representations of the orbital wavefunctions) The 2p orbitals are orthogonal to each other and have ellipsoidal symmetry with a point of tangency at the nucleus of the atom. Given that the $2p_z$ orbital is occupied by a pair of electrons, and that the remaining 2p orbitals $2p_x$ and $2p_y$ are each occupied by unpaired electrons, I’d expect that the monatomic species of oxygen would exhibit C4 symmetry – not C2.

H-O-H exhibits C2 symmetry, and the H-O-H bond angle is indeed 104.45 degrees instead of the tetrahdral angle of 109.5 degrees, due to Coulombic repulsion between the two lone pairs on the oxygen.

• John Baez says:

Thanks, streamfortyseven. I don’t seem able to see page 72 of that textbook, but as far as monatomic oxygen goes, it sounds like you’re correcting my earlier guess that of the $2p_x$, $2p_y$ and $2p_z$ orbitals, two hold 2 electrons and one is completely empty. You’re saying one holds 2 electrons and two hold 1. Thanks!

I think this is the kind of question Hund’s rules are supposed to settle, but I never learned those and I don’t have the patience to right now.

Monatomic oxygen is really reactive, you don’t find it in nature except when adsorbed on a metal surface, so oxygen usually shows up as a diatomic molecule.

Yeah. By the way, when I googled monatomic oxygen, I got quite a few flaky websites near the top.

• “How does Oxy-Powder® Release Nascent/Monatomic Oxygen?” – an ad for some dubious product; contains some interesting information but blurs the difference between monatomic oxygen and singlet oxygen, which is something completely different.

• An ad for an even more dubious product: “Ambaya Gold’s revolutionary Fusion Gold™ formula combines four vital ingredients into one powerful formula – blending Monatomic Silver, Monatomic Gold, and Oxygen with an activated Fulvic Base. Benefits may include:

* Overall boost in energy
* Increased systemic fitness
* Immune support assistance
* Enhanced mental clarity
* Heightened sense of well being”

Etcetera…

• Jon Rowlands says:

I’m no chemist, but aren’t the orbitals you drew calculated as the excited states of a single electron around an otherwise bare nucleus? Since they don’t consider any interactions between multiple electrons except the exclusion “force”, it feels like they just don’t have it in them to predict bond angles. I wonder if after a certain point, they’re just a nice basis. Density functional theory looks better a it.

Nice series, by the way. I’ve been a fan of TWF for years. Azimuth code project is a great idea — Mersenne meets Stallman.

• John Baez says:

Jon wrote:

I’m no chemist, but aren’t the orbitals you drew calculated as the excited states of a single electron around an otherwise bare nucleus? Since they don’t consider any interactions between multiple electrons except the exclusion “force”, it feels like they just don’t have it in them to predict bond angles.

Right, they don’t have the ability to predict bond angles. I was only trying to settle the question of whether a lone oxygen atom is spherically symmetric or not. And why? It’s because I told David Corfield this:

If we’re talking about real-world chemistry, rather than ball-and-stick models of molecules, I don’t think a lone oxygen atom really has two distinct ‘slots’ waiting for hydrogens. It’s more like it has a propensity to attach to a hydrogen, which can come in from any direction and stick… and then the resulting hydroxyl radical OH still has a propensity to attach to another hydrogen, which will then move to form a 104.45° angle to the first.

As soon as I wrote this, I started doubting it. I started wondering: is the oxygen atom really spherically symmetric before the hydrogen comes along, so that the hydrogen can attach itself anywhere with equal probability? Or does a hydrogen atom coming up to an oxygen atom attach to one of two predefined ‘slots’ with more or less well-defined locations?

(Of course ‘slots’ is a somewhat naive term for patterns in electron wavefunctions, but never mind.)

Until someone tells me the angular momentum of a lone oxygen atom, I won’t know the answer for sure. But I’m now pretty sure the latter picture is closer to the truth. I believe the 4 electrons in the outermost shell of the oxygen atom break the spherical symmetry and in fact pick out two preferred axes, lying at a roughly 90° angle to each other. That’s what those ‘p orbital’ pictures are supposed to suggest. But of course those pictures are only approximate, since (as you note) they neglect inter-electron forces.

Then, when two hydrogens actually attach to the oxygen, they wind up attaching at an angle of 104.45°, for complicated reasons that take a computer simulation to fully fathom. Presumably the repulsion of the hydrogens’ nuclei is largely to blame.

• Damian Allis says:

Greetings John,

Bruce Smith sent me to try to say something constructive (and a while back, I started thinking + searching for an answer, realized how long it was going to be, then forgot to follow-up).

I am pleased to report that (1) you’d be hard pressed to find a chemist with a convincing answer to all other chemists, (2) this is more atomic physics than chemistry, which is why you’d easily stump a chemist on this, (3) I have found a few very eloquent and involved discussions that say “it’s spherically symmetric” and “it’s not spherically symmetric,” (which means the answer is either “neither” or a superposition of the two) and (4) it probably doesn’t matter in the grand scheme because the response of the atom to any kind of external stimuli (such as an H atom) would be effectively instantaneous.

Just going by a combination of Hund’s Rule(s) (max multiplicity = lowest E; largest orbital angular momentum = lowest energy for a given multiplicity; outermost subshell half-filled or less = lowest total angular momentum is lowest in E / outermost subshell more than half-filled = highest total angular momentum is lowest in E), the Aufbau Principle (add e- into a p-shell one-at-a-time, so in nitrogen each p-orbital has 1 e- each), and Unsold’s Theorem (the square of the electronic wavefunction for a completely filled or half-filled sub-shell is spherically symmetric), oxygen is NOT spherically symmetric (well, not spherically symmetric if you were standing on the nucleus). These are the most fundamental “non-computational-quantum-chemical” ways to interpret SINGLE atoms and are all from good olde fashioned spectroscopy back in the day. I note that these are SINGLE atom cases, where the larger the vacuum the better the result. Importantly, these rules break down (or don’t apply) when the atom exists in an external potential (such as an H atom, H nucleus, whatever).

My initial thought (as a computational chemist) was that the “I’m scared that in reality, the oxygen atom’s ground state is a superposition of all possible rotated versions of some charismatic easy-to-draw non-spherically symmetric state” was a bit more formally correct as an interpretation of the system. That is, all isolated atoms are spherically symmetric, with the electrons existing in a superposition of all linear combinations of the available orbitals, making the “density” of the atom the “ensemble density” of the linear combinations. More to the point, the potential around an isolated atom would be spherically symmetric (if it is truly isolated), so the electrons have no bias to group in any arrangement one might describe as px, py, pz, whatever. One obtains a differentiation of “orbital arrangements” in an external potential (introducing the laboratory reference frame), and it is the balance of internal electronic repulsion in the presence of this potential and the stabilization of electronic arrangements that come from the presence of the external potential (two H atoms around an oxygen, for instance) that gives one a reference frame for any separation of electron density into “directions” (along bonds, external field, etc.).

All that said, orbitals are a pleasant fiction we use to describe the behavior of electrons in our modern one-electron wavefunction treatment in quantum chemistry. The isolated atom is one of the hardest systems to treat with great accuracy by quantum chemical methods because one must use multireference computational approaches to deal with the “spherical degeneracy” of the electronic states. I’ll take a methane calculation over isolated carbon any day.

Extra credit – once you’ve formed OH, the electron density of the O will attract the second H, which will then adopt the standard H2O geometry discussed above. The 104.45 angle is best explained as “the two lone pairs on the “other side” of the oxygen take up more space because you’ve got four repulsing electrons, so they take up more space, giving the H-O-H bend reduction to take up the slack.

I hope that it any way helps. If not, I’ll be happy to try to attempt a better explanation.

• John Baez says:

Hi, Damian! Thanks for trying to help! I’m still a bit confused, but misery loves company, so I feel much better hearing you say “I’ll take a methane calculation over isolated carbon any day” and scary stuff like:

The isolated atom is one of the hardest systems to treat with great accuracy by quantum chemical methods because one must use multireference computational approaches to deal with the “spherical degeneracy” of the electronic states.

It makes me feel like I wasn’t just being really dumb.

To compress your long answer into a soundbite, I guess you’re saying that a completely isolated unexcited oxygen atom is spherically symmetric.

• Damian Allis says:

Well, I think I’d go with the “simultaneous states of non-spherically symmetric” interpretation because there’s no way to define axes without imposing on the atom. The olde guard atomic physics would argue non-spherically symmetric because you’ve got two unpaired electrons in two p orbitals, but you’d never be able to “dock” along an unfilled orbital in any kind of chemical trajectory (which was how I interpreted the basis of your question in my first pass of your query).

My tweet-friendly would be “non-spherically symmetric locally, spherical enough at a distance, no way to sneak up and ask at any instant.”

6. DavidTweed says:

This is a complete shot in the dark, but is it definitely the case that the denominator factor of $M!$ is missing because the items are “distinguishable”, rather than some other thing cancelling it out? (AIUI, the model of a chemical-style Petri net is that there’s some volume where a transition occurs if all the inputs happen to be in the same “region” at the same time — and the actual volume disappears into concentrations so it doesn’t appear in the final differential equation. In the approximation of discretised space and completely random location, the statistics of this kind of thing are studied by mathematicians in turns of balls in urns and by theoretical computer sicentists as models of hash tables, but I immediately can’t find any applicable work, but its conceivable that the denominator might cancel from these considerations.)

• John Baez says:

David wrote:

This is a complete shot in the dark, but is it definitely the case that the denominator factor of M! is missing because the items are “distinguishable”, rather than some other thing cancelling it out?

Until I started thinking hard about examples from chemistry, I felt completely sure that we should count the things in Petri nets as distinguishable, so that we should get the falling power

$L^{\underline{M}} = L (L - 1) \cdots (L - M + 1)$

$\binom{L}{M} = \frac{L (L - 1) \cdots (L - M + 1)}{M (M - 1) \cdots 1}$

when counting the number of ways we can pick $M$ things from a collection of $L$.

But now I’m entering a period of fruitful confusion about this sort of issue, so I’ll just say “I’m not sure”.

AIUI, the model of a chemical-style Petri net is that there’s some volume where a transition occurs if all the inputs happen to be in the same “region” at the same time — and the actual volume disappears into concentrations so it doesn’t appear in the final differential equation. In the approximation of discretised space and completely random location, the statistics of this kind of thing are studied by mathematicians in turns of balls in urns and by theoretical computer scientists…

That all sounds good. When thinking about stochastic Petri nets we should not need to think about quantum mechanics or continuum space. It should all be combinatorics, balls in urns, that sort of thing.

However, thinking about chemistry makes me waver from that principle. In particular, while wolves and rabbits are so complicated that they’re always distinguishable, atoms are so simple that they can sometimes lie in the exact same state (Bose-Einstein condensation). And chemistry lies in that ‘semiclassical’ regime where I have trouble deciding whether I should use classical or quantum reasoning.

All these worries are probably red herrings, since stochastic Petri nets are supposed to describe the behavior of classical things, not quantum ones. However, studying them using math developed in quantum mechanics makes it easy to get confused… and using examples from chemistry, where quantum mechanics really does matter sometimes, makes it even easier.

• streamfortyseven says:

If you’re looking at anything having to do with distributions of electron density, such as reactivity, kinetics, or electronic properties, you’re using quantum chemistry.

If you’re looking at properties of molecules in aqueous solution, or properties of receptor sites in large molecules, or 2D and 3D folding and structure, you would use semi-empirical methods which use a combination of experimentally-derived measurements and quantum mechanical approximations, which use a mixture of classical and quantum reasoning.

Finally, if you’re looking strictly at structures, especially for large molecules like proteins and the like, you’ll be using molecular mechanics calculations, which use classical reasoning exclusively.

In this example, namely the electronic properties of monatomic oxygen, you’ll be using strictly quantum reasoning.

• streamfortyseven says:

And I just gave you the state of affairs 20 years ago, when I was last involved in doing any sort of computational chemistry… Things have changed, there are groups which are currently using Petri nets to study kinetics in chemical reaction networks:

D. Angeli, P. de Leenheer, and E.D. Sontag. Persistence results for chemical reaction networks with time-dependent kinetics and no global conservation laws. SIAM Journal on Applied Mathematics, 71:128-146, 2011. (at http://www.math.rutgers.edu/~sontag/FTP_DIR/angeli_leenheer_sontag_siap_online2011.pdf)

D. Angeli, P. de Leenheer, and E.D. Sontag. Graph-theoretic characterizations of monotonicity of chemical networks in reaction coordinates. J. Mathematical Biology, 61:581-616, 2010. (at http://www.math.rutgers.edu/~sontag/FTP_DIR/angeli_leenheer_sontag_graphs_JMB2010.pdf)

D. Angeli, P. de Leenheer, and E.D. Sontag. Chemical networks with inflows and outflows: A positive linear differential inclusions approach. Biotechnology Progress, 25:632-642, 2009. (at http://www.math.rutgers.edu/~sontag/FTP_DIR/angeli_leenheer_sontag_biotech08.pdf)

“Abstract: Certain mass-action kinetics models of biochemical reaction networks, although described by nonlinear differential equations, may be partially viewed as state-dependent linear time-varying systems, which in turn may be modeled by convex compact valued positive linear differential inclusions. A result is provided on asymptotic stability of such inclusions, and applied to biochemical reaction networks with inflows and outflows. Included is also a characterization of exponential stability of general homogeneous switched systems.”

and other papers available at http://www.math.rutgers.edu/~sontag/Author/ANGELI-D.html

• John Baez says:

John wrote:

Until I started thinking hard about examples from chemistry, I felt completely sure that we should count the things in Petri nets as distinguishable, so that we should get the falling power…

Now I’ve found that in chemical reaction network theory they also use falling powers to count the number of ways a reaction can occur. So, I’m content again.

I won’t give a reference yet, since it gives away the answer to the Puzzle.

Is H the probability matrix of transitioning from state n to state m at time t? In Hidden Markov models, for instance, the transition matrix T plays this role, and is typically diagonally dominant, meaning that there is a much greater probability of staying in state n than transitioning to state m at time t. I am reasoning by analogy, rather than analytically, so my guess might be way off!

• John Baez says:

Tom wrote:

Is H the probability matrix of transitioning from state n to state m at time t?

It’s actually the probability-per-unit time rate of transitioning from one labelling to another.

(I am using ‘labelling’ in a specific way in my post, which is different from my usage of ‘state’, but if you think of a ‘state’ as the state of a single particle, then a ‘labelling’ is a state of a collection of particles. So I don’t really mind you saying ‘state’.)

In Hidden Markov models, for instance, the transition matrix T plays this role…

From what you say, I think the relation is

$T = \exp(t H)$

• phorgyphynance says:

Fun stuff. I did quite a bit of modeling these Markov chains with transition matrices. I was looking at credit ratings of companies for credit risk and covenant triggers, etc. For example, what is the probability of being downgraded by the ratings agencies over the next year?

• John Baez says:

In the series of posts I’m dreaming of, we’ll soon get to Markov chains and later hidden Markov models, Bayesian networks and other related things. Let’s see if I can keep up the energy!

• Blake Stacey says:

$T = \exp(tH)$ looks right, tho’ I’ve often seen the “Hamiltonian” defined with the opposite sign.

• John Baez says:

Yes—as you know, in physics there’s a reason why $T = \exp(-tH)$ is the standard convention: $H$ means ‘energy’, we learned as children that energy should be positive, and later we learned that probability densities spread out when we think of $H$ as an operator and use $\exp(-t H)$ to evolve our probability densities with time.

So, for example, if $H = -\nabla^2$ is the positive version of the Laplace operator, $\exp(-t H)$ describes the diffusion of heat.

This isn’t a terribly convincing example to an outsider, since it sounds like after we chose one bad sign convention we needed another to fix it! It’s only after having been convinced that kinetic energy should be positive and $-\nabla^2$ means ‘energy’ that all these minus signs seem like good ideas.

And indeed, I decided that in my expository treatment it would be more confusing than enlightening to first stick a minus sign in the definition of $H$ and then declare that time evolution was given by $\exp(-t H)$.

But I knew someone would wonder about that—thanks for doing it so gently.

I haven’t carefully examined how many equations would simplify if we changed our conventions and decided that energy should be negative. We could use $\exp(t H)$ instead of $\exp(-t H)$… and force would equal the gradient of potential energy, instead of minus the gradient. Aren’t these hints that perhaps energy should be replaced by negergy? Perhaps after we eliminate $\pi$ and embrace Tauism, we can tackle that.

• It’s only after having been convinced that kinetic energy should be positive and $-\nabla^2$ means ‘energy’ that all these minus signs seem like good ideas.

Plus, $-\mathrm{div}$ being the adjoint of $\mathrm{grad}$, so “$-\nabla^2$$= \nabla^\ast\nabla$ being the right thing in Hilbert space.

• Blake Stacey says:

I’m sure you could make a fortune marketing a brand of negative-calorie soda.

(To paraphrase Tony Zee: the difference between a good theorist and a bad theorist is that the good one makes an even number of sign errors.)

In this case, the sign on $H$ is probably less important than the fact that this “Hamiltonian” is not guaranteed to be Hermitian, so you can’t count on its eigenvalues being real. In fact, the cookbook way of writing $H$ for a reaction-diffusion system gives two terms (of opposite sign1) for each reaction, only one of which has to have a “balance” between creation and annihilation operators.

1 “Theorem Zero: You can’t win.” — Jeffrey Goldstone

• John Baez says:

Blake wrote:

I’m sure you could make a fortune marketing a brand of negative-calorie soda.

Great idea! I’ll call it Ice Water™.

It’s amazing stuff. If your body temperature is 37°, it has -37,000 calories per liter.

In fact, the cookbook way of writing H for a reaction-diffusion system gives two terms (of opposite sign) for each reaction, only one of which has to have a “balance” between creation and annihilation operators.

Cookbook way, eh? Oh well, I reinvented this recipe in the context of Petri nets. I’ve been putting off reading all those references you gave me… but I know exactly what you’re talking about. I’ll explain it here soon.

8. Giampiero Campa says:

What lead me astray at the beginning is that you called $H_{\ell \ell'}$ a matrix, which made me think of $\psi_\ell(t)$ as a (k by 1) vector, and after a while things didn’t make too much sense.

Now i see that you meant that H is a matrix but both $H_{\ell \ell'}$ and $\psi_\ell(t)$ are scalars, so I have to rethink it a little.

Right now my line of thinking is that, ok let’s see if i get the latex right:

$\frac{d}{dt} \psi_{\ell '}(t) = \prod_{i=1}^k r (n_i - m_i) {\ell '_1}^{\underline{m_1}} \cdots {\ell'_k}^{\underline{m_k}}$

but of course i need to think about it some more.

• Giampiero Campa says:

Of course i did NOT get the latex right, oh well it doesn’t matter as it probably doesn’t make any sense anyway :-)

• John Baez says:

Giampiero wrote:

What lead me astray at the beginning is that you called $H_{\ell \ell'}$ a matrix…

Sorry!

Physicists are usually happy to call $A_{i j}$ a matrix, $v_i$ a vector, and $f(x)$ a function, while mathematicians like to emphasize that $A_{i j}$ is an entry of the matrix $A$, $v_i$ is an entry of the vector $v$, and $f(x)$ is the value of the function $f$ at the point $x$. I have never before seen anyone except mathematicians get confused by the physicists’ way of talking. Or maybe you thought that since I’m a mathematician, I’d use the mathematicians’ way of talking? I’m actually enjoying being a bit more of a physicist here in Singapore—it’s like a vacation from rigor.

Anyway:

$H_{\ell \ell'}$ is the probability per unit time that the Petri net will transition to the labelling $\ell$ given that it’s in the labelling $\ell'$.

$\psi_{\ell}(t)$ is the probability that the Petri net is in the labelling $\ell$.

I’m not sure why your latex didn’t work, but I retyped it slightly differently and it worked. I hope this is what you were trying to say:

$\frac{d}{dt} \psi_{\ell '}(t) = \prod_{i=1}^k r (n_i - m_i) {\ell '_1}^{\underline{m_1}} \cdots {\ell'_k}^{\underline{m_k}}$

It looks close to right, but not quite right.

First, you have a product

${\ell '_1}^{\underline{m_1}} \cdots {\ell'_k}^{\underline{m_k}}$

but also a product symbol

$\prod_{i=1}^k$

Second, the rate of change of $\psi$ has to depend on $\psi$ somehow. I said:

Suppose we have a Petri net that is labelled in some way at some moment. Then the probability that a given transition occurs in a short time $\Delta t$ is approximately…

But of course we don’t know for sure that the Petri net is labelled in some particular way $\ell$. We only know that it has the labelling $\ell$ with probability $\psi_\ell(t)$.

If it starts in the labelling $\ell$, which labelling(s) can it make a transition to?

Third, $n_i - m_i$ is really important, and belongs in the formula somehow, but I don’t think the probability is proportional to this quantity. Remember, this is the number of outputs of type $i$ minus the number of inputs of type $i$.

• Giampiero Campa says:

Or maybe you thought that since I’m a mathematician, I’d use the mathematicians’ way of talking?

No, not at all. Engineers also refer to $A_{i j}$ and $v_i$ as entries of the matrix A and the vector v respectively. On the other hand we do call $f(x)$ a function.

Second, the rate of change of $\psi$ has to depend on $\psi$ somehow.

Of course, i just forgot to multiply the right hand side by $\psi_{\ell}(t)$

but also a product symbol
$\prod_{i=1}^k$

The reason i have that product, as well as $(n_i-m_i)$ is because i was trying to form the derivative of $\psi_{\ell'}(t)$ as the product of the derivatives of the probabilities of each single component of the state $\ell'$, so to speak.

Perhaps the terms that depend on i compensate each other somehow and you can drop both the product sign and the term $(n_i-m_i)$ in the formula that i have written. But i still haven’t had a chance to really think about it.

If it starts in the labelling $\ell$, which labelling(s) can it make a transition to?

I am still at a loss on this. And i actually wonder why we can’t just propagate the random variable $\ell$ in time, which we should be able to do if we have the deterministic evolution equations (and we do) and the probability distribution (and it looks like we assume it).

• John Baez says:

Giampiero wrote:

And i actually wonder why we can’t just propagate the random variable \ell in time, which we should be able to do if we have the deterministic evolution equations (and we do) and the probability distribution (and it looks like we assume it).

There are a couple of reasons:

First, those deterministic evolution equations (the ‘rate equations’) assume the amount of things we have in each state varies continuously. For example, the rate equation for this model

might predict that starting out with 10 rabbits now, we will have 10.5 rabbits one week later.

But now we are studying the ‘master equation’, which assumes the number of things we have in each state varies discretely. No half-rabbits! The number must be a natural number.

Second, the master equations are really probabilistic, not deterministic. Starting out with 10 rabbits now, they’ll tell us the probability that we’ll have 9 or 10 or 11 or some other number of rabbits one week from now.

So the master equation is really answering a different question than the rate equation. It just so happens—we’ll see this later—that in the limit of large numbers, the results we get from the master equation are very close on average to the results we get from the rate equation.

I am still at a loss on this.

Okay. If you start with $\ell_i$ things in the $i$th state, and something happens that eats $m_i$ things in the $i$th state and produces $n_i$ new things in that state, you’re left with

$\ell_i + n_i - m_i$

things in the $i$th state.

We can say the same thing faster using more jargon: if our system starts in the labelling $\ell$, our transition takes it to the labelling $\ell + m - n$.

So, we expect that the matrix entry

$H_{\ell' \ell}$

will be nonzero if

$\ell' = \ell + m - n$

(Note that we’ve switched the roles of $\ell$ and $\ell'$. In my article I was talking about $H_{\ell \ell'}$; now we’re talking about $H_{\ell' \ell}$. It doesn’t really matter as long as we pay attention.)

So, it would be nice to know the formula for $H_{\ell' \ell}$ in this case. The answer is supposed to be contained in these cryptic comments of mine:

A bit more precisely: suppose we have a Petri net that is labelled in some way at some moment. Then the probability that a given transition occurs in a short time $\Delta t$ is approximately:

• the rate constant for that transition, times

• the time $\Delta t$, times

• the number of ways the transition can occur.

More precisely still: this formula is correct up to terms of order $(\Delta t)^2$. So, taking the limit as $\Delta t \to 0$, we get a differential equation describing precisely how the probability of the Petri net having a given labelling changes with time! And this is the master equation.

But be careful: is

$\ell' = \ell + m - n$

the only value of $\ell'$ for which $H_{\ell' \ell}$ is nonzero?

The answer to that may become clearer upon reading part 5. On the other hand, you can also figure it out just by thinking.

Here’s another way to ask the question. The question is, which labellings $\ell'$ have their probabilities changing

$\frac{d}{dt} \psi_{\ell'}(t) \ne 0$

if at some moment we’re sure the system has the labelling $\ell$:

$\psi_\ell(t) = 1$

We know that

$\ell' = \ell + m - n$

is one such labelling. But is it the only one?

Once we settle this, we can work out exactly what

$\frac{d}{dt} \psi_{\ell'}(t)$

equals. We know

$\frac{d}{dt} \psi_{\ell'}(t) = \sum_\ell H_{\ell' \ell}\; \psi_\ell(t)$

so once we know which matrix entries $H_{\ell' \ell}$ are nonzero, we can figure out what they are and we’ll be done!

(By the way, I encourage other folks to join in and help Giampiero solve this puzzle! The equation we’re looking for is called the ‘chemical master equation’, because it’s the basic equation governing chemical reactions. So if you’re already an expert on that, this puzzle might be too easy to be enjoyable. But if you’ve never thought about it, there’s no better way to learn it than to reinvent it.)

9. Blake Stacey says:

The “start here” link on your Network Theory page (which points to here) seems to be broken.

• John Baez says:

Thanks! It’s pretty rare that I manage to delete a page on my website, but that’s what happened. Luckily I had a backup.

I’ve gone ahead and improved the links so people can jump straight to any one of the ‘lectures’ starting from here.

10. John Baez says:

Okay, I’ll answer the puzzle in this blog entry. It will look a tiny bit prettier if we switch the roles of $\ell$ and $\ell'$:

Puzzle. Suppose we have a stochastic Petri net with $k$ states and just one transition, whose rate constant is $r$. Suppose the $i$th state appears $m_i$ times as the input of this transition and $n_i$ times as the output. A labelling of this stochastic Petri net is a $k$-tuple of natural numbers $\ell = (\ell_1, \dots, \ell_k)$ saying how many things are in each state. Let $\psi_\ell(t)$ be the probability that the labelling is $\ell$ at time $t$. Then the master equation looks like this:

$\frac{d}{d t} \psi_{\ell'}(t) = \sum_{\ell} H_{\ell' \ell} \psi_{\ell}(t)$

for some matrix of real numbers $H_{\ell' \ell}$. What is this matrix?

Answer. To compute $H_{\ell' \ell}$ it’s enough to start the Petri net in a definite labelling $\ell$ and see how fast the probability of being in some labelling $\ell'$ changes. In other words, if at some time $t$ we have

$\psi_{\ell}(t) = 1$

then

$\frac{d}{d t } \psi_{\ell'}(t) = H_{\ell' \ell}$

at this time.

Now, suppose we have a Petri net that is labelled in some way at some moment. Then I said the probability that the transition occurs in a short time $\Delta t$ is approximately:

• the rate constant $r$, times

• the time $\Delta t$, times

• the number of ways the transition can occur, which is the product of falling powers $\ell_1^{\underline{m_1}} \cdots \ell_k^{\underline{m_k}}.$ Let’s call this product $\ell^{\underline{m}}$ for short.

Multiplying these 3 things we get

$r \; \ell^{\underline{m}} \; \Delta t$

So, the rate at which the transition occurs is just:

$r \ell^{\underline{m}}$

And when the transition occurs, it eats up $m_i$ things in the ith state, and produces $n_i$ things in that state. So, it carries our system from the original labelling $\ell$ to the new labelling

$\ell' = \ell + n - m$

So, in this case we have

$\frac{d}{d t} \psi_{\ell'}(t) = r \ell^{\underline{m}}$

and thus

$H_{\ell' \ell} = r \ell^{\underline{m}}$

However, that’s not all: there’s another case to consider! Since the probability of the Petri net being in this new labelling $\ell'$ is going up, the probability of it staying in the original labelling $\ell$ must be going down by the same amount. So we must also have

$H_{\ell \ell} = - r \ell^{\underline{m}}$

We can combine both cases into one formula like this:

$H_{\ell' \ell} = r \ell^{\underline{m}} \left(\delta_{\ell', \ell + n - m} - \delta_{\ell', \ell}\right)$

Here the first term tells us how fast the probability of being in the new labelling is going up. The second term tells us how fast the probability of staying in the same labelling is going down.

Note: each column in the matrix $H_{\ell' \ell}$ sums to zero, and all the off-diagonal entries are nonnegative. That’s good: we now know that this matrix must be ‘infinitesimal stochastic’, meaning it must have these properties!

• Giampiero Campa says:

OK, i finally had some time to go through this yesterday, and now everything makes sense.

The last formula with the Kronecker’s delta is really what turned all the the lights on for me. I guess i was not really allowing l’ to be an “independent” variable in H, in some sense.

Maybe a part of it is that it does not come natural for me to reason in the “labeling space”, as opposite to the “state space”, of the system (actually i am not yet too sure about what are the advantages of doing so). I guess it’s because I haven’t studied quantum mechanics long enough :)

• John Baez says:

Hi, Giampiero! I’m glad you get it and sort of like it. I’m going to resume posts in this series soon: I’ll start by reviewing how to get the rate equation and master equation from a stochastic Petri net, and how the latter reduces to the former in the limit of ‘large particle number’ (as the physicists would say).

The reason people work in ‘label space’ instead of ‘state space’, if I understand what you mean by those terms, is that many different probability distributions $\psi_\ell(t)$ have the same mean values for the number of things of the $\ell$th type, and their time evolution will be different. Only in a certain limit where there are many things of every type can we describe dynamics in terms of these mean values.

11. David Corfield says:

In ‘Puzzle’, the summation in the master equation should be over $\ell$ rather than $\ell'$.

In the final equation, $H_{\ell' \ell} = r \ell^{\underline{m}} \left(\delta_{\ell + n - m,\ell} - \delta_{\ell, \ell}\right)$, you ought to have a couple of instances of $\ell'$ on the right.

So, with a collection of transitions in place in your Petri net you just add these $H$ matrices, and all is well since infinitesimal stochastic matrices are closed under addition.

• John Baez says:

Thanks for all the corrections, David! I’ll fix that stuff. As usual, a more beautiful version of this blog entry, with solutions to the puzzles, is available on my networks page.

So, with a collection of transitions in place in your Petri net you just add these $H$ matrices, and all is well since infinitesimal stochastic matrices are closed under addition.

Right! They’re closed under nonnegative linear combinations.

Hmm. If we only keep the condition that the columns sum to zero, and drop the condition that the off-diagonal entries are nonnegative, it seems we get not just a vector space of matrices, but even a Lie algebra $\mathfrak{g}$. I’m guessing this because if we take the definition of ‘stochastic matrix’, and drop the condition that the entries are nonnegative, keeping the condition that the columns sum to 1, we seem to get a Lie group $G$. This is the Lie group of transformations of $\mathbb{R}^n$ that preserve the sum of the entries. The stochastic matrices form a sub-monoid of $G$. And thus the infinitesimal stochastic matrices form a cone in the Lie algebra $\mathfrak{g}$.

This site uses Akismet to reduce spam. Learn how your comment data is processed.