Last time we saw clues that stochastic Petri nets are a lot like quantum field theory, but with probabilities replacing amplitudes. There’s a powerful analogy at work here, which can help us a lot. So, this time I want to make that analogy precise.
But first, let me quickly sketch why it could be worthwhile.
A Poisson process
Consider this stochastic Petri net with rate constant $r$:

It describes an inexhaustible supply of fish swimming down a river, and getting caught when they run into a fisherman’s net. In any short time $\Delta t$ there’s a chance of about $r \Delta t$ of a fish getting caught. There’s also a chance of two or more fish getting caught, but this becomes negligible by comparison as $\Delta t \to 0$. Moreover, the chance of a fish getting caught during this interval of time is independent of what happens before or afterwards. This sort of process is called a Poisson process.
Problem. Suppose we start out knowing for sure there are no fish in the fisherman’s net. What’s the probability that he has caught $n$ fish at time $t$?
At any time there will be some probability of having caught $n$ fish; let’s call this probability $\psi_n(t)$, or $\psi_n$ for short. We can summarize all these probabilities in a single power series, called a generating function:

$$\Psi(t) = \sum_{n=0}^\infty \psi_n(t)\, z^n$$

Here $z$ is a formal variable—don’t ask what it means, for now it’s just a trick. In quantum theory we use this trick when talking about collections of photons rather than fish, but then the numbers $\psi_n$ are complex ‘amplitudes’. Now they are real probabilities, but we can still copy what the physicists do, and use this trick to rewrite the master equation as follows:

$$\frac{d}{dt}\Psi(t) = H \Psi(t)$$
This describes how the probability of having caught any given number of fish changes with time.
What’s the operator $H$? Well, in quantum theory we describe the creation of photons using a certain operator on power series called the creation operator:

$$a^\dagger \Psi = z \Psi$$

We can try to apply this to our fish. If at some time we’re 100% sure we have $n$ fish, we have

$$\Psi = z^n$$

so applying the creation operator gives

$$a^\dagger \Psi = z^{n+1}$$

One more fish! That’s good. So, an obvious wild guess is

$$H = r a^\dagger$$

where $r$ is the rate at which we’re catching fish. Let’s see how well this guess works.
If you know how to exponentiate operators, you know how to solve this equation:

$$\frac{d}{dt}\Psi(t) = H \Psi(t)$$

It’s easy:

$$\Psi(t) = e^{tH}\, \Psi(0)$$

Since we start out knowing there are no fish in the net, we have

$$\Psi(0) = 1$$

so with our guess for $H$ we get

$$\Psi(t) = e^{t r a^\dagger}\, 1$$

But $a^\dagger$ is the operator of multiplication by $z$, so $e^{t r a^\dagger}$ is the operator of multiplication by $e^{trz}$, so

$$\Psi(t) = e^{rtz} = \sum_{n=0}^\infty \frac{(rt)^n}{n!}\, z^n$$
So, if our guess is right, the probability of having caught $n$ fish at time $t$ is

$$\frac{(rt)^n}{n!}$$

Unfortunately, this can’t be right, because these probabilities don’t sum to 1! Instead their sum is

$$\sum_{n=0}^\infty \frac{(rt)^n}{n!} = e^{rt}$$
We can try to wriggle out of the mess we’re in by dividing our answer by this fudge factor. It sounds like a desperate measure, but we’ve got to try something!
This amounts to guessing that the probability of having caught $n$ fish by time $t$ is

$$\frac{(rt)^n}{n!}\, e^{-rt}$$
And this is right! This is called the Poisson distribution: it’s famous for being precisely the answer to the problem we’re facing.
So on the one hand our wild guess about $H$ was wrong, but on the other hand it was not so far off. We can fix it as follows:

$$H = r(a^\dagger - 1)$$

The extra $-1$ gives us the fudge factor we need.
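If you like checking things numerically, here is a quick sanity check of this claim—a minimal sketch of my own, not part of the original argument. We truncate $a^\dagger$ to a finite matrix that shifts coefficient vectors $(\psi_0, \dots, \psi_{N-1})$, exponentiate $tH$ with $H = r(a^\dagger - 1)$, and compare with the Poisson distribution. The truncation size, the values of $r$ and $t$, and all variable names are my own choices.

```python
import numpy as np
from scipy.linalg import expm
from math import exp, factorial

N = 50            # truncation: we track at most N - 1 fish
r, t = 2.0, 3.0   # rate constant and time (arbitrary values)

# Truncated creation operator on coefficient vectors: (a†ψ)_n = ψ_{n-1},
# i.e. a matrix with 1s on the subdiagonal.
a_dag = np.diag(np.ones(N - 1), k=-1)

# The corrected Hamiltonian H = r(a† - 1).
H = r * (a_dag - np.eye(N))

# Start knowing there are no fish: ψ(0) = (1, 0, 0, ...); evolve: ψ(t) = exp(tH) ψ(0).
psi0 = np.zeros(N)
psi0[0] = 1.0
psi_t = expm(t * H) @ psi0

# Compare with the Poisson distribution (rt)^n e^{-rt} / n!.
poisson = np.array([(r * t) ** n * exp(-r * t) / factorial(n) for n in range(N)])
print(np.allclose(psi_t, poisson))  # True (up to truncation error)
print(psi_t.sum())                  # ≈ 1, as probabilities should
```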
So, a wild guess corrected by an ad hoc procedure seems to have worked! But what’s really going on?
What’s really going on is that $a^\dagger$, or any multiple of this, is not a legitimate Hamiltonian for a master equation: if we define a time evolution operator $e^{tH}$ using a Hamiltonian like this, probabilities won’t sum to 1! But $r(a^\dagger - 1)$ is okay. So, we need to think about which Hamiltonians are okay.
In quantum theory, self-adjoint Hamiltonians are okay. But in probability theory, we need some other kind of Hamiltonian. Let’s figure it out.
Probability versus quantum theory
Suppose we have a system of any kind: physical, chemical, biological, economic, whatever. The system can be in different states. In the simplest sort of model, we say there’s some set $X$ of states, and say that at any moment in time the system is definitely in one of these states. But I want to compare two other options:

• In a probabilistic model, we may instead say that the system has a probability $\psi_x$ of being in any state $x \in X$. These probabilities are nonnegative real numbers with

$$\sum_{x \in X} \psi_x = 1$$

• In a quantum model, we may instead say that the system has an amplitude $\psi_x$ of being in any state $x \in X$. These amplitudes are complex numbers with

$$\sum_{x \in X} |\psi_x|^2 = 1$$
Probabilities and amplitudes are similar yet strangely different. Of course given an amplitude we can get a probability by taking its absolute value and squaring it. This is a vital bridge from quantum theory to probability theory. Today, however, I don’t want to focus on the bridges, but rather the parallels between these theories.
We often want to replace the sums above by integrals. For that we need to replace our set $X$ by a measure space, which is a set equipped with enough structure that you can integrate real or complex functions defined on it. Well, at least you can integrate so-called ‘integrable’ functions—but I’ll neglect all issues of analytical rigor here. Then:

• In a probabilistic model, the system has a probability distribution $\psi : X \to \mathbb{R}$, which obeys $\psi \geq 0$ and

$$\int_X \psi(x)\, dx = 1$$

• In a quantum model, the system has a wavefunction $\psi : X \to \mathbb{C}$, which obeys

$$\int_X |\psi(x)|^2\, dx = 1$$
In probability theory, we integrate $\psi$ over a set $S \subseteq X$ to find out the probability that our system’s state is in this set. In quantum theory we integrate $|\psi|^2$ over the set to answer the same question.
We don’t need to think about sums over sets and integrals over measure spaces separately: there’s a way to make any set $X$ into a measure space such that by definition,

$$\int_X \psi(x)\, dx = \sum_{x \in X} \psi(x)$$

In short, integrals are more general than sums! So, I’ll mainly talk about integrals, until the very end.
In probability theory, we want our probability distributions to be vectors in some vector space. Ditto for wave functions in quantum theory! So, we make up some vector spaces:
• In probability theory, the probability distribution $\psi$ is a vector in the space

$$L^1(X) = \left\{ \psi : X \to \mathbb{C} \;\middle|\; \int_X |\psi(x)|\, dx < \infty \right\}$$

• In quantum theory, the wavefunction $\psi$ is a vector in the space

$$L^2(X) = \left\{ \psi : X \to \mathbb{C} \;\middle|\; \int_X |\psi(x)|^2\, dx < \infty \right\}$$
You may wonder why I defined $L^1(X)$ to consist of complex functions when probability distributions are real. I’m just struggling to make the analogy seem as strong as possible. In fact probability distributions are not just real but nonnegative. We need to say this somewhere… but we can, if we like, start by saying they’re complex-valued functions, but then whisper that they must in fact be nonnegative (and thus real). It’s not the most elegant solution, but that’s what I’ll do for now.
Now:
• The main thing we can do with elements of $L^1(X)$, besides what we can do with vectors in any vector space, is integrate one. This gives a linear map:

$$\int : L^1(X) \to \mathbb{C}$$

• The main thing we can do with elements of $L^2(X)$, besides the things we can do with vectors in any vector space, is take the inner product of two:

$$\langle \phi, \psi \rangle = \int_X \overline{\phi(x)}\, \psi(x)\, dx$$

This gives a map that’s linear in one slot and conjugate-linear in the other:

$$\langle \cdot, \cdot \rangle : L^2(X) \times L^2(X) \to \mathbb{C}$$
First came probability theory with $L^1$; then came quantum theory with $L^2$. Naive extrapolation would say it’s about time for someone to invent an even more bizarre theory of reality based on $L^3$. In this, you’d have to integrate the product of three wavefunctions to get a number! The math of $L^p$ spaces is already well-developed, so give it a try if you want. I’ll stick to $L^1$ and $L^2$ today.
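As a tiny illustration of the difference (my own example, not from the post): here is the same vector normalized in the $L^1$ sense, as a probability distribution, versus the $L^2$ sense, as a wavefunction.

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])

# L¹ normalization (probability theory): make the entries sum to 1.
p = v / np.abs(v).sum()
print(p.sum())                    # 1.0

# L² normalization (quantum theory): make |entries|² sum to 1.
psi = v / np.sqrt((np.abs(v) ** 2).sum())
print((np.abs(psi) ** 2).sum())   # 1.0
```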
Stochastic versus unitary operators
Now let’s think about time evolution:
• In probability theory, the passage of time is described by a map sending probability distributions to probability distributions. This is described using a stochastic operator

$$U : L^1(X) \to L^1(X)$$

meaning a linear operator such that

$$\int_X (U\psi)(x)\, dx = \int_X \psi(x)\, dx$$

and

$$\psi \geq 0 \implies U\psi \geq 0$$

• In quantum theory the passage of time is described by a map sending wavefunctions to wavefunctions. This is described using an isometry

$$U : L^2(X) \to L^2(X)$$

meaning a linear operator such that

$$\langle U\phi, U\psi \rangle = \langle \phi, \psi \rangle$$

In quantum theory we usually want time evolution to be reversible, so we focus on isometries that have inverses: these are called unitary operators. In probability theory we often consider stochastic operators that are not invertible.
Infinitesimal stochastic versus self-adjoint operators
Sometimes it’s nice to think of time coming in discrete steps. But in theories where we treat time as continuous, to describe time evolution we usually need to solve a differential equation. This is true in both probability theory and quantum theory:
• In probability theory we often describe time evolution using a differential equation called the master equation:

$$\frac{d}{dt}\psi(t) = H\psi(t)$$

whose solution is

$$\psi(t) = e^{tH}\psi(0)$$

• In quantum theory we often describe time evolution using a differential equation called Schrödinger’s equation:

$$i\frac{d}{dt}\psi(t) = H\psi(t)$$

whose solution is

$$\psi(t) = e^{-itH}\psi(0)$$
In fact the appearance of $i$ in the quantum case is purely conventional; we could drop it to make the analogy better, but then we’d have to work with ‘skew-adjoint’ operators instead of self-adjoint ones in what follows.
Let’s guess what properties an operator $H$ should have to make $e^{-itH}$ unitary for all $t$. We start by assuming it’s an isometry:

$$\langle e^{-itH}\phi, e^{-itH}\psi \rangle = \langle \phi, \psi \rangle$$

Then we differentiate this with respect to $t$ and set $t = 0$, getting

$$\langle -iH\phi, \psi \rangle + \langle \phi, -iH\psi \rangle = 0$$

or in other words

$$\langle H\phi, \psi \rangle = \langle \phi, H\psi \rangle$$

Physicists call an operator obeying this condition self-adjoint. Mathematicians know there’s more to it, but today is not the day to discuss such subtleties, intriguing though they be. All that matters now is that there is, indeed, a correspondence between self-adjoint operators and well-behaved ‘one-parameter unitary groups’ $e^{-itH}$. This is called Stone’s Theorem.
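Here is a quick numerical illustration of this correspondence—my own sketch, with arbitrary matrix size and seed: build a random self-adjoint matrix and check that $e^{-itH}$ is unitary.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)

# A random self-adjoint (Hermitian) matrix: H = H†.
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
H = (A + A.conj().T) / 2

# U(t) = exp(-itH) should be unitary: U†U = 1 for every t.
t = 1.7
U = expm(-1j * t * H)
print(np.allclose(U.conj().T @ U, np.eye(4)))  # True
```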
But now let’s copy this argument to guess what properties an operator $H$ must have to make $e^{tH}$ stochastic. We start by assuming $e^{tH}$ is stochastic, so

$$\int_X (e^{tH}\psi)(x)\, dx = \int_X \psi(x)\, dx$$

and

$$\psi \geq 0 \implies e^{tH}\psi \geq 0$$

We can differentiate the first equation with respect to $t$ and set $t = 0$, getting

$$\int_X (H\psi)(x)\, dx = 0$$

for all $\psi$.
But what about the second condition,

$$\psi \geq 0 \implies e^{tH}\psi \geq 0 \; ?$$

It seems easier to deal with this in the special case when integrals over $X$ reduce to sums. So let’s suppose that happens… and let’s start by seeing what the first condition says in this case.
In this case, $L^1(X)$ has a basis of ‘Kronecker delta functions’: the Kronecker delta function $\delta_x$ vanishes everywhere except at one point $x \in X$, where it equals 1. Using this basis, we can write any operator on $L^1(X)$ as a matrix.
As a warmup, let’s see what it means for an operator $U$ to be stochastic in this case. We’ll take the conditions

$$\int_X (U\psi)(x)\, dx = \int_X \psi(x)\, dx$$

and

$$\psi \geq 0 \implies U\psi \geq 0$$

and rewrite them using matrices. For both, it’s enough to consider the case where $\psi$ is a Kronecker delta, say $\delta_y$.

In these terms, the first condition says

$$\sum_{x \in X} U_{xy} = 1$$

for each column $y$. The second says

$$U_{xy} \geq 0$$

for all $x$ and $y$. So, in this case, a stochastic operator is just a square matrix where each column sums to 1 and all the entries are nonnegative. (Such matrices are often called left stochastic.)
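In code, this check is one line per condition. Here is a minimal sketch of my own; the function name, example matrix, and tolerance are all arbitrary choices.

```python
import numpy as np

def is_left_stochastic(U, tol=1e-12):
    """Each column sums to 1 and every entry is nonnegative."""
    U = np.asarray(U, dtype=float)
    return (np.allclose(U.sum(axis=0), 1.0, atol=tol)
            and bool(np.all(U >= -tol)))

# A 2-state example: state 0 stays put with probability 0.9, etc.
U = np.array([[0.9, 0.3],
              [0.1, 0.7]])
print(is_left_stochastic(U))  # True
```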
Next, let’s see what we need for an operator $H$ to have the property that $e^{tH}$ is stochastic for all $t \geq 0$. It’s enough to assume $t$ is very small, since $e^{tH} = (e^{(t/n)H})^n$ and products of stochastic operators are stochastic. This lets us use the approximation

$$e^{tH} \approx 1 + tH$$

and work to first order in $t$. Saying that each column of this matrix sums to 1 then amounts to

$$\sum_{x \in X} (1 + tH)_{xy} = 1$$

which requires

$$\sum_{x \in X} H_{xy} = 0$$

Saying that each entry is nonnegative amounts to

$$(1 + tH)_{xy} \geq 0$$

When $x = y$ this will be automatic when $t$ is small enough, so the meat of this condition is

$$x \neq y \implies H_{xy} \geq 0$$

So, let’s say $H$ is an infinitesimal stochastic matrix if its columns sum to zero and its off-diagonal entries are nonnegative.
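Here is a numerical check of this characterization—again my own sketch, with an arbitrary $2 \times 2$ example: take a small infinitesimal stochastic matrix and verify that $e^{tH}$ is stochastic for several values of $t$.

```python
import numpy as np
from scipy.linalg import expm

# An infinitesimal stochastic matrix: columns sum to zero,
# off-diagonal entries nonnegative (so diagonal entries may be negative).
H = np.array([[-2.0,  1.0],
              [ 2.0, -1.0]])
print(H.sum(axis=0))  # [0. 0.]

# Exponentiating it should give a (left) stochastic matrix for every t >= 0.
for t in (0.1, 1.0, 10.0):
    U = expm(t * H)
    print(np.allclose(U.sum(axis=0), 1.0), bool(np.all(U >= 0)))  # True True
```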
I don’t love this terminology: do you know a better one? There should be some standard term. People here say they’ve seen the term ‘stochastic Hamiltonian’. The idea behind my term is that any infinitesimal stochastic operator should be the infinitesimal generator of a stochastic process.
In other words, when we get the details straightened out, any 1-parameter family of stochastic operators

$$U(t) : L^1(X) \to L^1(X), \qquad t \geq 0$$

obeying

$$U(0) = 1$$

$$U(s)U(t) = U(s+t)$$

and continuity:

$$t_i \to t \implies U(t_i)\psi \to U(t)\psi$$

should be of the form

$$U(t) = e^{tH}$$

for a unique ‘infinitesimal stochastic operator’ $H$.
When $X$ is a finite set, this is true—and an infinitesimal stochastic operator is just a square matrix whose columns sum to zero and whose off-diagonal entries are nonnegative. But do you know a theorem characterizing infinitesimal stochastic operators for general measure spaces $X$? Someone must have worked it out.
Luckily, for our work on stochastic Petri nets, we only need to understand the case where $X$ is a countable set and our integrals are really just sums. This should be almost like the case where $X$ is a finite set—but we’ll need to take care that all our sums converge.
The moral
Now we can see why a Hamiltonian like $a^\dagger$ is no good, while $a^\dagger - 1$ is good. (I’ll ignore the rate constant $r$ since it’s irrelevant here.) The first one is not infinitesimal stochastic, while the second one is!
In this example, our set of states is the natural numbers:

$$X = \mathbb{N} = \{0, 1, 2, \dots\}$$

The probability distribution

$$\psi : \mathbb{N} \to \mathbb{R}$$

tells us the probability of having caught any specific number of fish.
The creation operator is not infinitesimal stochastic: in fact, it’s stochastic! Why? Well, when we apply the creation operator, what was the probability of having $n$ fish now becomes the probability of having $n+1$ fish. So, the probabilities remain nonnegative, and their sum over all $n$ is unchanged. Those two conditions are all we need for a stochastic operator.

Using our fancy abstract notation, these conditions say:

$$\psi \geq 0 \implies a^\dagger \psi \geq 0$$

and

$$\int a^\dagger \psi = \int \psi$$
So, precisely by virtue of being stochastic, the creation operator fails to be infinitesimal stochastic:

$$\int a^\dagger \psi = \int \psi \neq 0$$

Thus it’s a bad Hamiltonian for our stochastic Petri net.
On the other hand, $a^\dagger - 1$ is infinitesimal stochastic. Its off-diagonal entries are the same as those of $a^\dagger$, so they’re nonnegative. Moreover:

$$\int (a^\dagger - 1)\psi = 0$$

precisely because

$$\int a^\dagger \psi = \int \psi$$
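To see all this with finite matrices, here is one last sketch of my own. Truncating at $N$ states spoils the last column of $a^\dagger$, as noted in the comments, but the pattern is clear.

```python
import numpy as np

N = 6

# Truncated creation operator: (a†ψ)_n = ψ_{n-1}, a lower shift matrix.
A = np.diag(np.ones(N - 1), k=-1)

# a† is stochastic: nonnegative entries, columns summing to 1.
# (The last column sums to 0 only because of the truncation.)
print(A.sum(axis=0))  # [1. 1. 1. 1. 1. 0.]

# a† - 1 is infinitesimal stochastic: columns summing to 0,
# off-diagonal entries nonnegative.
H = A - np.eye(N)
print(H.sum(axis=0))                          # [0. 0. 0. 0. 0. -1.]
print(bool(np.all(H - np.diag(np.diag(H)) >= 0)))  # True: off-diagonals >= 0
```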
You may be thinking: all this fancy math just to understand a single stochastic Petri net, the simplest one of all!
But next time I’ll explain a general recipe which will let you write down the Hamiltonian for any stochastic Petri net. The lessons we’ve learned today will make this much easier. And pondering the analogy between probability theory and quantum theory will also be good for our bigger project of unifying the applications of network diagrams to dozens of different subjects.