Last time we saw clues that stochastic Petri nets are a lot like quantum field theory, but with probabilities replacing amplitudes. There’s a powerful analogy at work here, which can help us a lot. So, this time I want to make that analogy precise.
But first, let me quickly sketch why it could be worthwhile.
A Poisson process
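Consider this stochastic Petri net with rate constant $r$:
[Figure: a Petri net with a single transition, which has no inputs and one output, the caught fish.]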
It describes an inexhaustible supply of fish swimming down a river, and getting caught when they run into a fisherman’s net. In any short time $\Delta t$ there’s a chance of about $r \Delta t$ of a fish getting caught. There’s also a chance of two or more fish getting caught, but this becomes negligible by comparison as $\Delta t \to 0$. Moreover, the chance of a fish getting caught during this interval of time is independent of what happens before or afterwards. This sort of process is called a Poisson process.
Problem. Suppose we start out knowing for sure there are no fish in the fisherman’s net. What’s the probability that he has caught $n$ fish at time $t$?
At any time $t$ there will be some probability of having caught $n$ fish; let’s call this probability $\psi(n,t)$, or $\psi(n)$ for short. We can summarize all these probabilities in a single power series, called a generating function:
$$\Psi(t) = \sum_{n=0}^\infty \psi(n,t) \, z^n$$
Here $z$ is a formal variable: don’t ask what it means, for now it’s just a trick. In quantum theory we use this trick when talking about collections of photons rather than fish, but then the numbers $\psi(n,t)$ are complex ‘amplitudes’. Now they are real probabilities, but we can still copy what the physicists do, and use this trick to rewrite the master equation as follows:
$$\frac{d}{dt} \Psi(t) = H \Psi(t)$$
This describes how the probability of having caught any given number of fish changes with time.
What’s the operator $H$? Well, in quantum theory we describe the creation of photons using a certain operator on power series called the creation operator:
$$a^\dagger \Psi = z \Psi$$
We can try to apply this to our fish. If at some time we’re 100% sure we have $n$ fish, we have
$$\Psi = z^n$$
so applying the creation operator gives
$$a^\dagger \Psi = z^{n+1}$$
One more fish! That’s good. So, an obvious wild guess is
$$H = r a^\dagger$$
where $r$ is the rate at which we’re catching fish. Let’s see how well this guess works.
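If you know how to exponentiate operators, you know how to solve this equation:
$$\Psi(t) = \exp(tH) \Psi(0)$$
Since we start out knowing there are no fish in the net, we have
$$\Psi(0) = 1$$
so with our guess for $H$ we get
$$\Psi(t) = \exp(r t a^\dagger) 1$$
But $a^\dagger$ is the operator of multiplication by $z$, so $\exp(r t a^\dagger)$ is multiplication by $e^{rtz}$, so
$$\Psi(t) = e^{rtz} = \sum_{n=0}^\infty \frac{(rt)^n}{n!} \, z^n$$
So, if our guess is right, the probability of having caught $n$ fish at time $t$ is
$$\frac{(rt)^n}{n!}$$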
Unfortunately, this can’t be right, because these probabilities don’t sum to 1! Instead their sum is
$$\sum_{n=0}^\infty \frac{(rt)^n}{n!} = e^{rt}$$
We can try to wriggle out of the mess we’re in by dividing our answer by this fudge factor. It sounds like a desperate measure, but we’ve got to try something!
This amounts to guessing that the probability of having caught $n$ fish by time $t$ is
$$\frac{(rt)^n}{n!} \, e^{-rt}$$
And this is right! This is called the Poisson distribution: it’s famous for being precisely the answer to the problem we’re facing.
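Here is a quick numerical sanity check of the two formulas above, as a minimal sketch. The function names and the values of `r`, `t` and the cutoff `N` are illustrative choices, not anything fixed by the problem.

```python
import math

def unnormalized_guess(n, r, t):
    """The first guess: the coefficient of z^n in e^{rtz}."""
    return (r * t) ** n / math.factorial(n)

def poisson_probability(n, r, t):
    """The corrected guess: the first one divided by the fudge factor e^{rt}."""
    return unnormalized_guess(n, r, t) * math.exp(-r * t)

r, t = 2.0, 1.5   # illustrative rate constant and time
N = 60            # cutoff for the infinite sums; the tail is negligible here

total_guess = sum(unnormalized_guess(n, r, t) for n in range(N))
total_poisson = sum(poisson_probability(n, r, t) for n in range(N))

print(total_guess, math.exp(r * t))   # the first guess sums to e^{rt}, not 1
print(total_poisson)                  # the corrected probabilities sum to 1
```

Up to rounding and the cutoff, the first total matches $e^{rt}$ and the second is 1.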
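So on the one hand our wild guess about $H$ was wrong, but on the other hand it was not so far off. We can fix it as follows:
$$H = r (a^\dagger - 1)$$
The extra $-1$ gives us the fudge factor we need: since $a^\dagger$ commutes with the identity, $\exp(rt(a^\dagger - 1)) = e^{-rt} \exp(rt a^\dagger)$.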
So, a wild guess corrected by an ad hoc procedure seems to have worked! But what’s really going on?
What’s really going on is that $a^\dagger$, or any multiple of this, is not a legitimate Hamiltonian for a master equation: if we define a time evolution operator $\exp(tH)$ using a Hamiltonian like this, probabilities won’t sum to 1! But $a^\dagger - 1$ is okay. So, we need to think about which Hamiltonians are okay.
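To see the difference between the two Hamiltonians concretely, here is a rough sketch that stores a power series as a list of its first `N` coefficients and applies $\exp(tH)$ through a truncated Taylor series. The truncation, the helper names and the parameter values are all assumptions made for illustration. Since multiplying by $z$ only pushes coefficients upward, cutting the series off at degree `N` should not disturb the coefficients we keep.

```python
import math

def creation(coeffs):
    """Creation operator: multiply by z, shifting every coefficient up one degree.
    The series is truncated, so the top coefficient falls off the end."""
    return [0.0] + coeffs[:-1]

def bad_hamiltonian(coeffs, r):
    """The wild guess: r times the creation operator."""
    return [r * s for s in creation(coeffs)]

def good_hamiltonian(coeffs, r):
    """The corrected guess: r times (creation operator minus the identity)."""
    return [r * (s - c) for s, c in zip(creation(coeffs), coeffs)]

def evolve(hamiltonian, coeffs, r, t, terms=80):
    """Apply exp(tH) using the Taylor series: the sum over k of (tH)^k / k!."""
    result = list(coeffs)
    term = list(coeffs)
    for k in range(1, terms):
        term = [t * x / k for x in hamiltonian(term, r)]
        result = [a + b for a, b in zip(result, term)]
    return result

r, t, N = 2.0, 1.5, 40
psi0 = [1.0] + [0.0] * (N - 1)   # Psi(0) = 1: we are sure there are no fish

bad = evolve(bad_hamiltonian, psi0, r, t)
good = evolve(good_hamiltonian, psi0, r, t)
poisson = [math.exp(-r * t) * (r * t) ** n / math.factorial(n) for n in range(N)]

print(sum(bad), math.exp(r * t))   # total "probability" grows like e^{rt}
print(sum(good))                   # total probability stays at 1
print(max(abs(a - b) for a, b in zip(good, poisson)))   # agrees with Poisson
```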
In quantum theory, self-adjoint Hamiltonians are okay. But in probability theory, we need some other kind of Hamiltonian. Let’s figure it out.
Probability versus quantum theory
Suppose we have a system of any kind: physical, chemical, biological, economic, whatever. The system can be in different states. In the simplest sort of model, we say there’s some set $X$ of states, and say that at any moment in time the system is definitely in one of these states. But I want to compare two other options:
• In a probabilistic model, we may instead say that the system has a probability $\psi(x)$ of being in any state $x \in X$. These probabilities are nonnegative real numbers with
$$\sum_{x \in X} \psi(x) = 1$$
• In a quantum model, we may instead say that the system has an amplitude $\psi(x)$ of being in any state $x \in X$. These amplitudes are complex numbers with
$$\sum_{x \in X} |\psi(x)|^2 = 1$$
Probabilities and amplitudes are similar yet strangely different. Of course given an amplitude we can get a probability by taking its absolute value and squaring it. This is a vital bridge from quantum theory to probability theory. Today, however, I don’t want to focus on the bridges, but rather the parallels between these theories.
We often want to replace the sums above by integrals. For that we need to replace our set $X$ by a measure space, which is a set equipped with enough structure that you can integrate real or complex functions defined on it. Well, at least you can integrate so-called ‘integrable’ functions, but I’ll neglect all issues of analytical rigor here. Then:
• In a probabilistic model, the system has a probability distribution $\psi : X \to \mathbb{R}$, which obeys $\psi \ge 0$ and
$$\int_X \psi(x) \, dx = 1$$
• In a quantum model, the system has a wavefunction $\psi : X \to \mathbb{C}$, which obeys
$$\int_X |\psi(x)|^2 \, dx = 1$$
In probability theory, we integrate $\psi$ over a set $S \subset X$ to find out the probability that our system’s state is in this set. In quantum theory we integrate $|\psi|^2$ over the set to answer the same question.
We don’t need to think about sums over sets and integrals over measure spaces separately: there’s a way to make any set $X$ into a measure space such that by definition,
$$\int_X \psi(x) \, dx = \sum_{x \in X} \psi(x)$$
In short, integrals are more general than sums! So, I’ll mainly talk about integrals, until the very end.
In probability theory, we want our probability distributions to be vectors in some vector space. Ditto for wavefunctions in quantum theory! So, we make up some vector spaces:
• In probability theory, the probability distribution $\psi$ is a vector in the space
$$L^1(X) = \left\{ \psi : X \to \mathbb{C} \; : \; \int_X |\psi(x)| \, dx < \infty \right\}$$
• In quantum theory, the wavefunction $\psi$ is a vector in the space
$$L^2(X) = \left\{ \psi : X \to \mathbb{C} \; : \; \int_X |\psi(x)|^2 \, dx < \infty \right\}$$
You may wonder why I defined $L^1(X)$ to consist of complex functions when probability distributions are real. I’m just struggling to make the analogy seem as strong as possible. In fact probability distributions are not just real but nonnegative. We need to say this somewhere… but we can, if we like, start by saying they’re complex-valued functions, but then whisper that they must in fact be nonnegative (and thus real). It’s not the most elegant solution, but that’s what I’ll do for now.
• The main thing we can do with elements of $L^1(X)$, besides what we can do with vectors in any vector space, is integrate one. This gives a linear map:
$$\int : L^1(X) \to \mathbb{C}$$
• The main thing we can do with elements of $L^2(X)$, besides the things we can do with vectors in any vector space, is take the inner product of two:
$$\langle \psi, \phi \rangle = \int_X \overline{\psi}(x) \phi(x) \, dx$$
This gives a map that’s linear in one slot and conjugate-linear in the other:
$$\langle - , - \rangle : L^2(X) \times L^2(X) \to \mathbb{C}$$
First came probability theory with $L^1(X)$; then came quantum theory with $L^2(X)$. Naive extrapolation would say it’s about time for someone to invent an even more bizarre theory of reality based on $L^3(X)$. In this, you’d have to integrate the product of three wavefunctions to get a number! The math of $L^p$ spaces is already well-developed, so give it a try if you want. I’ll stick to $L^1$ and $L^2$ today.
Stochastic versus unitary operators
Now let’s think about time evolution:
• In probability theory, the passage of time is described by a map sending probability distributions to probability distributions. This is described using a stochastic operator
$$U : L^1(X) \to L^1(X)$$
meaning a linear operator such that
$$\int U \psi = \int \psi$$
and
$$\psi \ge 0 \implies U \psi \ge 0$$
• In quantum theory the passage of time is described by a map sending wavefunctions to wavefunctions. This is described using an isometry
$$U : L^2(X) \to L^2(X)$$
meaning a linear operator such that
$$\langle U \psi, U \phi \rangle = \langle \psi, \phi \rangle$$
In quantum theory we usually want time evolution to be reversible, so we focus on isometries that have inverses: these are called unitary operators. In probability theory we often consider stochastic operators that are not invertible.
Infinitesimal stochastic versus self-adjoint operators
Sometimes it’s nice to think of time coming in discrete steps. But in theories where we treat time as continuous, to describe time evolution we usually need to solve a differential equation. This is true in both probability theory and quantum theory:
• In probability theory we often describe time evolution using a differential equation called the master equation:
$$\frac{d}{dt} \psi(t) = H \psi(t)$$
whose solution is
$$\psi(t) = \exp(tH) \psi(0)$$
• In quantum theory we often describe time evolution using a differential equation called Schrödinger’s equation:
$$\frac{d}{dt} \psi(t) = -i H \psi(t)$$
whose solution is
$$\psi(t) = \exp(-itH) \psi(0)$$
In fact the appearance of $i$ in the quantum case is purely conventional; we could drop it to make the analogy better, but then we’d have to work with ‘skew-adjoint’ operators instead of self-adjoint ones in what follows.
Let’s guess what properties an operator $H$ should have to make $\exp(-itH)$ unitary for all $t$. We start by assuming it’s an isometry:
$$\langle \exp(-itH) \psi, \exp(-itH) \phi \rangle = \langle \psi, \phi \rangle$$
Then we differentiate this with respect to $t$ and set $t = 0$, getting
$$\langle -iH \psi, \phi \rangle + \langle \psi, -iH \phi \rangle = 0$$
or in other words
$$\langle H \psi, \phi \rangle = \langle \psi, H \phi \rangle$$
Physicists call an operator obeying this condition self-adjoint. Mathematicians know there’s more to it, but today is not the day to discuss such subtleties, intriguing though they be. All that matters now is that there is, indeed, a correspondence between self-adjoint operators and well-behaved ‘one-parameter unitary groups’ $\exp(-itH)$. This is called Stone’s Theorem.
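In finite dimensions this is easy to test numerically. The sketch below is only an illustration on two-state systems: the matrices, vectors and helper names are made up for the example, and the exponential is computed with a plain truncated Taylor series rather than a library routine. It checks that $\exp(-itH)$ preserves inner products when $H$ equals its conjugate transpose, and fails to when it doesn’t.

```python
def mat_mul(A, B):
    """Product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(A, terms=60):
    """exp(A) via the truncated Taylor series: the sum over k of A^k / k!."""
    n = len(A)
    result = [[complex(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, A)]
        result = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(result, term)]
    return result

def apply(A, v):
    """Matrix times vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def inner(psi, phi):
    """Inner product, conjugate-linear in the first slot."""
    return sum(p.conjugate() * q for p, q in zip(psi, phi))

def evolution(H, t):
    """The operator exp(-itH)."""
    return mat_exp([[-1j * t * h for h in row] for row in H])

t = 0.7
psi, phi = [1 + 2j, 0.5 - 1j], [-1j, 3.0 + 0j]

H_good = [[1.0, 2 - 1j], [2 + 1j, -0.5]]   # equals its conjugate transpose
U = evolution(H_good, t)
drift_good = abs(inner(apply(U, psi), apply(U, phi)) - inner(psi, phi))

H_bad = [[0.0, 1.0], [0.0, 0.0]]           # not self-adjoint
V = evolution(H_bad, t)
e2 = [0j, 1 + 0j]
norm_bad = inner(apply(V, e2), apply(V, e2)).real

print(drift_good)   # essentially zero: inner products are preserved
print(norm_bad)     # 1 + t^2 instead of 1: the norm has grown
```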
But now let’s copy this argument to guess what properties an operator $H$ must have to make $\exp(tH)$ stochastic. We start by assuming $\exp(tH)$ is stochastic, so
$$\int \exp(tH) \psi = \int \psi$$
and
$$\psi \ge 0 \implies \exp(tH) \psi \ge 0$$
We can differentiate the first equation with respect to $t$ and set $t = 0$, getting
$$\int H \psi = 0$$
for all $\psi$.
But what about the second condition,
$$\psi \ge 0 \implies \exp(tH) \psi \ge 0 \; ?$$
It seems easier to deal with this in the special case when integrals over $X$ reduce to sums. So let’s suppose that happens… and let’s start by seeing what the first condition says in this case.
In this case, $L^1(X)$ has a basis of ‘Kronecker delta functions’: the Kronecker delta function $\delta_i$ vanishes everywhere except at one point $i \in X$, where it equals 1. Using this basis, we can write any operator on $L^1(X)$ as a matrix.
As a warmup, let’s see what it means for an operator
$$U : L^1(X) \to L^1(X)$$
to be stochastic in this case. We’ll take the conditions
$$\int U \psi = \int \psi$$
and
$$\psi \ge 0 \implies U \psi \ge 0$$
and rewrite them using matrices. For both, it’s enough to consider the case where $\psi$ is a Kronecker delta, say $\delta_j$.
In these terms, the first condition says
$$\sum_{i \in X} U_{ij} = 1$$
for each column $j$. The second says
$$U_{ij} \ge 0$$
for all $i, j$. So, in this case, a stochastic operator is just a square matrix where each column sums to 1 and all the entries are nonnegative. (Such matrices are often called left stochastic.)
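For a finite set of states, the two matrix conditions are easy to check by machine. Here is a small sketch, with a hypothetical helper `is_stochastic` and made-up matrices, that tests both conditions and confirms that a stochastic matrix sends a probability distribution to a probability distribution.

```python
def is_stochastic(U, tol=1e-12):
    """Check that every column sums to 1 and every entry is nonnegative."""
    n = len(U)
    columns_ok = all(abs(sum(U[i][j] for i in range(n)) - 1.0) <= tol
                     for j in range(n))
    entries_ok = all(U[i][j] >= 0 for i in range(n) for j in range(n))
    return columns_ok and entries_ok

def apply(U, psi):
    """Matrix times column vector."""
    return [sum(u * p for u, p in zip(row, psi)) for row in U]

# Column j holds the probabilities of jumping from state j to each state i.
U = [[0.9, 0.5,  0.0],
     [0.1, 0.25, 0.0],
     [0.0, 0.25, 1.0]]

# Columns sum to 1, but one entry is negative, so this one is not stochastic.
U_bad = [[1.5,  0.0],
         [-0.5, 1.0]]

psi = [0.2, 0.3, 0.5]      # a probability distribution on three states
new_psi = apply(U, psi)

print(is_stochastic(U), is_stochastic(U_bad))
print(new_psi, sum(new_psi))   # still nonnegative, still sums to 1
```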
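Next, let’s see what we need for an operator $H$ to have the property that $\exp(tH)$ is stochastic for all $t \ge 0$. It’s enough to assume $t$ is very small, which lets us use the approximation
$$\exp(tH) = 1 + tH + \cdots$$
and work to first order in $t$. Saying that each column of this matrix sums to 1 then amounts to
$$\sum_{i \in X} \left( \delta_{ij} + t H_{ij} + \cdots \right) = 1$$
which, since $\sum_i \delta_{ij} = 1$, requires
$$\sum_{i \in X} H_{ij} = 0$$
Saying that each entry is nonnegative amounts to
$$\delta_{ij} + t H_{ij} + \cdots \ge 0$$
When $i = j$ this will be automatic when $t$ is small enough, so the meat of this condition is
$$H_{ij} \ge 0 \quad \text{if } i \ne j$$
So, let’s say $H$ is an infinitesimal stochastic matrix if its columns sum to zero and its off-diagonal entries are nonnegative.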
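As a plausibility check on this definition, the sketch below takes a made-up $3 \times 3$ matrix whose columns sum to zero and whose off-diagonal entries are nonnegative, exponentiates it with a truncated Taylor series, and tests that $\exp(tH)$ comes out stochastic for a few values of $t$. The matrices, helper names and tolerances are illustrative assumptions; this is numerical evidence for small examples, not a proof.

```python
def mat_mul(A, B):
    """Product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(A, terms=60):
    """exp(A) via the truncated Taylor series: the sum over k of A^k / k!."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, A)]
        result = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(result, term)]
    return result

def is_infinitesimal_stochastic(H, tol=1e-9):
    """Columns sum to zero and off-diagonal entries are nonnegative."""
    n = len(H)
    columns_ok = all(abs(sum(H[i][j] for i in range(n))) <= tol for j in range(n))
    off_diagonal_ok = all(H[i][j] >= 0
                          for i in range(n) for j in range(n) if i != j)
    return columns_ok and off_diagonal_ok

def is_stochastic(U, tol=1e-9):
    """Columns sum to 1 and entries are nonnegative, up to rounding error."""
    n = len(U)
    columns_ok = all(abs(sum(U[i][j] for i in range(n)) - 1.0) <= tol
                     for j in range(n))
    entries_ok = all(U[i][j] >= -tol for i in range(n) for j in range(n))
    return columns_ok and entries_ok

H = [[-1.0,  0.5,  0.0],
     [ 1.0, -0.5,  2.0],
     [ 0.0,  0.0, -2.0]]

# A two-state cousin of the creation operator: its first column sums to 1.
H_not = [[0.0, 0.0],
         [1.0, 0.0]]

checks = [is_stochastic(mat_exp([[t * h for h in row] for row in H]))
          for t in (0.1, 0.5, 1.0)]

print(is_infinitesimal_stochastic(H), checks)
print(is_infinitesimal_stochastic(H_not))
```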
I don’t love this terminology: do you know a better one? There should be some standard term. People here say they’ve seen the term ‘stochastic Hamiltonian’. The idea behind my term is that any infinitesimal stochastic operator should be the infinitesimal generator of a stochastic process.
In other words, when we get the details straightened out, any 1-parameter family of stochastic operators
$$U(t) : L^1(X) \to L^1(X), \qquad t \ge 0$$
should be of the form
$$U(t) = \exp(tH)$$
for a unique ‘infinitesimal stochastic operator’ $H$.
When $X$ is a finite set, this is true, and an infinitesimal stochastic operator is just a square matrix whose columns sum to zero and whose off-diagonal entries are nonnegative. But do you know a theorem characterizing infinitesimal stochastic operators for general measure spaces $X$? Someone must have worked it out.
Luckily, for our work on stochastic Petri nets, we only need to understand the case where $X$ is a countable set and our integrals are really just sums. This should be almost like the case where $X$ is a finite set, but we’ll need to take care that all our sums converge.
Now we can see why a Hamiltonian like $a^\dagger$ is no good, while $a^\dagger - 1$ is good. (I’ll ignore the rate constant $r$ since it’s irrelevant here.) The first one is not infinitesimal stochastic, while the second one is!
In this example, our set of states is the natural numbers:
$$X = \mathbb{N}$$
The probability distribution
$$\psi : \mathbb{N} \to \mathbb{C}$$
tells us the probability of having caught any specific number of fish.
The creation operator is not infinitesimal stochastic: in fact, it’s stochastic! Why? Well, when we apply the creation operator, what was the probability of having $n$ fish now becomes the probability of having $n+1$ fish. So, the probabilities remain nonnegative, and their sum over all $n$ is unchanged. Those two conditions are all we need for a stochastic operator.
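Using our fancy abstract notation, these conditions say:
$$\int a^\dagger \psi = \int \psi$$
and
$$\psi \ge 0 \implies a^\dagger \psi \ge 0$$
So, precisely by virtue of being stochastic, the creation operator fails to be infinitesimal stochastic: for any probability distribution $\psi$ we have
$$\int a^\dagger \psi = 1 \ne 0$$
Thus it’s a bad Hamiltonian for our stochastic Petri net.
On the other hand, $a^\dagger - 1$ is infinitesimal stochastic. Its off-diagonal entries are the same as those of $a^\dagger$, so they’re nonnegative. Moreover:
$$\int (a^\dagger - 1) \psi = 0$$
precisely because
$$\int a^\dagger \psi = \int \psi$$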
You may be thinking: all this fancy math just to understand a single stochastic Petri net, the simplest one of all!
But next time I’ll explain a general recipe which will let you write down the Hamiltonian for any stochastic Petri net. The lessons we’ve learned today will make this much easier. And pondering the analogy between probability theory and quantum theory will also be good for our bigger project of unifying the applications of network diagrams to dozens of different subjects.