Network Theory (Part 5)

Last time we saw clues that stochastic Petri nets are a lot like quantum field theory, but with probabilities replacing amplitudes. There’s a powerful analogy at work here, which can help us a lot. So, this time I want to make that analogy precise.

But first, let me quickly sketch why it could be worthwhile.

A Poisson process

Consider this stochastic Petri net with rate constant r:

It describes an inexhaustible supply of fish swimming down a river, and getting caught when they run into a fisherman’s net. In any short time \Delta t there’s a chance of about r \Delta t of a fish getting caught. There’s also a chance of two or more fish getting caught, but this becomes negligible by comparison as \Delta t \to 0. Moreover, the chance of a fish getting caught during this interval of time is independent of what happens before or afterwards. This sort of process is called a Poisson process.

Problem. Suppose we start out knowing for sure there are no fish in the fisherman’s net. What’s the probability that he has caught n fish at time t?

At any time there will be some probability of having caught n fish; let’s call this probability \psi(n,t), or \psi(n) for short. We can summarize all these probabilities in a single power series, called a generating function:

\Psi(t) = \sum_{n=0}^\infty \psi(n,t) \, z^n

Here z is a formal variable—don’t ask what it means, for now it’s just a trick. In quantum theory we use this trick when talking about collections of photons rather than fish, but then the numbers \psi(n,t) are complex ‘amplitudes’. Now they are real probabilities, but we can still copy what the physicists do, and use this trick to rewrite the master equation as follows:

\frac{d}{d t} \Psi(t) = H \Psi(t)

This describes how the probability of having caught any given number of fish changes with time.

What’s the operator H? Well, in quantum theory we describe the creation of photons using a certain operator on power series called the creation operator:

a^\dagger \Psi = z \Psi

We can try to apply this to our fish. If at some time we’re 100% sure we have n fish, we have

\Psi = z^n

so applying the creation operator gives

a^\dagger \Psi = z^{n+1}

One more fish! That’s good. So, an obvious wild guess is

H = r a^\dagger

where r is the rate at which we’re catching fish. Let’s see how well this guess works.

If you know how to exponentiate operators, you know how to solve this equation:

\frac{d}{d t} \Psi(t) = H \Psi(t)

It’s easy:

\Psi(t) = \exp(t H) \Psi(0)

Since we start out knowing there are no fish in the net, we have

\Psi(0) = 1

so with our guess for H we get

\Psi(t) = \exp(r t a^\dagger) 1

But a^\dagger is the operator of multiplication by z, so \exp(r t a^\dagger) is multiplication by e^{r t z}, so

\Psi(t) = e^{r t z} = \sum_{n = 0}^\infty \frac{(r t)^n}{n!} \, z^n

So, if our guess is right, the probability of having caught n fish at time t is

\frac{(r t)^n}{n!}

Unfortunately, this can’t be right, because these probabilities don’t sum to 1! Instead their sum is

\sum_{n=0}^\infty \frac{(r t)^n}{n!} = e^{r t}

We can try to wriggle out of the mess we’re in by dividing our answer by this fudge factor. It sounds like a desperate measure, but we’ve got to try something!

This amounts to guessing that the probability of having caught n fish by time t is

\frac{(r t)^n}{n!} \, e^{-r t }

And this is right! This is called the Poisson distribution: it’s famous for being precisely the answer to the problem we’re facing.

So on the one hand our wild guess about H was wrong, but on the other hand it was not so far off. We can fix it as follows:

H = r (a^\dagger - 1)

The extra -1 gives us the fudge factor we need.

So, a wild guess corrected by an ad hoc procedure seems to have worked! But what’s really going on?

What’s really going on is that a^\dagger, or any multiple of this, is not a legitimate Hamiltonian for a master equation: if we define a time evolution operator \exp(t H) using a Hamiltonian like this, probabilities won’t sum to 1! But a^\dagger - 1 is okay. So, we need to think about which Hamiltonians are okay.

In quantum theory, self-adjoint Hamiltonians are okay. But in probability theory, we need some other kind of Hamiltonian. Let’s figure it out.

Probability versus quantum theory

Suppose we have a system of any kind: physical, chemical, biological, economic, whatever. The system can be in different states. In the simplest sort of model, we say there’s some set X of states, and say that at any moment in time the system is definitely in one of these states. But I want to compare two other options:

• In a probabilistic model, we may instead say that the system has a probability \psi(x) of being in any state x \in X. These probabilities are nonnegative real numbers with

\sum_{x \in X} \psi(x) = 1

• In a quantum model, we may instead say that the system has an amplitude \psi(x) of being in any state x \in X. These amplitudes are complex numbers with

\sum_{x \in X} | \psi(x) |^2 = 1

Probabilities and amplitudes are similar yet strangely different. Of course given an amplitude we can get a probability by taking its absolute value and squaring it. This is a vital bridge from quantum theory to probability theory. Today, however, I don’t want to focus on the bridges, but rather the parallels between these theories.

We often want to replace the sums above by integrals. For that we need to replace our set X by a measure space, which is a set equipped with enough structure that you can integrate real or complex functions defined on it. Well, at least you can integrate so-called ‘integrable’ functions—but I’ll neglect all issues of analytical rigor here. Then:

• In a probabilistic model, the system has a probability distribution \psi : X \to \mathbb{R}, which obeys \psi \ge 0 and

\int_X \psi(x) \, d x = 1

• In a quantum model, the system has a wavefunction \psi : X \to \mathbb{C}, which obeys

\int_X | \psi(x) |^2 \, d x= 1

In probability theory, we integrate \psi over a set S \subset X to find out the probability that our system's state is in this set. In quantum theory we integrate |\psi|^2 over the set to answer the same question.

We don’t need to think about sums over sets and integrals over measure spaces separately: there’s a way to make any set X into a measure space such that by definition,

\int_X \psi(x) \, dx = \sum_{x \in X} \psi(x)

In short, integrals are more general than sums! So, I’ll mainly talk about integrals, until the very end.

In probability theory, we want our probability distributions to be vectors in some vector space. Ditto for wave functions in quantum theory! So, we make up some vector spaces:

• In probability theory, the probability distribution \psi is a vector in the space

L^1(X) = \{ \psi: X \to \mathbb{C} \; : \; \int_X |\psi(x)| \, d x < \infty \}

• In quantum theory, the wavefunction \psi is a vector in the space

L^2(X) = \{ \psi: X \to \mathbb{C} \; : \; \int_X |\psi(x)|^2 \, d x < \infty \}

You may wonder why I defined L^1(X) to consist of complex functions when probability distributions are real. I’m just struggling to make the analogy seem as strong as possible. In fact probability distributions are not just real but nonnegative. We need to say this somewhere… but we can, if we like, start by saying they’re complex-valued functions, but then whisper that they must in fact be nonnegative (and thus real). It’s not the most elegant solution, but that’s what I’ll do for now.

Now:

• The main thing we can do with elements of L^1(X), besides what we can do with vectors in any vector space, is integrate one. This gives a linear map:

\int : L^1(X) \to \mathbb{C}

• The main thing we can do with elements of L^2(X), besides the things we can do with vectors in any vector space, is take the inner product of two:

\langle \psi, \phi \rangle = \int_X \overline{\psi}(x) \phi(x) \, d x

This gives a map that’s linear in one slot and conjugate-linear in the other:

\langle - , - \rangle :  L^2(X) \times L^2(X) \to \mathbb{C}

First came probability theory with L^1(X); then came quantum theory with L^2(X). Naive extrapolation would say it’s about time for someone to invent an even more bizarre theory of reality based on L^3(X). In this, you’d have to integrate the product of three wavefunctions to get a number! The math of L^p spaces is already well-developed, so give it a try if you want. I’ll stick to L^1 and L^2 today.

Stochastic versus unitary operators

Now let’s think about time evolution:

• In probability theory, the passage of time is described by a map sending probability distributions to probability distributions. This map is given by a stochastic operator

U : L^1(X) \to L^1(X)

meaning a linear operator such that

\int U \psi = \int \psi

and

\psi \ge 0 \quad \implies \quad U \psi \ge 0

• In quantum theory, the passage of time is described by a map sending wavefunctions to wavefunctions. This map is given by an isometry

U : L^2(X) \to L^2(X)

meaning a linear operator such that

\langle U \psi , U \phi \rangle = \langle \psi , \phi \rangle

In quantum theory we usually want time evolution to be reversible, so we focus on isometries that have inverses: these are called unitary operators. In probability theory we often consider stochastic operators that are not invertible.

Infinitesimal stochastic versus self-adjoint operators

Sometimes it’s nice to think of time coming in discrete steps. But in theories where we treat time as continuous, to describe time evolution we usually need to solve a differential equation. This is true in both probability theory and quantum theory:

• In probability theory we often describe time evolution using a differential equation called the master equation:

\frac{d}{d t} \psi(t) = H \psi(t)

whose solution is

\psi(t) = \exp(t H)\psi(0)

• In quantum theory we often describe time evolution using a differential equation called Schrödinger’s equation:

i \frac{d}{d t} \psi(t) = H \psi(t)

whose solution is

\psi(t) = \exp(-i t H)\psi(0)

In fact the appearance of i in the quantum case is purely conventional; we could drop it to make the analogy better, but then we’d have to work with ‘skew-adjoint’ operators instead of self-adjoint ones in what follows.

Let’s guess what properties an operator H should have to make \exp(-i t H) unitary for all t. We start by assuming it’s an isometry:

\langle \exp(-i t H) \psi, \exp(-i t H) \phi \rangle = \langle \psi, \phi \rangle

Then we differentiate this with respect to t and set t = 0, getting

\langle -iH \psi, \phi \rangle +  \langle \psi, -iH \phi \rangle = 0

or in other words

\langle H \psi, \phi \rangle = \langle \psi, H \phi \rangle

Physicists call an operator obeying this condition self-adjoint. Mathematicians know there’s more to it, but today is not the day to discuss such subtleties, intriguing though they be. All that matters now is that there is, indeed, a correspondence between self-adjoint operators and well-behaved ‘one-parameter unitary groups’ \exp(-i t H). This is called Stone’s Theorem.
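To see this correspondence in action on a tiny example, here's a Python sketch that exponentiates -i t H by brute-force Taylor series, for a self-adjoint 2×2 matrix of my own choosing (the Pauli x matrix), and checks that the result is unitary:

```python
def mat_mul(A, B):
    # product of two 2x2 complex matrices
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_exp(M, terms=50):
    # exp(M) for a 2x2 complex matrix, summing the Taylor series
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for k in range(1, terms):
        term = [[entry / k for entry in row] for row in mat_mul(term, M)]
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result

t = 0.7
H = [[0, 1], [1, 0]]                 # a self-adjoint matrix (Pauli x)
U = mat_exp([[-1j * t * H[i][j] for j in range(2)] for i in range(2)])

# U should be unitary: U times its conjugate transpose is the identity
Udag = [[U[j][i].conjugate() for j in range(2)] for i in range(2)]
P = mat_mul(U, Udag)
```

Of course Stone's Theorem is about the infinite-dimensional case, where the analytical subtleties live; this finite-dimensional check only illustrates the algebra.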

But now let’s copy this argument to guess what properties an operator H must have to make \exp(t H) stochastic. We start by assuming \exp(t H) is stochastic, so

\int \exp(t H) \psi = \int \psi

and

\psi \ge 0 \quad \implies \quad \exp(t H) \psi \ge 0

We can differentiate the first equation with respect to t and set t = 0, getting

\int H \psi = 0

for all \psi.

But what about the second condition,

\psi \ge 0 \quad \implies \quad \exp(t H) \psi \ge 0?

It seems easier to deal with this in the special case when integrals over X reduce to sums. So let’s suppose that happens… and let’s start by seeing what the first condition says in this case.

In this case, L^1(X) has a basis of ‘Kronecker delta functions’: the Kronecker delta function \delta_i vanishes everywhere except at one point i \in X, where it equals 1. Using this basis, we can write any operator on L^1(X) as a matrix.

As a warmup, let’s see what it means for an operator

U: L^1(X) \to L^1(X)

to be stochastic in this case. We’ll take the conditions

\int U \psi = \int \psi

and

\psi \ge 0 \quad \implies \quad U \psi \ge 0

and rewrite them using matrices. For both, it’s enough to consider the case where \psi is a Kronecker delta, say \delta_j.

In these terms, the first condition says

\sum_{i \in X} U_{i j} = 1

for each column j. The second says

U_{i j} \ge 0

for all i, j. So, in this case, a stochastic operator is just a square matrix where each column sums to 1 and all the entries are nonnegative. (Such matrices are often called left stochastic.)
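In code, these two conditions are easy to check. Here's a minimal Python sketch, using a 3×3 matrix I made up for illustration:

```python
def is_stochastic(U, tol=1e-12):
    # left stochastic: every entry nonnegative, every column sums to 1
    n = len(U)
    columns_sum_to_one = all(abs(sum(U[i][j] for i in range(n)) - 1) < tol
                             for j in range(n))
    entries_nonnegative = all(U[i][j] >= 0 for i in range(n) for j in range(n))
    return columns_sum_to_one and entries_nonnegative

U = [[0.9, 0.2, 0.0],     # a made-up 3-state example
     [0.1, 0.5, 0.3],
     [0.0, 0.3, 0.7]]

# applying a stochastic operator to a probability distribution
# gives another probability distribution
psi = [0.5, 0.25, 0.25]
U_psi = [sum(U[i][j] * psi[j] for j in range(3)) for i in range(3)]
```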

Next, let’s see what we need for an operator H to have the property that \exp(t H) is stochastic for all t \ge 0. It’s enough to assume t is very small, which lets us use the approximation

\exp(t H) = 1 + t H + \cdots

and work to first order in t. Saying that each column of this matrix sums to 1 then amounts to

\sum_{i \in X} \left( \delta_{i j} + t H_{i j} + \cdots \right)  = 1

which requires

\sum_{i \in X} H_{i j} = 0

Saying that each entry is nonnegative amounts to

\delta_{i j} + t H_{i j} + \cdots  \ge 0

When i = j this will be automatic when t is small enough, so the meat of this condition is

H_{i j} \ge 0 \; \mathrm{if} \; i \ne j

So, let’s say H is an infinitesimal stochastic matrix if its columns sum to zero and its off-diagonal entries are nonnegative.
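We can test this claim numerically: take a little matrix whose columns sum to zero and whose off-diagonal entries are nonnegative, exponentiate it by brute force, and check that the result is stochastic. A Python sketch, where the 2×2 matrix and the time t = 0.5 are just illustrative choices:

```python
def mat_exp(H, t, terms=60):
    # exp(t H) for a small square matrix, summing the Taylor series
    n = len(H)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [[float(i == j) for j in range(n)] for i in range(n)]
    for k in range(1, terms):
        term = [[sum(term[i][m] * H[m][j] for m in range(n)) * t / k
                 for j in range(n)] for i in range(n)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

# infinitesimal stochastic: columns sum to zero, off-diagonal entries >= 0
H = [[-2.0,  1.0],
     [ 2.0, -1.0]]

U = mat_exp(H, t=0.5)
column_sums = [sum(U[i][j] for i in range(2)) for j in range(2)]  # each should be 1
```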

I don’t love this terminology: do you know a better one? There should be some standard term. People here say they’ve seen the term ‘stochastic Hamiltonian’. The idea behind my term is that any infinitesimal stochastic operator should be the infinitesimal generator of a stochastic process.

In other words, when we get the details straightened out, any 1-parameter family of stochastic operators

U(t) : L^1(X) \to L^1(X) \qquad   t \ge 0

obeying

U(0) = I

U(t) U(s) = U(t+s)

and continuity:

t_i \to t \quad \implies \quad U(t_i) \psi \to U(t)\psi

should be of the form

U(t) = \exp(t H)

for a unique ‘infinitesimal stochastic operator’ H.

When X is a finite set, this is true—and an infinitesimal stochastic operator is just a square matrix whose columns sum to zero and whose off-diagonal entries are nonnegative. But do you know a theorem characterizing infinitesimal stochastic operators for general measure spaces X? Someone must have worked it out.

Luckily, for our work on stochastic Petri nets, we only need to understand the case where X is a countable set and our integrals are really just sums. This should be almost like the case where X is a finite set—but we’ll need to take care that all our sums converge.

The moral

Now we can see why a Hamiltonian like a^\dagger is no good, while a^\dagger - 1 is good. (I’ll ignore the rate constant r since it’s irrelevant here.) The first one is not infinitesimal stochastic, while the second one is!

In this example, our set of states is the natural numbers:

X = \mathbb{N}

The probability distribution

\psi : \mathbb{N} \to \mathbb{C}

tells us the probability of having caught any specific number of fish.

The creation operator is not infinitesimal stochastic: in fact, it’s stochastic! Why? Well, when we apply the creation operator, what was the probability of having n fish now becomes the probability of having n+1 fish. So, the probabilities remain nonnegative, and their sum over all n is unchanged. Those two conditions are all we need for a stochastic operator.

Using our fancy abstract notation, these conditions say:

\int a^\dagger \psi = \int \psi

and

\psi \ge 0 \; \implies \; a^\dagger \psi \ge 0

So, precisely by virtue of being stochastic, the creation operator fails to be infinitesimal stochastic:

\int a^\dagger \psi \ne 0

Thus it’s a bad Hamiltonian for our stochastic Petri net.

On the other hand, a^\dagger - 1 is infinitesimal stochastic. Its off-diagonal entries are the same as those of a^\dagger, so they’re nonnegative. Moreover:

\int (a^\dagger - 1) \psi = 0

precisely because

\int a^\dagger \psi = \int \psi
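To tie everything together, here's a Python sketch that represents \psi as a truncated sequence of probabilities over n = 0, 1, 2, …, applies H = r(a^\dagger - 1) as a shift-and-subtract, solves the master equation by summing the exponential series, and recovers the Poisson distribution. The truncation at 60 fish and the values r = 2, t = 3 are my own choices; since H only moves probability toward higher n, the truncation doesn't affect the entries it keeps:

```python
import math

def H_apply(psi, r):
    # H = r (a_dagger - 1): (H psi)(n) = r * (psi(n-1) - psi(n)),
    # since a_dagger shifts the probability of n fish up to n+1 fish
    return [r * ((psi[n - 1] if n > 0 else 0.0) - psi[n]) for n in range(len(psi))]

def evolve(psi0, r, t, terms=100):
    # psi(t) = exp(t H) psi(0), computed by summing the Taylor series
    result = list(psi0)
    term = list(psi0)
    for k in range(1, terms):
        term = [x * t / k for x in H_apply(term, r)]
        result = [a + b for a, b in zip(result, term)]
    return result

N = 60                                # truncate the state space at N fish
psi0 = [1.0] + [0.0] * (N - 1)        # surely no fish at time zero
r, t = 2.0, 3.0
psi = evolve(psi0, r, t)

# should match the Poisson distribution (r t)^n / n! * e^(-r t)
poisson = [(r * t) ** n / math.factorial(n) * math.exp(-r * t) for n in range(N)]
```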

You may be thinking: all this fancy math just to understand a single stochastic Petri net, the simplest one of all!

But next time I’ll explain a general recipe which will let you write down the Hamiltonian for any stochastic Petri net. The lessons we’ve learned today will make this much easier. And pondering the analogy between probability theory and quantum theory will also be good for our bigger project of unifying the applications of network diagrams to dozens of different subjects.

42 Responses to Network Theory (Part 5)

  1. David Corfield says:

    \langle - , - \rangle : L^1(X) \to \mathbb{C}

    should be

    \langle - , - \rangle : L^2(X) \times L^2(X) \to \mathbb{C} ?

    And you have a ‘the the’ somewhere.

  2. John Baez says:

    Thanks! I’ll fix those!

    By the way, I was expecting you to say “we’re not just changing from L^2 to L^1 when we go from quantum theory to probability theory: we’re also changing the rig of scalars from \mathbb{C} to the nonnegative reals”. That’s why I muttered:

    In fact probability distributions are not just real but nonnegative. We need to say this somewhere… but we can, if we like, start by saying they’re complex-valued functions, but then whisper that they must in fact be nonnegative (and thus real). It’s not the most elegant solution, but that’s what I’ll do for now.

    I just didn’t feel like getting into ‘rigs’ in this supposedly more ‘practical’ article.

  3. David Corfield says:

    Yep. And then there’s your observation that the normalization of distributions can be captured with Durov’s generalized rings.

    Oh, there’s another L^1 for L^2 change needed after “This is described using an isometry…”

    • John Baez says:

      Thanks, fixed.

      Yes, it would be very nice to polish all the math to a glossy sheen using category theory. But in my new persona, I want to focus more on insights that are lurking in various different kinds of applied work, using math as needed to take insights from specific subjects and apply them more generally, without sailing so high into the abstractosphere that only mathematicians and particle physicists pay attention.

      This is part of the ‘green mathematics’ dream: a kind of math where you don’t need to build an enormous particle accelerator to get good ideas from the physical world.

      For example, the literature on chemical reaction networks is full of astounding ideas that deserve to be stated in a way non-chemists can appreciate, because they’re not particularly about chemistry. That’s one direction I want to head soon.

      • David Corfield says:

        But you surely hold that category theory isn’t just for buffing up some mathematics to make it shine, but also for locating and analogizing insights from specific subjects. Of course the abstract methods used in the process can be kept hidden from sight.

      • John Baez says:

        David wrote:

        But you surely hold that category theory isn’t just for buffing up some mathematics to make it shine, but also for locating and analogizing insights from specific subjects.

Right, my remark came across as overly dismissive, but that’s a sign of internal struggle: I’m fighting against my desire to keep making the analogy between quantum theories and probabilistic theories nicer and nicer using more and more math. Making the analogies nicer is not just cosmetic: it’s important for digging deeper into what’s really going on, which after all is very mysterious. If the difference between probability theory and quantum theory were really just one between L^1 and L^2, you’d have to see what L^3 could do for you… but I really doubt that’s the way to go (though somebody should try). We could dig a bit deeper by thinking about commutative versus noncommutative C*-algebras, or perhaps matrix mechanics over different rigs…

        However, all this would take me in a different direction than I’m trying to go. I want to work through lots of applications of ‘diagrams’ and ‘networks’ in different fields—population biology, systems ecology, chemistry and biochemistry, genomics, queuing theory, statistical inference and machine learning, and so on—and fit them together into a nice framework that’ll help people in any one field profit from the work in other fields. To do this I need help from people in all these different subjects. This makes me keenly aware that the benefits of fancy mathematics come with a painful cost: they limit the number of people who can join the conversation!

        It’s very sad how the people who hold the most powerful tools in mathematics have become so isolated from the people who need them. Sad both for the people who need them, but also for those holding the tools.

        Of course the abstract methods used in the process can be kept hidden from sight.

        I’ve got category theory in my bones by now, but it’s not as if I have notebooks full of category-theoretic reflections on diagrams and networks that I’m hiding from my readers here. Right now I’m mainly struggling to explain the relation between stochastic Petri nets, the chemical master equation and the Feynman diagram approach to quantum field theory.

  4. DavidTweed says:

Something’s a bit fishy here: are you sure you’re not describing a Lapin process rather than a Poisson process? (Sorry, incredibly poor joke.)

A question that leaps up, from someone who knows little about physics: does the kind of quantum theory that’s being used here model any kind of non-static equilibrium behaviours in physics (so that this modelling could eventually be used for general process modelling)? (So far nothing you’ve written seems to be assuming an equilibrium state.)

    • John Baez says:

      David wrote:

… does the kind of quantum theory that’s being used here model any kind of non-static equilibrium behaviours in physics (so that this modelling could eventually be used for general process modelling)?

      Yes!

      More precisely: I’m setting up an analogy between probability theory and quantum theory, which then specializes to an analogy between stochastic Petri nets and the theory of Feynman diagrams.

      Stochastic Petri nets are a very general tool for describing stochastic processes involving bunches of things of various kinds that can turn into other bunches of things. Feynman diagrams are a very general tool for describing quantum processes involving bunches of things of various kinds that turn into other bunches of things. The only really big differences are the differences I’ve described here: L^1 versus L^2, stochastic operators versus unitary operators, etcetera. But these differences are so systematic that it’s easy to take wads of stuff from one theory and translate it over to the other—as I’ll soon explain.

      In neither case is there any restriction to equilibrium states. Indeed, Feynman diagrams are most commonly used for computing the amplitude that two elementary particles will turn into something else when you smash them into each other in a particle accelerator—that’s about as far from equilibrium as you can get!

• What if one tries to use square roots of measures? Wouldn’t that make it much closer to QM?

        • John Baez says:

You mean use square roots of finite measures instead of L^2 functions as a formulation of quantum mechanics? Yes, that would be fine. But then I’d want to use a space of finite measures instead of L^1 with respect to a fixed measure… to keep the analogy nice. And I’m trying to avoid polishing the math to such a glossy shine that nobody but mathematicians can stand the glare. I’m mainly trying to generalize some quantum mechanics techniques (annihilation and creation operators, Feynman diagrams) to stochastic Petri nets, and here the measure space in question is always a countable set with counting measure. So, my more general remarks about measure spaces are already looking ahead a bit too far…

    • John Baez says:

      David wrote:

… are you sure you’re not describing a Lapin process rather than a Poisson process? (Sorry, incredibly poor joke.)

      It’s got the seeds of a good joke in it. If Jacob and I ever write up these blog entries as more formal ‘lecture notes’, we’ll change ‘rabbit’ to ‘fish’ in the main example here, so that catching fish will be a Poisson process. As long as we don’t draw undue attention to that, it should be funny.

      (I think puns are fun when they seem to arise effortlessly, but provoke groans when they require an obvious stretch. If so, the trick is to put a lot of work into the setup.)

  5. Tim van Beek says:

    John wrote:

    Borrowing a trick from quantum theory, we can summarize all these probabilities in a single power series…

I’d like to mention that this is a well-known trick – the introduction of a generating function – and that there is a book called “generatingfunctionology” by Herbert Wilf, free for download from the author’s webpage.

    • John Baez says:

      I love that stuff.

      I taught a year-long course on generating functions and their use in quantum theory, starting here. The key idea was to think of generating functions as decategorified versions of ‘stuff types’. You could say what I’m doing now is warming up to apply some of the same ideas to classical probability theory.

  6. But do you know a theorem characterizing infinitesimal stochastic operators for general measure spaces X? Someone must have worked it out.

In One-Parameter Semigroups of Positive Operators, Springer-Verlag, Berlin 1986, ISBN 978-3-540-16454-8, the authors investigate C_0-semigroups on Banach lattices (e.g. L^1(X)). Things are more complicated than in the Hilbert space situation, and to get the desired characterization one most likely has to apply the standard Hille-Yosida generation theorem plus some condition on the resolvent, depending on what exactly you mean by “stochastic”.

  7. Florifulgurator says:

    By chance today is one of the rare happy days when I made it to the math library…

    But do you know a theorem characterizing infinitesimal stochastic operators for general measure spaces X? Someone must have worked it out.

    For (un)bounded positive symmetric operators H on L^2 there are the 2 Beurling-Deny criteria, telling in terms of the quadratic form \langle H \cdot, \cdot\rangle when e^{-tH} is positivity preserving resp. L^\infty contractive. See M. Reed, B. Simon, Methods of Modern Math. Physics Vol. IV, Thm. XIII.50 resp. XIII.51 (p. 209-211). Both criteria can be amalgamated into one characterising L^2 generators of Markovian semigroups. This criterion is the cornerstone of the theory of “Dirichlet forms”, kind of generalizing Brownian motion on Riemannian manifolds to less smooth spaces. See M. Fukushima, Y. Oshima, M. Takeda, Dirichlet Forms and Symmetric Markov Processes, pp. 3-5.

    This stuff is much related to Kato’s inequality (cf. Reed, Simon, Vol. II, Thm. X.27, p.183 and following pages for some real Schrödinger operator stuff). Also known as “diamagnetic inequality” (B. Simon) for magnetic Schrödinger stuff, or “semigroup domination” for estimations of Hodge heat flow in differential geometry.

    • John Baez says:

      Thanks a million, Florifulgurator!

I’m so glad you pointed out that Dirichlet forms are relevant here! I should have realized that. I first learned about Dirichlet forms when working on an earlier phase of my ‘networks’ project: electrical circuits made of resistors. But there I was mainly interested in Dirichlet forms on finite-dimensional vector spaces, so I had them filed in a different mental compartment than this ‘semigroups on L^1(X)’ business.

      This could be a hint of something big: two different aspects of my networks project (electrical circuits made of resistors, and stochastic Petri nets) turn out to require the same math: Dirichlet forms.

      Wow, now I think I see how they’re related! Stupidly, that had never occurred to me! This is exactly the sort of unification I’m looking for in this game. It keeps happening…

      Here’s a bit about Dirichlet forms, from week297:

      Last week I discussed electrical circuits made of (linear) resistors and "grounds" – places where wires touch an object whose electrostatic potential is zero. I want to fill in some missing pieces today.

      Suppose we have such a circuit with n wires dangling out of it. I’ve been calling these "inputs" and "outputs" – but today I don’t care which ones are inputs and which ones are outputs, so let’s call them all "terminals".

      We saw last time that our circuit gives a function

      Q: \mathbb{R}^n \to \mathbb{R}

      This tells you how much power the circuit uses as a function of the electrostatic potential at each terminal.

      It’s pretty easy to see that Q is a "quadratic form", meaning that

      Q(\phi) = \sum_{i,j} Q_{ij} \phi_i \phi_j

      for some matrix Q_{ij}, which we can assume is symmetric. And it’s easy to see that Q is "nonnegative", meaning

      Q(\phi) \ge 0

      I wildly guessed that every nonnegative quadratic form comes from a circuit made of resistors and grounds. Since then I’ve learned a few things, thanks to Ben Tilly and Tom Ellis.

      For starters, which nonnegative quadratic forms do we get from circuits built only from resistors? We certainly don’t get all of them. For example, if n = 2, every circuit built from just resistors has

      Q(\phi) = c (\phi_1 - \phi_2)^2

      for some nonnegative number c. So, we’ll never get this quadratic form:

      Q(\phi) = (\phi_1 + \phi_2)^2

      even though it’s nonnegative. In general, for any n, we can get a lot of quadratic forms just by connecting each terminal to each other with a resistor. Such circuits give precisely these quadratic forms:

      Q(\phi) = \sum_{i,j} c_{ij} (\phi_i - \phi_j)^2

      where the numbers c_{ij} are nonnegative. We can assume without loss of generality that c_{ii} = 0. The numbers c_{ij} are reciprocals of resistances, so we’re allowing resistors with infinite resistance, but not with zero resistance.

      It turns out that quadratic forms of the above type are famous: they’re called "Dirichlet forms". People have characterized them in lots of ways. Here’s one: they’re the nonnegative quadratic forms that vanish when \phi is constant:

      \forall i, j \; \; \phi_i = \phi_j \quad \implies Q(\phi) = 0

      and also satisfy the "Markov property":

      Q(\phi) \ge Q(\psi)

      when \psi_i is the minimum of \phi_i and 1. This characterization is Proposition 1.7 here:

      5) Christophe Sabot, Existence and uniqueness of diffusions on finitely ramified self-similar fractals, Section 1: Dirichlet forms on finite sets and electrical networks, Annales Scientifiques de l’École Normale Supérieure, Sér. 4, 30 (1997), 605-673. Available at http://www.numdam.org/numdam-bin/item?id=ASENS_1997_4_30_5_605_0

      Sabot doesn’t prove this result, which he considers "well known". Instead, he points us to this book, which is not only fun to read, but also free:

      6) P. G. Doyle and J. L. Snell, Random Walks and Electrical Circuits, Mathematical Association of America, 1984. Also available at http://www.math.dartmouth.edu/~doyle/

      You may wonder what random walks and diffusions on fractals have to do with electrical circuits! The idea is that we can take a limit of electrical circuits that get more and more complicated and get a fractal. The electrical conductivity of this fractal can be reinterpreted as heat conductivity, using the analogies described back in "week289". And then we can study the heat equation on this fractal. This equation says how heat diffuses with the passage of time.

      But there’s nothing special about heat. We can use the heat equation to describe the diffusion of just about anything. We could even use it to describe the diffusion of tiny drunken men who stumble around aimlessly on our fractal! And that’s where "random walks" come in.

      It turns out that in situations like this, the heat equation is completely determined by a quadratic form called a "Dirichlet form". But it’s not a quadratic form on \mathbb{R}^n anymore: it’s a quadratic form on a space of real-valued functions on our fractal.

      In fact Dirichlet forms were first studied, not for finite sets or fractals, but for nice regions in Euclidean space – the sort of regions you’d normally consider when studying the heat equation. In this case the Dirichlet form arises from the Laplacian:

      Q(\phi) = - \int \phi \nabla^2 \phi

      where \phi is a function on our region. The moral is that we should think of any Dirichlet form as a generalized Laplacian!

      There’s a huge literature on Dirichlet forms. Most of it focuses on analytical subtleties that don’t matter for our pathetically simple examples. For a little taste, try this review of two books on Dirichlet forms:

      7) Review by Daniel Stroock, Bull. Amer. Math. Soc. 33 (1996) 87-92. Also available at
      http://www.ams.org/journals/bull/1996-33-01/S0273-0979-96-00617-9/

      Among other things, he mentions a simpler characterization of Dirichlet forms. We’re only considering quadratic forms

      Q: \mathbb{R}^n \to \mathbb{R}

      and it turns out such a form is Dirichlet iff

      Q(\phi) \ge Q(\psi)

      whenever

      |\phi_i - \phi_j| \ge |\psi_i - \psi_j|

      for all i,j. It’s a fun exercise to see that this is equivalent to our previous characterization. And there’s a simple physical idea behind this one: a circuit made of resistors will use more power when the potentials at different terminals differ by bigger amounts!
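      Both characterizations are easy to check numerically. Here’s a short sketch in Python (my own illustration, not from the text; I take the convention Q(\phi) = \sum_{i<j} c_{ij}(\phi_i - \phi_j)^2, and the conductances are random):

```python
import random

def Q(c, phi):
    """Dirichlet form Q(phi) = sum over pairs of c[i][j] * (phi[i] - phi[j])**2."""
    n = len(phi)
    return sum(c[i][j] * (phi[i] - phi[j]) ** 2
               for i in range(n) for j in range(i + 1, n))

random.seed(1)
n = 5
c = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        c[i][j] = c[j][i] = random.random()   # nonnegative 'conductances', c_ii = 0

assert abs(Q(c, [1.7] * n)) < 1e-12           # Q vanishes on constant functions

for _ in range(200):
    phi = [random.uniform(-2.0, 2.0) for _ in range(n)]
    psi = [min(x, 1.0) for x in phi]          # the cutoff used in the Markov property
    # cutting off at 1 shrinks every difference: |psi_i - psi_j| <= |phi_i - phi_j|,
    # so Stroock's contraction characterization implies the Markov property
    assert all(abs(psi[i] - psi[j]) <= abs(phi[i] - phi[j]) + 1e-12
               for i in range(n) for j in range(n))
    assert Q(c, psi) <= Q(c, phi) + 1e-12
```

      The comment in the loop is the whole point of the "fun exercise": taking the minimum with 1 never increases any difference |\phi_i - \phi_j|, so the second characterization immediately gives the first.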

      Okay… I’m digressing a bit. Let’s get back on track.

      • Florifulgurator says:

        Stroock’s re/overview is superb. (Back then I was TA/RA in Germany. I handed it out as introductory reading for my seminar on Dirichlet forms.)

      • John Baez says:

        You mentioned a book Dirichlet Forms and Symmetric Markov Processes, and that reminded me: many of the Markov processes I’m interested in these days (e.g. those described by stochastic Petri nets) are nonsymmetric. Is there a good generalization of Dirichlet forms to handle those? Of course when working on a manifold we can consider Brownian motion plus drift, with the drift given by a vector field… but what if we’re just on an arbitrary measure space?

        • Florifulgurator says:

          Oh yeah, I forgot about the symmetry question while happily browsing the old books again after a decade… – what is a “symmetric Petri net” … ?

          I’ve never cared about the nonsymmetric case. But others did:

          • Ma Z.M. and Röckner M.: Introduction to the Theory of (Non-Symmetric) Dirichlet Forms, Springer-Verlag, Berlin-Heidelberg-New York, 1992.

          The field of generalized Dirichlet forms still seems quite active. Next time I’m at the library I’ll start to catch up a little, and first read this (or what Google Books hides of it): S. Albeverio, Theory of Dirichlet Forms and Applications, in Lecture Notes in Mathematics 1816 (École d’été de probabilités de Saint-Flour XXX, 2000). It has a huge list of references on all kinds of variations of this theme.

        • John Baez says:

          Florifulgurator wrote:

          Oh yeah, I forgot about the symmetry question while happily browsing the old books again after a decade… – what is a “symmetric Petri net” … ?

          Nice question! I’m not sure it’s a standard term, but it has an obvious meaning:

          A symmetric Petri net is one where every transition is reversible.

          A bit more precisely: for each transition with a given list of inputs and a given list of outputs, there’s a transition with those inputs as its outputs, and those outputs as its inputs. We call this other transition the reverse of the original transition.

          More precisely still: a Petri net consists of a set S of states, a set T of transitions, and input and output functions

          i: S \times T \to \mathbb{N}

          o: S \times T \to \mathbb{N}

          saying for each transition how many times each state appears as input or output. Every Petri net has a reversed Petri net where we switch the functions i and o. A Petri net is symmetric if it’s isomorphic to its reversed version. Or, much better, if it’s equipped with an isomorphism to its reversed version. Then each transition has another transition called its reverse, and we can define what it means for a stochastic Petri net to be symmetric: the rate constant of each transition must equal that of its reverse.

          Any stochastic Petri net gives rise to a Markov process. If the stochastic Petri net is symmetric, so is its Markov process.
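          The definition is concrete enough to test mechanically. A minimal sketch in Python (my own toy encoding, not standard: a transition is a pair of input/output multisets, and I ignore the rate constants a stochastic version would also compare):

```python
from collections import Counter

# A transition is a pair (inputs, outputs), each a Counter giving how many
# times each state appears; a Petri net is a list of transitions.
def reverse_net(net):
    """The reversed Petri net: swap the input and output of every transition."""
    return [(out, inp) for (inp, out) in net]

def is_symmetric(net):
    """Symmetric iff the net equals its reversed version, up to reordering."""
    key = lambda t: (tuple(sorted(t[0].items())), tuple(sorted(t[1].items())))
    return sorted(key(t) for t in net) == sorted(key(t) for t in reverse_net(net))

# A + B <-> C with both directions present: symmetric
net1 = [(Counter('AB'), Counter('C')), (Counter('C'), Counter('AB'))]
# only A + B -> C: not symmetric
net2 = [(Counter('AB'), Counter('C'))]
assert is_symmetric(net1)
assert not is_symmetric(net2)
```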

          I’ve never cared about the nonsymmetric case. But others did:

          • Ma Z.M. and Röckner M.: Introduction to the Theory of (Non-Symmetric) Dirichlet Forms, Springer-Verlag, Berlin-Heidelberg-New York, 1992.

          Great! The wonderful thing about math is that there are tons of people working on tons of obscure-sounding subjects, so that when you get interested in something you usually find an entire book has been written about it already! Books that seem completely boring suddenly become fascinating when you need to know what’s in them.

  8. DavidTweed says:

    I’ve been trying to unpick some of the notation (to figure out something that’s probably obvious if you’ve studied how this setup works in quantum physics). Here’s my understanding, in case it is in fact wrong. First, let’s change the “formal power series variable” used in the first section from x to y, just to be a bit clearer. The statement

    \psi \ge 0

    is at this point viewing \psi(x) as a function of values x in the space X (not a function of time in this particular bit) into the complex numbers, which must be non-negative for all arguments. Now I think that in this case the space is the discrete space X = \{y^n \mid n=0, \dots\}, i.e., powers of the formal variable. So while this is a statement about a function, it’s effectively a statement about individual coefficients in the formal power series, saying that probabilities are non-negative. (Combined with the integral condition, this also gives that probabilities are at most 1.)

    If I’m understanding correctly, \psi(x) essentially means coefficientOf(x,\psi), whilst \psi(t) essentially denotes replacing the formal variable y with t y', where y' is a new formal variable. (I’m used to generating functions and pgf’s, but analytic time varying pgf’s are new country to me.)

    • John Baez says:

      I see what’s confusing you. Your guesses look right. There’s a general situation and a particular situation here.

      • In the general situation, we have a set of states X. The probability distribution \psi is a function assigning to each state x \in X a probability \psi(x). Since probabilities are never negative, we have \psi(x) \ge 0 for all x \in X, or

      \psi \ge 0

      for short.

      • In the particular situation, we’re talking about rabbits. Here X is the set of natural numbers:

      X = \{ 0,1,2,3,\dots \}

      and the natural number n means ‘the state of having caught n rabbits’. In this particular situation, I wrote \psi_n to mean the probability of having caught n rabbits. And in this particular situation, I defined a power series in some formal variable x—having nothing to do with the x mentioned above!—whose coefficients are the probabilities \psi_n. This is called a ‘generating function’. Just to confuse you further, I also called this generating function \psi:

      \psi(x) = \sum_{n = 0}^\infty \psi_n x^n

      There is of course no need for this function to be nonnegative.

      So, my use of \psi and x was quite different in the general situation and the particular situation. I’m so used to rapidly switching between them that I never noticed how confusing it could be! I’ll fix this now.

      … but analytic time varying pgf’s are new country to me.

      I’ll guess that by ‘pgf’ you mean a ‘probability-generating function’, namely this thing:

      \psi(x) = \sum_{n = 0}^\infty \psi_n x^n

      Yes, this thing and everything else I’m calling \psi will in general be a function of time as well.

      Believe it or not, I’d never heard of ‘probability-generating functions’ before, though I’ve spent a lot of time on generating functions. Those who don’t know history are condemned to repeat it. So, thanks for pointing out that buzzword!

    • John Baez says:

      Okay, David, I’ve tried to make the article clearer. It was shocking to realize how incredibly confusing it would be to someone who couldn’t read my mind! Now:

      • In the general situation we have a set of states X and a probability distribution \psi which assigns to each state x \in X a probability \psi(x).

      • In the particular situation of rabbits we have X = \mathbb{N}, and I write \psi(n) for the probability of having n rabbits. I’m switching to the variable n instead of x here for purely traditional rather than conceptual reasons. I see the authors of the Wikipedia article on probability-generating functions avoid this sentimental attitude: they use x to stand for a natural number! Years of conditioning have made this unpalatable to me.

      More importantly, starting from \psi in this particular situation I define a probability-generating function

      \Psi = \sum_{n = 0}^\infty \psi(n) z^n

      using a new letter z to distinguish the variable here from any other variable, and a capital \Psi to distinguish the generating function from the probability distribution \psi.

      Thanks!

      • DavidTweed says:

        Thanks for clarifying things John. The article was actually already a very clear overview, it just required a little bit of effort to figure out the fine details. It’s a lot clearer in the fine details now.

      • DavidTweed says:

        I probably ought to point out for other people that part of my confusion was that in probability one often uses \Psi by looking at its various derivatives in z (which are related to the distribution’s moments), so it is very much being viewed as a function, not just a neat way of writing an infinite vector. That’s why I initially thought that interpretation might be the function being considered. (I imagine this may arise later in the story.)

        • Nico Poppelier says:

          I’m trying to understand this new article, but get stuck at the start, where you define the generating function. What is z? And shouldn’t you write \Psi(z, t) = … in the above, instead of \Psi(t) = ...?

        • John Baez says:

          David wrote:

          (I imagine this may arise later in the story.)

          Indeed! Derivatives of the generating function

          \Psi(z) = \sum_{n = 0}^\infty \psi(n) z^n

          will soon become very important in the story here. In quantum field theory, the operator of differentiation is called the annihilation operator:

          a \Psi = \frac{d}{dz} \Psi

          and it’s used hand-in-hand with the creation operator we’ve already seen:

          a^\dagger \Psi = z \Psi

          We’ll see that they play similar roles in the study of stochastic Petri nets: the creation operator describes how something appears as an output of some transition, while the annihilation operator can be used to describe how something gets used as an input.

          We can compute moments using the number operator

          N = a^\dagger a.

          For example, if the probability of having n things is \psi(n), the mean value of the number of things is

          \sum_{n=0}^\infty n \psi(n)

          But

          \begin{array}{ccl} N \Psi &=& z \frac{d}{d z} \sum_{n = 0}^\infty \psi(n) z^n \\ \\ &=& \sum_{n = 0}^\infty n \psi(n) z^n \end{array}

          so the mean value of the number of things is obtained by evaluating this at z = 1.

          Similarly, the kth moment of the number of things is

          \sum_{n=0}^\infty n^k \psi(n)

          which is N^k \Psi evaluated at 1, since

          \begin{array}{ccl} N^k \Psi &=& (z \frac{d}{d z})^k \sum_{n = 0}^\infty \psi(n) z^n \\  \\ &=& \sum_{n = 0}^\infty n^k \psi(n) z^n \end{array}

          In general, this ‘evaluation at 1’ business is just a way of computing the sum of the coefficients of a power series:

          \Psi = \sum_{n=0}^\infty \psi(n) z^n  \; \implies \; \Psi(1) = \sum_{n=0}^\infty \psi(n)

          Indeed, \Psi(1) is the same as what I was calling

          \int \psi

          in my blog article: in this situation the ‘integral’ of \psi means the sum of \psi(n) over all the possibilities n = 0,1,2,\dots.

          So, in very terse notation which I haven’t quite defined yet but perhaps you can understand anyway, the kth moment of the number of things can be written as

          \int N^k \psi

          and this will come in handy.
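          All of this is easy to try on a small example. Here’s a sketch in Python (my own; the distribution \psi is an arbitrary illustrative choice), representing \Psi by its coefficient list, so that N = z \frac{d}{dz} just multiplies the nth coefficient by n and ‘evaluation at 1’ sums the coefficients:

```python
from fractions import Fraction

# coefficients of Psi = sum_n psi(n) z^n, for an illustrative distribution
psi = [Fraction(1, 5), Fraction(1, 2), Fraction(3, 10)]

def number_op(coeffs):
    """N = z d/dz: the n-th coefficient of N Psi is n * psi(n)."""
    return [n * c for n, c in enumerate(coeffs)]

def at_one(coeffs):
    """Evaluate the power series at z = 1: just sum the coefficients."""
    return sum(coeffs)

assert at_one(psi) == 1                          # total probability is 1
mean = at_one(number_op(psi))                    # first moment: sum_n n psi(n)
second = at_one(number_op(number_op(psi)))       # second moment: sum_n n^2 psi(n)
assert mean == Fraction(11, 10)
assert second == Fraction(17, 10)
```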

          To me all these tricks seem like mutant versions of familiar tricks in the quantum theory of the harmonic oscillator. To you they may seem like oddly worded restatements of familiar probability theory tricks!

        • John Baez says:

          Nico wrote:

          I’m trying to understand this new article, but get stuck at the start, where you define the generating function. What is z?

          It’s a ‘formal variable’, or ‘indeterminate’:

          \Psi(t) = \sum_{n = 0}^\infty \psi(n,t) z^n

          is a time-dependent formal power series.

          You can think of z as a real number if that helps, but that’s not quite right. The idea of a formal power series is that we don’t care if it converges; it’s just an abstract expression of the form

          a_0 + a_1 z + a_2 z^2 + \cdots

          where the coefficients a_i are (in this article) real numbers. So, for example,

          \sum_{n=0}^\infty 2^{2^n} z^n

          is a perfectly acceptable formal power series even though it doesn’t converge when we try to set z to some real number (other than 0).

          If you haven’t seen formal power series, they may seem strange, but they’re a very common and useful trick. You can think of a formal power series

          a_0 + a_1 z + a_2 z^2 + \cdots

          as nothing but a cute way of packaging the infinite sequence a_0, a_1, a_2, \dots. Any sequence gives a formal power series, called its generating function. Herbert Wilf wrote:

          A generating function is a clothesline on which we hang up a sequence of numbers for display.

          But generating functions are extremely useful, since operations like addition, multiplication, and differentiation of formal power series are efficient ways to manipulate sequences.
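          Those operations really are just manipulations of sequences. A minimal sketch (my own illustration), with a formal power series stored as a truncated coefficient list so convergence never comes up:

```python
# Formal power series as coefficient lists, truncated at a fixed order.
ORDER = 6

def ps_add(a, b):
    """Add series coefficientwise."""
    return [x + y for x, y in zip(a, b)]

def ps_mul(a, b):
    """Cauchy product: the coefficient of z^n is sum_k a_k * b_{n-k}."""
    return [sum(a[k] * b[n - k] for k in range(n + 1)) for n in range(ORDER)]

def ps_diff(a):
    """d/dz shifts the sequence: the coefficient of z^n becomes (n+1) a_{n+1}."""
    return [(n + 1) * a[n + 1] for n in range(ORDER - 1)]

ones = [1] * ORDER                                  # 1/(1-z) = 1 + z + z^2 + ...
assert ps_add(ones, ones) == [2, 2, 2, 2, 2, 2]
assert ps_mul(ones, ones) == [1, 2, 3, 4, 5, 6]     # 1/(1-z)^2 = sum (n+1) z^n
assert ps_diff(ones) == [1, 2, 3, 4, 5]             # and d/dz agrees
```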

          In an earlier comment, Tim van Beek pointed out Wilf’s book, which is free online:

          • Herbert Wilf, Generatingfunctionology.

          It’s a great book!

          And shouldn’t you write \Psi(z, t) = … in the above, instead of \Psi(t) = ...?

          Maybe, but not necessarily. Note that

          \sum_{n = 0}^\infty \psi(n,t) z^n

          isn’t a function of z. It’s just a formal power series in the variable z, one which depends on the time t. It’s often handy to pretend that formal power series are functions, and if I were in that mood I’d write

          \Psi(z,t) = \sum_{n = 0}^\infty \psi(n,t) z^n

          But I preferred to honestly admit that \Psi is a formal-power-series-valued function of time, and write

          \Psi(t) = \sum_{n = 0}^\infty \psi(n,t) z^n

          For each time t, \Psi(t) is a formal power series in the formal variable z.

          Actually I agonized over these issues quite a bit, but I never considered writing \Psi(z,t). I was agonizing over whether to write \Psi(t) or simply \Psi. I like short notations, and physicists (and some other people) often don’t explicitly write the variables that a function depends on. But I decided to write \Psi(t) because some of my remarks concerned a time-dependent formal power series while other more general remarks, like the definition of creation operator:

          a^\dagger \Psi = z \Psi

          apply equally well to time-independent and time-dependent ones—so there I used \Psi.

  9. John F says:

    Sorry to be more dense than usual. Can’t the probabilities of the quantum models simply obey the same equations as the probabilistic model? Then the difference is that the quantum phases are important, and have interesting equations, while the probabilistic phases are boring.

    • John Baez says:

      John F wrote:

      Can’t the probabilities of the quantum models simply obey the same equations as the probabilistic model?

      That seems tough. In a probabilistic model time evolution is linear in the probabilities. In a quantum model time evolution is linear in the amplitudes, and we square the absolute value of an amplitude to get a probability.

      So, the challenge is to reconcile our requirement that time evolution be linear in both probabilities and amplitudes with the further requirement that probabilities depend nonlinearly on amplitudes!

      More mathematically, what I’m saying is this. In a quantum model of the sort I’m talking about, time evolution of amplitudes is described by linear maps

      U(t): L^2(X) \to L^2(X)

      In a probabilistic model, time evolution of probability distributions is described by linear maps

      V(t): L^1(X) \to L^1(X)

      Taking the square of the absolute values gives a map from L^2 to L^1 which sends amplitudes to probabilities. Let’s call it F:

      \begin{array}{ccl} F : L^2(X) &\to& L^1(X) \\ \psi &\mapsto& |\psi|^2 \end{array}

      However, it’s hard to achieve

      V(t) F = F U(t)

      except for some very special cases. The only case I know is when both U(t) and V(t) come from a 1-parameter family of measure-preserving maps

      \varphi_t : X \to X

      from our measure space to itself. It works like this:

      (U(t)\psi)(x) = \psi(\varphi_{-t}(x))

      (V(t)\psi)(x) = \psi(\varphi_{-t}(x))

      These maps \varphi_t describe a deterministic time evolution of states in X. The probability functions and wavefunctions just get dragged along by this deterministic time evolution. So, I know how to reconcile probabilistic and quantum models in the way you seem to be wanting… but only in the rather dull case where the dynamics is really deterministic!

      This trick, with X taken to be a classical ‘phase space’, can be used to make classical mechanics look superficially like quantum mechanics. It’s called ‘Koopmanism’ and it’s used in ergodic theory.
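      In the deterministic case the intertwining V(t) F = F U(t) can be seen directly on a toy example (my own illustration): take X to be a 5-point set and let \varphi_t be a cyclic shift, so both U and V just drag functions along.

```python
# A 5-point set X with deterministic dynamics: one step of a cyclic shift.
n = 5
shift = lambda i: (i - 1) % n        # phi_{-t}(i) for a single time step

def drag(v):
    """(U psi)(x) = psi(phi_{-t}(x)); the same formula defines V on probabilities."""
    return [v[shift(i)] for i in range(n)]

def F(v):
    """Amplitudes to probabilities: take |psi|^2 pointwise."""
    return [abs(x) ** 2 for x in v]

psi = [0.1 + 0.2j, 0.3, -0.4j, 0.5 + 0.1j, 0.2]   # some complex amplitudes
# V F = F U: both sides just permute the same list of numbers, so equality is exact
assert drag(F(psi)) == F(drag(psi))
```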

  10. […] Last time I told you what happens when we stand in a river and catch fish as they randomly swim past. Let me remind you of how that works. But today let’s use rabbits. […]

  11. Giampiero Campa says:

    I think you forgot the keyword “latex” in $a^\dagger$ in the text 3 formulas before the end of the article.

  12. […] We need to recall the analogy we began sketching in Part 5, and push it a bit further. The idea is that stochastic mechanics differs from quantum mechanics in two ways […]

  13. Arjun Jain says:

    Sir,

    Sorry if these questions are too elementary:

    1. What is that way to make any set X into a measure space? Also, why do you say that integrals are more general than sums?

    2. What is the meaning of conjugate linear?

    3. Is there any difference between ‘maps’ and ‘functions’?

    4. How do we know that L^2 is a vector space? We have |a+b|^2 \le |a|^2 + |b|^2 + 2|ab|; the first two terms are finite according to the definition of L^2, but how are we sure that the inner product will always be finite? Is there some other way to prove this?

    5. The net-fish process in the beginning — The corresponding rate equation will be d(Nfish)/dt = r, as the only species is fish and that does not appear as an input, so its power will be 0 — Right?

    • John Baez says:

      Arjun wrote:

      1. What is that way to make any set X into a measure space?

      Use counting measure.

      Also, why do you say that integrals are more general than sums?

      An integral with respect to counting measure is a sum. It sounds like you want to learn a bit about measure theory and Lebesgue integration, which is the theory of integration on general measure spaces.

      2. What is the meaning of conjugate linear?

      Like most math terms, this is defined in Wikipedia. When I want to learn the meaning of a math term, I type it into Google, and usually a Wikipedia article shows up near the very top of the choices.

      3. Is there any difference between ‘maps’ and ‘functions’?

      It depends on context. A ‘map’ is a function preserving whatever structure is on hand. For example if we talked about a map between topological spaces, we’d mean a continuous function. If we talked about a map between manifolds, we’d mean a smooth function. But a map between sets means a function.

      (To add to the confusion, in the French tradition ‘function’ means a function taking values in the real or complex numbers, while a general function is called a ‘map’.)

      4. How do we know that L^2 is a vector space? — |a+b|^2\le (|a|^2+|b|^2+2|ab|) — the first two terms are finite according to the definition of L^2, but how are we sure that the inner product will always be finite? Is there some other way to prove this?

      The Cauchy–Schwarz inequality holds for L^2 and it implies the triangle inequality, which proves L^2 is a vector space.

      5. The net-fish process in the beginning — The corresponding rate equation will be d(Nfish)/dt=r as the only species is fish and that does not appear as an input, so its power will be 0 — Right?

      Right. Earlier you were asking about transitions with no inputs. I pointed you to this example.

  14. Arjun Jain says:

    1. Why should the Us be linear operators?

    2. ” It seems easier to deal with this in the special case when integrals over X reduce to sums. So let’s suppose that happens… and let’s start by seeing what the first condition says in this case.

    In this case, L^1(X) has a basis of ‘Kronecker delta functions’…”

    – Can you help me understand this better?

    3. I did not understand the paragraphs before the moral paragraph:

    “…The idea behind my term is that any infinitesimal stochastic operator should be the infinitesimal generator of a stochastic process.

    In other words, when we get the details straightened out, any 1-parameter family of stochastic operators…

    …Someone must have worked it out.”

    4. The whole paragraph on Infinitesimal stochastic versus self-adjoint operators is not very clear to me, though I have read it 6-7 times.

    • John Baez says:

      1. Why should the U be linear operators?

      I’ll only talk about the case where U is stochastic, because I don’t want to explain why quantum mechanics is linear. I’ll just talk about why probability theory is linear.

      Suppose we’re playing a game like this. If you give me $1, I’ll give you $1 with probability 90%, and nothing with probability 10%. If you give me nothing, I will give you nothing with probability 90%, and $1 with probability 10%.

      Suppose you give me $1 with probability \psi_1 and nothing with probability \psi_2 = 1 - \psi_1. After you play this game, you will have $1 with probability \phi_1 and nothing with probability \phi_2.

      Calculate the vector \phi = (\phi_1, \phi_2) as a function of \psi = (\psi_1, \psi_2). In other words, write \phi = U(\psi) and tell me what the function U is.

      If you do this example, it will help you see why we want U to be linear.
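      (For readers following along, here is my reading of how the exercise comes out — a sketch, so don’t look if you want to work it yourself! The game acts on \psi = (\psi_1, \psi_2) as a stochastic matrix.)

```python
# The game as the stochastic matrix U = [[0.9, 0.1], [0.1, 0.9]] acting on
# the probability vector psi = (psi_1, psi_2).
def play(psi):
    psi1, psi2 = psi
    phi1 = 0.9 * psi1 + 0.1 * psi2   # P(you end up with $1)
    phi2 = 0.1 * psi1 + 0.9 * psi2   # P(you end up with nothing)
    return (phi1, phi2)

# Linearity in action: the two 'sure' starting states...
assert play((1.0, 0.0)) == (0.9, 0.1)
assert play((0.0, 1.0)) == (0.1, 0.9)
# ...and a mixture of them gets mixed in the same proportions.
phi = play((0.5, 0.5))
assert abs(phi[0] - 0.5) < 1e-12 and abs(phi[1] - 0.5) < 1e-12
```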

      Can you help me understand this better?

      You have to ask me a specific question for me to help you understand something better.

      I did not understand the paragraphs before the moral paragraph…

      Again, that’s not a question. It’s a statement.

      I hope you get the basic idea: if we have a bunch of linear operators U(t) = \exp(t H) we call H the infinitesimal generator of those operators. I want to know which operators H make U(t) = \exp(t H) be stochastic for all t \ge 0. I say a lot more about this in Part 12.

      The whole paragraph on Infinitesimal stochastic versus self-adjoint operators is not very clear to me, though I have read it 6-7 times.

      Again, that’s not a question. Have you tried doing some computations with examples? There are some things we can only learn by doing. We’ll do lots of calculations with infinitesimal stochastic operators in this course; for self-adjoint operators you’d do lots of calculations in a course on quantum mechanics.
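      As an invitation to such computations, here is a small numerical sketch (my own illustration, with the convention used in these posts that probabilities form column vectors, so ‘infinitesimal stochastic’ means off-diagonal entries \ge 0 and each column summing to zero):

```python
# An infinitesimal stochastic operator: off-diagonal entries >= 0,
# each column sums to zero. Then exp(tH) should be stochastic
# (nonnegative entries, columns summing to 1) for all t >= 0.
H = [[-2.0,  1.0],
     [ 2.0, -1.0]]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_exp(A, terms=60):
    """exp(A) by its Taylor series; plenty accurate for small 2x2 examples."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, A)]
        result = [[r + s for r, s in zip(rr, ss)] for rr, ss in zip(result, term)]
    return result

for t in (0.1, 1.0, 5.0):
    U = mat_exp([[t * x for x in row] for row in H])
    for j in range(2):                    # each column sums to 1, entries >= 0
        assert abs(U[0][j] + U[1][j] - 1.0) < 1e-9
        assert U[0][j] > -1e-12 and U[1][j] > -1e-12
```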

      • Arjun Jain says:

        Sorry for being so vague in my questions and statements; it won’t happen again. The paragraph is much clearer now.

        Why do you say that ‘it is enough to assume t is very small’ when discussing what H must be like for \exp(tH) to be stochastic for all t \ge 0? After that, how can we conclude that an infinitesimal stochastic operator H makes \exp(tH) stochastic for all times? Has it got something to do with these conditions on U:
        U(t)U(s) = U(t+s) and t_i \to t \quad \implies \quad U(t_i) \psi \to U(t)\psi?
