Network Theory (Part 11)

jointly written with Brendan Fong

Noether proved lots of theorems, but when people talk about Noether’s theorem, they always seem to mean her result linking symmetries to conserved quantities. Her original result applied to classical mechanics, but today we’d like to present a version that applies to ‘stochastic mechanics’—or in other words, Markov processes.

What’s a Markov process? We’ll say more in a minute—but in plain English, it’s a physical system where something hops around randomly from state to state, where its probability of hopping anywhere depends only on where it is now, not its past history. Markov processes include, as a special case, the stochastic Petri nets we’ve been talking about.

Our stochastic version of Noether’s theorem is modeled after a well-known quantum version. It’s yet another example of how we can exploit the analogy between stochastic mechanics and quantum mechanics. But for now we’ll just present the stochastic version. Next time we’ll compare it to the quantum one.

Markov processes

We should and probably will be more general, but let’s start by considering a finite set of states, say X. To describe a Markov process we then need a matrix of real numbers H = (H_{i j})_{i, j \in X}. The idea is this: suppose right now our system is in the state j. Then the probability of being in some state i changes as time goes by—and H_{i j} is defined to be the time derivative of this probability right now.

So, if \psi_i(t) is the probability of being in the state i at time t, we want the master equation to hold:

\displaystyle{ \frac{d}{d t} \psi_i(t) = \sum_{j \in X} H_{i j} \psi_j(t) }

This motivates the definition of ‘infinitesimal stochastic’, which we recall from Part 5:

Definition. Given a finite set X, a matrix of real numbers H = (H_{i j})_{i, j \in X} is infinitesimal stochastic if

i \ne j \implies H_{i j} \ge 0

and

\displaystyle{ \sum_{i \in X} H_{i j} = 0 }

for all j \in X.

The inequality says that if we start in the state j, the probability of being found in some other state i, which starts at 0, can’t go down, at least initially. The equation says that the probability of being somewhere or other doesn’t change. Together, these facts imply that:

H_{i i} \le 0

That makes sense: the probability of being in the state i, which starts at 1, can’t go up, at least initially.

Using the magic of matrix multiplication, we can rewrite the master equation as follows:

\displaystyle{\frac{d}{d t} \psi(t) = H \psi(t) }

and we can solve it like this:

\psi(t) = \exp(t H) \psi(0)

If H is an infinitesimal stochastic operator, we will call \exp(t H) a Markov process, and H its Hamiltonian.

(Actually, most people call \exp(t H) a Markov semigroup, and reserve the term Markov process for another way of looking at the same idea. So, be careful.)
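
If you want to experiment with this, here’s a minimal numerical sketch in Python (the 3-state matrix is a made-up example of ours, not one from this series). It checks the defining facts: the columns of an infinitesimal stochastic H sum to zero, and \exp(t H) is then stochastic for all t \ge 0.

```python
# A hypothetical 3-state infinitesimal stochastic matrix: H[i, j] is the
# probabilistic rate of hopping from state j to state i, and each column
# sums to zero.
import numpy as np
from scipy.linalg import expm

H = np.array([[-1.0,  0.5,  0.0],
              [ 1.0, -0.5,  2.0],
              [ 0.0,  0.0, -2.0]])

assert np.allclose(H.sum(axis=0), 0.0)       # columns sum to zero

for t in [0.1, 1.0, 10.0]:
    U = expm(t * H)                          # the Markov semigroup exp(tH)
    assert np.all(U >= -1e-12)               # entries are probabilities
    assert np.allclose(U.sum(axis=0), 1.0)   # columns sum to one
```

Note the column convention: since the master equation says the time derivative of \psi_i is \sum_j H_{i j} \psi_j, probability distributions are column vectors, and it’s the columns, not the rows, that sum to zero (for H) or one (for \exp(t H)).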

Noether’s theorem is about ‘conserved quantities’, that is, observables whose expected values don’t change with time. To understand this theorem, you need to know a bit about observables. In stochastic mechanics an observable is simply a function assigning a number O_i to each state i \in X.

However, in quantum mechanics we often think of observables as matrices, so it’s nice to do that here, too. It’s easy: we just create a matrix whose diagonal entries are the values of the function O. And just to confuse you, we’ll also call this matrix O. So:

O_{i j} = \left\{ \begin{array}{ccl}  O_i & \textrm{if} & i = j \\ 0 & \textrm{if} & i \ne j  \end{array} \right.

One advantage of this trick is that it lets us ask whether an observable commutes with the Hamiltonian. Remember, the commutator of matrices is defined by

[O,H] = O H - H O

Noether’s theorem will say that [O,H] = 0 if and only if O is ‘conserved’ in some sense. What sense? First, recall that a stochastic state is just our fancy name for a probability distribution \psi on the set X. Second, the expected value of an observable O in the stochastic state \psi is defined to be

\displaystyle{ \sum_{i \in X} O_i \psi_i }

In Part 5 we introduced the notation

\displaystyle{ \int \phi = \sum_{i \in X} \phi_i }

for any function \phi on X. The reason is that later, when we generalize X from a finite set to a measure space, the sum on the right will become an integral over X. Indeed, a sum is just a special sort of integral!

Using this notation and the magic of matrix multiplication, we can write the expected value of O in the stochastic state \psi as

\int O \psi

We can calculate how this changes in time if \psi obeys the master equation… and we can write the answer using the commutator [O,H]:

Lemma. Suppose H is an infinitesimal stochastic operator and O is an observable. If \psi(t) obeys the master equation, then

\displaystyle{ \frac{d}{d t} \int O \psi(t) = \int [O,H] \psi(t) }

Proof. Using the master equation we have

\displaystyle{ \frac{d}{d t} \int O \psi(t) = \int O \frac{d}{d t} \psi(t) = \int O H \psi(t) } \qquad (1)

But since H is infinitesimal stochastic,

\displaystyle{ \sum_{i \in X} H_{i j} = 0  }

so for any function \phi on X we have

\displaystyle{ \int H \phi = \sum_{i, j \in X} H_{i j} \phi_j = 0 }

and in particular

\int H O \psi(t) = 0 \qquad (2)

Since [O,H] = O H - H O , we conclude from (1) and (2) that

\displaystyle{ \frac{d}{d t} \int O \psi(t) = \int [O,H] \psi(t) }

as desired.   █
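
Here’s a quick numerical sanity check of the Lemma, reusing the made-up H from the earlier sketch, with the observable O_i = i. It compares a finite-difference derivative of the expected value with \int [O,H] \psi(t):

```python
# Sketch check of the Lemma: for psi(t) = exp(tH) psi(0), the time
# derivative of the expected value of O equals the integral of [O,H] psi(t).
import numpy as np
from scipy.linalg import expm

H = np.array([[-1.0,  0.5,  0.0],
              [ 1.0, -0.5,  2.0],
              [ 0.0,  0.0, -2.0]])
O = np.diag([0.0, 1.0, 2.0])        # the observable O_i = i, as a matrix

psi0 = np.array([0.2, 0.5, 0.3])    # a stochastic state: entries sum to 1
t, dt = 0.7, 1e-6
psi = expm(t * H) @ psi0

# Finite-difference derivative of the expected value...
deriv = (np.sum(O @ expm((t + dt) * H) @ psi0) - np.sum(O @ psi)) / dt
# ...matches the integral of [O,H] psi(t):
assert np.isclose(deriv, np.sum((O @ H - H @ O) @ psi), atol=1e-4)
```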

The commutator doesn’t look like it’s doing much here, since we also have

\displaystyle{ \frac{d}{d t} \int O \psi(t) = \int O H \psi(t) }

which is even simpler. But the commutator will become useful when we get to Noether’s theorem!

Noether’s theorem

Here’s a version of Noether’s theorem for Markov processes. It says an observable commutes with the Hamiltonian iff the expected values of that observable and its square don’t change as time passes:

Theorem. Suppose H is an infinitesimal stochastic operator and O is an observable. Then

[O,H] =0

if and only if

\displaystyle{ \frac{d}{d t} \int O\psi(t) = 0 }

and

\displaystyle{ \frac{d}{d t} \int O^2\psi(t) = 0 }

for all \psi(t) obeying the master equation.

If you know Noether’s theorem from quantum mechanics, you might be surprised that in this version we need not only the observable but also its square to have an unchanging expected value! We’ll explain this, but first let’s prove the theorem.

Proof. The easy part is showing that if [O,H]=0 then \frac{d}{d t} \int O\psi(t) = 0 and \frac{d}{d t} \int O^2\psi(t) = 0. In fact there’s nothing special about these two powers of O; we’ll show that

\displaystyle{ \frac{d}{d t} \int O^n \psi(t) = 0 }

for all n. The point is that since H commutes with O, it commutes with all powers of O:

[O^n, H] = 0

So, applying the Lemma to the observable O^n, we see

\displaystyle{ \frac{d}{d t} \int O^n \psi(t) =  \int [O^n, H] \psi(t) = 0 }

The backward direction is a bit trickier. We now assume that

\displaystyle{ \frac{d}{d t} \int O\psi(t) = \frac{d}{d t} \int O^2\psi(t) = 0 }

for all solutions \psi(t) of the master equation. This implies

\int O H\psi(t) = \int O^2 H\psi(t) = 0

Since this holds for all solutions, and any stochastic state (for example, one concentrated at a single state j \in X) can occur as \psi(t), we get

\displaystyle{ \sum_{i \in X} O_i H_{i j} = \sum_{i \in X} O_i^2 H_{i j} = 0 } \qquad (3)

We wish to show that [O,H]= 0.

First, recall that we can think of O as a diagonal matrix with:

O_{i j} = \left\{ \begin{array}{ccl}  O_i & \textrm{if} & i = j \\ 0 & \textrm{if} & i \ne j  \end{array} \right.

So, we have

\begin{array}{ccl} [O,H]_{i j} &=& \displaystyle{ \sum_{k \in X} (O_{i k}H_{k j} - H_{i k} O_{k j}) } \\ \\ &=& O_i H_{i j} - H_{i j}O_j \\ \\ &=& (O_i-O_j)H_{i j} \end{array}

To show this is zero for each pair of elements i, j \in X, it suffices to show that whenever H_{i j} \ne 0, we have O_j = O_i. That is, we need to show that if the system can move from state j to state i, then the observable takes the same value on these two states.

In fact, it’s enough to show that this sum is zero for any j \in X:

\displaystyle{ \sum_{i \in X} (O_j-O_i)^2 H_{i j} }

Why? When i = j, O_j-O_i = 0, so that term in the sum vanishes. But when i \ne j, (O_j-O_i)^2 and H_{i j} are both non-negative—the latter because H is infinitesimal stochastic. So every term in the sum is non-negative, and if the terms sum to zero, each must individually be zero. Thus for all i \ne j, we have (O_j-O_i)^2 H_{i j}=0. But this means that either O_i = O_j or H_{i j} = 0, which is what we need to show.

So, let’s take that sum and expand it:

\displaystyle{ \sum_{i \in X} (O_j-O_i)^2 H_{i j} = \sum_i (O_j^2 H_{i j}- 2O_j O_i H_{i j} +O_i^2 H_{i j}) }

which in turn equals

\displaystyle{  O_j^2\sum_i H_{i j} - 2O_j \sum_i O_i H_{i j} + \sum_i O_i^2 H_{i j} }

The three terms here are each zero: the first because H is infinitesimal stochastic, and the latter two by equation (3). So, we’re done!   █
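
Here’s a numerical sketch of the easy direction, with another made-up example: O takes the same value on the only two states that H connects, so [O,H] = 0, and the expected values of O and O^2 stay constant along every solution of the master equation.

```python
# H only connects states 0 and 1, and O takes the same value there,
# so [O,H] = 0 and the first two moments of O are conserved.
import numpy as np
from scipy.linalg import expm

H = np.array([[-1.0,  2.0, 0.0],
              [ 1.0, -2.0, 0.0],
              [ 0.0,  0.0, 0.0]])
O = np.diag([5.0, 5.0, 7.0])

assert np.allclose(O @ H - H @ O, 0.0)       # O commutes with H

psi0 = np.array([0.3, 0.3, 0.4])
for t in [0.0, 0.5, 3.0]:
    psi = expm(t * H) @ psi0
    assert np.isclose(np.sum(O @ psi), np.sum(O @ psi0))
    assert np.isclose(np.sum(O @ O @ psi), np.sum(O @ O @ psi0))
```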

Markov chains

So that’s the proof… but why do we need both O and its square to have an expected value that doesn’t change with time to conclude [O,H] = 0? There’s an easy counterexample if we leave out the condition involving O^2. However, the underlying idea is clearer if we work with Markov chains instead of Markov processes.

In a Markov process, time passes by continuously. In a Markov chain, time comes in discrete steps! We get a Markov process by forming \exp(t H) where H is an infinitesimal stochastic operator. We get a Markov chain by forming the powers U, U^2, U^3, \dots of a ‘stochastic operator’ U. Remember:

Definition. Given a finite set X, a matrix of real numbers U = (U_{i j})_{i, j \in X} is stochastic if

U_{i j} \ge 0

for all i, j \in X and

\displaystyle{ \sum_{i \in X} U_{i j} = 1 }

for all j \in X.

The idea is that U describes a random hop, with U_{i j} being the probability of hopping to the state i if you start at the state j. These probabilities are nonnegative and sum to 1.

Any stochastic operator gives rise to a Markov chain U, U^2, U^3, \dots . And in case it’s not clear, that’s how we’re defining a Markov chain: the sequence of powers of a stochastic operator. There are other definitions, but they’re equivalent.
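
Here’s a tiny sketch of this, with a hypothetical 2-state U:

```python
# Powers of a stochastic operator form a Markov chain: U[i, j] is the
# probability of hopping to state i from state j.
import numpy as np

U = np.array([[0.9, 0.2],
              [0.1, 0.8]])
assert np.all(U >= 0) and np.allclose(U.sum(axis=0), 1.0)

psi = np.array([1.0, 0.0])          # start surely in state 0
for step in range(1, 6):
    psi = U @ psi                   # apply U, U^2, U^3, ... step by step
    print(step, psi, psi.sum())     # total probability stays 1
```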

We can draw a Markov chain by drawing a bunch of states and arrows labelled by transition probabilities, which are the matrix elements U_{i j}.

Here is Noether’s theorem for Markov chains:

Theorem. Suppose U is a stochastic operator and O is an observable. Then

[O,U] =0

if and only if

\displaystyle{  \int O U \psi = \int O \psi }

and

\displaystyle{ \int O^2 U \psi = \int O^2 \psi }

for all stochastic states \psi.

In other words, an observable commutes with U iff the expected values of that observable and its square don’t change when we evolve our state one time step using U.

You can probably prove this theorem by copying the proof for Markov processes:

Puzzle. Prove Noether’s theorem for Markov chains.

But let’s see why we need the condition on the square of the observable! That’s the intriguing part. Here’s a nice little Markov chain on three states 0, 1 and 2, where we haven’t drawn arrows labelled by 0: state 1 has a 50% chance of hopping to state 0 and a 50% chance of hopping to state 2, while the other two states just sit there. Now, consider the observable O with

O_i = i

It’s easy to check that the expected value of this observable doesn’t change with time:

\displaystyle{  \int O U \psi = \int O \psi }

for all \psi. The reason, in plain English, is this. Nothing at all happens if you start at states 0 or 2: you just sit there, so the expected value of O doesn’t change. If you start at state 1, the observable equals 1. You then have a 50% chance of going to a state where the observable equals 0 and a 50% chance of going to a state where it equals 2, so its expected value doesn’t change: it still equals 1.

On the other hand, we do not have [O,U] = 0 in this example, because we can hop between states where O takes different values. Furthermore,

\displaystyle{  \int O^2 U \psi \ne \int O^2 \psi }

After all, if you start at state 1, O^2 equals 1 there. You then have a 50% chance of going to a state where O^2 equals 0 and a 50% chance of going to a state where it equals 4, so its expected value changes!

So, that’s why \int O U \psi = \int O \psi for all \psi is not enough to guarantee [O,U] = 0. The same sort of counterexample works for Markov processes, too.
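
In case you want to see this concretely, here’s the counterexample above as a sketch in code:

```python
# The little Markov chain above: state 1 hops to state 0 or 2 with
# probability 1/2 each; states 0 and 2 sit still.
import numpy as np

U = np.array([[1.0, 0.5, 0.0],
              [0.0, 0.0, 0.0],
              [0.0, 0.5, 1.0]])
O = np.diag([0.0, 1.0, 2.0])         # the observable O_i = i

# The expected value of O is conserved: sum_i O_i U_{ij} = O_j for all j...
assert np.allclose((O @ U).sum(axis=0), np.diag(O))

# ...but starting at state 1, the expected value of O^2 jumps from 1 to 2...
psi = np.array([0.0, 1.0, 0.0])
print(np.sum(O @ O @ psi), np.sum(O @ O @ U @ psi))   # prints 1.0 2.0

# ...and O does not commute with U.
print(O @ U - U @ O)
```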

Finally, we should add that there’s nothing terribly sacred about the square of the observable. For example, we have:

Theorem. Suppose H is an infinitesimal stochastic operator and O is an observable. Then

[O,H] =0

if and only if

\displaystyle{ \frac{d}{d t} \int f(O) \psi(t) = 0 }

for all smooth f: \mathbb{R} \to \mathbb{R} and all \psi(t) obeying the master equation.

Theorem. Suppose U is a stochastic operator and O is an observable. Then

[O,U] =0

if and only if

\displaystyle{  \int f(O) U \psi = \int f(O) \psi }

for all smooth f: \mathbb{R} \to \mathbb{R} and all stochastic states \psi.

These make the ‘forward direction’ of Noether’s theorem stronger… and in fact, the forward direction, while easier, is probably more useful! However, if we ever use Noether’s theorem in the ‘reverse direction’, it might be easier to check a condition involving only O and its square.
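
If you want to play with these stronger statements, note that for a diagonal O the functional calculus is easy: f(O) is just f applied entrywise to the diagonal. Here’s a sketch using the commuting pair (H, O) from the earlier made-up example:

```python
# For a commuting pair [O,H] = 0, the expected value of f(O) is conserved
# for any function f, since f(O) is diagonal with entries f(O_i).
import numpy as np
from scipy.linalg import expm

H = np.array([[-1.0,  2.0, 0.0],
              [ 1.0, -2.0, 0.0],
              [ 0.0,  0.0, 0.0]])
o = np.array([5.0, 5.0, 7.0])        # the diagonal of O

psi0 = np.array([0.3, 0.3, 0.4])
for f in (np.exp, np.sin, np.square):
    fO = np.diag(f(o))               # f(O) via the functional calculus
    vals = [np.sum(fO @ expm(t * H) @ psi0) for t in (0.0, 1.0, 5.0)]
    assert np.allclose(vals, vals[0])
```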

46 Responses to Network Theory (Part 11)

  1. Roger Witte says:

    In the first definition you need sigma H = 1, not sigma H = 0

    • John Baez says:

      Thanks, but no: I think you’re mixing up ‘infinitesimal stochastic’ and ‘stochastic’: see Part 5 for both those concepts.

      H is infinitesimal stochastic, so it should obey

      \sum_i H_{i j} = 0

      Later in this post you’ll see a stochastic matrix U, which obeys

      \sum_i U_{i j} = 1

      H is infinitesimal stochastic iff \exp(t H) is stochastic for all t \ge 0. It’s a bit like self-adjoint versus unitary.

  2. Nice post. However I have some issues with the terminology used:

    1) “the number H_{ij} describes the probability per unit time of hopping from the state j to the state i” is not quite correct. H_{ij} is the rate at which the system hops from state j to state i. In other words, for an infinitesimal time dt the probability for jumping is H_{ij}dt. The difference between the two is similar to the difference between the interest rate and the AER (annual equivalent rate, for non-UK readers).

    2) “Together, they imply that the probability of staying in the same place goes down: H_{ii}\leq 0.” is not clear to me. What is this “probability of staying in the same place” and it is going down as what changes? I think it would be clearer to say that -H_{ii} is the rate at which a system in state i leaves that state.

    3) “we call \exp(tH) a Markov process”. The term “Markov process” already has a different well-defined mathematical meaning. The group of operators \{ \exp(tH), t \geq 0 \} is often referred to as the “Markov semigroup”. I am happy with the term “Hamiltonian” for the generator of this semigroup.

    In principle I like the approach taken here of giving physicists’s quantum mechanical names to probability theory concepts. It makes the theory more accessible to physicists. In that vein I would go even further and use Dirac’s bra-ket notation rather than the integral notation you employ. So, for example, instead of \int O\psi I would write \langle |O|\psi\rangle where \langle | satisfies \langle |a^\dagger=0. I think the integral notation is unsatisfying to both mathematicians and physicists (physicists will be wondering where the dx went and mathematicians will want to know what measure is used).

    • John Baez says:

      Gustav wrote:

      “the number H_{ij} describes the probability per unit time of hopping from the state j to the state i” is not quite correct. H_{ij} is the rate at which the system hops from state j to state i. In other words, for an infinitesimal time dt the probability for jumping is H_{ij}dt.

      This issue comes up over and over when I write about these things. I feel I have trouble explaining this concept both accurately and very quickly.

      I completely understand the problem with what I said: it might seem like H_{ij} is the probability that the system hops from state j to state i in, say, one second. As you know, what I really mean is to take the probability that the system hops from state j to state i in \Delta t seconds, divide it by \Delta t, and then take the limit as \Delta t \to 0. But that takes a while to say!

      I’m always afraid that calling this quantity “the rate at which the system hops from state j to state i” will confuse people, because this description doesn’t mention probabilities. This rate is a “probabilistic rate”, and the phrase “probabilistic rate” is not part of everyday English. I guess people say things like “on average, a bus comes by every 10 minutes”. But if you say “the rate at which buses come by is 1 per hour”, I bet they won’t guess you mean a probabilistic rate.

      I think that leaving out the word “unit” would help: “the number H_{ij} describes the probability per time of hopping from the state j to the state i.”

      Or I could say “the number H_{ij} describes the average rate at which the system hops from the state j to the state i.”

      What do people think is clearest? This time I’ll add a lengthy precise description, but I don’t want to always have to give such a long description. I want a clear short description that nonexperts can understand.

      • gustavdelius says:

        John, this discussion has been very instructive for me. It opened my eyes to the problems one runs into when one wants to be precise and colloquial at the same time. In particular this is difficult if one wants a blog post to be readable to someone who has not read the previous blog posts in the series.

        I would vote for “probabilistic rate”. You are right that this is not part of everyday English. Therefore I suspect it also does not carry any misleading connotations. Most readers will probably just swallow it and the curious ones will be tempted to read your earlier posts with the precise explanations. Initially I had felt that “stochastic rate” might work, but I now realise that the word “stochastic” might sound technical.

      • Graham says:

        One of my books on probability uses “probability intensity of transition from state i to state j”. What you call the Hamiltonian I would call a “rate matrix”, or an “instantaneous rate matrix” if I thought the former was likely to confuse.

      • John Baez says:

        “Probability intensity” is an interesting phrase. I’m not sure most people would instantly understand it, but they could learn.

        “Rate matrix” is certainly clearer than “Hamiltonian”, so I should mention that in the (dreamt-of) final polished version of these notes. “Hamiltonian” is mainly good for helping physicists see that all this stuff is a lot like quantum mechanics. In quantum mechanics we have

        \displaystyle{ \frac{d }{d t} \psi(t)  = - i H \psi(t) }

        while here we have

        \displaystyle{ \frac{d}{d t} \psi(t) =  H \psi(t) }

        While we’re comparing conventions, I should add that lots of people prefer

        \displaystyle{ \frac{d}{d t} \psi(t) =  - H \psi(t) }

        and this indeed has advantages. But I thought that sticking a minus sign would seem peculiar to beginners.

  3. John Baez says:

    Gustav wrote:

    “Together, they imply that the probability of staying in the same place goes down: H_{ii}\leq 0.” is not clear to me. What is this “probability of staying in the same place” and it is going down as what changes?

    These comments are really useful, because while you and I both know what I really meant to say, I plan to turn these posts into a paper or book someday, and then it’s important that they be clear.

    So, here’s what I mean. The probability of staying in some particular place, say place i, is the matrix element

    \exp(tH)_{i i}

    and this goes down as time passes:

    \left. \frac{d}{d t} \exp(tH)_{i i} \right|_{t = 0} = H_{i i} \le 0

    But again, the problem comes when I try to say this very quickly and informally but still clearly.

    I think it would be clearer to say that -H_{ii} is the rate at which a system in state i leaves that state.

    Again, I avoided saying this because this rate is a “probabilistic rate”, a rate of change of probabilities, and what you say here doesn’t make that clear. “The rate at which the probability of staying in the state i diminishes” is perhaps more precise—but it sounds stilted, not conversational.

    Also, the minus sign looks like it’s inserted ad hoc when we say things this way. What’s uniformly true is that

    \exp(tH)_{i j}

    is the probability of hopping from state j to state i after time t, and

    \left. \frac{d}{d t} \exp(tH)_{i j} \right|_{t = 0} = H_{i j}

    So H_{i j} is the “instantaneous rate of change, at t = 0, of the probability of hopping from state j to state i after time t.”

    But I’d like a way to say this that’s quick, informal, yet clear. Of course I need to explain this idea clearly and patiently somewhere. But then there will be times I need to remind people of it—and those reminders should be terse but not misleading.

    • gustavdelius says:

      Thank you John, that was helpful. While reading that sentence that gave me difficulties I had not realised that the probability you were talking about was given as a simple exponential and therefore I had not made the connection between the decrease in that probability and the sign of H_{ii}.

      I know I am in a pedantic mood. But I think being pedantic is fun. So here I go again. There is a difference between two probabilities. \exp(t H)_{ii} gives the probability of _being_ in state i at time t given that we start in state i at time 0. The probability of staying (in the sense of never leaving) in state i until at least time t is \exp(t H_{ii}). Luckily they both have the same derivative H_{ii} at t=0, so they both go down at t=0. The probability of staying has the added benefit of going down also at t>0. The probability of being in the state on the other hand could conceivably go up again later. So it is good that you chose to talk of the probability of staying.

  4. John Baez says:

    Gustav wrote:

    “we call \exp(tH) a Markov process”. The term “Markov process” already has a different well-defined mathematical meaning. The group of operators \{ \exp(tH), t\geq 0 \} is often referred to as the “Markov semigroup”.

    Hmm, when I read the definition of Markov process, it sounds like a long-winded way of describing a Markov semigroup. Isn’t there a one-to-one correspondence between Markov processes and Markov semigroups? If there is, I can just insert a little note saying that I’m abusing language a bit.

    (I’m a bit of a radical, I’m afraid: I think the world needs people who try using terminology in new ways… as long as they define it. Such people are nuisances, I know. But they provide the lubrication needed to eventually find the optimal terminology: otherwise things get locked in place at suboptimal local maxima.)

    I am happy with the term “Hamiltonian” for the generator of this semigroup.

    Good, because I want you to be happy, and that’s what I’m going to use.

    I would go even further and use Dirac’s bra-ket notation rather than the integral notation you employ.

    I can’t do that, because as I explained in Part 5, the fundamental structure here is not the Hilbert space L^2(X) but rather the space L^1(X), which doesn’t have an inner product on it! This is the big philosophical point I’m trying to make throughout these notes. I’ll quote myself and then say a bit more:

    Probability versus quantum theory

    Suppose we have a system of any kind: physical, chemical, biological, economic, whatever. The system can be in different states. In the simplest sort of model, we say there’s some set X of states, and say that at any moment in time the system is definitely in one of these states. But I want to compare two other options:

    • In a probabilistic model, we may instead say that the system has a probability \psi(x) of being in any state x \in X. These probabilities are nonnegative real numbers with

    \sum_{x \in X} \psi(x) = 1

    • In a quantum model, we may instead say that the system has an amplitude \psi(x) of being in any state x \in X. These amplitudes are complex numbers with

    \sum_{x \in X} | \psi(x) |^2 = 1

    Probabilities and amplitudes are similar yet strangely different. Of course given an amplitude we can get a probability by taking its absolute value and squaring it. This is a vital bridge from quantum theory to probability theory. Today, however, I don’t want to focus on the bridges, but rather the parallels between these theories.

    We often want to replace the sums above by integrals. For that we need to replace our set X by a measure space, which is a set equipped with enough structure that you can integrate real or complex functions defined on it. Well, at least you can integrate so-called ‘integrable’ functions—but I’ll neglect all issues of analytical rigor here. Then:

    • In a probabilistic model, the system has a probability distribution \psi : X \to \mathbb{R}, which obeys \psi \ge 0 and

    \int_X \psi(x) \, d x = 1

    • In a quantum model, the system has a wavefunction \psi : X \to \mathbb{C}, which obeys

    \int_X | \psi(x) |^2 \, d x= 1

    In probability theory, we integrate \psi over a set S \subset X to find out the probability that our system’s state is in this set. In quantum theory we integrate |\psi|^2 over the set to answer the same question.

    We don’t need to think about sums over sets and integrals over measure spaces separately: there’s a way to make any set X into a measure space such that by definition,

    \int_X \psi(x) \, dx = \sum_{x \in X} \psi(x)

    In short, integrals are more general than sums! So, I’ll mainly talk about integrals, until the very end.

    In probability theory, we want our probability distributions to be vectors in some vector space. Ditto for wave functions in quantum theory! So, we make up some vector spaces:

    • In probability theory, the probability distribution \psi is a vector in the space

    L^1(X) = \{ \psi: X \to \mathbb{C} \; : \; \int_X |\psi(x)| \, d x < \infty \}

    • In quantum theory, the wavefunction \psi is a vector in the space

    L^2(X) = \{ \psi: X \to \mathbb{C} \; : \; \int_X |\psi(x)|^2 \, d x < \infty \}

    You may wonder why I defined L^1(X) to consist of complex functions when probability distributions are real. I’m just struggling to make the analogy seem as strong as possible. In fact probability distributions are not just real but nonnegative. We need to say this somewhere… but we can, if we like, start by saying they’re complex-valued functions, but then whisper that they must in fact be nonnegative (and thus real). It’s not the most elegant solution, but that’s what I’ll do for now.

    Now:

    • The main thing we can do with elements of L^1(X), besides what we can do with vectors in any vector space, is integrate one. This gives a linear map:

    \int : L^1(X) \to \mathbb{C}

    • The main thing we can do with elements of L^2(X), besides the things we can do with vectors in any vector space, is take the inner product of two:

    \langle \psi, \phi \rangle = \int_X \overline{\psi}(x) \phi(x) \, d x

    This gives a map that’s linear in one slot and conjugate-linear in the other:

    \langle - , - \rangle :  L^2(X) \times L^2(X) \to \mathbb{C}

    First came probability theory with L^1(X); then came quantum theory with L^2(X). Naive extrapolation would say it’s about time for someone to invent an even more bizarre theory of reality based on L^3(X). In this, you’d have to integrate the product of three wavefunctions to get a number! The math of L^p spaces is already well-developed, so give it a try if you want. I’ll stick to L^1 and L^2 today.

    Privately I often use angle brackets like this:

    \langle - \rangle: L^1(X) \to \mathbb{C}

    to denote the operation I’m publicly calling the integral

    \int : L^1(X) \to \mathbb{C}

    This heightens the resemblance to Dirac’s bracket notation: quantum mechanics uses \langle \psi, O \psi \rangle for the expected value of an observable, while stochastic mechanics uses \langle O \psi \rangle.

    However, I’m sure that writing \langle O \psi \rangle for the expected value of an observable O in the state \psi would annoy lots of people. For one thing, lots of people use \langle O \rangle, sweeping \psi under the carpet. This is sort of stupid, but it’s completely entrenched.

    So, for now I’m using \int O \psi instead. And this has the advantage of having a fairly self-evident meaning: I’m integrating the function O \psi over the space X.

  5. Eric says:

    The proof of Noether’s theorem is neat, but I wonder if it can be cleaned up? For example, is there a way to prove it without resorting to components?

  6. gustavdelius says:

    John, I am not yet sure why you do not like the notation \langle|O|\psi\rangle. I am using it simply in the spirit of Dirac, who wasn’t worrying about mathematical subtleties like whether wavefunctions live in L^1 or L^2.

    I guess, in the language of L^p spaces I would have to say that the kets live in L^1 and the bras in L^\infty, but for the purpose of these blog posts we can follow Dirac and ignore such details.

    • John Baez says:

      Everyone has their own notation and nobody likes anyone else’s. I take that for granted as a condition of life. I don’t want to know why other people hate my notation; I don’t expect them to care why I hate theirs. I prefer to discuss more interesting things. So, this comment will be somewhat grumpy in tone.

      I am trying to clarify and exploit the logical relation between probability theory and quantum theory. This involves noting the similarities but also respecting the differences.

      I am not at all interested in ‘following the spirit of Dirac’, if that means ‘glossing over mathematical subtleties’. However, I don’t want to scare my readers by introducing too much formalism too soon—especially if I haven’t worked out the details!

      So far in this series of posts, I’m pursuing the philosophy that quantum theory is about Hilbert spaces while probability theory is about vector spaces equipped with some other structure. This extra structure is something like that of an integration algebra… but that may not be quite right, so I’d rather not talk about it yet.

      So, instead, I’m saying that quantum theory is about L^2 while probability theory is about L^1. This is easier for everyone to understand.

      Given this, I want to write the integral of the function O \psi as \int O \psi, rather than trying to artificially force probability theory into looking like quantum theory by writing it as \langle | O | \psi \rangle.

      There is a certain quaint charm in using an integral sign to denote integration, after all.

      But if someone held a gun to my head and forced me to use Dirac notation here, I would write \langle 1 | O | \psi \rangle, which at least makes some sense: as you note, we can say we’re pairing the element 1 \in L^\infty with the element O \psi \in L^1.

      But if we try to understand the relation between quantum theory and probability theory this way, I believe we’ll get quite confused.

      Anyway, there are lots of interesting issues to discuss here, but I think it will be easiest if we decouple them from the question of what notation to use.

    • Blake Stacey says:

      John Baez wrote:

      But if someone held a gun to my head and forced me to use Dirac notation here, I would write \langle 1 | O | \psi \rangle, which at least makes some sense: as you note, we can say we’re pairing the element 1 \in L^\infty with the element O \psi \in L^1.

      A lot of people do this — it’s the first way I’d seen the technology set up, though of course that doesn’t mean it’s the best or the most illuminating choice.

      • John Baez says:

        Do any of these people reflect out loud about what this approach means? It means something like: there’s a god-given ‘default state’ called 1, and the expectation value of an observable O in the state \psi is the transition amplitude \langle 1 | O | \psi \rangle. But actually it’s weirder than that, since if we have

        \int \psi = 1

        then \psi will hardly ever count as a quantum state, since typically

        \int |\psi|^2 \ne 1

        and similarly, unless our measure space is a probability measure space the default state will be neither a stochastic state:

        \int 1 \ne 1

        nor a quantum state:

        \int |1|^2 \ne 1

        So it’s all very weird. Basically, it ignores the fact that quantum states should have

        \int |\psi|^2 = 1

        while stochastic states are a very different beast, with

        \int \psi = 1, \qquad \psi \ge 0

        We can get a stochastic state from a quantum state \psi by forming |\psi|^2: we all learn about this in school, when people discuss the probability interpretation of the wavefunction.
        Conversely (though I never hear anyone talk about this) we can get a quantum state from a stochastic state \psi by forming \sqrt{\psi}. But in the approach where we talk about \langle 1 | O | \psi \rangle, it seems we are simply pretending a stochastic state is a quantum state, while neglecting all the problems this raises!

        Believe me, I’d be fascinated if someone could tell a coherent story about this… I’m not trying to nip a nascent idea in the bud… but so far all my thoughts about this suggest it’s a wrong road.

      • Blake Stacey says:

        Does it still count as a “nascent idea” if it’s been around since 1976? :-P

        More seriously, I think the main issue is that most of the people involved just weren’t that concerned with quantum-to-classical transitions. If the smallest thing you’re considering is a rabbit, a sand grain or even a clump of cells in a human neocortex, going from a probability distribution to a quantum density matrix or vice versa isn’t a top priority. So, while being able to lift tools out of the quantum toolbox is nice, relating a stochastic description of a system to a quantum description of the same physical system isn’t a goal.

        Cardy (1996) is typical:

        There are two immediately apparent differences from ordinary quantum field theory: first, there is no factor of i in the Schrödinger equation (3) — but this is familiar from euclidean formulations of conventional quantum theories; second, the hamiltonian is not hermitian. In many cases it will turn out that, nevertheless, its eigenvalues are real. (Complex eigenvalues correspond to oscillating states which are known to occur in some chemical reactions.) However, the most important difference is one of interpretation: expectation values of observables \mathcal{O} are not given by \langle \Psi | \mathcal{O} | \Psi \rangle, since this would be bilinear, rather than linear, in the probabilities p(\alpha; t). Instead, for an observable which is diagonal in the occupation number basis, its expectation value is of course

        \bar{\mathcal{O}} = \sum_{\{n_j\}} \mathcal{O}(\{n_j\}) p(\{n_j\}; t),

        and it is straightforward to show that this may be expressed as

        \bar{\mathcal{O}} = \langle 0 | e^{\sum_j a_j} \mathcal{O} e^{-Ht} | \Psi(0) \rangle,

        since the state \langle 0 | e^{\sum_j a_j} is a left eigenstate of all the a^\dag_j, with unit eigenvalue.

        Second, I think that the people who study diffusion-limited reactions, active-to-absorbing phase transitions, directed percolation and the like are generally eager to skip past the first steps of defining the formalism and get to a Lagrangian they can play with. A better notation at the beginning may obviate the need for a few awkwardnesses further along (e.g., field redefinitions); I’ll have to look into that. The stuff they seem to spend the most time worrying over comes after they’ve a stochastic Hamiltonian in the coherent-state representation: renormalization, estimating critical exponents, etc.

        • John Baez says:

          Blake wrote:

          Does it still count as a “nascent idea” if it’s been around since 1976?

          I’d say the mathematical trick has been around since 1976. The nascent idea lurking in this trick is that we can think of a probability distribution as a quantum state if we normalize it in a nonstandard way and promise to only ask about its transition amplitudes to a certain ‘default’ state | 1 \rangle. Mathematical tricks often conceal ideas that are too strange for people to say in words.

          More seriously, I think the main issue is that most of the people involved just weren’t that concerned with quantum-to-classical transitions.

          Yes, that’s one part of it. But even if we don’t try to describe the same system both classically and quantumly, there’s also the question of the logical relation between the classical and quantum descriptions: that’s what I’m especially interested in. But this is not the sort of question that ‘practical’ people tend to enjoy—perhaps because they can’t imagine what one might do with the answer.

          Second, I think that the people who study diffusion-limited reactions, active-to-absorbing phase transitions, directed percolation and the like are generally eager to skip past the first steps of defining the formalism and get to a Lagrangian they can play with.

          Right. For me the murky beginning steps are the most interesting part, because they hint at a relation between quantum mechanics and probability theory that seems a bit different than the ‘obvious’ one, where |\psi(x)|^2 rather than the wavefunction \psi(x) acts like a probability distribution. I’ve got a bunch of ideas about this that I’ll reveal as soon as I can.

  7. John Baez says:

    Could someone please go here and tell me whether you see what you should near the beginning of the section Markov processes, namely:

    i \ne j \implies H_{i j} \ge 0

    or what I see on my browser at work, namely

    i = j \implies H_{i j} \ge 0

    This is a really annoying bug!

    • I use Firefox 7.0.1 and see the first version.

    • Graham says:

      With Google Chrome and IE, what I see is mathematically correct, but very messy. The slash through the equals sign is too far to the right, and the arrow is made of an equals sign and an arrow which don’t line up.

      • John Baez says:

        That’s what I used to get, back when things worked for me. I thought that was bad… but I don’t mind an ugly \ne sign nearly as much as one that looks exactly like =!

        By the way, I don’t know why your 3 attempts to post this ran into trouble.

    • Todd Trimble says:

      I get the first too, although it’s not nicely rendered. I’m using Firefox 7.0.

    • John Baez says:

      Thanks, guys! Does anyone see the ≠ as an equals sign? I now think it could be because on this computer I’ve downloaded fonts so that jsmath doesn’t need to grab them from somewhere else. I assume you guys are getting the little message on top, about jsmath?

    • Tom Leinster says:

      It looks fine to me (Firefox 3.6.22). The “not equals” sign is perfect. There’s a slight wobble in that the shaft of the “implies” sign is slightly misaligned, but I’d hardly have noticed.

    • John Beattie says:

      Opera 10.0, which is notorious for getting maths wrong.

      It shows an extra ‘=’ prepended to the => sign (is this to make a long implication arrow?) In Opera itself the ‘=’ is at a slight angle but when I did a screen grab it came out straight, except that you can see the join, so I have some kind of optical illusion as well.

      Anyhow, I am wondering if the extra ‘=’ is being moved around somehow.

      • John Baez says:

        John Beattie wrote:

        It shows an extra ‘=’ prepended to the => sign (is this to make a long implication arrow?)

        Graham wrote:

        the arrow is made of an equals sign and an arrow which don’t line up.

        Tom wrote approximately:

        There’s a slight wobble in that the shaft of the “implies” sign is slightly misaligned, but I’d hardly have noticed.

        I think you’re all describing the same thing, which I’m also seeing on my laptop at home. I think that’s because I risked a \Longrightarrow instead of a mere \Rightarrow.

        Luckily, none of you see the ≠ coming out as an =, and neither do I, here at home. Only my computer at work commits that heinous crime!

        So I will relax, somewhat, and change the \Longrightarrow to \Rightarrow. (In case you’re wondering, jsmath doesn’t recognize \implies.)

        Thanks, everyone!!!

  8. John Baez says:

    Over on the Forum, Eric Forgy came up with a relative of Noether’s theorem that goes like this.

    Let’s assume that \psi(t) obeys the master equation, so

    \frac{d}{d t} \psi(t) = H \psi(t)

    Then we have

    \frac{d}{d t} O \psi(t) = O H \psi(t)

    Now, if O \psi also obeys the master equation then we also have

    \frac{d}{d t} O \psi(t) = H O \psi(t)

    From these we can conclude [H,O] \psi(t) = 0. Conversely if [H,O] \psi(t) = 0, then

    \frac{d}{d t} O \psi(t) = O H \psi(t) = H O \psi(t)

    so O \psi(t) obeys the master equation.

    So, we have:

    Proposition: if \psi(t) obeys the master equation, then [H,O]\psi(t) = 0 iff O \psi(t) obeys the master equation.

  9. John Baez says:

    Eric Forgy also lured me into thinking about the Schrödinger versus Heisenberg pictures in stochastic mechanics.

    So far I’ve been using time-independent observables and letting states evolve in time via

    \psi(t) = \exp(t H) \psi

    This is the Schrödinger picture. However, we may also use time-independent states and let observables evolve in time via

    O(t) = O \exp(t H)

    This is the Heisenberg picture. These pictures are compatible in that we may use either one to compute the expected value of an observable O measured in the state \psi after waiting a time t, and we get the same answer:

    \int O(t) \psi = \int O \psi(t)

    In the Schrödinger picture we have the master equation

    \displaystyle{ \frac{d}{d t} \psi(t) = H \psi(t) }

    while in the Heisenberg picture we have

    \displaystyle{ \frac{d}{d t} O(t) = O(t) H }

    This is amusingly different than quantum mechanics. In quantum mechanics we define a time-dependent version of either the state \psi or the observable O by setting

    \psi(t) = \exp(-i t H) \psi

    and

    O(t) = \exp(i t H) O \exp(- i t H)

    and we obtain the compatibility equation

    \langle \psi(t), O \psi(t) \rangle = \langle \psi, O(t) \psi \rangle

    and also the equation

    \displaystyle{ \frac{d}{d t} O(t) = i [H, O(t) ] }

  10. John Baez says:

    There’s nothing like posting something publicly to stir up thoughts that make that post seem ill-considered and rash! I’ve changed my mind a bit about the Heisenberg picture in stochastic mechanics. While nothing I said above seems mathematically incorrect, it’s upsetting that while the product of observables is an observable, we have

    (AB)(t) \ne A(t) B(t)

    if, as above, we define

    A(t) = A \exp(t H)

    So, suppose H is infinitesimal stochastic. Also suppose our set X of states is finite, to avoid subtleties of analysis I’d rather postpone thinking about. Then \exp(t H) is defined for negative t as well as positive t, and while it’s usually not stochastic for negative times, we have

    \int \exp(tH) \phi = \int \phi

    for both positive and negative times.

    Then, we can either use time-independent observables and let states evolve in time via

    \psi(t) = \exp(t H) \psi

    or use time-independent states and let observables evolve in time via

    O(t) = \exp(-t H) O \exp(t H)

    These pictures are compatible in that we may use either one to compute the expected value of the observable O measured in the state \psi after waiting a time t, and we get the same answer:

    \int O(t) \psi = \int O \psi(t)

    The reason is that

    \begin{array}{ccl} \int O(t) \psi &=& \int \exp(-tH) O \exp(tH) \psi \\ &=& \int O \exp(tH) \psi \\ &=& \int O \psi(t) \end{array}

    where the second step uses the remarks I made earlier.

    Now we have

    \displaystyle{ \frac{d}{d t} O(t) = [O(t), H] }

    and also

    (AB)(t) = A(t) B(t)

    for any observables A, B.
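
    These claims are easy to check numerically; here’s a sketch with a made-up H and a pair of made-up observables:

    ```python
    # Check: expected values agree in both pictures, and products of
    # observables evolve to products, with O(t) = exp(-tH) O exp(tH).
    import numpy as np
    from scipy.linalg import expm

    H = np.array([[-1.0,  0.5,  0.0],
                  [ 1.0, -0.5,  2.0],
                  [ 0.0,  0.0, -2.0]])
    A = np.diag([1.0, 2.0, 3.0])
    B = np.diag([0.0, 5.0, 1.0])
    psi = np.array([0.2, 0.5, 0.3])
    t = 0.8

    heis = lambda M: expm(-t * H) @ M @ expm(t * H)   # Heisenberg evolution

    assert np.isclose(np.sum(heis(A) @ psi), np.sum(A @ expm(t * H) @ psi))
    assert np.allclose(heis(A @ B), heis(A) @ heis(B))
    ```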

    There’s more to say, but not now! It’s dinnertime!

    • Eric says:

      Another option is to let \psi evolve as an operator. I’m writing some notes on that idea on the forum as we speak.

  11. tomate says:

    Dear John,

    I don’t get one thing. You write “That is, we need to show that if the system can move from state j to state i, then the observable takes the same value on these two states.”

    So, if I understand well, if the graph is connected, observable O will take the same value on all of the states. Otherwise, it will have different constant values in each component (but I do not consider disconnected graphs as really interesting, as each component is completely independent of another: each process happens on its own).

    So, maybe what you proved is not “Noether’s theorem”, but the (still nice) result that

    “If a time-independent observable’s average and variance do not vary in time, then the observable is uniform on the vertex set”.

    Observations:

    – I wonder if this could have anything to do with Discrete Analytic Functions (http://www.cs.elte.hu/~lovasz/analytic.pdf), which are constant on any compact discretized Riemann surface.

    – The fact that first and second moments play the crucial role for Markov processes resonates with the continuous variable case, where the underlying stochastic processes have at each time Gaussian distributions – the Gaussian has only first and second nonvanishing moments – and something similar happens in Pawula’s theorem for the truncation of the Kramers-Moyal expansion after the second term.

    • John Baez says:

      Tomate wrote:

      So, if I understand well, if the graph is connected, the observable O will take the same value on all of the states. Otherwise, it will have different constant values in each component…

      That’s right. By the way, for people who don’t understand what you said, let me add that you’re taking the points of our set X as the vertices of a directed graph, and drawing an edge from j to i whenever H_{i j} is nonzero.

      (but I do not consider disconnected graphs as really interesting, as each component is completely independent of another: each process happens on its own).

      Well, you may not consider it interesting, but that’s what a conserved quantity O does: it splits the set X into a disjoint union of subsets on which O takes different constant values, and our Markov process then becomes a ‘disjoint union’ of Markov processes on these subsets. It’s exactly like in quantum mechanics, where a conserved quantity splits the Hilbert space up as a direct sum of eigenspaces, and time evolution separately preserves each eigenspace.

      Personally I consider this very interesting: this is how conserved quantities let us simplify physics problems! And they arise quite often: for example, in the reversible reaction we considered last time, the total number of particles of types 1 and 2 is conserved. This explains how from a single Poisson equilibrium state we were able to extract a lot of different equilibrium states in which that number took different values. I’ll work out this example in detail sometime, for people who need a bit of help.

      • tomate says:

        OK, I buy it. But still I prefer the formulation “If a time-independent observable’s average and variance do not vary in time, then the observable is uniform over the vertex set (of a connected graph)”. I think it does have something deep in it related to the key role of the first and second moment for stochastic processes.

        (Also because in QM you can build entangled wave functions over factorized subspaces, while here the superposition between probabilities, or populations, is always what one would call a “mixture” in the QM case.)

  12. tomate says:

    On the Schrödinger/Heisenberg picture: I’ve seen people using a sort of “interaction picture”, where the Hamiltonian H is split into the waiting-time contribution H_0 = \delta_{ij} H_{ij} and an interaction Hamiltonian H_I, and then take care of these two pieces when exponentiating \exp(t H). It’s very useful for guessing the correct path measure, for example. I myself had a complete discussion of this procedure in my master’s thesis, but it is in Italian. However, I’ve never seen it discussed in relation to the evolution of a conjugate observable O. It would be interesting to see what happens if one discharges part of the evolution (the “free” one) on an observable and part (the “interacting”) on the probability measure itself. Maybe it would make calculations easier.

    • John Baez says:

      Hi! Great to see you here again! James Dolan had suggested to me the idea of using an interaction picture of precisely this sort. I’d never seen it before. What I find amusing is that the particle’s probability of staying where it is decays before the particle jumps somewhere else… as if it’s dreaming of the jump before it goes:

      \exp(t H) = \exp(t H_0) + \int_0^t \exp((t-s) H_0) H_I \exp(s H_0) \, ds + \cdots

      This is different than the interaction picture in quantum mechanics, where H_0 by itself is already self-adjoint, so that the free evolution \exp(t H_0) is unitary between the ‘jumps’.

  13. tomate says:

    I’ve always been here – but with too little time for discussion.

    This is precisely what I had in mind. I can send you via email a couple of pages from my master’s thesis if you want: they are in Italian, but the formulas are quite clear. Funnily, I don’t have references… I wrote that chapter out of some personal notes of my professor, which didn’t have references either. In the field, it’s like everybody knows about it but nobody knows exactly where it comes from…

    “as if it’s dreaming of the jump before it goes”: this is always the effect it has when we project statistical arguments onto the individuals, like when people play long-overdue numbers at the lotteries…

    • John Baez says:

      Tomate wrote:

      I’ve always been here…

      Oh, good! Sometimes I think everyone is leaving, or falling asleep.

      This is precisely what I had in mind. I can send you via email a couple of pages from my master thesis if you want: they are in Italian, but the formulas are quite clear.

      If it mainly says what we’ve already discussed, I guess I won’t make you bother. I guess this is some sort of ‘folk wisdom’.

      “as if it’s dreaming of the jump before it goes”: this is always the effect it has when we project statistical arguments onto the individuals, like when people play long-overdue numbers at the lotteries…

      Okay, good point!

  14. Brendan Fong proved the stochastic version of Noether’s theorem in Part 11. Now let’s do the quantum version […]

  15. John Baez says:

    Since nobody did the puzzle this time, I’ll have to do it myself.

    Puzzle. Suppose U is a stochastic operator and O is an observable. Show that O commutes with U iff the expected values of O and its square don’t change when we evolve our state one time step using U. In other words, show that

    [O,U] =0

    if and only if

    \displaystyle{  \int O U \psi = \int O \psi }

    and

    \displaystyle{ \int O^2 U \psi = \int O^2 \psi }

    for all stochastic states \psi.

    Answer. One direction is easy: if [O,U] = 0 then [O^n,U] = 0 for all n, so

    \displaystyle{ \int O^n U \psi = \int U O^n \psi = \int O^n \psi }

    where in the last step we use the fact that U is stochastic.

    For the converse direction we can use the same tricks that worked for Markov processes. Assume that

    \displaystyle{  \int O U \psi = \int O \psi }

    and

    \displaystyle{ \int O^2 U \psi = \int O^2 \psi }

    for all stochastic states \psi. These imply that

    \displaystyle{ \sum_{i \in X} O_i U_{i j} = O_j } \qquad (1)

    and

    \displaystyle{ \sum_{i \in X} O^2_i U_{i j} = O^2_j } \qquad (2)

    We wish to show that [O,U]= 0. Note that

    \begin{array}{ccl} [O,U]_{i j} &=& \displaystyle{ \sum_{k \in X} (O_{i k}U_{k j} - U_{i k} O_{k j}) } \\ \\  &=& (O_i-O_j)U_{i j} \end{array}

    To show this is always zero, we’ll show that when U_{i j} \ne 0, then O_j = O_i. This says that when our system can hop from one state to another, the observable O must take the same value on these two states.

    For this, in turn, it’s enough to show that the following sum vanishes for any j \in X:

    \displaystyle{ \sum_{i \in X} (O_j-O_i)^2 U_{i j} }

    Why? The matrix elements U_{i j} are nonnegative since U is stochastic. Thus the sum can only vanish if each term vanishes, meaning that O_j = O_i whenever U_{i j} \ne 0.

    To show the sum vanishes, let’s expand it:

    \begin{array}{ccl} \displaystyle{ \sum_{i \in X} (O_j-O_i)^2 U_{i j} } &=& \displaystyle{ \sum_i (O_j^2 U_{i j}- 2O_j O_i U_{i j} +O_i^2 U_{i j}) }  \\  \\ &=& \displaystyle{  O_j^2\sum_i U_{i j} - 2O_j \sum_i O_i U_{i j} + \sum_i O_i^2 U_{i j} } \end{array}

    Now, since (1) and (2) hold for all stochastic states \psi, this equals

    \displaystyle{  O_j^2\sum_i U_{i j} - 2O_j^2 + O_j^2  }

    But this is zero because U is stochastic, which implies

    \sum_i U_{i j} = 1

    So, we’re done!
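
    For the easy direction, here’s a numerical sketch with a made-up U that commutes with O:

    ```python
    # U only connects states where O takes equal values, so [O,U] = 0 and the
    # expected value of every power of O is unchanged by a time step.
    import numpy as np

    U = np.array([[0.5, 0.5, 0.0],
                  [0.5, 0.5, 0.0],
                  [0.0, 0.0, 1.0]])
    O = np.diag([3.0, 3.0, 8.0])
    assert np.allclose(O @ U - U @ O, 0.0)

    psi = np.array([0.1, 0.6, 0.3])
    for n in (1, 2, 3):
        On = np.linalg.matrix_power(O, n)
        assert np.isclose(np.sum(On @ U @ psi), np.sum(On @ psi))
    ```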

  16. John Baez says:

    Word spreads fast! Here’s an announcement of a talk at the Oxford OASIS series. That stands for Oxford Advanced Seminar on Informatic Structures.

    Dear all,

    For this week’s OASIS seminar we have the pleasure of a talk by Harvey Brown, the professor in philosophy of physics at Oxford who is well-known for his work (including several books) on the foundations of quantum mechanics, relativity theory, and the role of symmetry principles in physics. Moreover, he is a very clear and entertaining speaker! This Friday he will convince us that we need to take symmetries and their subtleties more seriously.

    Time and place: This Friday, 2pm, Lecture Theatre B, Department of Computer Science.

    Title: Noether’s famous 1918 symmetry theorem — what does it prove?

    Abstract: Recently, Brendan Fong and John Baez have provided an analogue in stochastic mechanics to what they call Noether’s theorem in quantum mechanics. Noether’s original theorem, relating symmetries and conservation principles, was the first in a series of theorems she proved in 1918 within a program in the calculus of variations, inspired by interpretational problems related to conservation laws in general relativity. I will sketch the background to Noether’s work and give special emphasis to the form and meaning of her “first” theorem. An unusual application of the theorem to quantum mechanics will be exploited.

    Philosophers of physics being as they are, the phrase “what they call” makes me afraid he’s planning to chide me for using the term “Noether’s theorem” in a very extended sense, not very close to that of her original 1918 paper. Physicists being as they are, such chiding wouldn’t stop me. But I’m curious to hear what he actually says. The talk will be videotaped and put on the OASIS website. Furthermore, Brendan is now at Oxford and can hear the talk in person!

  17. […] some applications of discrete calculus. In this post, I reformulate some of the material in Part 11 pertaining to Noether’s […]

  18. Arjun Jain says:

    In the 2 theorems at the end, why is f: \mathbb{R} \to \mathbb{R}, when f takes and gives only expressions in O?
    Also, when we have f(O), does f being smooth mean that it can be expanded in the form of a power series in O?

    • John Baez says:

      The functional calculus allows you to apply any function f: \mathbb{R} \to \mathbb{R} to any self-adjoint n \times n matrix (and thus any self-adjoint operator on a finite-dimensional Hilbert space), or any holomorphic function f: \mathbb{C} \to \mathbb{C} to any n \times n matrix (and thus to any linear operator on a finite-dimensional space).

      Also, when we have f(O), does f being smooth mean that it can be expanded in the form of a power series in O?

      We can do that when f is holomorphic. This means that

      \displaystyle{ f(x) = \sum_{n = 0}^\infty   c_n x^n }

      and the power series converges for all x \in \mathbb{C}. You can then prove that

      \displaystyle{ \sum_{n = 0}^\infty   c_n O^n }

      converges for all matrices O, so we can define f(O) to be this.

      When O is self-adjoint we can go further, and define f(O) for any function f, simply by saying that f(O) has the same eigenvectors as O, and

      O \psi = \lambda \psi \Rightarrow f(O) \psi = f(\lambda) \psi

      We need O to be self-adjoint here because that guarantees that we can choose a basis of eigenvectors of O.

      For some reason I chose to focus on the case where f is smooth, but there was no need to do this. Since O is self-adjoint, I could have assumed f : \mathbb{R} \to \mathbb{R} is any function whatsoever.
