Oscar Dahlsten is visiting the Centre for Quantum Technologies, so we’re continuing some conversations about entropy that we started last year, back when the Entropy Club was active. But now Jamie Vicary and Brendan Fong are involved in the conversations.
I was surprised when Oscar told me that for a large class of random processes, the usual second law of thermodynamics is just one of infinitely many laws saying that various kinds of disorder increase. I’m annoyed that nobody ever told me about this before! It’s as if they told me about conservation of energy but not conservation of schmenergy, and phlenergy, and zenergy…
So I need to tell you about this. You may not understand it, but at least I can say I tried. I don’t want you blaming me for concealing all these extra second laws of thermodynamics!
Here’s the basic idea. Not all random processes are guaranteed to make entropy increase. But a bunch of them always make probability distributions flatter in a certain precise sense. This makes the entropy of the probability distribution increase. But when you make a probability distribution flatter in this sense, a bunch of other quantities increase too! For example, besides the usual entropy, there are infinitely many other kinds of entropy, called ‘Rényi entropies’, one for each number between 0 and ∞. And a doubly stochastic operator makes all the Rényi entropies increase! This fact is a special case of Theorem 10 here:
• Tim van Erven and Peter Harremoës, Rényi divergence and majorization.
Let me state this fact precisely, and then say a word about how this is related to quantum theory and ‘the collapse of the wavefunction’.
To keep things simple let’s talk about probability distributions on a finite set, though Erven and Harremoës generalize it all to a measure space.
How do we make precise the concept that one probability distribution is flatter than another? You know it when you see it, at least some of the time. For example, suppose I have some system in thermal equilibrium at some temperature, and the probabilities of it being in various states look like this:
Then say I triple the temperature. The probabilities flatten out:
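As a sketch of this flattening, here is a small NumPy example. The energy levels are made up, and I'm using units where Boltzmann's constant is 1, so the thermal probabilities are proportional to $e^{-E_i/T}$:

```python
import numpy as np

def boltzmann(energies, T):
    """Thermal equilibrium probabilities, p_i proportional to exp(-E_i / T)."""
    w = np.exp(-np.asarray(energies, dtype=float) / T)
    return w / w.sum()

energies = [0.0, 1.0, 2.0, 3.0]      # made-up energy levels
p_cold = boltzmann(energies, 1.0)    # original temperature
p_hot = boltzmann(energies, 3.0)     # tripled temperature

# The hot distribution is flatter: its largest probability is smaller.
print(p_cold.max(), p_hot.max())
```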
But how can we make this concept precise in a completely general way? We can do it using the concept of ‘majorization’. If one probability distribution is less flat than another, people say it ‘majorizes’ that other one.
Here’s the definition. Say we have two probability distributions $p$ and $q$ on the same set. For each one, list the probabilities in decreasing order:

$$p_1 \ge p_2 \ge \cdots \ge p_n, \qquad q_1 \ge q_2 \ge \cdots \ge q_n$$

Then we say $p$ majorizes $q$ if

$$\sum_{i=1}^k p_i \ge \sum_{i=1}^k q_i$$

for all $1 \le k \le n$. So, the idea is that the $k$ biggest probabilities in the distribution $p$ add up to at least as much as the corresponding $k$ biggest ones in $q$.
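This definition is easy to check directly. Here is a minimal sketch in NumPy (the function name `majorizes` is my own):

```python
import numpy as np

def majorizes(p, q, tol=1e-12):
    """True if distribution p majorizes q: the partial sums of p's sorted
    probabilities dominate the partial sums of q's."""
    p = np.sort(p)[::-1]   # probabilities in decreasing order
    q = np.sort(q)[::-1]
    return bool(np.all(np.cumsum(p) >= np.cumsum(q) - tol))

# A sharp distribution majorizes a flatter one on the same 3-element set:
print(majorizes([0.7, 0.2, 0.1], [0.4, 0.35, 0.25]))  # True
print(majorizes([0.4, 0.35, 0.25], [0.7, 0.2, 0.1]))  # False
```

Note that majorization is only a partial order: for many pairs of distributions, neither majorizes the other.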
In 1960, Alfréd Rényi defined a generalization of the usual Shannon entropy that depends on a parameter $\alpha$. If $p$ is a probability distribution on a finite set, its Rényi entropy of order $\alpha$ is defined to be

$$H_\alpha(p) = \frac{1}{1 - \alpha} \ln \sum_i p_i^\alpha$$

where $0 \le \alpha \le \infty$. Well, to be honest: if $\alpha$ is 0, 1, or $\infty$, we have to define this by taking a limit where we let $\alpha$ creep up to that value. But the limit exists, and when $\alpha = 1$ we get the usual Shannon entropy

$$H_1(p) = -\sum_i p_i \ln p_i$$
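In code, the formula and its limiting cases look like this (a sketch; the function name is my own, and entropies are in nats):

```python
import numpy as np

def renyi_entropy(p, alpha):
    """Rényi entropy H_alpha(p) = ln(sum_i p_i^alpha) / (1 - alpha).
    The orders alpha = 1 and alpha = inf are handled as the corresponding limits."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                        # drop zero probabilities
    if alpha == 1:                      # limit: Shannon entropy
        return float(-np.sum(p * np.log(p)))
    if alpha == np.inf:                 # limit: min-entropy, -ln(max p_i)
        return float(-np.log(p.max()))
    return float(np.log(np.sum(p ** alpha)) / (1 - alpha))

p = [0.5, 0.25, 0.25]
print(renyi_entropy(p, 1))       # Shannon entropy
print(renyi_entropy(p, 0.999))   # nearby orders approach it
```

Note that after dropping zero probabilities, the general formula at $\alpha = 0$ directly gives the logarithm of the support size, which agrees with the limit.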
As I explained a while ago, Rényi entropies are important ways of measuring biodiversity. But here’s what I learned just now, from the paper by van Erven and Harremoës:
Theorem 1. If a probability distribution $p$ majorizes a probability distribution $q$, its Rényi entropies are smaller:

$$H_\alpha(p) \le H_\alpha(q)$$

for all $0 \le \alpha \le \infty$.
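Here's a quick numerical sanity check of Theorem 1, using a pair of made-up distributions where the sharper one majorizes the flatter one (helper name is my own):

```python
import numpy as np

def renyi(p, alpha):
    """Rényi entropy of order alpha (for alpha != 1), in nats."""
    p = np.asarray(p, dtype=float)
    return np.log(np.sum(p[p > 0] ** alpha)) / (1 - alpha)

p = np.array([0.7, 0.2, 0.1])    # majorizes q: check the sorted partial sums
q = np.array([0.4, 0.35, 0.25])

# For every order alpha, p's Rényi entropy should be <= q's.
for alpha in [0.0, 0.5, 2.0, 5.0]:
    assert renyi(p, alpha) <= renyi(q, alpha) + 1e-12
print("Theorem 1 holds on this example")
```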
And here’s what makes this fact so nice. If you do something to a classical system in a way that might involve some randomness, we can describe your action using a stochastic matrix. An $n \times n$ matrix $T$ is called stochastic if whenever $p$ is a probability distribution, so is $Tp$. This is equivalent to saying:

• the matrix entries of $T$ are all $\ge 0$, and

• each column of $T$ sums to 1.
If $T$ is stochastic, it’s not necessarily true that the entropy of $Tp$ is greater than or equal to that of $p$, not even for the Shannon entropy.
Puzzle 1. Find a counterexample.
However, entropy does increase if we use specially nice stochastic matrices called ‘doubly stochastic’ matrices. People say a matrix $T$ is doubly stochastic if it’s stochastic and it maps the probability distribution

$$\left( \frac{1}{n}, \dots, \frac{1}{n} \right)$$

to itself. This is the most spread-out probability distribution of all: every other probability distribution majorizes this one.
Why do they call such matrices ‘doubly’ stochastic? Well, if you’ve got a stochastic matrix, each column sums to 1. But a stochastic matrix is doubly stochastic if and only if each row sums to 1 as well.
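Checking both conditions is a one-liner each in NumPy. Here's a sketch with a made-up matrix (the function name is my own):

```python
import numpy as np

def is_doubly_stochastic(T, tol=1e-12):
    """Check: nonnegative entries, every column sums to 1, every row sums to 1."""
    T = np.asarray(T, dtype=float)
    return bool(np.all(T >= -tol)
                and np.allclose(T.sum(axis=0), 1, atol=tol)
                and np.allclose(T.sum(axis=1), 1, atol=tol))

T = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.5, 0.3],
              [0.3, 0.2, 0.5]])
print(is_doubly_stochastic(T))            # True

uniform = np.full(3, 1/3)
print(np.allclose(T @ uniform, uniform))  # True: the uniform distribution is fixed
```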
Here’s a really cool fact:
Theorem 2. If $T$ is doubly stochastic, $p$ majorizes $Tp$ for any probability distribution $p$. Conversely, if a probability distribution $p$ majorizes a probability distribution $q$, then $q = Tp$ for some doubly stochastic matrix $T$.
Taken together, Theorems 1 and 2 say that doubly stochastic transformations increase entropy… but not just Shannon entropy! They increase all the different Rényi entropies, as well. So if time evolution is described by a doubly stochastic matrix, we get lots of ‘second laws of thermodynamics’, saying that all these different kinds of entropy increase!
Finally, what does all this have to do with quantum mechanics, and collapsing the wavefunction? There are different things to say, but this is the simplest:
Theorem 3. Given two probability distributions $p$ and $q$, then $p$ majorizes $q$ if and only if there exists a self-adjoint matrix $D$ with eigenvalues $p_i$ and diagonal entries $q_i$.
The matrix $D$ will be a density matrix: a self-adjoint matrix with nonnegative eigenvalues and trace equal to 1. We use such matrices to describe mixed states in quantum mechanics.
Theorem 3 gives a precise sense in which preparing a quantum system in some state, letting time evolve, and then measuring it ‘increases randomness’.
How? Well, suppose we have a quantum system whose Hilbert space is $\mathbb{C}^n$. If we prepare the system in a mixture of the standard basis states with probabilities $p_i$, we can describe it with a diagonal density matrix $D$ whose diagonal entries are the numbers $p_i$. Then suppose we wait a while and some unitary time evolution occurs. The system is now described by a new density matrix

$$D' = U D U^{-1}$$

where $U$ is some unitary operator. If we then do a measurement to see which of the standard basis states our system now lies in, we’ll get the different possible results with probabilities $q_i$: the diagonal entries of $D'$. But the eigenvalues of $D'$ will still be the numbers $p_i$. So, by the theorem, $p$ majorizes $q$!
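You can watch this happen numerically. In this sketch the initial probabilities are made up, and the random unitary comes from the QR decomposition of a complex Gaussian matrix:

```python
import numpy as np

rng = np.random.default_rng(0)

def majorizes(p, q):
    p, q = np.sort(p)[::-1], np.sort(q)[::-1]
    return bool(np.all(np.cumsum(p) >= np.cumsum(q) - 1e-12))

p = np.array([0.6, 0.3, 0.1])
D = np.diag(p)                       # diagonal density matrix

# A random unitary: QR decomposition of a complex Gaussian matrix gives unitary Q.
Z = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
U, _ = np.linalg.qr(Z)

D2 = U @ D @ U.conj().T              # time-evolved density matrix
q = np.real(np.diag(D2))             # measurement probabilities

assert np.isclose(q.sum(), 1)        # still a probability distribution
assert majorizes(p, q)               # the eigenvalues p majorize the diagonal q
print(q)
```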
So, not only Shannon entropy but also all the Rényi entropies will increase!
Of course, there are some big physics questions lurking here. Like: what about the real world? In the real world, do lots of different kinds of entropy tend to increase, or just some?
Of course, there’s a huge famous old problem about how reversible time evolution can be compatible with any sort of law saying that entropy must always increase! Still, there are some arguments, going back to Boltzmann’s H-theorem, which show entropy increases under some extra conditions. So then we can ask if other kinds of entropy, like Rényi entropy, increase as well. This will be true whenever we can argue that time evolution is described by doubly stochastic matrices. Theorem 3 gives a partial answer, but there’s probably much more to say.
I don’t have much more to say right now, though. I’ll just point out that while doubly stochastic matrices map the ‘maximally smeared-out’ probability distribution

$$\left( \frac{1}{n}, \dots, \frac{1}{n} \right)$$

to itself, a lot of this theory generalizes to stochastic matrices that map exactly one other probability distribution to itself. We need to work with relative Rényi entropy instead of Rényi entropy, and so on, but I don’t think these adjustments are really a big deal. And there are nice theorems that let you know when a stochastic matrix maps exactly one probability distribution to itself, based on the Perron–Frobenius theorem.
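For a concrete taste of the Perron–Frobenius side of this, here is a sketch: a made-up stochastic matrix with strictly positive entries, whose unique fixed probability distribution we find as the eigenvector with eigenvalue 1:

```python
import numpy as np

# A column-stochastic matrix with strictly positive entries, so by
# Perron-Frobenius it maps exactly one probability distribution to itself.
T = np.array([[0.8, 0.3],
              [0.2, 0.7]])

eigvals, eigvecs = np.linalg.eig(T)
k = np.argmin(np.abs(eigvals - 1))   # locate the eigenvalue 1
pi = np.real(eigvecs[:, k])
pi = pi / pi.sum()                   # normalize to a probability distribution

print(pi)                            # the unique fixed distribution
assert np.allclose(T @ pi, pi)
```

For this particular matrix the fixed distribution works out to $(0.6, 0.4)$, which you can confirm by hand from $0.3\,\pi_2 = 0.2\,\pi_1$.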
I already gave you a reference for Theorem 1, namely the paper by van Erven and Harremoës, though I don’t think they were the first to prove this particular result: they generalize it quite a lot.
What about Theorem 2? It goes back at least to here:
• Barry C. Arnold, Majorization and the Lorenz Order: A Brief Introduction, Springer Lecture Notes in Statistics 43, Springer, Berlin, 1987.
The partial order on probability distributions given by majorization is also called the ‘Lorenz order’, but mainly when we consider probability distributions on infinite sets. This name presumably comes from the Lorenz curve, a measure of income inequality. This curve shows, for the bottom $x\%$ of households, what percentage $y\%$ of the total income they have.
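Computing points on a Lorenz curve is straightforward. Here is a sketch using a made-up list of household incomes:

```python
import numpy as np

incomes = np.array([10, 20, 30, 40, 100], dtype=float)  # made-up incomes

x = np.arange(1, len(incomes) + 1) / len(incomes)  # bottom fraction of households
sorted_incomes = np.sort(incomes)                  # poorest first
y = np.cumsum(sorted_incomes) / incomes.sum()      # their share of total income

for xi, yi in zip(x, y):
    print(f"bottom {xi:.0%} of households hold {yi:.0%} of income")
```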
Puzzle 2. If you’ve got two different probability distributions of incomes, and one majorizes the other, how are their Lorenz curves related?
When we generalize majorization by letting some other probability distribution take the place of

$$\left( \frac{1}{n}, \dots, \frac{1}{n} \right)$$

it seems people call it the ‘Markov order’. Here’s a really fascinating paper on that, which I’m just barely beginning to understand:
What about Theorem 3? Apparently it goes back to here:
• A. Uhlmann, Wiss. Z. Karl-Marx-Univ. Leipzig 20 (1971), 633.
though I only know this thanks to a more recent paper:
• Michael A. Nielsen, Conditions for a class of entanglement transformations, Phys. Rev. Lett. 83 (1999), 436–439.
By the way, Nielsen’s paper contains another very nice result about majorization! Suppose you have states $\psi$ and $\phi$ of a 2-part quantum system. You can trace out one part and get density matrices describing mixed states of the other part, say $\rho_\psi$ and $\rho_\phi$. Then Nielsen shows you can get from $\psi$ to $\phi$ using ‘local operations and classical communication’ if and only if $\rho_\phi$ majorizes $\rho_\psi$, meaning that the spectrum of $\rho_\phi$ majorizes the spectrum of $\rho_\psi$. Note that things are going backwards here compared to how they’ve been going in the rest of this post: if we can get from $\psi$ to $\phi$, then all forms of entropy go down when we go from $\rho_\psi$ to $\rho_\phi$! This ‘anti-second-law’ behavior is confusing at first, but familiar to me by now.
When I first learned all this stuff, I naturally thought of the following question—maybe you did too, just now. If $p$ and $q$ are probability distributions and

$$H_\alpha(p) \le H_\alpha(q)$$

for all $0 \le \alpha \le \infty$, is it true that $p$ majorizes $q$?
Apparently the answer must be no, because Klimesh has gone to quite a bit of work to obtain a weaker conclusion: not that $p$ majorizes $q$, but that $p \otimes r$ majorizes $q \otimes r$ for some probability distribution $r$. He calls this catalytic majorization, with $r$ serving as a ‘catalyst’:
I thank Vlatko Vedral here at the CQT for pointing this out!
Finally, here is a good general introduction to majorization, pointed out by Vasileios Anagnostopoulos:
• T. Ando, Majorization, doubly stochastic matrices, and comparison of eigenvalues, Linear Algebra and its Applications 118 (1989), 163–248.