I’m wondering if people talk about this. Maybe you know?

Given a self-adjoint operator H that’s bounded below and a density matrix D on some Hilbert space, we can define for any \beta > 0 a new density matrix

\displaystyle{ D_\beta = \frac{e^{-\beta H/2} \, D \, e^{-\beta H/2}}{\mathrm{tr}(e^{-\beta H/2} \, D \, e^{-\beta H/2})} }

I would like to call this the thermalization of D when H is a Hamiltonian and \beta = 1/kT where T is the temperature and k is Boltzmann’s constant.

For example, in the finite-dimensional case we can take D to be the identity matrix, normalized to have trace 1. Then D_\beta is the Gibbs state at temperature T: that is, the state of thermal equilibrium at temperature T.
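This special case is easy to check numerically. Here is a sketch (my own, not from the post) using NumPy/SciPy with a randomly chosen self-adjoint H in finite dimensions:

```python
# Sanity check (a sketch, not from the post): thermalizing the maximally
# mixed state D = I/n reproduces the Gibbs state e^{-beta H} / Z.
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
n, beta = 4, 0.7

# A random self-adjoint H; in finite dimensions it is automatically bounded below.
A = rng.standard_normal((n, n))
H = (A + A.T) / 2

D = np.eye(n) / n                      # maximally mixed state, tr D = 1
X = expm(-beta * H / 2) @ D @ expm(-beta * H / 2)
D_beta = X / np.trace(X)               # the thermalization of D

gibbs = expm(-beta * H) / np.trace(expm(-beta * H))
print(np.allclose(D_beta, gibbs))      # True
```

Since D = I/n commutes with everything, the two half-factors combine into e^{-\beta H}, which is exactly the Gibbs state after normalization.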

But I want to know if you’ve seen people do this thermalization trick starting from some other density matrix D.

30 Responses to Thermalization

  1. JIP says:

    What if you perform a Wick rotation of your thermalization?

    • John Baez says:

      If you just replace -\beta with i t you get junk, namely

      \displaystyle{ \frac{e^{it H/2} \, D \, e^{itH/2}}{\mathrm{tr}(e^{it H/2} \, D \, e^{it H/2})} }

      The right way to evolve a density matrix in time gives

      \displaystyle{ \frac{e^{it H} \, D \, e^{-itH}}{\mathrm{tr}(e^{it H} \, D \, e^{-it H})} = e^{it H} \, D \, e^{-itH}   }

      where the equation holds by the cyclicity of the trace and \mathrm{tr}(D) = 1. I’m not so worried about the factor of 2; it’s the signs that are the serious issue here.

      But we definitely don’t want to look at

      \displaystyle{ \frac{e^{-\beta H/2} \, D \, e^{\beta H/2}}{\mathrm{tr}(e^{-\beta H/2} \, D \, e^{\beta H/2})} }

      because that’s not a density matrix! It’s not self-adjoint.
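      A numerical illustration of this point (my own sketch, with a random finite-dimensional H and D): the symmetric product is self-adjoint, while the one-sided version is not:

```python
# The symmetric product e^{-beta H/2} D e^{-beta H/2} is self-adjoint,
# while the one-sided e^{-beta H/2} D e^{+beta H/2} generally is not.
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
n, beta = 3, 1.0
A = rng.standard_normal((n, n))
H = (A + A.T) / 2                      # self-adjoint "Hamiltonian"

B = rng.standard_normal((n, n))
P = B @ B.T
D = P / np.trace(P)                    # a generic density matrix

good = expm(-beta * H / 2) @ D @ expm(-beta * H / 2)
bad  = expm(-beta * H / 2) @ D @ expm(beta * H / 2)

print(np.allclose(good, good.T))       # True: self-adjoint
print(np.allclose(bad, bad.T))         # False: not self-adjoint
```

(The one-sided version would be self-adjoint only in the special case where D commutes with H.)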

      • Toby Bartels says:

        You could make it self-adjoint by averaging it with its adjoint (which is analogous to making the other version normalized by dividing by its trace). But that wouldn't give you the Gibbs state, so it's not what you want.

  2. ben says:

    Probably not useful, but there is a variant of this that shows up in the study of chaos, where half or quarter Gibbs factors are interspersed with operator insertions (see, e.g., the beginning of [1503.01409]). However, the object they study is not a density matrix (it’s not normalized), and they are primarily interested in its trace, as a proxy for the thermal expectation value of the operators. The “thermalizers” have the effect of separating otherwise-coincident operator insertions by a thermal translation.

    I’m not an expert, but I haven’t seen this specific object in the literature. It is not obvious to me that correlators traced against the thermalization would have some simple relation to correlators traced against D. Do you have such an (or any) interpretation of this object?

    • John Baez says:


      Do you have such an (or any) interpretation of this object?

      The main reason it’s interesting is that it’s the simplest way you can take an arbitrary density matrix and multiply it by something like \exp(-\beta H) and get another density matrix. \exp(-\beta H) D can’t be a density matrix since it’s not self-adjoint, but \exp(-\beta H/2) D \exp(-\beta H/2) is self-adjoint and it becomes a density matrix after you normalize it.

      I think the interpretation is roughly that you start with the ‘prior’ D and adjust it, saying now you want to multiply the chance of being in a state of energy E by the Boltzmann factor \exp(- \beta E).

      • ben says:

        Yes, if D is thermal then “thermalizing” will just change the temperature. But in general, up to the overall normalization,

        \mathrm{tr}[D_{\beta} O] = \sum_{m,n} e^{-\beta E_m/2} D_{mn} e^{-\beta E_n/2} O_{nm}

        so the interpretation for general D seems more subtle.

  3. dummyt says:

    To make it more legit, probably one can cook up a channel or a Lindblad evolution that arrives asymptotically at this thermalized density matrix from a given initial density matrix…

  4. Peter Morgan says:

    Just working through this from a (very) slightly different perspective: the density matrix \hat D corresponds to the state \rho(\hat A)=\mathsf{Tr}[\hat A\hat D]. Your transformation is to \rho'(\hat A)=\frac{\rho(\mathrm{e}^{-\beta\hat H/2}\hat A\mathrm{e}^{-\beta\hat H/2})}{\rho(\mathrm{e}^{-\beta\hat H})}, so we can, as always, take the transformation as much to be of the measurement as of the state.

    I’m not sure how useful it is, but this brings to mind a formal measurement theory context, where the square root of operators is sometimes used as a primitive, as, for example, by Stanley Gudder in a recent preprint.

  5. Tez says:

    I guess it’s a natural thing to do when computing the fidelity between D and the Gibbs state (i.e. the fidelity is the trace norm of your D_\beta) which makes me think the “quantum info meets quantum thermo” community may have looked at it…

  6. Blake Stacey says:

    Hmm, it resembles a thing I’ve seen people do in trying to quantify “quantum coherence”, nonzero off-diagonal elements in a particular basis taken as a signifier of nonclassicality. One way to do that is to Procrusteanize a density matrix by simply discarding everything off the diagonal and then studying what’s left. (What little I know of the topic is gathered in the bibliography here.) This looks a bit like that, with the basis of interest being the eigenbasis of the Hamiltonian, and with the extra step of adjusting the weights afterward by the Boltzmann factors.

  7. Javier Prieto says:

    This is superficially similar to the quantum conditioning operation defined by Leifer and Spekkens (see table III and equations 9-14). Since your construction can be interpreted as a belief update maybe there’s some relation?

    • John Baez says:

      Thanks—yes, the operation I’m talking about is some sort of “belief update” where we exponentially decrease the likelihood of high-energy states. I’ll check out that paper.

      • ben says:

        This seems to be the right way to think about it (as a belief update that exponentially decreases the likelihood of high-energy states).

        The following is probably already clear to you, but since I was curious I thought I’d finish fleshing this out: if H is the Hamiltonian, then

        \mathrm{tr}[D O] = \sum_{m,n} D_{mn} O_{nm}

        while, up to normalization,

        \mathrm{tr}[D_\beta O] = \sum_{m,n} e^{-\beta(E_m+E_n)/2} D_{mn} O_{nm},

        so the consequence of “thermalizing” is that the contribution of the mn matrix element of O to \langle O\rangle_{D_{\beta}} is suppressed, relative to its contribution to \langle O\rangle_D, by the product of Boltzmann factors at the two energies.

        I am not certain that this is enough to justify the name “thermalization”, though. If D is any thermal density matrix (including the identity) then this process just changes the temperature, but for general D the resulting density matrix is not thermal.
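        Since I was curious whether the normalization works out, here is a numerical check (my own sketch): in the eigenbasis of H, \mathrm{tr}[D_\beta O] = \sum_{m,n} e^{-\beta(E_m+E_n)/2} D_{mn} O_{nm} \,/\, \mathrm{tr}(e^{-\beta H/2} D e^{-\beta H/2}):

```python
# Verify the matrix-element formula for tr[D_beta O] in the eigenbasis of H.
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)
n, beta = 4, 0.5
E = rng.standard_normal(n)             # energies: H is diagonal in this basis
H = np.diag(E)

B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
P = B @ B.conj().T
D = P / np.trace(P)                    # a generic density matrix
C = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
O = (C + C.conj().T) / 2               # a generic observable

X = expm(-beta * H / 2) @ D @ expm(-beta * H / 2)
D_beta = X / np.trace(X)
lhs = np.trace(D_beta @ O)

# sum_{m,n} e^{-beta(E_m+E_n)/2} D_mn O_nm, divided by the normalization
w = np.exp(-beta * np.add.outer(E, E) / 2)
rhs = np.sum(w * D * O.T) / np.sum(np.exp(-beta * E) * np.diag(D).real)

print(np.allclose(lhs, rhs))           # True
```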

        • John Baez says:

          Thanks! I am currently leaning toward a word like ‘chilling’, ‘cooling’, or ‘quenching’ rather than ‘thermalization’. ‘Quenching’ sounds really cool (pardon the pun), but its existing meaning in physics may be too well-established.

          By the way, I advocate the word ‘coolness’ for the parameter \beta = 1/kT.

        • ben says:

          (replying to John below)

          I agree that it is more like a kind of chilling or cooling, though not necessarily with respect to the original notion of temperature defined by D.

          Actually, I think your procedure is a special case of something closely analogous to what I would ordinarily call a quench (a sudden change in the Hamiltonian). The only difference here is that the sudden change is in the modular Hamiltonian, from K = -\log D to K_{\beta} = -\log D_{\beta}, i.e. it is a quench in modular rather than physical time. There is probably a way to think about this more physically in terms of the experience of an accelerating observer, whose proper time is modular time.

        • John Baez says:

          I forget what “modular time” is, though I know about Rindler spacetime. In particular I don’t know what a “quench in modular time” is, or how it would achieve what we want here.

        • ben says:

          By modular time, I mean unitarily evolving operators using the modular Hamiltonian rather than the ordinary Hamiltonian (which is often referred to as “modular flow”):

          O(s) \equiv \rho^{is} O(0) \rho^{-is},

          where O is any operator on the Hilbert space on which \rho acts, and I have suppressed all spacetime labels, which just go along for the ride. One nice property of modular flow is that it preserves the algebra of operators associated to any causal diamond. My impression is that this was first discussed in the 70s in the axiomatic QFT literature but has recently found use in holography and in proving the quantum null energy condition.

          Anyways, to connect to Rindler: when \rho is the density matrix obtained by tracing the Minkowski vacuum over a half-space, the modular Hamiltonian is proportional to the generator of boosts around the splitting surface. In this special case modular flow acts geometrically on the algebra, just like ordinary time evolution, except with the evolution generated by boosts instead. I’m not aware of a geometric interpretation in the general case.

          Re: “quench in modular time”, I may have been too glib: in your procedure, there is no sense in which the sudden change of the modular hamiltonian occurs at a particular value of the modular flow parameter. So it’s not literally a quench in modular time.

          Let me try to give a more accurate description instead. Consider some system that is controlled by an idealized experimentalist. At some particular lab time t_0, suppose the experimentalist changes the system’s density matrix from D to D_{\beta}, without changing the Hamiltonian describing the system. At this lab time the modular Hamiltonian suddenly changes, so one might be tempted to call what happened a quench in the modular Hamiltonian. I’m not sure how useful this language is, though, since it is not really a quench in the usual sense of suddenly changing the generator of dynamical evolution at some point along its flow (by pouring a bucket of ice water on the system, for example).

          Thanks for the nickname for \beta, by the way. Glad to see that zero temperature corresponds to infinite coolness!

        • John Baez says:

          I’ll have to think about what you’re saying. What’s \rho? I need to take what you’re saying, remove everything about spacetime and causal diamonds, and see what’s left. All I’ve got is a density matrix D and a bounded-below self-adjoint operator H on the same Hilbert space. What’s “modular time” or the “modular Hamiltonian” in this context?

          Glad to see that zero temperature corresponds to infinite coolness!

          Yes: now the law of thermodynamics that used to say “it would require an infinite number of steps to reach absolute zero” becomes “it would require an infinite number of steps to reach infinite coolness”, which seems quite plausible.

        • ben says:

          Ah crap, my bad: apparently I’ve been hard-wired to use \rho for the density matrix. To conform to your notation, \rho should be replaced by D everywhere in my previous post. Anyways, given a quantum system described by the density matrix D, the modular Hamiltonian K is defined to be K = -\log D, while “modular time” is the parameter I called s in my previous post. By this def’n, different density matrices on the same Hilbert space have different associated modular Hamiltonians and correspondingly generate different modular flows.

          Spacetime doesn’t play any role here except in the Rindler discussion, which I only mentioned because I find it helpful in building some geometric intuition for the action of modular flow. For example, in QM you could take an EPR pair and consider the reduced density matrix obtained by tracing over the first partner. Modular flow will then mix the Pauli matrices acting on the second partner.
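          A toy version of this is easy to write down (my own sketch, with a random full-rank density matrix D): implement modular flow O(s) = D^{is} O D^{-is} via the eigendecomposition of D, and check that D^{is} is unitary and that modular flow preserves expectation values in the state D (since D commutes with its own powers):

```python
# Modular flow for a finite-dimensional density matrix D:
# K = -log D is the modular Hamiltonian, and O(s) = D^{is} O D^{-is}.
import numpy as np

rng = np.random.default_rng(3)
n, s = 3, 0.8
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
P = B @ B.conj().T
D = P / np.trace(P)                    # full-rank density matrix

vals, vecs = np.linalg.eigh(D)
def D_pow(z):
    # D^z for a complex exponent z, via the eigendecomposition of D
    return vecs @ np.diag(vals.astype(complex) ** z) @ vecs.conj().T

C = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
O = (C + C.conj().T) / 2               # a generic observable
O_s = D_pow(1j * s) @ O @ D_pow(-1j * s)   # modular flow of O

print(np.allclose(D_pow(1j * s) @ D_pow(-1j * s), np.eye(n)))   # True: unitary
print(np.allclose(np.trace(D @ O_s), np.trace(D @ O)))          # True: <O> preserved
```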

        • John Baez says:

          Okay, now I get it. I like this “modular flow” idea a lot.

    • John Baez says:

      Cool, their operation in equation (10):

      M \star N = N^{1/2} \, M \, N^{1/2}

      is what I’m talking about here.

      Nitpick: they call this “conjugation” of M by N^{1/2}, but conjugation would give

      N^{1/2} \, M \, N^{-1/2}

      so that’s not the right terminology.

      • Toby Bartels says:

        I want to understand this operation as a natural operation on positive-semidefinite self-adjoint operators. Given two such operators A and B, we'd like to use their product AB. Disaster strikes immediately, as this may be neither semidefinite nor self-adjoint. To make something self-adjoint, a common way is to take its average with its adjoint, so we would be using \frac12AB+\frac12BA. Unfortunately, this is still not semidefinite, so we need something different.

        However, the arithmetic mean is not the only way to average things. We could instead use the geometric mean, \sqrt{AB}\sqrt{BA}. Of course, defining the principal square root of a matrix is a little tricky if it’s not semidefinite, so let's use \sqrt A\sqrt B\sqrt B\sqrt A instead. Now this is well-defined, positive-semidefinite, and self-adjoint (as you can verify from these properties of A and B, remembering that a principal square root is also positive-semidefinite). Of course, this formula simplifies to \sqrt A B\sqrt A, which is the formula above for B\star A, or B ⋆ A in plain text.

        This is not a kind of conjugation but rather a kind of multiplication. You can think of it as multiplying square roots. Of course, \sqrt A\sqrt B is not the principal square root of B ⋆ A (nor of A ⋆ B, which is different). But besides its principal square root, every positive-semidefinite self-adjoint operator A has various hermitian square roots: operators a such that a^\dag a = A. And \sqrt A \sqrt B is a hermitian square root of A ⋆ B; that is basically how A ⋆ B is defined. (More generally, if a is a hermitian square root of A, then a\sqrt B is a hermitian square root of A ⋆ B. But this isn't quite general enough to be interesting.)

        Now, A ⋆ B is not quite what you (John) want; to get a density operator, you still have to normalize this by dividing by its trace. Unfortunately, even if A and B are density operators, the trace of A ⋆ B may well be zero. (Essentially, this happens when A and B, thought of as information about a quantum system, are mutually contradictory.) But when one of these is a thermal density matrix, then this cannot happen, and so then you can normalize it. (Leifer & Spekkens apply this in a different context, where A is not a density operator at all but instead satisfies a trace condition related to B that guarantees that A ⋆ B is a density operator; thus, they have no need to normalize.)
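        A small numerical check of these claims (my own sketch, for real positive-semidefinite matrices, using the convention M ⋆ N = N^{1/2} M N^{1/2} from above): A ⋆ B is self-adjoint and positive semidefinite, and \sqrt A\sqrt B is a hermitian square root of it:

```python
# Check: A * B := sqrt(B) A sqrt(B) is self-adjoint and PSD,
# and x = sqrt(A) sqrt(B) satisfies x^dag x = A * B.
import numpy as np

n = 3
def psd(seed):
    # a random positive-semidefinite symmetric matrix
    B = np.random.default_rng(seed).standard_normal((n, n))
    return B @ B.T

def psqrt(M):
    # principal square root of a PSD symmetric matrix, via eigh
    vals, vecs = np.linalg.eigh(M)
    return vecs @ np.diag(np.sqrt(np.clip(vals, 0, None))) @ vecs.T

A, B = psd(10), psd(11)
rA, rB = psqrt(A), psqrt(B)
star = rB @ A @ rB                     # A * B in the star-product sense
x = rA @ rB                            # a "hermitian square root" of A * B

print(np.allclose(star, star.T))                    # self-adjoint
print(np.linalg.eigvalsh(star).min() > -1e-8)       # positive semidefinite
print(np.allclose(x.T @ x, star))                   # x^dag x = A * B
```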

  8. Bernat says:

    This is used in numerical calculations of quantum systems: starting with the free density matrix at high temperature the formula is recursively applied, with H being the perturbed Hamiltonian, to obtain the density matrix of the system at lower temperatures.

    See chapter 3.2.1 of Statistical Mechanics: Algorithms and Computations, by Werner Krauth, where it’s given the name “Trotter formula”.

    This method was first applied by Storer (1968), “Path-integral calculation of quantum-statistical density matrix for attractive Coulomb forces”, Journal of Mathematical Physics 9, 964–970.
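    A simplified sketch of the recursive idea (my own, with a fixed H rather than Krauth’s temperature-dependent case): since the half-factors commute, repeatedly applying the thermalization with step \Delta\beta, starting from the maximally mixed state, lands exactly on the Gibbs state at \beta = k\,\Delta\beta:

```python
# Recursive thermalization: k steps of size dbeta from I/n give the
# Gibbs state at beta = k * dbeta (for a fixed Hamiltonian H).
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(5)
n, dbeta, k = 4, 0.25, 8
A = rng.standard_normal((n, n))
H = (A + A.T) / 2

def thermalize(D, b):
    X = expm(-b * H / 2) @ D @ expm(-b * H / 2)
    return X / np.trace(X)

D = np.eye(n) / n                      # start at infinite temperature
for _ in range(k):
    D = thermalize(D, dbeta)

gibbs = expm(-k * dbeta * H) / np.trace(expm(-k * dbeta * H))
print(np.allclose(D, gibbs))           # True
```

With a temperature-dependent H(\beta), the factors at different steps would no longer commute, which is where the operator-ordering issues Toby discusses below come in.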

    • John Baez says:

      Yay, thanks! I have my own reasons for wanting to be interested in this construction, but it’s reassuring to see that people already use it for something.

    • Toby Bartels says:

      So the point is that the Hamiltonian is temperature-dependent, so you can’t get the correct thermal state using e^{-\beta H}; instead you use a lot of different factors of the form e^{-\Delta\beta H(\beta)}? And then the right way to do this is not \prod_{i=1}^n e^{-\Delta\beta_i H(\beta_i)} (which is not self-adjoint) but \prod_{i=n}^1 e^{-\Delta\beta_i H(\beta_i)/2} \prod_{i=1}^n e^{-\Delta\beta_i H(\beta_i)/2} (notice the reversed order in the first product). (And then you have to divide by the trace to normalize, which I’ve been ignoring.)

      If this is correct, then the theoretical state that we're approximating by this numerical technique may be written using product integrals (which I denote as \wp in place of the usual sum integral \int) as \wp_\beta^0 e^{\frac12H(\beta)d\beta} \wp_0^\beta e^{-\frac12H(\beta)d\beta}. (Since d\beta is not the \Delta\beta_i chosen by the approximation, but rather determined by the integral, it gets an extra minus sign when running backwards. Or use (\wp_0^\beta e^{-\frac12H(\beta)d\beta})^\dag (\wp_0^\beta e^{-\frac12H(\beta)d\beta}) if that seems clearer.)

  9. John Baez says:

    I think instead of calling this transformation of states “thermalization” I should call it “chilling” or “cooling” or “quenching”, since it’s the identity at \beta = 0, and then as we increase the parameter \beta it drives the system down to the lowest-energy state (or states).
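    A numerical illustration of this “chilling” (my own sketch, assuming a generic H with a nondegenerate lowest eigenvalue): as \beta grows, D_\beta moves toward the projection onto the lowest-energy eigenstate.

```python
# As beta increases, the thermalization of the maximally mixed state
# approaches the projector onto the ground state of H.
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(6)
n = 4
A = rng.standard_normal((n, n))
H = (A + A.T) / 2

vals, vecs = np.linalg.eigh(H)
ground = np.outer(vecs[:, 0], vecs[:, 0])   # projector onto the ground state

D = np.eye(n) / n
norms = []
for beta in [0.5, 2.0, 8.0]:
    X = expm(-beta * H / 2) @ D @ expm(-beta * H / 2)
    D_beta = X / np.trace(X)
    norms.append(np.linalg.norm(D_beta - ground))

print(norms)   # decreasing: chilling toward the ground state
```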

    • Blake Stacey says:

      “Thermalization” sounds a bit too much like the end state is guaranteed to be thermal, or that it’s a time-evolution process where the system is coupled to a heat bath. I like the sound of “chilling”.

  10. This topic has been addressed by Alain Connes in:

    Entropy and the spectral action, arXiv:1809.02944.

  11. John Baez says:

    You can read my current thoughts on “thermalization” in Section 5 of this paper:

    Getting to the bottom of Noether’s theorem,

    though it may not make sense without reading Section 4.
