What is Entropy?

I wrote a little book about entropy; here’s the current draft:

What is Entropy?

If you see typos and other mistakes, or have trouble understanding things, please let me know!

An alternative title would be 92 Tweets on Entropy, but people convinced me that title wouldn’t age well: in a decade or two, few people may remember what ‘tweets’ were.

Here is the foreword, which explains the basic idea.

Foreword

Once there was a thing called Twitter, where people exchanged short messages called ‘tweets’. While it had its flaws, I came to like it and eventually decided to teach a short course on entropy in the form of tweets. This little book is a slightly expanded version of that course.

It’s easy to wax poetic about entropy, but what is it? I claim it’s the amount of information we don’t know about a situation, which in principle we could learn. But how can we make this idea precise and quantitative? To focus the discussion I decided to tackle a specific puzzle: why does hydrogen gas at room temperature and pressure have an entropy corresponding to about 23 unknown bits of information per molecule? This gave me an excuse to explain these subjects:

• information
• Shannon entropy and Gibbs entropy
• the principle of maximum entropy
• the Boltzmann distribution
• temperature and coolness
• the relation between entropy, expected energy and temperature
• the equipartition theorem
• the partition function
• the relation between expected energy, free energy and entropy
• the entropy of a classical harmonic oscillator
• the entropy of a classical particle in a box
• the entropy of a classical ideal gas.
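
For readers who want to check that 23-bit figure right away, here is a minimal numerical sanity check (mine, not part of the book’s derivation): it simply converts the commonly tabulated standard molar entropy of hydrogen gas, about 130.7 J/(mol·K) at 298 K and 1 bar, into bits per molecule.

```python
import math

k_B = 1.380649e-23      # Boltzmann constant, J/K
N_A = 6.02214076e23     # Avogadro's number, 1/mol

# Assumed input: the commonly tabulated standard molar entropy of H2 gas
# at 298.15 K and 1 bar, in J/(mol K)
S_molar = 130.68

S_nats = S_molar / (N_A * k_B)     # entropy per molecule, in nats
S_bits = S_nats / math.log(2)      # the same, in bits

print(f"{S_nats:.2f} nats per molecule = {S_bits:.1f} bits per molecule")
# Prints roughly 15.7 nats = 22.7 bits: about 23 unknown bits per molecule.
```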

I have largely avoided the second law of thermodynamics, which says that entropy always increases. While fascinating, this is so problematic that a good explanation would require another book! I have also avoided the role of entropy in biology, black hole physics, etc. Thus, the aspects of entropy most beloved by physics popularizers will not be found here. I also never say that entropy is ‘disorder’.

I have tried to say as little as possible about quantum mechanics, to keep the physics prerequisites low. However, Planck’s constant shows up in the formulas for the entropy of the three classical systems mentioned above. The reason for this is fascinating: Planck’s constant provides a unit of volume in position-momentum space, which is necessary to define the entropy of these systems. Thus, we need a tiny bit of quantum mechanics to get a good approximate formula for the entropy of hydrogen, even if we are trying our best to treat this gas classically.
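
To illustrate that last point, here is a rough sketch of my own (the molecular mass, temperature and pressure below are assumed round numbers): Planck’s constant fixes the thermal de Broglie wavelength λ, and the translational part of the entropy is essentially the logarithm of how many cells of volume λ³ fit into the volume available per molecule.

```python
import math

h   = 6.62607015e-34    # Planck's constant, J*s
k_B = 1.380649e-23      # Boltzmann constant, J/K
u   = 1.66053907e-27    # atomic mass unit, kg

T = 298.15              # room temperature, K (assumed)
P = 1.0e5               # pressure, Pa (1 bar, assumed)
m = 2.016 * u           # mass of an H2 molecule, kg (assumed)

# Thermal de Broglie wavelength: lambda = h / sqrt(2 pi m k T)
lam = h / math.sqrt(2.0 * math.pi * m * k_B * T)

# Volume per molecule of an ideal gas, and how many lambda^3 cells fit in it
V_per_molecule = k_B * T / P
cells = V_per_molecule / lam**3

# Sackur-Tetrode (translational) entropy per molecule, converted to bits
S_bits = (math.log(cells) + 2.5) / math.log(2)

print(f"thermal wavelength = {lam:.2e} m, cells per molecule = {cells:.2e}")
print(f"translational entropy = {S_bits:.1f} bits per molecule")
# Roughly 20 bits; rotational states supply most of the remaining 2-3 bits
# needed to reach the roughly 23 bits mentioned in the foreword.
```

Halving h in this sketch would shift the answer by exactly 3 bits per molecule, which is one way to see that the constant genuinely enters the result.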

Since I am a mathematical physicist, this book is full of math. I spend more time trying to make concepts precise and looking into strange counterexamples than an actual ‘working’ physicist would. If at any point you feel I am sinking into too many technicalities, don’t be shy about jumping to the next tweet. The really important stuff is in the boxes. It may help to reach the end before going back and learning all the details. It’s up to you.

Acknowledgements

I thank Jacopo Bertolotti for the animation at the top of this article. I’ve also gotten important corrections from many people, some listed in the comments below.

51 Responses to What is Entropy?

  1. Toby Bartels says:


    That’s a great definition of entropy! But I would think so, since you taught me what entropy is (not in class, but in This Week’s Finds week 27).

    • John Baez says:

      Thanks! That’s interesting: I don’t even remember what’s in week27. I’ll check it out. But a lot of the ideas here have been brewing very slowly ever since my grad school pal gave me a copy of Reif’s book on thermodynamics and stat mech.

      • Toby Bartels says:

        The only definition of entropy in Week 27 is -\mathrm{tr}(\rho \ln \rho). Still, the discussion of pure and mixed states there set me on the road to understanding.

        • Toby Bartels says:

          (Did the backslashes all get stripped? Well, you can probably figure out what I meant.)

      • John Baez says:

        It’s possible the new fancier WordPress interface for entering comments automatically destroys LaTeX. Also “operatorname” doesn’t work. But I fixed all that.

  2. Marc Nardmann says:

    Typos on page 7: kilobyte -> kilobit [twice] (alternatively: 1024 -> 8192 [twice]), can related -> can relate, coints -> coins

  3. Bill Bottenberg says:

    John! What a pleasure this was to read. Now I have to go back and do it piece by piece, step by step. I love the inversion of getting at Thermodynamics from inside out. So much to think about, Thanks, Bill

    • John Baez says:

      Thanks! A lot of people say thermodynamics only made sense to them after they learned statistical mechanics, and statistical mechanics only made sense to me after I learned information theory, so here I started with information theory. I would start even earlier, with probability theory, but that still doesn’t make sense to me. (Kinda kidding, but read the section “what is probability?”)

  4. Mark James says:

    On page 7 you equate 1024 bits with 1 kilobyte. That’s not right as you need 8192 bits for a kilobyte. Also, there’s a typo “1024 fair coints”.

  5. Bill Bottenberg says:

    I went back and read up on the Sackur-Tetrode equation in Wikipedia, since something was bothering me about my understanding of it all. In PChem at UCR in 1966 it was mentioned in passing, and it came up several times through my graduate career, but I never really dug into it. What primarily bothered me was the thermal de Broglie length and Planck’s constant showing up in the S-T eqn. Sackur and Tetrode independently developed their equation in 1919, well before de Broglie got the ball rolling in 1924 with the wave nature of material particles. Wikipedia makes an ‘enigmatic’ comment about using a tiny volume in phase space as a trick to go from a classical continuous space to a discrete space to enable calculations. This was all regularized as the quick minds of the 1920s figured out quantum mechanics. Now I wonder whether Sackur or Tetrode used h as the symbol for the discrete volume element, and, if they didn’t, how quickly h came to be used.

    I guess I have two comments about all this: 1) how useful the analysis and structuring you have made here is for going back and working through in detail how the stuff fits together. I’m blown away by how little I really thought through the physics and how much fun it is to finally get around to it. 2) how amazing the period from 1900 to 1935 was for physics. Beyond too, but that’s another matter…

    (I pick 1935 as when EPR came out. It took a long time to digest that one.)

  6. Bill Bottenberg says:

    Wholey Moley, just now reading “On the 100th anniversary of the Sackur–Tetrode equation”,
    W. Grimus, 4 March 2013, Ann. der Phys.

    Obviously I got 1919 wrong as to the date, 1911-1912 is correct and even more astounding!

    I’m not going to pay for the original articles, but Grimus’ analysis does the job. It is clear that both regarded Planck’s constant as fundamental to the argument for discretizing phase space, and both used actual data to figure out that their initial use of (zh) as the discretizing parameter yielded a value of ~1 for z, establishing Planck’s constant as fundamental. This is a wow moment in physics for me. Not only did they independently approach a problem with great theoretical insight, but they used experimental data to make their point. And it all starts with S = k ln(W).

    • John Baez says:

      It is indeed fun stuff! It sounds like you’re enjoying my book for the same reason I enjoyed writing it: there are a lot of interesting subtle issues lurking in physics, and often courses are in too much of a rush to cover the material to dig into these subtleties. I was amazed when I realized that the formula for the entropy of a ‘classical’ ideal gas involves Planck’s constant.

      The paper by W. Grimus is free on the arXiv:

      • W. Grimus, On the 100th anniversary of the Sackur-Tetrode equation.

  7. Greg Egan says:

    p3 while keeping it thermal equilibrium

    should be

    keeping it (in or at) thermal equilibrium

  8. Paul Schwahn says:

    Awesome book! On p. 15 the characteristic function is redundant – you probably want to integrate chi_S over the entire real line.

  9. Jan says:

    This book is truly a great contribution to science, not in that it contains new results, but in that it provides young new scientists (and the older ones too) with a fantastic source and entry into this important notion of Entropy.

    • John Baez says:

      Thanks! Yes, I’m just trying to explain things clearly—adding careful sentences to explain the usual equations.


    • I’ll read it as soon as I’m finished with The Oxford Handbook of the History of Quantum Interpretations. I recently read Susskind’s Theoretical Minimum on GR (the only one of the series I’ve read, so far). It was surprisingly good. Volume 5 will be on cosmology and volume 6 on thermodynamics and statistical physics.

  10. hwold says:

    Is there a better justification than “well, it’s the only candidate I can think of, and it works” for the introduction of the Planck constant for making dp dq dimensionless?

    • John Baez says:

      If you’re going to try to find a constant with units of action whose role is to let you chop position-momentum space into small ‘cells’ that you count to count states, I can’t imagine anything better than Planck’s constant h. As I mention in the section ‘Entropy and the thermal wavelength’, this is exactly what Bohr and Sommerfeld did in their early approach to quantization, which was later made rigorous in the study of geometric quantization. Check out this:

      • Wikipedia, Old quantum theory: thermal properties of the harmonic oscillator.

      It’s also interesting to look at the history. Sackur and Tetrode were forced to choose a unit of volume in phase space when they first computed the entropy of an ideal gas. They chose one based on Planck’s constant, and it gave the right answer!

      I recommend this:

      • Walter Grimus, On the 100th anniversary of the Sackur–Tetrode equation.
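
      To see concretely what choosing h as the cell size buys, here is a small numerical sketch of my own (the mass, spring constant and temperature are arbitrary assumed values): integrating the Boltzmann factor for a harmonic oscillator over the pq plane and dividing by h reproduces the classical partition function kT/(ħω).

```python
import numpy as np

h    = 6.62607015e-34          # Planck's constant, J*s
hbar = h / (2.0 * np.pi)
k_B  = 1.380649e-23            # Boltzmann constant, J/K

m, k, T = 1.0, 1.0, 300.0      # assumed: 1 kg mass, 1 N/m spring, 300 K
omega = np.sqrt(k / m)
beta  = 1.0 / (k_B * T)

# Widths of the Boltzmann weight in p and q
p_sig = np.sqrt(m * k_B * T)
q_sig = np.sqrt(k_B * T / k)

p = np.linspace(-8.0 * p_sig, 8.0 * p_sig, 1201)
q = np.linspace(-8.0 * q_sig, 8.0 * q_sig, 1201)
dp, dq = p[1] - p[0], q[1] - q[0]
P, Q = np.meshgrid(p, q)

H = P**2 / (2.0 * m) + k * Q**2 / 2.0

# 'Count the cells': integrate exp(-beta H) over the pq plane, then divide
# by the cell size h to get a dimensionless number of states.
Z_numeric = np.exp(-beta * H).sum() * dp * dq / h
Z_exact   = k_B * T / (hbar * omega)

print(f"numeric: {Z_numeric:.4e}   exact kT/(hbar omega): {Z_exact:.4e}")
# The two agree; without the 1/h this count would carry units of action,
# and the entropy, which involves log Z, would not even be well defined.
```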

      • Andrew Poelstra says:

        I think it would be an improvement to the paper to add a sentence or two, perhaps citing that paper, saying that you can actually determine what this “h” must be, experimentally, and that it turns out to be Planck’s constant. The climax of the linked paper, where the authors derive hbar to within 1% just using standard thermodynamic data tables, was pretty cool!

        I think the real “aha” moment here is: in a classical setting (hbar = 0) you would get that entropy as defined is infinite. So presumably if you tried to determine what it was by experiments relating energy and temperature, you would get nonsensical/inconsistent/material-dependent/whatever results. But in fact, we CAN measure this quantity, so real life DOES seem to have a quantized phase space, AND this quantization matches the quantization we would later derive from quantum mechanics. AND that this was noticed by Sackur-Tetrode a decade before QM.

        As written, the text says that h is a natural choice, that “this turns out to be correct”, and then it launches into a multi-page computation that isn’t obviously connected to any physical experiment. So the reader is given the impression that you’ve introduced h as a “fudge factor”, presumably by knowing in advance about quantum mechanics. Which is not very satisfying, and gives the impression that you’re spending multiple pages doing some sort of “fudged classical mechanics” for unclear reasons.

      • John Baez says:

        Thanks, I agree with your suggested improvements. Something like your version of the story will clarify, both historically and conceptually, how Planck’s constant entered what was superficially a purely classical computation of the entropy of an ideal gas.

        I hadn’t included a link to Walter Grimus’ nice article because I’d only seen it on the phys.org website and I wasn’t sure that would be a stable location in the long run. But now I see it’s on the arXiv, so I will link to it in my book.

        It will take me a few days to do this, since I’m trying to finish off another paper.

  11. This is fascinating and I appreciate it. I look forward to reading. Eric

  12. Jay Somedon says:


    Hi, is the file server down? I can’t download from the link.

    • John Baez says:

      It’s working fine for me now. I don’t know what you experienced, but give it another try. It would be funny if you were trying to download it right when I was uploading a new version.

  13. scentoni says:

    I’m enjoying seeing this all assembled together!
    Typo p. 47:
    > Indeed, the when people tried

    • Scott Centoni says:

      Typo p. 53:
      > energy depends depends quadratically

    • Scott Centoni says:

      Typo p. 54:
      > as soon as T gets exceeds about 90 kelvin.

    • Scott Centoni says:

      Typo p.56:
      > so ħω/k units of temperature!

    • Scott Centoni says:

      Typo p. 72:
      > Here’s the game plane.

    • John Baez says:

      Thanks for all these corrections, Scott! I’ve made all those changes, along with lots of other changes suggested by other people, and you can download a new improved version.

      • Scott Centoni says:

        I’m happy to help. I also found these in the most recent version:

        p.99: We use the subscript i for a gas of N distinguishable particles.

        p.100: the entropy entropy of an ideal gas

        p.109: high densities V/N

        I’ve taken thermodynamics courses approached from different disciplines and I really liked Kittel and Kroemer’s Thermal Physics. Their notation is to use σ = S/k and τ = kT in place of S and T, so that TS = (kT)(S/k) = τσ, manifesting the basic truth from statistical mechanics that entropy is dimensionless and temperature has the same units as energy. (Much like a basic truth of SR that time has the same units as space and c is just a unit conversion factor.)

        • Scott Centoni says:

          And of course, similar to your use of H and β.

        • John Baez says:

          Thanks, Scott—those errors are fixed in the latest version!

          I think it’s important to have a notation for dimensionless entropy, so I’m glad Kittel and Kroemer have one, but people often call Shannon entropy H, so I used that—and argued that Shannon entropy doesn’t need to be computed in base 2.

          I felt much less need for a symbol for dimensionless temperature, because its reciprocal, dimensionless coolness, is already called β, and I felt like arguing that coolness is more fundamental than temperature.

        • John Baez says:

          By the way, the new, fancier WordPress interface for entering comments seems to break LaTeX and even HTML, making it impossible for commenters to enter Greek letters and have them look decent.

          LaTeX: beta

          HTML: β

          I can fix them up retroactively, but it’s sad.

        • Toby Bartels says:


          I just write 𝑆 and 𝑇 for dimensionless entropy and temperature. The kelvin is a unit of energy just as much as the joule or the calorie per mole, and Boltzmann’s constant is 1.

          You can make a Greek β by directly entering the Unicode character. (Well, I’m assuming this works as I make this comment, which has some other non-ASCII Unicode characters too, so we’ll see.)

        • John Baez says:

          I just write 𝑆 and 𝑇 for dimensionless entropy and temperature. The kelvin is a unit of energy just as much as the joule or the calorie per mole, and Boltzmann’s constant is 1.

          This is a fine attitude as long as you don’t talk to experimental physicists or engineers who are wedded to the SI system. In my textbook I stuck with the SI system because I wanted to show we really can use the math to predict some experimental measurements that you see tabulated on NIST’s website. In my theoretical work I sometimes set Planck’s constant and Boltzmann’s constant to 1, but other times I leave them as variables because it’s interesting to study the ‘deformation theory’ involved in the limit \hbar \to 0 and k \to 0 (which turn out to be formally analogous in ways that I’m fascinated by).

        • Toby Bartels says:


          Deformation theory is different. But setting that aside, you can still use SI units; it’s just that you have a choice of such units. It can actually be nice to have everything given in both units, like giving an everyday quantity in both American and metric, or giving something from particle physics in both kg and eV.

  14. Matt says:

    Hi John,

    An acquaintance of mine asked Google’s AI-powered Notebook LM to create a short podcast about your book, might you take the time to listen and tell if it’s an accurate summary? The podcast is displayed as a video in this LinkedIn post:
    https://www.linkedin.com/posts/benoitraphael_%C3%A9coutez-ce-podcast-qui-parle-vous-allez-ugcPost-7242393124386308097-nzXy
    Thanks a lot,

    Matt

    • John Baez says:

      I don’t even have time to listen to math or physics podcasts made by experts — it’s much faster to browse their papers, and I have my own work to do — much less a podcast made by some machine summarizing what I said.

      Skip the podcast: read my book, or even the first few pages of the book.

  15. John Baez says:

    Someone emailed me and wrote:

    Could you define the “classical” harmonic oscillator for the benefit of those who are thinking of a simple spring or pendulum for which momentum and position are sinusoidal functions of time, temperature would seem to play no role and probability distributions are nowhere to be seen?

    The harmonic oscillator is not just about springs. Almost any system that vibrates can be approximately modeled as a harmonic oscillator with n position and n momentum coordinates! These include vibrations of a crystal, radiation in a box, the surface of a drum, a molecule (when treated classically), a violin string, a classical string in string theory, etc. etc. So, the harmonic oscillator is fundamental to physics.

    But yes, a classical harmonic oscillator can be used to model a rock of mass m hanging on a spring of spring constant k.  It oscillates with a frequency

    \omega = \sqrt{k/m}

    In Hamiltonian mechanics you derive its equations of motion from the formula for its energy, or Hamiltonian:

    H = p^2/2m + kq^2/2

    where p is its momentum and q is its position (e.g. how high the rock is above its equilibrium position).  With an area-preserving change of coordinates we can write

    H = \omega (p^2 + q^2)/2

    for some new coordinates p and q. This simplifies things, so it is a standard way to think about the harmonic oscillator, especially when we introduce more position and momentum coordinates.
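
    If you want to check that change of coordinates symbolically, here is a tiny sketch of my own (the specific rescaling P = p/(mk)^{1/4}, Q = (mk)^{1/4} q below is one standard area-preserving choice, assumed for illustration):

```python
import sympy as sp

p, q, m, k = sp.symbols('p q m k', positive=True)
omega = sp.sqrt(k / m)

H = p**2 / (2 * m) + k * q**2 / 2          # original Hamiltonian

# One standard area-preserving rescaling (an assumed choice, for illustration):
#   P = p / (m k)**(1/4),   Q = (m k)**(1/4) * q
P = p / (m * k)**sp.Rational(1, 4)
Q = (m * k)**sp.Rational(1, 4) * q

# The Jacobian determinant of (p, q) -> (P, Q) is 1, so areas dp dq are preserved
jac = sp.Matrix([P, Q]).jacobian([p, q]).det()
print(sp.simplify(jac))                    # prints 1

# In the new coordinates the Hamiltonian becomes omega * (P^2 + Q^2) / 2
print(sp.simplify(H - omega * (P**2 + Q**2) / 2))   # prints 0
```

    The unit Jacobian is the point: areas in the pq plane are unchanged, which is what ‘area-preserving’ means here.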

    In statistical mechanics we assume we know nothing about our system except a few facts, e.g. the expected value of the energy, and use the principle of maximum entropy to derive a probability distribution of states.  The stuff in the book about the principle of maximum entropy was supposed to explain the overall methodology here. I mostly apply this methodology to systems where we know only the expected value of the energy, since this introduces the concept of temperature.   The harmonic oscillator is a great example to illustrate this.

    Note that in classical mechanics we seek to predict the future state of a system given complete knowledge of its present state, while in the simplest part of classical statistical mechanics we seek to find the best probability distribution of states now based on partial knowledge now. So the two subjects look almost disjoint at first: one is all about change and involves no probability theory, while the other is all about probability theory and doesn’t mention change at all—at first. If you go further these subjects become unified, but I wanted to keep my book short, so I left out all discussions of time and change until a quick bit about the Second Law near the end.

    Maybe we mean a spring where we never look and don’t know its p & q at any particular time, so we model the oscillator’s state as random (that doesn’t seem quite right)?

      Yes, that’s almost right, but: if we don’t know anything about the oscillator’s state, the principle of maximum entropy attempts to give us the ‘uniform probability distribution’ on the pq plane, which doesn’t exist. If we know the expected energy we get a nice Gaussian probability distribution on the pq plane, which I compute.

    Do we apply thermodynamics to a classical spring only to understand thermodynamics better, or does this also help us to understand springs better?

    We do it to understand the thermodynamic behavior of any vibrating system: how its expected energy is related to its temperature, entropy, and free energy.

    I’m sure lots of people will be freaked out by how I start by applying statistical mechanics to a harmonic oscillator—it’s somewhat radical to start with such a simple system, but as you’ll see, it’s the perfect lead-in to the ideal gas. I hope this helps a little.

    • John Baez says:

      A further exchange with my correspondent:

      (I was one of those “poor benighted souls” who thought temperature was nothing but energy per degree of freedom).

      Oh no!  Well, it’s never too late to learn the truth.

      Teachers emphasize the equipartition theorem too much, and don’t emphasize how rarely its hypotheses hold. They fail to point out that obviously an ice cube or boiling pot of water must massively violate these hypotheses, since it takes a lot of energy to raise its temperature a wee bit.

      Even my book spends too much time on systems that obey the hypotheses of this theorem.  At least I state the hypotheses (on page 47) and point out that they don’t always hold. But I don’t emphasize how ridiculously rare it is for them to hold! I say it doesn’t hold for quantum systems, but some people may think those are exotic, rather than almost everything we see, like ice cubes and boiling pots of water. And I don’t point out that the systems with negative temperature, which I discussed earlier, don’t have temperature proportional to energy.

      Many years after my college days I had the thought that if I could just understand entropy, etc., in the most simple case that I could think of (a single particle in a one-dimensional box), then the fog in my understanding might start to clear; alas, I couldn’t work it out. So, I was excited when I saw your book and found that you had explained exactly that problem! 

      It’s well-known, but I worked it out myself while writing the tweets that became my book. It’s surprisingly subtle, mainly in how Planck’s constant gets involved.

      Your explanation of the difference in viewpoint between classical and statistical mechanics in the context of a classical harmonic oscillator was very helpful. I’m still a bit confused about the probability distribution: in the case of a simple classical spring in motion, the energy E is constant and p & q vary with time over a finite range; what do I have wrong?

      Nothing is wrong, so I don’t know what you’re worrying about. I’ll just say some stuff.

      I’ll take the perspective of statistical mechanics:

      1) If you know the energy E exactly and know nothing more, to maximize entropy p and q will be described by a probability distribution evenly spread out over the ellipse

      p^2/2m + kq^2/2 = E

      In the nicer coordinates I mentioned earlier, this becomes a circle, and we get the usual rotationally invariant probability distribution on the circle.

      2) But my book is all about what happens if you know the expected value of the energy.  I explain how this is the same as knowing the temperature.  If we maximize entropy under this constraint we get a Gaussian probability distribution in the pq plane.

      In physics, 1) is called a microcanonical ensemble and I don’t talk about it at all in my book.  2) is called a canonical ensemble or Boltzmann distribution or Gibbs distribution.
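
      If it helps to see this numerically, here is a small sampling sketch of my own (with arbitrary assumed mass, spring constant and temperature): in case 1) every sample has exactly the energy E, while in case 2) the energy fluctuates around its expected value.

```python
import numpy as np

rng = np.random.default_rng(0)
k_B = 1.380649e-23               # Boltzmann constant, J/K
m, k, T = 1.0, 1.0, 300.0        # assumed: 1 kg mass, 1 N/m spring, 300 K
E = k_B * T                      # for case 1), pick the exactly-known energy to be k_B T
n = 100_000

# 1) Microcanonical: rotationally invariant on the energy-E ellipse
#    p^2/2m + k q^2/2 = E (uniform in the phase angle)
theta = rng.uniform(0.0, 2.0 * np.pi, n)
p1 = np.sqrt(2.0 * m * E) * np.cos(theta)
q1 = np.sqrt(2.0 * E / k) * np.sin(theta)
E1 = p1**2 / (2.0 * m) + k * q1**2 / 2.0

# 2) Canonical: maximum entropy given the expected energy, i.e. a Gaussian
#    on the pq plane with p ~ N(0, m k_B T) and q ~ N(0, k_B T / k)
p2 = rng.normal(0.0, np.sqrt(m * k_B * T), n)
q2 = rng.normal(0.0, np.sqrt(k_B * T / k), n)
E2 = p2**2 / (2.0 * m) + k * q2**2 / 2.0

print(f"microcanonical: mean E = {E1.mean():.2e} J, std = {E1.std():.2e} J")  # std is ~0
print(f"canonical:      mean E = {E2.mean():.2e} J, std = {E2.std():.2e} J")  # both ~ k_B T
```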

      In the case of a real macroscopic spring, does the ambient T really cause E to vary in any meaningful way?

      Sure! Like most systems, a spring at fixed temperature will have energy fluctuations: they’re called thermal fluctuations.

      Practically speaking, if you have any system at nonzero temperature, energy will go between that system and its surroundings in a random way, so its energy will fluctuate randomly.

      But in my book I deliberately avoid talking about ‘mechanisms’ that make quantities vary randomly, because this can become a complicated can of worms. Instead I take Jaynes’ viewpoint that if we have limited information about a situation we should model it using the maximum entropy probability distribution subject to the constraints of what we know. Knowing the temperature is just a fancy name for knowing the expected energy—not that they’re proportional, of course!—so at fixed temperature we will usually get a probability distribution of energies with nonzero standard deviation.

      Here’s an exercise which I should add to the book:

      Puzzle. Suppose you have a classical harmonic oscillator of mass 1 kilogram and spring constant 1 newton/meter at temperature 300 kelvin (about room temperature). Compute the expected value of its energy and the standard deviation of its energy!

      I give the formula for the probability distribution of p and q at any temperature T, so you can use those to figure out the probability distribution of energy and the rest is calculus.

      I’ll give a hint: the expected energy is tiny (because our oscillator has just one degree of freedom, and the equipartition theorem applies) and the standard deviation is tiny too.

      I should also give this exercise for a mole of an ideal monatomic gas at 300 kelvin. Here the expected energy is much bigger (since we have 3 times Avogadro’s number degrees of freedom, and the equipartition theorem still applies)… but the standard deviation in energy is vastly smaller than the expected energy! It should be roughly 1/\sqrt{N} as big as the expected energy, where N is Avogadro’s number. So these fluctuations are absurdly hard to detect until you look at systems with far fewer atoms (as people now do).
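
      For what it’s worth, here is a short sketch of the arithmetic behind that hint (my own check, using the Boltzmann distribution for the oscillator and standard equipartition results for a mole of monatomic ideal gas):

```python
import math

k_B = 1.380649e-23      # Boltzmann constant, J/K
N_A = 6.02214076e23     # Avogadro's number, 1/mol
T = 300.0               # temperature, K

# Classical oscillator (m = 1 kg, k = 1 N/m): its energy has two quadratic
# terms, and in the Boltzmann distribution <E> = k_B T and std(E) = k_B T.
E_osc  = k_B * T
dE_osc = k_B * T
print(f"oscillator: <E> = {E_osc:.2e} J, std = {dE_osc:.2e} J")

# One mole of monatomic ideal gas: 3 N_A quadratic (momentum) terms,
# so <E> = (3/2) N_A k_B T and Var(E) = (3/2) N_A (k_B T)^2.
N = N_A
E_gas  = 1.5 * N * k_B * T
dE_gas = math.sqrt(1.5 * N) * k_B * T
print(f"gas: <E> = {E_gas:.1f} J, std = {dE_gas:.2e} J")
print(f"relative fluctuation = {dE_gas / E_gas:.1e}")   # ~1e-12, i.e. of order 1/sqrt(N)
```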

  16. Niraj Venkat says:

    I enjoyed the way in which entropy was introduced in the form of a central mystery (namely hydrogen) and developed into a hierarchy of goals that led to solving this mystery. Thank you for writing this book, it was an excellent journey.

    I have found some minor errata:

    Pg 14 small typo in the box: countable collection of subets -> subsets
    Pg 14 second line from the bottom, should be: If S,T ∈ M and S ⊆ T then m(T) = m(S) + m(T − S).
    Pg 27 higher-dimensional version of a tetahedron -> tetrahedron
    Pg 35 put it in an magnetic field -> a magnetic field
    Pg 58 number of acccessible states -> accessible states
    Pg 73 In Puzzle 44, entropy of an energetic set is negative of what we derived in Pg 71
    Pg 80 with a a free classical particle -> remove one “a”
    Pg 92 integral of \exp(−\vec{p} \cdot \vec{p}/2m) -> missing a factor of \beta in the exponential

    • John Baez says:

      Thanks, I’m glad you liked the “plot” of the book. I thought it would be fun to make a whole book about a single puzzle, where part of the puzzle is understanding what the answer to the puzzle means.

      And thanks for catching those mistakes! I’ll fix them now. I don’t know why that one formula doesn’t parse, but I can read it behind the scenes here.

    • John Baez says:

      Okay, the fixed version is on my website at the usual place. By the way, there’s also a version using STIX fonts, which some say looks better on a cell phone.
