I wrote a little book about entropy; here’s the current draft:
If you see typos and other mistakes, or have trouble understanding things, please let me know!
An alternative title would be 92 Tweets on Entropy, but people convinced me that title wouldn’t age well: in a decade or two, few people may remember what ‘tweets’ were.
Here is the foreword, which explains the basic idea.
Foreword
Once there was a thing called Twitter, where people exchanged short messages called ‘tweets’. While it had its flaws, I came to like it and eventually decided to teach a short course on entropy in the form of tweets. This little book is a slightly expanded version of that course.
It’s easy to wax poetic about entropy, but what is it? I claim it’s the amount of information we don’t know about a situation, which in principle we could learn. But how can we make this idea precise and quantitative? To focus the discussion I decided to tackle a specific puzzle: why does hydrogen gas at room temperature and pressure have an entropy corresponding to about 23 unknown bits of information per molecule? This gave me an excuse to explain these subjects:
• information
• Shannon entropy and Gibbs entropy
• the principle of maximum entropy
• the Boltzmann distribution
• temperature and coolness
• the relation between entropy, expected energy and temperature
• the equipartition theorem
• the partition function
• the relation between expected energy, free energy and entropy
• the entropy of a classical harmonic oscillator
• the entropy of a classical particle in a box
• the entropy of a classical ideal gas.
I have largely avoided the second law of thermodynamics, which says that entropy always increases. While fascinating, this is so problematic that a good explanation would require another book! I have also avoided the role of entropy in biology, black hole physics, etc. Thus, the aspects of entropy most beloved by physics popularizers will not be found here. I also never say that entropy is ‘disorder’.
I have tried to say as little as possible about quantum mechanics, to keep the physics prerequisites low. However, Planck’s constant shows up in the formulas for the entropy of the three classical systems mentioned above. The reason for this is fascinating: Planck’s constant provides a unit of volume in position-momentum space, which is necessary to define the entropy of these systems. Thus, we need a tiny bit of quantum mechanics to get a good approximate formula for the entropy of hydrogen, even if we are trying our best to treat this gas classically.
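For readers who like to check numbers right away, here is a minimal sketch of that claim, assuming the tabulated standard molar entropy of hydrogen gas is roughly 130.7 J/(mol·K) at 298 K and 1 bar. It converts that value to bits per molecule and also estimates the translational part with the Sackur–Tetrode formula, where Planck’s constant supplies the unit of phase-space volume:

```python
# Rough check of the "about 23 bits per molecule" figure for hydrogen,
# assuming the tabulated standard molar entropy of H2 gas,
# roughly 130.7 J/(mol K) at 298 K and 1 bar.
import math

k  = 1.380649e-23      # Boltzmann's constant, J/K
NA = 6.02214076e23     # Avogadro's number, 1/mol
h  = 6.62607015e-34    # Planck's constant, J s

S_molar = 130.7        # J/(mol K), assumed table value for H2
bits_per_molecule = S_molar / (NA * k * math.log(2))
print(f"measured entropy: {bits_per_molecule:.1f} bits per molecule")   # about 22.7

# Translational part from the Sackur-Tetrode formula,
#   S/(N k) = ln( (V/N) / lambda^3 ) + 5/2,
# with thermal wavelength lambda = h / sqrt(2 pi m k T).
# Note that h appears: as h -> 0 this entropy diverges.
T, P = 298.0, 1.0e5                  # temperature (K) and pressure (Pa)
m = 2 * 1.6735e-27                   # mass of an H2 molecule, kg
lam = h / math.sqrt(2 * math.pi * m * k * T)
V_per_N = k * T / P                  # volume per molecule of an ideal gas
S_trans = math.log(V_per_N / lam**3) + 2.5
print(f"translational part: {S_trans / math.log(2):.1f} bits per molecule")  # about 20
# The remaining couple of bits come mostly from the molecule's rotation.
```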
Since I am a mathematical physicist, this book is full of math. I spend more time trying to make concepts precise and looking into strange counterexamples than an actual ‘working’ physicist would. If at any point you feel I am sinking into too many technicalities, don’t be shy about jumping to the next tweet. The really important stuff is in the boxes. It may help to reach the end before going back and learning all the details. It’s up to you.
Acknowledgements
I thank Jacopo Bertolotti for the animation at the top of this article. I’ve also gotten important corrections from many people, some listed in the comments below.


That’s a great definition of entropy! But I would think so, since you taught me what entropy is (not in class, but in This Week’s Finds week 27).
Thanks! That’s interesting: I don’t even remember what’s in week27. I’ll check it out. But a lot of the ideas here have been brewing very slowly ever since my grad school pal gave me a copy of Reif’s book on thermodynamics and stat mech.
The only definition of entropy in Week 27 is the von Neumann entropy $S = -\operatorname{tr}(\rho \ln \rho)$. Still, the discussion of pure and mixed states there set me on the road to understanding.
(Did the backslashes all get stripped? Well, you can probably figure out what I meant.)
It’s possible the new fancier WordPress interface for entering comments automatically destroys LaTeX. Also “operatorname” doesn’t work. But I fixed all that.
Typos on page 7: kilobyte -> kilobit [twice] (alternatively: 1024 -> 8192 [twice]), can related -> can relate, coints -> coins
Thanks—fixed! I’m also making other updates: improvements in the exposition.
The “can related” typo is still there.
Whoops! Fixed. I also added a new section: “Entropy comes in two parts”.
John! What a pleasure this was to read. Now I have to go back and do it piece by piece, step by step. I love the inversion of getting at Thermodynamics from inside out. So much to think about, Thanks, Bill
Thanks! A lot of people say thermodynamics only made sense to them after they learned statistical mechanics, and statistical mechanics only made sense to me after I learned information theory, so here I started with information theory. I would start even earlier, with probability theory, but that still doesn’t make sense to me. (Kinda kidding, but read the section “what is probability?”)
On page 7 you equate 1024 bits with 1 kilobyte. That’s not right as you need 8192 bits for a kilobyte. Also, there’s a typo “1024 fair coints”.
Thanks—fixed!
I went back and read up on the Sackur–Tetrode equation in Wikipedia, since there was something that was bothering me about my understanding of it all. I know in PChem at UCR in 1966 it was mentioned in passing, and it was mentioned again several times through my graduate career, but I never really dug into it. What primarily bothers me is the thermal de Broglie wavelength and Planck’s constant showing up in the S–T equation. Sackur and Tetrode independently developed their equation in 1919, well before de Broglie got the ball rolling in 1924 with the wave nature of material particles. Wikipedia makes an ‘enigmatic’ comment about using a tiny volume in phase space as a trick to go from a classical continuous space to a discrete space to enable calculations. This all was regularized as the quick minds of the 1920s figured out quantum mechanics. Now I wonder whether Sackur or Tetrode used h as the symbol for the discrete volume element, and if they didn’t, how quickly h came to be used.
I guess I have two comments about all this. 1) The analysis and structure you have built here are very useful for going back and working through, in detail, how the stuff fits together. I’m blown away by how little I really thought through the physics and how much fun it is to finally get around to it. 2) How amazing the period from 1900 to 1935 was for physics. Beyond that too, but that’s another matter…
(I pick 1935 as when EPR came out. It took a long time to digest that one.)
Wholey Moley, just now reading “On the 100th anniversary of the Sackur–Tetrode equation”,
W. Grimus, Annalen der Physik, 4 March 2013.
Obviously I got 1919 wrong as to the date, 1911-1912 is correct and even more astounding!
I’m not going to pay for the original articles, but Grimus’ analysis does the job. It is clear that both regarded Planck’s constant as fundamental to the argument for discretizing phase space and both used actual data to figure out that their initial use of (zh) as the discretizing parameter yielded a value of ~1 for z, establishing Planck’s constant as fundamental. This is a wow moment in physics for me. Not only did they independently approach a problem with great theoretical insight, but they used experimental data to make their point. And it all starts with S = k ln(W).
It is indeed fun stuff! It sounds like you’re enjoying my book for the same reason I enjoyed writing it: there are a lot of interesting subtle issues lurking in physics, and often courses are in too much of a rush to cover the material to dig into these subtleties. I was amazed when I realized that the formula for the entropy of a ‘classical’ ideal gas involves Planck’s constant.
The paper by W. Grimus is free on the arXiv:
• W. Grimus, On the 100th anniversary of the Sackur-Tetrode equation.
p3 while keeping it thermal equilibrium
should be
keeping it (in or at) thermal equilibrium
Thanks! This is now fixed in the version on my website.
Awesome book! On p. 15 the characteristic function is redundant – you probably want to integrate chi_S over the entire real line.
Whoops! Yes. Thanks! This was the world’s sketchiest explanation of Lebesgue integration, but I want to integrate over all of $\mathbb{R}$.
This book is truly a great contribution to science, not in that it contains new results, but in that it provides young new scientists (and the older ones too) with a fantastic source and entry into this important notion of Entropy.
Thanks! Yes, I’m just trying to explain things clearly—adding careful sentences to explain the usual equations.
I’ll read it as soon as I’m finished with The Oxford Handbook of the History of Quantum Interpretations. I recently read Susskind’s Theoretical Minimum on GR (the only one of the series I’ve read, so far). It was surprisingly good. Volume 5 will be on cosmology and volume 6 on thermodynamics and statistical physics.
Is there a better justification than “well, it’s the only candidate I can think of, and it works” for the introduction of the Planck constant for making dp dq dimensionless?
If you’re going to try to find a constant with units of action whose role is to let you chop position-momentum space into small ‘cells’ that you count to count states, I can’t imagine anything better than Planck’s constant h. As I mention in the section ‘Entropy and the thermal wavelength’, this is exactly what Bohr and Sommerfeld did in their early approach to quantization, which was later made rigorous in the study of geometric quantization. Check out this:
• Wikipedia, Old quantum theory: thermal properties of the harmonic oscillator.
It’s also interesting to look at the history. Sackur and Tetrode were forced to choose a unit of volume in phase space when they first computed the entropy of an ideal gas. They chose one based on Planck’s constant, and it gave the right answer!
I recommend this:
• Walter Grimus, 100th anniversary of the Sackur–Tetrode equation.
I think it would be an improvement to the paper to add a sentence or two, perhaps citing that paper, saying that you can actually determine what this “h” must be, experimentally, and that it turns out to be Planck’s constant. The climax of the linked paper, where the authors derive hbar to within 1% just using standard thermodynamic data tables, was pretty cool!
I think the real “aha” moment here is: in a classical setting (hbar = 0) you would get that entropy as defined is infinite. So presumably if you tried to determine what it was by experiments relating energy and temperature, you would get nonsensical/inconsistent/material-dependent/whatever results. But in fact, we CAN measure this quantity, so real life DOES seem to have a quantized phase space, AND this quantization matches the quantization we would later derive from quantum mechanics. AND that this was noticed by Sackur-Tetrode a decade before QM.
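To make that concrete, here is a minimal sketch of that kind of inference, assuming the tabulated standard molar entropy of argon is about 154.8 J/(mol·K) at 298.15 K and 1 bar, and inverting the Sackur–Tetrode formula to solve for the phase-space cell size; this is a rough illustration, not the analysis in Grimus’s paper:

```python
# Sketch: solve the Sackur-Tetrode formula for the phase-space cell size "h",
# given a measured entropy.  Assumed input: the standard molar entropy of
# argon, about 154.8 J/(mol K) at 298.15 K and 1 bar (a standard table value).
import math

k  = 1.380649e-23        # Boltzmann's constant, J/K
NA = 6.02214076e23       # Avogadro's number, 1/mol
R  = k * NA              # gas constant, J/(mol K)

T, P = 298.15, 1.0e5     # temperature (K) and pressure (Pa)
m = 39.948 * 1.66054e-27 # mass of an argon atom, kg
S_molar = 154.8          # J/(mol K), assumed table value

# Sackur-Tetrode:  S/(N k) = ln( (V/N) / lambda^3 ) + 5/2,
# with lambda = h / sqrt(2 pi m k T).  Solving for h:
s = S_molar / R                          # entropy per atom in nats
V_per_N = k * T / P                      # volume per atom of an ideal gas
lam = (V_per_N * math.exp(2.5 - s)) ** (1.0 / 3.0)
h = lam * math.sqrt(2 * math.pi * m * k * T)
print(f"h inferred from entropy data: {h:.2e} J s")   # about 6.6e-34: Planck's constant
```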
As written, the text says that h is a natural choice, that “this turns out to be correct”, and then it launches into a multi-page computation that isn’t obviously connected to any physical experiment. So the reader is given the impression that you’ve introduced h as a “fudge factor”, presumably by knowing in advance about quantum mechanics. Which is not very satisfying, and gives the impression that you’re spending multiple pages doing some sort of “fudged classical mechanics” for unclear reasons.
Thanks, I agree with your suggested improvements. Something like your version of the story will clarify, both historically and conceptually, how Planck’s constant entered what was superficially a purely classical computation of the entropy of an ideal gas.
I hadn’t included a link to Walter Grimus’ nice article because I’d only seen it on the phys.org website and I wasn’t sure that would be a stable location in the long run. But now I see it’s on the arXiv, so I will link to it in my book.
It will take me a few days to do this, since I’m trying to finish off another paper.
This is fascinating and I appreciate it. I look forward to reading. Eric
Hi, is the file server down? I can’t download from the link.
It’s working fine for me now. I don’t know what you experienced, but give it another try. It would be funny if you were trying to download it right when I was uploading a new version.
I’m enjoying seeing this all assembled together!
Typo p. 47:
> Indeed, the when people tried
Typo p. 53:
> energy depends depends quadratically
Typo p. 54:
> as soon as T gets exceeds about 90 kelvin.
Typo p.56:
> so ħω/k units of temperature!
Typo p. 72:
> Here’s the game plane.
Thanks for all these corrections, Scott! I’ve made all those changes, along with lots of other changes suggested by other people, and you can download a new improved version.
I’m happy to help. I also found these in the most recent version:
p.99: We use the subscript i for a gas of N distinguishable particles.
p.100: the entropy entropy of an ideal gas
p.109: high densities V/N
I’ve taken thermodynamics courses approached from different disciplines and I really liked Kittel and Kroemer’s Thermal Physics. Their notation is to use σ = S/k and τ = k T in place of S and T such that T S = kT S/k = τ σ, manifesting the basic truth from statistical mechanics that entropy is dimensionless and temperature has the same units as energy. (Much like a basic truth of SR that time has the same units as space and c is just a unit conversion factor.)
And of course, similar to your use of H and β.
Thanks, Scott—those errors are fixed in the latest version!
I think it’s important to have a notation for dimensionless entropy, so I’m glad Kittel and Kroemer have one, but people often call Shannon entropy H, so I used that—and argued that Shannon entropy doesn’t need to be computed in base 2.
I felt much less need for a symbol for dimensionless temperature, because its reciprocal, dimensionless coolness, is already called β, and I felt like arguing that coolness is more fundamental than temperature.
By the way, the new fancier WordPress interface for entering comments seems to break LaTeX and even HTML, making it impossible for commenters to enter Greek letters and have them look decent.
LaTeX:
HTML: β
I can fix them up retroactively, but it’s sad.
I just write 𝑆 and 𝑇 for dimensionless entropy and temperature. The kelvin is a unit of energy just as much as the joule or the calorie per mole, and Boltzmann’s constant is 1.
You can make a Greek β by directly entering the Unicode character. (Well, I’m assuming this works as I make this comment, which has some other non-ASCII Unicode characters too, so we’ll see.)
This is a fine attitude as long as you don’t talk to experimental physicists or engineers who are wedded to the SI system. In my textbook I stuck with the SI system because I wanted to show we really can use the math to predict some experimental measurements that you see tabulated on NIST’s website. In my theoretical work I sometimes set Planck’s constant and Boltzmann’s constant to 1, but other times I leave them as variables because it’s interesting to study the ‘deformation theory’ involved in the limits $\hbar \to 0$ and $k \to 0$ (which turn out to be formally analogous in ways that I’m fascinated by).
Deformation theory is different. But setting that aside, you can still use SI units; it’s just that you have a choice of such units. It can actually be nice to have everything given in both units, like giving an everyday quantity in both American and metric, or giving something from particle physics in both kg and eV.
Hi John,
An acquaintance of mine asked Google’s AI-powered Notebook LM to create a short podcast about your book, might you take the time to listen and tell if it’s an accurate summary? The podcast is displayed as a video in this LinkedIn post:
https://www.linkedin.com/posts/benoitraphael_%C3%A9coutez-ce-podcast-qui-parle-vous-allez-ugcPost-7242393124386308097-nzXy
Thanks a lot,
Matt
I don’t even have time to listen to math or physics podcasts made by experts — it’s much faster to browse their papers, and I have my own work to do — much less a podcast made by some machine summarizing what I said.
Skip the podcast: read my book, or even the first few pages of the book.
Someone emailed me and wrote:
The harmonic oscillator is not just about springs. Almost any system that vibrates can be approximately modeled as a harmonic oscillator with $n$ position and $n$ momentum coordinates! These include vibrations of a crystal, radiation in a box, the surface of a drum, a molecule (when treated classically), a violin string, a classical string in string theory, etc. etc. So the harmonic oscillator is fundamental to physics.
But yes, a classical harmonic oscillator can be used to model a rock of mass $m$ hanging on a spring of spring constant $\kappa$. It oscillates with a frequency $\omega = \sqrt{\kappa/m}$. In Hamiltonian mechanics you derive its equations of motion from the formula for its energy, or Hamiltonian:
$$H = \frac{p^2}{2m} + \frac{\kappa q^2}{2}$$
where $p$ is its momentum and $q$ is its position (e.g. how high the rock is above its equilibrium position). With an area-preserving change of coordinates we can write
$$H = \frac{\omega}{2}(p^2 + q^2)$$
for some new coordinates $p$ and $q$. This simplifies things, so this is a standard way to think about the harmonic oscillator, especially when we introduce more position and momentum coordinates.
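Here is a quick symbolic check of that change of coordinates (a sketch using the symbols above; the particular rescaling is an assumption about which area-preserving map is meant):

```python
# Check: the rescaling P = p/sqrt(m*w), Q = sqrt(m*w)*q has Jacobian 1
# (so it preserves area) and turns H = p^2/(2m) + kappa*q^2/2
# into H = (w/2)*(P^2 + Q^2), where w = sqrt(kappa/m).
import sympy as sp

m, kappa = sp.symbols('m kappa', positive=True)
P, Q = sp.symbols('P Q', real=True)
w = sp.sqrt(kappa / m)          # angular frequency omega

# old coordinates written in terms of the new ones
p = sp.sqrt(m * w) * P
q = Q / sp.sqrt(m * w)

H = p**2 / (2 * m) + kappa * q**2 / 2
print(sp.simplify(H - w * (P**2 + Q**2) / 2))    # prints 0: the Hamiltonian simplifies

# Jacobian of (p, q) with respect to (P, Q): determinant 1 means area is preserved
J = sp.Matrix([[sp.diff(p, P), sp.diff(p, Q)],
               [sp.diff(q, P), sp.diff(q, Q)]])
print(sp.simplify(J.det()))                      # prints 1
```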
In statistical mechanics we assume we know nothing about our system except a few facts, e.g. the expected value of the energy, and use the principle of maximum entropy to derive a probability distribution of states. The stuff in the book about the principle of maximum entropy was supposed to explain the overall methodology here. I mostly apply this methodology to systems where we know only the expected value of the energy, since this introduces the concept of temperature. The harmonic oscillator is a great example to illustrate this.
Note that in classical mechanics we seek to predict the future state of a system given complete knowledge of its present state, while in the simplest part of classical statistical mechanics we seek to find the best probability distribution of states now based on partial knowledge now. So the two subjects look almost disjoint at first: one is all about change and no probability theory, while the other is all about probability theory and doesn’t mention change at all—at first. If you go further these subjects become unified, but I wanted to keep my book short, so I left out all discussions of time and change until a quick bit about the Second Law near the end.
Yes, that’s almost right, but: if we don’t know anything about the oscillator’s state, the principle of maximum entropy attempts to give us the ‘uniform probability distribution’ on the $pq$ plane, which doesn’t exist. If we know the expected energy we get a nice Gaussian probability distribution on the $pq$ plane, which I compute.
We do it to understand the thermodynamic behavior of any vibrating system: how its expected energy is related to its temperature, entropy, and free energy.
I’m sure lots of people will be freaked out by how I start by applying statistical mechanics to a harmonic oscillator—it’s somewhat radical to start with such a simple system, but as you’ll see, it’s the perfect lead-in to the ideal gas. I hope this helps a little.
A further exchange with my correspondent:
Oh no! Well, it’s never too late to learn the truth.
Teachers emphasize the equipartition theorem too much, and don’t emphasize how rarely its hypotheses hold. They fail to point out that obviously an ice cube or boiling pot of water must massively violate these hypotheses, since it takes a lot of energy to raise its temperature a wee bit.
Even my book spends too much time on systems that obey the hypotheses of this theorem. At least I state the hypotheses (on page 47) and point out that they don’t always hold. But I don’t emphasize how ridiculously rare it is for them to hold! I say it doesn’t hold for quantum systems, but some people may think those are exotic, rather than almost everything we see, like ice cubes and boiling pots of water. And I don’t point out that the systems with negative temperature, which I discussed earlier, don’t have temperature proportional to energy.
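To see how badly the hypotheses can fail even for a plain diatomic gas, here is a small comparison, taking the measured room-temperature heat capacity of hydrogen to be roughly 20.5 J/(mol·K) (an assumed table value):

```python
# Equipartition predicts C_V = (f/2) R for f quadratic degrees of freedom.
# For H2, counting 3 translational + 2 rotational + 2 vibrational terms would
# give 7R/2, but the measured room-temperature value is close to 5R/2:
# the vibrational mode is frozen out, and below roughly 90 K even the
# rotational modes freeze out.
R = 8.314  # gas constant, J/(mol K)

for f in (3, 5, 7):
    print(f"equipartition with f = {f}: C_V = {f / 2 * R:.1f} J/(mol K)")
print("measured for H2 (approx.):  C_V = 20.5 J/(mol K)")  # assumed table value
```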
It’s well-known, but I worked it out myself while writing the tweets that became my book. It’s surprisingly subtle, mainly in how Planck’s constant gets involved.
Nothing is wrong, so I don’t know what you’re worrying about. I’ll just say some stuff.
I’ll take the perspective of statistical mechanics:
1) If you know the energy $E$ exactly and know nothing more, to maximize entropy $p$ and $q$ will be described by a probability distribution evenly spread out over the ellipse
$$\frac{p^2}{2m} + \frac{\kappa q^2}{2} = E.$$
In the nicer coordinates I mentioned earlier, this becomes a circle, and we get the usual rotationally invariant probability distribution on the circle.
2) But my book is all about what happens if you know the expected value of the energy. I explain how this is the same as knowing the temperature. If we maximize entropy under this constraint we get a Gaussian probability distribution in the pq plane.
In physics, 1) is called a microcanonical ensemble and I don’t talk about it at all in my book. 2) is called a canonical ensemble or Boltzmann distribution or Gibbs distribution.
Sure! Like most systems, a spring at fixed temperature will have energy fluctuations: they’re called thermal fluctuations.
Practically speaking, if you have any system at nonzero temperature, energy will go between that system and its surroundings in a random way, so its energy will fluctuate randomly.
But in my book I deliberately avoid talking about ‘mechanisms’ that make quantities vary randomly, because this can become a complicated can of worms. Instead I take Jaynes’ viewpoint that if we have limited information about a situation we should model it using the maximum entropy probability distribution subject to the constraints of what we know. Knowing the temperature is just a fancy name for knowing the expected energy—not that they’re proportional, of course!—so at fixed temperature we will usually get a probability distribution of energies with nonzero standard deviation.
Here’s an exercise which I should add to the book:
Puzzle. Suppose you have a classical harmonic oscillator of mass 1 kilogram and spring constant 1 newton/meter at temperature 300 kelvin (about room temperature). Compute the expected value of its energy and the standard deviation of its energy!
I give the formula for the probability distribution of $p$ and $q$ at any temperature $T$, so you can use those to figure out the probability distribution of energy, and the rest is calculus.
I’ll give a hint: the expected energy is tiny (because our oscillator has just one degree of freedom, and the equipartition theorem applies) and the standard deviation is tiny too.
I should also give this exercise for a mole of an ideal monatomic gas at 300 kelvin. Here the expected energy is much bigger (since we have 3 times Avogadro’s number of degrees of freedom, and the equipartition theorem still applies)… but the standard deviation in energy is vastly smaller than the expected energy! It should be roughly $1/\sqrt{N_A}$ as big as the expected energy, where $N_A$ is Avogadro’s number. So these fluctuations are absurdly hard to detect until you look at systems with far fewer atoms (as people now do).
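Here is a minimal Monte Carlo sketch of the single-oscillator puzzle (fair warning: it gives away the answer). It samples $p$ and $q$ from the maximum-entropy Gaussian at temperature $T$ and looks at the mean and standard deviation of the energy:

```python
# Thermal fluctuations of a classical harmonic oscillator with m = 1 kg,
# spring constant kappa = 1 N/m, at T = 300 K.  In the maximum-entropy
# (Boltzmann) distribution, p and q are independent Gaussians with variances
# m*k*T and k*T/kappa, so we can sample them directly.
import numpy as np

k = 1.380649e-23              # Boltzmann's constant, J/K
m, kappa, T = 1.0, 1.0, 300.0

rng = np.random.default_rng(0)
N = 1_000_000
p = rng.normal(0.0, np.sqrt(m * k * T), N)
q = rng.normal(0.0, np.sqrt(k * T / kappa), N)
E = p**2 / (2 * m) + kappa * q**2 / 2

print(f"mean energy:        {E.mean():.2e} J")   # about k*T = 4.1e-21 J
print(f"std dev of energy:  {E.std():.2e} J")    # also about k*T
print(f"k*T for comparison: {k * T:.2e} J")
```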
I enjoyed the way in which entropy was introduced in the form of a central mystery (namely hydrogen) and developed into a hierarchy of goals that led to solving this mystery. Thank you for writing this book, it was an excellent journey.
I have found some minor errata:
Pg 14 small typo in the box: countable collection of subets -> subsets
-> missing a factor of
in the exponential
Pg 14 second line from the bottom, should be: If S,T ∈ M and S ⊆ T then m(T) = m(S) + m(T − S).
Pg 27 higher-dimensional version of a tetahedron -> tetrahedron
Pg 35 put it in an magnetic field -> a magnetic field
Pg 58 number of acccessible states -> accessible states
Pg 73 In Puzzle 44, entropy of an energetic set is negative of what we derived in Pg 71
Pg 80 with a a free classical particle -> remove one “a”
Pg 92 integral of
Thanks, I’m glad you liked the “plot” of the book. I thought it would be fun to make a whole book about a single puzzle, where part of the puzzle is understanding what the answer to the puzzle means.
And thanks for catching those mistakes! I’ll fix them now. I don’t know why that one formula doesn’t parse, but I can read it behind the scenes here.
Okay, the fixed version is on my website at the usual place. By the way, there’s also a version using STIX fonts, which some say looks better on a cell phone.