Stirling’s formula says
$$ n! \sim \sqrt{2 \pi n} \, \left(\frac{n}{e}\right)^n $$
where $\sim$ means that the ratio of the two quantities goes to $1$ as $n \to \infty.$
Where does this formula come from? In particular, how does the number $2\pi$ get involved? Where is the circle here?
To understand these things, I think a nonrigorous argument that can be made rigorous is more useful than a rigorous proof with all the ε’s dotted and the δ’s crossed. It’s important, I think, to keep the argument short. So let me do that.
The punchline will be that the $2\pi$ comes from this formula:
$$ \int_{-\infty}^\infty e^{-x^2/2} \, dx = \sqrt{2 \pi} $$
And this, I hope you know, comes from squaring both sides and converting the left side into a double integral that you can do in polar coordinates, pulling out a factor of $2\pi$ because the thing you’re integrating only depends on $r,$ not $\theta.$
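For anyone who wants the polar-coordinate step written out, here is the standard computation. The name $I$ for the integral is introduced here just for convenience:
$$ \begin{aligned} I &= \int_{-\infty}^\infty e^{-x^2/2} \, dx, \\ I^2 &= \int_{-\infty}^\infty \int_{-\infty}^\infty e^{-(x^2+y^2)/2} \, dx \, dy \\ &= \int_0^{2\pi} \int_0^\infty e^{-r^2/2} \, r \, dr \, d\theta \\ &= 2\pi \left[ -e^{-r^2/2} \right]_0^\infty = 2\pi. \end{aligned} $$
Since $I > 0,$ this gives $I = \sqrt{2\pi}.$ The factor of $2\pi$ is just the $\theta$ integral of a function that doesn’t depend on $\theta.$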
Okay, here goes. We start with
$$ \int_0^\infty x^n e^{-x} \, dx = n! $$
This is easy to show using repeated integration by parts.
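In case it helps, here is a sketch of the integration by parts, writing $I_n$ (a name introduced here) for the integral $\int_0^\infty x^n e^{-x} \, dx$:
$$ \begin{aligned} I_n &= \left[ -x^n e^{-x} \right]_0^\infty + n \int_0^\infty x^{n-1} e^{-x} \, dx = n \, I_{n-1} \quad (n \ge 1), \\ I_0 &= \int_0^\infty e^{-x} \, dx = 1. \end{aligned} $$
The boundary term vanishes at both ends, so $I_n = n \, (n-1) \cdots 1 \cdot I_0 = n!.$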
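Next, we do this:
$$ \begin{aligned} n! &= \int_0^\infty x^n e^{-x} \, dx \\ &= \int_0^\infty e^{n \ln x - x} \, dx \\ &= n \int_0^\infty e^{n \ln (ny) - ny} \, dy \end{aligned} $$
In the first step we’re writing $x^n$ as $e^{n \ln x}.$ In the second we’re changing variables: $x = ny.$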
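Next we use $\ln(ny) = \ln n + \ln y$ to bust things up:
$$ n! = n \, e^{n \ln n} \int_0^\infty e^{n \ln y - ny} \, dy $$
All the hard work will come in showing this:
$$ \int_0^\infty e^{n \ln y - ny} \, dy \sim \sqrt{\frac{2 \pi}{n}} \; e^{-n} $$
Given this, we get
$$ n! \sim n \, e^{n \ln n} \, \sqrt{\frac{2 \pi}{n}} \; e^{-n} $$
and simplifying we get Stirling’s formula:
$$ n! \sim \sqrt{2 \pi n} \, \left(\frac{n}{e}\right)^n $$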
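As a quick numerical sanity check (not part of the argument), here is a minimal Python sketch that computes the ratio of $n!$ to $\sqrt{2 \pi n} \, (n/e)^n.$ The function name `stirling_ratio` is made up for this illustration, and the computation goes through logarithms, using `math.lgamma`, only to avoid overflow for large $n.$

```python
import math


def stirling_ratio(n: int) -> float:
    """Return n! / (sqrt(2*pi*n) * (n/e)**n), computed via logarithms."""
    log_factorial = math.lgamma(n + 1)  # ln(n!)
    log_stirling = 0.5 * math.log(2 * math.pi * n) + n * (math.log(n) - 1)
    return math.exp(log_factorial - log_stirling)


# The ratio should drift toward 1 as n grows.
for n in (1, 10, 100, 1000, 10_000):
    print(n, stirling_ratio(n))
```

The printed ratios should approach 1 from above, which is what $\sim$ is asserting.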
Laplace’s method
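So to prove Stirling’s formula, the big question is: how do we get
$$ \int_0^\infty e^{n \ln y - ny} \, dy \sim \sqrt{\frac{2 \pi}{n}} \; e^{-n} \; ? $$
Let’s write it like this:
$$ \int_0^\infty e^{-n (y - \ln y)} \, dy \sim \sqrt{\frac{2 \pi}{n}} \; e^{-n} $$
The trick is to note that as $n$ gets big, the integral will become dominated by the point where $y - \ln y$ is as small as possible. We can then approximate the integral by a Gaussian peaked at that point!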
Notice that
$$ \frac{d}{dy} \left(y - \ln y\right) = 1 - y^{-1} $$
so the function $y - \ln y$ has a critical point at $y = 1,$ and its second derivative is $1$ there, so it’s a local minimum. Indeed this point is the unique minimum of our function on the whole interval $(0,\infty).$
Then we use this:
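Laplace’s Method. Suppose $f \colon [a,b] \to \mathbb{R}$ has a unique minimum at some point $x_0 \in (a,b)$ and $f''(x_0) > 0.$ Then
$$ \int_a^b e^{-n f(x)} \, dx \sim \sqrt{\frac{2 \pi}{n f''(x_0)}} \; e^{-n f(x_0)} $$
as $n \to \infty.$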
This says that asymptotically, the integral equals what we’d get if we replaced $f$ by the quadratic function whose value, first derivative and second derivative all match those of $f$ at the point $x_0.$ With this quadratic replacing $f,$ you can do the integral by hand (it’s the integral of a Gaussian) and you get the right hand side.
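Here is that Gaussian integral worked out, as a sketch. Since $x_0$ is an interior minimum, $f'(x_0) = 0,$ so the quadratic has no linear term. I’m also assuming we may extend the limits of integration to $\pm\infty,$ which is one of the steps a rigorous proof has to justify. Substituting $u = \sqrt{n f''(x_0)} \, (x - x_0)$:
$$ \begin{aligned} \int_{-\infty}^\infty e^{-n \left[ f(x_0) + \frac{1}{2} f''(x_0) (x - x_0)^2 \right]} \, dx &= e^{-n f(x_0)} \, \frac{1}{\sqrt{n f''(x_0)}} \int_{-\infty}^\infty e^{-u^2/2} \, du \\ &= \sqrt{\frac{2 \pi}{n f''(x_0)}} \; e^{-n f(x_0)}. \end{aligned} $$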
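Applying this formula to the problem at hand we get
$$ \int_a^b e^{-n (y - \ln y)} \, dy \sim \sqrt{\frac{2 \pi}{n f''(y_0)}} \; e^{-n f(y_0)} $$
where $f(y) = y - \ln y,$ $y_0 = 1,$ $f(y_0) = 1$ and $f''(y_0) = 1.$ So we get
$$ \int_a^b e^{-n (y - \ln y)} \, dy \sim \sqrt{\frac{2 \pi}{n}} \; e^{-n} $$
and then letting $a \to 0, b \to +\infty$ we get what we want.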
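To see this asymptotic estimate numerically, here is a rough Python sketch. It approximates $\int_0^\infty e^{-n (y - \ln y)} \, dy$ with a plain midpoint rule and divides by $\sqrt{2 \pi / n} \, e^{-n}.$ The function name `laplace_ratio`, the cutoff `upper` and the step count are all choices made for this illustration, and the factor $e^{-n}$ is cancelled analytically so nothing underflows.

```python
import math


def laplace_ratio(n: int, steps: int = 200_000, upper: float = 20.0) -> float:
    """Midpoint-rule estimate of
        integral_0^upper exp(-n (y - ln y)) dy / (sqrt(2 pi / n) * exp(-n)).
    The integrand is rescaled by exp(n), so it peaks at 1 when y = 1.
    """
    h = upper / steps
    total = 0.0
    for k in range(steps):
        y = (k + 0.5) * h  # midpoint of the k-th subinterval
        total += math.exp(-n * (y - math.log(y) - 1.0))
    return total * h / math.sqrt(2 * math.pi / n)


for n in (1, 10, 100):
    print(n, laplace_ratio(n))
```

The ratios should approach 1 as $n$ grows. The truncation at `upper` is harmless here because the integrand decays exponentially, but it is an assumption of the sketch, not part of Laplace’s method.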
So, from this viewpoint (and there are others) the key to Stirling’s formula is Laplace’s method of approximating an integral like
$$ \int_a^b e^{-n f(x)} \, dx $$
with a Gaussian integral. And in the end, the crucial calculation is where we do that Gaussian integral, using
$$ \int_{-\infty}^\infty e^{-x^2/2} \, dx = \sqrt{2 \pi} $$
You can see the whole proof of Laplace’s method here:
• Wikipedia, Laplace’s method.
Physicists who have done quantum field theory will know that when push comes to shove it’s largely about Gaussian integrals. The limit $n \to \infty$ we’re seeing here is like a ‘classical limit’ where $\hbar \to 0.$ So they will be familiar with this idea.
There should be some deeper moral here, about how $n!$ is related to a Gaussian process of some sort, but I don’t know it, even though I know how binomial coefficients approximate a Gaussian distribution. Do you know some deeper explanation, maybe in terms of probability theory and combinatorics, of why $n!$ winds up being asymptotically described by an integral of a Gaussian?
For a very nice account of some cruder versions of Stirling’s formula, try this blog article:
• Michael Weiss, Stirling’s formula: Ahlfors’ derivation, Diagonal Argument, 17 July 2019.
His ‘note’, which you can find there, will give you more intuition for why something like Stirling’s formula should be true. But I think the above argument explains the $\sqrt{2 \pi}$ better than Ahlfors’ argument.
But in math, there are always mysteries within mysteries. Gaussians show up in probability theory when we add up lots of independent and identically distributed random variables. Could that be going on here somehow?
Yes! See this:
• Aditya Ghosh, A probabilistic proof of Stirling’s formula, Blog on Mathematics and Statistics, September 7, 2020.
Folks at the n-Category Café noticed more mysteries. $n!/n^n$ is the probability that a randomly chosen function from an n-element set to itself is a permutation. Stirling’s formula is a cool estimate of this probability! Can we use this to prove Stirling’s formula? I don’t know!
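To make this concrete, here is a small Python sketch. It counts permutations among all functions from an n-element set to itself by brute force (only feasible for tiny n), and compares the exact probability $n!/n^n$ with the estimate $\sqrt{2 \pi n} \, e^{-n}$ that Stirling’s formula gives after dividing by $n^n.$ All three function names are made up for this illustration.

```python
import math
from itertools import product


def permutation_probability_exact(n: int) -> float:
    """n! / n^n: the chance that a random self-map of an n-element set is a bijection."""
    return math.factorial(n) / n**n


def permutation_probability_bruteforce(n: int) -> float:
    """Enumerate all n^n functions {0..n-1} -> {0..n-1}, encoded as tuples of
    values, and count those that hit every element exactly once."""
    bijections = sum(1 for f in product(range(n), repeat=n) if len(set(f)) == n)
    return bijections / n**n


def stirling_estimate(n: int) -> float:
    """sqrt(2 pi n) * e^(-n): Stirling's formula divided by n^n."""
    return math.sqrt(2 * math.pi * n) * math.exp(-n)


print(permutation_probability_bruteforce(4), permutation_probability_exact(4))
for n in (5, 20, 50):
    print(n, permutation_probability_exact(n), stirling_estimate(n))
```

The brute-force count is there only to confirm the combinatorial claim for small n. The comparison with `stirling_estimate` shows how quickly the estimate becomes accurate.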
So I don’t think we’ve gotten to the bottom of Stirling’s formula! Comments at the n-Category Café contain other guesses about what it might ‘really mean’. You can read them here. Also check out Section 3 here, which discusses many different articles on Stirling’s formula in the American Mathematical Monthly:
• Jonathan M. Borwein and Robert M. Corless, Gamma and factorial in the Monthly.
Judging by the number of articles in the Monthly on the subject, Stirling’s formula approximating n! for large n is by far the most popular aspect of the $\Gamma$ function. There are “some remarks”, “notes”, more “remarks”; there are “simple proofs”, “direct proofs”, “new proofs”, “corrections”, “short proofs”, “very short proofs”, “elementary” proofs, “probabilistic” proofs, “new derivations”, and (our favourite title) “The (n+1)th proof”.
That should be “(n+1)st”.
I was reading a biography of Grace Hopper, and when she was a math professor at Vassar she had her students write an essay about Stirling’s formula. Her reasoning was that there is no use in knowing math if you can’t teach it to others.
Interesting!
Whoa!! Mind definitely blown! That’s awesome; great post!
[…] Baez has a post outlining another derivation of the full Stirling formula, using Laplace’s method. It looks a lot easier than Ahlfors’ […]
You’re not going to compare Laplace’s method to stationary phase approximation to semiclassical limits of Feynman integrals? What self-control!
I thought I had! “Physicists who have done quantum field theory will know that when push comes to shove it’s largely about Gaussian integrals.” (Quantum field theorists typically Wick rotate to turn $\exp(i x^2)$ into $\exp(-x^2),$ but of course the idea is the same.)
Okay yeah I don’t know how I missed that line.
Perhaps a more transparent way to see how the Gaussian comes from is to parametrize
with
. Then
Therefore, we have

However, I don’t see how $n!$ is related to a Gaussian process or binomial coefficients. Perhaps the binomial coefficients might be due to the very definition of
and its binomial expansion?
That’s nice.
I’m also discussing this on my other blog, the n-Category Café, and we’ve seen a nice probabilistic proof of Stirling’s formula using the Central Limit Theorem. It turns out that this statement, when translated into equations, is equivalent to Stirling’s formula:
This is stated a bit informally, but it’s about Poisson distributions.
Oh, yes! The form of integrand
itself suggests the Poisson distribution!
You can derive Stirling’s formula from the central limit theorem (CLT). Adding independent random variables means convolving their densities, which means multiplying their Fourier transforms. That is one approach to proving the CLT. If you have n independent random variables with an exponential density (with mean 1) you can do the convolutions directly, and get a gamma density, which has mean and variance equal to n. (You effectively did this calculation as part of your proof.) Then apply the CLT.
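Here is a rough numerical illustration of that idea in Python, under stated assumptions. The sum of n independent exponential variables with mean 1 has the gamma density $x^{n-1} e^{-x}/(n-1)!,$ with mean and variance n. If we assume the density itself (not just the distribution function) is close to the matching normal density near the mean, which is a local form of the CLT and a stronger statement than the usual one, then comparing the two at $x = n$ gives $n^n e^{-n}/n! \approx 1/\sqrt{2 \pi n},$ which is Stirling’s formula rearranged. The function names are made up for this sketch.

```python
import math


def gamma_density(x: float, n: int) -> float:
    """Density at x of a sum of n independent Exp(1) variables:
    x^(n-1) e^(-x) / (n-1)!, evaluated through logarithms."""
    return math.exp((n - 1) * math.log(x) - x - math.lgamma(n))


def normal_density(x: float, mean: float, var: float) -> float:
    """Density at x of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def density_ratio_at_mean(n: int) -> float:
    """Gamma density over normal density, both evaluated at the common mean n."""
    return gamma_density(n, n) / normal_density(n, n, n)


for n in (1, 10, 100, 1000):
    print(n, density_ratio_at_mean(n))
```

The ratio should tend to 1, and that is exactly the statement of Stirling’s formula.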
I’m not a physicist, but I think the CLT is analogous to the classical limit in quantum theory. The exponential density is a special case where exact calculations are possible. I can’t think of any other densities which convolve nicely, except the Cauchy, which is not covered by the CLT. It has an infrared divergence, perhaps.
Hah, I wrote the above in a text editor and pasted it in, before noticing John’s comment post about the CLT and the Poisson.
No problem. I actually don’t see how the CLT is analogous to the classical limit in quantum theory. That is, I can sense the analogy in a vague way, but I don’t see how the math would work. That sounds like something I should think about, since I’m busy exploring the connections between statistical mechanics and quantum mechanics!