If you like computer calculations, here’s a little challenge for you. Oscar Dahlsten may have solved it, but we’d love for you to check his work. It’s pretty important for the foundations of thermodynamics, but you don’t need to know any physics or even anything beyond a little algebra to tackle it! First I’ll explain it in really simple terms, then I’ll remind you a bit of why it matters.
We’re looking for two lists of nonnegative numbers, of the same length, listed in decreasing order:

$p_1 \ge p_2 \ge \cdots \ge p_n \ge 0, \qquad q_1 \ge q_2 \ge \cdots \ge q_n \ge 0,$

that sum to 1:

$\sum_{i=1}^n p_i = \sum_{i=1}^n q_i = 1,$

and that obey this inequality:

$\frac{1}{1-\beta} \ln \sum_{i=1}^n p_i^\beta \; < \; \frac{1}{1-\beta} \ln \sum_{i=1}^n q_i^\beta$

for all $\beta > 0$ (ignoring $\beta = 1$), yet do not obey these inequalities:

$p_1 + \cdots + p_k \ge q_1 + \cdots + q_k$

for all $1 \le k \le n.$

Oscar’s proposed solution is this:

$p = (0.4,\ 0.29,\ 0.29,\ 0.02), \qquad q = (0.39,\ 0.31,\ 0.2,\ 0.1).$

Can you see if this works? Is there a simpler example, like one with lists of just 3 numbers?
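If you want to try it yourself, here is a minimal sketch of a numerical check in Python. This is my own sketch, not Oscar’s code; the grid of beta values up to 30 is an arbitrary choice.

from math import log

p = [0.4, 0.29, 0.29, 0.02]
q = [0.39, 0.31, 0.2, 0.1]

def renyi(dist, beta):
    # Rényi entropy at parameter beta (undefined at beta = 1, so we skip it)
    return log(sum(x ** beta for x in dist)) / (1.0 - beta)

# Majorization check: is p_1 + ... + p_k >= q_1 + ... + q_k for every k?
partials_p = [sum(p[:k]) for k in range(1, len(p) + 1)]  # 0.4, 0.69, 0.98, 1.0
partials_q = [sum(q[:k]) for k in range(1, len(q) + 1)]  # 0.39, 0.70, 0.90, 1.0
print("p majorizes q:", all(a >= b for a, b in zip(partials_p, partials_q)))
# expect False: the inequality fails at k = 2, since 0.69 < 0.70

# Entropy check: H_beta(p) < H_beta(q) on a grid of beta in (0, 30], skipping beta = 1
betas = [k * 0.01 for k in range(1, 3001) if k != 100]
print("all entropies smaller:", all(renyi(p, b) < renyi(q, b) for b in betas))
# expect True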
This question came up near the end of my post More Second Laws of Thermodynamics. I phrased the question with a bit more jargon, and said a lot more about its significance. Suppose we have two probability distributions on a finite set, say $p = (p_1, \dots, p_n)$ and $q = (q_1, \dots, q_n).$ We say $p$ majorizes $q$ if

$p_1 + \cdots + p_k \ge q_1 + \cdots + q_k$

for all $k$ when we write both lists of numbers in decreasing order. This means $p$ is ‘less flat’ than $q$, so it should have less entropy. And indeed it does: not just for ordinary entropy, but also for Rényi entropy! The Rényi entropy of $p$ is defined by

$H_\beta(p) = \frac{1}{1-\beta} \ln \sum_{i=1}^n p_i^\beta$

where $0 \le \beta < 1$ or $1 < \beta < \infty$. We can also define Rényi entropy for $\beta = 1, \infty$ by taking a limit, and at $\beta = 1$ we get the ordinary entropy

$H(p) = -\sum_{i=1}^n p_i \ln p_i.$
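Here is a quick way to see that $\beta \to 1$ limit, applying L’Hôpital’s rule to the definition above:

$\lim_{\beta \to 1} \frac{\ln \sum_i p_i^\beta}{1 - \beta} = \lim_{\beta \to 1} \frac{\sum_i p_i^\beta \ln p_i \,/\, \sum_i p_i^\beta}{-1} = -\sum_i p_i \ln p_i,$

using $\sum_i p_i = 1$ in the last step.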
The question is whether majorization is more powerful than Rényi entropy as a tool to tell when one probability distribution is less flat than another. I know that if $p$ majorizes $q$, its Rényi entropy is less than that of $q$ for all $\beta \ge 0.$
Your mission, should you choose to accept it, is to show the converse is not true.

I didn’t prove it rigorously, but a numerical plot of Oscar’s example looks extremely good. I’m plotting the difference of Rényi entropies, $H_\beta(p) - H_\beta(q),$ so the desired statement is that this is negative for all $\beta > 0,$ with its only zero at $\beta = 0.$
Here are plots for two ranges of $\beta$:
That looks convincing. Thanks a lot, David! For those who haven’t thought about this at all: we want the function graphed here to be negative for all $\beta > 0,$ and zero only at $\beta = 0.$
By the way, there are simple formulas for the Rényi entropy at $\beta = 0$ and $\beta = \infty,$ which allow us to understand exactly what’s going on at these points. Oscar used them to construct his example. At $\beta = \infty$ we get the min-entropy $H_\infty(p) = -\ln \max_i p_i,$ and at $\beta = 0$ we get the max-entropy, which is the logarithm of the number of $i$ such that $p_i \ne 0$.
Since both of Oscar’s probability distributions have the same max-entropy, the curve you graphed approaches zero as $\beta \to 0,$ just as it looks. It approaches a certain negative value, the difference of the min-entropies, as $\beta \to \infty,$ though the eye could be fooled into thinking it approaches zero.
The inequality seems to be true for the given p and q. Here is code to compute the RHS minus LHS of the beta inequality (and plot the results).

from math import log
from matplotlib import pyplot

def entropy(p, beta):
    # Rényi entropy of p at parameter beta (the loop below never hits beta = 1)
    return 1.0 / (1.0 - beta) * log(sum(x ** beta for x in p))

def verify_p_q(p, q, steps=10000, a=0.0001, b=30):
    delta = float(b - a) / steps
    points = []
    for step in range(steps):
        beta = a + step * delta
        diff = entropy(q, beta) - entropy(p, beta)
        print(beta, diff)
        points.append((beta, diff))
    pyplot.plot([x for (x, y) in points], [y for (x, y) in points])
    pyplot.show()

if __name__ == "__main__":
    p = [0.4, 0.29, 0.29, 0.02]
    q = [0.39, 0.31, 0.2, 0.1]
    verify_p_q(p, q)

John, I wrote some code to systematically find many examples, and it finds many. Let me know if you have any use for it.
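Graham’s search code isn’t shown above, but here is a rough sketch of how such a search might go; this is a guess at the approach, not his actual code, and the beta grid is an arbitrary choice.

import random
from math import log

def renyi(dist, beta):
    return log(sum(x ** beta for x in dist)) / (1.0 - beta)

def majorizes(p, q):
    return all(sum(p[:k]) >= sum(q[:k]) for k in range(1, len(p) + 1))

def random_dist(n=4):
    # cut [0, 1] at n - 1 uniform points; the gaps form a random distribution
    cuts = sorted(random.random() for _ in range(n - 1))
    gaps = [b - a for a, b in zip([0.0] + cuts, cuts + [1.0])]
    return sorted(gaps, reverse=True)

betas = [k * 0.05 for k in range(1, 601) if k != 20]  # beta in (0, 30], skip beta = 1

found = []
while len(found) < 5:
    p, q = random_dist(), random_dist()
    if not majorizes(p, q) and all(renyi(p, b) < renyi(q, b) for b in betas):
        found.append((p, q))

for p, q in found:
    print(p, q)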
Thanks! I’m not sure why I’d want lots of examples, but here’s something related.
Oscar was asking if this behavior (all Rényi entropies of p less than those of q, but p not majorizing q) is unusual. Maybe without much extra work you could figure out roughly what ‘percentage’ of pairs (p,q) display this behavior, say for probability distributions on a 3-element set?
If you’ve already written code to decide if a given pair of probability distributions displays this behavior, then you just need to
1) decide what measure you’ll use to define the ‘percentage’ of pairs of probability distributions that display a given behavior,
2) figure out how to estimate that percentage, either using Monte Carlo estimation or some other procedure.
You can think of a probability distribution on a 3-element set as a triple

$p = (p_1, p_2, p_3)$

such that $p_i \ge 0$ and

$p_1 + p_2 + p_3 = 1.$

The space of these is an equilateral triangle. Since the problem involves majorization and (implicitly) convex-linear maps, the symmetries are just the obvious symmetries of the triangle. So, it makes some sense to just use the obvious measure on this triangle, which is proportional to Lebesgue measure. So for Monte Carlo, you can just randomly pick $x \in [0,1]$ in a uniformly distributed way, then randomly pick $y \in [0,1]$ in a uniformly distributed way, then let $(p_1, p_2, p_3) = (\min(x,y),\ |x - y|,\ 1 - \max(x,y))$.
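Putting 1) and 2) together with that sampling recipe, a Monte Carlo sketch might look like this. It’s my own rough version, with an arbitrary grid of beta values, so take the details with a grain of salt.

import random
from math import log

def renyi(dist, beta):
    return log(sum(x ** beta for x in dist)) / (1.0 - beta)

def majorizes(p, q):
    return all(sum(p[:k]) >= sum(q[:k]) for k in (1, 2, 3))

def uniform_triple():
    # two uniform cuts of [0, 1]; the three gaps are uniform on the triangle
    x, y = random.random(), random.random()
    lo, hi = min(x, y), max(x, y)
    return sorted([lo, hi - lo, 1.0 - hi], reverse=True)  # decreasing order

# coarse beta grid in (0, 30], skipping beta = 1; borderline pairs may be misclassified
betas = [k * 0.25 for k in range(1, 121) if k != 4]

trials, renyi_pairs, interesting = 100000, 0, 0
for _ in range(trials):
    p, q = uniform_triple(), uniform_triple()
    if all(renyi(p, b) < renyi(q, b) for b in betas):
        renyi_pairs += 1
        if not majorizes(p, q):
            interesting += 1
print(interesting / renyi_pairs)  # fraction of Rényi-ordered pairs where majorization fails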
As you know, there’s also another measure on this equilateral triangle, coming from the Fisher information metric. That would give a different answer!
Here are some simple computations. For 10,000,000 randomly selected pairs of points in the simplex (n=3), 826,289 pairs had both p and q decreasing; of these, 360,420 satisfied the Rényi inequality, and of these pairs, 325,013 had p majorizing q. So the desired proportion (the Rényi inequality holding without majorization) is ~10%.
Here is one such pair if anyone wants to verify it (the Rényi inequality holds for all beta, and p majorizes q):
(0.702035380032679, 0.25269600971504746, 0.04526861025227358)
(0.6012357521664494, 0.32989515938159686, 0.06886908845195372)
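As a quick hand check of Marc’s pair: the partial sums of the first list are 0.702 and 0.955, versus 0.601 and 0.931 for the second, so the first does majorize the second, as he says.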
Looks like $p = (0.6, 0.2, 0.2)$ and $q = (0.35, 0.35, 0.3)$ works with $n = 3$.
Since we seem to be posting code, here’s Mathematica:
p = {0.6, 0.2, 0.2};
q = {0.35, 0.35, 0.3};
blah[x] = Sum[p[[j]]^x, {j, 1, 3}] - Sum[q[[j]]^x, {j, 1, 3}];
Plot[Evaluate@blah[x], {x, 0, 2}]
Sorry, that doesn’t break majorization at all. Too bad I can’t delete comments here.
I’ll delete ’em if you want.
I haven’t followed the previous post (yet), but if I’m not wrong, these 3-number lists seem to work as well:
p = (0.36, 0.32, 0.32)
q = (0.35, 0.34, 0.31)
The inequality seems to hold for these.
Ok, so the proposed examples by John and Romain seem to fail the partial sum constraint, since $p_1 + p_2 < q_1 + q_2$ (and both have $p_1 \ge q_1$).
Just for the sake of people who are too lazy to read all the comments, I’ll note that this remark of Marc’s was based on a misunderstanding that was cleared up here.
I also graphed the (exponentials of the) Rényi entropies of these, and agree that they work.
(0.6, 0.2, 0.2) and (0.42, 0.42, 0.16) seem to work.
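A quick check of the majorization half of that claim: the partial sums are 0.6 versus 0.42 and 0.8 versus 0.84, so the first list fails to majorize the second at $k = 2,$ as required.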
Much as I love Python, this is easier to play with in Mathematica:

p = {0.4, 0.29, 0.29, 0.02};
q = {0.39, 0.31, 0.2, 0.1};
Plot[1/(1 - b) Log[Tr[q^b]] - 1/(1 - b) Log[Tr[p^b]], {b, 0.0, 20.0}]

Ok, there may be some ambiguity in the constraint given by John:

$p_1 + \cdots + p_k \ge q_1 + \cdots + q_k$
To me, it meant that there must be at least one $k$ for which this inequality does not hold. In words, the constraint is that the cumulative sum of $p$ must be strictly smaller than that of $q$ for at least one $k$. And actually, this cannot be true for all $k$ because clearly, when $k = n$, then

$p_1 + \cdots + p_n = q_1 + \cdots + q_n = 1$

(though this is a bit of a degenerate case).
I agree. If your interpretation is correct (and I think it is), there are many solutions for n=3, including yours.
Yes, as soon as the inequality does not hold true for a given $k,$ then we cannot say that $p$ majorizes $q.$ So, we have here two distributions $p$ and $q$: the Rényi entropy of $p$ is always smaller than that of $q,$ yet $p$ does not majorize $q,$ so John’s conjecture is true that “majorize” is not equivalent to “have a smaller Rényi entropy” but is in fact a stronger statement.
Romain writes:

“To me, it meant that there must be at least one $k$ for which this inequality does not hold.”

Right. I was asking for examples where these inequalities fail for at least one $k.$ ‘Majorization’ occurs when these inequalities hold for all $k.$ I was looking for a probability distribution $p$ that has all its Rényi entropies less than those of another one, $q,$ yet $p$ fails to majorize $q.$
I got these:

{0.41, 0.3, 0.29}
{0.4, 0.32, 0.28}

just by dropping the last numbers from the example in the post and shifting the first two up by 0.02 to preserve the ordering. As I see now, romain got different ones, so there must be many such examples.
John, I suspect this could turn into a memorable programming-languages-and-implementations competition. The core of it would remain your exact problem statement. Maybe one or some of the early competitors could be seduced into implementing the infrastructure site? I guess you were unconsciously jealous of the attention London got for the Olympic Games :)
If you get a collection of code solving this problem, this site would appreciate the donation if possible:
http://rosettacode.org/wiki/Rosetta_Code
“Rosetta Code is a programming chrestomathy site. The idea is to present solutions to the same task in as many different languages as possible, to demonstrate how languages are similar and different”
The nicest pair of triples I was able to find is:
{0.5, 0.25, 0.25},
{0.45, 0.35, 0.2}
I used Mathematica to check that the condition holds; sample code is below. Unfortunately I have no idea how to paste code properly, so I will give LaTeX a try, but I don’t know how well it will come out.
Define vectors:

p = {0.5, 0.25, 0.25};
q = {0.45, 0.35, 0.2};
Check that partial sums violate majorization:
PSum[v_, n_] := Sum[v[[i]], {i, 1, n}]
Array[PSum[p, #] &, Length[p]]
Array[PSum[q, #] &, Length[q]]
Solve the equation to find the values of beta where the entropies equal each other:
F[β_, v_] := 1/(1 - β) Log[Sum[(v[[i]])^β, {i, 1, Length[v]}]]
Reduce[F[β, p] == F[β, q], β, Reals]
Mathematica finds two solutions for beta, one negative and one 0.
Plot both entropies to verify p is below q.
Plot[{F[β, p], F[β, q]}, {β, 0, 10}, PlotStyle -> {Red, Blue}]
It is, and since the curves are not crossing anywhere for beta > 0, it follows that for all beta (except beta = 1) the entropy of p is below the entropy of q.
Now, I am certainly not an expert on computational methods, and I don’t know how trustworthy Mathematica’s Reduce command is in this context, so I cannot guarantee that it is correct.
I meant for all beta > 0.
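For anyone who shares Arrow’s worry about trusting Reduce, here is a quick numeric cross-check of that pair in Python. It’s a rough sketch of mine, with an arbitrary grid of beta values.

from math import log

p = [0.5, 0.25, 0.25]
q = [0.45, 0.35, 0.2]

def renyi(dist, beta):
    return log(sum(x ** beta for x in dist)) / (1.0 - beta)

# sample beta in (0, 20], skipping beta = 1 where the formula is 0/0
betas = [k * 0.01 for k in range(1, 2001) if k != 100]
print(all(renyi(p, b) < renyi(q, b) for b in betas))  # expect True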
By the way, why is only beta > 0 considered in the definition of Rényi entropy?
In a certain analogy to thermodynamics, which I used to relate Rényi entropy to free energy, $\beta < 0$
would correspond to temperatures below absolute zero. At negative temperatures, a system is more likely to be found in states of high energy than states of low energy. Negative temperatures are not completely absurd, but they only make sense for systems with an upper bound on their energy—just as positive temperatures only make sense for systems with a lower bound on their energy.
Nice examples, Arrow! I like your last example the best.
For including code you surround the code by something like this:
[ sourcecode language="python"]
[ /sourcecode]
except without the space after the [. I don’t know what range of languages it takes. It seems to do something halfway reasonable if you leave the language choice out.
John, you might like to try this paper:
• G.P. Patil and C. Taillie, Diversity as a concept and its measurement, Journal of the American Statistical Association 77 (1982), 548-561.
For instance, Definition 5.1 defines what it means for one community to be “intrinsically more diverse” than another, in terms of transferring abundance from a more common species to a rarer one. That seems very close to majorization (which appears in Theorem 5.1).
To my knowledge, this is the first ever paper where the definition of “Tsallis” entropy is written down (some years before Tsallis, needless to say). That’s at the end of Section 3.2. It’s not merely written down, but developed. They notice that these entropies are related to the Hill numbers and Rényi entropies by increasing transformations (Section 3.3).
I don’t know whether they address the exact question that you’re asking here, but it seems quite plausible.
Thanks, Tom! That ‘intrinsically more diverse’ idea indeed seems closely connected to majorization.