If you like computer calculations, here’s a little challenge for you. Oscar Dahlsten may have solved it, but we’d love for you to check his work. It’s pretty important for the foundations of thermodynamics, but you don’t need to know any physics or even anything beyond a little algebra to tackle it! First I’ll explain it in really simple terms, then I’ll remind you a bit of why it matters.

We’re looking for two lists of nonnegative numbers, of the same length, listed in decreasing order:

that sum to 1:

and that obey this inequality:

for all (ignoring ), yet do not obey these inequalities:

for all

Oscar’s proposed solution is this:

Can you see if this works? Is there a simpler example, like one with lists of just 3 numbers?

This question came up near the end of my post More Second Laws of Thermodynamics. I phrased the question with a bit more jargon, and said a lot more about its significance. Suppose we have two probability distributions on a finite set, say and We say majorizes if

for all when we write both lists of numbers in decreasing order. This means is ‘less flat’ than , so it should have less entropy. And indeed it does: not just for ordinary entropy, but also for Rényi entropy! The Rényi entropy of is defined by

where or . We can also define Rényi entropy for by taking a limit, and at we get the ordinary entropy

The question is whether majorization is more powerful than Rényi entropy as a tool to to tell when one probability distribution is less flat than another. I know that if majorizes its Rényi entropy is less than than that of for all Your mission, should you choose to accept it, is to show the converse is not true.

That looks convincing—thanks a lot, David! For those who haven’t thought about this at at all: we want the function graphed here to be negative for and positive for

By the way, there are simple formulas for the Rényi entropy at and which allow us to understand exactly what’s going on at these points. Oscar used them to construct his example. At we get the min-entropy and at we get the max-entropy, which is the logarithm of the number of such that .

Since both of Oscar’s probability distributions have the same max-entropy, the curve you graphed approaches zero as just as it looks.

It approaches a certain negative value as though the eye could be fooled into thinking it approaches zero.

Thanks! I’m not sure why I’d want lots of examples, but here’s something related.

Oscar was asking if this behavior—all Rényi entropies of p bigger than of q, but p not majorizing q—is unusual. Maybe without much extra work you could figure out roughly what ‘percentage’ of pairs (p,q) display this behavior, say for probability distributions on a 3-element set?

If you’ve already written code to decide if a given pair of probability distributions displays this behavior, then you just need to

1) decide what measure you’ll use to define the ‘percentage’ of pairs of probability distributions that display a given behavior,

2) figure out how to estimate that percentage, either using Monte Carlo estimation or some other procedure.

You can think of a probability distribution on a 3-element set as a triple

such that

and

The space of these is an equilateral triangle. Since the problem involves majorization and (implicitly) convex-linear maps, the symmetries are just the obvious symmetries of the triangle. So, it makes some sense to just use the obvious measure on this triangle, which is proportional to

So for Monte Carlo, you can just randomly pick in a uniformly distributed way, then randomly pick in a uniformly distributed way, then let .

As you know, there’s also another measure on this equilateral triangle, coming from the Fisher information metric. That would give a different answer!

Here are some simple computations. For 10,000,000 randomly selected pairs of points in the simplex (n=3), 826,289 pairs had both p and q decreasing; of these, 360,420 satisfied the Renyi inequality, and of these pairs, 325,013 had p majorizing q. So the desired proportion is ~10%.

Here is one such pair if anyone wants to verify (Renyi for all beta and p majorizes q):
(0.702035380032679, 0.25269600971504746, 0.04526861025227358)
(0.6012357521664494, 0.32989515938159686, 0.06886908845195372)

Just for the sake of people who are too lazy to read all the comments, I’ll note that this remark of Marc’s was based on a misunderstanding that was cleared up here.

Ok, there may be some ambiguity in the constraint given by John:

…yet do not obey these inequalities:
for all …

To me, it meant that there must be at least one for which this inequality does not hold.

In words, the constraint is that the cumulative sum of must be strictly smaller than that of . And actually, this cannot be true for all because clearly, when , then (though this is a bit of a degenerate case).

Yes, as soon as the inequality does not hold true for a given then we cannot say that majorizes

So, we have here two distributions and The Rényi entropy of is always smaller than that of yet does not majorize so that John’s conjecture is true that “majorize” is not equivalent to “have a smaller Rényi entropy” but is in fact a stronger statement.

To me, it meant that there must be at least one k for which this inequality does not hold.

Right. I was asking for examples where these inequalities fail for at least one ‘Majorization’ occurs when these inequalities hold for all I was looking for a probability distribution that has all its Rényi entropies less than another yet fails to majorize

just by dropping last numbers from example in the post and shifting first 2 up by 0.02 to preserve ordering. As I see now romain got different ones so there must be many such examples.

John, I suspect this could for good turn into a memorable programming languages-and-implementations competition. The core of it would remain your exact problem statement. Maybe one or some of the early competitors could be seduced into implementing the infrastructure site? I guess your were unconsciously jealous of the attention London got for the Olympic Games :)

“Rosetta Code is a programming chrestomathy site. The idea is to present solutions to the same task in as many different languages as possible, to demonstrate how languages are similar and different”

The nicest pair of triples i was able to find is:
{0.5, 0.25, 0.25},
{0.45, 0.35, 0.2}

I used mathematica to check if the condition holds, sample code is below, unfortunately I have no idea how to paste code properly, will give latex a try but don’t know how well it will come out:

define vectors:

Check that partial sums violate majorization:

Solve the equation to find for which beta entropies equal each other:

Mathematica finds two solutions for beta, one negative and one 0.

Plot both entropies to verify p is below q.

It is and since they are not crossing anywhere for beta>0 it follows that for all beta (except beta =1) p entropy is below q entropy.

Now I am certainly not an expert on computational methods and I don’t know how trustworthy mathematica reduce command is in this context so I cannot guarantee that it is correct.

would make the entropy undefined for probability distributions that vanish at some points, and it would probably have many other evil features as well. For example, low-probability events would contribute enormously to the entropy.

In a certain analogy to thermodynamics, which I used to relate Rényi entropy to free energy, would correspond to temperatures below absolute zero. At negative temperatures, a system is more likely to be found in states of high energy than states of low energy. Negative temperatures are not completely absurd, but they only make sense for systems with an upper bound on their energy—just as positive temperatures only make sense for systems with a lower bound on their energy.

Nice examples, Arrow! I like your last example the best.

For including code you surround the code by something like this:

[ sourcecode language="python"]

[ /sourcecode]

except without the space after the [. I don’t know what range of languages it takes. It seems to do something halfway reasonable if you leave the language choice out.

• G.P. Patil and C. Taillie, Diversity as a concept and its measurement, Journal of the American Statistical Association 77 (1982), 548-561.

For instance, Definition 5.1 defines what it means for one community to be “intrinsically more diverse” than another, in terms of transferring abundance from a more common species to a rarer one. That seems very close to majorization (which appears in Theorem 5.1).

To my knowledge, this is the first ever paper where the definition of “Tsallis” entropy is written down (some years before Tsallis, needless to say). That’s at the end of Section 3.2. It’s not merely written down, but developed. They notice that these entropies are related to the Hill numbers and Rényi entropies by increasing transformations (Section 3.3).

I don’t know whether they address the exact question that you’re asking here, but it seems quite plausible.

You can use HTML in your comments. You can also use LaTeX, like this: $latex E = m c^2 $. The word 'latex' comes right after the first dollar sign, with a space after it. Cancel reply

You need the word 'latex' right after the first dollar sign, and it needs a space after it. Double dollar signs don't work, and other limitations apply, some described here. You can't preview comments here, but I'm happy to fix errors.

I didn’t prove it rigorously, but a numerical plot of Oscar’s example looks extremely good. I’m plotting

so the desired statement is that the only zero is at Here are plots for and

That looks convincing—thanks a lot, David! For those who haven’t thought about this at at all: we want the function graphed here to be negative for and positive for

By the way, there are simple formulas for the Rényi entropy at and which allow us to understand exactly what’s going on at these points. Oscar used them to construct his example. At we get the

min-entropyand at we get themax-entropy, which is the logarithm of the number of such that .Since both of Oscar’s probability distributions have the same max-entropy, the curve you graphed approaches zero as just as it looks.

It approaches a certain negative value as though the eye could be fooled into thinking it approaches zero.

The inequality seems to be true, for the given p and q. Here is code to compute the RHS – LHS of the beta inequality (and plot the results).

John, I wrote some code to systematically find many examples, and it finds many. Let me know if you have any use for it.

Thanks! I’m not sure why I’d want lots of examples, but here’s something related.

Oscar was asking if this behavior—all Rényi entropies of p bigger than of q, but p not majorizing q—is unusual. Maybe without much extra work you could figure out roughly what ‘percentage’ of pairs (p,q) display this behavior, say for probability distributions on a 3-element set?

If you’ve already written code to decide if a given pair of probability distributions displays this behavior, then you just need to

1) decide what measure you’ll use to define the ‘percentage’ of pairs of probability distributions that display a given behavior,

2) figure out how to estimate that percentage, either using Monte Carlo estimation or some other procedure.

You can think of a probability distribution on a 3-element set as a triple

such that

and

The space of these is an equilateral triangle. Since the problem involves majorization and (implicitly) convex-linear maps, the symmetries are just the obvious symmetries of the triangle. So, it makes some sense to just use the obvious measure on this triangle, which is proportional to

So for Monte Carlo, you can just randomly pick in a uniformly distributed way, then randomly pick in a uniformly distributed way, then let .

As you know, there’s also another measure on this equilateral triangle, coming from the Fisher information metric. That would give a different answer!

Here are some simple computations. For 10,000,000 randomly selected pairs of points in the simplex (n=3), 826,289 pairs had both p and q decreasing; of these, 360,420 satisfied the Renyi inequality, and of these pairs, 325,013 had p majorizing q. So the desired proportion is ~10%.

Here is one such pair if anyone wants to verify (Renyi for all beta and p majorizes q):

(0.702035380032679, 0.25269600971504746, 0.04526861025227358)

(0.6012357521664494, 0.32989515938159686, 0.06886908845195372)

Looks like and works with .

Since we seem to be posting code, here’s Mathematica:

p = {0.6, 0.2, 0.2};

q = {0.35, 0.35, 0.3};

blah[x] = Sum[p[[j]]^x, {j, 1, 3}] – Sum[q[[j]]^x, {j, 1, 3}] ;

Plot[Evaluate@blah[x], {x, 0, 2}]

Sorry, that doesn’t break majorization at all. Too bad I can’t delete comments here.

I’ll delete ‘em if you want.

I haven’t followed the previous post (yet), but if I’m not wrong, these 3-number lists seem to work as well:

p=(0.36 0.32 0.32)

q=(0.35 0.34 0.31)

The inequality seems to hold for these.

Ok, so the proposed examples by John and Romain seem to fail the partial sum constraint since (and both have ).

Just for the sake of people who are too lazy to read all the comments, I’ll note that this remark of Marc’s was based on a misunderstanding that was cleared up here.

I also graphed the (exponentials of the) Rényi entropies of these, and agree that they work.

(0.6, 0.2, 0.2) and (0.42, 0.42, 1.6) seem to work.

Much as I love Python, this is easier to play with in Mathematica:

Ok, there may be some ambiguity in the constraint given by John:

To me, it meant that there must be at least one for which this inequality does not hold.

In words, the constraint is that the cumulative sum of must be strictly smaller than that of . And actually, this cannot be true for all because clearly, when , then (though this is a bit of a degenerate case).

I agree. If the interpretation is correct as you suggest (and I think it is), there are many solutions for n=3, including yours.

Yes, as soon as the inequality does not hold true for a given then we cannot say that majorizes

So, we have here two distributions and The Rényi entropy of is always smaller than that of yet does not majorize so that John’s conjecture is true that “majorize” is not equivalent to “have a smaller Rényi entropy” but is in fact a stronger statement.

Romain writes:

Right. I was asking for examples where these inequalities fail for at least one ‘Majorization’ occurs when these inequalities hold for all I was looking for a probability distribution that has all its Rényi entropies less than another yet fails to majorize

I got these:

{0.41, 0.3, 0.29}

{0.4, 0.32, 0.28}

just by dropping last numbers from example in the post and shifting first 2 up by 0.02 to preserve ordering. As I see now romain got different ones so there must be many such examples.

John, I suspect this could for good turn into a memorable programming languages-and-implementations competition. The core of it would remain your exact problem statement. Maybe one or some of the early competitors could be seduced into implementing the infrastructure site? I guess your were unconsciously jealous of the attention London got for the Olympic Games :)

If you get a collection of code solving this problem this site would appreciate the donation if possible.

http://rosettacode.org/wiki/Rosetta_Code

“Rosetta Code is a programming chrestomathy site. The idea is to present solutions to the same task in as many different languages as possible, to demonstrate how languages are similar and different”

The nicest pair of triples i was able to find is:

{0.5, 0.25, 0.25},

{0.45, 0.35, 0.2}

I used mathematica to check if the condition holds, sample code is below, unfortunately I have no idea how to paste code properly, will give latex a try but don’t know how well it will come out:

define vectors:

Check that partial sums violate majorization:

Solve the equation to find for which beta entropies equal each other:

Mathematica finds two solutions for beta, one negative and one 0.

Plot both entropies to verify p is below q.

It is and since they are not crossing anywhere for beta>0 it follows that for all beta (except beta =1) p entropy is below q entropy.

Now I am certainly not an expert on computational methods and I don’t know how trustworthy mathematica reduce command is in this context so I cannot guarantee that it is correct.

I meant for all beta > 0.

By the way, why is only beta > 0 considered in Rényi entropy definition?

would make the entropy undefined for probability distributions that vanish at some points, and it would probably have many other evil features as well. For example, low-probability events would contribute enormously to the entropy.

In a certain analogy to thermodynamics, which I used to relate Rényi entropy to free energy, would correspond to temperatures below absolute zero. At negative temperatures, a system is more likely to be found in states of high energy than states of low energy. Negative temperatures are not completely absurd, but they only make sense for systems with an upper bound on their energy—just as positive temperatures only make sense for systems with a lower bound on their energy.

Nice examples, Arrow! I like your last example the best.

For including code you surround the code by something like this:

[ sourcecode language="python"]

[ /sourcecode]

except without the space after the [. I don’t know what range of languages it takes. It seems to do something halfway reasonable if you leave the language choice out.

John, you might like to try this paper:

• G.P. Patil and C. Taillie, Diversity as a concept and its measurement,

Journal of the American Statistical Association77 (1982), 548-561.For instance, Definition 5.1 defines what it means for one community to be “intrinsically more diverse” than another, in terms of transferring abundance from a more common species to a rarer one. That seems very close to majorization (which appears in Theorem 5.1).

To my knowledge, this is the first ever paper where the definition of “Tsallis” entropy is written down (some years before Tsallis, needless to say). That’s at the end of Section 3.2. It’s not merely written down, but developed. They notice that these entropies are related to the Hill numbers and Rényi entropies by increasing transformations (Section 3.3).

I don’t know whether they address the exact question that you’re asking here, but it seems quite plausible.

Thanks, Tom! That ‘intrinsically more diverse’ idea indeed seems closely connected to majorization.