An Entropy Challenge

If you like computer calculations, here’s a little challenge for you. Oscar Dahlsten may have solved it, but we’d love for you to check his work. It’s pretty important for the foundations of thermodynamics, but you don’t need to know any physics or even anything beyond a little algebra to tackle it! First I’ll explain it in really simple terms, then I’ll remind you a bit of why it matters.

We’re looking for two lists of nonnegative numbers, of the same length, listed in decreasing order:

p_1 \ge p_2 \ge \cdots \ge p_n \ge 0

q_1 \ge q_2 \ge \cdots \ge q_n \ge 0

that sum to 1:

p_1 + \cdots + p_n = 1

q_1 + \cdots + q_n = 1

and that obey this inequality:

\displaystyle{ \frac{1}{1 - \beta} \ln \sum_{i=1}^n p_i^\beta  \le  \frac{1}{1 - \beta} \ln \sum_{i=1}^n q_i^\beta }

for all 0 < \beta < \infty (ignoring \beta = 1), yet do not obey these inequalities:

p_1 + \cdots + p_k \ge q_1 + \cdots + q_k

for all 1 \le k \le n.

Oscar’s proposed solution is this:

p = (0.4, 0.29, 0.29, 0.02)

q = (0.39, 0.31, 0.2, 0.1)

Can you see if this works? Is there a simpler example, like one with lists of just 3 numbers?

This question came up near the end of my post More Second Laws of Thermodynamics. I phrased the question with a bit more jargon, and said a lot more about its significance. Suppose we have two probability distributions on a finite set, say p and q. We say p majorizes q if

p_1 + \cdots + p_k \ge q_1 + \cdots + q_k

for all 1 \le k \le n, when we write both lists of numbers in decreasing order. This means p is ‘less flat’ than q, so it should have less entropy. And indeed it does: not just for ordinary entropy, but also for Rényi entropy! The Rényi entropy of p is defined by

\displaystyle{ H_\beta(p) = \frac{1}{1 - \beta} \ln \sum_{i=1}^n p_i^\beta  }

where 0 < \beta < 1 or 1 < \beta < \infty. We can also define Rényi entropy for \beta = 0, 1, \infty by taking a limit, and at \beta = 1 we get the ordinary entropy

\displaystyle{ H_1(p) = - \sum_{i = 1}^n p_i \ln (p_i) }

The question is whether majorization is more powerful than Rényi entropy as a tool to to tell when one probability distribution is less flat than another. I know that if p majorizes q, its Rényi entropy is less than than that of q for all 0 \le \beta \le \infty. Your mission, should you choose to accept it, is to show the converse is not true.

30 Responses to An Entropy Challenge

  1. David Speyer says:

    I didn’t prove it rigorously, but a numerical plot of Oscar’s example looks extremely good. I’m plotting

    \sum p_i^{\beta} - \sum q_i^{\beta},

    so the desired statement is that the only zero is at \beta=1. Here are plots for 0 \leq \beta \leq 2 and 0.8 \leq \beta \leq 5:

    • John Baez says:

      That looks convincing—thanks a lot, David! For those who haven’t thought about this at at all: we want the function graphed here to be negative for 0 < \beta < 1 and positive for \beta > 1.

      By the way, there are simple formulas for the Rényi entropy at \beta = 0 and \beta = \infty, which allow us to understand exactly what’s going on at these points. Oscar used them to construct his example. At \beta = 0 we get the min-entropy and at \beta = \infty we get the max-entropy, which is the logarithm of the number of i such that p_i > 0.

      Since both of Oscar’s probability distributions have the same max-entropy, the curve you graphed approaches zero as \beta \to \infty, just as it looks.

      It approaches a certain negative value as \beta \to 0, though the eye could be fooled into thinking it approaches zero.

  2. Marc Harper says:

    The inequality seems to be true, for the given p and q. Here is code to compute the RHS – LHS of the beta inequality (and plot the results).

    from math import log, pow
    from matplotlib import pyplot
    def entropy(p, beta):
        return 1. / (1. - beta) * log(sum(pow(p[i], beta) for i in range(len(p))))
    def verify_p_q(p, q, steps=10000, a=0.0001, b=30):
        delta = float(b - a) / steps
        points = []
        for step in range(0, steps):
            beta = a + step * delta
            diff = entropy(q, beta) - entropy(p, beta)
            print beta, diff
            points.append((beta, diff))
        pyplot.plot([x for (x,y) in points], [y for (x,y) in points])
    if __name__ == "__main__":
        p = [0.4, 0.29, 0.29, 0.02]
        q = [0.39, 0.31, 0.2, 0.1]
        verify_p_q(p, q)
    • Marc Harper says:

      John, I wrote some code to systematically find many examples, and it finds many. Let me know if you have any use for it.

      • John Baez says:

        Thanks! I’m not sure why I’d want lots of examples, but here’s something related.

        Oscar was asking if this behavior—all Rényi entropies of p bigger than of q, but p not majorizing q—is unusual. Maybe without much extra work you could figure out roughly what ‘percentage’ of pairs (p,q) display this behavior, say for probability distributions on a 3-element set?

        If you’ve already written code to decide if a given pair of probability distributions displays this behavior, then you just need to

        1) decide what measure you’ll use to define the ‘percentage’ of pairs of probability distributions that display a given behavior,

        2) figure out how to estimate that percentage, either using Monte Carlo estimation or some other procedure.

        You can think of a probability distribution on a 3-element set as a triple


        such that

        0 \le x, y,z \le 1


        x + y + z = 1

        The space of these is an equilateral triangle. Since the problem involves majorization and (implicitly) convex-linear maps, the symmetries are just the obvious symmetries of the triangle. So, it makes some sense to just use the obvious measure on this triangle, which is proportional to

        dx \; dy

        So for Monte Carlo, you can just randomly pick x \in [0,1] in a uniformly distributed way, then randomly pick y \in [0,1-x] in a uniformly distributed way, then let z = 1-x-y.

        As you know, there’s also another measure on this equilateral triangle, coming from the Fisher information metric. That would give a different answer!

        • Marc Harper says:

          Here are some simple computations. For 10,000,000 randomly selected pairs of points in the simplex (n=3), 826,289 pairs had both p and q decreasing; of these, 360,420 satisfied the Renyi inequality, and of these pairs, 325,013 had p majorizing q. So the desired proportion is ~10%.

          Here is one such pair if anyone wants to verify (Renyi for all beta and p majorizes q):
          (0.702035380032679, 0.25269600971504746, 0.04526861025227358)
          (0.6012357521664494, 0.32989515938159686, 0.06886908845195372)

  3. David Speyer says:

    Looks like (0.6, 0.2, 0.2) and (0.35, 0.35, 0.3) works with n=3.

  4. romain says:

    I haven’t followed the previous post (yet), but if I’m not wrong, these 3-number lists seem to work as well:
    p=(0.36 0.32 0.32)
    q=(0.35 0.34 0.31)

  5. David Speyer says:

    (0.6, 0.2, 0.2) and (0.42, 0.42, 1.6) seem to work.

  6. Matt Reece says:

    Much as I love Python, this is easier to play with in Mathematica:

    p = {0.4, 0.29, 0.29, 0.02}; 
    q = {0.39, 0.31, 0.2, 0.1}; 
    Plot[ 1/(1 - b) Log[Tr[q^b]] - 1/(1 - b) Log[Tr[p^b]], {b, 0.0, 20.0}]
  7. romain says:

    Ok, there may be some ambiguity in the constraint given by John:

    …yet do not obey these inequalities:
    p_1 + \cdots + p_k \ge q_1 + \cdots + q_k for all 1 \le k \le n

    To me, it meant that there must be at least one k for which this inequality does not hold.

    In words, the constraint is that the cumulative sum of p must be strictly smaller than that of q. And actually, this cannot be true for all k because clearly, when k=n, then \sum p=\sum q=1 (though this is a bit of a degenerate case).

    • Marc Harper says:

      I agree. If the interpretation is correct as you suggest (and I think it is), there are many solutions for n=3, including yours.

      • romain says:

        Yes, as soon as the inequality does not hold true for a given k, then we cannot say that p majorizes q.

        So, we have here two distributions p and q. The Rényi entropy of p is always smaller than that of q, yet p does not majorize q, so that John’s conjecture is true that “majorize” is not equivalent to “have a smaller Rényi entropy” but is in fact a stronger statement.

    • John Baez says:

      Romain writes:

      To me, it meant that there must be at least one k for which this inequality does not hold.

      Right. I was asking for examples where these inequalities fail for at least one k. ‘Majorization’ occurs when these inequalities hold for all k. I was looking for a probability distribution p that has all its Rényi entropies less than another q, yet p fails to majorize q.

  8. Arrow says:

    I got these:
    {0.41, 0.3, 0.29}
    {0.4, 0.32, 0.28}

    just by dropping last numbers from example in the post and shifting first 2 up by 0.02 to preserve ordering. As I see now romain got different ones so there must be many such examples.

  9. Boris Borcic says:

    John, I suspect this could for good turn into a memorable programming languages-and-implementations competition. The core of it would remain your exact problem statement. Maybe one or some of the early competitors could be seduced into implementing the infrastructure site? I guess your were unconsciously jealous of the attention London got for the Olympic Games :)

  10. Arrow says:

    The nicest pair of triples i was able to find is:
    {0.5, 0.25, 0.25},
    {0.45, 0.35, 0.2}

    I used mathematica to check if the condition holds, sample code is below, unfortunately I have no idea how to paste code properly, will give latex a try but don’t know how well it will come out:

    define vectors:
    \{p1 = 1/2, p2 = 1/4\};
    \{q1 = 9/20, q2 = 7/20\};
    p = \{p1, p2, 1 - p1 - p2\}
    q = \{q1, q2, 1 - q1 - q2\}

    Check that partial sums violate majorization:
    PSum[v\_, n\_] := Sum[v[[i]], \{i, 1, n\}]
    Array[PSum[p, \#] \&, Length[p]]
    Array[PSum[q, \#] \&, Length[q]]

    Solve the equation to find for which beta entropies equal each other:
    F[\beta \_, v\_] := 1/(1 - \beta) Log[Sum[(v[[i]])^\beta, \{i, 1, Length[v]\}]]
    Reduce[F[\beta , p] == F[\beta, q], \beta, Reals]

    Mathematica finds two solutions for beta, one negative and one 0.

    Plot both entropies to verify p is below q.
    Plot[\{F[\beta, p], F[\beta, q]\}, \{\beta, 0, 10\}, PlotStyle \to \{Red, Blue\}]
    It is and since they are not crossing anywhere for beta>0 it follows that for all beta (except beta =1) p entropy is below q entropy.

    Now I am certainly not an expert on computational methods and I don’t know how trustworthy mathematica reduce command is in this context so I cannot guarantee that it is correct.

    • Arrow says:

      I meant for all beta > 0.

      By the way, why is only beta > 0 considered in Rényi entropy definition?

      • John Baez says:

        \beta < 0 would make the entropy undefined for probability distributions that vanish at some points, and it would probably have many other evil features as well. For example, low-probability events would contribute enormously to the entropy.

        In a certain analogy to thermodynamics, which I used to relate Rényi entropy to free energy, \beta < 0 would correspond to temperatures below absolute zero. At negative temperatures, a system is more likely to be found in states of high energy than states of low energy. Negative temperatures are not completely absurd, but they only make sense for systems with an upper bound on their energy—just as positive temperatures only make sense for systems with a lower bound on their energy.

    • John Baez says:

      Nice examples, Arrow! I like your last example the best.

      For including code you surround the code by something like this:

      [ sourcecode language=”python”]

      [ /sourcecode]

      except without the space after the [. I don’t know what range of languages it takes. It seems to do something halfway reasonable if you leave the language choice out.

  11. Tom Leinster says:

    John, you might like to try this paper:

    • G.P. Patil and C. Taillie, Diversity as a concept and its measurement, Journal of the American Statistical Association 77 (1982), 548-561.

    For instance, Definition 5.1 defines what it means for one community to be “intrinsically more diverse” than another, in terms of transferring abundance from a more common species to a rarer one. That seems very close to majorization (which appears in Theorem 5.1).

    To my knowledge, this is the first ever paper where the definition of “Tsallis” entropy is written down (some years before Tsallis, needless to say). That’s at the end of Section 3.2. It’s not merely written down, but developed. They notice that these entropies are related to the Hill numbers and Rényi entropies by increasing transformations (Section 3.3).

    I don’t know whether they address the exact question that you’re asking here, but it seems quite plausible.

You can use Markdown or HTML in your comments. You can also use LaTeX, like this: $latex E = m c^2 $. The word 'latex' comes right after the first dollar sign, with a space after it.

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.