Boltzmann’s idea (translated into Shannon’s terms) was that the world of possibility could be divided into regions within which changes made no difference to the meaning, but between which changes did matter (microstates within macrostates). For Shannon, information is change in entropy, which means a selection among possible macrostates.

As Shannon pointed out in the part of the book on continuous systems, although total entropy is not well specified, information is, provided the coordinate system remains constant. The entropy-information partition can be stated as: information from a message (observation) is what you learn about which macrostate the system being observed is in; entropy is what you don’t know about the system as a whole. The latter can be further subdivided into potential information obtainable by further observation (refinement of the macrostate probability distribution) and the possibly infinite remaining unobservable differences in the total system.
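One way to see the coordinate dependence is a short sketch using the standard differential-entropy identities. The symbols are assumptions introduced here, not taken from the comment: X is a continuous variable, a is a nonzero rescaling constant (a change of units), and M is a message or observation.

```latex
h(aX) = h(X) + \log|a|, \qquad
I(aX; M) = h(aX) - h(aX \mid M) = h(X) - h(X \mid M) = I(X; M).
```

The \log|a| term shifts the entropy whenever the units change, but it cancels in the difference, which is why the information stays well defined as long as the same coordinates are used before and after the message.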

Incidentally, one of the above commenters said that Weaver said entropy had nothing to do with meaning. That’s not what Shannon said. He said that in calculating the capacity of a noisy channel (observation method) one should not consider the meaning, because the meaning of a message (observation) to the receiver (observer) depends entirely on the prior state of the receiver (observer), as does the quantity of information the receiver (observer) can get from the message (observation).

But maybe I have misinterpreted the message in the original post.

Tobias wrote:

In my understanding, what John means by “information” is the amount by which your “missing information” changes when he tells you something about his number. Think of it as the amount of information contained in his message to you.

Yes. It’s very easy to get mixed up about minus signs when studying entropy and information. I won’t be ultra-precise here because it seems that Ady and you and I all understand and agree about what’s going on.

The entropy of a probability distribution is the expected amount of information you gain when you learn the value of a random variable distributed according to that probability distribution.

So, it’s the expected amount of information you gain when someone sends you a message telling you the value of that random variable.

Or, you could say it’s the amount of information you’re *missing*, *before* you know that random variable.
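As a minimal sketch of "expected amount of information", here is the calculation for a coin. The function names and the two example coins are illustrative assumptions, not anything from the post; information is measured in bits (log base 2).

```python
import math

def surprisal(p):
    """Information, in bits, gained on seeing an outcome of probability p."""
    return -math.log2(p)

def entropy(dist):
    """Expected surprisal of a distribution given as {outcome: probability}."""
    return sum(p * surprisal(p) for p in dist.values() if p > 0)

# Hypothetical example distributions.
fair_coin = {"heads": 0.5, "tails": 0.5}
biased_coin = {"heads": 0.9, "tails": 0.1}

print(entropy(fair_coin))    # 1.0 bit: you are missing one full yes/no answer
print(entropy(biased_coin))  # roughly 0.469 bits: you already mostly know the answer
```

The same number reads both ways: it is what you expect to gain from the message, and what you are missing before the message arrives.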

Since I’ve thought about this a lot and have become pretty relaxed about it, I don’t mind saying ‘entropy is information’. But that can be confusing if you haven’t thought about this subject a lot.

It’s a bit like how the sign of ‘work’ in physics can get very confusing if you don’t keep track of whether you mean ‘work done on the system’ or ‘work the system has done’.

There might be several ways to answer this question. Here’s my take on it.

John wrote:

It makes more intuitive sense if you think of entropy as information, and the function f as some kind of data processing that doesn’t introduce any additional randomness. Such a process can only decrease the amount of information. For example, squaring the number -5 gives the same answer as squaring 5, so if I tell you “this number squared is 25”, I’m giving you less information than if I said “this number is -5”.

In my understanding, what John means by “information” is the amount by which your “missing information” changes when he tells you something about his number. Think of it as the amount of information contained in his message to you.

Now consider the following two cases:

If he tells you the number itself, your missing information suddenly decreases from total ignorance of the value of the number to complete knowledge of the number.

On the other hand, if he tells you only the square of the number, your missing information will not decrease to 0, since you won’t know the sign of the number. Your missing information decreases by a smaller amount than in the previous case.

The difference between these two cases is precisely our “information loss”.
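The two cases can be put into numbers with a small sketch. It assumes, purely for illustration, that the number is drawn uniformly from the integers -5 to 5; the comment itself does not fix a distribution, and the helper names are hypothetical.

```python
import math

def entropy(dist):
    """Shannon entropy in bits of a distribution {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def pushforward(dist, f):
    """Distribution of f(X) when X has distribution dist."""
    out = {}
    for x, p in dist.items():
        out[f(x)] = out.get(f(x), 0.0) + p
    return out

# Assumed prior: the number is uniform on -5, ..., 5 (11 values).
values = range(-5, 6)
X = {x: 1 / len(values) for x in values}
squared = pushforward(X, lambda x: x * x)

info_from_number = entropy(X)        # case 1: told the number itself
info_from_square = entropy(squared)  # case 2: told only its square
loss = info_from_number - info_from_square

print(info_from_number, info_from_square, loss)
```

Under this assumed prior the loss comes out to 10/11 of a bit: in 10 of the 11 cases one binary question (the sign) is left unanswered, and in the remaining case (zero) nothing is lost.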

If the amount of information is decreased then the missing information is increased, so shouldn’t the entropy have increased? (It’s supposed to be a measure of the missing information – our ignorance about the system).

One reason would be in order to measure entropy, so x or y has to be a probability. -x ln x is then that probability’s contribution to the entropy. Or, I should say, entropy is the sum over all your possible x’s of that value. So if the random variable X can equal 0 (tails) or 1 …

The baseball example was meant to show that while the Shannon entropy of the black box signal (the pitcher throwing perfect strikes) recorded low information content, the internal energetic process of the black box, the greater metabolism of the pitcher, was in fact highly negentropic, far from a normal distribution. (This is not saying that it contravenes the second law; clearly a greater portion of energy goes to ground in the process.) At this point I am not sure what that proves, but I am unsatisfied with the notion of an identity between information and entropy.

The H theorem is a measure with a particular angle of incidence that, like a conic section, reveals one aspect of information but not the whole.

Of course the genus of information theoretic measures is a living testament to evolution; there are new species at every turn. I have a middling grasp of Shannon’s more fundamental theorems and it is likely that, unbeknownst to me, there is formal recognition that information is related to, but not identical to, entropy.

And, having come this far, unfortunately there is more. I regret being both flatfootedly declarative and perhaps ultimately unclear, but for me this is a germinal notion. Perhaps it is the season for a shift of paradigms and perhaps I am actually coming late to this understanding.

Would things sort out more satisfactorily if we shift information upward in the cosmological hierarchy, grant it a more prominent, costarring role in nature’s grand opera?

I have come to view it in this way. Information is energy’s counterpoise and correspondent constraint. Information keeps energy from running directly off the page, gives it pause and time to doodle things like planets and plants. Path is emergent in the interaction of energy and information and path is the salient feature in life process. And information theoretic measures lend themselves to the understanding of path.

That’s probably more than enough. Regards.