Thanks for responding!! For other readers, the quantitative statement is that the number of typical words (those words $latex x$ with $latex |-\frac{1}{n} \log p(x) - H(X)| < \beta$) can be bounded by $latex 2^{n(H(X)+\beta)}$, taking logs base 2 (this follows by a direct computation). When $latex H(X) < \log |X|$ (which is exactly the non-uniform distribution case) and $latex \beta$ is sufficiently small, $latex H(X) + \beta - \log |X| < 0$, so the ratio of typical words to all words goes to zero as $latex n$ grows.
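To see this shrinking ratio numerically, here is a small sketch of my own (the biased bit source $latex p = (0.9, 0.1)$ and the value $latex \beta = 0.1$ are illustrative choices, not from the discussion above): since $latex p(x)$ depends only on the number $latex k$ of 1s in $latex x$, we can count the typical words exactly with binomial coefficients and compare against all $latex 2^n$ words.

```python
import math

p0, p1 = 0.9, 0.1                 # an illustrative biased bit source
H = -(p0 * math.log2(p0) + p1 * math.log2(p1))  # entropy, about 0.469 bits
beta = 0.1                        # small enough that H + beta < log2|X| = 1

ratios = {}
for n in [8, 16, 32, 64]:
    typical = 0
    for k in range(n + 1):
        # -(1/n) log2 p(x) for any string x with exactly k ones
        rate = -((n - k) * math.log2(p0) + k * math.log2(p1)) / n
        if abs(rate - H) < beta:
            # all C(n, k) strings with k ones are typical together
            typical += math.comb(n, k)
    ratios[n] = typical / 2 ** n
    print(n, ratios[n])
```

The printed ratios fall off roughly like $latex 2^{n(H + \beta - \log |X|)}$, as the bound predicts.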

I haven’t thought about this for about 5 years, but: when the probability distribution on $latex X$ is uniform, I believe you are right that the ‘typical set’ is all of $latex X^n$. In this case no compression is possible. For every other probability distribution on $latex X$, the typical set will become small compared to $latex X^n$ as $latex n \to \infty$.

Thanks for the extra context, John.

Sounds like a yummy breakfast. But no Cramér large deviation stuff? Last century I ran a seminar on Cramér’s theorem, the law of the iterated logarithm, the arcsine law, etc. Now suddenly it seems Shannon is yummy, too.

That’s nice! Thanks!

I hope I have enough energy to say more about the asymptotic equipartition property.

Let’s keep talking about all this stuff.

By the way, Jamie, speaking of error correction…

Everyone please remember: on all WordPress blogs, LaTeX is done like this:

$latex E = mc^2$

with the word ‘latex’ directly following the first dollar sign, no space. Double dollar signs don’t work here.

I didn’t mean to say the asymptotic equipartition property is extremely hard. However, the rest of the proof looks easy in comparison, so one is inclined to look at this part and say “yuck, that’s some technical fact I’d rather take on faith”. It seems like the pit in the peach. But I was trying to convince everyone that unlike the pit in the peach, it’s highly nutritious, and tasty in its own way.

I stated a watered-down version of the asymptotic equipartition theorem: just for purposes of exposition, I assumed that each letter in the string was drawn independently from the same probability distribution on letters. In other words, I was assuming that they’re ‘independent identically distributed’ random variables. This is clearly too restrictive—it sure ain’t true for English text!
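Here is a little experiment of my own illustrating the watered-down statement (the three-letter alphabet and its probabilities are made up for the example): draw strings of i.i.d. letters and watch $latex -\frac{1}{n} \log_2 p(x)$ settle down near $latex H(X)$ as $latex n$ grows, which is all the i.i.d. version of the asymptotic equipartition property asserts.

```python
import math
import random

random.seed(0)                       # reproducible illustration
letters = ['a', 'b', 'c']            # a made-up three-letter alphabet
probs = [0.5, 0.3, 0.2]
logp = dict(zip(letters, (math.log2(q) for q in probs)))
H = -sum(q * math.log2(q) for q in probs)   # about 1.485 bits per letter

for n in [10, 100, 1000, 10000]:
    x = random.choices(letters, weights=probs, k=n)
    rate = -sum(logp[a] for a in x) / n      # -(1/n) log2 p(x)
    print(n, round(rate, 3), "vs H =", round(H, 3))
```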

The statement and proof get a bit harder when we do the full-fledged thing: you can see a proof here for the i.i.d. case and here for the more general case. The more general case uses Lévy’s martingale convergence theorem, Markov’s inequality, and the Borel–Cantelli lemma. For some people that would be very scary, while others eat such stuff for breakfast.

But anyway, regardless of whether the proof is hard, the result seems both interesting and highly believable: not shocking.

My quick skim of your summary of the hard lemma left me wondering why it is a hard lemma*. Isn’t it just a baby step away from the fact that as $latex n \to \infty$, samples of size $latex n$ look more and more like the underlying distribution from which they are drawn?

*That in itself is progress, since my quick skim of Wikipedia’s summary of the hard lemma left me almost clueless as to the content of the hard lemma (it struck me as very vague :-)
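That ‘baby step’ intuition can be made concrete, at least in the i.i.d. case (a sketch of my own, with an illustrative two-letter source): $latex -\frac{1}{n} \log_2 p(x)$ depends on $latex x$ only through its letter frequencies, so the i.i.d. equipartition statement really is the law of large numbers in disguise:

```python
import math
import random
from collections import Counter

random.seed(1)
probs = {'a': 0.6, 'b': 0.4}         # an illustrative two-letter source
n = 5000
x = random.choices(list(probs), weights=list(probs.values()), k=n)

# -(1/n) log2 p(x), computed letter by letter...
lhs = -sum(math.log2(probs[a]) for a in x) / n
# ...equals the empirical letter frequencies plugged into -log2 p:
freqs = {a: c / n for a, c in Counter(x).items()}
rhs = sum(f * -math.log2(probs[a]) for a, f in freqs.items())
H = -sum(q * math.log2(q) for q in probs.values())

print("identity gap:", abs(lhs - rhs))
print("empirical frequencies:", freqs)
print("rate:", round(lhs, 3), "entropy:", round(H, 3))
```

As the empirical frequencies converge to the true probabilities, the rate converges to the entropy, which is exactly the i.i.d. statement.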
