<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: The Mathematics of Biodiversity (Part 7)</title>
	<atom:link href="http://johncarlosbaez.wordpress.com/2012/07/12/the-mathematics-of-biodiversity-part-7/feed/" rel="self" type="application/rss+xml" />
	<link>http://johncarlosbaez.wordpress.com/2012/07/12/the-mathematics-of-biodiversity-part-7/</link>
	<description></description>
	<lastBuildDate>Sun, 19 May 2013 18:13:45 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
	<item>
		<title>By: The best way to extort an extortionist is to be fair « neuroecology</title>
		<link>http://johncarlosbaez.wordpress.com/2012/07/12/the-mathematics-of-biodiversity-part-7/#comment-16887</link>
		<dc:creator><![CDATA[The best way to extort an extortionist is to be fair « neuroecology]]></dc:creator>
		<pubDate>Wed, 18 Jul 2012 19:08:01 +0000</pubDate>
		<guid isPermaLink="false">http://johncarlosbaez.wordpress.com/?p=10580#comment-16887</guid>
		<description><![CDATA[[...] It’s all over the place!  If you rerun the simulation again and again, you get a different distribution of these values, but they always seem to be &gt;.3 and never settle at 1 (tit-for-tat)!  We can measure how diverse the distribution is by the entropy of possible [...]]]></description>
		<content:encoded><![CDATA[<p>[...] It’s all over the place!  If you rerun the simulation again and again, you get a different distribution of these values, but they always seem to be &gt;.3 and never settle at 1 (tit-for-tat)!  We can measure how diverse the distribution is by the entropy of possible [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: The Mathematics of Biodiversity (Part 8) « Azimuth</title>
		<link>http://johncarlosbaez.wordpress.com/2012/07/12/the-mathematics-of-biodiversity-part-7/#comment-16789</link>
		<dc:creator><![CDATA[The Mathematics of Biodiversity (Part 8) « Azimuth]]></dc:creator>
		<pubDate>Sun, 15 Jul 2012 04:36:34 +0000</pubDate>
		<guid isPermaLink="false">http://johncarlosbaez.wordpress.com/?p=10580#comment-16789</guid>
		<description><![CDATA[[...] They computed the first-order bias of the information (related to the &lt;b&gt;Miller&#8211;Madow correction&lt;/b&gt;)  [...]]]></description>
		<content:encoded><![CDATA[<p>[...] They computed the first-order bias of the information (related to the <b>Miller&#8211;Madow correction</b>)  [...]</p>
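<p>For reference, here is a minimal Python sketch of the plug-in entropy estimate and the standard Miller&#8211;Madow correction, which adds (m - 1)/(2n), where m is the number of distinct symbols observed and n is the sample size. It assumes natural logarithms, and the function names are hypothetical.</p>

```python
import math
from collections import Counter

def plugin_entropy(samples):
    """Plug-in estimate: the entropy (in nats) of the sample's empirical distribution."""
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def miller_madow_entropy(samples):
    """Plug-in estimate plus the first-order bias correction (m - 1) / (2n),
    where m is the number of distinct symbols seen in the sample."""
    n = len(samples)
    m = len(set(samples))
    return plugin_entropy(samples) + (m - 1) / (2 * n)

sample = ["a", "a", "b", "c"]
print(plugin_entropy(sample))        # about 1.0397 (= 1.5 ln 2)
print(miller_madow_entropy(sample))  # about 1.2897 (adds 2/8)
```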
]]></content:encoded>
	</item>
	<item>
		<title>By: John Baez</title>
		<link>http://johncarlosbaez.wordpress.com/2012/07/12/the-mathematics-of-biodiversity-part-7/#comment-16765</link>
		<dc:creator><![CDATA[John Baez]]></dc:creator>
		<pubDate>Sat, 14 Jul 2012 03:19:02 +0000</pubDate>
		<guid isPermaLink="false">http://johncarlosbaez.wordpress.com/?p=10580#comment-16765</guid>
		<description><![CDATA[Thanks, I&#039;ll fix this.  I&#039;ll be interested to hear what you&#039;re thinking about.]]></description>
		<content:encoded><![CDATA[<p>Thanks, I&#8217;ll fix this.  I&#8217;ll be interested to hear what you&#8217;re thinking about.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: John Baez</title>
		<link>http://johncarlosbaez.wordpress.com/2012/07/12/the-mathematics-of-biodiversity-part-7/#comment-16764</link>
		<dc:creator><![CDATA[John Baez]]></dc:creator>
		<pubDate>Sat, 14 Jul 2012 03:18:06 +0000</pubDate>
		<guid isPermaLink="false">http://johncarlosbaez.wordpress.com/?p=10580#comment-16764</guid>
		<description><![CDATA[Thanks!  I believe the new version of my blog article gets the facts straight---if you see any actual errors please let me know.  

I believe the usual definition of &#039;estimator&#039; requires that you&#039;re able to compute an estimate from $latex n$ samples of data no matter what those samples are.  As you note, Smith and Schürmann&#039;s result evades Paninski&#039;s theorem by relaxing this definition of &#039;estimator&#039;: their formula only lets you compute an estimate if your samples obey some condition.   One might call this a &#039;conditional estimator&#039;.  I imagine some statisticians have already thought hard about this idea.]]></description>
		<content:encoded><![CDATA[<p>Thanks!  I believe the new version of my blog article gets the facts straight&#8212;if you see any actual errors please let me know.  </p>
<p>I believe the usual definition of &#8216;estimator&#8217; requires that you&#8217;re able to compute an estimate from <img src='http://s0.wp.com/latex.php?latex=n&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='n' title='n' class='latex' /> samples of data no matter what those samples are.  As you note, Smith and Schürmann&#8217;s result evades Paninski&#8217;s theorem by relaxing this definition of &#8216;estimator&#8217;: their formula only lets you compute an estimate if your samples obey some condition.   One might call this a &#8216;conditional estimator&#8217;.  I imagine some statisticians have already thought hard about this idea.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: memming</title>
		<link>http://johncarlosbaez.wordpress.com/2012/07/12/the-mathematics-of-biodiversity-part-7/#comment-16750</link>
		<dc:creator><![CDATA[memming]]></dc:creator>
		<pubDate>Fri, 13 Jul 2012 19:17:50 +0000</pubDate>
		<guid isPermaLink="false">http://johncarlosbaez.wordpress.com/?p=10580#comment-16750</guid>
		<description><![CDATA[Actually, I&#039;ve seen the same (or a very similar) entropy estimator to the one in the Smith and Schürmann paper you mentioned. And the unbiasedness proof seems to be right, given that we have enough data to observe every symbol at least once (for $latex H_1$), while Paninski&#039;s proof assumes a fixed number $latex N$ of observed symbols. Most entropy estimators are in fact asymptotically unbiased, i.e. unbiased in the limit of infinite data.

BTW, if the number $latex K$ of symbols (or species) is known, entropy is always bounded by $latex [0, \log K]$, so the variance of any estimator whose values stay in that range cannot be infinite. When $latex K$ is unknown, it is a different story.]]></description>
		<content:encoded><![CDATA[<p>Actually, I&#8217;ve seen the same (or a very similar) entropy estimator to the one in the Smith and Schürmann paper you mentioned. And the unbiasedness proof seems to be right, given that we have enough data to observe every symbol at least once (for <img src='http://s0.wp.com/latex.php?latex=H_1&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='H_1' title='H_1' class='latex' />), while Paninski&#8217;s proof assumes a fixed number <img src='http://s0.wp.com/latex.php?latex=N&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='N' title='N' class='latex' /> of observed symbols. Most entropy estimators are in fact asymptotically unbiased, i.e. unbiased in the limit of infinite data.</p>
<p>BTW, if the number <img src='http://s0.wp.com/latex.php?latex=K&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='K' title='K' class='latex' /> of symbols (or species) is known, entropy is always bounded by <img src='http://s0.wp.com/latex.php?latex=%5B0%2C+%5Clog+K%5D&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='[0, &#92;log K]' title='[0, &#92;log K]' class='latex' />, so the variance of any estimator whose values stay in that range cannot be infinite. When <img src='http://s0.wp.com/latex.php?latex=K&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='K' title='K' class='latex' /> is unknown, it is a different story.</p>
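<p>As a worked bound: for an estimator whose values always lie in [0, log K] (true of the plug-in estimate, though some bias-corrected estimators can exceed log K), Popoviciu&#8217;s inequality on variances makes the finiteness explicit:</p>

```latex
0 \le \hat{S} \le \log K
\quad\Longrightarrow\quad
\operatorname{Var}(\hat{S}) \le \frac{(\log K - 0)^2}{4} = \frac{(\log K)^2}{4}
```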
]]></content:encoded>
	</item>
	<item>
		<title>By: Blake Stacey</title>
		<link>http://johncarlosbaez.wordpress.com/2012/07/12/the-mathematics-of-biodiversity-part-7/#comment-16744</link>
		<dc:creator><![CDATA[Blake Stacey]]></dc:creator>
		<pubDate>Fri, 13 Jul 2012 16:00:43 +0000</pubDate>
		<guid isPermaLink="false">http://johncarlosbaez.wordpress.com/?p=10580#comment-16744</guid>
		<description><![CDATA[Typo alert: $X$ is missing its &quot;latex&quot;.

(I had a more substantial comment to make here, but I realised I should explicitly work through the de-Finetti-theorem-related stuff I was thinking about before I started yapping on over it.)]]></description>
		<content:encoded><![CDATA[<p>Typo alert: $X$ is missing its &#8220;latex&#8221;.</p>
<p>(I had a more substantial comment to make here, but I realised I should explicitly work through the de-Finetti-theorem-related stuff I was thinking about before I started yapping on over it.)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Blake Stacey</title>
		<link>http://johncarlosbaez.wordpress.com/2012/07/12/the-mathematics-of-biodiversity-part-7/#comment-16743</link>
		<dc:creator><![CDATA[Blake Stacey]]></dc:creator>
		<pubDate>Fri, 13 Jul 2012 15:47:53 +0000</pubDate>
		<guid isPermaLink="false">http://johncarlosbaez.wordpress.com/?p=10580#comment-16743</guid>
		<description><![CDATA[The most recent thing I&#039;ve seen in the tradition I was talking about is this:

A. Bialas, W. Czyz, K. Zalewski (2006), &quot;Measurement of Renyi entropies in multiparticle production: a DO-LIST II&quot; &lt;i&gt;Acta Physica Polonica B&lt;/i&gt; &lt;b&gt;37:&lt;/b&gt; 2713&#8211;28 [&lt;a href=&quot;http://arxiv.org/abs/hep-ph/0607082&quot; rel=&quot;nofollow&quot;&gt;arXiv:hep-ph/0607082&lt;/a&gt;].]]></description>
		<content:encoded><![CDATA[<p>The most recent thing I&#8217;ve seen in the tradition I was talking about is this:</p>
<p>A. Bialas, W. Czyz, K. Zalewski (2006), &#8220;Measurement of Renyi entropies in multiparticle production: a DO-LIST II&#8221; <i>Acta Physica Polonica B</i> <b>37:</b> 2713&ndash;28 [<a href="http://arxiv.org/abs/hep-ph/0607082" rel="nofollow">arXiv:hep-ph/0607082</a>].</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Lou Jost</title>
		<link>http://johncarlosbaez.wordpress.com/2012/07/12/the-mathematics-of-biodiversity-part-7/#comment-16732</link>
		<dc:creator><![CDATA[Lou Jost]]></dc:creator>
		<pubDate>Fri, 13 Jul 2012 09:20:48 +0000</pubDate>
		<guid isPermaLink="false">http://johncarlosbaez.wordpress.com/?p=10580#comment-16732</guid>
		<description><![CDATA[Anne Chao has such a result, and it will lead to the best available estimator of entropy! Stay tuned....

John, I just gave a talk yesterday afternoon here in Barcelona (we all miss you!!!) about estimation of species richness, entropy, and Hill numbers. I suggested a way to &quot;tune&quot; the observed relative abundances (which sum to unity) by noting that their true values must sum to the coverage $latex C$, the Good--Turing coverage, i.e. the population share of the species detected in the sample. The coverage deficit $latex 1-C$ can be estimated from the sample as $latex f_1/n$, where $latex f_1$ is the number of singletons in the sample and $latex n$ is the sample size.

Chao and Shen use this method to tune all observed frequencies by a single correction factor, multiplying each by $latex C.$ However, I later realized (and Anne agreed) that this is not quite correct (though it works very well anyway), because the relative uncertainty in the estimate of $latex p_i$ is very small for large $latex p_i$ but large for small $latex p_i.$ For example, a species represented only once in the sample, with frequency $latex 1/n,$ could easily have been missed or been found twice; its true frequency in the population might easily be $latex 2/n$ or $latex 0.5/n,$ so our relative uncertainty in its value is huge. Contrast this with a species that makes up 50% of the sample. Using the binomial formula for the standard deviation, we can show that the precision of this estimate is very high if $latex n$ is large. We don&#039;t have the right to tune that frequency very much. We should tune all frequencies in proportion to their uncertainty. It turns out to be easy to write a simple formula for this.

Tuning only solves half the problem, though. In the case of entropy, the abundance distribution of the unseen species has a largish effect on the value of entropy. That part still needs more work. Hopefully Anne&#039;s innovation will make this unnecessary.]]></description>
		<content:encoded><![CDATA[<p>Anne Chao has such a result, and it will lead to the best available estimator of entropy! Stay tuned&#8230;.</p>
<p>John, I just gave a talk yesterday afternoon here in Barcelona (we all miss you!!!) about estimation of species richness, entropy, and Hill numbers. I suggested a way to &#8220;tune&#8221; the observed relative abundances (which sum to unity) by noting that their true values must sum to the coverage <img src='http://s0.wp.com/latex.php?latex=C&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='C' title='C' class='latex' />, the Good&#8211;Turing coverage, i.e. the population share of the species detected in the sample. The coverage deficit <img src='http://s0.wp.com/latex.php?latex=1-C&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='1-C' title='1-C' class='latex' /> can be estimated from the sample as <img src='http://s0.wp.com/latex.php?latex=f_1%2Fn&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='f_1/n' title='f_1/n' class='latex' />, where <img src='http://s0.wp.com/latex.php?latex=f_1&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='f_1' title='f_1' class='latex' /> is the number of singletons in the sample and <img src='http://s0.wp.com/latex.php?latex=n&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='n' title='n' class='latex' /> is the sample size.</p>
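<p>Here is a minimal Python sketch of the Good&#8211;Turing estimate, assuming the sample is a list of species labels (the function name is hypothetical). It estimates the coverage deficit 1 - C by f<sub>1</sub>/n, and hence the coverage C by 1 - f<sub>1</sub>/n.</p>

```python
from collections import Counter

def good_turing_coverage(samples):
    """Return (C_hat, deficit_hat): the estimated sample coverage and the
    estimated coverage deficit f1 / n, where f1 counts singleton species."""
    n = len(samples)
    counts = Counter(samples)
    f1 = sum(1 for c in counts.values() if c == 1)
    deficit = f1 / n
    return 1 - deficit, deficit

sample = ["a"] * 5 + ["b"] * 3 + ["c", "d"]   # n = 10, two singletons
print(good_turing_coverage(sample))           # approximately (0.8, 0.2)
```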
<p>Chao and Shen use this method to tune all observed frequencies by a single correction factor, multiplying each by <img src='http://s0.wp.com/latex.php?latex=C.&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='C.' title='C.' class='latex' /> However, I later realized (and Anne agreed) that this is not quite correct (though it works very well anyway), because the relative uncertainty in the estimate of <img src='http://s0.wp.com/latex.php?latex=p_i&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='p_i' title='p_i' class='latex' /> is very small for large <img src='http://s0.wp.com/latex.php?latex=p_i&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='p_i' title='p_i' class='latex' /> but large for small <img src='http://s0.wp.com/latex.php?latex=p_i.&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='p_i.' title='p_i.' class='latex' /> For example, a species represented only once in the sample, with frequency <img src='http://s0.wp.com/latex.php?latex=1%2Fn%2C&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='1/n,' title='1/n,' class='latex' /> could easily have been missed or been found twice; its true frequency in the population might easily be <img src='http://s0.wp.com/latex.php?latex=2%2Fn&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='2/n' title='2/n' class='latex' /> or <img src='http://s0.wp.com/latex.php?latex=0.5%2Fn%2C&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='0.5/n,' title='0.5/n,' class='latex' /> so our relative uncertainty in its value is huge. Contrast this with a species that makes up 50% of the sample. Using the binomial formula for the standard deviation, we can show that the precision of this estimate is very high if <img src='http://s0.wp.com/latex.php?latex=n&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='n' title='n' class='latex' /> is large. We don&#8217;t have the right to tune that frequency very much. We should tune all frequencies in proportion to their uncertainty. It turns out to be easy to write a simple formula for this.</p>
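<p>The binomial argument can be made concrete. This is a minimal Python sketch, assuming each species&#8217; count is Binomial(n, p), so the observed frequency has standard deviation sqrt(p(1 - p)/n). The helper name is hypothetical, and the code only illustrates the uncertainty, not the tuning formula mentioned above.</p>

```python
import math

def relative_std_error(p, n):
    """Standard deviation of a binomial frequency estimate, sqrt(p(1-p)/n),
    divided by p itself."""
    return math.sqrt(p * (1 - p) / n) / p

n = 1000
print(relative_std_error(1 / n, n))  # singleton: about 1.0, i.e. roughly +/-100%
print(relative_std_error(0.5, n))    # 50% species: about 0.032, i.e. roughly +/-3%
```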
<p>Tuning only solves half the problem, though. In the case of entropy, the abundance distribution of the unseen species has a largish effect on the value of entropy. That part still needs more work. Hopefully Anne&#8217;s innovation will make this unnecessary.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: dave tweed</title>
		<link>http://johncarlosbaez.wordpress.com/2012/07/12/the-mathematics-of-biodiversity-part-7/#comment-16731</link>
		<dc:creator><![CDATA[dave tweed]]></dc:creator>
		<pubDate>Fri, 13 Jul 2012 07:53:54 +0000</pubDate>
		<guid isPermaLink="false">http://johncarlosbaez.wordpress.com/?p=10580#comment-16731</guid>
		<description><![CDATA[A tangential thought: an &quot;estimator&quot; can be viewed as an example of a &quot;machine learning model&quot; that just happens to have no adjustable parameters at all. In machine learning, a common viewpoint is that if you want to reduce (or even minimise) the generalisation error of your model, the error can be split into terms &quot;due to bias&quot; and terms &quot;due to variance&quot;, and it is generally trading these off effectively that gives the best performance on new data (see &lt;a href=&quot;http://en.wikipedia.org/wiki/Supervised_learning#Bias-variance_tradeoff&quot; rel=&quot;nofollow&quot;&gt;here&lt;/a&gt;). Admittedly, for such a &quot;trivial&quot; case it may be possible to take the analysis or optimisation further, but is it possible that neither unbiased nor minimum-variance estimators are the most appropriate, and that some trade-off between the two, minimising the statistic&#039;s error on the finite sets of samples that experiments produce, would be better?]]></description>
		<content:encoded><![CDATA[<p>A tangential thought: an &#8220;estimator&#8221; can be viewed as an example of a &#8220;machine learning model&#8221; that just happens to have no adjustable parameters at all. In machine learning, a common viewpoint is that if you want to reduce (or even minimise) the generalisation error of your model, the error can be split into terms &#8220;due to bias&#8221; and terms &#8220;due to variance&#8221;, and it is generally trading these off effectively that gives the best performance on new data (see <a href="http://en.wikipedia.org/wiki/Supervised_learning#Bias-variance_tradeoff" rel="nofollow">here</a>). Admittedly, for such a &#8220;trivial&#8221; case it may be possible to take the analysis or optimisation further, but is it possible that neither unbiased nor minimum-variance estimators are the most appropriate, and that some trade-off between the two, minimising the statistic&#8217;s error on the finite sets of samples that experiments produce, would be better?</p>
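<p>The bias&#8211;variance split can be checked on a tiny example. Below is a minimal Python sketch, assuming a known distribution p on three symbols and i.i.d. samples of size n. It enumerates every possible sample and reports the exact bias, variance and mean squared error (MSE = bias&#178; + variance). Comparing the plug-in estimate with its Miller&#8211;Madow correction is only an illustrative choice of two estimators.</p>

```python
import itertools
import math
from collections import Counter

def plugin(sample):
    """Entropy (nats) of the sample's empirical distribution."""
    n = len(sample)
    return -sum((c / n) * math.log(c / n) for c in Counter(sample).values())

def miller_madow(sample):
    """Plug-in estimate plus (m - 1) / (2n), m = number of distinct symbols seen."""
    return plugin(sample) + (len(set(sample)) - 1) / (2 * len(sample))

def bias_variance_mse(estimator, p, n):
    """Exact bias, variance and MSE of `estimator` over all n-tuples drawn i.i.d. from p."""
    true_s = -sum(q * math.log(q) for q in p.values() if q > 0)
    mean = second = 0.0
    for sample in itertools.product(p, repeat=n):
        weight = math.prod(p[x] for x in sample)  # probability of this exact sample
        value = estimator(sample)
        mean += weight * value
        second += weight * value ** 2
    variance = second - mean ** 2
    bias = mean - true_s
    return bias, variance, bias ** 2 + variance

p = {"a": 0.6, "b": 0.3, "c": 0.1}
for name, est in [("plug-in", plugin), ("Miller-Madow", miller_madow)]:
    print(name, bias_variance_mse(est, p, n=5))
```

<p>Which estimator has the smaller MSE depends on p and n, which is the trade-off the question is about.</p>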
]]></content:encoded>
	</item>
	<item>
		<title>By: John Baez</title>
		<link>http://johncarlosbaez.wordpress.com/2012/07/12/the-mathematics-of-biodiversity-part-7/#comment-16722</link>
		<dc:creator><![CDATA[John Baez]]></dc:creator>
		<pubDate>Fri, 13 Jul 2012 03:21:22 +0000</pubDate>
		<guid isPermaLink="false">http://johncarlosbaez.wordpress.com/?p=10580#comment-16722</guid>
		<description><![CDATA[I thought I&#039;d deleted the word &quot;known&quot; in that sentence, to lessen the confusion.  I&#039;ll do it now.

Of course there&#039;s a huge debate on what probabilities actually mean, with subjective Bayesians religiously avoiding phrases like &quot;the true probability&quot;---and for good reasons.  It&#039;s an important debate that I&#039;ve spent a lot of time on.  I&#039;m actually a subjective Bayesian, with mild reservations: even this position may not get to the bottom of certain issues.

But here I was trying to explain the concept of an &lt;a href=&quot;http://en.wikipedia.org/wiki/Bias_of_an_estimator&quot; rel=&quot;nofollow&quot;&gt;unbiased estimator&lt;/a&gt; without getting too technical.  This is a mathematical issue that largely sidesteps the debate about what probabilities mean.  But the problem is that when you talk using words, it&#039;s hard to avoid seeming like you&#039;re taking a position on that debate---and it&#039;s a lot simpler to talk like someone who believes probabilities are an objective feature of reality.  

So, in anything I write, you should be able to take phrases like &quot;the true probability distribution&quot; and replace them with &quot;the probability distribution that models Fred&#039;s beliefs&quot; without anything bad happening.  

Allowing myself to get more technical, the math goes like this:

Suppose $latex p$ is a probability distribution on a finite set $latex X$.   Its entropy is

$latex \displaystyle{ S = -\sum_{x \in X} p(x) \, \ln p(x) }$

Define an &lt;b&gt;estimator for entropy&lt;/b&gt; to be a function

$latex \hat{S}: X^n \to \mathbb{R}$

The idea is that given $latex n$ samples from the set $latex X$, the estimator gives a number $latex \hat{S}(x_1, \dots, x_n)$ that&#039;s supposed to be an estimate of entropy.   If these samples are independent and distributed according to the distribution $latex p$, the &lt;b&gt;mean estimated entropy&lt;/b&gt; will be

$latex \displaystyle{ \langle \hat{S} \rangle = \sum_{x_1, \dots, x_n \in X} \hat{S}(x_1, \dots, x_n) \, p(x_1) \cdots p(x_n) } $

The &lt;b&gt;bias&lt;/b&gt; of the estimator is the difference between the mean estimated entropy and the actual entropy of $latex p$:

$latex  \langle \hat{S} \rangle - S $

The estimator is &lt;b&gt;unbiased&lt;/b&gt; if the bias is zero for all $latex p$.

Proposition 8 of Paninski&#039;s paper says there exists no unbiased estimator for entropy.  

People often think of $latex p$ as the &#039;true&#039; probability distribution, whose entropy the estimator is seeking to estimate on the basis of samples distributed according to that distribution.  But the math doesn&#039;t care about the word &#039;true&#039;, and since the definition of &#039;unbiased estimator&#039; involves a property that&#039;s supposed to hold for &lt;i&gt;all&lt;/i&gt; $latex p$, we don&#039;t even need any way to specify a particular one.]]></description>
		<content:encoded><![CDATA[<p>I thought I&#8217;d deleted the word &#8220;known&#8221; in that sentence, to lessen the confusion.  I&#8217;ll do it now.</p>
<p>Of course there&#8217;s a huge debate on what probabilities actually mean, with subjective Bayesians religiously avoiding phrases like &#8220;the true probability&#8221;&#8212;and for good reasons.  It&#8217;s an important debate that I&#8217;ve spent a lot of time on.  I&#8217;m actually a subjective Bayesian, with mild reservations: even this position may not get to the bottom of certain issues.</p>
<p>But here I was trying to explain the concept of an <a href="http://en.wikipedia.org/wiki/Bias_of_an_estimator" rel="nofollow">unbiased estimator</a> without getting too technical.  This is a mathematical issue that largely sidesteps the debate about what probabilities mean.  But the problem is that when you talk using words, it&#8217;s hard to avoid seeming like you&#8217;re taking a position on that debate&#8212;and it&#8217;s a lot simpler to talk like someone who believes probabilities are an objective feature of reality.  </p>
<p>So, in anything I write, you should be able to take phrases like &#8220;the true probability distribution&#8221; and replace them with &#8220;the probability distribution that models Fred&#8217;s beliefs&#8221; without anything bad happening.  </p>
<p>Allowing myself to get more technical, the math goes like this:</p>
<p>Suppose <img src='http://s0.wp.com/latex.php?latex=p&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='p' title='p' class='latex' /> is a probability distribution on a finite set <img src='http://s0.wp.com/latex.php?latex=X&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='X' title='X' class='latex' />.   Its entropy is</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle%7B+S+%3D+-%5Csum_%7Bx+%5Cin+X%7D+p%28x%29+%5C%2C+%5Cln+p%28x%29+%7D&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='&#92;displaystyle{ S = -&#92;sum_{x &#92;in X} p(x) &#92;, &#92;ln p(x) }' title='&#92;displaystyle{ S = -&#92;sum_{x &#92;in X} p(x) &#92;, &#92;ln p(x) }' class='latex' /></p>
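<p>As a minimal sketch in Python (assuming p is given as a dict mapping elements of X to probabilities), the entropy above can be computed directly. Terms with p(x) = 0 are skipped, following the convention 0 ln 0 = 0.</p>

```python
import math

def entropy(p):
    """Shannon entropy S = -sum_x p(x) ln p(x), in nats; zero-probability terms are skipped."""
    return -sum(px * math.log(px) for px in p.values() if px > 0)

print(entropy({"a": 0.5, "b": 0.5}))                # ln 2, about 0.6931
print(entropy({"a": 0.5, "b": 0.25, "c": 0.25}))    # 1.5 ln 2, about 1.0397
```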
<p>Define an <b>estimator for entropy</b> to be a function</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Chat%7BS%7D%3A+X%5En+%5Cto+%5Cmathbb%7BR%7D&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='&#92;hat{S}: X^n &#92;to &#92;mathbb{R}' title='&#92;hat{S}: X^n &#92;to &#92;mathbb{R}' class='latex' /></p>
<p>The idea is that given <img src='http://s0.wp.com/latex.php?latex=n&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='n' title='n' class='latex' /> samples from the set <img src='http://s0.wp.com/latex.php?latex=X&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='X' title='X' class='latex' />, the estimator gives a number <img src='http://s0.wp.com/latex.php?latex=%5Chat%7BS%7D%28x_1%2C+%5Cdots%2C+x_n%29&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='&#92;hat{S}(x_1, &#92;dots, x_n)' title='&#92;hat{S}(x_1, &#92;dots, x_n)' class='latex' /> that&#8217;s supposed to be an estimate of entropy.   If these samples are independent and distributed according to the distribution <img src='http://s0.wp.com/latex.php?latex=p&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='p' title='p' class='latex' />, the <b>mean estimated entropy</b> will be</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cdisplaystyle%7B+%5Clangle+%5Chat%7BS%7D+%5Crangle+%3D+%5Csum_%7Bx_1%2C+%5Cdots%2C+x_n+%5Cin+X%7D+%5Chat%7BS%7D%28x_1%2C+%5Cdots%2C+x_n%29+%5C%2C+p%28x_1%29+%5Ccdots+p%28x_n%29+%7D+&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='&#92;displaystyle{ &#92;langle &#92;hat{S} &#92;rangle = &#92;sum_{x_1, &#92;dots, x_n &#92;in X} &#92;hat{S}(x_1, &#92;dots, x_n) &#92;, p(x_1) &#92;cdots p(x_n) } ' title='&#92;displaystyle{ &#92;langle &#92;hat{S} &#92;rangle = &#92;sum_{x_1, &#92;dots, x_n &#92;in X} &#92;hat{S}(x_1, &#92;dots, x_n) &#92;, p(x_1) &#92;cdots p(x_n) } ' class='latex' /></p>
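<p>For small n the sum over X<sup>n</sup> can be evaluated directly. Here is a minimal Python sketch, assuming the plug-in estimator (the entropy of the sample&#8217;s empirical distribution) as a concrete choice of Ŝ. The sum itself works for any function of n samples, but it has |X|<sup>n</sup> terms, so this is only practical for tiny cases.</p>

```python
import itertools
import math
from collections import Counter

def plugin_estimator(sample):
    """One possible S-hat: X^n -> R, the entropy of the sample's empirical distribution."""
    n = len(sample)
    return -sum((c / n) * math.log(c / n) for c in Counter(sample).values())

def mean_estimated_entropy(estimator, p, n):
    """<S-hat> = sum over (x_1, ..., x_n) in X^n of S-hat(x_1, ..., x_n) p(x_1) ... p(x_n)."""
    return sum(
        estimator(sample) * math.prod(p[x] for x in sample)
        for sample in itertools.product(p, repeat=n)
    )

p = {"a": 0.5, "b": 0.5}
print(mean_estimated_entropy(plugin_estimator, p, n=2))  # 0.5 ln 2, about 0.3466
```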
<p>The <b>bias</b> of the estimator is the difference between the mean estimated entropy and the actual entropy of <img src='http://s0.wp.com/latex.php?latex=p&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='p' title='p' class='latex' />:</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Clangle+%5Chat%7BS%7D+%5Crangle+-+S+&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='&#92;langle &#92;hat{S} &#92;rangle - S ' title='&#92;langle &#92;hat{S} &#92;rangle - S ' class='latex' /></p>
<p>The estimator is <b>unbiased</b> if the bias is zero for all <img src='http://s0.wp.com/latex.php?latex=p&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='p' title='p' class='latex' />.</p>
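<p>The bias can be computed the same way. In this minimal Python sketch the plug-in estimator again stands in for Ŝ, purely as an example. A single p with nonzero bias shows that this particular estimator is biased, whereas Paninski&#8217;s result is the much stronger claim that no estimator is unbiased for every p.</p>

```python
import itertools
import math
from collections import Counter

def entropy(p):
    """S = -sum_x p(x) ln p(x), skipping zero-probability terms."""
    return -sum(q * math.log(q) for q in p.values() if q > 0)

def plugin_estimator(sample):
    """Entropy of the sample's empirical distribution, used here as an example S-hat."""
    n = len(sample)
    return -sum((c / n) * math.log(c / n) for c in Counter(sample).values())

def bias(estimator, p, n):
    """<S-hat> - S, computed exactly by summing over all of X^n."""
    mean = sum(
        estimator(sample) * math.prod(p[x] for x in sample)
        for sample in itertools.product(p, repeat=n)
    )
    return mean - entropy(p)

for p in [{"a": 0.5, "b": 0.5}, {"a": 0.9, "b": 0.1}, {"a": 1.0, "b": 0.0}]:
    print(p, bias(plugin_estimator, p, n=4))
```

<p>By concavity of entropy, the plug-in estimator&#8217;s bias is negative whenever p is not concentrated on a single point and zero for a point mass, so it fails the &#8220;for all p&#8221; condition.</p>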
<p>Proposition 8 of Paninski&#8217;s paper says there exists no unbiased estimator for entropy.  </p>
<p>People often think of <img src='http://s0.wp.com/latex.php?latex=p&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='p' title='p' class='latex' /> as the &#8216;true&#8217; probability distribution, whose entropy the estimator is seeking to estimate on the basis of samples distributed according to that distribution.  But the math doesn&#8217;t care about the word &#8216;true&#8217;, and since the definition of &#8216;unbiased estimator&#8217; involves a property that&#8217;s supposed to hold for <i>all</i> <img src='http://s0.wp.com/latex.php?latex=p&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='p' title='p' class='latex' />, we don&#8217;t even need any way to specify a particular one.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
