Last time I mentioned that estimating entropy from real-world data is important not just for measuring biodiversity, but also for another area of biology: *neurobiology!*

When you look at something, neurons in your eye start firing. But how, exactly, is their firing related to what you see? Questions like this are hard! Answering them— ‘cracking the neural code’—is a big challenge. To make progress, neuroscientists are using information theory. But as I explained last time, estimating information from experimental data is tricky.

Romain Brasselet, now a postdoc at the Max Planck Institute for Biological Cybernetics at Tübingen, is working on these topics. He sent me a nice email explaining this area.

This is a bit of a digression, but the Mathematics of Biodiversity program in Barcelona has been extraordinarily multidisciplinary, with category theorists rubbing shoulders with ecologists, immunologists and geneticists. One of the common themes is entropy and its role in biology, so I think it’s worth posting Romain’s comments here. This is what he has to say…

### Information in neurobiology

I will try to explain why neurobiologists are today very interested in reliable estimates of entropy/information, and what techniques we use to obtain them.

The activity of sensory as well as more central neurons is known to be modulated by external stimulations. In 1926, in a seminal paper, Adrian observed that neurons in the sciatic nerve of the frog fire action potentials (or spikes) when some muscle in the hindlimb is stretched. In addition, he observed that the frequency of the spikes increases with the amplitude of the stretching.

• E.D. Adrian, The impulses produced by sensory nerve endings (1926).

For another very nice example, in 1962, Hubel and Wiesel found neurons in the cat visual cortex whose activity depends on the orientation of a visual stimulus, a simple black line over a white background: some neurons fire preferentially for one orientation of the line. (Hubel and Wiesel were awarded the 1981 Nobel Prize in Physiology or Medicine for their work.) This incidentally led to the concept of “receptive field”, which is of tremendous importance in neurobiology, but fascinating as it is, that is a different topic.

Good, we are now able to define what makes a neuron tick. The problem is that neural activity is often very “noisy”: when the exact same stimulus is presented many times, the responses appear to be very different from trial to trial. Even careful observation cannot necessarily reveal correlations between the stimulations and the neural activity. So we would like a measure that captures the statistical dependencies between the stimulation and the response of the neuron, to know whether we can say something about the stimulation just by observing the response, which is essentially the task of the brain. In particular, we want a fundamental measure that does not rely on any assumption about the functioning of the brain. Information theory provides the tools to do this, which is why we like to use it: we often try to measure the mutual information between stimuli and responses.
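To make “mutual information between stimuli and responses” concrete, here is a minimal sketch of the naive plug-in estimate, which simply substitutes observed frequencies for probabilities. The function name and the toy data are made up for illustration; they do not come from any of the papers cited here.

```python
import math
from collections import Counter

def plugin_mutual_information(pairs):
    """Plug-in mutual information (in bits) from a list of (stimulus, response) pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    stim = Counter(s for s, _ in pairs)
    resp = Counter(r for _, r in pairs)
    mi = 0.0
    for (s, r), c in joint.items():
        # p(s, r) * log2( p(s, r) / (p(s) * p(r)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (stim[s] * resp[r]))
    return mi

# Toy case 1: the response is fully determined by the stimulus (1 bit)
mi_informative = plugin_mutual_information([("A", 0), ("B", 1)] * 50)

# Toy case 2: the response is independent of the stimulus (0 bits)
mi_uninformative = plugin_mutual_information(
    [("A", 0), ("A", 1), ("B", 0), ("B", 1)] * 25
)
```

With real, noisy data and few trials this estimate tends to be biased upwards, which is the limited sampling problem that the rest of this post is about.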

To my knowledge, the first paper using information theory in neuroscience was by MacKay and McCulloch in 1952:

• Donald M. MacKay and Warren S. McCulloch, The limiting information capacity of a neuronal link, *Bulletin of Mathematical Biophysics* **14** (1952), 127–135.

But information theory was not used in neuroscience much until the early 90’s. It started again with a paper by Bialek *et al.* in 1991:

• W. Bialek, F. Rieke, R. R. de Ruyter van Steveninck and D. Warland, Reading a neural code, *Science* **252** (1991), 1854–1857.

However, when applying information-theoretic methods to biological data, we often have only a limited sampling of the neural response: we are usually very happy when we have 50 trials for a given stimulus. Why is this limited sample a problem?

During the major part of the 20th century, following Adrian’s finding, the paradigm for the neural code was the frequency of the spikes or, equivalently, the number of spikes in a window of time. But in the early 90’s, it was observed that the exact timing of spikes is (in some cases) reliable across trials. So instead of considering the neural response as a single number (the number of spikes), the temporal patterns of spikes started to be taken into account. But time is continuous, so to be able to do actual computations, time was discretized and a neural response became a binary string.

Now, if you consider relevant time-scales, say, a 100 millisecond time window with 1 millisecond bins and a firing frequency of about 50 spikes per second, then your response space is huge and estimates of information from only 50 trials are no longer reliable. That’s why a lot of effort has gone into overcoming the limited sampling bias.
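To get a feeling for how huge the response space is, here is a small sketch that discretizes one trial into a binary word and counts the possible words. The helper name and the spike times are hypothetical, chosen only to match the numbers quoted above.

```python
import math

def spikes_to_word(spike_times_ms, window_ms=100, bin_ms=1):
    """Turn spike times (ms after stimulus onset) into a tuple of 0/1 bins."""
    n_bins = window_ms // bin_ms
    word = [0] * n_bins
    for t in spike_times_ms:
        idx = int(t // bin_ms)
        if 0 <= idx < n_bins:
            word[idx] = 1  # a bin only records whether at least one spike fell in it
    return tuple(word)

# One made-up trial: 5 spikes in 100 ms, i.e. about 50 spikes per second
word = spikes_to_word([3.2, 27.9, 41.0, 68.5, 90.1])

n_bins = len(word)                        # 100 bins
all_words = 2 ** n_bins                   # every binary string of length 100
five_spike_words = math.comb(n_bins, 5)   # strings with exactly 5 spikes: 75,287,520
```

Even if we only count words with exactly five spikes, there are over 75 million of them, so 50 trials can visit only a vanishing fraction of the possibilities.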

Now, turning to the techniques developed in this field, John already mentioned the work by Liam Paninski, but here are other very interesting references:

• Stefano Panzeri and Alessandro Treves, Analytical estimates of limited sampling biases in different information measures, *Network: Computation in Neural Systems* **7** (1996), 87–107.

They computed the first-order bias of the information (related to the Miller–Madow correction) and then used a Bayesian technique to estimate the number of responses not included in the sample but that would be in an infinite sample (a goal similar to that of Good’s rule of thumb).
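For readers who have not met it, the Miller–Madow correction adds (m − 1)/(2N) nats to the plug-in entropy, where m is the number of distinct responses observed and N is the number of samples. The sketch below illustrates only that textbook correction, not the Bayesian procedure of Panzeri and Treves. The toy setup (32 equally likely responses, 50 trials, a fixed seed) is an assumption made for the demonstration.

```python
import math
import random
from collections import Counter

def plugin_entropy(samples):
    """Plug-in entropy in bits: observed frequencies used as probabilities."""
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())

def miller_madow_entropy(samples):
    """Plug-in entropy plus the first-order bias correction (m - 1) / (2N), in bits."""
    n = len(samples)
    m = len(set(samples))  # number of distinct responses actually observed
    return plugin_entropy(samples) + (m - 1) / (2 * n * math.log(2))

# Toy experiment: 32 equally likely responses (true entropy 5 bits), 50 trials,
# repeated many times to average out the trial-to-trial scatter.
rng = random.Random(0)
true_bits = 5.0
runs = 200
mean_plugin = 0.0
mean_mm = 0.0
for _ in range(runs):
    trial = [rng.randrange(32) for _ in range(50)]
    mean_plugin += plugin_entropy(trial) / runs
    mean_mm += miller_madow_entropy(trial) / runs
```

In this regime the corrected average should land closer to 5 bits than the plug-in average, while still falling short of it. That is why one also wants an estimate of the responses that never showed up in the sample.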

• S.P. Strong, R. Koberle, R.R. de Ruyter van Steveninck, and W. Bialek, Entropy and information in neural spike trains, *Phys. Rev. Lett.* **80** (1998), 197–200.

The entropy (or if you prefer, information) estimate can be expanded in a power series in 1/*N* (where *N* is the sample size) around the true value. By computing the estimate for various values of *N* and fitting it with a parabola, it is possible to estimate the value of the entropy as *N* → ∞.
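Here is a rough sketch of this extrapolation idea: compute the plug-in entropy on the full sample and on random subsamples of half and a quarter of its size, pass a parabola in 1/N through the three values, and read off its value at 1/N = 0. Using exactly three sample sizes (so the parabola is an exact interpolation rather than a least-squares fit), the number of subsamples, and the toy distribution are simplifying assumptions of this sketch, not details from the paper.

```python
import math
import random
from collections import Counter

def plugin_entropy(samples):
    """Plug-in entropy in bits."""
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())

def parabola_intercept(xs, ys):
    """Value at x = 0 of the parabola through three points (Lagrange form)."""
    total = 0.0
    for i in range(3):
        weight = 1.0
        for j in range(3):
            if j != i:
                weight *= (0.0 - xs[j]) / (xs[i] - xs[j])
        total += weight * ys[i]
    return total

def extrapolated_entropy(samples, rng, repeats=50):
    """Extrapolate the plug-in entropy to infinite sample size, in bits."""
    n = len(samples)
    sizes = [n, n // 2, n // 4]
    estimates = [plugin_entropy(samples)]
    for size in sizes[1:]:
        # average over random subsamples to reduce scatter
        mean = sum(plugin_entropy(rng.sample(samples, size)) for _ in range(repeats))
        estimates.append(mean / repeats)
    return parabola_intercept([1.0 / s for s in sizes], estimates)

# Toy data: 64 equally likely responses (true entropy 6 bits), 400 samples
rng = random.Random(1)
samples = [rng.randrange(64) for _ in range(400)]
naive = plugin_entropy(samples)
h_inf = extrapolated_entropy(samples, rng)
```

The extrapolation removes the 1/N and 1/N² bias terms but amplifies the scatter of the individual estimates, so it only helps when the sample is large enough for the expansion to make sense.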

These approaches are also well-known:

• Ilya Nemenman, Fariel Shafee and William Bialek, Entropy and inference, revisited, 2002.

• Alexander Kraskov, Harald Stögbauer and Peter Grassberger, Estimating mutual information, *Phys. Rev. E.* **69** (2004), 066138.

Actually, Stefano Panzeri has quite a few impressive papers about this problem, and recently with colleagues he has made public a free Matlab toolbox for information theory (www.ibtb.org) implementing various correction methods.

Finally, the work by Jonathan Victor is worth mentioning, since he provided (to my knowledge again) the first estimate of mutual information using geometry. This is of particular interest with respect to the work by Christina Cobbold and Tom Leinster on measures of biodiversity that take the distance between species into account:

• J. D. Victor and K. P. Purpura, Nature and precision of temporal coding in visual cortex: a metric-space analysis, *Journal of Neurophysiology* **76** (1996), 1310–1326.

He introduced a distance between sequences of spikes and from this, derived a lower bound on mutual information.

• Jonathan D. Victor, Binless strategies for estimation of information from neural data, *Phys. Rev. E.* **66** (2002), 051903.

Taking inspiration from work by Kozachenko and Leonenko, he obtained an estimate of the information based on the distances between the closest responses.
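To give the flavour of a distance-based estimate, here is a sketch of the basic Kozachenko–Leonenko nearest-neighbour entropy estimator for one-dimensional continuous data. It is only the building block, not Victor's binless method for spike trains. The function name and the Gaussian test data are assumptions made for illustration. The formula used is the standard one for the first nearest neighbour in one dimension: ψ(N) − ψ(1) + log 2 plus the average log distance to the nearest neighbour, where ψ is the digamma function and ψ(N) − ψ(1) equals the harmonic number 1 + 1/2 + … + 1/(N − 1).

```python
import math
import random

def kl_entropy_1d(samples):
    """Kozachenko-Leonenko estimate of differential entropy (nats) for 1-D data."""
    xs = sorted(samples)
    n = len(xs)
    log_dist_sum = 0.0
    for i, x in enumerate(xs):
        # in one dimension the nearest neighbour is adjacent in sorted order
        left = x - xs[i - 1] if i > 0 else math.inf
        right = xs[i + 1] - x if i < n - 1 else math.inf
        log_dist_sum += math.log(min(left, right))
    # psi(n) - psi(1) is the harmonic number H_{n-1}
    digamma_diff = sum(1.0 / j for j in range(1, n))
    # log(2) is the log-volume of the unit ball in one dimension
    return digamma_diff + math.log(2.0) + log_dist_sum / n

# Toy data: standard Gaussian, whose entropy is 0.5 * log(2 * pi * e) nats
rng = random.Random(2)
data = [rng.gauss(0.0, 1.0) for _ in range(2000)]
estimate = kl_entropy_1d(data)
exact = 0.5 * math.log(2 * math.pi * math.e)
```

No binning is needed: where the samples are dense the nearest-neighbour distances are small, and the average log distance measures how spread out the distribution is. Repeated values would give a zero distance and break the logarithm, so this sketch assumes all samples are distinct.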

Without getting too technical, that’s what we do in neuroscience about the limited sampling bias. The incentive is that obtaining reliable estimates is crucial to understanding the ‘neural code’, the holy grail of computational neuroscientists.

Perhaps I missed something in a previous post or in this one but can you provide a link to “the program in Barcelona”? Thanks.

Perhaps this is it http://fens2012.neurosciences.asso.fr/

Hmm, that’s not it, but it looks interesting.

This whole series of posts is about the Research Program on the Mathematics of Biodiversity, at CRM, which lasts from June 18 to July 20, 2012. I edited this post to clarify that this is the ‘program’ I was talking about.

Another widely used entropy estimator in neuroscience is a Bayesian estimator called the NSB (Nemenman–Shafee–Bialek) estimator.

I. Nemenman, F. Shafee, and W. Bialek, “Entropy and inference, revisited,” Advances in Neural Information Processing Systems, vol. 14, 2002.

See also:

http://www.menem.com/~ilya/wiki/index.php/Entropy_Estimation

Romain has recently edited this post to include more information, and that reference is now among those he lists!

Coincidentally I think the Donald MacKay mentioned here is the father of Prof David MacKay, author of “Sustainable Energy, without the hot air”.

Interesting discussion! It would seem that you guys are thinking in the same general direction as I have for the past 5 years or so. After reading through the blogs, I believe some of the issues discussed here may find partial solutions in a number of papers I wrote. In particular,

1. Zhang, Z. and Zhou, J. (2010). Re-parameterization of multinomial distributions and diversity indices. Journal of Statistical Planning and Inference 140, pp. 1731–1738.

2. Zhang, Z. (2012). Entropy estimation in Turing’s perspective. Neural Computation 24, pp. 1368–1389.

I have been thinking about the question of what constitutes a biodiversity index for years, and the following is a summary of what I currently believe:

1. The word “biodiversity” clearly conveys a certain intuitive meaning. However, people seem very hesitant to give mathematical definitions for diversity. This puzzles me. The only reason I can see is that “diversity” may be reasonably understood in two different ways, one involving the total number of species (K) and another involving the species proportions (p_{k}). Personally I believe the latter is the real issue – not that K is unimportant: it is important, and it is well defined.

2. The lack of a (universally accepted) definition of diversity hinders methodological advances in issues such as statistical estimation. We need to develop (good) general definitions of diversity.

Pertaining to 2 above, I wish to see some discussion or comments on the minimal set of conditions a “diversity index” must satisfy. I can only think of three:

1. It must be non-negative (this could be a superficial one).

2. It must be permutation (with respect to the letters of the alphabet) invariant.

3. It attains its minimum (e.g. 0) when p_{k}=1 for some k.

Can you think of any other? Maybe this could be the beginning of a fruitful discussion.

Hello! The people attending the Mathematics of Biodiversity conference are very interested in two other conditions that a diversity index should satisfy, namely that it be an ‘effective number’ and that it obey the ‘replication principle’. I explained these conditions in Part 4. The Hill numbers are favored as diversity indices because they obey these conditions (as well as the ones you list), while the Shannon and Rényi entropies do not. In his talk, Lou Jost explained some serious mistakes that people have made by working with diversity indices that do not obey these additional conditions. I also recommend these:

• Lou Jost, Entropy and diversity, *Oikos* **113** (2006), 363–375.

• Tom Leinster, Measuring biodiversity, *Azimuth*, 7 November 2011.

Thanks for pointing out your papers! I’ll tell Tom and Lou about them.
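As a small numerical footnote to the ‘effective number’ and ‘replication principle’ conditions mentioned above, here is a sketch using the standard definition of the Hill number of order q, namely (Σ p_k^q)^{1/(1−q)}, with the q = 1 case defined as the exponential of the Shannon entropy. The function names and the example community are made up for illustration.

```python
import math

def shannon_entropy(p):
    """Shannon entropy in nats."""
    return -sum(x * math.log(x) for x in p if x > 0)

def hill_number(p, q):
    """Hill number of order q; the q = 1 case is exp(Shannon entropy)."""
    p = [x for x in p if x > 0]
    if q == 1:
        return math.exp(shannon_entropy(p))
    return sum(x ** q for x in p) ** (1.0 / (1.0 - q))

community = [0.5, 0.3, 0.2]

# Two equally large copies of the community with no species in common
doubled = [x / 2 for x in community] * 2

# Effective number: K equally common species have diversity K
uniform_four = hill_number([0.25] * 4, 2)

# Replication principle: doubling the community doubles the Hill number ...
ratio_hill = hill_number(doubled, 2) / hill_number(community, 2)

# ... whereas the Shannon entropy only increases by log 2
entropy_gain = shannon_entropy(doubled) - shannon_entropy(community)
```

The same doubling holds for every order q, which is the sense in which Hill numbers behave like a count of species while the entropies themselves do not.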