John Harte is an ecologist who uses maximum entropy methods to predict the distribution, abundance and energy usage of species. Marc Harper uses information theory in bioinformatics and evolutionary game theory. Harper, Harte and I are organizing a workshop on entropy and information in biological systems, and I’m really excited about it!
It’ll take place at the National Institute for Mathematical and Biological Synthesis in Knoxville, Tennessee. It’s tentatively scheduled for October 22-24, 2014. When the date gets confirmed, I’ll post an advertisement so you can apply to attend.
Writing the proposal was fun, because we got to pull together lots of interesting people who are applying information theory and entropy to biology in quite different ways. So, here it is!
Ever since Shannon initiated research on information theory in 1948, there have been hopes that the concept of information could serve as a tool to help systematize and unify work in biology. The link between information and entropy was noted very early on, and it suggested that a full thermodynamic understanding of biology would naturally involve the information processing and storage that are characteristic of living organisms. However, the subject is full of conceptual pitfalls for the unwary, and progress has been slower than initially expected. Premature attempts at ‘grand syntheses’ have often misfired. But applications of information theory and entropy to specific highly focused topics in biology have been increasingly successful, such as:
• the maximum entropy principle in ecology,
• Shannon and Rényi entropies as measures of biodiversity,
• information theory in evolutionary game theory,
• information and the thermodynamics of individual cells.
Because they work in diverse fields, researchers in these specific topics have had little opportunity to trade insights and take stock of the progress so far. The aim of the workshop is to do just this.
In what follows, participants’ names are in boldface, while the main goals of the workshop are in italics.
Roderick Dewar is a key advocate of the principle of Maximum Entropy Production, which says that biological systems—and indeed all open, non-equilibrium systems—act to produce entropy at the maximum rate. Along with others, he has applied this principle to make testable predictions in a wide range of biological systems, from ATP synthesis [DJZ2006] to respiration and photosynthesis of individual plants [D2010] and plant communities. He has also sought to derive this principle from ideas in statistical mechanics [D2004, D2009], but it remains controversial.
The first goal of this workshop is to study the validity of this principle.
While they may be related, the principle of Maximum Entropy Production should not be confused with the MaxEnt inference procedure, which says that we should choose the probabilistic hypothesis with the highest entropy subject to the constraints provided by our data. MaxEnt was first explicitly advocated by Jaynes. He noted that it is already implicit in the procedures of statistical mechanics, but convincingly argued that it can also be applied to situations where entropy is more ‘informational’ than ‘thermodynamic’ in character.
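To make the MaxEnt procedure concrete, here is a minimal Python sketch of Jaynes’s classic dice illustration. It is not one of the ecological models discussed here. Among all probability distributions on the faces 1 to 6 of a die with a prescribed mean, we pick the one with the highest Shannon entropy. Maximizing entropy with a Lagrange multiplier for the mean constraint gives a Gibbs-form distribution $p_i \propto e^{\lambda i}$, so the code only has to solve for $\lambda$.

The function names, the bisection bounds and the target mean are our own illustrative assumptions. The target mean must lie strictly between the smallest and largest values.

```python
import math


def gibbs(values, lam):
    """Distribution p_i proportional to exp(lam * v_i)."""
    weights = [math.exp(lam * v) for v in values]
    z = sum(weights)
    return [w / z for w in weights]


def maxent_with_mean(values, target_mean, lo=-50.0, hi=50.0, iters=200):
    """Maximum-entropy distribution over `values` whose mean equals `target_mean`.

    The maximizer has Gibbs form, and its mean increases monotonically
    with lam (the derivative is a variance), so bisection finds lam.
    """
    for _ in range(iters):
        mid = (lo + hi) / 2
        p = gibbs(values, mid)
        mean = sum(pi * v for pi, v in zip(p, values))
        if mean < target_mean:
            lo = mid  # mean too small: increase lam
        else:
            hi = mid  # mean too large: decrease lam
    return gibbs(values, (lo + hi) / 2)


faces = [1, 2, 3, 4, 5, 6]
p = maxent_with_mean(faces, 4.5)  # hypothetical observed mean
print([round(pi, 4) for pi in p])
```

With a target mean of 3.5 the procedure returns the uniform distribution. This is what it should do, because that constraint adds no information beyond normalization. A mean of 4.5 tilts the probabilities toward the higher faces.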
Recently John Harte has applied MaxEnt in this way to ecology, using it to make specific testable predictions for the distribution, abundance and energy usage of species across spatial scales and across habitats and taxonomic groups [Harte2008, Harte2009, Harte2011]. Annette Ostling is an expert on other theories that attempt to explain the same data, such as the ‘neutral model’ [AOE2008, ODLSG2009, O2005, O2012]. Dewar has also used MaxEnt in ecology [D2008], and he has argued that it underlies the principle of Maximum Entropy Production.
Thus, a second goal of this workshop is to familiarize all the participants with applications of the MaxEnt method to ecology, compare it with competing approaches, and study whether MaxEnt provides a sufficient justification for the principle of Maximum Entropy Production.
Entropy is not merely a predictive tool in ecology: it is also widely used as a measure of biodiversity. Here Shannon’s original concept of entropy naturally generalizes to ‘Rényi entropy’, which depends on a parameter $q$. This equals
$$H_q(p) = \frac{1}{1-q} \ln \sum_i p_i^q$$
where $p_i$ is the fraction of organisms of the $i$th type (which could mean species, some other taxon, etc.). In the limit $q \to 1$ this reduces to the Shannon entropy:
$$H(p) = - \sum_i p_i \ln p_i$$
As $q$ increases, we give less weight to rare types of organisms. Christina Cobbold and Tom Leinster have described a systematic and highly flexible system of biodiversity measurement, with Rényi entropy at its heart [CL2012]. They consider both the case where all we have are the numbers $p_i$, and the more subtle case where we take the distance between different types of organisms into account.
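The minimal Python sketch below computes the Rényi entropy $H_q(p) = \frac{1}{1-q}\ln\sum_i p_i^q$, treating $q = 1$ as the Shannon limit $-\sum_i p_i \ln p_i$. It also computes a similarity-sensitive diversity of order $q$ as we read [CL2012]:

$$\left(\sum_i p_i (Zp)_i^{q-1}\right)^{1/(1-q)}$$

Here $Z$ is a matrix of similarities between types, with $Z_{ii} = 1$. When $Z$ is the identity (all types completely distinct), the logarithm of this quantity is just the Rényi entropy.

The function names, the similarity matrix and the abundances are hypothetical. The $q = \infty$ case is omitted.

```python
import math


def renyi_entropy(p, q):
    """Rényi entropy H_q(p) = ln(sum_i p_i^q) / (1 - q), with q = 1 as the Shannon limit."""
    p = [pi for pi in p if pi > 0]  # absent types do not contribute
    if math.isclose(q, 1.0):
        return -sum(pi * math.log(pi) for pi in p)
    return math.log(sum(pi ** q for pi in p)) / (1 - q)


def similarity_diversity(p, Z, q):
    """Similarity-sensitive diversity of order q (finite q only).

    Z[i][j] in [0, 1] is the similarity between types i and j, with Z[i][i] = 1.
    """
    n = len(p)
    zp = [sum(Z[i][j] * p[j] for j in range(n)) for i in range(n)]
    present = [i for i in range(n) if p[i] > 0]
    if math.isclose(q, 1.0):
        return math.exp(-sum(p[i] * math.log(zp[i]) for i in present))
    return sum(p[i] * zp[i] ** (q - 1) for i in present) ** (1 / (1 - q))


# Hypothetical community: one common type and two rarer ones.
p = [0.5, 0.25, 0.25]
for q in [0, 1, 2]:
    print(q, renyi_entropy(p, q))  # decreases as q grows

identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
# Hypothetical similarities: the two rare types closely resemble each other.
Z = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.8],
     [0.0, 0.8, 1.0]]
print(similarity_diversity(p, identity, 2), similarity_diversity(p, Z, 2))
```

Counting two similar types as partly redundant lowers the measured diversity. If every pair of types is treated as identical, the diversity collapses to 1.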
John Baez has explained the role of Rényi entropy in thermodynamics [B2011], and together with Tom Leinster and Tobias Fritz he has proved other theorems characterizing entropy which explain its importance for information processing [BFL2011]. However, these ideas have not yet been connected to the widespread use of entropy in biodiversity studies. More importantly, the use of entropy as a measure of biodiversity has not been clearly connected to MaxEnt methods in ecology. Does the success of MaxEnt methods imply a tendency for ecosystems to maximize biodiversity subject to the constraints of resource availability? This seems surprising, but a more nuanced statement along these general lines might be correct.
So, a third goal of this workshop is to clarify relations between known characterizations of entropy, the use of entropy as a measure of biodiversity, and the use of MaxEnt methods in ecology.
As the volume of genomic data continues to outstrip humans’ ability to analyze it, we can expect automated experiment design to become ever more important. In Chris Lee and Marc Harper’s RoboMendel program [LH2013], a mathematically precise concept of ‘potential information’—how much information is left to learn—plays a crucial role in deciding what experiment to do next, given the data obtained so far. It will be useful for them to interact with William Bialek, who has expertise in estimating entropy from empirical data and using it to constrain properties of models [BBS, BNS2001, BNS2002], and Susanne Still, who applies information theory to automated theory building and biology [CES2010, PS2012].
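Estimating entropy from data is harder than it looks. The naive ‘plug-in’ estimate, which is the Shannon entropy of the observed frequencies, is biased downward when samples are small.

The Python sketch below contrasts it with the classical Miller–Madow correction, which adds $(m-1)/(2N)$ for $m$ observed types and $N$ samples. This is only the simplest fix. The Bayesian estimators in [BNS2002] are a different and considerably more sophisticated technique, not implemented here. The function names and the simulated experiment are hypothetical.

```python
import math
import random
from collections import Counter


def plugin_entropy(counts):
    """Shannon entropy (in nats) of the empirical frequencies."""
    n = sum(counts)
    return -sum(c / n * math.log(c / n) for c in counts if c > 0)


def miller_madow_entropy(counts):
    """Plug-in estimate plus the first-order bias correction (m - 1) / (2N)."""
    n = sum(counts)
    m = sum(1 for c in counts if c > 0)
    return plugin_entropy(counts) + (m - 1) / (2 * n)


# Hypothetical experiment: 10 equally common types, only 20 samples per trial.
random.seed(0)
k, n_samples, trials = 10, 20, 2000
true_entropy = math.log(k)
plugin_vals, mm_vals = [], []
for _ in range(trials):
    # Counter only records types that were actually observed.
    counts = list(Counter(random.randrange(k) for _ in range(n_samples)).values())
    plugin_vals.append(plugin_entropy(counts))
    mm_vals.append(miller_madow_entropy(counts))

plugin_mean = sum(plugin_vals) / trials
mm_mean = sum(mm_vals) / trials
print(true_entropy, plugin_mean, mm_mean)
```

The plug-in estimate can never exceed the logarithm of the number of types actually seen, which is one intuitive source of its downward bias.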
However, there is another link between biology and potential information. Harper has noted that in an ecosystem where the population of each type of organism grows at a rate proportional to its fitness (which may depend on the fraction of organisms of each type), the quantity
$$\sum_i \hat{x}_i \ln\left(\frac{\hat{x}_i}{x_i}\right)$$
always decreases if there is an evolutionarily stable state [Harper2009]. Here $x_i$ is the fraction of organisms of the $i$th genotype at a given time, while $\hat{x}_i$ is this fraction in the evolutionarily stable state. This quantity is often called the Shannon information of $\hat{x}$ ‘relative to’ $x$. But in fact, it is precisely the same as Lee and Harper’s potential information! Indeed, there is a precise mathematical analogy between evolutionary games and processes where a probabilistic hypothesis is refined by repeated experiments.
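Harper’s observation can be checked numerically. The sketch below assumes the replicator equation $\dot{x}_i = x_i\,(f_i(x) - \bar{f}(x))$ with linear fitness $f(x) = Ax$. Purely for illustration, it picks the hypothetical payoff matrix $A = -I$ on three genotypes, whose evolutionarily stable state is the uniform distribution.

Integrating with small forward-Euler steps, the relative information $\sum_i \hat{x}_i \ln(\hat{x}_i/x_i)$ falls at every step. The function names and starting frequencies are our own assumptions.

```python
import math


def relative_information(x_hat, x):
    """Shannon information of x_hat relative to x: sum_i x_hat_i * ln(x_hat_i / x_i)."""
    return sum(a * math.log(a / b) for a, b in zip(x_hat, x) if a > 0)


def replicator_step(x, payoff, dt):
    """One forward-Euler step of dx_i/dt = x_i * (f_i(x) - mean fitness), with f = A x."""
    n = len(x)
    fitness = [sum(payoff[i][j] * x[j] for j in range(n)) for i in range(n)]
    mean_fitness = sum(xi * fi for xi, fi in zip(x, fitness))
    return [xi + dt * xi * (fi - mean_fitness) for xi, fi in zip(x, fitness)]


# Hypothetical game A = -I: the uniform state is its evolutionarily stable state.
payoff = [[-1.0, 0.0, 0.0],
          [0.0, -1.0, 0.0],
          [0.0, 0.0, -1.0]]
x_hat = [1 / 3, 1 / 3, 1 / 3]
x = [0.6, 0.3, 0.1]  # hypothetical starting frequencies

history = []
for _ in range(1000):
    history.append(relative_information(x_hat, x))
    x = replicator_step(x, payoff, dt=0.01)

print(history[0], history[-1])
```

Forward Euler is used only for brevity. In exact arithmetic it preserves $\sum_i x_i = 1$, but a much larger step size could break positivity or the monotone decrease.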
Thus, a fourth goal of this workshop is to develop the concept of evolutionary games as ‘learning’ processes in which information is gained over time.
We shall try to synthesize this with Carl Bergstrom and Matina Donaldson-Matasci’s work on the ‘fitness value of information’: a measure of how much increase in fitness a population can obtain per bit of extra information [BL2004, DBL2010, DM2013]. Following Harper, we shall consider not only relative Shannon entropy, but also relative Rényi entropy, as a measure of information gain [Harper2011].
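One idealized case in which the fitness value of information can be computed exactly is the ‘horse race’ model from Kelly’s analysis of gambling, which Bergstrom and Lachmann relate to biological fitness. In this model:

• each phenotype survives only in its matching environment;
• the population hedges its bets by splitting offspring among phenotypes.

The Python sketch below assumes this model, with proportional bet hedging, meaning phenotype frequencies equal the probabilities of the environments, which is optimal here. It uses hypothetical numbers and checks that the gain in long-run log growth rate from an informative cue equals the mutual information between cue and environment. Under other fitness structures the two quantities need not coincide.

```python
import math


def log_growth(p_env, payoff, allocation):
    """Expected log growth rate sum_e p(e) * ln(payoff[e] * allocation[e]).

    Horse-race model: only offspring with phenotype e survive environment e,
    each leaving payoff[e] descendants.
    """
    return sum(pe * math.log(d * a)
               for pe, d, a in zip(p_env, payoff, allocation) if pe > 0)


def joint_distribution(p_env, p_cue_given_env):
    """Joint probabilities p(e, c) = p(e) * p(c | e)."""
    return [[pe * pc for pc in row] for pe, row in zip(p_env, p_cue_given_env)]


def fitness_value_of_cue(p_env, p_cue_given_env, payoff):
    """Gain in log growth rate from conditioning bet hedging on the cue."""
    blind = log_growth(p_env, payoff, p_env)  # without the cue: allocation = p(e)
    joint = joint_distribution(p_env, p_cue_given_env)
    informed = 0.0
    for c in range(len(p_cue_given_env[0])):
        p_c = sum(row[c] for row in joint)
        if p_c == 0:
            continue
        posterior = [row[c] / p_c for row in joint]  # p(e | c)
        informed += p_c * log_growth(posterior, payoff, posterior)  # allocation = p(e | c)
    return informed - blind


def mutual_information(p_env, p_cue_given_env):
    """I(E; C) in nats."""
    joint = joint_distribution(p_env, p_cue_given_env)
    p_cue = [sum(row[c] for row in joint) for c in range(len(joint[0]))]
    return sum(pec * math.log(pec / (pe * pc))
               for pe, row in zip(p_env, joint)
               for pec, pc in zip(row, p_cue) if pec > 0)


# Hypothetical numbers: two environments and a cue that is right 90% of the time.
p_env = [0.7, 0.3]
p_cue_given_env = [[0.9, 0.1], [0.1, 0.9]]
payoff = [2.0, 3.0]
print(fitness_value_of_cue(p_env, p_cue_given_env, payoff),
      mutual_information(p_env, p_cue_given_env))
```

The payoffs cancel out of the difference, which is why the gain depends only on how much the cue reveals about the environment.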
A fifth and final goal of this workshop is to study the interplay between information theory and the thermodynamics of individual cells and organelles.
Susanne Still has studied the thermodynamics of prediction in biological systems [BCSS2012]. And in a celebrated related piece of work, Jeremy England used thermodynamic arguments to derive a lower bound for the amount of entropy generated during a process of self-replication of a bacterial cell [England2013]. Interestingly, he showed that E. coli comes within a factor of 3 of this lower bound.
In short, information theory and entropy methods are becoming powerful tools in biology, from the level of individual cells, to whole ecosystems, to experimental design, model-building, and the measurement of biodiversity. The time is ripe for an investigative workshop that brings together experts from different fields and lets them share insights and methods and begin to tackle some of the big remaining questions.
[AOE2008] D. Alonso, A. Ostling and R. Etienne, The assumption of symmetry and species abundance distributions, Ecology Letters 11 (2008), 93–105.
[TMMABB2012] D. Amodei, W. Bialek, M. J. Berry II, O. Marre, T. Mora, and G. Tkacik, The simplest maximum entropy model for collective behavior in a neural network, arXiv:1207.6319 (2012).
[B2011] J. Baez, Rényi entropy and free energy, arXiv:1102.2098 (2011).
[BFL2011] J. Baez, T. Fritz and T. Leinster, A characterization of entropy in terms of information loss, Entropy 13 (2011), 1945–1957.
[BS2012] J. Baez and M. Stay, Algorithmic thermodynamics, Math. Struct. Comp. Sci. 22 (2012), 771–787.
[BCSS2012] A. J. Bell, G. E. Crooks, S. Still and D. A. Sivak, The thermodynamics of prediction, Phys. Rev. Lett. 109 (2012), 120604.
[BL2004] C. T. Bergstrom and M. Lachmann, Shannon information and biological fitness, in IEEE Information Theory Workshop 2004, IEEE, 2004, pp. 50–54.
[BBS] M. J. Berry II, W. Bialek and E. Schneidman, An information theoretic approach to the functional classification of neurons, in Advances in Neural Information Processing Systems 15, MIT Press, 2005.
[BNS2001] W. Bialek, I. Nemenman and N. Tishby, Predictability, complexity and learning, Neural Computation 13 (2001), 2409–2463.
[BNS2002] W. Bialek, I. Nemenman and F. Shafee, Entropy and inference, revisited, in Advances in Neural Information Processing Systems 14, MIT Press, 2002.
[CL2012] C. Cobbold and T. Leinster, Measuring diversity: the importance of species similarity, Ecology 93 (2012), 477–489.
[CES2010] J. P. Crutchfield, S. Still and C. Ellison, Optimal causal inference: estimating stored information and approximating causal architecture, Chaos 20 (2010), 037111.
[D2004] R. C. Dewar, Maximum entropy production and non-equilibrium statistical mechanics, in Non-Equilibrium Thermodynamics and Entropy Production: Life, Earth and Beyond, eds. A. Kleidon and R. Lorenz, Springer, New York, 2004, 41–55.
[DJZ2006] R. C. Dewar, D. Juretić and P. Županović, The functional design of the rotary enzyme ATP synthase is consistent with maximum entropy production, Chem. Phys. Lett. 430 (2006), 177–182.
[D2008] R. C. Dewar and A. Porté, Statistical mechanics unifies different ecological patterns, J. Theor. Bio. 251 (2008), 389–403.
[D2009] R. C. Dewar, Maximum entropy production as an inference algorithm that translates physical assumptions into macroscopic predictions: don’t shoot the messenger, Entropy 11 (2009), 931–944.
[D2010] R. C. Dewar, Maximum entropy production and plant optimization theories, Phil. Trans. Roy. Soc. B 365 (2010), 1429–1435.
[DBL2010] M. C. Donaldson-Matasci, C. T. Bergstrom, and M. Lachmann, The fitness value of information, Oikos 119 (2010), 219–230.
[DM2013] M. C. Donaldson-Matasci, G. DeGrandi-Hoffman, and A. Dornhaus, Bigger is better: honey bee colonies as distributed information-gathering systems, Animal Behaviour 85 (2013), 585–592.
[England2013] J. L. England, Statistical physics of self-replication, J. Chem. Phys. 139 (2013), 121923.
[ODLSG2009] J. L. Green, J. K. Lake, J. P. O’Dwyer, A. Ostling and V. M. Savage, An integrative framework for stochastic, size-structured community assembly, PNAS 106 (2009), 6170–6175.
[Harper2009] M. Harper, Information geometry and evolutionary game theory, arXiv:0911.1383 (2009).
[Harper2011] M. Harper, Escort evolutionary game theory, Physica D 240 (2011), 1411–1415.
[Harte2008] J. Harte, T. Zillio, E. Conlisk and A. Smith, Maximum entropy and the state-variable approach to macroecology, Ecology 89 (2008), 2700–2711.
[Harte2009] J. Harte, A. Smith and D. Storch, Biodiversity scales from plots to biomes with a universal species-area curve, Ecology Letters 12 (2009), 789–797.
[Harte2011] J. Harte, Maximum Entropy and Ecology: A Theory of Abundance, Distribution, and Energetics, Oxford U. Press, Oxford, 2011.
[LH2013] M. Harper and C. Lee, Basic experiment planning via information metrics: the RoboMendel problem, arXiv:1210.4808 (2012).
[O2005] A. Ostling, Neutral theory tested by birds, Nature 436 (2005), 635.
[O2012] A. Ostling, Do fitness-equalizing tradeoffs lead to neutral communities?, Theoretical Ecology 5 (2012), 181–194.
[PS2012] D. Precup and S. Still, An information-theoretic approach to curiosity-driven reinforcement learning, Theory in Biosciences 131 (2012), 139–148.