Information and Entropy in Biological Systems (Part 4)

21 May, 2015

I kicked off the workshop on Information and Entropy in Biological Systems with a broad overview of the many ways information theory and entropy get used in biology:

• John Baez, Information and entropy in biological systems.

Abstract. Information and entropy are being used in biology in many different ways: for example, to study biological communication systems, the ‘action-perception loop’, the thermodynamic foundations of biology, the structure of ecosystems, measures of biodiversity, and evolution. Can we unify these? To do this, we must learn to talk to each other. This will be easier if we share some basic concepts which I’ll sketch here.

The talk is full of links, in blue. If you click on these you can get more details. You can also watch a video of my talk:


Information and Entropy in Biological Systems (Part 3)

20 May, 2015

We had a great workshop on information and entropy in biological systems, and now you can see what it was like. I think I’ll post these talks one a time, or maybe a few at a time, because they’d be overwhelming taken all at once.

So, let’s dive into Chris Lee’s exciting ideas about organisms as ‘information evolving machines’ that may provide ‘disinformation’ to their competitors. Near the end of his talk, he discusses some new results on an ever-popular topic: the Prisoner’s Dilemma. You may know about this classic book:

• Robert Axelrod, The Evolution of Cooperation, Basic Books, New York, 1984. Some passages available free online.

If you don’t, read it now! He showed that the simple ‘tit for tat’ strategy did very well in some experiments where the game was played repeatedly and strategies who did well got to ‘reproduce’ themselves. This result was very exciting, so a lot of people have done research on it. More recently a paper on this subject by William Press and Freeman Dyson received a lot of hype. I think this is a good place to learn about that:

• Mike Shulman, Zero determinant strategies in the iterated Prisoner’s Dilemma, The n-Category Café, 19 July 2012.

Chris Lee’s new work on the Prisoner’s Dilemma is here, cowritten with two other people who attended the workshop:

The art of war: beyond memory-one strategies in population games, PLOS One, 24 March 2015.

Abstract. We show that the history of play in a population game contains exploitable information that can be successfully used by sophisticated strategies to defeat memory-one opponents, including zero determinant strategies. The history allows a player to label opponents by their strategies, enabling a player to determine the population distribution and to act differentially based on the opponent’s strategy in each pairwise interaction. For the Prisoner’s Dilemma, these advantages lead to the natural formation of cooperative coalitions among similarly behaving players and eventually to unilateral defection against opposing player types. We show analytically and empirically that optimal play in population games depends strongly on the population distribution. For example, the optimal strategy for a minority player type against a resident tit-for-tat (TFT) population is ‘always cooperate’ (ALLC), while for a majority player type the optimal strategy versus TFT players is ‘always defect’ (ALLD). Such behaviors are not accessible to memory-one strategies. Drawing inspiration from Sun Tzu’s the Art of War, we implemented a non-memory-one strategy for population games based on techniques from machine learning and statistical inference that can exploit the history of play in this manner. Via simulation we find that this strategy is essentially uninvadable and can successfully invade (significantly more likely than a neutral mutant) essentially all known memory-one strategies for the Prisoner’s Dilemma, including ALLC (always cooperate), ALLD (always defect), tit-for-tat (TFT), win-stay-lose-shift (WSLS), and zero determinant (ZD) strategies, including extortionate and generous strategies.

And now for the talk! Click on the talk title here for Chris Lee’s slides, or go down and watch the video:

• Chris Lee, Empirical information, potential information and disinformation as signatures of distinct classes of information evolving machines.

Abstract. Information theory is an intuitively attractive way of thinking about biological evolution, because it seems to capture a core aspect of biology—life as a solution to “information problems”—in a fundamental way. However, there are non-trivial questions about how to apply that idea, and whether it has actual predictive value. For example, should we think of biological systems as being actually driven by an information metric? One idea that can draw useful links between information theory, evolution and statistical inference is the definition of an information evolving machine (IEM) as a system whose elements represent distinct predictions, and whose weights represent an information (prediction power) metric, typically as a function of sampling some iterative observation process. I first show how this idea provides useful results for describing a statistical inference process, including its maximum entropy bound for optimal inference, and how its sampling-based metrics (“empirical information”, Ie, for prediction power; and “potential information”, Ip, for latent prediction power) relate to classical definitions such as mutual information and relative entropy. These results suggest classification of IEMs into several distinct types:

1. Ie machine: e.g. a population of competing genotypes evolving under selection and mutation is an IEM that computes an Ie equivalent to fitness, and whose gradient (Ip) acts strictly locally, on mutations that it actually samples. Its transition rates between steady states will decrease exponentially as a function of evolutionary distance.

2. “Ip tunneling” machine: a statistical inference process summing over a population of models to compute both Ie, Ip can directly detect “latent” information in the observations (not captured by its model), which it can follow to “tunnel” rapidly to a new steady state.

3. disinformation machine (multiscale IEM): an ecosystem of species is an IEM whose elements (species) are themselves IEMs that can interact. When an attacker IEM can reduce a target IEM’s prediction power (Ie) by sending it a misleading signal, this “disinformation dynamic” can alter the evolutionary landscape in interesting ways, by opening up paths for rapid co-evolution to distant steady-states. This is especially true when the disinformation attack targets a feature of high fitness value, yielding a combination of strong negative selection for retention of the target feature, plus strong positive selection for escaping the disinformation attack. I will illustrate with examples from statistical inference and evolutionary game theory. These concepts, though basic, may provide useful connections between diverse themes in the workshop.


Information and Entropy in Biological Systems (Part 3)

6 April, 2015

I think you can watch live streaming video of our workshop on Information and Entropy in Biological Systems, which runs Wednesday April 8th to Friday April 10th. Later, videos will be made available in a permanent location.

To watch the workshop live, go here. Go down to where it says

Investigative Workshop: Information and Entropy in Biological Systems

Then click where it says live link. There’s nothing there now, but I’m hoping there will be when the show starts!

Below you can see the schedule of talks and a list of participants. The hours are in Eastern Daylight Time: add 4 hours to get Greenwich Mean Time. The talks start at 10 am EDT, which is 2 pm GMT.

Schedule

There will be 1½ hours of talks in the morning and 1½ hours in the afternoon for each of the 3 days, Wednesday April 8th to Friday April 10th. The rest of the time will be for discussions on different topics. We’ll break up into groups, based on what people want to discuss.

Each invited speaker will give a 30-minute talk summarizing the key ideas in some area, not their latest research so much as what everyone should know to start interesting conversations. After that, 15 minutes for questions and/or coffee.

Here’s the schedule. You can already see slides or other material for the talks with links!

Wednesday April 8

• 9:45-10:00 — the usual introductory fussing around.
• 10:00-10:30 — John Baez, Information and entropy in biological systems.
• 10:30-11:00 — questions, coffee.
• 11:00-11:30 — Chris Lee, Empirical information, potential information and disinformation.
• 11:30-11:45 — questions.

• 11:45-1:30 — lunch, conversations.

• 1:30-2:00 — John Harte, Maximum entropy as a foundation for theory building in ecology.
• 2:00-2:15 — questions, coffee.
• 2:15-2:45 — Annette Ostling, The neutral theory of biodiversity and other competitors to the principle of maximum entropy.
• 2:45-3:00 — questions, coffee.
• 3:00-5:30 — break up into groups for discussions.

• 5:30 — reception.

Thursday April 9

• 10:00-10:30 — David Wolpert, The Landauer limit and thermodynamics of biological organisms.
• 10:30-11:00 — questions, coffee.
• 11:00-11:30 — Susanne Still, Efficient computation and data modeling.
• 11:30-11:45 — questions.

• 11:45-1:30 — group photo, lunch, conversations.

• 1:30-2:00 — Matina Donaldson-Matasci, The fitness value of information in an uncertain environment.
• 2:00-2:15 — questions, coffee.
• 2:15-2:45 — Roderick Dewar, Maximum entropy and maximum entropy production in biological systems: survival of the likeliest?
• 2:45-3:00 — questions, coffee.
• 3:00-6:00 — break up into groups for discussions.

Friday April 10

• 10:00-10:30 — Marc Harper, Information transport and evolutionary dynamics.
• 10:30-11:00 — questions, coffee.
• 11:00-11:30 — Tobias Fritz, Characterizations of Shannon and Rényi entropy.
• 11:30-11:45 — questions.

• 11:45-1:30 — lunch, conversations.

• 1:30-2:00 — Christina Cobbold, Biodiversity measures and the role of species similarity.
• 2:00-2:15 — questions, coffee.
• 2:15-2:45 — Tom Leinster, Maximizing biological diversity.
• 2:45-3:00 — questions, coffee.
• 3:00-6:00 — break up into groups for discussions.

Participants

Here are the confirmed participants. This list may change a little bit:

• John Baez – mathematical physicist.

• Romain Brasselet – postdoc in cognitive neuroscience knowledgeable about information-theoretic methods and methods of estimating entropy from samples of probability distributions.

• Katharina Brinck – grad student at Centre for Complexity Science at Imperial College; did masters at John Harte’s lab, where she extended his Maximum Entropy Theory of Ecology (METE) to trophic food webs, to study how entropy maximization on the macro scale together with MEP on the scale of individuals drive the structural development of model ecosystems.

• Christina Cobbold – mathematical biologist, has studied the role of species similarity in measuring biodiversity.

• Troy Day – mathematical biologist, works with population dynamics, host-parasite dynamics, etc.; influential and could help move population dynamics to a more information-theoretic foundation.

• Roderick Dewar – physicist who studies the principle of maximal entropy production.

• Barrett Deris – MIT postdoc studying the studying the factors that influence evolvability of drug resistance in bacteria.

• Charlotte de Vries – a biology master’s student who studied particle physics to the master’s level at Oxford and the Perimeter Institute. Interested in information theory.

• Matina Donaldson-Matasci – a biologist who studies information, uncertainty and collective behavior.

• Chris Ellison – a postdoc who worked with James Crutchfield on “information-theoretic measures of structure and memory in stationary, stochastic systems – primarily, finite state hidden Markov models”. He coauthored “Intersection Information based on Common Randomness”, http://arxiv.org/abs/1310.1538. The idea: “The introduction of the partial information decomposition generated a flurry of proposals for defining an intersection information that quantifies how much of “the same information” two or more random variables specify about a target random variable. As of yet, none is wholly satisfactory.” Works on mutual information between organisms and environment (along with David Krakauer and Jessica Flack), and also entropy rates.

• Cameron Freer – MIT postdoc in Brain and Cognitive Sciences working on maximum entropy production principles, algorithmic entropy etc.

• Tobias Fritz – a physicist who has worked on “resource theories” and haracterizations of Shannon and Rényi entropy and on resource theories.

• Dashiell Fryer – works with Marc Harper on information geometry and evolutionary game theory.

• Michael Gilchrist – an evolutionary biologist studying how errors and costs of protein translation affect the codon usage observed within a genome. Works at NIMBioS.

• Manoj Gopalkrishnan – an expert on chemical reaction networks who understands entropy-like Lyapunov functions for these systems.

• Marc Harper – works on evolutionary game theory using ideas from information theory, information geometry, etc.

• John Harte – an ecologist who uses the maximum entropy method to predict the structure of ecosystems.

• Ellen Hines – studies habitat modeling and mapping for marine endangered species and ecosystems, sea level change scenarios, documenting of human use and values. Her lab has used MaxEnt methods.

• Elizabeth Hobson – behavior ecology postdoc developing methods to quantify social complexity in animals. Works at NIMBioS.

• John Jungk – works on graph theory and biology.

• Chris Lee – in bioinformatics and genomics; applies information theory to experiment design and evolutionary biology.

• Maria Leites – works on dynamics, bifurcations and applications of coupled systems of non-linear ordinary differential equations with applications to ecology, epidemiology, and transcriptional regulatory networks. Interested in information theory.

• Tom Leinster – a mathematician who applies category theory to study various concepts of ‘magnitude’, including biodiversity and entropy.

• Timothy Lezon – a systems biologist in the Drug Discovery Institute at Pitt, who has used entropy to characterize phenotypic heterogeneity in populations of cultured cells.

• Maria Ortiz Mancera – statistician working at CONABIO, the National Commission for Knowledge and Use of Biodiversity, in Mexico.

• Yajun Mei – statistician who uses Kullback-Leibler divergence and how to efficiently compute entropy for the two-state hidden Markov models.

• Robert Molzon – mathematical economist who has studied deterministic approximation of stochastic evolutionary dynamics.

• David Murrugarra – works on discrete models in mathematical biology; interested in learning about information theory.

• Annette Ostling – studies community ecology, focusing on the influence of interspecific competition on community structure, and what insights patterns of community structure might provide about the mechanisms by which competing species coexist.

• Connie Phong – grad student at Chicago’s Institute of Genomics and System biology, working on how “certain biochemical network motifs are more attuned than others at maintaining strong input to output relationships under fluctuating conditions.”

• Petr Plechak – works on information-theoretic tools for estimating and minimizing errors in coarse-graining stochastic systems. Wrote “Information-theoretic tools for parametrized coarse-graining of non-equilibrium extended systems”.

• Blake Polllard – physics grad student working with John Baez on various generalizations of Shannon and Renyi entropy, and how these entropies change with time in Markov processes and open Markov processes.

• Timothee Poisot – works on species interaction networks; developed a “new suite of tools for probabilistic interaction networks”.

• Richard Reeve – works on biodiversity studies and the spread of antibiotic resistance. Ran a program on entropy-based biodiversity measures at a mathematics institute in Barcelona.

• Rob Shaw – works on entropy and information in biotic and pre-biotic systems.

• Matteo Smerlak – postdoc working on nonequilibrium thermodynamics and its applications to biology, especially population biology and cell replication.

• Susanne Still – a computer scientist who studies the role of thermodynamics and information theory in prediction.

• Alexander Wissner-Gross – Institute Fellow at the Harvard University Institute for Applied Computational Science and Research Affiliate at the MIT Media Laboratory, interested in lots of things.

• David Wolpert – works at the Santa Fe Institute on i) information theory and game theory, ii) the second law of thermodynamics and dynamics of complexity, iii) multi-information source optimization, iv) the mathematical underpinnings of reality, v) evolution of organizations.

• Matthew Zefferman – works on evolutionary game theory, institutional economics and models of gene-culture co-evolution. No work on information, but a postdoc at NIMBioS.


Thermodynamics with Continuous Information Flow

21 March, 2015

guest post by Blake S. Pollard

Over a century ago James Clerk Maxwell created a thought experiment that has helped shape our understanding of the Second Law of Thermodynamics: the law that says entropy can never decrease.

Maxwell’s proposed experiment was simple. Suppose you had a box filled with an ideal gas at equilibrium at some temperature. You stick in an insulating partition, splitting the box into two halves. These two halves are isolated from one another except for one important caveat: somewhere along the partition resides a being capable of opening and closing a door, allowing gas particles to flow between the two halves. This being is also capable of observing the velocities of individual gas particles. Every time a particularly fast molecule is headed towards the door the being opens it, letting fly into the other half of the box. When a slow particle heads towards the door the being keeps it closed. After some time, fast molecules would build up on one side of the box, meaning half the box would heat up! To an observer it would seem like the box, originally at a uniform temperature, would for some reason start splitting up into a hot half and a cold half. This seems to violate the Second Law (as well as all our experience with boxes of gas).

Of course, this apparent violation probably has something to do with positing the existence of intelligent microscopic doormen. This being, and the thought experiment itself, are typically referred to as Maxwell’s demon.

Demon2
Photo credit: Peter MacDonald, Edmonds, UK

When people cook up situations that seem to violate the Second Law there is typically a simple resolution: you have to consider the whole system! In the case of Maxwell’s demon, while the entropy of the box decreases, the entropy of the system as a whole, demon include, goes up. Precisely quantifying how Maxwell’s demon doesn’t violate the Second Law has led people to a better understanding of the role of information in thermodynamics.

At the American Physical Society March Meeting in San Antonio, Texas, I had the pleasure of hearing some great talks on entropy, information, and the Second Law. Jordan Horowitz, a postdoc at Boston University, gave a talk on his work with Massimiliano Esposito, a researcher at the University of Luxembourg, on how one can understand situations like Maxwell’s demon (and a whole lot more) by analyzing the flow of information between subsystems.

Consider a system made up of two parts, X and Y. Each subsystem has a discrete set of states. Each systems makes transitions among these discrete states. These dynamics can be modeled as Markov processes. They are interested in modeling the thermodynamics of information flow between subsystems. To this end they consider a bipartite system, meaning that either X transitions or Y transitions, never both at the same time. The probability distribution p(x,y) of the whole system evolves according to the master equation:

\displaystyle{ \frac{dp(x,y)}{dt} = \sum_{x', y'} H_{x,x'}^{y,y'}p(x',y') - H_{x',x}^{y',y}p(x,y) }

where H_{x,x'}^{y,y'} is the rate at which the system transitions from (x',y') \to (x,y). The ‘bipartite’ condition means that H has the form

H_{x,x'}^{y,y'} = \left\{ \begin{array}{cc} H_{x,x'}^y & x \neq x'; y=y' \\   H_x^{y,y'} & x=x'; y \neq y' \\  0 & \text{otherwise.} \end{array} \right.

The joint system is an open system that satisfies the second law of thermodynamics:

\displaystyle{ \frac{dS_i}{dt} = \frac{dS_{XY}}{dt} + \frac{dS_e}{dt} \geq 0 }

where

\displaystyle{ S_{XY} = - \sum_{x,y} p(x,y) \ln ( p(x,y) ) }

is the Shannon entropy of the system, satisfying

\displaystyle{ \frac{dS_{XY} }{dt} = \sum_{x,y} \left[ H_{x,x'}^{y,y'}p(x',y') - H_{x',x}^{y',y}   p(x,y) \right] \ln \left( \frac{p(x',y')}{p(x,y)} \right) }

and

\displaystyle{ \frac{dS_e}{dt}  = \sum_{x,y} \left[ H_{x,x'}^{y,y'}p(x',y') - H_{x',x}^{y',y} p(x,y) \right] \ln \left( \frac{ H_{x,x'}^{y,y'} } {H_{x',x}^{y',y} } \right) }

is the entropy change of the environment.

We want to investigate how the entropy production of the whole system relates to entropy production in the bipartite pieces X and Y. To this end they define a new flow, the information flow, as the time rate of change of the mutual information

\displaystyle{ I = \sum_{x,y} p(x,y) \ln \left( \frac{p(x,y)}{p(x)p(y)} \right) }

Its time derivative can be split up as

\displaystyle{ \frac{dI}{dt} = \frac{dI^X}{dt} + \frac{dI^Y}{dt}}

where

\displaystyle{ \frac{dI^X}{dt} = \sum_{x,y} \left[ H_{x,x'}^{y} p(x',y) - H_{x',x}^{y}p(x,y) \right] \ln \left( \frac{ p(y|x) }{p(y|x')} \right) }

and

\displaystyle{ \frac{dI^Y}{dt} = \sum_{x,y} \left[ H_{x}^{y,y'}p(x,y') - H_{x}^{y',y}p(x,y) \right] \ln \left( \frac{p(x|y)}{p(x|y')} \right) }

are the information flows associated with the subsystems X and Y respectively.

When

\displaystyle{ \frac{dI^X}{dt} > 0}

a transition in X increases the mutual information I, meaning that X ‘knows’ more about Y and vice versa.

We can rewrite the entropy production entering into the second law in terms of these information flows as

\displaystyle{ \frac{dS_i}{dt} = \frac{dS_i^X}{dt} + \frac{dS_i^Y}{dt} }

where

\displaystyle{ \frac{dS_i^X}{dt} = \sum_{x,y} \left[ H_{x,x'}^y p(x',y) - H_{x',x}^y p(x,y) \right] \ln \left( \frac{H_{x,x'}^y p(x',y) } {H_{x',x}^y p(x,y) } \right) \geq 0 }

and similarly for \frac{dS_Y}{dt} . This gives the following decomposition of entropy production in each subsystem:

\displaystyle{ \frac{dS_i^X}{dt} = \frac{dS^X}{dt} + \frac{dS^X_e}{dt} - \frac{dI^X}{dt} \geq 0 }

\displaystyle{ \frac{dS_i^Y}{dt} = \frac{dS^Y}{dt} + \frac{dS^X_e}{dt} - \frac{dI^Y}{dt} \geq 0},

where the inequalities hold for each subsystem. To see this, if you write out the left hand side of each inequality you will find that they are both of the form

\displaystyle{ \sum_{x,y} \left[ x-y \right] \ln \left( \frac{x}{y} \right) }

which is non-negative for x,y \geq 0.

The interaction between the subsystems is contained entirely in the information flow terms. Neglecting these terms gives rise to situations like Maxwell’s demon where a subsystem seems to violate the second law.

Lots of Markov processes have boring equilibria \frac{dp}{dt} = 0 where there is no net flow among the states. Markov processes also admit non-equilibrium steady states, where there may be some constant flow of information. In this steady state all explicit time derivatives are zero, including the net information flow:

\displaystyle{ \frac{dI}{dt} = 0 }

which implies that \frac{dI^X}{dt} = - \frac{dI^Y}{dt}. In this situation the above inequalities become

\displaystyle{ \frac{dS^X_i}{dt} = \frac{dS_e^X}{dt} - \frac{dI^X}{dt} }

and

\displaystyle{ \frac{dS^Y_i}{dt} = \frac{dS_e^X}{dt} + \frac{dI^X}{dt} }.

If

\displaystyle{ \frac{dI^X}{dt} > 0 }

then X is learning something about Y or acting as a sensor. The first inequality

\frac{dS_e^X}{dt} \geq \frac{dI^X}{dt} quantifies the minimum amount of energy X must supply to do this sensing. Similarly -\frac{dS_e^Y}{dt} \leq \frac{dI^X}{dt} bounds the amount of useful energy is available to Y as a result of this information transfer.

In their paper Horowitz and Esposito explore a few other examples and show the utility of this simple breakup of a system into two interacting subsystems in explaining various interesting situations in which the flow of information has thermodynamic significance.

For the whole story, read their paper!

• Jordan Horowitz and Massimiliano Esposito, Thermodynamics with continuous information flow, Phys. Rev. X 4 (2014), 031015.


A Second Law for Open Markov Processes

15 November, 2014

guest post by Blake Pollard

What comes to mind when you hear the term ‘random process’? Do you think of Brownian motion? Do you think of particles hopping around? Do you think of a drunkard staggering home?

Today I’m going to tell you about a version of the drunkard’s walk with a few modifications. Firstly, we don’t have just one drunkard: we can have any positive real number of drunkards. Secondly, our drunkards have no memory; where they go next doesn’t depend on where they’ve been. Thirdly, there are special places, such as entrances to bars, where drunkards magically appear and disappear.

The second condition says that our drunkards satisfy the Markov property, making their random walk into a Markov process. The third condition is really what I want to tell you about, because it makes our Markov process into a more general ‘open Markov process’.

There are a collection of places the drunkards can be, for example:

V= \{ \text{bar},\text{sidewalk}, \text{street}, \text{taco truck}, \text{home} \}

We call this set V the set of states. There are certain probabilities associated with traveling between these places. We call these transition rates. For example it is more likely for a drunkard to go from the bar to the taco truck than to go from the bar to home so the transition rate between the bar and the taco truck should be greater than the transition rate from the bar to home. Sometimes you can’t get from one place to another without passing through intermediate places. In reality the drunkard can’t go directly from the bar to the taco truck: he or she has to go from the bar to sidewalk to the taco truck.

This information can all be summarized by drawing a directed graph where the positive numbers labelling the edges are the transition rates:

For simplicity we draw only three states: home, bar, taco truck. Drunkards go from home to the bar and back, but they never go straight from home to the taco truck.

We can keep track of where all of our drunkards are using a vector with 3 entries:

\displaystyle{ p(t) = \left( \begin{array}{c} p_h(t) \\ p_b(t) \\ p_{tt}(t) \end{array} \right) \in \mathbb{R}^3 }

We call this our population distribution. The first entry p_h is the number of drunkards that are at home, the second p_b is how many are at the bar, and the third p_{tt} is how many are at the taco truck.

There is a set of coupled, linear, first-order differential equations we can write down using the information in our graph that tells us how the number of drunkards in each place change with time. This is called the master equation:

\displaystyle{ \frac{d p}{d t} = H p }

where H is a 3×3 matrix which we call the Hamiltonian. The off-diagonal entries are nonnegative:

H_{ij} \geq 0, i \neq j

and the columns sum to zero:

\sum_i H_{ij}=0

We call a matrix satisfying these conditions infinitesimal stochastic. Stochastic matrices have columns that sum to one. If we take the exponential of an infinitesimal stochastic matrix we get one whose columns sum to one, hence the label ‘infinitesimal’.

The Hamiltonian for the graph above is

H = \left( \begin{array}{ccc} -2 & 5 & 10 \\ 2 & -12 & 0 \\ 0 & 7 & -10 \end{array} \right)

John has written a lot about Markov processes and infinitesimal stochastic Hamiltonians in previous posts.

Given two vectors p,q \in \mathbb{R}^3 describing the populations of drunkards which obey the same master equation, we can calculate the relative entropy of p relative to q:

\displaystyle{ S(p,q) = \sum_{ i \in V} p_i \ln \left( \frac{p_i}{q_i} \right) }

This is an example of a ‘divergence’. In statistics, a divergence a way of measuring the distance between probability distributions, which may not be symmetrical and may even not obey the triangle inequality.

The relative entropy is important because it decreases monotonically with time, making it a Lyapunov function for Markov processes. Indeed, it is a well known fact that

\displaystyle{ \frac{dS(p(t),q(t) ) } {dt} \leq 0 }

This is true for any two population distributions which evolve according to the same master equation, though you have to allow infinity as a possible value for the relative entropy and negative infinity for its time derivative.

Why is entropy decreasing? Doesn’t the Second Law of Thermodynamics say entropy increases?

Don’t worry: the reason is that I have not put a minus sign in my definition of relative entropy. Put one in if you like, and then it will increase. Sometimes without the minus sign it’s called the Kullback–Leibler divergence. This decreases with the passage of time, saying that any two population distributions p(t) and q(t) get ‘closer together’ as they get randomized with the passage of time.

That itself is a nice result, but I want to tell you what happens when you allow drunkards to appear and disappear at certain states. Drunkards appear at the bar once they’ve had enough to drink and once they are home for long enough they can disappear. The set of places where drunkards can appear or disappear B is called the set of boundary states.  So for the above process

B = \{ \text{home},\text{bar} \}

is the set of boundary states. This changes the way in which the population of drunkards changes with time!

The drunkards at the taco truck obey the master equation. For them,

\displaystyle{ \frac{dp_{tt}}{dt} = 7p_b -10 p_{tt} }

still holds. But because the populations can appear or disappear at the boundary states the master equation no longer holds at those states! Instead it is useful to define the flow of drunkards into the i^{th} state by

\displaystyle{ \frac{Dp_i}{Dt} = \frac{dp_i}{dt}-\sum_j H_{ij} p_j}

This quantity describes by how much the rate of change of the populations at the boundary states differ from that given by the master equation.

The reason why we are interested in open Markov processes is because you can take two open Markov processes and glue them together along some subset of their boundary states to get a new open Markov process! This allows us to build up or break down complicated Markov processes using open Markov processes as the building blocks.

For example we can draw the graph corresponding to the drunkards’ walk again, only now we will distinguish boundary states from internal states by coloring internal states blue and having boundary states be white:

Consider another open Markov process with states

V=\{ \text{home},\text{work},\text{bar} \}

where

B=\{ \text{home}, \text{bar}\}

are the boundary states, leaving

I=\{\text{work}\}

as an internal state:

Since the boundary states of this process overlap with the boundary states of the first process we can compose the two to form a new Markov process:

Notice the boundary states are now internal states. I hope any Markov process that could approximately model your behavior has more interesting nodes! There is a nice way to figure out the Hamiltonian of the composite from the Hamiltonians of the pieces, but we will leave that for another time.

We can ask ourselves, how does relative entropy change with time in open Markov processes? You can read my paper for the details, but here is the punchline:

\displaystyle{ \frac{dS(p(t),q(t) ) }{dt} \leq \sum_{i \in B} \frac{Dp_i}{Dt}\frac{\partial S}{\partial p_i} + \frac{Dq_i}{Dt}\frac{\partial S}{\partial q_i} }

This is a version of the Second Law of Thermodynamics for open Markov processes.

It is important to notice that the sum is only over the boundary states! This inequality tells us that relative entropy still decreases inside our process, but depending on the flow of populations through the boundary states the relative entropy of the whole process could either increase or decrease! This inequality will be important when we study how the relative entropy changes in different parts of a bigger more complicated process.

That is all for now, but I leave it as an exercise for you to imagine a Markov process that describes your life. How many states does it have? What are the relative transition rates? Are there states you would like to spend more or less time in? Are there states somewhere you would like to visit?

Here is my paper, which proves the above inequality:

• Blake Pollard, A Second Law for open Markov processes.

If you have comments or corrections, let me know!


Sensing and Acting Under Information Constraints

30 October, 2014

I’m having a great time at a workshop on Biological and Bio-Inspired Information Theory in Banff, Canada. You can see videos of the talks online. There have been lots of good talks so far, but this one really blew my mind:

• Naftali Tishby, Sensing and acting under information constraints—a principled approach to biology and intelligence, 28 October 2014.

Tishby’s talk wasn’t easy for me to follow—he assumed you already knew rate-distortion theory and the Bellman equation, and I didn’t—but it was great!

It was about the ‘action-perception loop':


This is the feedback loop in which living organisms—like us—take actions depending on our goals and what we perceive, and perceive things depending on the actions we take and the state of the world.

How do we do this so well? Among other things, we need to balance the cost of storing information about the past against the payoff of achieving our desired goals in the future.

Tishby presented a detailed yet highly general mathematical model of this! And he ended by testing the model on experiments with cats listening to music and rats swimming to land.

It’s beautiful stuff. I want to learn it. I hope to blog about it as I understand more. But for now, let me just dive in and say some basic stuff. I’ll start with the two buzzwords I dropped on you. I hate it when people use terminology without ever explaining it.

Rate-distortion theory

Rate-distortion theory is a branch of information theory which seeks to find the minimum rate at which bits must be communicated over a noisy channel so that the signal can be approximately reconstructed at the other end without exceeding a given distortion. Shannon’s first big result in this theory, the ‘rate-distortion theorem’, gives a formula for this minimum rate. Needless to say, it still requires a lot of extra work to determine and achieve this minimum rate in practice.

For the basic definitions and a statement of the theorem, try this:

• Natasha Devroye, Rate-distortion theory, course notes, University of Chicago, Illinois, Fall 2009.

One of the people organizing this conference is a big expert on rate-distortion theory, and he wrote a book about it.

• Toby Berger, Rate Distortion Theory: A Mathematical Basis for Data Compression, Prentice–Hall, 1971.

Unfortunately it’s out of print and selling for $259 used on Amazon! An easier option might be this:

• Thomas M. Cover and Joy A. Thomas, Elements of Information Theory, Chapter 10: Rate Distortion Theory, Wiley, New York, 2006.

The Bellman equation

The Bellman equation reduces the task of finding an optimal course of action to choosing what to do at each step. So, you’re trying to maximize the ‘total reward’—the sum of rewards at each time step—and Bellman’s equation says what to do at each time step.

If you’ve studied physics, this should remind you of how starting from the principle of least action, we can get a differential equation describing the motion of a particle: the Euler–Lagrange equation.

And in fact they’re deeply related. The relation is obscured by two little things. First, Bellman’s equation is usually formulated in a context where time passes in discrete steps, while the Euler–Lagrange equation is usually formulated in continuous time. Second, Bellman’s equation is really the discrete-time version not of the Euler–Lagrange equation but a more or less equivalent thing: the ‘Hamilton–Jacobi equation’.

Ah, another buzzword to demystify! I was scared of the Hamilton–Jacobi equation for years, until I taught a course on classical mechanics that covered it. Now I think it’s the greatest thing in the world!

Briefly: the Hamilton–Jacobi equation concerns the least possible action to get from a fixed starting point to a point q in space at time t. If we call this least possible action W(t,q), the Hamilton–Jacobi equation says

\displaystyle{ \frac{\partial W(t,q)}{\partial q_i} = p_i  }

\displaystyle{ \frac{\partial W(t,q)}{\partial t} = -E  }

where p is the particle’s momentum at the endpoint of its path, and E is its energy there.

If we replace derivatives by differences, and talk about maximizing total reward instead of minimizing action, we get Bellman’s equation:

Bellman equation, Wikipedia.

Markov decision processes

Bellman’s equation can be useful whenever you’re trying to figure out an optimal course of action. An important example is a ‘Markov decision process’. To prepare you for Tishby’s talk, I should say what this is.

In a Markov process, something randomly hops around from state to state with fixed probabilities. In the simplest case there’s a finite set S of states, and time proceeds in discrete steps. At each time step, the probability to hop from state s to state s' is some fixed number P(s,s').

This sort of thing is called a Markov chain, or if you feel the need to be more insistent, a discrete-time Markov chain.

A Markov decision process is a generalization where an outside agent gets to change these probabilities! The agent gets to choose actions from some set A. If at a given time he chooses the action \alpha \in A, the probability of the system hopping from state s to state s' is P_\alpha(s,s'). Needless to say, these probabilities have to sum to one for any fixed s.

That would already be interesting, but the real fun is that there’s also a reward R_\alpha(s,s'). This is a real number saying how much joy or misery the agent experiences if he does action \alpha and the system hops from s to s'.

The problem is to choose a policy—a function from states to actions—that maximizes the total expected reward over some period of time. This is precisely the kind of thing Bellman’s equation is good for!

If you’re an economist you might also want to ‘discount’ future rewards, saying that a reward n time steps in the future gets multiplied by \gamma^n, where 0 < \gamma \le 1 is some discount factor. This extra tweak is easily handled, and you can see it all here:

Markov decision process, Wikipedia.

Partially observable Markov decision processes

There’s a further generalization where the agent can’t see all the details of the system! Instead, when he takes an action \alpha \in A and the system hops from state s to state s', he sees something: a point in some set O of observations. He makes the observation o \in O with probability \Omega_\alpha(o,s').

(I don’t know why this probability depends on s' but not s. Maybe it ultimately doesn’t matter much.)

Again, the goal is to choose a policy that maximizes the expected total reward. But a policy is a bit different now. The action at any time can only depend on all the observations made thus far.

Partially observable Markov decision processes are also called POMPDs. If you want to learn about them, try these:

Partially observable Markov decision process, Wikipedia.

• Tony Cassandra, Partially observable Markov decision processes.

The latter includes an introduction without any formulas to POMDPs and how to choose optimal policies. I’m not sure who would study this subject and not want to see formulas, but it’s certainly a good exercise to explain things using just words—and it makes certain things easier to understand (though not others, in a way that depends on who is trying to learn the stuff).

The action-perception loop

I already explained the action-perception loop, with the help of this picture from the University of Bielefeld’s Department of Cognitive Neuroscience:


Nafthali Tishby has a nice picture of it that’s more abstract:

We’re assuming time comes in discrete steps, just to keep things simple.

At each time t

• the world has some state W_t, and
• the agent has some state M_t.

Why the letter M? This stands for memory: it can be the state of the agent’s memory, but I prefer to think of it as the state of the agent.

At each time

• the agent takes an action A_t, which affects the world’s next state, and

• the world provides a sensation S_t to the agent, which affect’s the agent’s next state.

This is simplified but very nice. Note that there’s a symmetry interchanging the world and the agent!

We could make it fancier by having lots of agents who all interact, but there are a lot of questions already. The big question Tishby focuses on is optimizing how much the agent should remember about the past if they

• get a reward depending on the action taken and the resulting state of the world

but

• pay a price for the information stored from sensations.

Tishby formulates this optimization question as something like a partially observed Markov decision process, uses rate-distortion theory to analyze how much information needs to be stored to achieve a given reward, and uses Bellman’s equation to solve the optimization problem!

So, everything I sketched fits together somehow!

I hope what I’m saying now is roughly right: it will take me more time to get the details straight. If you’re having trouble absorbing all the information I just threw at you, don’t feel bad: so am I. But the math feels really natural and good to me. It involves a lot of my favorite ideas (like generalizations of the principle of least action, and relative entropy), and it seems ripe to be combined with network theory ideas.

For details, I highly recommend this paper:

• Naftali Tishby and Daniel Polani, Information theory of decisions and actions, in Perception-Reason-Action Cycle: Models, Algorithms and System. Vassilis, Hussain and Taylor, Springer, Berlin, 2010.

I’m going to print this out, put it by my bed, and read it every night until I’ve absorbed it.


Biodiversity, Entropy and Thermodynamics

27 October, 2014

 

I’m giving a short 30-minute talk at a workshop on Biological and Bio-Inspired Information Theory at the Banff International Research Institute.

I’ll say more about the workshop later, but here’s my talk, in PDF and video form:

Biodiversity, entropy and thermodynamics.

Most of the people at this workshop study neurobiology and cell signalling, not evolutionary game theory or biodiversity. So, the talk is just a quick intro to some things we’ve seen before here. Starting from scratch, I derive the Lotka–Volterra equation describing how the distribution of organisms of different species changes with time. Then I use it to prove a version of the Second Law of Thermodynamics.

This law says that if there is a ‘dominant distribution’—a distribution of species whose mean fitness is at least as great as that of any population it finds itself amidst—then as time passes, the information any population has ‘left to learn’ always decreases!

Of course reality is more complicated, but this result is a good start.

This was proved by Siavash Shahshahani in 1979. For more, see:

• Lou Jost, Entropy and diversity.

• Marc Harper, The replicator equation as an inference dynamic.

• Marc Harper, Information geometry and evolutionary game theory.

and more recent papers by Harper.


Follow

Get every new post delivered to your Inbox.

Join 3,008 other followers