## Entropy 2018

6 July, 2017

The editors of the journal Entropy are organizing this conference:

Entropy 2018 — From Physics to Information Sciences and Geometry, 14–16 May 2018, Auditorium Enric Casassas, Faculty of Chemistry, University of Barcelona, Barcelona, Spain.

They write:

One of the most frequently used scientific words is the word “entropy”. The reason is that it is related to two main scientific domains: physics and information theory. Its origin goes back to the start of physics (thermodynamics), but since Shannon, it has become related to information theory. This conference is an opportunity to bring researchers of these two communities together and create a synergy. The main topics and sessions of the conference cover:

• Physics: classical and quantum thermodynamics
• Statistical physics and Bayesian computation
• Geometrical science of information, topology and metrics
• Maximum entropy principle and inference
• Kullback and Bayes or information theory and Bayesian inference
• Entropy in action (applications)

The inter-disciplinary nature of contributions from both theoretical and applied perspectives are very welcome, including papers addressing conceptual and methodological developments, as well as new applications of entropy and information theory.

All accepted papers will be published in the proceedings of the conference. A selection of invited and contributed talks presented during the conference will be invited to submit an extended version of their paper for a special issue of the open access journal Entropy.

## Information Processing in Chemical Networks (Part 2)

13 June, 2017

Dynamics, Thermodynamics and Information Processing in Chemical Networks, 13-16 June 2017, Complex Systems and Statistical Mechanics Group, University of Luxembourg. Organized by Massimiliano Esposito and Matteo Polettini.

I’ll do it in the comments!

I explained the idea of this workshop here:

and now you can see the program here.

## The Mathematics of Open Reaction Networks

8 June, 2017

Next week, Blake Pollard and I will talk about our work on reaction networks. We’ll do this at Dynamics, Thermodynamics and Information Processing in Chemical Networks, a workshop at the University of Luxembourg organized by Massimiliano Esposito and Matteo Polettini. We’ll do it on Tuesday, 13 June 2017, from 11:00 to 13:00, in room BSC 3.03 of the Bâtiment des Sciences. If you’re around, please stop by and say hi!

Here are the slides for my talk:

Abstract. To describe systems composed of interacting parts, scientists and engineers draw diagrams of networks: flow charts, electrical circuit diagrams, signal-flow graphs, Feynman diagrams and the like. In principle all these different diagrams fit into a common framework: the mathematics of monoidal categories. This has been known for some time. However, the details are more challenging, and ultimately more rewarding, than this basic insight. Here we explain how various applications of reaction networks and Petri nets fit into this framework.

If you see typos or other problems please let me know now!

I hope to blog a bit about the workshop… it promises to be very interesting.

## Biology as Information Dynamics (Part 2)

27 April, 2017

Here’s a video of the talk I gave at the Stanford Complexity Group:

You can see slides here:

Abstract. If biology is the study of self-replicating entities, and we want to understand the role of information, it makes sense to see how information theory is connected to the ‘replicator equation’ — a simple model of population dynamics for self-replicating entities. The relevant concept of information turns out to be the information of one probability distribution relative to another, also known as the Kullback–Liebler divergence. Using this we can get a new outlook on free energy, see evolution as a learning process, and give a clearer, more general formulation of Fisher’s fundamental theorem of natural selection.

I’d given a version of this talk earlier this year at a workshop on Quantifying biological complexity, but I’m glad this second try got videotaped and not the first, because I was a lot happier about my talk this time. And as you’ll see at the end, there were a lot of interesting questions.

## Stanford Complexity Group

19 April, 2017

Aaron Goodman of the Stanford Complexity Group invited me to give a talk there on Thursday April 20th. If you’re nearby—like in Silicon Valley—please drop by! It will be in Clark S361 at 4:20 pm.

Here’s the idea. Everyone likes to say that biology is all about information. There’s something true about this—just think about DNA. But what does this insight actually do for us, quantitatively speaking? To figure this out, we need to do some work.

Biology is also about things that make copies of themselves. So it makes sense to figure out how information theory is connected to the replicator equation—a simple model of population dynamics for self-replicating entities.

To see the connection, we need to use ‘relative information’: the information of one probability distribution relative to another, also known as the Kullback–Leibler divergence. Then everything pops into sharp focus.

It turns out that free energy—energy in forms that can actually be used, not just waste heat—is a special case of relative information Since the decrease of free energy is what drives chemical reactions, biochemistry is founded on relative information.

But there’s a lot more to it than this! Using relative information we can also see evolution as a learning process, fix the problems with Fisher’s fundamental theorem of natural selection, and more.

So this what I’ll talk about! You can see my slides here:

• John Baez, Biology as information dynamics.

but my talk will be videotaped, and it’ll eventually be put here:

You can already see lots of cool talks at this location!

## Information Geometry (Part 16)

1 February, 2017

This week I’m giving a talk on biology and information:

• John Baez, Biology as information dynamics, talk for Biological Complexity: Can it be Quantified?, a workshop at the Beyond Center, 2 February 2017.

While preparing this talk, I discovered a cool fact. I doubt it’s new, but I haven’t exactly seen it elsewhere. I came up with it while trying to give a precise and general statement of ‘Fisher’s fundamental theorem of natural selection’. I won’t start by explaining that theorem, since my version looks rather different than Fisher’s, and I came up with mine precisely because I had trouble understanding his. I’ll say a bit more about this at the end.

Here’s my version:

The square of the rate at which a population learns information is the variance of its fitness.

This is a nice advertisement for the virtues of diversity: more variance means faster learning. But it requires some explanation!

### The setup

Let’s start by assuming we have $n$ different kinds of self-replicating entities with populations $P_1, \dots, P_n.$ As usual, these could be all sorts of things:

• molecules of different chemicals
• organisms belonging to different species
• genes of different alleles
• restaurants belonging to different chains
• people with different beliefs
• game-players with different strategies
• etc.

I’ll call them replicators of different species.

Let’s suppose each population $P_i$ is a function of time that grows at a rate equal to this population times its ‘fitness’. I explained the resulting equation back in Part 9, but it’s pretty simple:

$\displaystyle{ \frac{d}{d t} P_i(t) = f_i(P_1(t), \dots, P_n(t)) \, P_i(t) }$

Here $f_i$ is a completely arbitrary smooth function of all the populations! We call it the fitness of the ith species.

This equation is important, so we want a short way to write it. I’ll often write $f_i(P_1(t), \dots, P_n(t))$ simply as $f_i,$ and $P_i(t)$ simply as $P_i.$ With these abbreviations, which any red-blooded physicist would take for granted, our equation becomes simply this:

$\displaystyle{ \frac{dP_i}{d t} = f_i \, P_i }$

Next, let $p_i(t)$ be the probability that a randomly chosen organism is of the ith species:

$\displaystyle{ p_i(t) = \frac{P_i(t)}{\sum_j P_j(t)} }$

Starting from our equation describing how the populations evolve, we can figure out how these probabilities evolve. The answer is called the replicator equation:

$\displaystyle{ \frac{d}{d t} p_i(t) = ( f_i - \langle f \rangle ) \, p_i(t) }$

Here $\langle f \rangle$ is the average fitness of all the replicators, or mean fitness:

$\displaystyle{ \langle f \rangle = \sum_j f_j(P_1(t), \dots, P_n(t)) \, p_j(t) }$

In what follows I’ll abbreviate the replicator equation as follows:

$\displaystyle{ \frac{dp_i}{d t} = ( f_i - \langle f \rangle ) \, p_i }$

### The result

Okay, now let’s figure out how fast the probability distribution

$p(t) = (p_1(t), \dots, p_n(t))$

changes with time. For this we need to choose a way to measure the length of the vector

$\displaystyle{ \frac{dp}{dt} = (\frac{d}{dt} p_1(t), \dots, \frac{d}{dt} p_n(t)) }$

And here information geometry comes to the rescue! We can use the Fisher information metric, which is a Riemannian metric on the space of probability distributions.

I’ve talked about the Fisher information metric in many ways in this series. The most important fact is that as a probability distribution $p(t)$ changes with time, its speed

$\displaystyle{ \left\| \frac{dp}{dt} \right\|}$

as measured using the Fisher information metric can be seen as the rate at which information is learned. I’ll explain that later. Right now I just want a simple formula for the Fisher information metric. Suppose $v$ and $w$ are two tangent vectors to the point $p$ in the space of probability distributions. Then the Fisher information metric is given as follows:

$\displaystyle{ \langle v, w \rangle = \sum_i \frac{1}{p_i} \, v_i w_i }$

Using this we can calculate the speed at which $p(t)$ moves when it obeys the replicator equation. Actually the square of the speed is simpler:

$\begin{array}{ccl} \displaystyle{ \left\| \frac{dp}{dt} \right\|^2 } &=& \displaystyle{ \sum_i \frac{1}{p_i} \left( \frac{dp_i}{dt} \right)^2 } \\ \\ &=& \displaystyle{ \sum_i \frac{1}{p_i} \left( ( f_i - \langle f \rangle ) \, p_i \right)^2 } \\ \\ &=& \displaystyle{ \sum_i ( f_i - \langle f \rangle )^2 p_i } \end{array}$

The answer has a nice meaning, too! It’s just the variance of the fitness: that is, the square of its standard deviation.

So, if you’re willing to buy my claim that the speed $\|dp/dt\|$ is the rate at which our population learns new information, then we’ve seen that the square of the rate at which a population learns information is the variance of its fitness!

### Fisher’s fundamental theorem

Now, how is this related to Fisher’s fundamental theorem of natural selection? First of all, what is Fisher’s fundamental theorem? Here’s what Wikipedia says about it:

It uses some mathematical notation but is not a theorem in the mathematical sense.

It states:

“The rate of increase in fitness of any organism at any time is equal to its genetic variance in fitness at that time.”

Or in more modern terminology:

“The rate of increase in the mean fitness of any organism at any time ascribable to natural selection acting through changes in gene frequencies is exactly equal to its genetic variance in fitness at that time”.

Largely as a result of Fisher’s feud with the American geneticist Sewall Wright about adaptive landscapes, the theorem was widely misunderstood to mean that the average fitness of a population would always increase, even though models showed this not to be the case. In 1972, George R. Price showed that Fisher’s theorem was indeed correct (and that Fisher’s proof was also correct, given a typo or two), but did not find it to be of great significance. The sophistication that Price pointed out, and that had made understanding difficult, is that the theorem gives a formula for part of the change in gene frequency, and not for all of it. This is a part that can be said to be due to natural selection

Price’s paper is here:

• George R. Price, Fisher’s ‘fundamental theorem’ made clear, Annals of Human Genetics 36 (1972), 129–140.

I don’t find it very clear, perhaps because I didn’t spend enough time on it. But I think I get the idea.

My result is a theorem in the mathematical sense, though quite an easy one. I assume a population distribution evolves according to the replicator equation and derive an equation whose right-hand side matches that of Fisher’s original equation: the variance of the fitness.

But my left-hand side is different: it’s the square of the speed of the corresponding probability distribution, where speed is measured using the ‘Fisher information metric’. This metric was discovered by the same guy, Ronald Fisher, but I don’t think he used it in his work on the fundamental theorem!

Something a bit similar to my statement appears as Theorem 2 of this paper:

• Marc Harper, Information geometry and evolutionary game theory.

and for that theorem he cites:

• Josef Hofbauer and Karl Sigmund, Evolutionary Games and Population Dynamics, Cambridge University Press, Cambridge, 1998.

However, his Theorem 2 really concerns the rate of increase of fitness, like Fisher’s fundamental theorem. Moreover, he assumes that the probability distribution $p(t)$ flows along the gradient of a function, and I’m not assuming that. Indeed, my version applies to situations where the probability distribution moves round and round in periodic orbits!

### Relative information and the Fisher information metric

The key to generalizing Fisher’s fundamental theorem is thus to focus on the speed at which $p(t)$ moves, rather than the increase in fitness. Why do I call this speed the ‘rate at which the population learns information’? It’s because we’re measuring this speed using the Fisher information metric, which is closely connected to relative information, also known as relative entropy or the Kullback–Leibler divergence.

I explained this back in Part 7, but that explanation seems hopelessly technical to me now, so here’s a faster one, which I created while preparing my talk.

The information of a probability distribution $q$ relative to a probability distribution $p$ is

$\displaystyle{ I(q,p) = \sum_{i =1}^n q_i \log\left(\frac{q_i}{p_i}\right) }$

It says how much information you learn if you start with a hypothesis $p$ saying that the probability of the ith situation was $p_i,$ and then update this to a new hypothesis $q.$

Now suppose you have a hypothesis that’s changing with time in a smooth way, given by a time-dependent probability $p(t).$ Then a calculation shows that

$\displaystyle{ \left.\frac{d}{dt} I(p(t),p(t_0)) \right|_{t = t_0} = 0 }$

for all times $t_0$. This seems paradoxical at first. I like to jokingly put it this way:

To first order, you’re never learning anything.

However, as long as the velocity $\frac{d}{dt}p(t_0)$ is nonzero, we have

$\displaystyle{ \left.\frac{d^2}{dt^2} I(p(t),p(t_0)) \right|_{t = t_0} > 0 }$

so we can say

To second order, you’re always learning something… unless your opinions are fixed.

This lets us define a ‘rate of learning’—that is, a ‘speed’ at which the probability distribution $p(t)$ moves. And this is precisely the speed given by the Fisher information metric!

In other words:

$\displaystyle{ \left\|\frac{dp}{dt}(t_0)\right\|^2 = \left.\frac{d^2}{dt^2} I(p(t),p(t_0)) \right|_{t = t_0} }$

where the length is given by Fisher information metric. Indeed, this formula can be used to define the Fisher information metric. From this definition we can easily work out the concrete formula I gave earlier.

In summary: as a probability distribution moves around, the relative information between the new probability distribution and the original one grows approximately as the square of time, not linearly. So, to talk about a ‘rate at which information is learned’, we need to use the above formula, involving a second time derivative. This rate is just the speed at which the probability distribution moves, measured using the Fisher information metric. And when we have a probability distribution describing how many replicators are of different species, and it’s evolving according to the replicator equation, this speed is also just the variance of the fitness!

## Biology as Information Dynamics (Part 1)

31 January, 2017

This is my talk for the workshop Biological Complexity: Can It Be Quantified?

• John Baez, Biology as information dynamics, 2 February 2017.

Abstract. If biology is the study of self-replicating entities, and we want to understand the role of information, it makes sense to see how information theory is connected to the ‘replicator equation’—a simple model of population dynamics for self-replicating entities. The relevant concept of information turns out to be the information of one probability distribution relative to another, also known as the Kullback–Leibler divergence. Using this we can get a new outlook on free energy, see evolution as a learning process, and give a clean general formulation of Fisher’s fundamental theorem of natural selection.