Abstract. If biology is the study of self-replicating entities, and we want to understand the role of information, it makes sense to see how information theory is connected to the ‘replicator equation’, a simple model of population dynamics for self-replicating entities. The relevant concept of information turns out to be the information of one probability distribution relative to another, also known as the Kullback–Leibler divergence. Using this we can get a new outlook on free energy, see evolution as a learning process, and give a clearer, more general formulation of Fisher’s fundamental theorem of natural selection.
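To make the abstract a bit more concrete, here is a minimal sketch (not taken from the talk) of the replicator equation dp_i/dt = p_i (f_i − mean fitness) with constant fitnesses, integrated with a crude Euler step. The fitness values, initial population, step size and function names are all illustrative assumptions. It tracks the information of q relative to p(t), where q is concentrated on the fittest type; in this simple setting that quantity shrinks as the population "learns".

```python
import math

def relative_information(q, p):
    """I(q, p) = sum_i q_i * ln(q_i / p_i), in nats; terms with q_i = 0 contribute 0."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def replicator_step(p, fitness, dt):
    """One Euler step of dp_i/dt = p_i * (f_i - mean fitness), then renormalize."""
    mean_f = sum(pi * fi for pi, fi in zip(p, fitness))
    new_p = [pi + dt * pi * (fi - mean_f) for pi, fi in zip(p, fitness)]
    total = sum(new_p)  # equals 1 up to rounding; renormalizing guards against drift
    return [x / total for x in new_p]

# Hypothetical example: three types with constant fitness; type 3 is the fittest.
fitness = [1.0, 2.0, 3.0]
p = [0.6, 0.3, 0.1]   # initial population fractions
q = [0.0, 0.0, 1.0]   # distribution concentrated on the fittest type

history = []
for step in range(200):
    history.append(relative_information(q, p))
    p = replicator_step(p, fitness, dt=0.05)

print(history[0], history[-1])  # the relative information decreases over time
```

With constant fitnesses the fittest type's share can only grow, so I(q, p(t)) = −ln p_3(t) falls steadily toward zero. Frequency-dependent fitness needs more care and is not covered by this sketch.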

I’d given a version of this talk earlier this year at a workshop on Quantifying biological complexity, but I’m glad this second try got videotaped and not the first, because I was a lot happier about my talk this time. And as you’ll see at the end, there were a lot of interesting questions.

On the slide showing the proportionality of the probability of a particle having an energy E_i, is that correct?

You had:

$p_i \propto e^{-k E_i / T}$

but then the unit in the expression would be the exponential of entropy squared. Shouldn’t it rather be:

$p_i \propto e^{-E_i / (k T)}$

to make it unit free?
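The dimensional point can be checked numerically. Below is a minimal sketch of the Boltzmann distribution with the exponent written as E_i / (k T), which is dimensionless: joules divided by (joules per kelvin times kelvin). The energy levels and temperatures are made-up illustrative values, and the function name is hypothetical.

```python
import math

BOLTZMANN_K = 1.380649e-23  # Boltzmann's constant in J/K (exact in the SI since 2019)

def boltzmann_distribution(energies, temperature):
    """Return p_i proportional to exp(-E_i / (k T)); energies in joules, T in kelvin."""
    # E_i / (k T) is a pure number, so exponentiating it makes sense.
    exponents = [-e / (BOLTZMANN_K * temperature) for e in energies]
    # Subtract the largest exponent before exponentiating to avoid underflow.
    shift = max(exponents)
    weights = [math.exp(x - shift) for x in exponents]
    z = sum(weights)
    return [w / z for w in weights]

# Hypothetical three-level system.
energies = [0.0, 1.0e-21, 2.0e-21]
cold = boltzmann_distribution(energies, 10.0)      # nearly all weight on the lowest level
hot = boltzmann_distribution(energies, 10000.0)    # close to uniform
print(cold)
print(hot)
```

At low temperature the distribution concentrates on the lowest energy, and at high temperature it flattens out.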

Isn’t it pretty much always the case that the mathematically simplest situations are those that are the most symmetric and finely tuned?

So this all works out to a lot of evolutionary pressure meaning quick fixes are to be found. Those fixes may not be globally optimal, they may have problems, but they could be sufficient to give you an edge. And only if the situation “cools down” do you actually have the time to think about all this in more detail, so you can fix and improve things much more easily. However, if things are too cool, so that there is almost no evolutionary pressure, you will likely also lose interest and nothing evolves at all, right?

I think that should be relevant to innovation theory: Is there some sort of “optimal temperature” at which innovation is ideally balanced between finding new stuff and optimizing old stuff? Or if not that, is there an “optimal temperature distribution” (meaning it’s allowed to fluctuate, but in a specific way)? And if there is, could we approximate it for real life, facing a lack of information? (Like, for instance, that we don’t and can’t know what there is left to be invented, or that it’s hard to even properly quantify innovation in retrospect, let alone in the moment.)

Really nice talk!

Kram wrote:

Yes. Everyone knows this! I can’t believe I made such a typo and nobody pointed it out during my talk! I’ll check, and fix it on my slides if needed.

I’ll reply to your less distressing comments later.

Nice talk, John!

One really basic question about the relative information $I(q,p)$:

You picked your examples so that this came out as some multiple of log(2), which you interpreted as a “bit” of gained information. But what if $I(q,p)$ isn’t just an (integer) number of bits of this size?

For example, suppose you roll a 6-sided die, and then tell me you didn’t roll a “6”. I’ll update my prior from the uniform distribution to a distribution that’s uniform on the outcomes “1” through “5”, and zero for the outcome “6”. This seems to give me

$I(q,p) = \log(6/5)$

but how am I to interpret this?

I suppose the “bits” just have a different size in this situation? Maybe that’s the point of information being “relative”? Is there a systematic way of figuring out what constitutes one bit of relative information in a given situation? Or, is it rather that this kind of information doesn’t actually come in discrete chunks?
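The die example is easy to check by hand or in code. Here is a small sketch, assuming the usual definition I(q, p) = Σ q_i log(q_i / p_i) with q the updated distribution and p the prior, and taking logarithms base 2 so the answer comes out in bits. The function name is just illustrative.

```python
import math

def relative_information_bits(q, p):
    """I(q, p) = sum_i q_i * log2(q_i / p_i); outcomes with q_i = 0 contribute nothing."""
    return sum(qi * math.log2(qi / pi) for qi, pi in zip(q, p) if qi > 0)

prior = [1 / 6] * 6                      # fair six-sided die
not_six = [1 / 5] * 5 + [0.0]            # after learning the roll was not a 6
even = [0.0, 1 / 3, 0.0, 1 / 3, 0.0, 1 / 3]  # after learning the roll was even

print(relative_information_bits(not_six, prior))  # log2(6/5), about 0.263 bits
print(relative_information_bits(even, prior))     # log2(2), i.e. 1 bit
```

Learning the parity of the roll halves the possibilities and gives one bit, while ruling out a single face gives only about a quarter of a bit.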

Hi! Glad you liked my talk! Great to hear from you! I really miss talking to you. Someday we should finish that book on classical mechanics.

Despite what computer scientists seem to think, there’s no reason to think information comes in integer multiples of bits. For example, if we transmit data in base 3, it’s easy to transmit a trit of information, which is log_2 3 ≈ 1.585 bits.

You’ll note I cleverly didn’t choose a base for my logarithms near the start of my talk; this is why. In this setup, log 2 of information is one bit of information regardless of the base of the logarithm. The only reason I did calculations where the answers came out to be integer multiples of a bit is to make it easy for people to follow the calculation. Perhaps this was misleading!

Later, when talking about physics, I switched to using logarithms base e. Then information gets measured in nats, which I wish were called ‘nits’.

By the way, base 3 seems to have certain information-theoretic advantages over all other integer bases, coming from the fact that the closest integer to e is 3.
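Changing units of information is just changing the base of the logarithm, as this small sketch shows. The second half is an assumption about what the remark on base 3 refers to: one common way to make it precise is "radix economy", where the cost of writing numbers in base b scales like b / ln(b), a quantity minimized at b = e. The function names are illustrative.

```python
import math

def convert_information(amount, from_base, to_base):
    """Convert an amount of information between units given by logarithm bases.

    One unit in base b equals ln(b) nats, so the conversion factor is ln(from) / ln(to).
    """
    return amount * math.log(from_base) / math.log(to_base)

trit_in_bits = convert_information(1.0, 3, 2)       # about 1.585
nat_in_bits = convert_information(1.0, math.e, 2)   # about 1.443
print(trit_in_bits, nat_in_bits)

def radix_cost(base):
    """Asymptotic radix economy b / ln(b): symbols per digit times digits needed, up to a constant."""
    return base / math.log(base)

# Among integer bases, 3 has the lowest cost under this (assumed) measure.
best_integer_base = min(range(2, 11), key=radix_cost)
print(best_integer_base)
```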

I miss talking to you too! I’ve got lots of stuff I’d love to talk about, including that book.

I agree that a “bit” is not really the smallest amount of information in any fundamental sense. It’s more like the smallest kind of question you can ask to get some information: the answer to a yes/no question. How much information is contained in the answer to a yes/no question depends a lot on the question.

However, your talk got me wondering whether, in some finite universe, there really can be a “smallest possible” question that could be asked, and thus a basic unit of information in that universe. I’m so far just idly wondering, and haven’t tried working it out; maybe it’s obviously nonsense…

I think there can only be a smallest nonzero amount of information if there’s a smallest nonzero probability: if you believe you have a fair n-sided die and I tell you the first side has not landed up, you’ve received

$\log_2 \frac{n}{n-1}$

bits of information, and this can be arbitrarily small if $n$ can be arbitrarily large.

I doubt we can build a fair die with a googolplex sides, but I also don’t feel I’ll learn much about information theory by pondering this issue—at least, I won’t learn much very soon.
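To see how quickly this shrinks, here is a short sketch that evaluates −log2(1 − 1/n), the information in bits gained on learning that one particular face of a fair n-sided die did not come up. This is the same quantity as log2(n / (n − 1)); the function name and the sample values of n are illustrative.

```python
import math

def bits_from_excluding_one_side(n):
    """Bits gained on learning that one given face of a fair n-sided die did not come up."""
    if n < 2:
        raise ValueError("need at least two sides")
    # -log2(1 - 1/n), computed with log1p so it stays accurate when 1/n is tiny.
    return -math.log1p(-1.0 / n) / math.log(2)

for n in (2, 6, 100, 10**6, 10**12):
    print(n, bits_from_excluding_one_side(n))
```

For large n the result is close to 1 / (n ln 2), so it tends to zero but never reaches it for any finite n.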

Also … The lizards are interesting! I hadn’t known about this phenomenon, but it immediately made me think of rock-paper-scissors, too, at least initially.

Then I happened to also be reading this post on Tim Gowers’s blog today, and realized that the lizards are more like intransitive dice!

This is like a probabilistic version of rock-paper-scissors in which, for example, rocks usually smash scissors, but sometimes the scissors manage to cut up a rock.

Cool! Intransitive lizards!
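For anyone who hasn't met intransitive dice, here is a small sketch with one well-known set of three dice. The particular faces are an illustrative choice, not taken from the post mentioned above. Each die beats the next one in the cycle with probability 5/9, so "usually wins" goes around in a circle just as in rock-paper-scissors.

```python
from fractions import Fraction
from itertools import product

def win_probability(die_a, die_b):
    """Exact probability that a roll of die_a shows a higher number than a roll of die_b."""
    wins = sum(1 for a, b in product(die_a, die_b) if a > b)
    return Fraction(wins, len(die_a) * len(die_b))

# Illustrative set of three six-sided dice with no shared face values (so no ties).
A = (2, 2, 4, 4, 9, 9)
B = (1, 1, 6, 6, 8, 8)
C = (3, 3, 5, 5, 7, 7)

print(win_probability(A, B))  # 5/9: A usually beats B
print(win_probability(B, C))  # 5/9: B usually beats C
print(win_probability(C, A))  # 5/9: C usually beats A
```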