That explanation matches up with the information geometry formulation, where a relationship between the derivative of the mean of a function and its variance is given without any dynamics involved, just the Fisher geometry of the simplex. In [1] Amari shows the following.

Let be a finite set of size . We’ll equate the manifold of discrete probability distributions with the simplex , where the elements of correspond to the obvious corner points of the simplex. So

Let be a function and denote the map computing the mean of at , i.e. . Then the variance is

where the norm is induced by the Fisher metric.

The proof basically boils down to that is the gradient of at , and then the norm from the Fisher metric gives

That’s basically your explanation of being the gradient of if were the mean fitness — if the are all constants, then

So then this version of FFT comes down, as you said above, to the replicator equation (sometimes) being the gradient flow of the mean fitness on the simplex.
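Amari's identity can be sanity-checked numerically. The Fisher (natural) gradient of the mean p ↦ Σᵢ fᵢpᵢ on the simplex has components pᵢ(fᵢ − ⟨f⟩), and the Fisher metric gives a tangent vector v the squared norm Σᵢ vᵢ²/pᵢ. A minimal sketch (the random p and f are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

n = 5
p = rng.random(n)
p /= p.sum()              # a point in the interior of the simplex
f = rng.random(n)         # an arbitrary function on the n points

mean_f = np.dot(f, p)
var_f = np.dot(p, (f - mean_f) ** 2)

# Fisher (natural) gradient of p -> <f>_p: components p_i (f_i - <f>),
# a tangent vector to the simplex (its components sum to zero).
grad = p * (f - mean_f)

# Squared norm in the Fisher metric: ||v||^2 = sum_i v_i^2 / p_i
fisher_norm_sq = np.sum(grad ** 2 / p)

print(np.isclose(fisher_norm_sq, var_f))   # the norm squared is the variance
```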

[1] Amari, *Methods of Information Geometry*, 1995, p. 42.

Thanks, Marc! I think I finally get it. Hofbauer and Sigmund are really talking about a conceptually different replicator equation in Theorem 19.5.1 of *Evolutionary Games and Population Dynamics*—different than the one I’ve been talking about, that is. And thanks to you I actually understand this other equation.

We could say it like this: we don’t have populations, we just have a time-dependent probability distribution We can think of the simplex as sitting in in the usual way, and pick a function defined on a neighborhood of and demand this version of the replicator equation:

where

and

The vector field with

is tangent to the simplex, because is the gradient of with its normal component subtracted out. So our replicator equation

describes a point moving around on the simplex.

In fact is just the gradient of *restricted to the simplex*, and the replicator equation describes gradient flow *on the simplex*.
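That picture can be sketched numerically: take an arbitrary concave example function f(x) = c·x − |x|²/2 on a neighborhood of the simplex (the choice of c is hypothetical), subtract the normal component from its Euclidean gradient, and flow. The point stays on the simplex and f increases.

```python
import numpy as np

# Arbitrary concave example f(x) = c.x - |x|^2 / 2 on a neighborhood
# of the simplex in R^3 (c is a hypothetical choice).
c = np.array([0.2, 0.3, 0.5])

def f(x):
    return np.dot(c, x) - 0.5 * np.dot(x, x)

def grad_f(x):
    return c - x          # Euclidean gradient in R^3

def projected_field(x):
    g = grad_f(x)
    return g - g.mean()   # subtract the component normal to the simplex

x = np.full(3, 1.0 / 3.0)
f_start = f(x)
dt = 1e-3
for _ in range(1000):
    x = x + dt * projected_field(x)   # Euler step for the gradient flow

print(abs(x.sum() - 1.0) < 1e-9)   # still on the simplex
print(f(x) > f_start)              # f increased along the flow
```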

The weird thing, from this point of view, is why we’re bothering to do such a roundabout procedure. All we needed to get our replicator equation was a function on the simplex. But what we do is this: arbitrarily extend to a neighborhood of the simplex in then take its gradient to get a vector field on that neighborhood, and then subtract off the normal component and restrict the result to the simplex to get our vector field

The only reason for doing this that I see is that

looks like “excess fitness”, so in the end we get a result reminiscent of Fisher’s fundamental theorem. But we don’t really have any populations, just probabilities, so the functions really don’t have any significance by themselves anymore: only the do.

(1) As you point out, we’re talking about two different replicator equations. The reference and I are using an autonomous system of only the population proportions / probabilities that isn’t necessarily derived from a Lotka-Volterra equation.

For a combination of mathematical and conceptual reasons, from my experience with the literature, people tend to use models like

• the replicator equation for “very large” or infinite populations, involving only proportions and with fitness functions that only depend on proportions

• the Lotka-Volterra equations for finite yet “continuous” population sizes, so that derivatives or discrete dynamical systems can still be used

• Markov processes such as the Moran and Wright-Fisher processes for “more realistic” populations (finite with integral population sizes)
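Of the three, the Moran process is the easiest to sketch in a few lines. Below is a minimal two-type Moran process; the fitness values and population size are hypothetical:

```python
import random

def moran_step(counts, fitness, rng):
    """One Moran step: one birth proportional to fitness, one uniform death."""
    types = range(len(counts))
    birth_weights = [counts[i] * fitness[i] for i in types]
    born = rng.choices(list(types), weights=birth_weights)[0]
    dies = rng.choices(list(types), weights=counts)[0]
    counts[born] += 1
    counts[dies] -= 1
    return counts

rng = random.Random(0)
counts = [50, 50]          # integral population sizes
fitness = [1.0, 1.1]       # constant fitnesses (hypothetical numbers)
for _ in range(2000):
    counts = moran_step(counts, fitness, rng)
    if 0 in counts:        # fixation: one type has taken over
        break

print(sum(counts))   # 100: total population size stays fixed
```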

(Note I’m not taking issue with your approach and I understand that it retains all the as finite and unbounded.)

(2) Along a trajectory of my replicator equation the population proportions are all functions of a single independent variable, time. Since we’re taking a derivative of a function along a trajectory of the system with respect to time, there’s no freedom to choose a direction for the derivative, even though is multivariate when not constrained to the trajectory. So it is a directional derivative with direction determined by the dynamical system. IIUC this applies in the abstract to both our formulations, but your equation has both and (also functions of just time ultimately, and perhaps the directions are often the same up to normalization).

In my case, the derivative of a function (like the mean fitness) along a trajectory can then be understood as

where the gradient is given by the partial derivatives (with all other variables constant) and by the dynamical system. See [1] or a text like [2].
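As a concrete sketch of that chain-rule computation (constant fitnesses are a hypothetical choice), the derivative of the mean fitness along a replicator trajectory, computed as gradient dotted with the vector field, matches a finite-difference derivative along the flow:

```python
import numpy as np

# Replicator dynamics with constant fitnesses f_i (a hypothetical example)
f = np.array([1.0, 2.0, 3.0])

def xdot(x):
    return x * (f - np.dot(f, x))   # dx_i/dt = x_i (f_i - <f>)

def V(x):
    return np.dot(f, x)             # mean fitness, the function along the trajectory

def grad_V(x):
    return f                        # partial derivatives, other coordinates held constant

x = np.array([0.2, 0.3, 0.5])

# Derivative of V along the trajectory: grad V . xdot
dV_chain = np.dot(grad_V(x), xdot(x))

# Compare with a finite-difference derivative along the flow
eps = 1e-6
dV_numeric = (V(x + eps * xdot(x)) - V(x)) / eps

print(np.isclose(dV_chain, dV_numeric))   # True
```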

[1] https://en.wikipedia.org/wiki/Lyapunov_function#Basic_Lyapunov_theorems_for_autonomous_systems

[2] Hassan Khalil, “Nonlinear Systems”, just above Theorem 4.1

Your reply leaves me even more befuddled.

1) In the framework I’m describing in this post the population is never “infinite”. The populations are treated as valued in This is an idealization which works best for large populations, but using it we get the Lotka–Volterra equations and these equations are the starting-point of my post. These equations make mathematical sense for any positive finite real populations , so we don’t ask questions like “what does a population of 2.5 mean?”—we just work with the math. From these equations we derive the replicator equation

for the probabilities . But the replicator equation is not an autonomous system of differential equations for the probabilities, because it involves the populations as well, and many different choices of populations give the same probabilities

Thus, the time evolution of the probabilities does not depend just on the probabilities now, but also on the choice of populations giving these probabilities. Unless, of course, we make an extra assumption on the nature of the fitness functions , such as that they depend only on the probabilities! This assumption is equivalent to demanding

for all and all This in turn is equivalent to

for all
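That derivation can be checked numerically: take one small Euler step of a Lotka–Volterra system whose fitnesses (a hypothetical choice) depend only on the proportions, and compare the resulting proportions with the replicator prediction.

```python
import numpy as np

# Lotka-Volterra: dP_i/dt = f_i P_i, with fitnesses that (by assumption)
# depend only on the proportions p = P / sum(P). Hypothetical choice of f:
def fitness(p):
    return np.array([1.0 + p[1], 2.0 - p[0], 0.5 + p[2]])

def lv_step(P, dt):
    p = P / P.sum()
    return P + dt * fitness(p) * P   # Euler step for the populations

P = np.array([2.0, 1.0, 3.0])        # populations, not constrained to sum to 1
dt = 1e-5
p0 = P / P.sum()
P1 = lv_step(P, dt)
p1 = P1 / P1.sum()

# Replicator prediction for the proportions: dp_i/dt = p_i (f_i - <f>)
f = fitness(p0)
p1_replicator = p0 + dt * p0 * (f - np.dot(f, p0))

print(np.allclose(p1, p1_replicator, atol=1e-8))   # agree to first order in dt
```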

2) A partial derivative with respect to one coordinate only makes sense if we know what other coordinate functions are being held constant: the answer depends on that choice. If we have a function defined on or the positive orthant,

makes sense because implicitly we are holding all the other coordinates constant. If we have a function defined only on the simplex, this partial derivative becomes ambiguous, since it’s impossible to change without changing some of the other coordinates. I can imagine ways to disambiguate it, e.g.: take the directional derivative in the direction on the simplex that makes the smallest angle to the vector pointing in the direction in

Theorem 19.5.1 in *Evolutionary Games and Population Dynamics* does not address either of these issues, making their proof (and yours) very hard to interpret. I’ve tried a number of ways to get it to make sense, and they keep breaking down when I try to write up a proof. Right now I have yet another way in mind, but I’m nervous.

For the replicator equation I assumed that the population is infinite and that . (Above I used from the reference which was intended to match your .) Note if we start from the replicator equation, we have that

(even if we don’t assume the proportions sum to 1), so the sum has to be constant and preserved along trajectories (the simplex is forward-invariant under the replicator equation). Anyway in my case only the proportions appear.

As for … I think they are just the formal partial derivatives (assuming variable independence) within the context of the multivariate chain rule for where all the are implicitly functions of . IIUC we don’t need to consider the impact of holding some variables constant in this case. If we knew explicitly as functions of (satisfying constraints), it wouldn’t matter if we used the chain rule as above or rewrote to a function of and then took the derivative.

I’m still confused, Marc, about whether the in your argument are supposed to be probabilities or populations. That is: are they constrained to sum to 1, or not?

In your calculation you say they sum to 1, so let’s assume that.

If they are constrained to sum to 1, I’m not sure what

means, since we can’t change just one while holding the rest fixed. Maybe is defined on the whole orthant

so we know what

means, but then in your calculation below they sum to 1?

This would solve some problems, but not all, since when we derive the replicator equation from the Lotka–Volterra equation, the right-hand side of the replicator equation actually involves not just the *probabilities* but the *populations*:

(Here I’m using for probabilities and for populations.) So my next question is, when you write

is this partial derivative being evaluated at (in the simplex) or (in the orthant)?

Thanks for your detailed reply—this is really helpful! I’ll try to explain some of this stuff in a nice way.

I fixed the typos, and I’m happy to do so because it gives me a chance to work through the equations in detail.

One thing I can’t fix is this:

Expert blog-commenters know to click “Reply”, not on the comment they’re replying to, but on the comment *it* was replying to.

If you do this, you create a comment that’s the same width as the comment you’re replying to. If you don’t do this, the comment thread gets skinnier and skinnier, which is especially unpleasant for comments with equations.

‘fitness’–

Maybe a biologist would include learning as a part of fitness.

There is a laboratory experiment called ‘probability learning’ in which the laboratory animal forages optimally in the presence of changes in probability.

The set-up is a number of doors and a very hungry rat. The door to its cage opens. He faces in front of him a number of closed doors. Behind one of these closed doors there is food. If he guesses correctly and goes to the correct door, the door will open when he gets there, and there will be food.

(By the way, all the other doors will open as well, showing him that there was food only behind the door that he chose.)

Now let’s say that he chooses incorrectly. The door he’s chosen will open, but there will be no food behind it. All of the other doors will open as well. And at that point, the rat sees the door he should have chosen, behind which there is food. But it is not the particular door that he did choose. Again, there is food behind only one of the doors.

Well, the rat ultimately goes back to his cage for water or to sleep. The door to his cage closes behind him.

And then, after some amount of time, the same situation repeats. The door to his cage opens and he has to make another choice between the closed doors.

Will he choose correctly or not?

(I never got involved in one of these experiments, and I know there may be ethical issues here about starving a rat.)

Little did the rat know that the experimenter had a random number generator and was randomly putting food behind, say, door A 75 percent of the time, and behind door B 25 percent of the time.

When all is said and done, the rat will have chosen door A 75 percent of the time and door B 25 percent of the time. He has ‘learned the probability’ and hence the name of the experiment: ‘probability learning.’

The equation goes like this:

Say the probability of the experimenter putting food behind a particular door is x.

And the probability of the rat choosing that particular door is y.

Then with a probability of (1-x) the experimenter will put the food behind some door other than that particular door.

And with a probability of (1-y) the rat will choose some door other than that particular door.

For each door, the model for the equation is that the rat will associate two different kinds of regret:

Regret at having chosen that particular door when he sees that the food occurred at some other door. Which for a particular door means that the rat will experience this kind of regret with probability ‘y(1-x)’.

Regret at having NOT chosen that particular door when he has chosen some other door, and he sees that the food DID occur at that door. Which for a particular door means the rat will experience this kind of regret with probability ‘x(1-y)’.

For each door, these two different kinds of regret in the model oppose each other in ‘force’, and the solution will be when these two forces of regret balance each other.

Here is the equation that the rat appears to solve unconsciously:

y(1-x)=x(1-y)

For each door behind which there is food, the answer is x=y.
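The balance equation is easy to check by brute force: for x = 0.75, scanning candidate choice probabilities y and keeping the one where the two regrets cancel recovers y = x, i.e. probability matching.

```python
# Regret balance for one door: choose probability y so that
#   y * (1 - x) = x * (1 - y)
# Expanding: y - x*y = x - x*y, so y = x.
def balance_gap(x, y):
    return y * (1 - x) - x * (1 - y)

x = 0.75
# Scan candidate choice probabilities; keep the one balancing the two regrets.
ys = [i / 1000 for i in range(1001)]
best_y = min(ys, key=lambda y: abs(balance_gap(x, y)))
print(best_y)   # 0.75: the rat matches the experimenter's probability
```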

Now I have to refer you to Randy Gallistel’s book ‘The Organization of Learning,’ chapter 11. There he reports a related foraging experiment with fish. But now instead of an individual animal in a laboratory, the experiment occurs in a fish pond.

Say there are two feeding tubes, out of which prey fish are sent with some probability by the experimenter.

Say that in a chosen window of time, the experimenter sends 7 prey fish out of tube A and 3 prey fish out of tube B. And, say that waiting for this prey is a school of 10 predator fish.

What happens is this:

7 of the predator fish will compete for prey at door A.

While 3 of the predator fish will compete for prey at door B.

It’s a Nash equilibrium.

No single predator fish can improve his chances for a prey fish by going to the other door. Very rarely is a predator fish observed to change doors.
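The no-profitable-deviation claim is easy to verify with the numbers above: at the 7/3 split, each fish expects one prey per time window, and a fish switching from tube B to tube A would face 8 competitors for 7 prey.

```python
# Expected prey per predator at each tube, for the 7/3 split of 10 fish
prey = {"A": 7, "B": 3}
fish = {"A": 7, "B": 3}

payoff = {door: prey[door] / fish[door] for door in prey}
print(payoff)   # {'A': 1.0, 'B': 1.0} -- equal payoffs at equilibrium

# A single fish switching from B to A would face 8 competitors for 7 prey:
print(prey["A"] / (fish["A"] + 1) < payoff["B"])   # True: switching doesn't pay
```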

A model for the result is this:

Regret #2, above, becomes dominated by a fear of the competitive group. Within its current group, a fish is probably experiencing some kind of order. While joining the other group is an unknown. Better the devil you know than the one you don’t, I guess.

Now substitute a dollar bill of cost for each predator fish and a dollar bill of benefit for each prey fish, and substitute for the feeding tubes – projects with positive cash flow for financial investment. ‘Optimal capital budgeting’ would also be this kind of Nash equilibrium. No dollar bill of cost could improve its payoff by changing to a different project for investment.

Improved fitness relative to these types of models might come from more evolved imagination and then language, where stories could be told and the language enables more and more things to be imagined, more and more things to regret, doubt and fear.

But then technology enters and the wilderness starts to be eliminated on a limited planet. Highly tuned imaginative story telling that supports ever increasing regret, doubt and fear about more and more things becomes a signal of reduced fitness, and some way of evolving or learning cooperation seems needed for survival.

What does seem evident to more and more people today is that human fitness on a limited planet requires reality-based cooperation rather than more and more highly evolved fantasy in story telling, story telling that’s clearly intended to generate ever increasing levels of doubt and fear.

Yes, that proof isn’t the most enlightening / well explained — they reuse the dummy variable in a confusing way, as you’ve noted.

FFT can also be shown directly as follows.

Assuming that and that follows the replicator equation, using the chain rule we have that

The last step is a bit tricky and follows because we can subtract

That sum is zero because factors out (doesn’t depend on i) and so

and since and we have

So FFT works for the gradient because of the chain rule. In the text Theorem 7.8.1 (on page 82) does this calculation for the special case that for a symmetric matrix .

More generally, if one looks at the mean fitness for an arbitrary , there’s an extra term (like in Wakeham’s article). Why? If we start instead with a vector-valued function and ask about the time derivative of its mean, we have that

The rightmost term can be manipulated into where denotes the element-wise product (Hadamard product). The two terms are equal when and it’s easy to see that happens if for a symmetric matrix . That makes the two terms of equal so we get twice the variance (i.e. the extra term in FFT is just another copy of the variance). Alternatively, we can start with as the half-mean fitness so that the time derivative is just the variance.
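That factor of two is easy to check numerically for a random symmetric matrix:

```python
import numpy as np

rng = np.random.default_rng(1)

n = 4
A = rng.random((n, n))
A = (A + A.T) / 2                   # symmetric payoff matrix
x = rng.random(n)
x /= x.sum()                        # a point on the simplex

f = A @ x                           # fitnesses f_i = (A x)_i
fbar = x @ f                        # mean fitness <f> = x . A x
xdot = x * (f - fbar)               # replicator equation

var_f = x @ (f - fbar) ** 2

# d/dt <f> = d/dt (x . A x) = xdot . A x + x . A xdot,
# and both terms equal the variance when A is symmetric.
dmean_dt = xdot @ f + x @ (A @ xdot)

print(np.isclose(dmean_dt, 2 * var_f))   # twice the variance, as claimed
```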

I’m finding Theorem 19.5.1 a bit frustrating. The statement gives two mutually incompatible equations involving

and

My guess is that the first equation for is “just for fun”, not something we should use, though we *should* use

The second equation for is the one actually used in the proof.

Is that right?

Then in (19.18) below, is the time derivative of a point moving around on the probability simplex, defined to have components

Right?
