## Resource Convertibility (Part 3)

13 April, 2015

guest post by Tobias Fritz

In Part 1 and Part 2, we learnt about ordered commutative monoids and how they formalize theories of resource convertibility and combinability. In this post, I would like to say a bit about the applications that have been explored so far. First, the study of resource theories has become a popular subject in quantum information theory, and many of the ideas in my paper actually originate there. I’ll list some references at the end. So I hope that the toolbox of ordered commutative monoids will turn out to be useful for this. But here I would like to talk about an example application that is much easier to understand, but no less difficult to analyze: graph theory and the resource theory of zero-error communication.

A graph consists of a bunch of nodes connected by a bunch of edges, for example like this:

This particular graph is the pentagon graph or 5-cycle. To give it some resource-theoretic interpretation, think of it as the distinguishability graph of a communication channel, where the nodes are the symbols that can be sent across the channel, and two symbols share an edge if and only if they can be unambiguously decoded. For example, the pentagon graph roughly corresponds to the distinguishability graph of my handwriting, when restricted to five letters only:

So my ‘w’ is distinguishable from my ‘u’, but it may be confused for my ‘m’. In order to communicate unambiguously, it looks like I should restrict myself to using only two of those letters in writing, since any third of them may be mistaken for one of the other three. But alternatively, I could use a block code to create context around each letter which allows for perfect disambiguation. This is what happens in practice: I write in natural language, where an entire word is usually not ambiguous.

One can now also consider graph homomorphisms, which are maps like this:

The numbers on the nodes indicate where each node on the left gets mapped to. Formally, a graph homomorphism is a function taking nodes to nodes such that adjacent nodes get mapped to adjacent nodes. If a homomorphism $G\to H$ exists between graphs $G$ and $H,$ then we also write $H\geq G$; in terms of communication channels, we can interpret this as saying that $H$ simulates $G,$ since the homomorphism provides a map between the symbols which preserves distinguishability. A ‘code’ for a communication channel is then just a homomorphism from the complete graph in which all nodes share an edge to the graph which describes the channel. With this ordering structure, the collection of all finite graphs forms an ordered set. This ordered set has an intricate structure which is intimately related to some big open problems in graph theory.

We can also combine two communication channels to form a compound one. Going back to the handwriting example, we can consider the new channel in which the symbols are pairs of letters. Two such pairs are distinguishable if and only if either the first letters of each pair are distinguishable or the second letters are,

$(a,b) \sim (a',b') \:\Leftrightarrow\: a\sim a' \:\lor\: b\sim b'$

When generalized to arbitrary graphs, this yields the definition of disjunctive product of graphs. It is not hard to show that this equips the ordered set of graphs with a binary operation compatible with the ordering, so that we obtain an ordered commutative monoid denoted Grph. It mathematically formalizes the resource theory of zero-error communication.

Using the toolbox of ordered commutative monoids combined with some concrete computations on graphs, one can show that Grph is not cancellative: if $K_{11}$ is the complete graph on 11 nodes, then $3C_5\not\geq K_{11},$ but there exists a graph $G$ such that

$3 C_5 + G \geq K_{11} + G$

The graph $G$ turns out to have 136 nodes. This result seems to be new. But if you happen to have seen something like this before, please let me know!

Last time, we also talked about rates of conversion. In Grph, it turns out that some of these correspond to famous graph invariants! For example, the rate of conversion from a graph $G$ to the single-edge graph $K_2$ is Shannon capacity $\Theta(\overline{G}),$ where $\overline{G}$ is the complement graph. This is of no surprise since $\Theta$ was originally defined by Shannon with precisely this rate in mind, although he did not use the language of ordered commutative monoids. In any case, the Shannon capacity $\Theta(\overline{G})$ is a graph invariant notorious for its complexity: it is not known whether there exists an algorithm to compute it! But an application of the Rate Theorem from Part 2 gives us a formula for the Shannon capacity:

$\Theta(\overline{G}) = \inf_f f(G)$

where $f$ ranges over all graph invariants which are monotone under graph homomorphisms, multiplicative under disjunctive product, and normalized such that $f(K_2) = 2.$ Unfortunately, this formula still not produce an algorithm for computing $\Theta.$ But it nonconstructively proves the existence of many new graph invariants $f$ which approximate the Shannon capacity from above.

Although my story ends here, I also feel that the whole project has barely started. There are lots of directions to explore! For example, it would be great to fit Shannon’s noisy channel coding theorem into this framework, but this has turned out be technically challenging. If you happen to be familiar with rate-distortion theory and you want to help out, please get in touch!

#### References

Here is a haphazard selection of references on resource theories in quantum information theory and related fields:

• Igor Devetak, Aram Harrow and Andreas Winter, A resource framework for quantum Shannon theory.

• Gilad Gour, Markus P. Müller, Varun Narasimhachar, Robert W. Spekkens and Nicole Yunger Halpern, The resource theory of informational nonequilibrium in thermodynamics.

• Fernando G.S.L. Brandão, Michał Horodecki, Nelly Huei Ying Ng, Jonathan Oppenheim and Stephanie Wehner, The second laws of quantum thermodynamics.

• Iman Marvian and Robert W. Spekkens, The theory of manipulations of pure state asymmetry: basic tools and equivalence classes of states under symmetric operations.

• Elliott H. Lieb and Jakob Yngvason, The physics and mathematics of the second law of thermodynamics.

## Resource Convertibility (Part 2)

10 April, 2015

guest post by Tobias Fritz

In Part 1, I introduced ordered commutative monoids as a mathematical formalization of resources and their convertibility. Today I’m going to say something about what to do with this formalization. Let’s start with a quick recap!

Definition: An ordered commutative monoid is a set $A$ equipped with a binary relation $\geq,$ a binary operation $+,$ and a distinguished element $0$ such that the following hold:

$+$ and $0$ equip $A$ with the structure of a commutative monoid;

$\geq$ equips $A$ with the structure of a partially ordered set;

• addition is monotone: if $x\geq y,$ then also $x + z \geq y + z.$

Recall also that we think of the $x,y\in A$ as resource objects such that $x+y$ represents the object consisting of $x$ and $y$ together, and $x\geq y$ means that the resource object $x$ can be converted into $y.$

When confronted with an abstract definition like this, many people ask: so what is it useful for? The answer to this is twofold: first, it provides a language which we can use to guide our thoughts in any application context. Second, the definition itself is just the very start: we can now also prove theorems about ordered commutative monoids, which can be instantiated in any particular application context. So the theory of ordered commutative monoids will provide a useful toolbox for talking about concrete resource theories and studying them. In the remainder of this post, I’d like to say a bit about what this toolbox contains. For more, you’ll have to read the paper!

To start, let’s consider catalysis as one of the resource-theoretic phenomena neatly captured by ordered commutative monoids. Catalysis is the phenomenon that certain conversions become possible only due to the presence of a catalyst, which is an additional resource object which does not get consumed in the process of the conversion. For example, we have

$\text{timber + nails}\not\geq \text{table},$

$\text{timber + nails + saw + hammer} \geq \text{table + saw + hammer}$

because making a table from timber and nails requires a saw and a hammer as tools. So in this example, ‘saw $+$ hammer’ is a catalyst for the conversion of ‘timber $+$ nails’ into ‘table’. In mathematical language, catalysis occurs precisely when the ordered commutative monoid is not cancellative, which means that $x + z\geq y + z$ sometimes holds even though $x\geq y$ does not. So, the notion of catalysis perfectly matches up with a very natural and familiar notion from algebra.

One can continue along these lines and study those ordered commutative monoids which are cancellative. It turns out that every ordered commutative monoid can be made cancellative in a universal way; in the resource-theoretic interpretation, this boils down to replacing the convertibility relation by catalytic convertibility, in which $x$ is declared to be convertible into $y$ as soon as there exists a catalyst which achieves this conversion. Making an ordered commutative monoid cancellative like this is a kind of ‘regularization': it leads to a mathematically more well-behaved structure. As it turns out, there are several additional steps of regularization that can be performed, and all of these are both mathematically natural and have an appealing resource-theoretic interpretation. These regularizations successively take us from the world of ordered commutative monoids to the realm of linear algebra and functional analysis, where powerful theorems are available. For now, let me not go into the details, but only try to summarize one of the consequences of this development. This requires a bit of preparation.

In many situations, it is not just of interest to convert a single copy of some resource object $x$ into a single copy of some $y;$ instead, one may be interested in converting many copies of $x$ into many copies of $y$ all together, and thereby maximizing (or minimizing) the ratio of the resulting number of $y$‘s compared to the number of $x$‘s that get consumed. This ratio is measured by the maximal rate:

$\displaystyle{ R_{\mathrm{max}}(x\to y) = \sup \left\{ \frac{m}{n} \:|\: nx \geq my \right\} }$

Here, $m$ and $n$ are natural numbers, and $nx$ stands for the $n$-fold sum $x+\cdots+x,$ and similarly for $my.$ So this maximal rate quantifies how many $y$’ s we can get out of one copy of $x,$ when working in a ‘mass production’ setting. There is also a notion of regularized rate, which has a slightly more complicated definition that I don’t want to spell out here, but is similar in spirit. The toolbox of ordered commutative monoids now provides the following result:

Rate Theorem: If $x\geq 0$ and $y\geq 0$ in an ordered commutative monoid $A$ which satisfies a mild technical assumption, then the maximal regularized rate from $x$ to $y$ can be computed like this:

$\displaystyle{ R_{\mathrm{max}}^{\mathrm{reg}}(x\to y) = \inf_f \frac{f(y)}{f(x)} }$

where $f$ ranges over all functionals on $A$ with $f(y)\neq 0.$

Wait a minute, what’s a ‘functional’? It’s defined to be a map $f:A\to\mathbb{R}$ which is monotone,

$x\geq y \:\Rightarrow\: f(x)\geq f(y)$

$f(x+y) = f(x) + f(y)$

In economic terms, we can think of a functional as a consistent assignment of prices to all resource objects. If $x$ is at least as useful as $y,$ then the price of $x$ should be at least as high as the price of $y$; and the price of two objects together should be the sum of their individual prices. So the $f$ in the rate formula above ranges over all ‘markets’ on which resource objects can be ‘traded’ at consistent prices. The term ‘functional’ is supposed to hint at a relation to functional analysis. In fact, the proof of the theorem crucially relies on the Hahn–Banach Theorem.

The mild technical mentioned in the Rate Theorem is that the ordered commutative monoid needs to have a generating pair. This turns out to hold in the applications that I have considered so far, and I hope that it will turn out to hold in most others as well. For the full gory details, see the paper.

So this provides some idea of what kinds of gadgets one can find in the toolbox of ordered commutative monoids. Next time, I’ll show some applications to graph theory and zero-error communication and say a bit about where this project might be going next.

## Resource Convertibility (Part 1)

7 April, 2015

guest post by Tobias Fritz

Hi! I am Tobias Fritz, a mathematician at the Perimeter Institute for Theoretical Physics in Waterloo, Canada. I like to work on all sorts of mathematical structures which pop up in probability theory, information theory, and other sorts of applied math. Today I would like to tell you about my latest paper:

It should be of interest to Azimuth readers as it forms part of what John likes to call ‘green mathematics’. So let’s get started!

Resources and their management are an essential part of our everyday life. We deal with the management of time or money pretty much every day. We also consume natural resources in order to afford food and amenities for (some of) the 7 billion people on our planet. Many of the objects that we deal with in science and engineering can be considered as resources. For example, a communication channel is a resource for sending information from one party to another. But for now, let’s stick with a toy example: timber and nails constitute a resource for making a table. In mathematical notation, this looks like so:

$\mathrm{timber} + \mathrm{nails} \geq \mathrm{table}$

We interpret this inequality as saying that “given timber and nails, we can make a table”. I like to write it as an inequality like this, which I think of as stating that having timber and nails is at least as good as having a table, because the timber and nails can always be turned into a table whenever one needs a table.

To be more precise, we should also take into account that making the table requires some tools. These tools do not get consumed in the process, so we also get them back out:

$\text{timber} + \text{nails} + \text{saw} + \text{hammer} \geq \text{table} + \text{hammer} + \text{saw}$

Notice that this kind of equation is analogous to a chemical reaction equation like this:

$2 \mathrm{H}_2 + \mathrm{O}_2 \geq \mathrm{H}_2\mathrm{O}$

So given a hydrogen molecules and an oxygen molecule, we can let them react such as to form a molecule of water. In chemistry, this kind of equation would usually be written with an arrow ‘$\rightarrow$’ instead of an ordering symbol ‘$\geq$’ , but here we interpret the equation slightly differently. As with the timber and the nails and nails above, the inequality says that if we have two hydrogen atoms and an oxygen atom, then we can let them react to a molecule of water, but we don’t have to. In this sense, having two hydrogen atoms and an oxygen atom is at least as good as having a molecule of water.

So what’s going on here, mathematically? In all of the above equations, we have a bunch of stuff on each side and an inequality ‘$\geq$’ in between. The stuff on each side consists of a bunch of objects tacked together via ‘$+$’ . With respect to these two pieces of structure, the collection of all our resource objects forms an ordered commutative monoid:

Definition: An ordered commutative monoid is a set $A$ equipped with a binary relation $\geq,$ a binary operation $+,$ and a distinguished element $0$ such that the following hold:

$+$ and $0$ equip $A$ with the structure of a commutative monoid;

$\geq$ equips $A$ with the structure of a partially ordered set;

• addition is monotone: if $x\geq y,$ then also $x + z \geq y + z.$

Here, the third axiom is the most important, since it tells us how the additive structure interacts with the ordering structure.

Ordered commutative monoids are the mathematical formalization of resource convertibility and combinability as follows. The elements $x,y\in A$ are the resource objects, corresponding to the ‘collections of stuff’ in our earlier examples, such as $x = \text{timber} + \text{nails}$ or $y = 2 \text{H}_2 + \text{O}_2.$ Then the addition operation simply joins up collections of stuff into bigger collections of stuff. The ordering relation $\geq$ is what formalizes resource convertibility, as in the examples above. The third axiom states that if we can convert $x$ into $y,$ then we can also convert $x$ together with $z$ into $y$ together with $z$ for any $z,$ for example by doing nothing to $z.$

A mathematically minded reader might object that requiring $A$ to form a partially ordered set under $\geq$ is too strong a requirement, since it requires two resource objects to be equal as soon as they are mutually interconvertible: $x \geq y$ and $y \geq x$ implies $x = y.$ However, I think that this is not an essential restriction, because we can regard this implication as the definition of equality: ‘$x = y$’ is just a shorthand notation for ‘$x\geq y$ and $y\geq x$’ which formalizes the perfect interconvertibility of resource objects.

We could now go back to the original examples and try to model carpentry and chemistry in terms of ordered commutative monoids. But as a mathematician, I needed to start out with something mathematically precise and rigorous as a testing ground for the formalism. This helps ensure that the mathematics is sensible and useful before diving into real-world applications. So, the main example in my paper is the ordered commutative monoid of graphs, which has a resource-theoretic interpretation in terms of zero-error information theory. As graph theory is a difficult and traditional subject, this application constitutes the perfect training camp for the mathematics of ordered commutative monoids. I will get to this in Part 3.

In Part 2, I will say something about what one can do with ordered commutative monoids. In the meantime, I’d be curious to know what you think about what I’ve said so far!

## A Networked World (Part 3)

2 April, 2015

guest post by David Spivak

### From parts to wholes

Remember where we were. Ologs, linguistically-enhanced sketches, just weren’t doing justice to the idea that each step in a recipe is itself a recipe. But the idea seemed ripe for mathematical formulation.

Thus, I returned to a question I’d wondered about in the very beginning: how is macro-understanding built from micro-understanding? How can multiple individual humans come together, like cells in a multicellular organism, to make a whole that is itself a surviving decision-maker?

There were, and continue to be, a lot of “open-to-Spivak” questions one can ask: How are stories about events built from sub-stories about sub-events? How is macro-economics built from micro-economics? Are large-scale phenomena always based on, and relatable to, interactions between smaller-scale phenomena? For example, I still want to understand, in very basic terms, how classical (large-scale) phenomena are a manifestation of quantum phenomena.

Neuroscience professor Michael Gazzaniga has a similar question: How does cognition arise from the interaction of tiny event-noticers, and how does society emerge and effect individual brains? As put it in the last paragraph of his book Who’s In Charge, we are in need of a language by which to understand the interfaces of “our layered hierarchical existence”, because doing so “holds the answer to our quest for understanding mind/brain relationships.” He goes on:

Understanding how to develop a vocabulary for those layered interactions, for me, constitutes the scientific problem of this century.

I tend to be infatuated with this same kind of idea: cognition emerging from interactions between sub-cognitive pieces. This is what got me interested in what I now call “operadic modularity”. Luckily again, my Office of Naval Research hero (now at the Air Force Office of Scientific Research) granted me a chance to study it.

The idea is this: modularity is about arranging many modules into a single whole, which is another module, usable as part of a larger system. Each system of modules is a module of its own, and we see the nesting phenomenon. Operads can model the language of nestable interface arrangements, and their algebras can model how interfaces are filled in with the required sorts of modules.

Here, by operad, I mean symmetric colored operad, or symmetric multicategory. Operads are like categories—they have objects, morphisms, identities, and a unital and associative composition formula—the difference is that the domain of a morphism is a finite set of objects (in an operad) rather than a single object (as in a category). So morphisms in an operad are written like $\varphi\colon X_1,\ldots,X_n\to Y;$ we call such a morphism n-ary.

An early example, formulated operadically by Peter May (the inventor of operads) is called the little 2-cubes operad, denoted $E_2.$ It has only one object, say a square ⬜, and an n-ary morphism

⬜ ,…, ⬜ $\longrightarrow$

is any arrangement of $n$ non-overlapping squares in a larger square. These arrangements clearly display a nesting property.

Another source of examples comes from the fact that every monoidal category $(C,\otimes)$ has an underlying operad $\mathcal{O}_C,$ with

$\mathrm{Hom}_{\mathcal{O}_C}(X_1,\ldots,X_n;Y):=\mathrm{Hom}_{C}(X_1\otimes\cdots\otimes X_n,Y)$

(Either $C$ was symmetric monoidal to begin with or you can add in symmetries, roughly by multiplying each hom-set by $n!.$) The operad $\mathbf{Set}$ underlying the cartesian monoidal category $(\mathbf{Set},\times,\{1\})$ of sets is an example I’ll use later.

If you want to think about operads as modeling modularity—building one thing out of many—the first trick is to imagine the codomain object as the exterior and all the domain objects as sitting inside it, on the interior. May’s little 2-cubes operad gives the idea: squares in a square. From now on, if I speak of many little objects arranged inside one big object, I always mean it this way: the interior objects constitute the domain, the exterior object is the codomain, and the arrangement itself is the morphism. These arrangements can be nested inside one another, corresponding to composition in the operad.

What are other types of nested phenomena, which we might be able to think about operadically? How about circles wired together in a larger circle? An object in this operad is a circle with some number of wires sticking out; let’s call it a ported-circle. A morphism from n-many ported-circles to one ported-circle is any connection pattern involving—i.e., wiring together of—the ports. This description can be interpreted in a few different ways; I usually mean the underlying operad of the monoidal category of “sets and cospans under disjoint union”, but the “spaghetti and meatballs operad” of circular planar arc diagrams is another good interpretation.

Once you have an operad $\mathcal{O},$ you have a kind of calculus for nestable arrangements. As I’ve been saying, I often think of the morphisms in an operad in terms of pictures, such as wiring diagrams or squares-in-a-square. If you say you want these pictures to “mean something”, you’re probably looking for an algebra $F:\mathcal{O}\to\textbf{Sets}.$ This operad functor $F,$ which acts like a lax functor between monoidal categories, would tell you the set $F(X)$ of fillers or fills that can be placed into each object $X\in\mathcal{O}$ in the picture.

I often think of the operad $\mathcal{O}$ as a picture language, and the $\mathcal{O}$-algebra $F$ its intended semantics. Not only does such a set-valued functor on $\mathcal{O}$ give you a set of fills for each object $X\in\mathcal{O}$, it would also give you a formula for producing a large-scale fill (element of $F(Y)$) from any arrangement $\varphi\colon X_1,\ldots,X_n\to Y$ of small-scale fills (element of $F(X_1)\times\cdots\times F(X_n)$).

For example, given a pointed space $A$, you can ask for the set of based 2-spheres

$L_A($$)=\{\ell\colon S^2\to A\}$

in it. Here, $\ell$ is any element of $L_A($$).$ Think of a based sphere in $A$ as a continuous map from the filled-in square to $A$ that sends the boundary of the square to the basepoint of $A.$ Given n spheres $\ell_1,\ldots, \ell_n$ in $A,$ an arrangement $\varphi$ of non-overlapping squares in a square prescribes a new based sphere $L_A(\varphi)(\ell_1,\ldots, \ell_n)\in L_A($$).$ The idea is that you send all the unused space in the big exterior square to the basepoint of $A$, and follow the instructions $\ell_i$ when you get to the $i$th little square inside. Thus any “2-fold loop space” gives an algebra of May’s little 2-cubes operad.

So recently, I’ve been thinking a lot about operadic modularity, i.e., cases in which a thing can be built out of a bunch of simpler things. Note that not all cases of “nesting” have such a clear picture language. For example, context-free grammars are modular: you build [postal-address] out of [name-part], [street-address] and [zip-part], you build each of these, e.g., [name-part], in any of several ways (there is an optional suffix part and the option to abbreviate your first name using an initial). The point is, you build things out of smaller parts, nested inside still smaller parts. Seeing context-free grammars as free operads is one of the things Hermida, Makkai, and Power explained in their paper on higher dimensional multigraphs.

The operadic notion of modularity can also be applied to building hierarchical protein materials. Like context-free grammars, the operad of such materials doesn’t come with a nice picture language. However, it can be formalized as an operad nonetheless. That is, there is a grammar of actions that one can apply to a bunch of polypeptides, actions such as “attach”, “overlay”, “rigidMotion”, “helix”, “makeArray”. From these you can build proteins that are quite complex from a simple vocabulary of 20 amino acids. I’ve joined forces with Tristan Giesa and Ravi Jagadeesan to make such a program. The software package, called Matriarch, for “Materi-als Arch-itecture”, should be available soon as an open source Python library.

There are lots of operads whose morphisms look like string diagrams of various sorts. These operads, which generalize a set-theoretic version of May’s topological little 2-cubes, have clear picture languages. The algebras on such “visualizable” operads can model things like databases and dynamical systems. Over the past year or so, I’ve been writing a series of “worked example” papers, such as those above, in which I explain various picture languages and semantics for them.

I find that operads provide a nice language in which to discuss string diagrams of various sorts. String-diagrammatic languages exist for many different “doctrines”, such as categories, categories without identities, monoidal categories, cartesian monoidal categories, traced monoidal categories, operads, etc. For example, Dylan Rupel and I realized that traced monoidal categories are (well, if you have enough equipment and an expert like Patrick Schultz around) algebras on the operad of oriented 1-cobordisms. It seems to me that the other doctrines above are similarly associated to operads that are “nearby” Cob, e.g., sub-operads of Cob, operads under Cob, etc. Maps between these various operads should induce known adjunctions between the corresponding doctrines.

That brings us to present day. There will be a workshop in Turin in a couple of months, and I think it’ll be a lot of fun:

• Categorical Foundations of Network Theory, May 25-28, ISI Foundation, Turin, Italy.

I’m looking forward to hearing from John Baez, Jacob Biamonte, Eugene Lerman, Tobias Fritz and others, about what they’ve been thinking about recently. I think we’ll find interesting common ground. If there’s interest, I’d be happy to talk about categorical models for how information is communicated throughout a network, and whether this gives any insight that can lead to better decision-making by the larger whole.

## A Networked World (Part 2)

30 March, 2015

guest post by David Spivak

### Creating a knowledge network

In 2007, I asked myself: as mathematically as possible, what can formally ground meaningful information, including both its successful communication and its role in decision-making? I believed that category theory could be useful in formalizing the type of object that we call information, and the type of relationship that we call communication.

Over the next few years, I worked on this project. I tried to understand what information is, how it is stored, and how it can be transferred between entities that think differently. Since databases store information, I wanted to understand databases category-theoretically. I eventually decided that databases are basically just categories $\mathcal{C},$ corresponding to a collection of meaningful concepts and connections between them, and that these categories are equipped with functors $\mathcal{C}\to\mathbf{Set}.$ Such a functor assigns to each meaningful concept a set of examples and connects them as dictated by the morphisms of $\mathbf{Set}.$ I later found out that this “databases as categories” idea was not original; it is due to Rosebrugh and others. My view on the subject has matured a bit since then, but I still like this basic conception of databases.

If we model a person’s knowledge as a database (interconnected tables of examples of things and relationships), then the network of knowledgeable humans could be conceptualized as a simplicial complex equipped with a sheaf of databases. Here, a vertex represents an individual, with her database of knowledge. An edge represents a pair of individuals and a common ground database relating their individual databases. For example, you and your brother have a database of concepts and examples from your history. The common-ground database is like the intersection of the two databases, but it could be smaller (if the people don’t yet know they agree on something). In a simplicial complex, there are not only vertices and edges, but also triangles (and so on). These would represent databases held in common between three or more people.

I wanted “regular people” to actually make such a knowledge network, i.e., to share their ideas in the form of categories and link them together with functors. Of course, most people don’t know categories and functors, so I thought I’d make things easier for them by equipping categories with linguistic structures: text boxes for objects, labeled arrows for morphisms. For example, “a person has a mother” would be a morphism from the “person” object, to the “mother” object. I called such a linguistic category an olog, playing on the word blog. The idea (originally inspired during a conversation with my friend Ralph Hutchison) was that I wanted people, especially scientists, to blog their ontologies, i.e., to write “onto-logs” like others make web-logs.

Ologs codify knowledge. They are like concept webs, except with more rules that allow them to simultaneously serve as database schemas. By introducing ologs, I hoped I could get real people to upload their ideas into what is now called the cloud, and make the necessary knowledge network. I tried to write my papers to engage an audience of intelligent lay-people rather than for an audience of mathematicians. It was a risk, but to me it was the only honest approach to the larger endeavor.

(For students who might want to try going out on a limb like this, you should know that I was offered zero jobs after my first postdoc at University of Oregon. The risk was indeed risky, and one has to be ok with that. I personally happened to be the beneficiary of good luck and was offered a grant, out of the clear blue sky, by a former PhD in algebraic geometry, who worked at the Office of Naval Research at the time. That, plus the helping hands of Haynes Miller and many other brilliant and wonderful people, can explain how I lived to tell the tale.)

So here’s how the simplicial complex of ologs would ideally help humanity steer. Suppose we say that in order for one person to learn from another, the two need to find a common language and align some ideas. This kind of (usually tacit) agreement on, or alignment of, an initial common-ground vocabulary and concept-set is important to get their communication onto a proper footing.

For two vertices in such a simplicial network, the richer their common-ground olog (i.e., the database corresponding to the edge between them) is, the more quickly and accurately the vertices can share new ideas. As ideas are shared over a simplex, all participating databases can be updated, hence making the communication between them richer. In around 2010, Mathieu Anel and I worked out a formal way this might occur; however, we have not yet written it up. The basic idea can be found here.

In this setup, the simplicial complex of human knowledge should grow organically. Scientists, business people, and other people might find benefit in ologging their ideas and conceptions, and using them to learn from their peers. I imagined a network organizing itself, where simplices of like-minded people could share information with neighboring groups across common faces.

I later wrote a book called Category Theory for the Sciences, available free online, to help scientists learn how category theory could apply to familiar situations like taxonomies, graphs, and symmetries. Category theory, simply explained, becomes a wonderful key to the whole world of pure mathematics. It’s the closest thing we have to a universal language of thought, and therefore an appropriate language for forming connections.

My working hypothesis for the knowledge network was this. The information held by people whose worldview is more true—more accurate—would have better predictive power, i.e., better results. This is by definition: I define ones knowledge to be accurate as the extent to which, when he uses this knowledge to direct his actions, he has good luck handling his worldly affairs. As Louis Pasteur said, “Luck favors the prepared mind.” It follows that if someone has a track record of success, others will value finding broad connections into his olog. However, to link up with someone you must find a part of your olog that aligns with his—a functorial connection—and you can only receive meaningful information from him to the extent that you’ve found such common ground.

Thus, people who like to live in fiction worlds would find it difficult to connect, except to other like-minded “Obama’s a Christian”-type people. To the extent you are imbedded in a fictional—less accurate, less predictive—part of the network, you will find it difficult to establish functorial connections to regions of more accurate knowledge, and therefore you can’t benefit from the predictive and conceptual value of this knowledge.

In other words, people would be naturally inclined to try to align their understanding with people that are better informed. I felt hope that this kind of idea could lead to a system in which honesty and accuracy were naturally rewarded. At the very least, those who used it could share information much more effectively than they do now. This was my plan; I just had to make it real.

I had a fun idea for publicizing ologs. The year was in 2008, and I remember thinking it would be fantastic if I could olog the political platform and worldview of Barack Obama and of Sarah Palin. I wished I could sit down with them and other politicians and help them write ologs about what they believed and wanted for the country. I imagined that some politicians might have ologs that look like a bunch of disconnected text boxes—like a brain with neurons but no synapses—a collection of talking points but no real substantive ideas.

Anyway, there I was, trying to understand everything this way: all information was categories (or perhaps sketches) and presheaves. I would work with interested people from any academic discipline, such as materials science, to make ologs about whatever information they wanted to record category-theoretically. Ologs weren’t a theory of everything, but instead, as Jack Morava put it, a theory of anything.

One day I was working on a categorical sketch to model processes within processes, but somehow it really wasn’t working properly. The idea was simple: each step in a recipe is a mini-recipe of its own. Like chopping the carrots means getting out a knife and cutting board, putting a carrot on there, and bringing the knife down successively along it. You can keep zooming into any of these and see it as its own process. So there is some kind of nested, fractal-like behavior here. The olog I made could model the idea of steps in a recipe, but I found it difficult to encode the fact that each step was itself a recipe.

This nesting thing seemed like an idea that mathematics should treat beautifully, and ologs weren’t doing it justice. It was then when I finally admitted that there might be other fish in the mathematical sea.

## A Networked World (Part 1)

27 March, 2015

guest post by David Spivak

### The problem

The idea that’s haunted me, and motivated me, for the past seven years or so came to me while reading a book called The Moment of Complexity: our Emerging Network Culture, by Mark C. Taylor. It was a fascinating book about how our world is becoming increasingly networked—wired up and connected—and that this is leading to a dramatic increase in complexity. I’m not sure if it was stated explicitly there, but I got the idea that with the advent of the World Wide Web in 1991, a new neural network had been born. The lights had been turned on, and planet earth now had a brain.

I wondered how far this idea could be pushed. Is the world alive, is it a single living thing? If it is, in the sense I meant, then its primary job is to survive, and to survive it’ll have to make decisions. So there I was in my living room thinking, “oh my god, we’ve got to steer this thing!”

Taylor pointed out that as complexity increases, it’ll become harder to make sense of what’s going on in the world. That seemed to me like a big problem on the horizon, because in order to make good decisions, we need to have a good grasp on what’s occurring. I became obsessed with the idea of helping my species through this time of unprecedented complexity. I wanted to understand what was needed in order to help humanity make good decisions.

What seemed important as a first step is that we humans need to unify our understanding—to come to agreement—on matters of fact. For example, humanity still doesn’t know whether global warming is happening. Sure almost all credible scientists have agreed that it is happening, but does that steer money into programs that will slow it or mitigate its effects? This isn’t an issue of what course to take to solve a given problem; it’s about whether the problem even exists! It’s like when people were talking about Obama being a Muslim, born in Kenya, etc., and some people were denying it, saying he was born in Hawaii. If that’s true, why did he repeatedly refuse to show his birth certificate?

It is important, as a first step, to improve the extent to which we agree on the most obvious facts. This kind of “sanity check” is a necessary foundation for discussions about what course we should take. If we want to steer the ship, we have to make committed choices, like “we’re turning left now,” and we need to do so as a group. That is, there needs to be some amount of agreement about the way we should steer, so we’re not fighting ourselves.

Luckily there are a many cases of a group that needs to, and is able to, steer itself as a whole. For example as a human, my neural brain works with my cells to steer my body. Similarly, corporations steer themselves based on boards of directors, and based on flows of information, which run bureaucratically and/or informally between different parts of the company. Note that in neither case is there any suggestion that each part—cell, employee, or corporate entity—is “rational”; they’re all just doing their thing. What we do see in these cases is that the group members work together in a context where information and internal agreement is valued and often attained.

It seemed to me that intelligent, group-directed steering is possible. It does occur. But what’s the mechanism by which it happens, and how can we think about it? I figured that the way we steer, i.e., make decisions, is by using information.

I should be clear: whenever I say information, I never mean it “in the sense of Claude Shannon”. As beautiful as Shannon’s notion of information is, he’s not talking about the kind of information I mean. He explicitly said in his seminal paper that information in his sense is not concerned with meaning:

Frequently the messages have meaning; that is they refer to or are correlated according to some system with certain physical or conceptual entities. These semantic aspects of communication are irrelevant to the engineering problem. The significant aspect is that the actual message is one selected from a set of possible messages.

In contrast, I’m interested in the semantic stuff, which flows between humans, and which makes possible decisions about things like climate change. Shannon invented a very useful quantitative measure of meaningless probability distributions.

That’s not the kind of information I’m talking about. When I say “I want to know what information is”, I’m saying I want to formulate the notion of human-usable semantic meaning, in as mathematical a way as possible.

Back to my problem: we need to steer the ship, and to do so we need to use information properly. Unfortunately, I had no idea what information is, nor how it’s used to make decisions (let alone to make good ones), nor how it’s obtained from our interaction with the world. Moreover, I didn’t have a clue how the minute information-handling at the micro-level, e.g., done by cells inside a body or employees inside a corporation, would yield information-handling at the macro (body or corporate) level.

I set out to try to understand what information is and how it can be communicated. What kind of stuff is information? It seems to follow rules: facts can be put together to form new facts, but only in certain ways. I was once explaining this idea to Dan Kan, and he agreed saying, “Yes, information is inherently a combinatorial affair.” What is the combinatorics of information?

Communication is similarly difficult to understand, once you dig into it. For example, my brain somehow enables me to use information and so does yours. But our brains are wired up in personal and ad hoc ways, when you look closely, a bit like a fingerprint or retinal scan. I found it fascinating that two highly personalized semantic networks could interface well enough to effectively collaborate.

There are two issues that I wanted to understand, and by to understand I mean to make mathematical to my own satisfaction. The first is what information is, as structured stuff, and what communication is, as a transfer of structured stuff. The second is how communication at micro-levels can create, or be, understanding at macro-levels, i.e., how a group can steer as a singleton.

Looking back on this endeavor now, I remain concerned. Things are getting increasingly complex, in the sorts of ways predicted by Mark C. Taylor in his book, and we seem to be losing some control: of the NSA, of privacy, of people 3D printing guns or germs, of drones, of big financial institutions, etc.

Can we expect or hope that our species as a whole will make decisions that are healthy, like keeping the temperature down, given the information we have available? Are we in the driver’s seat, or is our ship currently in the process of spiraling out of our control?

Let’s assume that we don’t want to panic but that we do want to participate in helping the human community to make appropriate decisions. A possible first step could be to formalize the notion of “using information well”. If we could do this rigorously, it would go a long way toward helping humanity get onto a healthy course. Further, mathematics is one of humanity’s best inventions. Using this tool to improve our ability to use information properly is a non-partisan approach to addressing the issue. It’s not about fighting, it’s about figuring out what’s happening, and weighing all our options in an informed way.

So, I ask: What kind of mathematics might serve as a formal ground for the notion of meaningful information, including both its successful communication and its role in decision-making?

## Stationary Stability in Finite Populations

24 March, 2015

guest post by Marc Harper

A while back, in the article Relative entropy minimization in evolutionary dynamics, we looked at extensions of the information geometry / evolutionary game theory story to more general time-scales, incentives, and geometries. Today we’ll see how to make this all work in finite populations!

Let’s recall the basic idea from last time, which John also described in his information geometry series. The main theorem is this: when there’s an evolutionarily stable state for a given fitness landscape, the relative entropy between the stable state and the population distribution decreases along the population trajectories as they converge to the stable state. In short, relative entropy is a Lyapunov function. This is a nice way to look at the action of a population under natural selection, and it has interesting analogies to Bayesian inference.

The replicator equation is a nice model from an intuitive viewpoint, and it’s mathematically elegant. But it has some drawbacks when it comes to modeling real populations. One major issue is that the replicator equation implicitly assumes that the population proportions of each type are differentiable functions of time, obeying a differential equation. This only makes sense in the limit of large populations. Other closely related models, such as the Lotka-Volterra model, focus on the number of individuals of each type (e.g. predators and prey) instead of the proportion. But they often assume that the number of individuals is a differentiable function of time, and a population of 3.5 isn’t very realistic either.

Real populations of replicating entities are not infinitely large; in fact they are often relatively small and of course have whole numbers of each type, at least for large biological replicators (like animals). They take up space and only so many can interact meaningfully. There are quite a few models of evolution that handle finite populations and some predate the replicator equation. Models with more realistic assumptions typically have to leave the realm of derivatives and differential equations behind, which means that the analysis of such models is more difficult, but the behaviors of the models are often much more interesting. Hopefully by the end of this post, you’ll see how all of these diagrams fit together:

One of the best-known finite population models is the Moran process, which is a Markov chain on a finite population. This is the quintessential birth-death process. For a moment consider a population of just two types $A$ and $B.$ The state of the population is given by a pair of nonnegative integers $(a,b)$ with $a+b=N,$ the total number of replicators in the population, and $a$ and $b$ the number of individuals of type $A$ and $B$ respectively. Though it may artificial to fix the population size $N,$ this often turns out not to be that big of a deal, and you can assume the population is at its carrying capacity to make the assumption realistic. (Lots of people study populations that can change size and that have replicators spatially distributed say on a graph, but we’ll assume they can all interact with each whenever they want for now).

A Markov model works by transitioning from state to state in each round of the process, so we need to define the transitions probabilities to complete the model. Let’s put a fitness landscape on the population, given by two functions $f_A$ and $f_B$ of the population state $(a,b).$ Now we choose an individual to reproduce proportionally to fitness, e.g. we choose an $A$ individual to reproduce with probability

$\displaystyle{ \frac{a f_A}{a f_A + b f_B} }$

since there are $a$ individuals of type $A$ and they each have fitness $f_A.$ This is analogous to the ratio of fitness to mean fitness from the discrete replicator equation, since

$\displaystyle{ \frac{a f_A}{a f_A + b f_B} = \frac{\frac{a}{N} f_A}{\frac{a}{N} f_A + \frac{b}{N} f_B} \to \frac{x_i f_i(x)}{\overline{f(x)}} }$

and the discrete replicator equation is typically similar to the continuous replicator equation (this can be made precise), so the Moran process captures the idea of natural selection in a similar way. Actually there is a way to recover the replicator equation from the Moran process in large populations—details at the end!

We’ll assume that the fitnesses are nonnegative and that the total fitness (the denominator) is never zero; if that seems artificial, some people prefer to transform the fitness landscape by $e^{\beta f(x)},$ which gives a ratio reminiscent of the Boltzmann or Fermi distribution from statistical physics, with the parameter $\beta$ playing the role of intensity of selection rather than inverse temperature. This is sometimes called Fermi selection.

That takes care of the birth part. The death part is easier: we just choose an individual at random (uniformly) to be replaced. Now we can form the transition probabilities of moving between population states. For instance the probability of moving from state $(a,b)$ to $(a+1, b-1)$ is given by the product of the birth and death probabilities, since they are independent:

$\displaystyle{ T_a^{a+1} = \frac{a f_A}{a f_A + b f_B} \frac{b}{N} }$

since we have to chose a replicator of type $A$ to reproduce and one of type $B$ to be replaced. Similarly for $(a,b)$ to $(a-1, b+1)$ (switch all the a’s and b’s), and we can write the probability of staying in the state $(a, N-a)$ as

$T_a^{a} = 1 - T_{a}^{a+1} - T_{a}^{a-1}$

Since we only replace one individual at a time, this covers all the possible transitions, and keeps the population constant.

We’d like to analyze this model and many people have come up with clever ways to do so, computing quantities like fixation probabilities (also known as absorption probabilities), indicating the chance that the population will end up with one type completely dominating, i.e. in state $(0, N)$ or $(N,0).$ If we assume that the fitness of type $A$ is constant and simply equal to 1, and the fitness of type $B$ is $r \neq 1,$ we can calculate the probability that a single mutant of type $B$ will take over a population of type $A$ using standard Markov chain methods:

$\displaystyle{\rho = \frac{1 - r^{-1}}{1 - r^{-N}} }$

For neutral relative fitness ($r=1$), $\rho = 1/N,$ which is the probability a neutral mutant invades by drift alone since selection is neutral. Since the two boundary states $(0, N)$ or $(N,0)$ are absorbing (no transitions out), in the long run every population ends up in one of these two states, i.e. the population is homogeneous. (This is the formulation referred to by Matteo Smerlak in The mathematical origins of irreversibility.)

That’s a bit different flavor of result than what we discussed previously, since we had stable states where both types were present, and now that’s impossible, and a bit disappointing. We need to make the population model a bit more complex to have more interesting behaviors, and we can do this in a very nice way by adding the effects of mutation. At the time of reproduction, we’ll allow either type to mutate into the other with probability $\mu.$ This changes the transition probabilities to something like

$\displaystyle{ T_a^{a+1} = \frac{a (1-\mu) f_A + b \mu f_B}{a f_A + b f_B} \frac{b}{N} }$

Now the process never stops wiggling around, but it does have something known as a stationary distribution, which gives the probability that the population is in any given state in the long run.

For populations with more than two types the basic ideas are the same, but there are more neighboring states that the population could move to, and many more states in the Markov process. One can also use more complicated mutation matrices, but this setup is good enough to typically guarantee that no one species completely takes over. For interesting behaviors, typically $\mu = 1/N$ is a good choice (there’s some biological evidence that mutation rates are typically inversely proportional to genome size).

Without mutation, once the population reached $(0,N)$ or $(N,0),$ it stayed there. Now the population bounces between states, either because of drift, selection, or mutation. Based on our stability theorems for evolutionarily stable states, it’s reasonable to hope that for small mutation rates and larger populations (less drift), the population should spend most of its time near the evolutionarily stable state. This can be measured by the stationary distribution which gives the long run probabilities of a process being in a given state.

Previous work by Claussen and Traulsen:

• Jens Christian Claussen and Arne Traulsen, Non-Gaussian fluctuations arising from finite populations: exact results for the evolutionary Moran process, Physical Review E 71 (2005), 025101.

suggested that the stationary distribution is at least sometimes maximal around evolutionarily stable states. Specifically, they showed that for a very similar model with fitness landscape given by

$\left(\begin{array}{c} f_A \\ f_B \end{array}\right) = \left(\begin{array}{cc} 1 & 2\\ 2&1 \end{array}\right) \left(\begin{array}{c} a\\ b \end{array}\right)$

the stationary state is essentially a binomial distribution centered at $(N/2, N/2).$

Unfortunately, the stationary distribution can be very difficult to compute for an arbitrary Markov chain. While it can be computed for the Markov process described above without mutation, and in the case studied by Claussen and Traulsen, there’s no general analytic formula for the process with mutation, nor for more than two types, because the processes are not reversible. Since we can’t compute the stationary distribution analytically, we’ll have to find another way to show that the local maxima of the stationary distribution are “evolutionarily stable”. We can approximate the stationary distribution fairly easily with a computer, so it’s easy to plot the results for just about any landscape and reasonable population size (e.g. $N \approx 100$).

It turns out that we can use a relative entropy minimization approach, just like for the continuous replicator equation! But how? We lack some essential ingredients such as deterministic and differentiable trajectories. Here’s what we do:

• We show that the local maxima and minima of the stationary distribution satisfy a complex balance criterion.

• We then show that these states minimize an expected relative entropy.

• This will mean that the current state and the expected next state are ‘close’.

• Lastly, we show that these states satisfy an analogous definition of evolutionary stability (now incorporating mutation).

The relative entropy allows us to measure how close the current state is to the expected next state, which captures the idea of stability in another way. This ports the relative minimization Lyapunov result to some more realistic Markov chain models. The only downside is that we’ll assume the populations are “sufficiently large”, but in practice for populations of three types, $N=20$ is typically enough for common fitness landscapes (there are lots of examples here for $N=80,$ which are prettier than the smaller populations). The reason for this is that the population state $(a,b)$ needs enough “resolution” $(a/N, b/N)$ to get sufficiently close to the stable state, which is not necessarily a ratio of integers. If you allow some wiggle room, smaller populations are still typically pretty close.

Evolutionarily stable states are closely related to Nash equilibria, which have a nice intuitive description in traditional game theory as “states that no player has an incentive to deviate from”. But in evolutionary game theory, we don’t use a game matrix to compute e.g. maximum payoff strategies, rather the game matrix defines a fitness landscape which then determines how natural selection unfolds.

We’re going to see this idea again in a moment, and to help get there let’s introduce an function called an incentive that encodes how a fitness landscape is used for selection. One way is to simply replace the quantities $a f_A(a,b)$ and $b f_B(a,b)$ in the fitness-proportionate selection ratio above, which now becomes (for two population types):

$\displaystyle{ \frac{\varphi_A(a,b)}{\varphi_A(a,b) + \varphi_B(a,b)} }$

Here $\varphi_A(a,b)$ and $\varphi_B(a,b)$ are the incentive function components that determine how the fitness landscape is used for natural selection (if at all). We have seen two examples above:

$\varphi_A(a,b) = a f_A(a, b)$

for the Moran process and fitness-proportionate selection, and

$\varphi_A(a,b) = a e^{\beta f_A(a, b)}$

for an alternative that incorporates a strength of selection term $\beta,$ preventing division by zero for fitness landscapes defined by zero-sum game matrices, such as a rock-paper-scissors game. Using an incentive function also simplifies the transition probabilities and results as we move to populations of more than two types. Introducing mutation, we can describe the ratio for incentive-proportion selection with mutation for the $i$th population type when the population is in state $x=(a,b,\ldots) / N$ as

$\displaystyle{ p_i(x) = \frac{\sum_{k=1}^{n}{\varphi_k(x) M_{i k} }}{\sum_{k=1}^{n}{\varphi_k(x)}} }$

for some matrix of mutation probabilities $M.$ This is just the probability that we get a new individual of the $i$th type (by birth and/or mutation). A common choice for the mutation matrix is to use a single mutation probability $\mu$ and spread it out over all the types, such as letting

$M_{ij} = \mu / (n-1)$

and

$M_{ii} = 1 - \mu$

Now we are ready to define the expected next state for the population and see how it captures a notion of stability. For a given state population $x$ in a multitype population, using $x$ to indicate the normalized population state $(a,b,\ldots) / N,$ consider all the neighboring states $y$ that the population could move to in one step of the process (one birth-death cycle). These neighboring states are the result of increasing a population type by one (birth) and decreasing another by one (death, possibly the same type), of course excluding cases on the boundary where the number of individuals of any type drops below zero or rises above $N.$ Now we can define the expected next state as the sum of neighboring states weighted by the transition probabilities

$E(x) = \sum_{y}{y T_x^{y}}$

with transition probabilities given by

$T_{x}^{y} = p_{i}(x) x_{j}$

for states $y$ that differ in $1/N$ at the $i$th coordinate and $-1/N$ at $j$th coordinate from $x.$ Here $x_j$ is just the probability of the random death of an individual of the $j$th type, so the transition probabilities are still just birth (with mutation) and death as for the Moran process we started with.

Skipping some straightforward algebraic manipulations, we can show that

$\displaystyle{ E(x) = \sum_{y}{y T_x^{y}} = \frac{N-1}{N}x + \frac{1}{N}p(x)}$

Then it’s easy to see that $E(x) = x$ if and only if $x = p(x),$ and that $x = p(x)$ if and only if $x_i = \varphi_i(x).$ So we have a nice description of ‘stability’ in terms of fixed points of the expected next state function and the incentive function

$x = E(x) = p(x) = \varphi(x),$

and we’ve gotten back to “no one has an incentive to deviate”. More precisely, for the Moran process

$\varphi_i(x) = x_i f_i(x)$

and we get back $f_i(x) = f_j(x)$ for every type. So we take $x = \varphi(x)$ as our analogous condition to an evolutionarily stable state, though it’s just the ‘no motion’ part and not also the ‘stable’ part. That’s what we need the stationary distribution for!

To turn this into a useful number that measures stability, we use the relative entropy of the expected next state and the current state, in analogy with the Lyapunov theorem for the replicator equation. The relative entropy

$\displaystyle{ D(x, y) = \sum_i x_i \ln(x_i) - y_i \ln(x_i) }$

has the really nice property that $D(x,y) = 0$ if and only if $x = y,$ so we can use the relative entropy $D(E(x), x)$ as a measure of how close to stable any particular state is! Here the expected next state takes the place of the ‘evolutionarily stable state’ in the result described last time for the replicator equation.

Finally, we need to show that the maxima (and minima) of of the stationary distribution are these fixed points by showing that these states minimize the expected relative entropy.

Seeing that local maxima and minima of the stationary distribution minimize the expected relative entropy is a more involved, so let’s just sketch the details. In general, these Markov processes are not reversible, so they don’t satisfy the detailed-balance condition, but the stationary probabilities do satisfy something called the global balance condition, which says that for the stationary distribution $s$ we have that

$s_x \sum_{x}{T_x^{y}} = \sum_{y}{s_y T_y^{x}}$

When the stationary distribution is at a local maximum (or minimum), we can show essentially that this implies (up to an $\epsilon,$ for a large enough population) that

$\displaystyle{\sum_{x}{T_x^{y}} = \sum_{y}{T_y^{x}} }$

a sort of probability inflow-outflow equation, which is very similar to the condition of complex balanced equilibrium described by Manoj Gopalkrishnan in this Azimuth post. With some algebraic manipulation, we can show that these states have $E(x)=x.$

Now let’s look again at the figures from the start. The first shows the vector field of the replicator equation:

You can see rest points at the center, on the center of each boundary edge, and on the corner points. The center point is evolutionarily stable, the center points of the boundary are semi-stable (but stable when the population is restricted to a boundary simplex), and the corner points are unstable.

This one shows the stationary distribution for a finite population model with a Fermi incentive on the same landscape, for a population of size 80:

A fixed population size gives a partitioning of the simplex, and each triangle of the partition is colored by the value of the stationary distribution. So you can see that there are local maxima in the center and on the centers of the triangle boundary edges. In this case, the size of the mutation probability determines how much of the stationary distribution is concentrated on the center of the simplex.

This shows one-half of the Euclidean distance squared between the current state and the expected next state:

And finally, this shows the same thing but with the relative entropy as the ‘distance function':

As you can see, the Euclidean distance is locally minimal at each of the local maxima and minima of the stationary distribution (including the corners); the relative entropy is only guaranteed so on the interior states (because the relative entropy doesn’t play nicely with the boundary, and unlike the replicator equation, the Markov process can jump on and off the boundary). It turns out that the relative Rényi entropies for $q$ between 0 and 1 also work just fine, but for the large population limit (the replicator dynamic), the relative entropy is the somehow the right choice for the replicator equation (has the derivative that easily gives Lyapunov stability), which is due to the connections between relative entropy and Fisher information in the information geometry of the simplex. The Euclidean distance is the $q=0$ case and the ordinary relative entropy is $q=1.$

As it turns out, something very similar holds for another popular finite population model, the Wright–Fisher process! This model is more complicated, so if you are interested in the details, check out our paper, which has many nice examples and figures. We also define a process that bridges the gap between the atomic nature of the Moran process and the generational nature of the Wright–Fisher process, and prove the general result for that model.

Finally, let’s see how the Moran process relates back to the replicator equation (see also the appendix in this paper), and how we recover the stability theory of the replicator equation. We can use the transition probabilities of the Moran process to define a stochastic differential equation (called a Langevin equation) with drift and diffusion terms that are essentially (for populations with two types:

$\mathrm{Drift}(x) = T^{+}(x) - T^{-}(x)$

$\displaystyle{ \mathrm{Diffusion}(x) = \sqrt{\frac{T^{+}(x) + T^{-}(x)}{N}} }$

As the population size gets larger, the diffusion term drops out, and the stochastic differential equation becomes essentially the replicator equation. For the stationary distribution, the variance (e.g. for the binomial example above) also has an inverse dependence on $N,$ so the distribution limits to a delta-function that is zero except for at the evolutionarily stable state!

What about the relative entropy? Loosely speaking, as the population size gets larger, the iteration of the expected next state also becomes deterministic. Then the evolutionarily stable states is a fixed point of the expected next state function, and the expected relative entropy is essentially the same as the ordinary relative entropy, at least in a neighborhood of the evolutionarily stable state. This is good enough to establish local stability.

Earlier I said both the local maxima and minima minimize the expected relative entropy. Dash and I haven’t proven that the local maxima always correspond to evolutionarily stable states (and the minima to unstable states). That’s because the generalization of evolutionarily stable state we use is really just a ‘no motion’ condition, and isn’t strong enough to imply stability in a neighborhood for the deterministic replicator equation. So for now we are calling the local maxima stationary stable states.

We’ve also tried a similar approach to populations evolving on networks, which is a popular topic in evolutionary graph theory, and the results are encouraging! But there are many more ‘states’ in such a process, since the configuration of the network has to be taken into account, and whether the population is clustered together or not. See the end of our paper for an interesting example of a population on a cycle.