Operads and the Tree of Life

This week Lisa and I are visiting her 90-year-old mother in Montréal. Friday I’m giving a talk at the Université du Québec à Montréal. The main person I know there is André Joyal, an expert on category theory and algebraic topology. So, I decided to give a talk explaining how some ideas from these supposedly ‘pure’ branches of math show up in biology.

My talk is called ‘Operads and the Tree of Life’.

Trees

In biology, trees are very important:

So are trees of a more abstract sort: phylogenetic trees describe the history of evolution. The biggest phylogenetic tree is the ‘Tree of Life’. It includes all the organisms on our planet, alive now or anytime in the past. Here’s a rough sketch of this enormous tree:

Its structure is far from fully understood. So, biologists typically study smaller phylogenetic trees, like this tree of dog-like species made by Elaine Ostrander:

Abstracting still further, we can also think of a tree as a kind of purely mathematical structure, like this:

Trees are important in combinatorics, but also in algebraic topology. The reason is that in algebraic topology we get pushed into studying spaces equipped with enormous numbers of operations. We’d get hopelessly lost without a good way of drawing these operations. We can draw an operation f with n inputs and one output as a little tree like this:

We can also draw the various ways of composing these operations. Composing them is just like building a big tree out of little trees!

An operation with n inputs and one output is called an n-ary operation. In the late 1960s, various mathematicians including Boardmann and Vogt realized that spaces with tons of n-ary operations were crucial to algebraic topology. To handle all these operations, Peter May invented the concept of an operad. This formalizes the way operations can be drawn as trees. By now operads are a standard tool, not just in topology, but also in algebraic geometry, string theory and many other subjects.

But how do operads show up in biology?

When attending a talk by Susan Holmes on phylogenetic trees, I noticed that her work on phylogenetic trees was closely related to a certain operad. And when I discussed her work here, James Griffin pointed out that this operad can be built using a slight variant of a famous construction due to Boardman and Vogt: their so-called ‘W construction’!

I liked the idea that trees and operads in topology might be related to phylogenetic trees. And thinking further, I found that the relation was real, and far from a coincidence. In fact, phylogenetic trees can be seen as operations in a certain operad… and this operad is closely related to the way computational biologists model DNA evolution as a branching sort of random walk.

That’s what I’d like to explain now.

I’ll be a bit sketchy, because I’d rather get across the basic ideas than the technicalities. I could even be wrong about some fine points, and I’d be glad to talk about those in the comments. But the overall picture is solid.

Phylogenetic trees

First, let’s ponder the mathematical structure of a phylogenetic tree. First, it’s a tree: a connected graph with no circuits. Second, it’s a rooted tree, meaning it has one vertex which is designated the root. And third, the leaves are labelled.

I should explain the third part! For any rooted tree, the vertices with just one edge coming out of them are called leaves. If the root is drawn at the bottom of the tree, the leaves are usually drawn at the top. In biology, the leaves are labelled by names of species: these labels matter. In mathematics, we can label the leaves by numbers 1, 2, \dots, n, where n is the number of leaves.

Summarizing all this, we can say a phylogenetic tree should at least be a leaf-labelled rooted tree.

That’s not all there is to it. But first, a comment. When you see a phylogenetic tree drawn by a biologist, it’ll pretty much always a binary tree, meaning that as we move up any edge, away from the root, it either branches into two new edges or ends in a leaf. The reason is that while species often split into two as they evolve, it is less likely for a species to split into three or more new species all at once.

So, the phylogenetic trees we see in biology are usually leaf-labeled rooted binary trees. However, we often want to guess such a tree from some data. In this game, trees that aren’t binary become important too!

Why? Well, here another fact comes into play. In a phylogenetic tree, typically each edge can be labeled with a number saying how much evolution occurred along that edge. But as this number goes to zero, we get a tree that’s not binary anymore. So, we think of non-binary trees as conceptually useful ‘borderline cases’ between binary trees.

So, it’s good to think about phylogenetic trees that aren’t necessarily binary… and have edges labelled by numbers. Let’s make this into a formal definition:

Definition A phylogenetic tree is a leaf-labeled rooted tree where each edge not touching a leaf is labeled by a positive real number called its length.

By the way, I’m not claiming that biologists actually use this definition. I’ll write \mathrm{Phyl}_n for the set of phylogenetic trees with n leaves. This becomes a topological space in a fairly obvious way, where we can trace out a continuous path by continuously varying the edge lengths of a tree. But when some edge lengths approach zero, our graph converges to one where the vertices at ends of these edges ‘fuse into one’, leaving us with a graph with fewer vertices.

Here’s an example for you to check your understanding of what I just said. With the topology I’m talking about, there’s a continuous path in \mathrm{Phyl}_4 that looks like this:

These trees are upside-down, but don’t worry about that. You can imagine this path as a process where biologists slowly change their minds about a phylogenetic tree as new data dribbles in. As they change their minds, the tree changes shape in a continuous way.

For more on the space of phylogenetic trees, see:

• Louis Billera, Susan Holmes and Karen Vogtmann, Geometry of the space of phylogenetic trees, Advances in Applied Mathematics 27 (2001), 733-767.

Operads

How are phylogenetic trees related to operads? I have three things to say about this. First, they are the operations of an operad:

Theorem 1. There is an operad called the phylogenetic operad, or \mathrm{Phyl}, whose space of n-ary operations is \mathrm{Phyl}_n.

If you don’t know what an operad is, I’d better tell you now. They come in different flavors, and technically I’ll be using ‘symmetric topological operads’. But instead of giving the full definition, which you can find on the nLab, I think it’s better if I sketch some of the key points.

For starters, an operad O consists of a topological space O_n for each n = 0,1,2,3 \dots. The point in O_n are called the n-ary operations of O. You can visualize an n-ary operation f \in O_n as a black box with n input wires and one output wire:

Of course, this also looks like a tree.

We can permute the inputs of an n-ary operation and get a new n-ary operation, so we have an action of the permutation group S_n on O_n. You visualize this as permuting input wires:

More importantly, we can compose operations! If we have an n-ary operation f, and n more operations, say g_1, \dots, g_n, we can compose f with all the rest and get an operation called

f \circ (g_1, \dots, g_n)

Here’s how you should imagine it:

Composition and permutation must obey some laws, all of which are completely plausible if you draw them as pictures. For example, the associative law makes a composite of composites like this well-defined:

Now, these pictures look a lot like trees. So it shouldn’t come as a shock that phylogenetic trees are the operations of some operad \mathrm{Phyl}. But let’s sketch why it’s true.

First, we can permute the ‘inputs’—meaning the labels on the leaves—of any phylogenetic tree and get a new phylogenetic tree. This is obvious.

Second, and more importantly, we can ‘compose’ phylogenetic trees. How do we do this? Simple: we glue the roots of a bunch of phylogenetic trees to the leaves of another and get a new one!

More precisely, suppose we have a phylogenetic tree with n leaves, say f. And suppose we have n more, say g_1, \dots, g_n. Then we can glue the roots of g_1, \dots, g_n to the leaves of g to get a new phylogenetic tree called

f \circ (g_1, \dots, g_n)

Third and finally, all the operad laws hold. Since these laws all look obvious when you draw them using pictures, this is really easy to show.

If you’ve been paying careful attention, you should be worrying about something now. In operad theory, we think of an operation f \in O_n as having n inputs and one output. For example, this guy has 3 inputs and one output:

But in biology, we think of a phylogenetic tree as having one input and n outputs. We start with one species (or other grouping of organisms) at the bottom of the tree, let it evolve and branch, and wind up with n of them!

In other words, operad theorists read a tree from top to bottom, while biologists read it from bottom to top.

Luckily, this isn’t a serious problem. Mathematicians often use a formal trick where they take an operation with n inputs and one output and think of it as having one input and n outputs. They use the prefix ‘co-‘ to indicate this formal trick.

So, we could say that phylogenetic trees stand for ‘co-operations’ rather than operations. Soon this trick will come in handy. But not just yet!

The W construction

Boardman and Vogt had an important construction for getting new operads for old, called the ‘W construction’. Roughly speaking, if you start with an operad O, this gives a new operad \mathrm{W}(O) whose operations are leaf-labelled rooted trees where:

1) all vertices except leaves are labelled by operations of O, and a vertex with n input edges must be labelled by an n-ary operation of O,

and

2) all edges except those touching the leaves are labelled by numbers in (0,1].

If you think about it, the operations of \mathrm{W}(O) are strikingly similar to phylogenetic trees, except that:

1) in phylogenetic trees the vertices don’t seem to be labelled by operations of a operad,

and

2) we use arbitrary nonnegative numbers to label edges, instead of numbers in (0,1].

The second point is a real difference, but it doesn’t matter much: if Boardman and Vogt had used nonnegative numbers instead of numbers in (0,1] to label edges in the W construction, it would have worked just as well. Technically, they’d get a ‘weakly equivalent’ operad.

The first point is not a real difference. You see, there’s an operad called \mathrm{Comm} which has exactly one operation of each arity. So, labelling vertices by operations of \mathrm{Comm} is a completely trivial process.

As a result, we conclude:

Theorem 2. The phylogenetic operad is weakly equivalent to \mathrm{W}(\mathrm{Comm}).

If you’re not an expert on operads (such a person is called an ‘operadchik’), you may be wondering what \mathrm{Comm} stands for. The point is that operads have ‘algebras’, where the abstract operations of the operad are realized as actual operations on some topological space. And the algebras of \mathrm{Comm} are precisely commutative topological monoids: that is, topological spaces equipped with a commutative associative product!

Branching Markov processes and evolution

By now, if you haven’t fallen asleep, you should be brimming with questions, such as:

1) What does it mean that phylogenetic trees are the operations of some operad \mathrm{Phyl}? Why should we care?

2) What does it mean to apply the W construction to the operad \mathrm{Comm}? What’s the significance of doing this?

3) What does it mean that \mathrm{Phyl} is weakly equivalent to \mathrm{W}(\mathrm{Comm})? You can see the definition of weak equivalence here, but it’s pretty technical, so it needs some explanation.

The answers to questions 2) and 3) take us quickly into fairly deep waters of category theory and algebraic topology—deep, that is, if you’ve never tried to navigate them. However, these waters are well-trawled by numerous experts, and I have little to say about questions 2) and 3) that they don’t already know. So given how long this talk already is, I’ll instead try to answer question 1). This is where some ideas from biology come into play.

I’ll summarize my answer in a theorem, and then explain what the theorem means:

Theorem 3. Given any continuous-time Markov process on a finite set X, the vector space V whose basis is X naturally becomes a coalgebra of the phylogenetic operad.

Impressive, eh? But this theorem is really just saying that biologists are already secretly using the phylogenetic operad.

Biologists who try to infer phylogenetic trees from present-day genetic data often use simple models where the genotype of each species follows a ‘random walk’. Also, species branch in two at various times. These models are called Markov models.

The simplest Markov model for DNA evolution is the Jukes–Cantor model. Consider a genome of fixed length: that is, one or more pieces of DNA having a total of N base pairs. For example, this tiny genome has N = 4 base pairs, just enough to illustrate the 4 possible choices, which are called A, T, C and G:

Since there are 4 possible choices for each base pair, there are 4^N possible genotypes with N base pairs. In the human genome, N is about 3 \times 10^9. So, there are about

4^{3 \times 10^9} \approx 10^{1,800,000,000}

genotypes of this length. That’s a lot!

As time passes, the Jukes–Cantor model says that the human genome randomly walks through this enormous set of possibilities, with each base pair having the same rate of randomly flipping to any other base pair.

Biologists have studied many ways to make this model more realistic in many ways, but in a Markov model of DNA evolution we’ll typically have some finite set X of possible genotypes, together with some random walk on this set. But the term ‘random walk’ is a bit imprecise: what I really mean is a ‘continuous-time Markov process’. So let me define that.

Fix a finite set X. For each time t \in [0,\infty) and pair of points i, j in X, a continuous-time Markov process gives a number T_{ij}(t) \in [0,1] saying the probability that starting at the point i at time zero, the random walk will go to the point j at time t. We can think of these numbers as forming an X \times X square matrix T(t) at each time t. We demand that four properties hold:

1) T(t) depends continuously on t.

2) For all s, t we have T(s) T(t) = T(s + t).

3) T(0) is the identity matrix.

4) For all j and t we have:

\sum_{i \in X} T_{i j}(t) = 1.

All these properties make a lot of sense if you think a bit, though condition 2) says that the random walk does not change character with the passage of time, which would be false given external events like, say, ice ages. As far as math jargon goes, conditions 1)-3) say that T is a continuous one-parameter semigroup, while condition 4) together with the fact that T_{ij}(t) \in [0,1] says that at each time, T(t) is a stochastic matrix.

Let V be the vector space whose basis is X. To avoid getting confused, let’s write e_i for the basis vector corresponding to i \in X. Any probability distribution on X gives a vector in V. Why? Because it gives a probability \psi_i for each i \in X, and we can think of these as the components of a vector \psi \in V.

Similarly, for any time t \in [0,\infty), we can think of the matrix T(t) as a linear operator

T(t) : V \to V

So, if we start with some probability distribution \psi of genotypes, and let them evolve for a time t according to our continuous-time Markov process, by the end the probability distribution will be T(t) \psi.

But species do more than evolve this way: they also branch! A phylogenetic tree describes a way for species to evolve and branch.

So, you might hope that any phylogenetic tree f \in \mathrm{Phyl}_n gives a ‘co-operation’ that takes one probability distribution \psi \in V as input and returns n probability distributions as output.

That’s true. But these n probability distributions will be correlated, so it’s better to think of them as a single probability distribution on the set X^n. This can be seen as a vector in the vector space V^{\otimes n}, the tensor product of n copies of V.

So, any phylogenetic tree f \in \mathrm{Phyl}_n gives a linear operator from V to V^{\otimes n}. We’ll call it

T(f) : V \to V^{\otimes n}

because we’ll build it starting from the Markov process T.

Here’s a sketch of how we build it—I’ll give a more precise account in the next and final section. A phylogenetic tree is made of a bunch of vertices and edges. So, I just need to give you an operator for each vertex and each edge, and you can compose them and tensor them to get the operator T(f):

1) For each vertex with one edge coming in and n coming out:

we need an operator

V \to V^{\otimes n}

that describes what happens when one species branches into n species. This operator takes the probability distribution we put in and makes n identical and perfectly correlated copies. To define this operator, we use the fact that the vector space V has a basis e_i labelled by the genotypes i \in X. Here’s how the operator is defined:

e_i \mapsto e_i \otimes \cdots \otimes e_i \in V^{\otimes n}

2) For each edge of length t, we need an operator that describes a random walk of length t. This operator is provided by our continuous-time Markov process: it’s

T(f) : V \to V

And that’s it! By combining these two kinds of operators, one for ‘branching’ and one for ‘random walking’, we get a systematic way to take any phylogenetic tree f \in \mathrm{Phyl}_n and get an operator

T(f) : V \to V^{\otimes n}

In fact, these operators T(f) obey just the right axioms to make V into what’s called a ‘coalgebra’ of the phylogenetic operad. But to see this—that is, to prove Theorem 3—it helps to use a bit more operad technology.

The proof

I haven’t even defined coalgebras of operads yet. And I don’t think I’ll bother. Why not? Well, while the proof of Theorem 3 is fundamentally trivial, it’s sufficiently sophisticated that only operadchiks would enjoy it without a lengthy warmup. And you’re probably getting tired by now.

So, to most of you reading this: bye! It was nice seeing you! And I hope you sensed the real point of this talk:

Some of the beautiful structures used in algebraic topology are also lurking in biology. These structures may or may not be useful in biology… but we’ll never know if we don’t notice them and say what they are! So, it makes sense for mathematicians to spend some time looking for them.

Now, let me sketch a proof of Theorem 3. It follows from a more general theorem:

Theorem 4. Suppose V is an object in some symmetric monoidal topological category C. Suppose that V is equipped with an action of the additive monoid [0,\infty). Suppose also that V is a cocommutative coalgebra. Then V naturally becomes a coalgebra of the phylogenetic operad.

How does this imply Theorem 3? In Theorem 3, C is the category of finite-dimensional real vector space. The action of [0,\infty) on V is the continuous-time Markov process. And V becomes a cocommutative coalgebra because it’s a vector space with a distinguished basis, namely the finite set X. This makes V into a cocommutative coalgebra in the usual way, where the comultiplication:

\Delta: V \to V \otimes V

‘duplicates’ basis vectors:

\Delta : e_i \mapsto e_i \otimes e_i

while the counit:

\epsilon : V \to \mathbb{R}

‘deletes’ them:

\epsilon : e_i \to 1

These correspond to species splitting in two and species going extinct, respectively. (Biologists trying to infer phylogenetic trees often ignore extinction, but it’s mathematically and biologically natural to include it.) So, all the requirements are met to apply Theorem 4 and make V into coalgebra of the phylogenetic operad.

But how do we prove Theorem 4? It follows immediately from Theorem 5:

Theorem 5. The phylogenetic operad \mathrm{Phyl} is the coproduct of the operad \mathrm{Comm} and the additive monoid [0,\infty), viewed as an operad with only 1-ary operations.

Given how coproducts works, this means that an algebra of both \mathrm{Comm} and [0,\infty) is automatically an algebra of \mathrm{Phyl}. In other words, any commutative algebra with an action of [0,\infty) is an algebra of \mathrm{Phyl}. Dualizing, it follows that any cocommutative coalgebra with an action of [0,\infty) is an coalgebra of \mathrm{Phyl}. And that’s Theorem 4!

But why is Theorem 5 true? First of all, I should emphasize that the idea of using it was suggested by Tom Leinster in our last blog conversation on the phylogenetic operad. And in fact, Tom proved a result very similar to Theorem 5 here:

• Tom Leinster, Coproducts of operads, and the W-construction, 14 September 2000.

He gives an explicit description of the coproduct of an operad O and a monoid, viewed as an operad with only unary operations. He works with non-symmetric, non-topological operads, but his ideas also work for symmetric, topological ones. Applying his ideas to the coproduct of \mathrm{Comm} and [0,\infty), we see that we get the phylogenetic operad!

And so, phylogenetic trees turn out to be related to coproducts of operads. Who’d have thought it? But we really don’t have as many fundamentally different ideas as you might think: it’s hard to have new ideas. So if you see biologists and algebraic topologists both drawing pictures of trees, you should expect that they’re related.

42 Responses to Operads and the Tree of Life

  1. Wow John, It’s me Jake, from CQT, I’m in Waterloo Ontario right now — I thought for a second you got on a plane to track me to down to get back to work on those rabbit pictures. I’ll be back in Singapore in a month. Don’t forget about me.

    I wanted to mention to others something that we noticed when you were teaching me stuff at CQT some time ago. You explained to me what an operad was, even drew a slick table showing how these tie to other structures. I have that table still. The two things I found the most interesting (warning seemingly way off topic from the post) are:

    * In classical switching function theory, operads are called read-once formula. These are precisely the class of switching functions built from gate sets with no fan-in — that is, the gate set can be composed only to build a tree.

    * In quantum network theory, they correspond to MERA networks, that can be contracted efficiently. Each of the three leg tensors (in this case) is an isomentry. To measure anything in quantum mechanics you take an inner product, when you take an inner product, you contract a state (here represented as a tree) with a conjugated copy of itself. Each of the three legged tensors will vanish when contracted to its isometric pair. This means the network can be contracted, and hence values can be measured efficiently.

    * And from this, we can conclude that read-once formula are necessarily satisfiable. That means they correspond to constraint equations that can be satisfied.

    This is a bit off topic, but every time I hear about operads I think about these cases. cheers!

    • John Baez says:

      Hi, Jake!

      You may think I’ve been slacking off on our network theory project. It’s sort of true. But I claim that this operad stuff is just another aspect of that project. As usual, I’m taking ideas from quantum theory and adapting them to stochastic processes. Amplitudes become probabilities, unitary matrices become stochastic matrices, L^2 becomes L^1, and so on. Even the Fock space idea shows up, though I didn’t make that explicit here.

      In particular, the pictures of trees here can be seen as Feynman diagrams! But these are a bit different than the ones we’ve been talking about. The edges are ‘propagators’ where time evolution is described by the continuous-time Markov process T(t). The vertices are ‘interactions’, but only of a very limited sort: the vector space V = L^1(X) is a cocommutative coalgebra, so it comes with a ‘duplication’ operator

      \Delta : V \to V \otimes V

      and a ‘deletion’ operator

      \epsilon : V \to \mathbb{R}

      These correspond to ‘speciation’ (where a species splits in two) and ‘extinction’.

      I haven’t completely figured out how this new stuff is connected to stochastic Petri nets, but it’s clearly part of the same story.

  2. Dan says:

    This may seem a bit nit-picky when you’re obviously more concerned with the big picture, but I’m confused so I’ll go ahead and ask and hope you’ll indulge me in what are probably stupid questions. See, I had to deal with C0-semigroups pretty extensively in a past life, but I was never fortunate enough to have ones that were either Markov or acting on a finite dimensional space, so I’m struggling a bit with your condition 4 above, namely, \sum_{i \in X} T_{ij} (t)=1 for all j and t, where T_{ij}(t) is defined to be the probability of being at j if you started at i at t=0. If I try to put this into words, I get something like: “If you find yourself at j at time t, then you must have come from from some i at time 0, and this is true for all j.” Is that a fair summary?

    Personally, I tend to think in terms of initial conditions with C0-semigroups, so I would expect the condition \sum_{j \in X} T_{ij} (t)=1 for all i and t to hold, which I would summarize as “If you start at i, then you must end up at some j at time t, and this is true for all i and t.” Does this condition follow from the other in some way? Or is it meant to be a trivial consequence of your definition of T_{ij}(t) as “the probability that starting at the point i at time zero, the random walk will go to the point j at time t?”

    Thanks.
    Dan.

    • John Baez says:

      Dan wrote:

      I’m struggling a bit with your condition 4 above, namely, \sum_{i \in X} T_{ij} (t)=1 for all j and t, where T_{ij}(t) is defined to be the probability of being at j if you started at i at t=0.

      Whoops—you caught a typo. I switched around an i and a j here. Actually for me T_{ij}(t) is defined to be the probability of being at i if you started at j at t=0.

      Some other people may use the opposite convention—this might add to your confusion.

      I’ll fix this. Thanks!

      Does this condition follow from the other in some way?

      No, these conditions are independent.

      People like to talk about left stochastic matrices, which are matrices of nonnegative numbers where the columns sum to 1, and also right stochastic matrices, which are matrices of nonnegative numbers where the rows sum to 1.

      One of these describes random processes where starting at any point you have a total probability 1 of going to some point or other.

      The other describes random processes where ending at any point you have a total probability 1 of coming from some point or other.

      However, which is which depends on a convention, namely whether you multiply column vectors by matrices on the left, or row vectors by matrices on the right.

      In this talk I’m interested in processes where starting at any point you have a total probability 1 of going to some point or other. And, I’m multiplying column vectors by matrices on the left.

      So, for all I care, there could be genotypes that our random walk will never get to—that’s okay. But every genotype has to go somewhere as it randomly walks around.

      A matrix that’s both left and right stochastic is called ‘doubly stochastic’.

      This is a good intro:

      Stochastic matrix, Wikipedia.

  3. Hi John,

    Thanks for the enlightening discussion about operads and their applications to phylogenetic trees. I see several theorems about the correspondence between operads and phylogenetic trees, but I would really be interested in how the correspondence is used. I.e., is there a theorem that is stated only in terms of general operads, which we can pull through the correspondence, to get a theorem stated only in terms of phylogenetic trees and Markov processes?

    You may be interested in the book Algebraic Statistics for Computational Biology and in particular in this review: “The Mathematics of Phylogenomics”.

  4. John Baez says:

    Ruchira wrote:

    I would really be interested in how the correspondence is used.

    Me too! But I just made it up; I haven’t used it for anything.

    I.e., is there a theorem that is stated only in terms of general operads, which we can pull through the correspondence, to get a theorem stated only in terms of phylogenetic trees and Markov processes?

    I don’t know yet—I haven’t gotten that far. Basically what happened is that I listened to Susan Holmes talk about the geometry of the space of n-leaved phylogenetic trees, and realized that these spaces form an operad. It seemed awfully familiar from my studies of algebraic topology. Then James Griffin and Tom Leinster figured out exactly how an algebraic topologist, or operad theorist, would think about this operad. Operads have ‘algebras’, so I then started wondering what the algebras of this operad were. I soon realized that the coalgebras of this operad were the branching Markov processes beloved by computational biologists working on phylogenetic trees! And that’s where I am now.

    In short, so far I’ve noticed that computational biologists are using some quite interesting operads without knowing it. I don’t know yet if it will help them to know it! But I think these connections are always worth noting. If X is an example of a Y, I think it’s always worth pointing it out, unless it’s so obvious it doesn’t need pointing out. These connections have a tendency to pay off, sometimes in unexpected ways.

    It’s possible that operad theory could turn out to be useful in studying phylogenetic trees. That would be great. But it’s also possible that work on phylogenetic trees could inspire new work on operads which then gets applied to something else, like algebraic geometry or string theory! That would be cool too.

    For people who don’t know operads, this stuff I’m talking about is probably not ‘ready for prime time’. But for people who do, it’ll be a shockingly practical example of an operad.

    You may be interested in the book Algebraic Statistics for Computational Biology and in particular in this review: “The Mathematics of Phylogenomics”.

    Thanks! If I learn more about this subject and what problems people are struggling with, that’ll improve my chance of making useful connections.

  5. David Corfield says:

    Do biologists have a way to pass between trees in the enormous, but simple, space of genotypes to phylogenetic trees? Markov processes for the former may involve enormous transition matrices, but at least one could use a simple rate of genetic mutation at each site. But then interpreting that model in terms of phenotypes, speciation, etc. can’t be easy.

    • John Baez says:

      David wrote:

      Do biologists have a way to pass between trees in the enormous, but simple, space of genotypes to phylogenetic trees?

      I’m no expert on this, so take the following with a huge grain of salt, and hope that someone more knowledgeable chimes in:

      In most of what I’ve seen, they aren’t observing trees in the space of genotypes: they’re observing the genotypes at the leaves of those trees and trying to infer the rest of the tree. In other words, trying to guess the past given the present. There is a huge amount of work devoted to that. A bunch of it uses Markov process models combined with maximum likelihood or Bayesian methods to guess the best tree. They don’t usually use the whole genotype: just some parts of the DNA. There’s a lot of software for doing this kind of inference, like PhyML, FastML, RaxML and Mr. Bayes.

      And in most of the work I’ve seen, the tree that’s inferred is simply called the phylogenetic tree. One assumes the edges are taxonomically significant units (e.g. species) and the vertices are taxonomically significant branching events (e.g. speciation events).

      I don’t know if that answers your question.

      By the way, I said that mostly we observe DNA data now and try to infer the tree in the past, but one exception I know involves the HIV virus. This mutates fast enough, and has been studied carefully enough, that people have been able to watch it evolves as time passes. Viruses don’t have ‘species’ in the conventional sense of being able to interbreed, but there are medically significant branches forming in the phylogenetic tree as we watch.

      Another exception would involve situations where we have access to fossil DNA, like Neanderthal or woolly mammoth DNA.

      And by the way, the whole ‘tree’ concept is a simplification of reality. For example, humans and Neanderthals interbred, and among bacteria there’s a lot of ‘lateral transmission’ of genetic material via plasmids and the like.

      • Just a micro comment: did you notice the striking resemblance between this image (HIV phylogenetic tree, evolving as we observe) and Charles Darwin’s very first drawing of the “tree of life” in his notebook (in the Museum of Natural history in NYC)?:

        Did they do so on purpose?

        • John Baez says:

          Hey, Carlo—it’s fun to see you here!

          I don’t know they made that tree looks similar on purpose. At first I was going to say that all trees look sort of similar, but I think you’re right, there’s something more than that going on here, either coincidentally or not.

          By the way, nobody except me can post images here, but if someone includes a link to an interesting image I can edit their comment to make it show up… and that’s what I just did. Of course I’m too lazy to do this very often.

  6. Graham says:

    Here is a link to a recent conference “Phylogenetics: New Data, New Phylogenetic Challenges”

    http://www.newton.ac.uk/programmes/PLG/plgw05p.html

    The talks were recorded. A lot of them are mathematical, and some are quite abstract, but I haven’t spotted anyone using operads yet.

    • John Baez says:

      Thanks! No operads in this one, but it offers us operadchiks a lot of food for thought:

      • K. St. John, Exploring treespace.

      Abstract: Phylogenies, or evolutionary histories, play a central role in modern biology, illustrating the interrelationships between species, and also aiding the prediction of structural, physiological, and biochemical properties. The reconstruction of the underlying evolutionary history from a set of morphological characters or biomolecular sequences is difficult since the optimality criteria favored by biologists are NP-hard, and the space of possible answers is huge. The number of possible phylogenetic trees for n taxa is (2n − 5)!!. Due to the hardness and the large number of possible answers, clever searching techniques and heuristics are used to estimate the underlying tree. We explore the underlying space of trees, under different metrics, in particular the nearest-neighbor-interchange (NNI), subtree-prune-and-regraft (SPR), tree-bisection-and-reconnection (TBR), and Robinson-Foulds (RF) distances.

      The trees here have no labellings on edges, so the metrics are metric on discrete sets… but still interesting!

  7. nad says:

    John wrote:

    By the way, I said that mostly we observe DNA data now and try to infer the tree in the past, but one exception I know involves the HIV virus. This mutates fast enough, and has been studied carefully enough, that people have been able to watch it evolves as time passes.

    Interresting, do you have any documentations about that? I could imagine that something like an educational video could bring in handy all the mutations.

    • John Baez says:

      Hi, Nad. I don’t have any really nice graphics illustrating HIV evolution, but here’s a good general overview of HIV evolution:

      • Andrew Rambaut,David Posada, Keith A. Crandall and Edward C.Holmes, The causes and consequences of HIV evolution.

      My friend Chris Lee is an expert on this stuff:

      • Chris Lee, HIV positive selection mutation database.

      and this database allowed him to calculate in detail the effect of various drugs on HIV evolution.

    • Tom Leinster says:

      There’s a lot about the evolution of HIV in the opening chapter, or perhaps the introduction, of Steve Jones’s book Almost Like a Whale. (In the US it goes by the title of Darwin’s Ghost. It’s a retelling of the Origin of Species.) If you’re after a non-technical description, you might like that.

      Another virus that evolves very fast is influenza. That’s one of the things that makes vaccination difficult: the vaccines constantly have to be updated.

  8. Roger Witte says:

    Excellent post – it must have been an excellent talk too!

    I do take slight issue with your idea that Operad theorists have much to say (as yet) on issues ‘what does it mean …’

    The point being that there are two distinct interesting questions with the same English wording

    1) What are the mathematical definitions and consequences …. This is what you are thinking

    2) What does it tell us about biology? A question that we should start to address now :)

    • John Baez says:

      Roger wrote:

      Excellent post – it must have been an excellent talk too!

      Thanks! But the talk is tomorrow, so while it may be predestined for excellence, I can’t be sure of that yet.

      I do take slight issue with your idea that operad theorists have much to say (as yet) on issues ‘what does it mean …’

      The point being that there are two distinct interesting questions with the same English wording:

      1) What are the mathematical definitions and consequences … This is what you are thinking.

      Actually I was thinking about the mathematical meaning, which is not always easily apparent, even if one has seen the definitions and theorems. I distinguish between being able to state results and having them fully integrated into ones way of thinking. Only the latter lets one make significant progress.

      In particular, for these questions:

      2) What does it mean to apply the W construction to the operad \mathrm{Comm}? What’s the significance of doing this?

      3) What does it mean that \mathrm{Phyl} is weakly equivalent to \mathrm{W}(\mathrm{Comm})? You can see the definition of weak equivalence here, but it’s pretty technical, so it needs some explanation.

      it takes a nontrivial change of worldview to appreciate why the W construction and weak equivalence are important and inevitable notions. They’re both aspects of the ‘homotopification’ of mathematics, where instead of demanding that equations hold, one demands that equations hold ‘up to coherent homotopy’. Operad theorists have a lot to say about what this means and why it’s important.

      But yes, then there’s another layer: what does all this have to with biology? I doubt anyone knows.

  9. Hey John,

    I am thinking that your statement “Phyl is weakly equivalent to W(Comm)” is way weaker than what you would want to say. This is tantamount to saying “Phyl is weakly equivalent to the point.”

    While true, it seems to me that the whole point you would like to make is that Phyl is actually isomorphic to W(Comm) (degreewise homeomorphic).

    • John Baez says:

      Urs wrote:

      I am thinking that your statement “Phyl is weakly equivalent to W(Comm)” is way weaker than what you would want to say. This is tantamount to saying “Phyl is weakly equivalent to the point.”

      Hmm, I guess you’re right. Let me make sure I understand. Is every topological operad whose space of n-ary operations is contractible weakly equivalent to the terminal topological operad? The action of S_n on the space of n-ary operations is irrelevant here?

      While true, it seems to me that the whole point you would like to make is that Phyl is actually isomorphic to W(Comm) (degreewise homeomorphic).

      I’m afraid something slightly weaker is true. The lengths of edges in the trees that give operations of W(Comm) are positive real numbers less than or equal to 1, while for Phyl they are positive real numbers. Since (0,1] isn’t homeomorphic to (0,∞), I’m afraid Phyl isn’t degreewise homeomorphic to W(Comm). They’re “almost homeomorphic” in some way that’s hard for me to state clearly.

      • John Baez says:

        As spaces, \mathrm{W}(\mathrm{Comm})_n is some sort of compactification of \mathrm{Phyl}_n.

      • Is every topological operad whose space of n-ary operations is contractible weakly equivalent to the terminal topological operad?

        Yes. There is necessarily a unique morphism to Comm and this is by assumption then a degreewise a weak homotopy equivalence of topological spaces. By this theorem these are weak equivalences of topological operads.

        The action of S_n on the space of n-ary operations is irrelevant here?

        The action needs to be respected by the morphism of course, but for morphisms to Comm this is automatic. But given a morphism, its property of being a weak equivalence is just that it is degreewise a weak homotopy equivalence of topological spaces.

        Notice that every topological operad which is 1. degreewise contractible and 2. “Sigma cofibrant” (has free permutation group action) is an E_\infty operad.

        while for Phyl they are positive real numbers

        Oh, okay. Right, so it slightly fails to be isomorphic to W(Comm) then. But not in a very interesting way. As you say in the second message: you could just add \infty as an admissible tree length, if you wanted to “fix” this.

      • John Baez says:

        Urs wrote:

        As you say in the second message: you could just add \infty as an admissible edge length, if you wanted to “fix” this.

        When it comes to biology, adding \infty as an admissible edge length for phylogenetic trees amounts to requiring that our Markov process settles down to some limit as t \to \infty. More precisely: an algebra of the phylogenetic operad extends to an algebra of the larger operad that allows infinite edge lengths iff \lim_{t \to \infty} T(t) \Psi exists for all \Psi, where T is the corresponding Markov process.

        This isn’t true for all Markov processes, but it is for the ones that show up in Markov models of DNA evolution. So that’s okay. Then there’s another little nuance…

        Boardman and Vogt’s W construction requires choosing a way to make [0,1] into a topological monoid. I don’t have their book on me, so I don’t remember how they do it, but other sources seem to suggest two methods. One is:

        x \ast y = x + y - xy

        and the other is:

        x \circ y = x \, \mathrm{max} \, y

        The second one has the small advantage of being obviously associative, but the first is also associative (I just checked!), and it seems to have a bigger advantage, as far as I’m concerned.

        Namely, I believe (but haven’t bothered to check) that the \ast operation makes [0,1] isomorphic, as a topological monoid, to [0,\infty] with its usual addition (defined so that \infty + x = x + \infty = \infty).

        If so, we can embed [0,\infty) with its usual addition as a dense submonoid of [0,1] with the \ast operation.

        And if so, this should make the phylogenetic operad into a dense suboperad of \mathrm{W}(\mathrm{Comm}), where the inclusion is a homotopy equivalence.

        That’s a fairly crisp statement of how they’re ‘almost isomorphic’.

        And of course even if Boardman and Vogt’s way of making the closed interval into a topological monoid doesn’t make it isomorphic to [0,\infty] with its usual addition, such a way exists.

        • Todd Trimble says:

          John wrote:

          I believe (but haven’t bothered to check) that the \ast operation makes [0,1] isomorphic, as a topological monoid, to [0,\infty] with its usual addition

          We have

          \begin{aligned} x \ast y &=& 1 - (1-x)(1-y) \\ &=& \phi^{-1}(\phi(x)\phi(y)) \end{aligned}

          where \phi(x) = 1-x. So \phi is a homomorphism from \ast on [0, 1] to ordinary multiplication on [0, 1]. And [0, 1] under multiplication is isomorphic to [0, \infty] under addition via x \mapsto -\log(x). So, you were right.

        • John Baez says:

          Great, thanks!!!

  10. John Baez says:

    My talk seemed to go well. Afterwards, André Joyal made some interesting remarks. I’d like to write them down here so I don’t forget them.

    First, a nice simple observation. He said that the phylogenetic operad formalizes a notion of time, ‘branching time’, which marches forward like the usual notion of time (described by the real numbers) except for certain points where it splits.

    A species can split in two and evolve in two different ways. We wondered in what other situations the concept of ‘branching time’ could apply. An obvious candidate is the branching scenario in some versions of the ‘many-worlds interpretation’ of quantum mechanics, but I don’t know how to make this precise.

    More technically, we talked about how the phylogenetic operad is related to Boardman and Vogt’s W construction.

    My talk discussed the coproduct of the operads \mathrm{Comm} and [0,\infty). More generally, we can take the coproduct O + M of any operad O and any monoid M. The operations in O + M can be drawn as (roughly) trees with vertices labelled by operations in O and edges labelled by elements of M.

    If we take M = [0,\infty) this gives the edges nonnegative real ‘lengths’.

    But we can also take M = \mathbb{N}. Then the edges have integral lengths. We can draw this by marking each edge with a number of ‘ticks’: little marks representing the ticks of a clock.

    The operad O + \mathbb{N} is interesting because it’s the operad freely generated by the operad O and an extra unary operation. Let’s call this unary operation \mathbf{tick}. In his book on algebraic set theory with Moerdijk, Joyal considered the monad generated by a monad and extra unary operation. This idea was previously studied by Bénabou and Jibladze.

    Moving in this circle of ideas, we saw that the operad O + \mathbb{N} contains a copy of O but also two copies of the free operad on the underlying collection of operations of O.

    One way to get the latter is to take each operation f of O and postcomposing it with the operation \mathbf{tick}. The resulting operations

    \mathbf{tick} \circ f

    generate a copy of the free operad on the underlying collection of operations of O. The ticks serve as ‘parentheses’ separating the operations of O.

    Another way is to take each operation f of O and postcomposing it a bunch of copies of the operation \mathbf{tick}. The resulting operations

    f \circ (\mathbf{tick}, \dots, \mathbf{tick})

    generate a second copy of the free operad on the underlying collection of operations of O.

    The same trick works for O + [0,\infty), taking our ‘tick’ to be any positive number.

    The same trick also works for O + [0,\infty] here [0,\infty] becomes a monoid with the obvious concept of addition, such that

    x + \infty = \infty + x = \infty

    As we’ve seen, [0,\infty] is isomorphic to [0,1] with the product

    x \ast y = x + y - x y

    From what we’ve seen earlier in this conversation, it follows that the operad O + [0,\infty] is closely related to the Boardman–Vogt construction \mathrm{W}(O). I now understand this a bit better. Since

    \infty + \infty = \infty

    there’s yet another nice way that the operad O + [0,\infty] contains a copy of the free operad on the underlying set of operations of O. Namely, we take each operation f of O and both precompose and postcompose it with \infty:

    \mathbf{tick} \circ f \circ (\mathbf{tick}, \dots, \mathbf{tick})

    I believe Boardman and Vogt use this trick in their book. Of course they talk about [0,1] instead of the isomorphic monoid [0,\infty], but that doesn’t matter. What matters is that they’re sandwiching the operations in O with an idempotent unary operation that’s not in O. We need to do this to have a chance for \mathrm{W}(O) to be a cofibrant replacement of $O$.

    Tom Leinster’s discussion of the W construction here doesn’t mention this point. He describes O + [0,1] and writes:

    So the operad O + [0,1] is almost exactly the operad \mathrm{W}(O) defined by the Boardman–Vogt method. As far as I can see, the only point of difference is that in Boardman–Vogt, ‘by convention, the roots and twigs have length 1’) (Homotopy Invariant Algebraic Structures… , p. 73), where is in the coproduct they have length 0. (The element 1 of the monoid [0,1] plays no special role; the unit element is 0.)

    All this is true except that the element 1 in the monoid [0,1] (corresponding to the element \infty in [0, \infty]) does play a special role in the Boardman–Vogt construction, by virtue of being a nontrivial idempotent.

  11. Mike Shulman says:

    Very neat!

    I don’t have any suggestions for what the W-construction and weak equivalence have to do with biology, but it seems clear that the reason coproducts of operads appear is that you’re considering the process of evolution of each species, and the process of branching of species, to be totally unrelated. Right?

    Something seems a little weird, though: the behavior of evolution of a single species is controlled by the Markov process, but the behavor of branching of species is controlled by the operad. In other words, we choose a particular n-ary (co)operation in the phylogenetic operad which specifies all of the species branching that is to occur, with specified time periods in between branching events, and then the corresponding coaction morphism specifies a probability distribution over the genotypes of the resulting n species. Wouldn’t it be more realistic to also assign probabilities to the different kinds of branching that could occur?

    • John Baez says:

      Mike wrote:

      I don’t have any suggestions for what the W-construction and weak equivalence have to do with biology, but it seems clear that the reason coproducts of operads appear is that you’re considering the process of evolution of each species, and the process of branching of species, to be totally unrelated. Right?

      Right. And here “you” means not just little old me, but also biologists who are trying to infer phylogenetic trees from data about the DNA of various species we see today.

      They do things like this: fix a Markov process and then seek the phylogenetic tree (in the sense I’ve defined here) and the genotype for the species at the root of this tree that maximize the likelihood that the species at the leaves have the genotypes we see today.

      So, for example, Robert Wayne must have taken snippets of DNA from various kinds of dog-like animals and run some maximum likelihood or Bayesian algorithm to guess this tree:

      Wouldn’t it be more realistic to also assign probabilities to the different kinds of branching that could occur?

      If we were trying to simulate evolution including the branching of species, we might try that. That could be a lot of fun.

      For the applications described above, I don’t think it would be very practical. It might be practical if we had a good guess as to the probability per unit time that an organism with a given genotype would branch into two species. But since we don’t, we’d probably be forced into a very simple dumb guess, namely that the probability is some constant. And this, I believe, would not affect the results of the above sort of calculation.

      • Graham says:

        Actually biologists do assume a probability distribution on the branching patterns, and the choice can affect the results. You are right that little is known about speciation rates and extinction rates, so guesses have to be made. In a Bayesian context, the distribution can be seen as part of the prior. You hope that the molecular data will overwhelm the prior, but if the signal in the data is weak, it may not.

      • Mike Shulman says:

        I was thinking more along the lines of the Bayesian calculation inferring what sort of branching was likely to have occurred, as Graham suggests. But I think now I see that I asked the wrong question, since what people are really doing is drawing inferences about the branching. I should have said, “wouldn’t it be more realistic to also draw inferences about the Markov process (like in an HMM), rather than fixing a choice at the outset?”

        I’m guessing that some people are already trying that too. But I was having a bit of fun trying to think about how to “mix up” the two aspects operadically.

        • Graham says:

          Yes, people already do that too, and more. A lot of Bayesian methods can be characterised as “co-estimate everything”.

          The inference is like that for a HMM. The likelihood of the observed data at the leaves (the multiple sequence alignment, see http://en.wikipedia.org/wiki/Multiple_sequence_alignment) is calculated by working back recursively towards the root.

          I only have a hazy idea about operads, and here is a very hazy idea: Statistical inference often seems to work ‘backwards’, using co-operations where the model uses operations, or vice-versa.

        • Todd Trimble says:

          I was about to ask innocently, “what’s an HMM?” Then I googled it and figured it referred to this. But I guess it wouldn’t hurt to record this fact for the similarly innocent. :-)

      • John Baez says:

        Graham wrote:

        Actually biologists do assume a probability distribution on the branching patterns, and the choice can affect the results. You are right that little is known about speciation rates and extinction rates, so guesses have to be made.

        Thanks for the correction! I only know a little about phylogenomics, so it’s nice to know that when I say something wrong, there’s a chance you’ll appear and correct me.

        What branching patterns might be particularly likely and/or unlikely?

        Also, you mention ‘extinction rates’. One person told me that extinction events were completely ignored in the process of guessing a phylogenetic tree from present-day DNA data. In other words, that you can work only with trees whose branches all make it to the present, without any harm. I’ve been hoping to find some context in which this actually causes problems. Do you know about that?

        (Of course we need to think about extinction events when we also use DNA data from extinct species… but I’m not talking about that now.)

        • Graham says:

          “What branching patterns might be particularly likely and/or unlikely?”

          Trees have a topology and they can have branch lengths or node times. As far as the topology alone is concerned, the main observation is the phylogenetic trees are unbalanced. I have just added an image to

          http://www.azimuthproject.org/azimuth/show/Tree+of+life

          to show what I mean. This imbalance show up at every level down to trees with 4 leaves which is the smallest size that imbalance can occur. All the obvious mathematical models (eg a birth-death process assuming constant rates of birth and death) make more balanced trees than this.

          As far as node times are concerned it will depend on context. For example, for gene trees within a species, coalescent theory
          (http://en.wikipedia.org/wiki/Coalescent_theory) gives a useful model. Looking backward in time, the coalescences happen very quickly, then slow down. For a gene sampled from say 10 individuals, and assuming a constant population size, the expected time for the first coalescence is proportional to 1/(10*9), the additional time to the next coalescence is proportional to 1/(9*8), and so on down the the last pair to meet with expected time proportional to 1/(2*1). Looking forwards in time, this gives a tree that grows faster then exponential.

          “In other words, that you can work only with trees whose branches all make it to the present, without any harm. I’ve been hoping to find some context in which this actually causes problems. Do you know about that?”

          It can cause problems if you want to estimate dates. For example, there are 22 Crocodylla (crocodiles, alligators and gharials), and they separated from the rest of the tree of life a very long time ago (lets say 100My for the sake of argument). If you assumed there were no extinctions, you would estimate the time of the most recent common ancestor of extant Crocodylla (that is, the time of the first speciation in the phylogenetic tree for Crocodylla ) to be a very long time ago as well. Unlike the coalescence case, the expected times go like 1/21, 1/20, … 1/2, 1/1 going back in time, so – very roughly – you might estimate this time as 1/(1+1/2+…1/21) = 1/3.64 = .27 of 100My since the ancestor species separated from the rest of the tree of life, that is 73My ago. More realistically, there will have been many extinctions and quite likely there were once many more than 22 in this group. In this case, it could easily be that the most recent common ancestor of extant Crocodylla is very recent.

          It also causes problems if you know some dates from fossils and want to estimate speciation and extinction rates. This article might be a good place to start.

          “Estimating diversification rates from phylogenetic information”. By Ricklefs R E, Trends Ecol Evol. 2007.

          Click to access Ricklefs2007PhylogeneticInformation.pdf

          Finally, ‘phylogenetics’ means estimating trees from genes and ‘phylogenomics’ means estimating trees from whole genomes.

        • John Baez says:

          Thanks for all that, Graham! As you noticed, I don’t even know the difference between phylogenetics and phylogenomics. Well, I do now. But I have a lot of catching up to do.

          By the way, when I gave my talk at UQAM the fellow in charge of the combinatorics seminar, Franco Saliola, said he had been to a nice talk by the mathematical biologist Lior Pachter. Have you heard of him? His website says:

          I work on the fundamental problem of comparative genomics: the determination of the origins and evolutionary history of the nucleotides in all extant genomes. My work incorporates various aspects of genomics, including the reconstruction of ancestral genomes (paleogenomics), the modeling of genome dynamics (phylogenomics and systems biology) and the assignment of function to genome elements (functional genomics and epigenomics).

          In addition to working on algorithms and mathematical foundations for comparative genomics, I also work on genome projects and perform large scale computational analyses. I have been a member of the mouse, rat, chicken and fly genome sequencing consortia, and the ENCODE project.

          My research draws on tools from discrete mathematics, algebra and statistics. I am also interested in questions in these subjects that are motivated by biology problems.

  12. […] Operads and the Tree of Life « Azimuth. […]

  13. Mike Stay says:

    Thanks to budget cuts, the website with the canid phylogenetic tree is no longer available. The Internet Archive managed to grab a copy, however. You may want to update your post.

  14. A few years ago, after hearing Susan Holmes speak about the mathematics of phylogenetic trees, I became interested in their connection to algebraic topology. I wrote an article about this here:

    • John Baez, Operads and the tree of life, 6 July 2011.

    In trying to the make the ideas precise I recruited the help of Nina Otter, who was then a graduate student at ETH Zürich, and who is now a grad student at Oxford working on mathematical biology with Heather Harrington.

    It took us quite a while to work out all the details, and I could never have done it myself. But now we’re done! Here’s our paper:

    • John Baez and Nina Otter, Operads and phylogenetic trees.

You can use Markdown or HTML in your comments. You can also use LaTeX, like this: $latex E = m c^2 $. The word 'latex' comes right after the first dollar sign, with a space after it.

This site uses Akismet to reduce spam. Learn how your comment data is processed.