## Networks and Population Biology (Part 3)

I think today I’ll focus on one aspect of the talks Susan Holmes gave today: the space of phylogenetic trees. Her talks were full of interesting applications to genetics, but I’m afraid my summary will drift off into a mathematical daydream inspired by what she said! Luckily you can see her actual talk uncontaminated by my reveries here:

• Susan Holmes, Treespace: distances between trees.

It’s based on this paper:

• Louis Billera, Susan Holmes and Karen Vogtmann, Geometry of the space of phylogenetic trees, Advances in Applied Mathematics 27 (2001), 733-767.

As I mentioned last time, a phylogenetic tree is something like this:

More mathematically, what we see here is a tree (a connected graph with no circuits), with a distinguished vertex called the root, and $n$ vertices of degree 1, called leaves, that are labeled with elements from some $n$-element set. We shall call such a thing a leaf-labelled rooted tree.

Now, the tree above is actually a binary tree, meaning that as we move up an edge, away from the root, it either branches into two new edges or ends in a leaf. (More precisely: each vertex that doesn’t have degree 1 has degree 3.) This makes sense in biology because while species often split into two as they evolve, it is less likely for a species to split into three all at once.

So, the phylogenetic trees we see in biology are usually leaf-labeled rooted binary trees. However, we often want to guess such a tree from some data. In this game, trees that aren’t binary become important too!

Why? Well, each edge of the tree can be labeled with a number saying how much evolution occurred along that edge: for example, how many DNA base pairs changed. But as this number goes to zero, we get a tree that’s not binary anymore. So, we think of non-binary trees as conceptually useful ‘intermediate cases’ between binary trees.

This idea immediately leads us to consider a topological space consisting of phylogenetic trees which are not necessarily binary. And at this point in the lecture I drifted off into a daydream about ‘operads’, which are a nice piece of mathematics that’s closely connected to this idea.

So, I will deviate slightly from Holmes and define a phylogenetic tree to be a leaf-labeled rooted tree where each edge is labeled by a number called its length. This length must be positive for every edge except the edge incident to the root; for that edge any nonnegative length is allowed.

Let’s write $\mathrm{Phyl}_n$ for the set of phylogenetic trees with $n$ leaves. This becomes a topological space in a fairly obvious way. For example, there’s a continuous path in $\mathrm{Phyl}_3$ that looks like this:

Moreover we have this fact:

Theorem. There is a topological operad called the phylogenetic operad, or $\mathrm{Phyl},$ whose space of n-ary operations is $\mathrm{Phyl}_n$ for $n \ge 1$ and the empty set for $n = 0$.

If you don’t know what an operad is, don’t be scared. This mainly just means that you can glue a bunch of phylogenetic trees to the top of another one and get a new phylogenetic tree! More precisely, suppose you have a phylogenetic tree with $n$ leaves, say $T$. And suppose you have $n$ more, say $T_1, \dots, T_n$. Then you can glue the roots of $T_1, \dots, T_n$ to the leaves of $T$ to get a new phylogenetic tree called $T \circ (T_1, \dots, T_n)$. Furthermore, this gluing operation obeys some rules which look incredibly intimidating when you write them out using symbols, but pathetically obvious when you draw them using pictures of trees. And these rules are the definition of an operad.

I would like to know if mathematicians have studied the operad $\mathrm{Phyl}.$ It’s closely related to Stasheff’s associahedron operad, but importantly different. Operads have ‘algebras’, and the algebras of the associahedron operad are topological spaces with a product that’s ‘associative up to coherent homotopy’. I believe algebras of the phylogenetic operad are topological spaces with a commutative product that’s associative up to coherent homotopy. Has someone studied these?

In their paper Holmes and her coauthors discuss the associahedron in relation to their own work, but they don’t mention operads. I’ve found another paper that mentions ‘the space of phylogenetic trees’:

• David Speyer and Bernd Sturmfels, The tropical Grassmannian, Adv. Geom. 4 (2004), 389–411.

but they don’t seem to study the operad aspect either.

Perhaps one reason is that Holmes and her coauthors deliberately decide to ignore the labellings on the edges incident to the leaves. So, they get a space of phylogenetic trees with $n$ leaves whose product with $(0,\infty)^n$ is the space I’m calling $\mathrm{Phyl}_n$. As they mention, this simplifies the geometry a bit. However, it’s not so nice if you want an operad that accurately describes how you can build a big phylogenetic tree from smaller ones.

They don’t care about operads; they do some wonderful things with the geometry of their space of phylogenetic trees. They construct a natural metric on it, and show it’s a CAT(0) space in the sense of Gromov. This means that the triangles in this space are more skinny than those in Euclidean space—more like triangles in hyperbolic space:

They study geodesics in this space—even though it’s not a manifold, but something more singular. And so on!

There’s a lot of great geometry here. But for Holmes, all this is just preparation for doing some genomics— for example, designing statistical tests to measure how reliable the phylogenetic trees guessed from data actually are. And for that aspect, try this:

• Susan Holmes, Statistical approach to tests involving phylogenies, in O. Gascuel, editor, Mathematics of Evolution and Phylogeny, Oxford U. Press, Oxford, 2007.

### 20 Responses to Networks and Population Biology (Part 3)

1. Doesn’t

I will deviate slightly from Holmes and define a phylogenetic tree to be a leaf-labeled rooted binary tree where each edge is labeled by a positive real number called its length. Let’s write $\mathrm{Phyl}_n$ for the set of phylogenetic trees with $n$ leaves.

clash with

So, they get a space of phylogenetic trees with $n$ leaves whose product with $\mathbb{R}^n$ is the space I’m calling $\mathrm{Phyl}_n$?

For one, isn’t it $(\mathbb{R}^+)^n$. And second, there aren’t $n$ internal lines in a tree with $n$ leaves. In fact, given that you go beyond binary trees, there’s no fixed number of internal edges for trees in $\mathrm{Phyl}_n$. In your drawn path of trees, the number changes from 7 to 6 to 7.

• John Baez says:

David wrote:

For one, isn’t it $(\mathbb{R}^+)^n$?

Whoops – I slipped into doing topology here, without warning you. The space of positive numbers is homeomorphic to $\mathbb{R}$, so $(\mathbb{R}^+)^n$ is homeomorphic to $\mathbb{R}^n$, so I was treating them as ‘the same’. I’ll fix that.

I think Holmes et al say the same thing. They’re interested in the metric space structure of the space of phylogenetic trees, not just its topology. I don’t know what metric they’d get on this factor of $(\mathbb{R}^+)^n$ if they included it, but whatever it is, it must be some fairly dull space compared to the space they’re actually studying. That’s why they ignore it.

And second, there aren’t $n$ internal lines in a tree with $n$ leaves.

Right, but there are $n$ edges that end in leaves. For their mathematical work Holmes et al don’t bother labelling those edges with numbers, because they’re irrelevant to the interesting issue of how one tree topology can turn into another as the lengths of some internal edges go to zero:

But to me, and for many biological applications, it really matters how much the DNA has changed after all the species are done branching. That’s why space they consider, times $(\mathbb{R}^+)^n$, gives the space $\mathrm{Phyl}_n$ that I’m talking about.

By the way: to get an identity operation in the phylogenetic operad we need a degenerate tree that’s just a single point, whose root also counts as its only leaf. I’m not sure the best way to finesse this, but I don’t think it’s a serious problem.

Hmm, I guess it means we should allow the edge touching the root to have a length that’s any number in $[0,\infty)$, while the lengths of all the other edges must lie in $(0,\infty) = \mathbb{R}^+$.

(I wrote the last equation to check that we’re agreeing on what $\mathbb{R}^+$ means. Not everyone does.)

• David Corfield says:

So, allowing lengths only to edges ending in leaves, when you operadically attach $n$ trees to the $n$ leaves of another tree, do you just drop the length information of the latter, as it concerns only internal edges now? Might there be an issue if the degenerate tree is added at a leaf? Would you then discard the length of the edge above?

It seems rather odd to me. If you care about the lengths of some edges, why don’t you care about all lengths? You’ll be counting as ‘the same’ some very different evolutionary histories.

• John Baez says:

David wrote:

So, allowing lengths only to edges ending in leaves…

No: I said Holmes and company don’t attach lengths to edges in leaves. I attach lengths to all the edges. So, my space of trees should not seem at all ‘odd’ to you. Theirs should seem a bit odd, because:

1) it amounts to ignoring how much evolution a species does after it’s completely done branching,

and

2) it makes it awkward to get an operad — since if we attach a new tree to the top of an old one, a species that was ‘completely done branching’ in the old tree may now branch some more, so we now care about its length.

However, they are doing that rather odd thing because:

1) they only care about the lengths of internal edges for certain specific purposes,

and

2) they don’t care getting an operad.

• David Corfield says:

I’m with you at last! So you’re on the side of operadic truth, but what difference does that make to their ability to

… do some wonderful things with the geometry of their space of phylogenetic trees. They construct a natural metric on it, and show it’s a CAT(0) space in the sense of Gromov.

I see they claim

One could also consider trees with positive lengths on all edges, including those leading to leaves. However, the effect of this on tree space is simply to take the product with an $n$-dimensional Euclidean space. Since this does not significantly affect the geometry of the space, we will ignore this, knowing that it is possible to add this information at any later point that we wish.

I love the way you head off in search of green maths, and yet keep knocking up against 2-monads, operads, and the like. A case of reaching for familiar tools, or perhaps good green maths is a subset of the highly interconnected realm of good maths, so it’s likely you’ll never stray too far from the old favourites.

• John Baez says:

David wrote:

So you’re on the side of operadic truth…

Yes, as you know I usually try to be on the side of the truth—and here the operadic truth and the biological truth coincide, which is a good sign.

Thanks for quoting the paper by Holmes et al. I’d forgotten that the factor they deliberately neglect is not just topologically $\mathbb{R}^n$, but geometrically Euclidean $\mathbb{R}^n$. That makes their neglect entirely justifiable, since they can undo it at a moment’s notice. They’re just trying to focus on the geometrically exciting aspects of treespace.

I love the way you head off in search of green maths, and yet keep knocking up against 2-monads, operads, and the like. A case of reaching for familiar tools, or perhaps good green maths is a subset of the highly interconnected realm of good maths, so it’s likely you’ll never stray too far from the old favourites.

I’m of mixed minds about it—I’m not sure I ‘love’ it. Sometimes I feel I’m stuck in a rut so deep that even my most energetic attempts to do something practical keep sinking back into the comfortable routine of useless but elegant abstraction.

However, my more considered opinion is that lots of structures developed for use in pure mathematics and hard-core ‘fundamental’ theoretical physics are in fact sufficiently general as to turn up all over the place… and when I see them, I should point them out, because it can’t hurt much, and it might even help.

I want pure mathematicians to get interested in practical things like biology in the same sort of way they’re already interested in fairly useless things like elementary particle physics. Knowing there’s an operad or two lurking in biology may help get them interested.

2. The operad has been studied and it’s actually quite special.

It can be realised as something called the W-construction, and due to Boardman-Vogt, applied to the commutative operad. Note that in the construction above you could have labelled the vertices of the trees with elements of any operad O, and when you contract an edge you compose the two operad elements in the natural way. This is called the W-construction, WO.

To address the issue of the metric labelling on the leaves; well actually it’s fine from the operad view to leave them unlabelled, because you can still compose: you just have to label the new edge by 1!

The W-construction of an operad O is useful because it’s a cofibrant replacement of O in the category of non-symmetric operads. In fact for this you need the unlabelled version.

I learnt about this from a paper of Berger and Moerdijk:
http://math.unice.fr/~cberger/BV.pdf

• John Baez says:

I know about the W-construction, and I don’t think the phylogenetic operad arises as a special case of that construction, though it’s tantalizingly close. I could be wrong.

First of all, the phylogenetic operad is crucially a symmetric operad: all the $n$-ary operations are commutative (invariant under the action of $S_n$), and this fact can only be expressed if we work with symmetric operads.

If I dropped the commutative law, I think we’d be getting a fairly familiar cofibrant replacement for the operad for monoids. In other words: a slight variant of the associahedron operad.

In the associahedron operad the spaces of operations are contractible (indeed convex polytopes). But I think imposing the commutative law makes the spaces of operations — the spaces I’m calling $\mathrm{Phyl}_n$ — topologically nontrivial. The paper by Billera, Holmes and Vogtman includes some pictures of these spaces: they’re quite fascinating!

There’s a version of the $W$ construction for symmetric operads but I think it produces symmetric operads with contractible spaces of operations.

So, I think the phylogenetic operad is not one of the operads usually studied by homotopy theorists, though it’s very close.

But if I’m wrong, please let me know! This is important!

• David Corfield says:

Might you make contact then with vectorial species, p. 17 of this?

• The W-construction does take symmetric operads to symmetric operads, perhaps a point of confusion is that it doesn’t give a cofibrant replacement in the category of symmetric operads, however if one forgets the symmetric structure to get a non-symmetric operad then one does have a cofibrant replacement in that category. The point is that the composition structure maps are now cofibrations, however the symmetric actions do not act properly.

As far as I can tell the only difference between the phylogenetic operad and W(Com) is that the edge labellings are from the positive reals $\mathbb{R}^+$ and the interval $I=[0,1]$ respectively. Oh, and the labellings of the leaf edges…

I prefer to not label the leaf edges from an operad point of view. For starters one is not allowed to contract these external’ edges as that would alter the number of leaves and hence the arity. And what are edges for if not to be contracted?

However from a genetics point of view altering the number of leaves is a plus, for if one were to allow the leaf edges to be contracted then one would have to keep track of points inside the tree. You could view these points as degenerate leaves. So we’ve generalised the situation and are now allowed to have genetic profiles which have existed in the past and not just in the present (past genetic profiles exist as points on the tree as opposed to leaves of the tree).

• I forgot to address the issue of contractability. You are right that W(Com) is contractible, however I think that the same applies to the phylogenetic operad. Indeed, just contract each edge in a constant and hence coherent fashion. One ends up with a tree with a single internal vertex. Or have I misunderstood something?

Looking at the images from the paper you cited they all look contractible. The only exception being the images of the links, however they are not important to the topology as such, but to the geometry; the fact that the links are CAT(1) means that the space of phylogenetic trees is CAT(0). And every CAT(0) space is contractible.

To see that the space of trees is contractible as an $S_n$-space is a little trickier to arrive at from the geometry, but the contraction I gave at the start does the trick; it’s compatible with the $S_n$ action. No wait it can be seen from the geometry, as that contraction is along the geodesics with ends in the tree with one internal vertex.

• John Baez says:

Thanks very much, James! I now think I was confused and you are right about everything — except possibly whether we ‘should’ label the leaf edges, which is a moral/practical question rather than a question about mathematical facts.

This is really great! I’m going to have to work up a talk on ‘grafting cherry trees to phylogenetic trees’ or something like that. I remember now that when I first read Boardman and Vogt’s book I was really confused about whether W(Com) was an E operad; when I realized it was not I decided it was worthless and never looked back. But now I know what it’s good for!

To settle the moral/practical question about whether leaf edges should come with lengths we have two guiding criteria, which I was calling the ‘truth of biology’ and the ‘truth of operads’ (or more generally the criterion of mathematical elegance). Maybe they’ll pull in different directions, but I think ultimately there’s a single beautiful story here. It may involve a number of different operads and other things.

• Tom Leinster says:

I’m too fuzzy right now to think straight, but if you’re interested in slight variants of the W construction, you might be interested in this old note. Maybe you’ve already answered the question to your satisfaction, though.

• John Baez says:

Thanks, Tom. That paper of yours really helps. I now believe the phylogenetic operad is the coproduct (in the category of symmetric operads) of the operad for commutative algebras (which we’ve been calling Com here) and the additive monoid [0,∞) (regarded as an operad with only unary operations). This description seems quite useful to me: as usual, I’m scheming and dreaming.

• I believe that you’re right John, the phylogenetic operad is that coproduct. Perhaps an interesting question to ask is: for which operads $\mathcal{O}$ do the coproducts $\mathcal{O} + I$ have a natural CAT(0) metric?

I agree with Tom that the W-construction is the suboperad of the coproduct which has root and twig’ (I assume this means an edge adjacent to a leaf) length 1, though being picky I should point out that we need to be in the category of topological spaces rather than sets for it to be a true W-construction.

3. Graham says:

JB wrote in blog post: “they do some wonderful things with the geometry of their space of phylogenetic trees. They construct a natural metric on it,…”

That rang a bell, and eventually I found the article on my own computer:

“Geometry of the Space of Phylogenetic Trees”,
Louis J. Billera, Susan P. Holmes, Karen Vogtmann

http://www.math.cornell.edu/~vogtmann/papers/Trees/lap.pdf

• John Baez says:

I gave a link to this article at the very beginning of my blog entry here. It’s a very fun read for the geometrically inclined.

Holmes has also made her slides available, so I’ve added a link:

• Susan Holmes, Treespace: distances between trees.

• Graham says:

Sorry, I must have been asleep at the beginning of your blog.

• John Baez says:

But you woke up by the end – better than the other way around.

4. For more on operads and phylogenetic trees, try this.