I think today I’ll focus on one aspect of the talks Susan Holmes gave today: the space of phylogenetic trees. Her talks were full of interesting applications to genetics, but I’m afraid my summary will drift off into a mathematical daydream inspired by what she said! Luckily you can see her actual talk uncontaminated by my reveries here:
• Susan Holmes, Treespace: distances between trees.
It’s based on this paper:
• Louis Billera, Susan Holmes and Karen Vogtmann, Geometry of the space of phylogenetic trees, Advances in Applied Mathematics 27 (2001), 733-767.
More mathematically, what we see here is a tree (a connected graph with no circuits), with a distinguished vertex called the root, and vertices of degree 1, called leaves, that are labeled with elements from some $n$-element set. We shall call such a thing a leaf-labeled rooted tree.
Now, the tree above is actually a binary tree, meaning that as we move up an edge, away from the root, it either branches into two new edges or ends in a leaf. (More precisely: each vertex that doesn’t have degree 1 has degree 3.) This makes sense in biology because while species often split into two as they evolve, it is less likely for a species to split into three all at once.
So, the phylogenetic trees we see in biology are usually leaf-labeled rooted binary trees. However, we often want to guess such a tree from some data. In this game, trees that aren’t binary become important too!
Why? Well, each edge of the tree can be labeled with a number saying how much evolution occurred along that edge: for example, how many DNA base pairs changed. But as this number goes to zero, we get a tree that’s not binary anymore. So, we think of non-binary trees as conceptually useful ‘intermediate cases’ between binary trees.
This idea immediately leads us to consider a topological space consisting of phylogenetic trees which are not necessarily binary. And at this point in the lecture I drifted off into a daydream about ‘operads’, which are a nice piece of mathematics that’s closely connected to this idea.
So, I will deviate slightly from Holmes and define a phylogenetic tree to be a leaf-labeled rooted tree where each edge is labeled by a number called its length. This length must be positive for every edge except the edge incident to the root; for that edge any nonnegative length is allowed.
Let’s write $\mathrm{Phyl}_n$ for the set of phylogenetic trees with $n$ leaves. This becomes a topological space in a fairly obvious way. For example, there’s a continuous path in $\mathrm{Phyl}_4$ that looks like this:
Moreover we have this fact:
Theorem. There is a topological operad called the phylogenetic operad, or $\mathrm{Phyl}$, whose space of $n$-ary operations is $\mathrm{Phyl}_n$ for $n \ge 1$ and the empty set for $n = 0$.
If you don’t know what an operad is, don’t be scared. This mainly just means that you can glue a bunch of phylogenetic trees to the top of another one and get a new phylogenetic tree! More precisely, suppose you have a phylogenetic tree with $n$ leaves, say $T$. And suppose you have $n$ more, say $T_1, \dots, T_n$. Then you can glue the roots of $T_1, \dots, T_n$ to the leaves of $T$ to get a new phylogenetic tree called $T \circ (T_1, \dots, T_n)$. Furthermore, this gluing operation obeys some rules which look incredibly intimidating when you write them out using symbols, but pathetically obvious when you draw them using pictures of trees. And these rules are the definition of an operad.
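Here is a minimal sketch of that gluing operation in Python, just to make it concrete. The `Tree` class, the `glue` function and the way leaves are labeled are all my own hypothetical choices, not anything from Holmes’ talk or the paper. I am also assuming that when a root gets glued to a leaf, the two edges that meet merge into a single edge whose length is the sum of the two lengths. That would explain why the edge incident to the root is allowed to have length zero: a one-leaf tree with a zero-length edge then acts as the identity.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Tree:
    """A phylogenetic tree, described from the root edge upward (hypothetical encoding).

    length   -- length of the edge entering this subtree from below
    children -- subtrees above that edge; empty if the edge ends in a leaf
    label    -- the leaf's label (only meaningful when children is empty)
    """
    length: float
    children: List["Tree"] = field(default_factory=list)
    label: object = None


def leaves(t: Tree) -> list:
    """List the leaf labels of t, left to right."""
    if not t.children:
        return [t.label]
    return [x for c in t.children for x in leaves(c)]


def glue(t: Tree, parts: dict) -> Tree:
    """Glue the root of parts[label] onto the leaf of t with that label.

    Assumption: the leaf edge of t and the root edge of the glued tree
    merge into one edge, and their lengths add.
    """
    if not t.children:  # t's edge ends in a leaf: attach the matching tree here
        p = parts[t.label]
        return Tree(t.length + p.length, list(p.children), p.label)
    return Tree(t.length, [glue(c, parts) for c in t.children])


# A tree with two leaves, labeled 1 and 2, and a zero-length root edge.
T = Tree(0.0, [Tree(1.0, label=1), Tree(2.0, label=2)])
# Two trees to glue on: one with leaves "a", "b", and a one-leaf tree "c".
T1 = Tree(0.5, [Tree(1.0, label="a"), Tree(1.0, label="b")])
T2 = Tree(0.0, label="c")

big = glue(T, {1: T1, 2: T2})
print(leaves(big))
```

The composite keeps whatever labels the glued-on trees carry, so this sketch quietly assumes those labels are distinct. A careful treatment of the operad would renumber the leaves instead.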
I would like to know if mathematicians have studied the operad $\mathrm{Phyl}$. It’s closely related to Stasheff’s associahedron operad, but importantly different. Operads have ‘algebras’, and the algebras of the associahedron operad are topological spaces with a product that’s ‘associative up to coherent homotopy’. I believe algebras of the phylogenetic operad are topological spaces with a commutative product that’s associative up to coherent homotopy. Has someone studied these?
In their paper Holmes and her coauthors discuss the associahedron in relation to their own work, but they don’t mention operads. I’ve found another paper that mentions ‘the space of phylogenetic trees’:
• David Speyer and Bernd Sturmfels, The tropical Grassmannian, Adv. Geom. 4 (2004), 389–411.
but they don’t seem to study the operad aspect either.
Perhaps one reason is that Holmes and her coauthors deliberately decide to ignore the labelings on the edges incident to the leaves. So, they get a space of phylogenetic trees with $n$ leaves whose product with $(0,\infty)^n$ is the space I’m calling $\mathrm{Phyl}_n$. As they mention, this simplifies the geometry a bit. However, it’s not so nice if you want an operad that accurately describes how you can build a big phylogenetic tree from smaller ones.
They don’t care about operads; instead, they do some wonderful things with the geometry of their space of phylogenetic trees. They construct a natural metric on it, and show it’s a CAT(0) space in the sense of Gromov. This means that the triangles in this space are skinnier than those in Euclidean space, more like triangles in hyperbolic space:
They study geodesics in this space—even though it’s not a manifold, but something more singular. And so on!
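To get a feel for ‘skinny triangles’, here is a small sketch for what I believe is the simplest interesting case: trees with three leaves, ignoring the lengths of the edges that touch the leaves and the root. Each binary tree then has a single internal edge, so the space is three rays glued together at one point, which stands for the non-binary tree. The functions `dist`, `midpoint` and `cn_gap` are names I made up. The check uses a standard way to state the CAT(0) condition: for any points $x, y, z$, with $m$ the midpoint of a geodesic from $y$ to $z$, we have $d(x,m)^2 \le \tfrac12 d(x,y)^2 + \tfrac12 d(x,z)^2 - \tfrac14 d(y,z)^2$, with equality in Euclidean space.

```python
import random

# A point is a pair (ray, t): which of the three rays it lies on (0, 1 or 2),
# and its distance t >= 0 from the central point where the rays meet.


def dist(p, q):
    """Geodesic distance: stay on the ray, or travel through the center."""
    (i, s), (j, t) = p, q
    return abs(s - t) if i == j else s + t


def midpoint(p, q):
    """Midpoint of the geodesic from p to q."""
    (i, s), (j, t) = p, q
    if i == j:
        return (i, (s + t) / 2)
    half = (s + t) / 2  # half the length of the path through the center
    return (i, s - half) if s >= half else (j, half - s)


def cn_gap(x, y, z):
    """Right side minus left side of the inequality above; CAT(0) needs this >= 0."""
    m = midpoint(y, z)
    return 0.5 * (dist(x, y) ** 2 + dist(x, z) ** 2) - 0.25 * dist(y, z) ** 2 - dist(x, m) ** 2


# Spot check on random triangles: the gap should never be noticeably negative.
random.seed(0)
for _ in range(1000):
    x, y, z = [(random.randrange(3), random.uniform(0, 10)) for _ in range(3)]
    assert cn_gap(x, y, z) >= -1e-9
```

When all three points sit on one ray the gap is zero, as on a straight line. When they sit on three different rays the gap is positive: the midpoint of one side is closer to the opposite corner than it would be in a Euclidean triangle with the same side lengths. This is only a numerical spot check on a toy case, not a proof of anything in the paper.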
There’s a lot of great geometry here. But for Holmes, all this is just preparation for doing some genomics: for example, designing statistical tests to measure how reliable the phylogenetic trees guessed from data actually are. And for that aspect, try this:
• Susan Holmes, Statistical approach to tests involving phylogenies, in O. Gascuel, editor, Mathematics of Evolution and Phylogeny, Oxford U. Press, Oxford, 2007.