Networks and Population Biology (Part 2)

Yesterday I was too sick to attend this conference:

Tutorials on discrete mathematics and probability in networks and population biology, Institute of Mathematical Sciences, National University of Singapore, 2-6 May 2011. Organized by Andrew Barbour, Malwina Luczak, Gesine Reinert and Rongfeng Sun.

But today I bounced back, and I want to tell you about the first lectures by Persi Diaconis and his wife Susan Holmes, both statisticians at Stanford University.

Since Persi Diaconis is one of those people who make it really obvious that being a mathematician is more fun than any other profession, I should say a bit about him. He left home at the age of 14 to become a professional stage magician, and only returned to school so that he could learn all the math needed to understand Feller’s famous book An Introduction to Probability Theory and Its Applications. Since then he has worked on the mathematics of shuffling cards and the physics of flipping coins. He also works on random matrix theory, the statistics of Riemann zeta zeroes, applications of Hopf algebras to probability theory, and all sorts of other things that don’t sound fun unless you know enough to realize that yes, they are.

In my last blog post on this conference Tim van Beek asked if Persi could really flip a coin and have it land with whatever side up he wanted. So, I asked him about that. He said yes, he could. He didn’t demonstrate it at the time, since we were just about to go to a talk. But he said you can see videos of other people doing it on the web.

Unfortunately, the videos I’ve found so far mainly convince me, not that there are people who have mastered the physical skill of controlling a coin flip, but that there are lots of tricksters out there! Here are two:

• Derrin Brown, The system – coin toss.

• EricSurf6, How to win a coin toss.

Can you figure out how they do it? If you don’t get the second one, you can watch this.

So, I have to wonder: can Persi really flip a coin and have it land the way he wants, or is just fooling people into thinking he can? Mind you, I wouldn’t consider the latter a bad thing, at least insofar as he’s still a practicing stage magician: a basic part of the craft is trickery of this sort. I should ask if he’s sworn the Magician’s Oath:

“As a magician I promise never to reveal the secret of any illusion to a non-magician, unless that one swears to uphold the Magician’s Oath in turn. I promise never to perform any illusion for any non-magician without first practicing the effect until I can perform it well enough to maintain the illusion of magic.”

Anyway, he is giving a series of talks on “Graph limits and exponential models”. There are lots of very large graphs in biology, so he plans to talk about ways to study large graphs by taking limits where the graphs become infinite. He began by describing a result so beautiful that I would like to state it here.

‘Graph’ means many things, but in his talk a graph is simple (no self-loops or multiple edges between vertices), undirected and labeled. In other words, something like this:

A graph homomorphism

\phi : G \to H

is a map sending vertices of G to vertices of H, such that whenever vertices i,j of G have an edge between them, the vertices \phi(i), \phi(j) have an edge between them in H.

Given this, we can define t(G,H) to be the fraction of the functions from vertices of G to vertices of H that are actually graph homomorphisms.

The famous graph theorist Lovasz said that a sequence of graphs G_n converges if t(G_n, H) converges for all graphs H.

Using this definition, we have:

Theorem (Aldous-Hoover, Lovasz et al). There is a compact metric space X such that each graph gives a point in X, and a sequence of graphs G_n converges iff

G_n \to x

for some point x \in X.

The space X is called the space of graph limits, and the important thing is that one can construct it rather explicitly! Persi explained how. For a written explanation, see:

• Miklos Racz, Dense graph limits, 31 October 2010.

After a break, Susan Holmes began her series of talks on “Phylogeny, trees and the human microbiome”. I can’t easily summarize all she said, but here are the slides for the first talk:

• Susan Holmes, Molecular evolution: phylogenetic tree building.

Her basic plan today was to teach us a little genomics, a little statistics, and a little bit about Markov chains, so that we could start to understand how people attempt to reconstruct phylogenetic trees from DNA data.

A phylogenetic tree is something like this:

or something less grand where, for example, you only consider species of frogs, or even HIV viruses in a single patient. These days there are programs to generate such trees given copies of the ‘same gene’ in the different organisms you’re considering. These programs have names like PhyML, FastML, RaxML and Mr. Bayes. In case you’re wondering, ‘ML’ stands for ‘maximum likelihood’ a standard technique in statistics, while Bayes was the originator of some very important ideas in statistical reasoning.

But even though there are programs to do these things, there are still lots of fascinating problems to solve in this area! And indeed, even understanding the programs is a bit of a challenge.

Since we were talking about the genetic code here recently, I’ll wrap up by mentioning one thing learned about this from Susan’s talk.

It’s common to describe the accumulation of changes in DNA using substitution models. These are continuous time Markov chains where each base pair has a given probability of mutating to another one. Often people assume this probability for each base pair is independent of what its neighbors do. The last assumption is known to be false, but that’s another story for another day! What I wanted to say is that there are two famous models. The simplest is the Jukes-Cantor model, where each of the four bases—A, T, C, and G—has an equal probability of mutating into any other. But a somewhat better model is the Kimura model, where the transitions

A ↔ T
C ↔ G

have a different rate of occuring than all the remaining possibilities. If you look at the pictures of the A, T, C, and G molecules here you’ll instantly see why:

Since I’m busy learning about continuous-time Markov chains, it’s nice to see more good examples!

15 Responses to Networks and Population Biology (Part 2)

  1. John F says:

    Not directly relevant, but lateral gene transfer
    is one means of introducing loops into phylogenesis.

    Speaking of lateral communication among bacteria, have you seen this interesting speculation about electromagnetic signaling among bacterial communities?

  2. Mark Meckes says:

    I’ve never seen Persi do either trick personally, but I know people who have seen him demonstrate a perfect shuffle, and heard that he said that was much harder to learn to do than a controlled coin flip.

    • DavidTweed says:

      Typo: it’s Derren (2 e’s) Brown. (Youtube won’t let me watch the video since I’m actually in England.)

      I assume I’m not saying anything that should be secret if I point out there are two possible controlled-coin-tossing tasks: controlling the coin toss if the “tosser” gets to set which side is up at the start of the toss and if they have to control the outcome given a random-but-known initial side being up; I’d imagine reliably controlling the second task requires a lot more physical skill.

    • David Corfield says:

      My comment from last time included a quote by Edwin Jaynes about controlling coin tosses. I can’t recall if Persi told me then about his own abilities in this regard.

    • Mike Shulman says:

      I can toss a coin so that it looks like it’s flipping, if you don’t look too closely and/or don’t know what to look for, but the side that’s up when it lands is still the same as the side that was up when it started. It’s not that hard, although it takes a bit of practice; I’d be happy to demonstrate to anyone any time it’s practical. But I don’t know any way to get it to land on the *other* side from the one that started out facing up, aside from surreptitiously turning the coin over before or after flipping it.

  3. Graham says:

    “But even though there are programs to do these things, there are still lots of fascinating problems to solve in this area! And indeed, even understanding the programs is a bit of a challenge.”

    Yes, right now I’m busy understanding one of these (BEAST = Bayesian Evolutionary Analysis Sampling Trees) so that I can make it even more complicated.

    I haven’t read this paper by Susan Holmes, but it is worth looking at pictures if you’re interested in phylogenetics.

    Click to access chapihp04.pdf

    • John Baez says:

      Thanks for the reference, Graham! This paper covers a lot of the ideas she explained in her two talks today (see “Part 3” of this blog series when it comes out).

      I’m really glad we had our earlier conversation about phylogenetic trees and the associahedron—it prepared me for some of the math she’s explaining.

  4. Graham says:

    “Has any university department ever opened its account with such a statistically significant event as that which launched the Warwick Statistics Department? On Tuesday 9th October 1972, in the first serious lecture given to a group of 45 second-year mathematicians, entitled Possibilities & Probabilities, the founding professor tossed a 2p coin high in the air. The coin descended to the vinyl floor of lecture theatre L5, spun as a perfect sphere, and, in full view, slowly came to rest on its edge! Stunned silence turned into massive applause. No further publicity was necessary – truly the Statistics Department had arrived in style!”


    Click to access histories-small.pdf

  5. bob says:

    The first time I heard about category theory was in a sentence which also contained population biology (actually population dynamics), in the context of the work of Robert Rosen. Does his thinking have a role here?

  6. As I mentioned last time, a phylogenetic tree is something like this […]

  7. […] starters, let me point you back to part 2 for Persi’s definitions of ‘graph’ and ‘graph homomorphism’. If we […]

  8. John Baez says:

    The slides for this talk are available now:

    • Susan Holmes, Molecular evolution: phylogenetic tree building.

  9. Perhaps your readers will also like to see Susan Holmes participation in last year’s ACM Data Mining Camp expert panel video. She talks about how to get started in a data mining career and some of her research:

You can use Markdown or HTML in your comments. You can also use LaTeX, like this: $latex E = m c^2 $. The word 'latex' comes right after the first dollar sign, with a space after it.

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.