Today was the last day of the tutorials on discrete mathematics and probability in networks and population biology. Persi Diaconis gave two talks, one on ‘exponential families’ of random graphs and one on ‘exchangeability’. Since there’s way too much to summarize, I’ll focus on explaining ideas from the first talk, leaving you to read about the second here:
• Persi Diaconis and Svante Janson, Graph limits and exchangeable random graphs.
Susan Holmes also gave two talks. The first was on metagenomics and the human microbiome—very cool stuff. Did you know that your body contains 100 trillion bacteria, and only 10 trillion human cells? And you’ve got 1000 species of bacteria in your gut? Statistical ecologists are getting very interested in this.
Her second talk was about doing statistics when you’ve got lots of data of different kinds that need to be integrated: numbers, graphs and trees, images, spatial information, and so on. This is clearly the wave of the future. You can see the slides for this talk here:
• Susan Holmes, Heterogeneous data challenge: combining complex data.
The basic idea of Persi Diaconis’ talk was simple and shocking. Suppose you choose a random graph in the most obvious way designed to heighten the chance that it contains a triangle, or some other figure. Then in fact all you’ve done is change the chance that there’s an edge between any given pair of vertices!
But to make this precise—to make it even true—we need to say what the rules are.
For starters, let me point you back to Part 2 for Persi’s definitions of ‘graph’ and ‘graph homomorphism’. If we fix a finite set $V$ with $n$ elements, there will be a big set of graphs with exactly these vertices. To define a kind of ‘random graph’, we first pick a probability measure on each of these sets, one for each $n$. Then, we demand that these probability measures converge in a certain sense as $n \to \infty$.
However, we can often describe random graphs in a more intuitive way! For example, the simplest random graphs are the Erdős–Rényi random graphs. These depend on a parameter $p \in [0,1]$. The idea here is that we take our set of vertices and for each pair we flip a coin that lands heads up with probability $p$. If it lands heads up, we stick in an edge between those vertices; otherwise not. So, the presence or absence of each edge is an independent random variable.
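In code, this construction is just a few lines. Here’s a minimal sketch in Python (the function name is my own):

```python
import random

def erdos_renyi(n, p, seed=None):
    """Sample an Erdős–Rényi random graph G(n, p): each of the
    n*(n-1)/2 possible edges is present independently with
    probability p.  Returns the edge set as pairs (i, j), i < j."""
    rng = random.Random(seed)
    return {(i, j)
            for i in range(n) for j in range(i + 1, n)
            if rng.random() < p}

g = erdos_renyi(100, 0.01, seed=1)
print(len(g))  # on average 1% of the 4950 possible edges, so about 50
```

Each coin flip is independent, which is exactly the feature the rest of this post is about: no edge knows anything about any other edge.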
Here’s a picture of an Erdős–Rényi random graph drawn by von Frisch, with a 1% chance of an edge between any two vertices. But it’s been drawn so that the best-connected vertices are near the middle, so it doesn’t look as random as it actually is:
People have studied the Erdős–Rényi random graphs very intensively, so now people are eager to study random graphs with more interesting correlations. For example, consider the graph where we draw an edge between any two people who are friends. If you’re my friend and I’m friends with someone else, that improves the chances that you’re friends with them! In other words, friends tend to form ‘triangles’. But in an Erdős–Rényi random graph there’s no effect like that.
‘Exponential families’ of random graphs seem like a way around this problem. The idea here is to pick a specific collection of graphs $G_1, \dots, G_k$ and say how commonly we want these to appear in our random graph. If we only use one graph $G_1$, and we take this to be two vertices connected by an edge, we’ll get an Erdős–Rényi random graph. But, if we also want our graph to contain a lot of triangles, we can pick $G_2$ to be a triangle.
More precisely, remember from Part 2 that $t(G_i, G)$ is the fraction of functions mapping vertices of $G_i$ to vertices of $G$ that are actually graph homomorphisms. This is the smart way to keep track of how often $G_i$ shows up inside $G$. So, we pick some numbers $\beta_1, \dots, \beta_k$ and define a probability measure on graphs with $n$ vertices as follows: the probability of any particular graph $G$ should be proportional to

$$\exp\Big(n^2 \big(\beta_1 \, t(G_1, G) + \cdots + \beta_k \, t(G_k, G)\big)\Big)$$
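For small graphs, the homomorphism density $t(G_i, G)$ can be computed by brute force over all vertex maps. Here’s a minimal sketch in Python (function and variable names are my own; it’s exponential-time, so only suitable for tiny graphs):

```python
from itertools import product

def t(H, G, nH, nG):
    """Homomorphism density t(H, G): the fraction of all nG**nH maps
    from the vertices of H to the vertices of G that are graph
    homomorphisms, i.e. send every edge of H to an edge of G.
    Graphs are given as sets of edges (i, j) with i < j."""
    sym = G | {(j, i) for (i, j) in G}          # both orientations
    homs = sum(
        all((f[u], f[v]) in sym for (u, v) in H)
        for f in product(range(nG), repeat=nH)  # all vertex maps
    )
    return homs / nG ** nH

# A triangle mapped into itself: only the 6 bijections are
# homomorphisms, out of 3^3 = 27 maps, so t = 6/27.
triangle = {(0, 1), (1, 2), (0, 2)}
print(t(triangle, triangle, 3, 3))  # 0.222...
```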
If you’re a physicist you’ll call this a ‘Gibbs state’, and you’ll know this is the way to get a probability distribution that maximizes entropy while holding the expected values of $t(G_i, G)$ constant. Statisticians like to call the whole family of Gibbs states, as we vary the numbers $\beta_i$, an ‘exponential family’. But the cool part, for me, is that we can apply ideas from physics—namely, statistical mechanics—to graph theory.
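To make the Gibbs state concrete, here’s a brute-force sketch (my own illustration, not code from the talk) that builds this probability measure on all graphs with 3 vertices. Taking the only chosen graph $G_1$ to be a single edge, the weight of $G$ works out to $e^{2\beta_1 |E(G)|}$, so the measure factors over edges and reduces exactly to an Erdős–Rényi graph with $p = e^{2\beta_1}/(1 + e^{2\beta_1})$:

```python
from itertools import combinations, product
from math import exp

def t(H, G, nH, nG):
    """Homomorphism density: fraction of maps from H's vertices to G's
    vertices sending every edge of H to an edge of G (brute force)."""
    sym = G | {(j, i) for (i, j) in G}
    homs = sum(all((f[u], f[v]) in sym for (u, v) in H)
               for f in product(range(nG), repeat=nH))
    return homs / nG ** nH

def gibbs_measure(n, subgraphs, betas):
    """Exponential-family ('Gibbs state') measure on all graphs with n
    vertices: weight(G) = exp(n^2 * sum_i beta_i * t(G_i, G)),
    normalized to sum to 1.  Only feasible for tiny n.
    `subgraphs` is a list of (edge set, vertex count) pairs."""
    possible = list(combinations(range(n), 2))
    all_graphs = [set(chosen)
                  for r in range(len(possible) + 1)
                  for chosen in combinations(possible, r)]
    weights = [exp(n**2 * sum(b * t(Gi, G, ni, n)
                              for (Gi, ni), b in zip(subgraphs, betas)))
               for G in all_graphs]
    Z = sum(weights)
    return all_graphs, [w / Z for w in weights]

# With G_1 = a single edge and beta_1 = 0.5, each edge appears
# independently with probability e / (1 + e): an Erdős–Rényi graph.
edge = {(0, 1)}
graphs, probs = gibbs_measure(3, [(edge, 2)], [0.5])
p_edge = sum(p for G, p in zip(graphs, probs) if (0, 1) in G)
print(p_edge)  # ≈ 0.731, i.e. e/(1+e)
```

Adding a triangle term to `subgraphs` changes the weights, but — as the theorem below says — for large $n$ it still behaves like an Erdős–Rényi graph.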
So far we’ve got a probability measure on our set of graphs with $n$ vertices. These probability measures converge in a certain sense as $n \to \infty$. But Chatterjee and Diaconis proved a shocking theorem: for almost all choices of the graphs $G_i$ and numbers $\beta_i$, these probability measures converge to an Erdős–Rényi random graph! And in the other cases, they converge to a probabilistic mixture of Erdős–Rényi random graphs.
In short, as long as the numbers $\beta_i$ are nonnegative, exponential families don’t buy us much. We could just work with Erdős–Rényi random graphs, or probabilistic mixtures of these. The exponential families are still very interesting to study, but they don’t take us into truly new territory.
The theorem is here:
• Sourav Chatterjee and Persi Diaconis, Estimating and understanding random graph models.
To reach new territory, we can try letting some of the numbers $\beta_i$ be negative. The paper talks about this too. Here many questions remain open!