• John Baez, Networks and population biology (Part 4), 6 May 2011.

It’s about an attempt by Persi Diaconis to define random graphs in which chosen features show up more often than chance would suggest. Here’s a teaser, which leaves out the shocking conclusion:

People have studied the Erdős–Rényi random graphs very intensively, so now people are eager to study random graphs with more interesting correlations. For example, consider the graph where we draw an edge between any two people who are friends. If you’re my friend and I’m friends with someone else, that improves the chances that you’re friends with them! In other words, friends tend to form ‘triangles’. But in an Erdős–Rényi random graph there’s no effect like that.

‘Exponential families’ of random graphs seem like a way around this problem. The idea here is to pick a specific collection of small graphs and say how commonly we want each of these to appear in our random graph. If the collection contains just one graph, two vertices connected by an edge, we get back an Erdős–Rényi random graph. But if we also want our graph to contain a lot of triangles, we can add a triangle to the collection.
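The missing-triangles effect is easy to see numerically: in an Erdős–Rényi graph G(n, p), the chance that two neighbors of a vertex are themselves adjacent is just p, so the clustering coefficient shows no boost from shared friends. A minimal sketch (pure Python; the parameters are arbitrary illustrative choices):

```python
import itertools, random

def erdos_renyi(n, p, rng):
    """Sample G(n, p): each possible edge appears independently with probability p."""
    return {frozenset(e) for e in itertools.combinations(range(n), 2)
            if rng.random() < p}

def clustering(n, edges):
    """Fraction of wedges (paths u-v-w) that close up into triangles."""
    adj = {v: set() for v in range(n)}
    for e in edges:
        u, w = tuple(e)
        adj[u].add(w)
        adj[w].add(u)
    wedges = closed = 0
    for v in range(n):
        for u, w in itertools.combinations(sorted(adj[v]), 2):
            wedges += 1
            closed += frozenset((u, w)) in edges
    return closed / wedges if wedges else 0.0

rng = random.Random(0)
n, p = 400, 0.05
c = clustering(n, erdos_renyi(n, p, rng))
# c concentrates near p: being friends with a common person
# gives no extra chance of being friends with each other.
```

An exponential-family model with a triangle term is designed to push this clustering value above p.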

Diaconis’s approach seems very well-motivated by statistical mechanics. However, it doesn’t work! But the way it fails is itself interesting.

• Alan M. Frieze, On the value of a random minimum spanning tree problem, *Discrete Applied Mathematics* **10** (1985), 47–56.

Perhaps this was before people knew about, or talked about, the ‘giant component’ of a random graph.

First use all edges smaller than 2c/n. There will be about cn of them, so for c>1, there will be a giant component C of size bn where b + exp(-bc) – 1 = 0. Delete edges from C until it is a tree T. The sum of weights in T is about bn(c/n) = bc.

Next, use edges with weights in [2c/n, 1] to join G–T to T, using the edge with smallest weight for each vertex in G–T. There are about bn choices for each one, so these weights add up to about (1-b)n (2c/n + 1/(bn)) = (1-b)(2c + 1/b). The total is bc + 2c – 2cb + 1/b – 1 = 2c – 1 – bc + 1/b. For example, if c=2, b is about .8 and this is 3 – 1.6 + 1.25 = 2.65.
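Frieze’s theorem says the expected minimum spanning tree weight on the complete graph K_n with i.i.d. uniform [0,1] edge weights tends to ζ(3) ≈ 1.202 as n → ∞, well below the heuristic’s 2.65, as befits an upper-bound construction. A quick check by simulation, a sketch using lazy Prim’s algorithm (the choices of n and trial count are arbitrary):

```python
import heapq, random

def mst_weight(n, rng):
    """Weight of the MST of K_n with i.i.d. U[0,1] edge weights (lazy Prim)."""
    weight = {}  # cache so each edge is assigned a single random weight

    def w(u, v):
        key = (min(u, v), max(u, v))
        if key not in weight:
            weight[key] = rng.random()
        return weight[key]

    in_tree = {0}
    heap = [(w(0, v), v) for v in range(1, n)]
    heapq.heapify(heap)
    total = 0.0
    while len(in_tree) < n:
        wt, v = heapq.heappop(heap)
        if v in in_tree:
            continue  # stale entry: v was attached via a cheaper edge
        in_tree.add(v)
        total += wt
        for u in range(n):
            if u not in in_tree:
                heapq.heappush(heap, (w(v, u), u))
    return total

rng = random.Random(1)
avg = sum(mst_weight(150, rng) for _ in range(10)) / 10
# avg should already sit near zeta(3) ~ 1.202 at this modest n
```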

While I was working on the “multiscale information theory” effort described in the aforelinked papers, I tried to come up with a simpler setting for the basic ideas, one that would illustrate the basics and wouldn’t require taking logarithms of probabilities all over the place. Take a set Q of questions, all of which pertain to some physical system—to keep it simple, make them all binary questions. We suppose that each question carries one unit of information; or, to say it another way, answering any one of them removes one unit of uncertainty. Furthermore, we suppose that the questions are independent, in that answering any one does not help us to answer any other.

Our physical system may contain multiple pieces, or components; call them A, B, and so forth. Some of the questions in Q may pertain to component A, but not all of them have to, so let’s write Q_A for the subset of Q consisting of questions that each pertain to component A. Likewise for Q_B and the rest. So, the total *information content* of component A is |Q_A|: This is the total number of questions which we must get the answers to in order to remove our uncertainty about A. Then, the *mutual information* between components A and B is |Q_A ∩ Q_B|, the number of questions that pertain to both.

We can go further and define a tertiary mutual information in the same manner, and we can do even fancier things as the number of components in the system grows larger. This provides a kind of “toy version” of information theory: It doesn’t have the richness of Shannon theory, but it does furnish functions that satisfy some of the important properties of Shannon information.
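A minimal sketch of the toy theory in code; the component names and question labels here are invented purely for illustration:

```python
# Toy information theory: one question set per component.
# Information content is a cardinality; mutual information is
# the size of an intersection.

Q_A = {"q1", "q2", "q3"}        # questions pertaining to component A
Q_B = {"q2", "q3", "q4", "q5"}  # questions pertaining to component B
Q_C = {"q3", "q5", "q6"}        # questions pertaining to component C

info_A = len(Q_A)                    # information content of A
mutual_AB = len(Q_A & Q_B)           # mutual information of A and B
tertiary_ABC = len(Q_A & Q_B & Q_C)  # tertiary mutual information

# Unlike Shannon's interaction information I(A;B;C), which can go
# negative, this set-theoretic version is always a nonnegative count.
```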

So, keeping the familiar concepts of Shannon theory firmly set aside for the moment, in this more abstract setting, a natural puzzle arises. What if the questions in the set *are* informative about each other? Maybe we want to keep probability theory at a distance for now, because it doesn’t feel natural to apply. Perhaps our notion of “similarity” between questions is something like, “Given the answer to question a, we could conceivably compute the answer to question b, given sufficient computer time and memory.” So, the first thing we try is to quantify this with a notion of “distance” d(a, b). Why should this work like a distance? Well, if d(a, b) indicates how hard it is to calculate the answer to b given an answer to a, and d(b, c) quantifies the difficulty of computing an answer to c given one for b, then the worst-case scenario for calculating an answer for c starting with a feels like it should be the sum total difficulty for the two-step path: d(a, c) ≤ d(a, b) + d(b, c).

Thus, we put a metric d on Q, and we can talk about the magnitude of Q, as well as of the subspaces within it. All our measures of information pick up a scale parameter t, which indicates the cost of computation.
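One concrete reading of “magnitude” here is Leinster’s magnitude of a finite metric space: build the matrix Z with entries exp(−t·d(i, j)), solve Z w = 1, and sum the weights w. A sketch in pure Python (the distance matrix is a made-up two-point example):

```python
import math

def magnitude(dist, t):
    """Magnitude of a finite metric space at scale t: solve Z w = 1
    with Z[i][j] = exp(-t * dist[i][j]), then sum the weights w.
    Uses naive Gauss-Jordan elimination with partial pivoting."""
    n = len(dist)
    aug = [[math.exp(-t * dist[i][j]) for j in range(n)] + [1.0]
           for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return sum(aug[i][n] / aug[i][i] for i in range(n))

# Two points at distance 1: magnitude is 2 / (1 + exp(-t)),
# growing from "effectively one point" at small t toward
# "two distinct points" as t gets large.
two = [[0.0, 1.0], [1.0, 0.0]]
```

The scale parameter t plays exactly the role suggested above: as computation gets cheaper (small t), distinct questions blur together and the effective count drops.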

Fun to think about, maybe, as a setting where various abstract ideas can be applied.
