Markov Models of Social Change (Part 1)

24 February, 2014

guest post by Alastair Jamieson-Lane

The world is complex, and making choices in a complex world is sometimes difficult.

As any leader knows, decisions must often be made with incomplete information. To make matters worse, the experts and scientists who are meant to advise on these important matters are also doing so with incomplete information—usually limited to only one or two specialist fields. When decisions need to be made that are dependent on multiple real-world systems, and your various advisors find it difficult to communicate, this can be problematic!

The generally accepted approach is to listen to whichever advisor tells you the things you want to hear.

When such an approach fails (for whatever mysterious and inexplicable reason) it might be prudent to consider such approaches as Bayesian inference, analysis of competing hypotheses or cross-impact balance analysis.

Because these methods require experts to formalize their opinions in an explicit, discipline neutral manner, we avoid many of the problems mentioned above. Also, if everything goes horribly wrong, you can blame the algorithm, and send the rioting public down to the local university to complain there.

In this blog article I will describe cross-impact balance analysis and a recent extension to this method, explaining its use, as well as some basic mathematical underpinnings. No familiarity with cross-impact balance analysis will be required.

Wait—who is this guy?

Since this is my first time writing a blog post here, I hear introductions are in order.

Hi. I’m Alastair.

I am currently a Master’s student at the University of British Columbia, studying mathematics. In particular, I’m aiming to use evolutionary game theory to study academic publishing and hiring practices… and from there hopefully move on to studying governments (we’ll see how the PhD goes). I figure that both those systems seem important to solving the problems we’ve built for ourselves, and both may be under increasing pressure in coming years.

But that’s not what I’m here for today! Today I’m here to tell the story of cross-impact balance analysis, a tool I was introduced to at the complex systems summer school in Santa Fe.

The story

Suppose (for example) that the local oracle has foretold that burning the forests will anger the nature gods

… and that if you do not put restrictions in place, your crops will wither and die.

Well, that doesn’t sound very good.

The merchant’s guild claims that such restrictions will cause all trade to grind to a halt.

Your most trusted generals point out that weakened trade will leave you vulnerable to invasion from all neighboring kingdoms.

The sailors guild adds that the wrath of Poseidon might make nautical trade more difficult.

The alchemists propose alternative sources of heat…

… while the druids propose special crops as a way of resisting the wrath of the gods…

… and so on.

Given this complex web of interaction, it might be a good time to consult the philosophers.

Overview of CIB

This brings us to the question of what CIB (Cross-Impact Balance) analysis is, and how to use it.

At its heart, CIB analysis demands this: first, you must consider what aspects of the world you are interested in studying. This could be environmental or economic status, military expenditure, or the laws governing genetic modification. These we refer to as “descriptors”. For each “descriptor” we must create a list of possible “states”.

For example, if the descriptor we are interested in were “global temperature change” our states might be “+5 degree”, “+4 degrees” and so on down to “-2 degrees”.

The states of a descriptor are not meant to be all-encompassing, or offer complete detail, and they need not be numerical. For example, the descriptor “Agricultural policy” might have such states as “Permaculture subsidy”, “Genetic engineering”, “Intensive farming” or “No policy”.

For each of these states, we ask our panel of experts whether such a state would increase or decrease the tendency for some other descriptor to be in a particular state.

For example, we might ask: “On a scale from -3 to 3, how much does the agricultural policy of Intensive farming increase the probability that we will see global temperature increases of +2 degrees?”

By combining the opinions of a variety of experts in each field, and weighting based on certainty and expertise, we are able to construct matrices, much like the one below:

The above matrix is a description of my ant farm. The health of my colony is determined by the population, income, and education levels of my ants. For a less ant focused version of the above, please refer to:

• Elisabeth A. Lloyd and Vanessa J. Schweizer, Objectivity and a comparison of methodological scenario approaches for climate change research, Synthese (2013).

For any possible combination of descriptor states (referred to as a scenario) we can calculate the total impact on all possible descriptors. In the current scenario we have low population, high income and medium education (see highlighted rows).

Because the current scenario has high ant income, this strongly influences us to have low population (+3) and prevents a jump to high population (-3). This combined with the non-influence from education (zeros) leads to low population being the most favoured state for our population descriptor. Thus we expect no change. We say this is “consistent”.

Education however sees a different story. Here we have a strong influence towards high education levels (summing the column gives a total of 13). Thus our current state (medium education) is inconsistent, and we would expect the abundance of ant wealth to lead to an improvements in the ant schooling system.

Classical CIB analysis acts as a way to classify which hypothetical situations are consistent, and which are not.

Now, it is all well and good to claim that some scenarios are stable, but the real use of such a tool is in predicting (and influencing) the future.

By applying a deterministic rule that determines how inconsistencies are resolved, we can produce a “succession rule”. The most straight-forward example is to replace all descriptor states with whichever state is most favoured by the current scenario. In the example above we would switch to “Low population, medium income, high education”. A generation later we would switch back to “Low population, High income, medium education”, soon finding ourselves trapped in a loop.

All such rules will always lead to either a loop or a “sink”: a self consistent scenario which is succeeded only by itself.

So, how can we use this? How will this help us deal with the wrath of the gods (or ant farms)?

Firstly: we can identify loops and consistent scenarios which we believe are most favourable. It’s all well and good imagining some future utopia, but if it is inconsistent with itself, and will immediately lead to a slide into less favourable scenarios then we should not aim for it, we should find that most favourable realistic scenario and aim for that one.

Secondly: We can examine all our consistent scenarios, and determine whose “basin of attraction” we find ourselves in: that is, which scenario are we likely to end up in.

Thirdly: Suppose we could change our influence matrix slightly? How would we change it to favour scenarios we most prefer? If you don’t like the rules, change the game—or at the very least find out WHAT we would need to change to have the best effect.

Concerns and caveats

So… what are the problems we might encounter? What are the drawbacks?

Well, first of all, we note that the real world does not tend to reach any form of eternal static scenario or perfect cycle. The fact that our model does might be regarded as reason for suspicion.

Secondly, although the classical method contains succession analysis, this analysis is not necessarily intended as a completely literal “prediction” of events. It gives a rough idea of the basins of attraction of our cycles and consistent scenarios, but is also somewhat arbitrary. What succession rule is most appropriate? Do all descriptors update simultaneously? Or only the one with the most “pressure”? Are our descriptors given in order of malleability, and only the fastest changing descriptor will change?

Thirdly, in collapsing our description of the world down into a finite number of states we are ignoring many tiny details. Most of these details are not important, but in assuming that our succession rules are deterministic, we imply that these details have no impact whatsoever.

If we instead treat succession as a somewhat random process, the first two of these problems can be solved, and the third somewhat reduced.

Stochastic succession

In the classical CIB succession analysis, some rule is selected which deterministically decides which scenario follows from the present. Stochastic succession analysis instead tells us the probability that a given scenario will lead to another.

The simplest example of a stochastic succession rule is to simply select a single descriptor at random each time step, and only consider updates that might happen to that descriptor. This we refer to as dice succession. This (in some ways) represents hidden information: two systems that might look identical on the surface from the point of view of our very blockish CIB analysis might be different enough underneath to lead to different outcomes. If we have a shaky agricultural system, but a large amount of up-and-coming research, then which of these two factors becomes important first is down to the luck of the draw. Rather than attempt to model this fine detail, we instead merely accept it and incorporate this uncertainty into our model.

Even this most simplistic change leads to dramatics effects on our system. Most importantly, almost all cycles vanish from our results, as forks in the road allow us to diverge from the path of the cycle.

We can take stochastic succession further and consider more exotic rules for our transitions, ones that allow any transition to take place, not merely those that are most favored. For example:

P(x,y) = A e^{I_x(y)/T}

Here x is our current scenario, y is some possible future scenario, and I_x(y) is the total impact score of y from the perspective of x. A is a simple normalizing constant, and T is our system’s temperature. High temperature systems are dominated by random noise, while low temperature systems are dominated by the influences described by our experts.

Impact score is calculated by summing the impact of each state of our current scenario, on each state of our target scenario. For example, for the above, suppose we want to find I_x(y) when x is the given scenario “Low population, High income, medium education” and y was the scenario “Medium population, medium income, High education”. We consider all numbers that are in rows which were states of x and in columns that are states of y. This would give:

I_x(y)= (0+0+0) + (-2 +0 +10) +(6+7+0) = 21

Here each bracket refers to the sum of a particular column.
More generically we can write the formula as:

\displaystyle{ I_x(y)= \sum_{i \subset x, \;j \subset y} M_{i,j} }

Here M_{i,j} refers to an entry in our cross-impact balance matrix, i and j are both states, and i \subset x reads as “i is a state of x”.

We refer to this function for computing transition probabilities as the Boltzmann succession law, due to its similarity to the Boltzmann distribution found in physics. We use it merely as an example, and by no means wish to imply that we expect the transitions for our true system to act in a precisely Boltzmann-like manner. Alternative functions can, and should, be experimented with. The Boltzmann succession law is however an effective example and has a number of nice properties: P(x,y) is always positive, unchanged by adding a constant to every element of the cross-impact balance matrix, contains adjustable parameters, and unbounded above.

The Boltzmann succession rule is what I will refer to as fully stochastic: it allows transitions even against our experts’ judgement (with low probability). This is in contrast to dice succession which picks a direction at random, but still contains scenarios from which our system can not escape.

Effects of stochastic succession

‘Partially stochastic’ processes such as the dice rule have very limited effect on the long term behavior of the model. Aside from removing most cycles, they behave almost exactly like our deterministic succession rules. So, let us instead discuss the more interesting fully stochastic succession rules.

In the fully stochastic system we can ask “after a very long time, what is the probability we will be in scenario x?”

By asking this question we can get some idea of the relative importance of all our future scenarios and states.

For example, if the scenario “high population, low education, low income” has a 40% probability in the long term, while most other scenarios have a probability of 0.2%, we can see that this scenario is crucial to the understanding of our system. Often scenarios already identified by deterministic succession analysis are the ones with the greatest long term probability—but by looking at long term probability we also gain information about the relative importance of each scenario.

In addition, we can encounter scenarios which are themselves inconsistent, but form cycles and/or clusters of interconnected scenarios. We can also notice scenarios that while technically ‘consistent’ in the deterministic rules are only barely so, and have limited weight due to a limited basin of attraction. We might identify scenarios that seem familiar in the real world, but are apparently highly unlikely in our analysis, indicating either that we should expect change… or perhaps suggesting a missing descriptor or a cross-impact in need of tweaking.

Armed with such a model, we can investigate what we can do to increase the short term and long term likelihood of desirable scenarios, and decrease the likelihood of undesirable scenarios.

Some further reading

As a last note, here are a few freely available resources that may prove useful. For a more formal introduction to CIB, try:

• Wolfgang Weimer-Jehle, Cross-impact balances: a system-theoretical approach to cross-impact analysis, Technological Forecasting & Social Change 73 (2006), 334–361.

• Wolfgang Weimer-Jehle, Properties of cross-impact balance analysis.

You can find free software for doing a classical CIB analysis here:

• ZIRIUS, ScenarioWizard.

ZIRIUS is the Research Center for Interdisciplinary Risk and Innovation Studies of the University of Stuttgart.

Here are some examples of CIB in action:

• Gerhard Fuchs, Ulrich Fahl, Andreas Pyka, Udo Staber, Stefan Voegele and Wolfgang Weimer-Jehle, Generating innovation scenarios using the cross-impact methodology, Department of Economics, University of Bremen, Discussion-Papers Series No. 007-2008.

• Ortwin Renn, Alexander Jager, Jurgen Deuschle and Wolfgang Weimer-Jehle, A normative-functional concept of sustainability and its indicators, International Journal of Global Environmental Issues, 9 (2008), 291–317.

Finally, this page contains a more complete list of articles, both practical and theoretical:

• ZIRIUS, Cross-impact balance analysis: publications.


Network Theory Overview

22 February, 2014

 

Here’s a video of a talk I gave yesterday, made by Brendan Fong. You can see the slides here—and then click the items in blue, and the pictures, for more information!

The idea: nature and the world of human technology are full of networks! People like to draw diagrams of networks. Mathematical physicists know that in principle these diagrams can be understood using category theory. But why should physicists have all the fun? This is the century of understanding living systems and adapting to life on a finite planet. Math isn’t the main thing we need for this, but it’s got to be part of the solution… so one thing we should do is develop a unified and powerful theory of networks.

We are still far from doing this. In this overview, I briefly described three parts of the jigsaw puzzle, and invited everyone to help fit them together:

• electrical circuits and signal-flow graphs.

• stochastic Petri nets, chemical reaction networks and Feynman diagrams.

• Bayesian networks, information and entropy.

In my talks coming up, I’ll go into more detail on each of these. With luck, you’ll be able to see videos here.

But if you’re near Oxford, you might as well actually attend! You can see dates, times, locations, my slides, and the talks themselves as they show up by going here.


Finding and Solving Problems

18 February, 2014

Luke Muelhauser, executive director of the Machine Intelligence Research Insitute, has been doing some interviews:

Scott Aaronson on philosophical progress.

Greg Morrisett on secure and reliable systems.

Robin Hanson on serious futurism.

Recently he interviewed me. Here’s how it went.

LM: In a previous interview, I asked Scott Aaronson which “object-level research tactics” he finds helpful when trying to make progress in theoretical research, and I provided some examples. Do you have any comments on the research tactics that Scott and I listed? Which recommended tactics of your own would you add to the list?

JB: What do you mean by “object-level” research tactics? I’ve got dozens of tactics. Some of them are ways to solve problems. But equally important, or maybe more so, are tactics for coming up with problems to solve: problems that are interesting but still easy enough to solve. By “object-level”, do you mean the former?

LM: Both! Conceiving of—and crisply posing—good research problems can often be even more important than solving previously-identified research problems.

JB: Okay. Here are some of my tactics.

(1) Learn a lot. Try to understand how the whole universe works, from the philosophical, logical, mathematical and physical aspects to chemistry, biology, and the sciences based on those, to the historical sciences such as cosmology, paleontology, archaeology and history, to the social sciences such as psychology, sociology, anthropology, politics and economics, to the aspects that are captured best in literature, art and music.

It’s a never-ending quest, and obviously it pays to specialize and become more of an expert on a few things – but the more angles you can take on any subject, the more likely you are to stumble on good questions or good answers to existing questions. Also, when you get stuck on a problem, or get tired, it can be really re-energizing to learn new things.

(2) Keep synthesizing what you learn into terser, clearer formulations. The goal of learning is not to memorize vast amounts of data. You need to do serious data compression, and filter out the noise. Very often people will explain things to you in crappy ways, presenting special cases and not mentioning the general rules, stating general rules incorrectly, and so on.

This process goes on forever. When you first learn algebraic topology, for example, they teach you. homology theory. At the beginner’s level, this is presented as a rather complicated recipe for taking a topological space and getting a list of groups out of it. By looking at examples you get insight into what these groups do: the nth one counts the n-dimensional holes, in some sense. You learn how to use them to solve problems, and how to efficiently compute them.

But later—much later, in my case—you learn that algebraic topology of this sort not really about topological spaces, but something more abstract, called “homotopy types”. This is a discovery that happened rather slowly. It crystallized around the 1968, when a guy named Quillen wrote a book on “homotopical algebra”. It’s always fascinating when this happens: when people in some subject learn that its proper object of study is not what they had thought!

But even this was just the beginning: a lot has happened in math since the 1960s. Shortly thereafter, Grothendieck came along and gave us a new dream of what homotopy types might actually be. Very roughly, he realized that they should show up naturally if we think of “equality” as a process—the process of proving two thing are the same—rather than a static relationship.

I’m being pretty vague here, but I want to emphasize that this was a very fundamental discovery with widespread consequences, not a narrow technical thing.

For a long time people have struggled to make Grothendieck’s dream precise. I was involved in that myself for a while. But in the last 5 years or so, a guy named Voevodsky made a lot of progress by showing us how to redo the foundations of mathematics so that instead of treating equality as a mere relationship, it’s a kind of process. This new approach gives an alternative to set theory, where we use homotopy types right from the start as the basic objects of mathematics, instead of sets. It will take about a century for the effects of this discovery to percolate through all of math.

So, you see, by taking something important but rather technical, like algebraic topology, and refusing to be content with treating it as a bunch of recipes to be memorized, you can dig down into deep truths. But it takes great persistence. Even if you don’t discover these truths yourself, but merely learn them, you have to keep simplifying and unifying.

(3) Look for problems, not within disciplines, but in the gaps between existing disciplines. The division of knowledge into disciplines is somewhat arbitrary, and people put most of their energy into questions that lie squarely within disciplines, so it shouldn’t be surprising that many interesting things are lurking in the gaps, waiting to be discovered.

At this point, tactics (1) and (2) really come in handy. If you study lots of subjects and keep trying to distill their insights into terse, powerful formulations, you’re going to start noticing points of contact between these subjects. Sometimes these will be analogies that deserve to be made precise. Sometimes people in one subject know a trick that people in some other subject could profit from. Sometimes people in one subject have invented the hammer, and people in another have invented the nail—and neither know what these things are good for!

(4) Talk to lots of people. This is a great way to broaden your vistas and find connections between seemingly different subjects.

Talk to the smartest people who will condescend to talk to you. Don’t be afraid to ask them questions. But don’t bore them. Smart people tend to be easily bored. Try to let them talk about what’s interesting to them, instead of showing off and forcing them to listen to your brilliant ideas. But make sure to bring them some “gifts” so they’ll want to talk to you again. “Gifts” include clear explanations of things they don’t understand, and surprising facts—little nuggets of knowledge.

One of my strategies for this was to write This Week’s Finds, explaining lots of advanced math and physics. You could say that column is a big pile of gifts. I started out as a nobody, but after ten years or so, lots of smart people had found out about me. So now it’s pretty easy for me to blunder into any subject, write a blog post about it, and get experts to correct me or tell me more. I also get invited to give talks, where I meet lots of smart people.

LM: You’ve explained some tactics for how to come up with problems to solve. Once you generate a good list, how do you choose among them?

JB: Here are two bits of advice on that.

(1) Actually write down lists of problems.

When I was just getting started, I had a small stock of problems to think about – so small that I could remember most of them. Many were problems I’d heard from other people, but most of those were too hard. I would also generate my own problems, but they were often either too hard, too vague, or too trivial.

In more recent years I’ve been able to build up a huge supply of problems to think about. This means I need to actually list them. Often I generate these lists using the ‘data compression’ tactic I mentioned in part (2) of my last answer. When I learn stuff, I ask:

• Is this apparently new concept or fact a special case of some concept or fact I already know?

• Given two similar-sounding concepts or facts, can I find a third having both of these as special cases?

• Can I use the analogy between X and Y to do something new in subject Y that’s analogous to something people have already done in subject X?

• Given a rough ‘rule of thumb’, can I state it more precisely so that it holds always, or at least more often?

as well as plenty of more specific questions.

So, instead of being ‘idea-poor’, with very few problems to work on, I’m now ‘idea-rich’, and the challenge is keeping track of all the problems and finding the best ones.

I always carry around a notebook. I write down questions that seem interesting, especially when I’m bored. The mere act of writing them down either makes them less vague or reveals them to be hopelessly fuzzy. Sometimes I can solve a problem just by taking the time to state it precisely. And the act of writing down questions naturally triggers more questions.

Besides questions, I like ‘analogy charts’, consisting of two or more columns with analogous items lined up side by side. You can see one near the bottom of my 2nd article on quantropy. Quantropy is an idea born of the analogy between thermodynamics and quantum mechanics. This is a big famous analogy, which I’d known for decades, but writing down an analogy chart made me realize there was a hole in the analogy. In thermodynamics we have entropy, so what’s the analogous thing in quantum mechanics? It turns out there’s an answer: quantropy.

I later wrote a paper with Blake Pollard on quantropy, but I gave a link to the blog article because that’s another aspect of how I keep track of questions. I don’t just write lists for myself—I write blog articles about things that I want to understand better.

(2) Only work on problems when you think they’re important and you see how to solve them.

This tactic isn’t for everyone, but it works for me. When I was just getting started I would try to solve problems that I had no idea how to solve. People who are good at puzzles may succeed this way, but I generally did not.

It turns out that for me, a better approach is to make long lists of questions, and keep thinking about them on and off for many years. I slowly make progress until—poof!—I think I see something new and important. Only then do a take a problem off the back burner and start intensely working on it.

The physicist John Wheeler put it this way: you should never do a calculation until you already know the answer. That’s a bit of an exaggeration, because it’s also good to fool around and see where things go. But there’s a lot more truth to it than you might think.

Feynman had a different but related rule of thumb: he only worked on a problem when he felt he had an “inside track” on it—some insight or trick up his sleeve that nobody else had.

LM: And once you’ve chosen a problem to solve, what are some of your preferred tactics for actually solving it?

JB: By what I’ve said before, it’s clear that I get serious about a problem only after I have a darn good idea of how to solve it. At the very least, I believe I know what to do. So, I just do it.

But usually it doesn’t work quite that easily.

If you only officially tackle problems after absolutely every wrinkle has been ironed out by your previous musings, you’re being too cautious: you’ll miss working on a lot of interesting things. Many young researchers seem to fall prey to the opposite error, and waste time being completely stuck. The right balance lies in the middle. You break a problem down into sub-problems, and break those down into sub-subproblems… and you decide you’re ready to go when all these sub-subproblems seem likely to be doable, even before you’ve worked through the details.

How can you tell if they’re doable? This depends a lot on having previous experience with similar problems. If you’re a newbie, things that seem hard to you can be really easy to experts, while things that seem easy can turn out to be famously difficult.

Even with experience, some of sub-subproblems that seem likely to be routine will turn out to be harder than expected. That’s where the actual work comes in. And here it’s good to have lots of tricks. For example:

(1) If you can’t solve a problem, there should be a similar problem that’s a bit easier. Try solving that. And if you can’t solve that one… use the same principle again! Keep repeating until you get down to something you can solve. Then climb your way back up, one step at a time.

Don’t be embarrassed to simplify a problem to the point where you can actually do it.

(2) There are lots of ways to make a problem easier. Sometimes you should consider a special case. In math there are special cases of special cases of special cases… so there’s a lot of room for exploration here. If you see how enough special cases work, you’ll get ideas that may help you for your original problem.

(3) On the other hand, sometimes a problem becomes simpler when you generalize, leaving out potentially irrelevant details. Often people get stuck in clutter. But if it turns out the generalization doesn’t work, it may help you see which details were actually relevant.

(4) Sometimes instead of down or up the ladder of generality it pays to move across, by considering an analogous problem in a related field.

(5) Finally, a general hint: keep a written record of your efforts to solve a problem, including explanations of what didn’t work, and why. Look back at what you wrote from time to time. It’s amazing how often I come close to doing something right, forget about it, and come back later—sometimes years later—and see things from a slightly different angle, which makes everything fall into place. Failure can be just millimeters from success.


Relative Entropy (Part 4)

16 February, 2014

In recent posts by Manoj Gopalkrishnan and Marc Harper we’ve seen how not just entropy but relative entropy—the entropy of a probability distribution relative to the equilibrium distribution—is a driving force in chemistry and evolution. Now Tobias Fritz and I have finally finished our paper on this subject:

A Bayesian characterization of relative entropy.

Very roughly, we proved that any reasonable measure of the information you gain when you to update your assumptions about the world based on discovering what a system is really doing must be some constant c times the relative entropy.

I’ve blogged about this here before:

Relative Entropy (Part 1): how various structures important in probability theory arise naturally when you do linear algebra using only the nonnegative real numbers.

Relative Entropy (Part 2): a category related to statistical inference, \mathrm{FinStat}, and how relative entropy defines a functor on this category.

Relative Entropy (Part 3): statement of our main theorem, which characterizes relative entropy up to a constant multiple as the only functor F : \mathrm{FinStat} \to [0,\infty) with a few nice properties.

Now that the paper is done, we’re having a nice conversation about it on the n-Category Café. Since I don’t want to splinter the conversation, I won’t enable comments here—please go there and join the fun!

But having blogged about this thrice before, what’s new?

One thing is that our conversation is getting more deeply into the category-theoretic aspects. Read the long parenthetical remarks in my post on the n-Café to get up to speed on that aspect.

Another is that by looking at our paper, you can actually see the proof of our result. As I mention on the n-Café.

The proof is surprisingly hard. Or maybe we’re just surprisingly bad at proving things. But the interesting thing is this: the proof is swift and effective in the ‘generic’ case—the case where the support of the probability measures involved is the whole set they’re living on, and the constant c is finite.

It takes some more work to handle the case where the probability measures have smaller support.

But the really hard work starts when we handle the case that, in the end, has c = \infty. Then the proof becomes more like analysis than what you normally expect in category theory. We slowly corner the result, blocking off all avenues of escape. Then we close in, grab its neck, and strangle it, crushing its larynx ever tighter, as it loses the will to fight back and finally expires… still twitching.

We haven’t gotten into discussing this much yet, perhaps because the mathematicians on the n-Café are too dainty and civilized. But someone into analysis might be able to find a more efficient proof.

That would make me a bit sad—since why didn’t we find it?—but mainly happy—since this subject deserves to be clean and elegant. We really need a category-theoretic formulation of the second law of thermodynamics that’s suitable for studying complex networks: that’s the long-term goal here.


Triangular Numbers

12 February, 2014

This post is just for fun. I’ll start with Pascal’s triangle and show you the number e hiding inside it. Using this, we’ll see how the product of all numbers in the nth row of Pascal’s triangle is related to the nth triangular number.

That’s cute, because Pascal’s triangle looks so… triangular!

But then, with a massive amount of help from Greg Egan, we’ll dig a lot deeper, and meet strange things like superfactorials, the magic properties of the number 1/12, and the Glaisher–Kinkelin constant.

First let’s get warmed up.

Warmup

Pascal’s triangle is a famous and much-studied thing. It was discovered long before Pascal. It looks like this:

We write 1’s down the edges. We get every other number in the triangle by adding the two numbers above it to its left and right. People call these numbers binomial coefficients, since you can also get them like this:

(x + y)^5 = x^5 + 5 x^4 y + 10 x^3 y^2 + 10 x^2 y^3 + 5 x y^4 + y^5

We get the 10 in 10 x^3 y^2 here because there are 10 ways to take 5 things and choose 3 to call x’s and 2 to call y’s. In general we have

\displaystyle{ (x + y)^n = \sum_{k=0}^n \binom{n}{k} x^k y^{n-k} }

where the binomial coefficient \binom{n}{k} is the kth number on the nth row of Pascal’s triangle. Here count both rows and the numbers in a row starting from zero.

Since we can permute n things in

n! = 1 \cdot 2 \cdot 3 \cdot \cdots \cdot n

ways, there are

\displaystyle{ \frac{n!}{k! (n-k)!} }

ways to take n things and choose k of them to be x’s and (n-k) of them to be y’s. You see, permuting the x’s or the y’s doesn’t change our choice, so we have to divide by k! and (n-k)!.

So, the kth number in the nth row of Pascal’s triangle is:

\displaystyle{\binom{n}{k} = \frac{n!}{k! (n-k)!} }

All this will be painfully familiar to many of you. But I want everyone to have an fair chance at understanding the next section, where we’ll see something nice about Pascal’s triangle and the number

e \approx 2.718281828459045...

The hidden e in Pascal’s triangle

In 2012, a guy named Harlan Brothers found the number e hiding in Pascal’s triangle… in a very simple way!

If we add up all the numbers in the nth row of Pascal’s triangle we get 2^n. But what if we take the product of all these numbers? Let’s call it t_n. Then here’s what Brothers discovered:

\displaystyle{ \lim_{n \to \infty}  \frac{t_n t_{n+2}}{t^2_{n+1}} = e }

This may seem mysterious, but in fact it’s rather simple once you see the trick. I’ll use a nice argument given by Greg Egan.

We’ve said t_n is the product of all numbers in the nth row of Pascal’s triangle. Just for fun, let’s divide this by the product of all numbers in the next row:

\displaystyle{ u_n = \frac{t_n}{t_{n+1}} }

And since this is so much fun, let’s do it again! Divide this quantity by its next value:

\displaystyle{ v_n = \frac{u_n}{u_{n+1}}   }

Now look:

\displaystyle{v_n = \frac{t_n/t_{n+1}}{t_{n+1}/t_{n+2}} = \frac{t_n t_{n+2}}{t_{n+1}^2}  }

So, this is the thing that should approach e.

But why does it approach e? To see this, first take Pascal’s triangle and divide each number by the number to its lower left. We get a second triangle of numbers. Then take each number in this second triangle and divide it by the number to its lower right! We get a third triangle, like this:

If you think a bit, you’ll see:

• In the first triangle, the product of all numbers in the nth row is t_n.

• In the second, the product of all numbers in the nth row is u_n.

• In the third, the product of all numbers in the nth row is v_n.

And look—there’s a cool pattern! In the third triangle, all the numbers in any given row are equal. In the row with n numbers, all those numbers equal

\displaystyle{ (n+1)/n = 1 + \frac{1}{n} }

So, the product of all these numbers is

\displaystyle{ \left(1 + \frac{1}{n}\right)^n }

But it’s a famous fact that

\displaystyle{ \lim_{n \to \infty}  \left(1 + \frac{1}{n}\right)^n = e}

Some people even use this as a definition of e. So,

\displaystyle{ \lim_{n \to \infty} v_n = e}

which is just what we wanted!

Puzzle 1. Use the formula I mentioned:

\displaystyle{\binom{n}{k} = \frac{n!}{k! (n-k)!} }

to show all the numbers in the same row of the third triangle are equal.

Triangular numbers

The number of balls in a triangular stack with n balls at the bottom is called the nth triangular number,

T_n = 1 + 2 + \cdots + n

If you chop a square of balls in half along a diagonal you get a triangle, so T_n is approximately half the nth square number:

\displaystyle{ T_n \approx \frac{n^2}{2} }

But if you chop the square in half this way, the balls on the diagonal get cut in half, so to get the nth triangle number we need to include their other halves:

T_n = \displaystyle{ \frac{n^2}{2} + \frac{n}{2} = \frac{n(n+1)}{2} }

These numbers go like

1, 3, 6, 10, \dots

and they are one of the simple pleasures of life, like sunshine and good bread. Spotting them in a knight-move zigzag pattern in the multiplication table was one of my first truly mathematical experiences. You can also see them in Pascal’s triangle:

because

T_n = \displaystyle{ \binom{n+1}{2} }

But today it’s time to see how T_n is related to t_n, the product of all the numbers in the nth row of Pascal’s triangle!

If we subtract each triangular number from the the one before it we get the numbers -n, and if we subtract each of these numbers from the one before it we get the number 1. This should remind you of something we’ve seen:

\displaystyle{ u_n = \frac{t_n}{t_{n+1}} }

\displaystyle{ v_n = \frac{u_n}{u_{n+1}}   }

\displaystyle{ \lim_{n \to \infty} v_n = e }

Here we’re dividing instead of subtracting. But if take logarithms, we’ll be subtracting, and we get

\ln(u_n) = \ln(t_n) - \ln(t_{n+1})

\ln(v_n) = \ln(u_n) - \ln(u_{n+1})

\displaystyle{ \lim_{n \to \infty} \ln(v_n) = 1 }

What can we do with this? Well, suppose \ln(v_n) were equal to 1, instead of approaching it. Then \ln(t_n) could be the nth triangular number… and we’d get a cool formula for the product of all numbers in the nth row of Pascal’s triangle!

But since \ln(v_n) is just approximately equal to 1, we should only expect an approximate formula for the product of all numbers in the nth row of Pascal’s triangle:

\ln(t_n) \approx T_n

or

\displaystyle{ t_n \approx e^{T_n} = e^{n(n+1)/2}  }

This was my hope. But how good are these approximations? I left this as a puzzle on Google+, and then Greg Egan stepped in and solved it.

For starters, he graphed the ratio \ln(t_n)/T_n, and got this:

That looks pretty good: it looks like it’s approaching 1. But he also graphed the ratio t_n/e^{T_n}, and got this:

Not so good. Taking exponentials amplifies the errors. If we want a good asymptotic formula t_n, we have to work harder. And this is what Egan did.

Digging deeper

So far we’ve talked a lot about t_n, the product of all numbers in the nth row in Pascal’s triangle… but we haven’t actually computed it. Let’s try:

\begin{array}{ccl}  t_n &=& \displaystyle{ \binom{n}{0} \cdot \binom{n}{1} \cdot \cdots \cdot \binom{n}{n} }   \\  \\  &=& \displaystyle{ \frac{n!}{0! \cdot n!} \cdot \frac{n!}{1! \cdot (n-1)!} \cdot \cdots \cdot \frac{n!}{n! \cdot 0!} }   \\  \\  &=& \displaystyle{ \frac{(n!)^{n+1}}{\left(0! \cdot 1! \cdot \cdots \cdot n!\right)^2} }  \end{array}

So, we’re seeing the superfactorial

\displaystyle{ 0! \cdot 1! \cdot \cdots \cdot n! }

raise its pretty head. This is also called the Barnes G-function, presumably by people who want to make it sound more boring.

Actually that’s not fair: the Barnes G-function generalizes superfactorials to complex numbers, just as Euler’s gamma function generalizes factorials. Unfortunately, Euler made the mistake of defining his gamma function so that

\Gamma(n) = (n-1)!

when n = 1, 2, 3, \cdots. Everyone I trust assures me this was a bad idea, not some deep insight… but Barnes doubled down on Euler’s mistake and defined his G-function so that

G(n) = 0! \cdot 1! \cdot \cdots \cdot (n-2)!

So, we’ll have to be careful when looking up formulas on Wikipedia: the superfactorial of n is G(n+2). Thus, we have

\displaystyle{ t_n = \frac{(n!)^{n+1}}{G(n+2)^2} }

so

\displaystyle{ \ln(t_n) = (n+1) \ln (n!) - 2 \ln G(n+2) }

Now, there’s a great approximate formula for the logarithm of a factorial. It’s worth its weight in gold… or at least silver, so it’s called Stirling’s formula:

\displaystyle{ \ln(n!) = n \ln (n)  \; - \; n \; + \;\tfrac{1}{2} \ln(2 \pi n) \; +  \; \frac{1}{12n} \cdots }

where the dots mean stuff that goes to zero when divided by the last term, in the limit n \to \infty.

There’s also an approximate formula for the logarithm of the superfactorial! It’s a bit scary:

\begin{array}{ccl} \ln G(n+2) &=& \displaystyle{ \left(\tfrac{1}{2} \ln(n+1) - \tfrac{3}{4}\right) (n+1)^2 \; + }  \\  \\  &&  \tfrac{\ln(2\pi)}{2} \, (n+1) \; - \\ \\ && \tfrac{1}{12} \ln(n+1) \; + \\ \\  && \tfrac{1}{12} - \ln A \; + \cdots   \end{array}

where the dots mean stuff that actually goes to zero as n \to \infty.

What’s A? It’s the Glaisher–Kinkelin constant! If you’re tired of memorizing digits of pi and need a change of pace, you can look up the first 20,000 digits of the Glaisher–Kinkelin constant here, but roughly we have

A \approx 1.282427129100622636875342...

Of course most mathematicians don’t care much about the digits; what we really want to know is what this constant means!

Euler, who did some wacky nonrigorous calculations, once argued that

\displaystyle{ 1 + 2 + 3 + \cdots = -\tfrac{1}{12} }

Riemann made this rigorous by defining

\displaystyle{ \zeta(s) = 1^{-s} + 2^{-s} + 3^{-s} + \cdots }

which converges when \mathrm{Re}(s) > 1, and then analytically continuing this to other values of s. He found that indeed

\displaystyle{ \zeta(-1) = -\tfrac{1}{12} }

This fact has many marvelous ramifications: for example, it’s why bosonic string theory works best in 24 dimensions! It’s also connected to the \tfrac{1}{12n} term in Stirling’s formula.

But anyway, we might wonder what happens if we compute \zeta(s) for s near -1. This is where the Glaisher–Kinkelin constant shows up, because if we try a Taylor series we get

\displaystyle{ \zeta'(-1) = \tfrac{1}{12} - \ln A }

To me, this means that \tfrac{1}{12} - \ln A is more fundamental than A itself, and indeed you’ll see it’s this combination that shows up in the approximate formula for superfactorials. So, we can simplify that formula a bit:

\begin{array}{ccl} \ln G(n+2) &=& \displaystyle{ \left(\tfrac{1}{2} \ln(n+1) - \tfrac{3}{4}\right) (n+1)^2 \; + }  \\  \\  &&  \tfrac{\ln(2\pi)}{2} \, (n+1) \; -  \\ \\ &&  \tfrac{1}{12} \ln(n+1) \; + \\ \\  &&   \zeta'(-1) \; + \cdots   \end{array}

Now, let’s use this together with Stirling’s formula to estimate the logarithm of the product of all numbers in the nth row of Pascal’s triangle. Remember, that’s

\displaystyle{ \ln(t_n) = (n+1) \ln (n!) - 2 \ln G(n+2) }

So, we get

\begin{array}{ccl} \ln(t_n) &\approx& \displaystyle{(n+1)\Big[ n \ln(n) - n + \tfrac{1}{2} \ln(2 \pi n) + \frac{1}{12n} \Big] } \\  \\  && - \displaystyle{ \Big[  \left( \ln(n+1) - \tfrac{3}{2}\right) (n+1)^2 + \ln(2\pi) \cdot (n+1) -} \\ \\  && \quad \displaystyle{   \tfrac{1}{6} \ln(n+1) + 2\zeta'(-1) \Big]  }  \end{array}

and exponentiating this gives a good approximation to t_n.

Here is a graph of t_n divided by this approximation, created by Egan:

As you can see, the ratio goes to 1 quite nicely!

So, we’ve seem some nice relationships between these things:

1 + 2 + \cdots + n = T_n

1 \cdot 2 \cdot \cdots \cdot n = n!

\binom{n}{1} \cdot \binom{n}{2} \cdot \cdots \cdot \binom{n}{n} = t_n

1! \cdot 2! \cdot \cdots \cdot n! = G(n+2)

\frac{1}{0!} +\frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \cdots = e

and Euler’s wacky formula

\displaystyle{ 1 + 2 + 3 + \cdots = -\tfrac{1}{12} }

Puzzle 2. Can you take the formula

\begin{array}{ccl} \ln(t_n) &\approx& \displaystyle{(n+1)\Big[ n \ln(n) - n + \tfrac{1}{2} \ln(2 \pi n) + \frac{1}{12n} \Big] } \\  \\  && - \displaystyle{ \Big[  \left( \ln(n+1) - \tfrac{3}{2}\right) (n+1)^2 + \ln(2\pi) \cdot (n+1) -} \\ \\  && \quad \displaystyle{   \tfrac{1}{6} \ln(n+1) + 2\zeta'(-1) \Big]  }  \end{array}

and massage it until it looks like n(n+1)/2 plus ‘correction terms’? How big are these correction terms?

References

Any errors in the formulas above are my fault. Here are the papers that first squeezed e out of Pascal’s triangle:

• Harlan J. Brothers, Pascal’s triangle: The hidden stor-e, The Mathematical Gazette, March 2012, 145.

• Harlan J. Brothers, Finding e in Pascal’s triangle, Mathematics Magazine 85 (2012), 51.

I learned the idea from here, thanks to Richard Elwes:

• Alexander Bogomolny, e in the Pascal Triangle, Interactive Mathematics Miscellany and Puzzles.

For how Euler derived his crazy identity

\displaystyle{ 1 + 2 + 3 + \cdots = -\tfrac{1}{12} }

and how it’s relevant to string theory, try:

• John Baez, My favorite numbers: 24.


Network Theory Talks at Oxford

7 February, 2014

I’m giving some talks at Oxford:

Network Theory

Nature and the world of human technology are full of networks. People like to draw diagrams of networks: flow charts, electrical circuit diagrams, signal-flow graphs, Bayesian networks, Feynman diagrams and the like. Mathematically minded people know that in principle these diagrams fit into a common framework: category theory. But we are still far from a unified theory of networks. After an overview, we will look at three portions of the jigsaw puzzle in three separate talks:

I. Electrical circuits and signal-flow graphs.

II. Stochastic Petri nets, chemical reaction networks and Feynman diagrams.

III. Bayesian networks, information and entropy.

If you’re nearby I hope you can come! All these talks will take place in Lecture Theatre B in the Computer Science Department—see the map below. Here are the times:

• Friday 21 February 2014, 2 pm: Network Theory: overview. See the slides or watch a video.

• Tuesday 25 February, 3:30 pm: Network Theory I: electrical circuits and signal-flow graphs. See the slides or watch a video.

• Tuesday 4 March, 3:30 pm: Network Theory II: stochastic Petri nets, chemical reaction networks and Feynman diagrams. See the slides or watch a video.

• Tuesday 11 March, 3:30 pm: Network Theory III: Bayesian networks, information and entropy. See the slides or watch a video

The first talk will be part of the OASIS series, meaning the “Oxford Advanced Seminar on Informatic Structures”.

I thank Samson Abramsky, Bob Coecke and Jamie Vicary of the Computer Science Department for inviting me, and Ulrike Tillmann and Minhyong Kim of the Mathematical Institute for helping me get set up. I also thank all the people who helped do the work I’ll be talking about, most notably Jacob Biamonte, Jason Erbele, Brendan Fong, Tobias Fritz, Tom Leinster, Tu Pham, and Franciscus Rebro.

Ulrike Tillmann has also kindly invited me to give a topology seminar:

Operads and the Tree of Life

Trees are not just combinatorial structures: they are also biological structures, both in the obvious way but also in the study of evolution. Starting from DNA samples from living species, biologists use increasingly sophisticated mathematical techniques to reconstruct the most likely “phylogenetic tree” describing how these species evolved from earlier ones. In their work on this subject, they have encountered an interesting example of an operad, which is obtained by applying a variant of the Boardmann–Vogt “W construction” to the operad for commutative monoids. The operations in this operad are labelled trees of a certain sort, and it plays a universal role in the study of stochastic processes that involve branching. It also shows up in tropical algebra. This talk is based on work in progress with Nina Otter.

I’m not sure exactly where this will take place, but surely somewhere in the Mathematical Institute building:

• Monday 24 February, 3:30 pm, Operads and the Tree of Life. See the slides.

The Computer Science Department is shown in the map here:

 

The Mathematical Institute is a bit to the west:


Categories in Control – Erlangen Talk

6 February, 2014

 

I’m visiting Erlangen from now until the end of May, since my wife got a grant to do research here. I’m trying to get a lot of papers finished. But today I’m giving a talk in the math department of the university here, which with Germanic brevity is called the Friedrich-Alexander-Universität Erlangen-Nürnberg.

You can see my slides here, or maybe even come to my talk:

Categories in control, Thursday 6 February 2014, 16:15–18:00, Mathematics Department of the FAU, in Übungsraum 1.

The title is a pun. It’s about categories in control theory, the branch of engineering that studies dynamical systems with inputs and outputs, and how to optimize their behavior.

Control theorists often describe these systems using signal-flow graphs. Here is a very rough schematic signal-flow graph, describing the all-important concept of a ‘feedback loop’:

Here is a detailed one, describing a specific device called a servo:

The device is shown on top, and the signal-flow graph describing its behavior is at bottom. For details, click on the picture.

Now, if you have a drop of category-theorist’s blood in your veins, you’ll look at this signal-flow graph and think my god, that’s a string diagram for a morphism in a monoidal category!

And you’d be right. But if you want to learn what that means, and why it matters, read my talk slides!

The slides should make sense if you’re a mathematician, but maybe not otherwise. So, here’s the executive summary. The same sort of super-abstract math that handles things like Feynman diagrams:

also handles signal-flow graphs. The details are different in important and fascinating ways, and this is what I’m mainly concerned with. But we now understand how signal-flow graphs fit into the general theory of networks. This means we can proceed to use modern math to study them—and their relation to other kinds of networks, like electrical circuit diagrams:

More talks

Thanks to the Azimuth Project team, my graduate students and many other folks, the dream of network theory as a step toward ‘green mathematics’ seems to be coming true! There’s a vast amount left to be done, so I’d have trouble convincing a skeptic, but I feel the project has turned a corner. I now feel in my bones that it’s going to work: we’ll eventually develop a language for biology and ecology based in part on category theory.

So, I think it’s a good time to explain all the various aspects of this project that have been cooking away—some quite visibly, but others on secret back burners:

• Jacob Biamonte and I have written a book on Petri nets and chemical reaction networks. You may have seen parts of this on the blog. We started this project at the Centre for Quantum Technologies, but now he’s working at the Institute for Scientific Interchange, in Turin—and collaborating with people there on various aspects of network theory.

• Brendan Fong is working with me on electrical circuits. You may know him for his posts here on chemical reaction networks. He’s now a grad student in computer science at Oxford.

• Jason Erbele, a math grad student at U.C. Riverside, is working with me on control theory. This work is the main topic of my talk—but I also sketch how it ties together with what Brendan is doing. There’s a lot more to say here.

• Tobias Fritz, a postdoc at the Perimeter Institute, is working with me on category-theoretic aspects of information theory. We published a paper on entropy with Tom Leinster, and we’ve got a followup on relative entropy that’s almost done. I should be working on it right this instant! But for now, read the series of posts here on Azimuth: Relative Entropy Part 1, Part 2 and Part 3.

• Brendan Fong has also done some great work on Bayesian networks, using ideas that connect nicely to what Tobias and I are doing.

• Tu Pham and Franciscus Rebro are working on the math that underlies all these projects: bicategories of spans.

The computer science department at Oxford is a great place for category theory and diagrammatic reasoning, thanks to the presence of Samson Abramsky, Bob Coecke and others. I’m going to visit them from February 21 to March 14. It seems like a good time to give a series of talks on this stuff. So, stay tuned! I’ll try to make slides available here.


Follow

Get every new post delivered to your Inbox.

Join 3,095 other followers