## Markov Models of Social Change (Part 2)

5 March, 2014

guest post by Vanessa Schweizer

This is my first post to Azimuth. It’s a companion to the one by Alastair Jamieson-Lane. I’m an assistant professor at the University of Waterloo in Canada with the Centre for Knowledge Integration, or CKI. Through our teaching and research, the CKI focuses on integrating fields that appear, at first blush, drastically different in order to make the world a better place. The topics I would like to cover today, mathematics and policy design, are an example of our flavour of knowledge integration. However, before getting into that, perhaps some background on how I got here would be helpful.

### The conundrum of complex systems

For about eight years, I have focused on various problems related to long-term forecasting of social and technological change (long-term meaning in excess of 10 years). I became interested in these problems because they are particularly relevant to how we understand and respond to global environmental changes such as climate change.

In case you don’t know much about global warming or what the fuss is about, part of what makes the problem particularly difficult is that the feedback from the physical climate system to human political and economic systems is exceedingly slow. It is so slow that under traditional economic and political analyses, an optimal policy strategy may appear to be to wait before making any major decisions – that is, wait for scientific knowledge and technologies to improve, or at least wait until the next election [1]. Let somebody else make the tough (and potentially politically unpopular) decisions!

The problem with waiting is that the greenhouse gases that scientists are most concerned about stay in the atmosphere for decades or centuries. They are also churned out by the gigatonne each year. Thus the warming trends that we have experienced for the past 30 years, for instance, are the cumulative result of emissions that happened not only recently but also long ago—in the case of carbon dioxide, as far back as the turn of the 20th century. The world in the 1910s was quainter than it is now, and as more economies around the globe industrialize and modernize, it is natural to wonder: how will we manage to power it all? Will we still rely so heavily on fossil fuels, which are the primary source of our carbon dioxide emissions?

Such questions are part of what makes climate change a controversial topic. Present-day policy decisions about energy use will influence the climatic conditions of the future, so what kind of future (both near-term and long-term) do we want?

### Futures studies and trying to learn from the past

Many approaches can be taken to answer the question of what kind of future we want. An approach familiar to the political world is for a leader to espouse his or her particular hopes and concerns for the future, then work to convince others that those ideas are more relevant than someone else’s. Alternatively, economists do better by developing and investigating different simulations of economic developments over time; however, the predictive power of even these tools drops off precipitously beyond the 10-year time horizon.

The limitations of these approaches should not be too surprising, since any stockbroker will say that when making financial investments, past performance is not necessarily indicative of future results. We can expect the same problem with rhetorical appeals, or economic models, that are based on past performances or empirical (which also implies historical) relationships.

### A different take on foresight

A different approach avoids the frustration of proving history to be a fickle tutor for the future. By setting aside the supposition that we must be able to explain why the future might play out a particular way (that is, to know the ‘history’ of a possible future outcome), alternative futures 20, 50, or 100 years hence can be conceptualized as different sets of conditions that may substantially diverge from what we see today and have seen before. This perspective is employed in cross-impact balance analysis, an algorithm that searches for conditions that can be demonstrated to be self-consistent [3].

Findings from cross-impact balance analyses have been informative for scientific assessments produced by the Intergovernmental Panel on Climate Change, or IPCC. To present a coherent picture of the climate change problem, the IPCC has coordinated scenario studies across economic and policy analysts as well as climate scientists since the 1990s. Prior to the development of the cross-impact balance method, these researchers had to identify appropriate ranges for rates of population growth, economic growth, energy efficiency improvements, etc. through their best judgment.

A retrospective using cross-impact balances on the first Special Report on Emissions Scenarios found that the researchers did a good job in many respects. However, they underrepresented the large number of alternative futures that would result in high greenhouse gas emissions in the absence of climate policy [4].

As part of the latest update to these coordinated scenarios, climate change researchers decided it would be useful to organize alternative futures according to socio-economic conditions that pose greater or fewer challenges to mitigation and adaptation. Mitigation refers to policy actions that decrease greenhouse gas emissions, while adaptation refers to reducing harms due to climate change or to taking advantage of benefits. Some climate change researchers argued that it would be sufficient to consider alternative futures where challenges to mitigation and adaptation co-varied, e.g. three families of futures where mitigation and adaptation challenges would be low, medium, or high.

Instead, cross-impact balances revealed that mixed-outcome futures—such as socio-economic conditions simultaneously producing fewer challenges to mitigation but greater challenges to adaptation—could not be completely ignored. This counter-intuitive finding, among others, brought the importance of quality of governance to the fore [5].

Although it is generally recognized that quality of governance—e.g. control of corruption and the rule of law—affects quality of life [6], many in the climate change research community have focused on technological improvements, such as drought-resistant crops, or economic incentives, such as carbon prices, for mitigation and adaptation. The cross-impact balance results underscored that should global patterns of quality of governance across nations take a turn for the worse, poor governance could stymie these efforts. This is because the influence of quality of governance is pervasive; where corruption is permitted at the highest levels of power, it may be permitted at other levels as well—including levels that are responsible for building schools, teaching literacy, maintaining roads, enforcing public order, and so forth.

The cross-impact balance study revealed this in the abstract, as summarized in the example matrices below. Alastair included a matrix like these in his post, where he explained that numerical judgments in such a matrix can be used to calculate the net impact of simultaneous influences on system factors. My purpose in presenting these matrices is a bit different, as the matrix structure can also explain why particular outcomes behave as system attractors.

In this example, a solid light gray square means that the row factor directly influences the column factor some amount, while white space means that there is no direct influence:

Dark gray squares along the diagonal have no meaning, since everything is perfectly correlated to itself. The pink squares highlight the rows for the factors “quality of governance” and “economy.” The importance of these rows is more apparent here; the matrix above is a truncated version of this more detailed one:


The pink rows are highlighted because of a striking property of these factors. They are the two most influential factors of the system, as you can see from how many solid squares appear in their rows. The direct influence of quality of governance is second only to the economy. (Careful observers will note that the economy directly influences quality of governance, while quality of governance directly influences the economy). Other scholars have meticulously documented similar findings through observations [7].
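
To make this concrete, here is a minimal Python sketch of the “count the solid squares” idea. The factor names and 0/1 entries below are invented for illustration (the real matrices appear in reference [5]); the point is only that a factor’s direct influence can be read off as the number of other factors its row touches.

```python
# Hypothetical influence pattern: influence[i][j] = 1 means that row
# factor i directly influences column factor j (a "solid square").
factors = ["population", "economy", "governance", "education", "health"]
influence = [
    [0, 1, 0, 0, 1],  # population
    [1, 0, 1, 1, 1],  # economy (influences governance...)
    [1, 1, 0, 1, 0],  # governance (...and influences economy back)
    [0, 1, 0, 0, 1],  # education
    [1, 0, 0, 0, 0],  # health
]

# Rank factors by how many other factors their row directly influences.
ranked = sorted(zip(factors, (sum(row) for row in influence)),
                key=lambda pair: -pair[1])
for name, score in ranked:
    print(f"{name}: directly influences {score} other factors")
```

With these made-up entries the economy ranks first and quality of governance second, matching the qualitative pattern described above.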

As a method for climate policy analysis, cross-impact balances fill an important gap between genius forecasting (i.e., ideas about the far-off future espoused by one person) and scientific judgments that, in the face of deep uncertainty, are overconfident (i.e. neglecting the ‘fat’ or ‘long’ tails of a distribution).

### Wanted: intrepid explorers of future possibilities

However, alternative visions of the future are only part of the information that’s needed to create the future that is desired. Descriptions of courses of action that are likely to get us there are also helpful. In this regard, the post by Jamieson-Lane describes early work on modifying cross-impact balances for studying transition scenarios rather than searching primarily for system attractors.

This is where you, as the mathematician or physicist, come in! I have been working with cross-impact balances as a policy analyst, and I can see the potential of this method to revolutionize policy discussions—not only for climate change but also for policy design in general. However, as pointed out by entrepreneurship professor Karl T. Ulrich, design problems are NP-complete. Those of us with lesser math skills can be easily intimidated by the scope of such search problems. For this reason, many analysts have resigned themselves to ad hoc explorations of the vast space of future possibilities. However, some analysts like me think it is important to develop methods that do better. I hope that some of you Azimuth readers may be up for collaborating with like-minded individuals on the challenge!

### References

The graph of carbon emissions is from reference [2]; the pictures of the matrices are adapted from reference [5]:

[1] M. Granger Morgan, Milind Kandlikar, James Risbey and Hadi Dowlatabadi, Why conventional tools for policy analysis are often inadequate for problems of global change, Climatic Change 41 (1999), 271–281.

[2] T.F. Stocker et al., Technical Summary, in Climate Change 2013: The Physical Science Basis. Contribution of Working Group I to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change (2013), T.F. Stocker, D. Qin, G.-K. Plattner, M. Tignor, S.K. Allen, J. Boschung, A. Nauels, Y. Xia, V. Bex, and P.M. Midgley (eds.) Cambridge University Press, New York.

[3] Wolfgang Weimer-Jehle, Cross-impact balances: a system-theoretical approach to cross-impact analysis, Technological Forecasting & Social Change 73 (2006), 334–361.

[4] Vanessa J. Schweizer and Elmar Kriegler, Improving environmental change research with systematic techniques for qualitative scenarios, Environmental Research Letters 7 (2012), 044011.

[5] Vanessa J. Schweizer and Brian C. O’Neill, Systematic construction of global socioeconomic pathways using internally consistent element combinations, Climatic Change 122 (2014), 431–445.

[6] Daniel Kaufmann, Aart Kraay and Massimo Mastruzzi, Worldwide Governance Indicators (2013), The World Bank Group.

[7] Daron Acemoglu and James Robinson, Why Nations Fail: The Origins of Power, Prosperity, and Poverty. Website.

## Network Theory I

2 March, 2014

Here’s a video of a talk I gave last Tuesday—part of a series. You can see the slides here:

One reason I’m glad I gave this talk is because afterwards Jamie Vicary pointed out some very interesting consequences of the relations among signal-flow diagrams listed in my talk. It turns out they imply equations familiar from the theory of complementarity in categorical quantum mechanics!

This is the kind of mathematical surprise that makes life worthwhile for me. It seemed utterly shocking at first, but I think I’ve figured out why it happens. Now is not the time to explain… but I’ll have to do it soon, both here and in the paper that Jason Erbele and I are writing about control theory.

• Brendan Fong, A compositional approach to control theory.

## Markov Models of Social Change (Part 1)

24 February, 2014

guest post by Alastair Jamieson-Lane

The world is complex, and making choices in a complex world is sometimes difficult.

As any leader knows, decisions must often be made with incomplete information. To make matters worse, the experts and scientists who are meant to advise on these important matters are also doing so with incomplete information—usually limited to only one or two specialist fields. When decisions need to be made that are dependent on multiple real-world systems, and your various advisors find it difficult to communicate, this can be problematic!

The generally accepted approach is to listen to whichever advisor tells you the things you want to hear.

When such an approach fails (for whatever mysterious and inexplicable reason) it might be prudent to consider such approaches as Bayesian inference, analysis of competing hypotheses or cross-impact balance analysis.

Because these methods require experts to formalize their opinions in an explicit, discipline neutral manner, we avoid many of the problems mentioned above. Also, if everything goes horribly wrong, you can blame the algorithm, and send the rioting public down to the local university to complain there.

In this blog article I will describe cross-impact balance analysis and a recent extension to this method, explaining its use, as well as some basic mathematical underpinnings. No familiarity with cross-impact balance analysis will be required.

### Wait—who is this guy?

Since this is my first time writing a blog post here, I hear introductions are in order.

Hi. I’m Alastair.

I am currently a Master’s student at the University of British Columbia, studying mathematics. In particular, I’m aiming to use evolutionary game theory to study academic publishing and hiring practices… and from there hopefully move on to studying governments (we’ll see how the PhD goes). I figure that both those systems seem important to solving the problems we’ve built for ourselves, and both may be under increasing pressure in coming years.

But that’s not what I’m here for today! Today I’m here to tell the story of cross-impact balance analysis, a tool I was introduced to at the complex systems summer school in Santa Fe.

### The story

Suppose (for example) that the local oracle has foretold that burning the forests will anger the nature gods

… and that if you do not put restrictions in place, your crops will wither and die.

Well, that doesn’t sound very good.

The merchant’s guild claims that such restrictions will cause all trade to grind to a halt.

Your most trusted generals point out that weakened trade will leave you vulnerable to invasion from all neighboring kingdoms.

The sailors’ guild adds that the wrath of Poseidon might make nautical trade more difficult.

The alchemists propose alternative sources of heat…

… while the druids propose special crops as a way of resisting the wrath of the gods…

… and so on.

Given this complex web of interaction, it might be a good time to consult the philosophers.

### Overview of CIB

This brings us to the question of what CIB (Cross-Impact Balance) analysis is, and how to use it.

At its heart, CIB analysis demands this: first, you must consider what aspects of the world you are interested in studying. This could be environmental or economic status, military expenditure, or the laws governing genetic modification. These we refer to as “descriptors”. For each “descriptor” we must create a list of possible “states”.

For example, if the descriptor we are interested in were “global temperature change” our states might be “+5 degrees”, “+4 degrees” and so on down to “-2 degrees”.

The states of a descriptor are not meant to be all-encompassing, or offer complete detail, and they need not be numerical. For example, the descriptor “Agricultural policy” might have such states as “Permaculture subsidy”, “Genetic engineering”, “Intensive farming” or “No policy”.

For each of these states, we ask our panel of experts whether such a state would increase or decrease the tendency for some other descriptor to be in a particular state.

For example, we might ask: “On a scale from -3 to 3, how much does the agricultural policy of Intensive farming increase the probability that we will see global temperature increases of +2 degrees?”

By combining the opinions of a variety of experts in each field, and weighting based on certainty and expertise, we are able to construct matrices, much like the one below:

The above matrix is a description of my ant farm. The health of my colony is determined by the population, income, and education levels of my ants. For a less ant-focused version of the above, please refer to:

• Elisabeth A. Lloyd and Vanessa J. Schweizer, Objectivity and a comparison of methodological scenario approaches for climate change research, Synthese (2013).

For any possible combination of descriptor states (referred to as a scenario) we can calculate the total impact on all possible descriptors. In the current scenario we have low population, high income and medium education (see highlighted rows).

Because the current scenario has high ant income, this strongly influences us to have low population (+3) and prevents a jump to high population (-3). This combined with the non-influence from education (zeros) leads to low population being the most favoured state for our population descriptor. Thus we expect no change. We say this is “consistent”.

Education, however, is a different story. Here we have a strong influence towards high education levels (summing the column gives a total of 13). Thus our current state (medium education) is inconsistent, and we would expect the abundance of ant wealth to lead to an improvement in the ant schooling system.

Classical CIB analysis acts as a way to classify which hypothetical situations are consistent, and which are not.
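
For readers who like to see the bookkeeping spelled out, here is a minimal Python sketch of a consistency check on the ant farm. Most matrix entries are invented; only a few values quoted in this post (the +3/-3 influence of high income on population, and the high-education column summing to 13) are built in.

```python
# Minimal CIB consistency check with mostly invented judgment values.
descriptors = {
    "population": ["low", "medium", "high"],
    "income":     ["low", "medium", "high"],
    "education":  ["low", "medium", "high"],
}

# cross_impact[(d1, s1)][(d2, s2)] = how much descriptor d1 being in
# state s1 promotes (+) or discourages (-) descriptor d2 being in s2.
# Unlisted pairs default to 0 (no direct influence).
cross_impact = {
    ("income", "high"): {
        ("population", "low"): 3, ("population", "high"): -3,
        ("education", "high"): 7,
    },
    ("population", "low"): {
        ("education", "high"): 6,
    },
}

def impact(scenario, descriptor, state):
    """Total impact of the scenario on (descriptor, state), ignoring
    the descriptor's influence on itself."""
    return sum(cross_impact.get((d, s), {}).get((descriptor, state), 0)
               for d, s in scenario.items() if d != descriptor)

def is_consistent(scenario):
    """Consistent = every descriptor already sits in a most favoured state."""
    for d, states in descriptors.items():
        scores = {s: impact(scenario, d, s) for s in states}
        if scores[scenario[d]] < max(scores.values()):
            return False
    return True

scenario = {"population": "low", "income": "high", "education": "medium"}
print(is_consistent(scenario))  # False: high education scores 13, medium 0
```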

Now, it is all well and good to claim that some scenarios are stable, but the real use of such a tool is in predicting (and influencing) the future.

By applying a deterministic rule that determines how inconsistencies are resolved, we can produce a “succession rule”. The most straightforward example is to replace all descriptor states with whichever state is most favoured by the current scenario. In the example above we would switch to “low population, medium income, high education”. A generation later we would switch back to “low population, high income, medium education”, soon finding ourselves trapped in a loop.

All such rules will always lead to either a loop or a “sink”: a self-consistent scenario which is succeeded only by itself.
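
Continuing the sketch above (and reusing its `descriptors`, `cross_impact` and `impact` definitions), a deterministic succession rule is just an update loop that watches for repeats:

```python
# Deterministic succession: every descriptor jumps to its most favoured
# state (keeping its current state on ties), until we hit a repeat.
def most_favoured(scenario):
    new = {}
    for d, states in descriptors.items():
        scores = {s: impact(scenario, d, s) for s in states}
        if scores[scenario[d]] == max(scores.values()):
            new[d] = scenario[d]          # current state is already favoured
        else:
            new[d] = max(states, key=lambda s: scores[s])
    return new

def succession(scenario, max_steps=50):
    seen = [scenario]
    for _ in range(max_steps):
        scenario = most_favoured(scenario)
        if scenario in seen:              # a sink if it repeats immediately,
            return seen + [scenario]      # otherwise we have entered a loop
        seen.append(scenario)
    return seen

for step in succession({"population": "low", "income": "high",
                        "education": "medium"}):
    print(step)
```

With the toy numbers above the system reaches a sink after one step; the richer matrix in this post instead produces the two-scenario loop just described.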

So, how can we use this? How will this help us deal with the wrath of the gods (or ant farms)?

Firstly: we can identify loops and consistent scenarios which we believe are most favourable. It’s all well and good imagining some future utopia, but if it is inconsistent with itself, and will immediately lead to a slide into less favourable scenarios then we should not aim for it, we should find that most favourable realistic scenario and aim for that one.

Secondly: We can examine all our consistent scenarios, and determine whose “basin of attraction” we find ourselves in: that is, which scenario we are likely to end up in.

Thirdly: Suppose we could change our influence matrix slightly: how would we change it to favour the scenarios we most prefer? If you don’t like the rules, change the game—or at the very least find out WHAT we would need to change to have the best effect.

### Concerns and caveats

So… what are the problems we might encounter? What are the drawbacks?

Well, first of all, we note that the real world does not tend to reach any form of eternal static scenario or perfect cycle. The fact that our model does might be regarded as reason for suspicion.

Secondly, although the classical method contains succession analysis, this analysis is not necessarily intended as a completely literal “prediction” of events. It gives a rough idea of the basins of attraction of our cycles and consistent scenarios, but is also somewhat arbitrary. What succession rule is most appropriate? Do all descriptors update simultaneously? Or only the one with the most “pressure”? Are our descriptors given in order of malleability, and only the fastest changing descriptor will change?

Thirdly, in collapsing our description of the world down into a finite number of states we are ignoring many tiny details. Most of these details are not important, but in assuming that our succession rules are deterministic, we imply that these details have no impact whatsoever.

If we instead treat succession as a somewhat random process, the first two of these problems can be solved, and the third somewhat reduced.

### Stochastic succession

In the classical CIB succession analysis, some rule is selected which deterministically decides which scenario follows from the present. Stochastic succession analysis instead tells us the probability that a given scenario will lead to another.

The simplest example of a stochastic succession rule is to simply select a single descriptor at random each time step, and only consider updates that might happen to that descriptor. This we refer to as dice succession. This (in some ways) represents hidden information: two systems that might look identical on the surface from the point of view of our very blockish CIB analysis might be different enough underneath to lead to different outcomes. If we have a shaky agricultural system, but a large amount of up-and-coming research, then which of these two factors becomes important first is down to the luck of the draw. Rather than attempt to model this fine detail, we instead merely accept it and incorporate this uncertainty into our model.

Even this most simplistic change leads to dramatic effects on our system. Most importantly, almost all cycles vanish from our results, as forks in the road allow us to diverge from the path of the cycle.

We can take stochastic succession further and consider more exotic rules for our transitions, ones that allow any transition to take place, not merely those that are most favored. For example:

$P(x,y) = A e^{I_x(y)/T}$

Here $x$ is our current scenario, $y$ is some possible future scenario, and $I_x(y)$ is the total impact score of $y$ from the perspective of $x$. $A$ is a simple normalizing constant, and $T$ is our system’s temperature. High temperature systems are dominated by random noise, while low temperature systems are dominated by the influences described by our experts.

Impact score is calculated by summing the impact of each state of our current scenario on each state of our target scenario. For example, using the matrix above, suppose we want to find $I_x(y)$ when $x$ is the scenario “low population, high income, medium education” and $y$ is the scenario “medium population, medium income, high education”. We consider all numbers that sit in rows corresponding to states of $x$ and in columns corresponding to states of $y$. This gives:

$I_x(y)= (0+0+0) + (-2 +0 +10) +(6+7+0) = 21$

Here each bracket refers to the sum of a particular column.
More generically we can write the formula as:

$\displaystyle{ I_x(y)= \sum_{i \subset x, \;j \subset y} M_{i,j} }$

Here $M_{i,j}$ refers to an entry in our cross-impact balance matrix, $i$ and $j$ are both states, and $i \subset x$ reads as “$i$ is a state of $x$”.

We refer to this function for computing transition probabilities as the Boltzmann succession law, due to its similarity to the Boltzmann distribution found in physics. We use it merely as an example, and by no means wish to imply that we expect the transitions of our true system to behave in a precisely Boltzmann-like manner. Alternative functions can, and should, be experimented with. The Boltzmann succession law is, however, an effective example with a number of nice properties: the weight $e^{I_x(y)/T}$ is positive for every transition and unbounded above; the probabilities are unchanged by adding a constant to every element of the cross-impact balance matrix; and the law contains adjustable parameters.

The Boltzmann succession rule is what I will refer to as fully stochastic: it allows transitions even against our experts’ judgement (with low probability). This is in contrast to dice succession, which picks a direction at random but still contains scenarios from which our system cannot escape.
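
Here is a self-contained sketch of the Boltzmann succession law on a toy two-descriptor system. The descriptor names and judgment values are invented; the point is only to show how $I_x(y)$ and $P(x,y)$ are computed.

```python
# Boltzmann succession law on a toy system with invented judgments.
import itertools, math

states = {"agriculture": ["intensive", "permaculture"],
          "temperature": ["+2 degrees", "0 degrees"]}

# M[(i, j)] = impact of state i on state j; unlisted pairs are 0.
M = {
    (("agriculture", "intensive"),    ("temperature", "+2 degrees")): 3,
    (("agriculture", "permaculture"), ("temperature", "0 degrees")):  2,
    (("temperature", "+2 degrees"),   ("agriculture", "permaculture")): 1,
}

scenarios = [dict(zip(states, combo))
             for combo in itertools.product(*states.values())]

def I(x, y):
    """Total impact score of scenario y from the perspective of x:
    the sum of M[i][j] over states i of x and states j of y."""
    return sum(M.get((i, j), 0) for i in x.items() for j in y.items())

def boltzmann(x, T=1.0):
    """P(x, y) proportional to exp(I_x(y)/T); high T means noise wins,
    low T means the experts' judgments dominate."""
    weights = {tuple(y.values()): math.exp(I(x, y) / T) for y in scenarios}
    A = 1.0 / sum(weights.values())       # the normalizing constant
    return {y: A * w for y, w in weights.items()}

print(boltzmann(scenarios[0], T=1.0))
```

Raising `T` flattens the probabilities toward uniform noise, while lowering it concentrates them on the transitions our hypothetical experts favour.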

### Effects of stochastic succession

‘Partially stochastic’ processes such as the dice rule have very limited effect on the long term behavior of the model. Aside from removing most cycles, they behave almost exactly like our deterministic succession rules. So, let us instead discuss the more interesting fully stochastic succession rules.

In the fully stochastic system we can ask “after a very long time, what is the probability we will be in scenario $x$?”

By asking this question we can get some idea of the relative importance of all our future scenarios and states.

For example, if the scenario “high population, low education, low income” has a 40% probability in the long term, while most other scenarios have a probability of 0.2%, we can see that this scenario is crucial to the understanding of our system. Often scenarios already identified by deterministic succession analysis are the ones with the greatest long term probability—but by looking at long term probability we also gain information about the relative importance of each scenario.
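
One way to estimate these long-run probabilities, continuing the Boltzmann sketch above (and reusing its `scenarios` and `boltzmann` definitions), is plain power iteration: start from a uniform distribution over scenarios and repeatedly apply the transition probabilities.

```python
# Long-run scenario probabilities by power iteration.
def long_run(T=1.0, steps=1000):
    keys = [tuple(s.values()) for s in scenarios]
    P = {tuple(x.values()): boltzmann(x, T) for x in scenarios}
    dist = {k: 1.0 / len(keys) for k in keys}     # uniform starting guess
    for _ in range(steps):
        dist = {k: sum(dist[j] * P[j][k] for j in keys) for k in keys}
    return dist

for scenario, p in sorted(long_run(T=1.0).items(), key=lambda kv: -kv[1]):
    print(f"{p:.3f}  {scenario}")
```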

In addition, we can encounter scenarios which are themselves inconsistent, but form cycles and/or clusters of interconnected scenarios. We can also notice scenarios that while technically ‘consistent’ in the deterministic rules are only barely so, and have limited weight due to a limited basin of attraction. We might identify scenarios that seem familiar in the real world, but are apparently highly unlikely in our analysis, indicating either that we should expect change… or perhaps suggesting a missing descriptor or a cross-impact in need of tweaking.

Armed with such a model, we can investigate what we can do to increase the short term and long term likelihood of desirable scenarios, and decrease the likelihood of undesirable scenarios.

As a last note, here are a few freely available resources that may prove useful. For a more formal introduction to CIB, try:

• Wolfgang Weimer-Jehle, Cross-impact balances: a system-theoretical approach to cross-impact analysis, Technological Forecasting & Social Change 73 (2006), 334–361.

• Wolfgang Weimer-Jehle, Properties of cross-impact balance analysis.

You can find free software for doing a classical CIB analysis here:

• ZIRIUS, ScenarioWizard.

ZIRIUS is the Research Center for Interdisciplinary Risk and Innovation Studies of the University of Stuttgart.

Here are some examples of CIB in action:

• Gerhard Fuchs, Ulrich Fahl, Andreas Pyka, Udo Staber, Stefan Voegele and Wolfgang Weimer-Jehle, Generating innovation scenarios using the cross-impact methodology, Department of Economics, University of Bremen, Discussion-Papers Series No. 007-2008.

• Ortwin Renn, Alexander Jäger, Jürgen Deuschle and Wolfgang Weimer-Jehle, A normative-functional concept of sustainability and its indicators, International Journal of Global Environmental Issues 9 (2008), 291–317.

Finally, this page contains a more complete list of articles, both practical and theoretical:

## Network Theory Overview

22 February, 2014

Here’s a video of a talk I gave yesterday, made by Brendan Fong. You can see the slides here—and then click the items in blue, and the pictures, for more information!

The idea: nature and the world of human technology are full of networks! People like to draw diagrams of networks. Mathematical physicists know that in principle these diagrams can be understood using category theory. But why should physicists have all the fun? This is the century of understanding living systems and adapting to life on a finite planet. Math isn’t the main thing we need for this, but it’s got to be part of the solution… so one thing we should do is develop a unified and powerful theory of networks.

We are still far from doing this. In this overview, I briefly described three parts of the jigsaw puzzle, and invited everyone to help fit them together:

• electrical circuits and signal-flow graphs.

• stochastic Petri nets, chemical reaction networks and Feynman diagrams.

• Bayesian networks, information and entropy.

In my talks coming up, I’ll go into more detail on each of these. With luck, you’ll be able to see videos here.

But if you’re near Oxford, you might as well actually attend! You can see dates, times, locations, my slides, and the talks themselves as they show up by going here.

## Finding and Solving Problems

18 February, 2014

Luke Muehlhauser, executive director of the Machine Intelligence Research Institute, has been doing some interviews:

Recently he interviewed me. Here’s how it went.

LM: In a previous interview, I asked Scott Aaronson which “object-level research tactics” he finds helpful when trying to make progress in theoretical research, and I provided some examples. Do you have any comments on the research tactics that Scott and I listed? Which recommended tactics of your own would you add to the list?

JB: What do you mean by “object-level” research tactics? I’ve got dozens of tactics. Some of them are ways to solve problems. But equally important, or maybe more so, are tactics for coming up with problems to solve: problems that are interesting but still easy enough to solve. By “object-level”, do you mean the former?

LM: Both! Conceiving of—and crisply posing—good research problems can often be even more important than solving previously-identified research problems.

JB: Okay. Here are some of my tactics.

(1) Learn a lot. Try to understand how the whole universe works, from the philosophical, logical, mathematical and physical aspects to chemistry, biology, and the sciences based on those, to the historical sciences such as cosmology, paleontology, archaeology and history, to the social sciences such as psychology, sociology, anthropology, politics and economics, to the aspects that are captured best in literature, art and music.

It’s a never-ending quest, and obviously it pays to specialize and become more of an expert on a few things – but the more angles you can take on any subject, the more likely you are to stumble on good questions or good answers to existing questions. Also, when you get stuck on a problem, or get tired, it can be really re-energizing to learn new things.

(2) Keep synthesizing what you learn into terser, clearer formulations. The goal of learning is not to memorize vast amounts of data. You need to do serious data compression, and filter out the noise. Very often people will explain things to you in crappy ways, presenting special cases and not mentioning the general rules, stating general rules incorrectly, and so on.

This process goes on forever. When you first learn algebraic topology, for example, they teach you homology theory. At the beginner’s level, this is presented as a rather complicated recipe for taking a topological space and getting a list of groups out of it. By looking at examples you get insight into what these groups do: the nth one counts the n-dimensional holes, in some sense. You learn how to use them to solve problems, and how to efficiently compute them.

But later—much later, in my case—you learn that algebraic topology of this sort is not really about topological spaces, but about something more abstract, called “homotopy types”. This is a discovery that happened rather slowly. It crystallized around 1968, when a guy named Quillen wrote a book on “homotopical algebra”. It’s always fascinating when this happens: when people in some subject learn that its proper object of study is not what they had thought!

But even this was just the beginning: a lot has happened in math since the 1960s. Shortly thereafter, Grothendieck came along and gave us a new dream of what homotopy types might actually be. Very roughly, he realized that they should show up naturally if we think of “equality” as a process—the process of proving two things are the same—rather than a static relationship.

I’m being pretty vague here, but I want to emphasize that this was a very fundamental discovery with widespread consequences, not a narrow technical thing.

For a long time people have struggled to make Grothendieck’s dream precise. I was involved in that myself for a while. But in the last 5 years or so, a guy named Voevodsky made a lot of progress by showing us how to redo the foundations of mathematics so that instead of treating equality as a mere relationship, it’s a kind of process. This new approach gives an alternative to set theory, where we use homotopy types right from the start as the basic objects of mathematics, instead of sets. It will take about a century for the effects of this discovery to percolate through all of math.

So, you see, by taking something important but rather technical, like algebraic topology, and refusing to be content with treating it as a bunch of recipes to be memorized, you can dig down into deep truths. But it takes great persistence. Even if you don’t discover these truths yourself, but merely learn them, you have to keep simplifying and unifying.

(3) Look for problems, not within disciplines, but in the gaps between existing disciplines. The division of knowledge into disciplines is somewhat arbitrary, and people put most of their energy into questions that lie squarely within disciplines, so it shouldn’t be surprising that many interesting things are lurking in the gaps, waiting to be discovered.

At this point, tactics (1) and (2) really come in handy. If you study lots of subjects and keep trying to distill their insights into terse, powerful formulations, you’re going to start noticing points of contact between these subjects. Sometimes these will be analogies that deserve to be made precise. Sometimes people in one subject know a trick that people in some other subject could profit from. Sometimes people in one subject have invented the hammer, and people in another have invented the nail—and neither know what these things are good for!

(4) Talk to lots of people. This is a great way to broaden your vistas and find connections between seemingly different subjects.

Talk to the smartest people who will condescend to talk to you. Don’t be afraid to ask them questions. But don’t bore them. Smart people tend to be easily bored. Try to let them talk about what’s interesting to them, instead of showing off and forcing them to listen to your brilliant ideas. But make sure to bring them some “gifts” so they’ll want to talk to you again. “Gifts” include clear explanations of things they don’t understand, and surprising facts—little nuggets of knowledge.

One of my strategies for this was to write This Week’s Finds, explaining lots of advanced math and physics. You could say that column is a big pile of gifts. I started out as a nobody, but after ten years or so, lots of smart people had found out about me. So now it’s pretty easy for me to blunder into any subject, write a blog post about it, and get experts to correct me or tell me more. I also get invited to give talks, where I meet lots of smart people.

LM: You’ve explained some tactics for how to come up with problems to solve. Once you generate a good list, how do you choose among them?

JB: Here are two bits of advice on that.

(1) Actually write down lists of problems.

When I was just getting started, I had a small stock of problems to think about – so small that I could remember most of them. Many were problems I’d heard from other people, but most of those were too hard. I would also generate my own problems, but they were often either too hard, too vague, or too trivial.

In more recent years I’ve been able to build up a huge supply of problems to think about. This means I need to actually list them. Often I generate these lists using the ‘data compression’ tactic I mentioned in part (2) of my last answer. When I learn stuff, I ask:

• Is this apparently new concept or fact a special case of some concept or fact I already know?

• Given two similar-sounding concepts or facts, can I find a third having both of these as special cases?

• Can I use the analogy between X and Y to do something new in subject Y that’s analogous to something people have already done in subject X?

• Given a rough ‘rule of thumb’, can I state it more precisely so that it holds always, or at least more often?

as well as plenty of more specific questions.

So, instead of being ‘idea-poor’, with very few problems to work on, I’m now ‘idea-rich’, and the challenge is keeping track of all the problems and finding the best ones.

I always carry around a notebook. I write down questions that seem interesting, especially when I’m bored. The mere act of writing them down either makes them less vague or reveals them to be hopelessly fuzzy. Sometimes I can solve a problem just by taking the time to state it precisely. And the act of writing down questions naturally triggers more questions.

Besides questions, I like ‘analogy charts’, consisting of two or more columns with analogous items lined up side by side. You can see one near the bottom of my 2nd article on quantropy. Quantropy is an idea born of the analogy between thermodynamics and quantum mechanics. This is a big famous analogy, which I’d known for decades, but writing down an analogy chart made me realize there was a hole in the analogy. In thermodynamics we have entropy, so what’s the analogous thing in quantum mechanics? It turns out there’s an answer: quantropy.

I later wrote a paper with Blake Pollard on quantropy, but I gave a link to the blog article because that’s another aspect of how I keep track of questions. I don’t just write lists for myself—I write blog articles about things that I want to understand better.

(2) Only work on problems when you think they’re important and you see how to solve them.

This tactic isn’t for everyone, but it works for me. When I was just getting started I would try to solve problems that I had no idea how to solve. People who are good at puzzles may succeed this way, but I generally did not.

It turns out that for me, a better approach is to make long lists of questions, and keep thinking about them on and off for many years. I slowly make progress until—poof!—I think I see something new and important. Only then do I take a problem off the back burner and start intensely working on it.

The physicist John Wheeler put it this way: you should never do a calculation until you already know the answer. That’s a bit of an exaggeration, because it’s also good to fool around and see where things go. But there’s a lot more truth to it than you might think.

Feynman had a different but related rule of thumb: he only worked on a problem when he felt he had an “inside track” on it—some insight or trick up his sleeve that nobody else had.

LM: And once you’ve chosen a problem to solve, what are some of your preferred tactics for actually solving it?

JB: By what I’ve said before, it’s clear that I get serious about a problem only after I have a darn good idea of how to solve it. At the very least, I believe I know what to do. So, I just do it.

But usually it doesn’t work quite that easily.

If you only officially tackle problems after absolutely every wrinkle has been ironed out by your previous musings, you’re being too cautious: you’ll miss working on a lot of interesting things. Many young researchers seem to fall prey to the opposite error, and waste time being completely stuck. The right balance lies in the middle. You break a problem down into sub-problems, and break those down into sub-subproblems… and you decide you’re ready to go when all these sub-subproblems seem likely to be doable, even before you’ve worked through the details.

How can you tell if they’re doable? This depends a lot on having previous experience with similar problems. If you’re a newbie, things that seem hard to you can be really easy to experts, while things that seem easy can turn out to be famously difficult.

Even with experience, some of the sub-subproblems that seem likely to be routine will turn out to be harder than expected. That’s where the actual work comes in. And here it’s good to have lots of tricks. For example:

(1) If you can’t solve a problem, there should be a similar problem that’s a bit easier. Try solving that. And if you can’t solve that one… use the same principle again! Keep repeating until you get down to something you can solve. Then climb your way back up, one step at a time.

Don’t be embarrassed to simplify a problem to the point where you can actually do it.

(2) There are lots of ways to make a problem easier. Sometimes you should consider a special case. In math there are special cases of special cases of special cases… so there’s a lot of room for exploration here. If you see how enough special cases work, you’ll get ideas that may help you for your original problem.

(3) On the other hand, sometimes a problem becomes simpler when you generalize, leaving out potentially irrelevant details. Often people get stuck in clutter. But if it turns out the generalization doesn’t work, it may help you see which details were actually relevant.

(4) Sometimes instead of down or up the ladder of generality it pays to move across, by considering an analogous problem in a related field.

(5) Finally, a general hint: keep a written record of your efforts to solve a problem, including explanations of what didn’t work, and why. Look back at what you wrote from time to time. It’s amazing how often I come close to doing something right, forget about it, and come back later—sometimes years later—and see things from a slightly different angle, which makes everything fall into place. Failure can be just millimeters from success.

## Relative Entropy (Part 4)

16 February, 2014

In recent posts by Manoj Gopalkrishnan and Marc Harper we’ve seen how not just entropy but relative entropy—the entropy of a probability distribution relative to the equilibrium distribution—is a driving force in chemistry and evolution. Now Tobias Fritz and I have finally finished our paper on this subject:

Very roughly, we proved that any reasonable measure of the information you gain when you update your assumptions about the world based on discovering what a system is really doing must be some constant $c$ times the relative entropy.

Relative Entropy (Part 1): how various structures important in probability theory arise naturally when you do linear algebra using only the nonnegative real numbers.

Relative Entropy (Part 2): a category related to statistical inference, $\mathrm{FinStat},$ and how relative entropy defines a functor on this category.

Relative Entropy (Part 3): statement of our main theorem, which characterizes relative entropy up to a constant multiple as the only functor $F : \mathrm{FinStat} \to [0,\infty)$ with a few nice properties.

Now that the paper is done, we’re having a nice conversation about it on the n-Category Café. Since I don’t want to splinter the conversation, I won’t enable comments here—please go there and join the fun!

One thing is that our conversation is getting more deeply into the category-theoretic aspects. Read the long parenthetical remarks in my post on the n-Café to get up to speed on that aspect.

Another is that by looking at our paper, you can actually see the proof of our result. As I mention on the n-Café:

The proof is surprisingly hard. Or maybe we’re just surprisingly bad at proving things. But the interesting thing is this: the proof is swift and effective in the ‘generic’ case—the case where the support of the probability measures involved is the whole set they’re living on, and the constant $c$ is finite.

It takes some more work to handle the case where the probability measures have smaller support.

But the really hard work starts when we handle the case that, in the end, has $c = \infty.$ Then the proof becomes more like analysis than what you normally expect in category theory. We slowly corner the result, blocking off all avenues of escape. Then we close in, grab its neck, and strangle it, crushing its larynx ever tighter, as it loses the will to fight back and finally expires… still twitching.

We haven’t gotten into discussing this much yet, perhaps because the mathematicians on the n-Café are too dainty and civilized. But someone into analysis might be able to find a more efficient proof.

That would make me a bit sad—since why didn’t we find it?—but mainly happy—since this subject deserves to be clean and elegant. We really need a category-theoretic formulation of the second law of thermodynamics that’s suitable for studying complex networks: that’s the long-term goal here.

## Triangular Numbers

12 February, 2014

This post is just for fun. I’ll start with Pascal’s triangle and show you the number e hiding inside it. Using this, we’ll see how the product of all numbers in the nth row of Pascal’s triangle is related to the nth triangular number.

That’s cute, because Pascal’s triangle looks so… triangular!

But then, with a massive amount of help from Greg Egan, we’ll dig a lot deeper, and meet strange things like superfactorials, the magic properties of the number 1/12, and the Glaisher–Kinkelin constant.

First let’s get warmed up.

### Warmup

Pascal’s triangle is a famous and much-studied thing. It was discovered long before Pascal. It looks like this:

We write 1s down the edges. We get every other number in the triangle by adding the two numbers above it to its left and right. People call these numbers binomial coefficients, since you can also get them like this:

$(x + y)^5 = x^5 + 5 x^4 y + 10 x^3 y^2 + 10 x^2 y^3 + 5 x y^4 + y^5$

We get the 10 in $10 x^3 y^2$ here because there are 10 ways to take 5 things and choose 3 to call x’s and 2 to call y’s. In general we have

$\displaystyle{ (x + y)^n = \sum_{k=0}^n \binom{n}{k} x^k y^{n-k} }$

where the binomial coefficient $\binom{n}{k}$ is the kth number on the nth row of Pascal’s triangle. Here we count both rows and the numbers in a row starting from zero.

Since we can permute n things in

$n! = 1 \cdot 2 \cdot 3 \cdot \cdots \cdot n$

ways, there are

$\displaystyle{ \frac{n!}{k! (n-k)!} }$

ways to take n things and choose k of them to be x’s and (n-k) of them to be y’s. You see, permuting the x’s or the y’s doesn’t change our choice, so we have to divide by $k!$ and $(n-k)!.$

So, the kth number in the nth row of Pascal’s triangle is:

$\displaystyle{\binom{n}{k} = \frac{n!}{k! (n-k)!} }$
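
If you want to see this formula in action, here is a one-minute check in Python: the factorial formula, a brute-force count of choices, and the library function all agree.

```python
# The binomial coefficient three ways: the factorial formula, a direct
# count of the ways to choose k of n things, and Python's built-in.
import math
from itertools import combinations

n, k = 5, 3
formula = math.factorial(n) // (math.factorial(k) * math.factorial(n - k))
count = sum(1 for _ in combinations(range(n), k))
print(formula, count, math.comb(n, k))  # all three print 10
```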

All this will be painfully familiar to many of you. But I want everyone to have a fair chance at understanding the next section, where we’ll see something nice about Pascal’s triangle and the number

$e \approx 2.718281828459045...$

### The hidden e in Pascal’s triangle

In 2012, a guy named Harlan Brothers found the number e hiding in Pascal’s triangle… in a very simple way!

If we add up all the numbers in the nth row of Pascal’s triangle we get $2^n$. But what if we take the product of all these numbers? Let’s call it $t_n.$ Then here’s what Brothers discovered:

$\displaystyle{ \lim_{n \to \infty} \frac{t_n t_{n+2}}{t^2_{n+1}} = e }$

This may seem mysterious, but in fact it’s rather simple once you see the trick. I’ll use a nice argument given by Greg Egan.

We’ve said $t_n$ is the product of all numbers in the nth row of Pascal’s triangle. Just for fun, let’s divide this by the product of all numbers in the next row:

$\displaystyle{ u_n = \frac{t_n}{t_{n+1}} }$

And since this is so much fun, let’s do it again! Divide this quantity by its next value:

$\displaystyle{ v_n = \frac{u_n}{u_{n+1}} }$

Now look:

$\displaystyle{v_n = \frac{t_n/t_{n+1}}{t_{n+1}/t_{n+2}} = \frac{t_n t_{n+2}}{t_{n+1}^2} }$

So, this is the thing that should approach e.

But why does it approach e? To see this, first take Pascal’s triangle and divide each number by the number to its lower left. We get a second triangle of numbers. Then take each number in this second triangle and divide it by the number to its lower right! We get a third triangle, like this:

If you think a bit, you’ll see:

• In the first triangle, the product of all numbers in the nth row is $t_n.$

• In the second, the product of all numbers in the nth row is $u_n.$

• In the third, the product of all numbers in the nth row is $v_n.$

And look—there’s a cool pattern! In the third triangle, all the numbers in any given row are equal. In the row with n numbers, all those numbers equal

$\displaystyle{ (n+1)/n = 1 + \frac{1}{n} }$

So, the product of all these numbers is

$\displaystyle{ \left(1 + \frac{1}{n}\right)^n }$

But it’s a famous fact that

$\displaystyle{ \lim_{n \to \infty} \left(1 + \frac{1}{n}\right)^n = e}$

Some people even use this as a definition of e. So,

$\displaystyle{ \lim_{n \to \infty} v_n = e}$

which is just what we wanted!
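
It is easy to check this numerically. The row products $t_n$ grow enormously, so the sketch below works with logarithms of binomial coefficients (via `math.lgamma`) rather than the products themselves.

```python
# Numerical check of Brothers' limit: t_n t_{n+2} / t_{n+1}^2 -> e.
import math

def ln_t(n):
    """Log of the product of all numbers in row n of Pascal's triangle."""
    return sum(math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               for k in range(n + 1))

for n in [10, 100, 1000]:
    v_n = math.exp(ln_t(n) + ln_t(n + 2) - 2 * ln_t(n + 1))
    print(n, v_n)   # creeps up toward e = 2.718281828...
```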

Puzzle 1. Use the formula I mentioned:

$\displaystyle{\binom{n}{k} = \frac{n!}{k! (n-k)!} }$

to show all the numbers in the same row of the third triangle are equal.

### Triangular numbers

The number of balls in a triangular stack with n balls at the bottom is called the nth triangular number,

$T_n = 1 + 2 + \cdots + n$

If you chop a square of balls in half along a diagonal you get a triangle, so $T_n$ is approximately half the nth square number:

$\displaystyle{ T_n \approx \frac{n^2}{2} }$

But if you chop the square in half this way, the balls on the diagonal get cut in half, so to get the nth triangle number we need to include their other halves:

$T_n = \displaystyle{ \frac{n^2}{2} + \frac{n}{2} = \frac{n(n+1)}{2} }$

These numbers go like

$1, 3, 6, 10, \dots$

and they are one of the simple pleasures of life, like sunshine and good bread. Spotting them in a knight-move zigzag pattern in the multiplication table was one of my first truly mathematical experiences. You can also see them in Pascal’s triangle:

because

$T_n = \displaystyle{ \binom{n+1}{2} }$

But today it’s time to see how $T_n$ is related to $t_n$, the product of all the numbers in the nth row of Pascal’s triangle!

If we subtract each triangular number from the one before it we get the numbers $-n$, and if we subtract each of these numbers from the one before it we get the number 1. This should remind you of something we’ve seen:

$\displaystyle{ u_n = \frac{t_n}{t_{n+1}} }$

$\displaystyle{ v_n = \frac{u_n}{u_{n+1}} }$

$\displaystyle{ \lim_{n \to \infty} v_n = e }$

Here we’re dividing instead of subtracting. But if we take logarithms, we’ll be subtracting, and we get

$\ln(u_n) = \ln(t_n) - \ln(t_{n+1})$

$\ln(v_n) = \ln(u_n) - \ln(u_{n+1})$

$\displaystyle{ \lim_{n \to \infty} \ln(v_n) = 1 }$

What can we do with this? Well, suppose $\ln(v_n)$ were equal to 1, instead of approaching it. Then $\ln(t_n)$ could be the nth triangular number… and we’d get a cool formula for the product of all numbers in the nth row of Pascal’s triangle!

But since $\ln(v_n)$ is just approximately equal to 1, we should only expect an approximate formula for the product of all numbers in the nth row of Pascal’s triangle:

$\ln(t_n) \approx T_n$

or

$\displaystyle{ t_n \approx e^{T_n} = e^{n(n+1)/2} }$

This was my hope. But how good are these approximations? I left this as a puzzle on Google+, and then Greg Egan stepped in and solved it.

For starters, he graphed the ratio $\ln(t_n)/T_n,$ and got this:

That looks pretty good: it looks like it’s approaching 1. But he also graphed the ratio $t_n/e^{T_n},$ and got this:

Not so good. Taking exponentials amplifies the errors. If we want a good asymptotic formula for $t_n,$ we have to work harder. And this is what Egan did.
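
Here is a quick way to reproduce the two comparisons behind Egan’s graphs, again working with logarithms to keep the numbers manageable: the ratio of logs settles down near 1, while the exponentiated ratio collapses.

```python
# ln(t_n)/T_n approaches 1, but t_n/e^{T_n} = exp(ln(t_n) - T_n) does not.
import math

def ln_t(n):
    return sum(math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               for k in range(n + 1))

for n in [10, 100, 1000]:
    T_n = n * (n + 1) / 2
    print(n, ln_t(n) / T_n, math.exp(ln_t(n) - T_n))
```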

### Digging deeper

So far we’ve talked a lot about $t_n,$ the product of all numbers in the nth row of Pascal’s triangle… but we haven’t actually computed it. Let’s try:

$\begin{array}{ccl} t_n &=& \displaystyle{ \binom{n}{0} \cdot \binom{n}{1} \cdot \cdots \cdot \binom{n}{n} } \\ \\ &=& \displaystyle{ \frac{n!}{0! \cdot n!} \cdot \frac{n!}{1! \cdot (n-1)!} \cdot \cdots \cdot \frac{n!}{n! \cdot 0!} } \\ \\ &=& \displaystyle{ \frac{(n!)^{n+1}}{\left(0! \cdot 1! \cdot \cdots \cdot n!\right)^2} } \end{array}$

So, we’re seeing the superfactorial

$\displaystyle{ 0! \cdot 1! \cdot \cdots \cdot n! }$

raise its pretty head. This is also called the Barnes G-function, presumably by people who want to make it sound more boring.

Actually that’s not fair: the Barnes G-function generalizes superfactorials to complex numbers, just as Euler’s gamma function generalizes factorials. Unfortunately, Euler made the mistake of defining his gamma function so that

$\Gamma(n) = (n-1)!$

when $n = 1, 2, 3, \cdots.$ Everyone I trust assures me this was a bad idea, not some deep insight… but Barnes doubled down on Euler’s mistake and defined his G-function so that

$G(n) = 0! \cdot 1! \cdot \cdots \cdot (n-2)!$

So, we’ll have to be careful when looking up formulas on Wikipedia: the superfactorial of n is $G(n+2).$ Thus, we have

$\displaystyle{ t_n = \frac{(n!)^{n+1}}{G(n+2)^2} }$

so

$\displaystyle{ \ln(t_n) = (n+1) \ln (n!) - 2 \ln G(n+2) }$
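
At integer arguments all of this can be checked exactly, since Python’s integers have arbitrary precision and the Barnes G-function is just a shifted superfactorial:

```python
# Exact check of t_n = (n!)^(n+1) / G(n+2)^2, where G(n+2) = 0!·1!···n!.
import math

def t(n):
    """Product of all numbers in row n of Pascal's triangle."""
    prod = 1
    for k in range(n + 1):
        prod *= math.comb(n, k)
    return prod

def superfactorial(n):
    """0! * 1! * ... * n!, which equals G(n+2)."""
    prod = 1
    for k in range(n + 1):
        prod *= math.factorial(k)
    return prod

n = 8
assert t(n) * superfactorial(n) ** 2 == math.factorial(n) ** (n + 1)
print("identity holds for n =", n)
```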

Now, there’s a great approximate formula for the logarithm of a factorial. It’s worth its weight in gold… or at least silver, so it’s called Stirling’s formula:

$\displaystyle{ \ln(n!) = n \ln (n) \; - \; n \; + \;\tfrac{1}{2} \ln(2 \pi n) \; + \; \frac{1}{12n} \; + \; \cdots }$

where the dots mean stuff that goes to zero when divided by the last term, in the limit $n \to \infty.$

There’s also an approximate formula for the logarithm of the superfactorial! It’s a bit scary:

$\begin{array}{ccl} \ln G(n+2) &=& \displaystyle{ \left(\tfrac{1}{2} \ln(n+1) - \tfrac{3}{4}\right) (n+1)^2 \; + } \\ \\ && \tfrac{\ln(2\pi)}{2} \, (n+1) \; - \\ \\ && \tfrac{1}{12} \ln(n+1) \; + \\ \\ && \tfrac{1}{12} - \ln A \; + \cdots \end{array}$

where the dots mean stuff that actually goes to zero as $n \to \infty.$

What’s A? It’s the Glaisher–Kinkelin constant! If you’re tired of memorizing digits of pi and need a change of pace, you can look up the first 20,000 digits of the Glaisher–Kinkelin constant here, but roughly we have

$A \approx 1.282427129100622636875342...$

Of course most mathematicians don’t care much about the digits; what we really want to know is what this constant means!

Euler, who did some wacky nonrigorous calculations, once argued that

$\displaystyle{ 1 + 2 + 3 + \cdots = -\tfrac{1}{12} }$

Riemann made this rigorous by defining

$\displaystyle{ \zeta(s) = 1^{-s} + 2^{-s} + 3^{-s} + \cdots }$

which converges when $\mathrm{Re}(s) > 1,$ and then analytically continuing this to other values of $s.$ He found that indeed

$\displaystyle{ \zeta(-1) = -\tfrac{1}{12} }$

This fact has many marvelous ramifications: for example, it’s why bosonic string theory works best in 24 dimensions! It’s also connected to the $\tfrac{1}{12n}$ term in Stirling’s formula.

But anyway, we might wonder what happens if we compute $\zeta(s)$ for $s$ near -1. This is where the Glaisher–Kinkelin constant shows up, because if we try a Taylor series we get

$\displaystyle{ \zeta'(-1) = \tfrac{1}{12} - \ln A }$

To me, this means that $\tfrac{1}{12} - \ln A$ is more fundamental than A itself, and indeed you’ll see it’s this combination that shows up in the approximate formula for superfactorials. So, we can simplify that formula a bit:

$\begin{array}{ccl} \ln G(n+2) &=& \displaystyle{ \left(\tfrac{1}{2} \ln(n+1) - \tfrac{3}{4}\right) (n+1)^2 \; + } \\ \\ && \tfrac{\ln(2\pi)}{2} \, (n+1) \; - \\ \\ && \tfrac{1}{12} \ln(n+1) \; + \\ \\ && \zeta'(-1) \; + \cdots \end{array}$

Now, let’s use this together with Stirling’s formula to estimate the logarithm of the product of all numbers in the nth row of Pascal’s triangle. Remember, that’s

$\displaystyle{ \ln(t_n) = (n+1) \ln (n!) - 2 \ln G(n+2) }$

So, we get

$\begin{array}{ccl} \ln(t_n) &\approx& \displaystyle{(n+1)\Big[ n \ln(n) - n + \tfrac{1}{2} \ln(2 \pi n) + \frac{1}{12n} \Big] } \\ \\ && - \displaystyle{ \Big[ \left( \ln(n+1) - \tfrac{3}{2}\right) (n+1)^2 + \ln(2\pi) \cdot (n+1) -} \\ \\ && \quad \displaystyle{ \tfrac{1}{6} \ln(n+1) + 2\zeta'(-1) \Big] } \end{array}$

and exponentiating this gives a good approximation to $t_n.$

Here is a graph of $t_n$ divided by this approximation, created by Egan:

As you can see, the ratio goes to 1 quite nicely!
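
If you would rather see numbers than a graph, the sketch below evaluates the approximation (using $\zeta'(-1) \approx -0.16542$) and prints the ratio of the exact $t_n$ to it.

```python
# Compare the exact t_n with the asymptotic approximation above.
import math

ZETA_PRIME_MINUS_1 = -0.1654211437004509   # zeta'(-1) = 1/12 - ln A

def ln_t(n):
    """Exact log of the row product, as a sum of log binomial coefficients."""
    return sum(math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               for k in range(n + 1))

def ln_t_approx(n):
    stirling = (n * math.log(n) - n + 0.5 * math.log(2 * math.pi * n)
                + 1 / (12 * n))
    barnes2 = ((math.log(n + 1) - 1.5) * (n + 1) ** 2
               + math.log(2 * math.pi) * (n + 1)
               - math.log(n + 1) / 6 + 2 * ZETA_PRIME_MINUS_1)
    return (n + 1) * stirling - barnes2

for n in [5, 50, 500]:
    print(n, math.exp(ln_t(n) - ln_t_approx(n)))   # ratio tends to 1
```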

So, we’ve seen some nice relationships between these things:

$1 + 2 + \cdots + n = T_n$

$1 \cdot 2 \cdot \cdots \cdot n = n!$

$\binom{n}{1} \cdot \binom{n}{2} \cdot \cdots \cdot \binom{n}{n} = t_n$

$1! \cdot 2! \cdot \cdots \cdot n! = G(n+2)$

$\frac{1}{0!} +\frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \cdots = e$

and Euler’s wacky formula

$\displaystyle{ 1 + 2 + 3 + \cdots = -\tfrac{1}{12} }$

Puzzle 2. Can you take the formula

$\begin{array}{ccl} \ln(t_n) &\approx& \displaystyle{(n+1)\Big[ n \ln(n) - n + \tfrac{1}{2} \ln(2 \pi n) + \frac{1}{12n} \Big] } \\ \\ && - \displaystyle{ \Big[ \left( \ln(n+1) - \tfrac{3}{2}\right) (n+1)^2 + \ln(2\pi) \cdot (n+1) -} \\ \\ && \quad \displaystyle{ \tfrac{1}{6} \ln(n+1) + 2\zeta'(-1) \Big] } \end{array}$

and massage it until it looks like $n(n+1)/2$ plus ‘correction terms’? How big are these correction terms?

### References

Any errors in the formulas above are my fault. Here are the papers that first squeezed e out of Pascal’s triangle:

• Harlan J. Brothers, Pascal’s triangle: The hidden stor-e, The Mathematical Gazette, March 2012, 145.

• Harlan J. Brothers, Finding e in Pascal’s triangle, Mathematics Magazine 85 (2012), 51.

I learned the idea from here, thanks to Richard Elwes:

• Alexander Bogomolny, e in the Pascal Triangle, Interactive Mathematics Miscellany and Puzzles.

For how Euler derived his crazy identity

$\displaystyle{ 1 + 2 + 3 + \cdots = -\tfrac{1}{12} }$

and how it’s relevant to string theory, try:

• John Baez, My favorite numbers: 24.