Very Long Proofs

28 May, 2016


In the 1980s, the mathematician Ronald Graham asked if it’s possible to color each positive integer either red or blue, so that no triple of integers a,b,c obeying Pythagoras’ famous equation:

a^2 + b^2 = c^2

all have the same color. He offered a prize of $100.

Now it’s been solved! The answer is no. You can do it for numbers up to 7824, and a solution is shown in this picture. But you can’t do it for numbers up to 7825.

To prove this, you could try all the ways of coloring these numbers and show that nothing works. Unfortunately that would require trying

3 628 407 622 680 653 855 043 364 707 128 616 108 257 615 873 380 491 654 672 530 751 098 578 199 115 261 452 571 373 352 277 580 182 512 704 196 704 700 964 418 214 007 274 963 650 268 320 833 348 358 055 727 804 748 748 967 798 143 944 388 089 113 386 055 677 702 185 975 201 206 538 492 976 737 189 116 792 750 750 283 863 541 981 894 609 646 155 018 176 099 812 920 819 928 564 304 241 881 419 294 737 371 051 103 347 331 571 936 595 489 437 811 657 956 513 586 177 418 898 046 973 204 724 260 409 472 142 274 035 658 308 994 441 030 207 341 876 595 402 406 132 471 499 889 421 272 469 466 743 202 089 120 267 254 720 539 682 163 304 267 299 158 378 822 985 523 936 240 090 542 261 895 398 063 218 866 065 556 920 106 107 895 261 677 168 544 299 103 259 221 237 129 781 775 846 127 529 160 382 322 984 799 874 720 389 723 262 131 960 763 480 055 015 082 441 821 085 319 372 482 391 253 730 679 304 024 117 656 777 104 250 811 316 994 036 885 016 048 251 200 639 797 871 184 847 323 365 327 890 924 193 402 500 160 273 667 451 747 479 728 733 677 070 215 164 678 820 411 258 921 014 893 185 210 250 670 250 411 512 184 115 164 962 089 724 089 514 186 480 233 860 912 060 039 568 930 065 326 456 428 286 693 446 250 498 886 166 303 662 106 974 996 363 841 314 102 740 092 468 317 856 149 533 746 611 128 406 657 663 556 901 416 145 644 927 496 655 933 158 468 143 482 484 006 372 447 906 612 292 829 541 260 496 970 290 197 465 492 579 693 769 880 105 128 657 628 937 735 039 288 299 048 235 836 690 797 324 513 502 829 134 531 163 352 342 497 313 541 253 617 660 116 325 236 428 177 219 201 276 485 618 928 152 536 082 354 773 892 775 152 956 930 865 700 141 446 169 861 011 718 781 238 307 958 494 122 828 500 438 409 758 341 331 326 359 243 206 743 136 842 911 727 359 310 997 123 441 791 745 020 539 221 575 643 687 646 417 117 456 946 996 365 628 976 457 655 208 423 130 822 936 961 822 716 117 367 694 165 267 852 307 626 092 080 279 836 122 376 918 659 101 107 919 099 514 855 113 769 846 184 593 342 248 535 927 407 152 514 690 465 246 338 232 121 308 958 440 135 194 441 048 499 639 516 303 692 332 532 864 631 075 547 542 841 539 848 320 583 307 785 982 596 093 517 564 724 398 774 449 380 877 817 714 717 298 596 139 689 573 570 820 356 836 562 548 742 103 826 628 952 649 445 195 215 299 968 571 218 175 989 131 452 226 726 280 771 962 970 811 426 993 797 429 280 745 007 389 078 784 134 703 325 573 686 508 850 839 302 112 856 558 329 106 490 855 990 906 295 808 952 377 118 908 425 653 871 786 066 073 831 252 442 345 238 678 271 662 351 535 236 004 206 289 778 489 301 259 384 752 840 495 042 455 478 916 057 156 112 873 606 371 350 264 102 687 648 074 992 121 706 972 612 854 704 154 657 041 404 145 923 642 777 084 367 960 280 878 796 437 947 008 894 044 010 821 287 362 106 232 574 741 311 032 906 880 293 520 619 953 280 544 651 789 897 413 312 253 724 012 410 831 696 803 510 617 000 147 747 294 278 502 175 823 823 024 255 652 077 422 574 922 776 413 427 073 317 197 412 284 579 070 292 042 084 295 513 948 442 461 828 389 757 279 712 121 164 692 705 105 851 647 684 562 196 098 398 773 163 469 604 125 793 092 370 432

possibilities. But recently, three mathematicians cleverly figured out how to eliminate most of the options. That left fewer than a trillion to check!
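Just to make the brute-force idea concrete, here's a toy sketch in R (my own illustration, nothing like the clever SAT-based search the authors actually used). It checks whether a given red/blue coloring of the numbers 1 to n contains a monochromatic Pythagorean triple, and then tries every coloring of a tiny range:

# A toy check: does a given red/blue coloring of 1..n contain a
# monochromatic Pythagorean triple?
has.mono.triple <- function(coloring) {
  n <- length(coloring)
  for (a in 1:n) {
    for (b in a:n) {
      c2 <- a^2 + b^2
      cc <- round(sqrt(c2))
      if (cc <= n && cc * cc == c2 &&
          coloring[a] == coloring[b] && coloring[b] == coloring[cc])
        return(TRUE)
    }
  }
  FALSE
}

# Brute force over all 2^n colorings: only feasible for very small n.
colorable <- function(n) {
  for (k in 0:(2^n - 1)) {
    coloring <- as.integer(intToBits(k))[1:n]   # 0 = red, 1 = blue
    if (!has.mono.triple(coloring)) return(TRUE)
  }
  FALSE
}

colorable(16)   # TRUE: a small range like 1..16 is easy to 2-color

For the numbers 1 to 7825 this loop would have to run through the gigantic number of colorings quoted above, which is exactly why the real proof needed much cleverer methods.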

So they spent 2 days on a supercomputer, running 800 processors in parallel, and checked all the options. None worked. They verified their solution on another computer.

This is one of the world’s biggest proofs: it’s 200 terabytes long! That’s about equal to all the digitized text held by the US Library of Congress. There’s also a 68-gigabyte digital signature—sort of a proof that a proof exists—if you want to skim it.

It’s interesting that these 200 terabytes were used to solve a yes-or-no question, whose answer takes a single bit to state: no.

I’m not sure breaking the world’s record for the longest proof is something to be proud of. Mathematicians prize short, insightful proofs. I bet a shorter proof of this result will eventually be found.

Still, it’s fun that we can do such things. Here’s a story about the proof:

• Evelyn Lamb, Two-hundred-terabyte maths proof is largest ever, Nature, May 26, 2016.

and here’s the actual paper:

• Marijn J. H. Heule, Oliver Kullmann and Victor W. Marek, Solving and verifying the Boolean Pythagorean triples problem via cube-and-conquer.

The ‘cube-and-conquer’ paradigm is a “hybrid SAT method for hard problems, employing both look-ahead and CDCL solvers”. The actual benefit of this huge proof is developing such methods for solving big problems! In the comments to my G+ post, Roberto Bayardo explained:

CDCL == “conflict-directed clause learning”: when you hit a dead end in backtrack search, this is the process of recording a (hopefully small) clause that prevents you from making the same mistake again…a type of memoization, essentially.

Look-ahead: in backtrack search, you repeat the process of picking an unassigned variable and then picking an assignment for that variable until you reach a “dead end” (upon deriving a contradiction). Look-ahead involves doing some amount of processing on the remaining formula after each assignment in order to simplify it. This includes identifying variables for which one of its assignments can be quickly eliminated. “Unit propagation” is a type of look-ahead, though I suspect in this case they mean something quite a bit more sophisticated.
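To make ‘unit propagation’ a bit more concrete, here's a toy sketch in R (my own illustration, not the solver used in the paper):

# A toy version of unit propagation. A clause is a vector of integers:
# k means "variable k is true", -k means "variable k is false".
# One-literal clauses force assignments, which may in turn force more.
unit.propagate <- function(clauses, assignment = integer(0)) {
  repeat {
    # remove falsified literals, then drop clauses that are already satisfied
    clauses <- lapply(clauses, function(cl) cl[!((-cl) %in% assignment)])
    clauses <- Filter(function(cl) !any(cl %in% assignment), clauses)
    lens <- vapply(clauses, length, integer(1))
    if (any(lens == 0)) return(NULL)              # an empty clause: conflict
    units <- setdiff(unlist(clauses[lens == 1]), assignment)
    if (length(units) == 0) return(assignment)    # nothing more is forced
    assignment <- c(assignment, units)
  }
}

# (x1 or x2) and (not x1): propagation forces x1 false, and then x2 true.
unit.propagate(list(c(1, 2), c(-1)))   # returns c(-1, 2)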

Arnaud Spiwack has a series of blog posts introducing SAT solvers here:

Part 1: models & conflict clauses.

Part 2: finding good conflict clauses.

Part 3: SAT modulo theory.

Part 4: two-literal watch.

Part 5: Avatar.

A much longer proof

By the way, despite the title of the Nature article, in the comments to my G+ post about this, Michael Nielsen pointed out a much longer proof by Chris Jefferson, who wrote:

Darn, I had no idea one could get into the media with this kind of stuff.

I had a much larger “proof”, where we didn’t bother storing all the details, in which we enumerated 718,981,858,383,872 semigroups, towards counting the semigroups of size 10.

Uncompressed, it would have been about 63,000 terabytes just for the semigroups, and about a thousand times that to store the “proof”, which is just the tree of the search.

Of course, it would have compressed extremely well, but also I’m not sure it would have had any value, you could rebuild the search tree much faster than you could read it from a disc, and if anyone re-verified our calculation I would prefer they did it by a slightly different search, which would give us much better guarantees of correctness.

His team found a total of 12,418,001,077,381,302,684 semigroups of size 10. They only had to find 718,981,858,383,872 by a brute force search, which is 0.006% of the total:

• Andreas Distler, Chris Jefferson, Tom Kelsey, and Lars Kotthoff, The semigroups of order 10, in Principles and Practice of Constraint Programming, Springer Lecture Notes in Computer Science 7514, Springer, Berlin, pp. 883–899.

Insanely long proofs

All the proofs mentioned so far are downright laconic compared to those discussed here:

• John Baez, Insanely long proofs, 19 October 2012.

For example, if you read this post you’ll learn about a fairly short theorem whose shortest proof using Peano arithmetic contains at least

\displaystyle{ 10^{10^{1000}}  }

symbols. This is so many that if you tried to write down the number of symbols in this proof—not the symbols themselves, but just the number of symbols—in ordinary decimal notation, you couldn’t do it if you wrote one digit on each proton, neutron and electron in the observable Universe!

The Busy Beaver Game

21 May, 2016

This month, a bunch of ‘logic hackers’ have been seeking to determine the precise boundary between the knowable and the unknowable. The challenge has been around for a long time. But only now have people taken it up with the kind of world-wide teamwork that the internet enables.

A Turing machine is a simple model of a computer. Imagine a machine that has some finite number of states, say N states. It’s attached to an infinitely long tape divided into squares, with either a 0 or a 1 written on each square. At each step the machine reads the number on the square it’s sitting on. Then, based on its state and what it reads, it either halts, or it writes a number, changes to a new state, and moves either left or right.

The tape starts out with only 0’s on it. The machine starts in a particular ‘start’ state. It halts if it winds up in a special ‘halt’ state.

The Busy Beaver Game is to find the Turing machine with N states that runs as long as possible and then halts.

The number BB(N) is the number of steps that the winning machine takes before it halts.
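Here's a minimal sketch of this game in R (my own illustration: the transition table below is the well-known 2-state champion, and the rest is just bookkeeping). It simulates a machine on an initially blank tape and counts the steps until it halts.

# A tiny Turing machine simulator. A rule table maps "state + symbol read"
# to what to write, which way to move, and the next state ("H" = halt).
run.machine <- function(rules, max.steps = 1000000) {
  ones  <- integer(0)    # positions on the tape that currently hold a 1
  pos   <- 0
  state <- "A"
  steps <- 0
  while (state != "H" && steps < max.steps) {
    symbol <- as.integer(pos %in% ones)
    move   <- rules[[paste0(state, symbol)]]
    if (move$write == 1) ones <- union(ones, pos) else ones <- setdiff(ones, pos)
    pos    <- pos + ifelse(move$dir == "R", 1, -1)
    state  <- move$state
    steps  <- steps + 1
  }
  steps
}

# The 2-state busy beaver champion: it halts after BB(2) = 6 steps.
bb2 <- list(
  A0 = list(write = 1, dir = "R", state = "B"),
  A1 = list(write = 1, dir = "L", state = "B"),
  B0 = list(write = 1, dir = "L", state = "A"),
  B1 = list(write = 1, dir = "R", state = "H")
)
run.machine(bb2)   # 6

Simulating the machines is the easy part; the hard part is deciding which ones never halt.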

In 1962, Tibor Radó introduced the Busy Beaver Game and proved that the sequence BB(N) is uncomputable. It grows faster than any computable function!

A few values of BB(N) can be computed, but there’s no way to figure out all of them.

As we increase N, the number of Turing machines we need to check increases faster than exponentially. For each of the 2N combinations of state and symbol read, a machine chooses what to write (2 options), which way to move (2 options), and which of the N+1 states (counting the halt state) to go to next, so the number of N-state machines is

\displaystyle{ [4(N+1)]^{2N} }
Of course, many could be ruled out as potential winners by simple arguments. But the real problem is this: it becomes ever more complicated to determine which Turing machines with N states never halt, and which merely take a huge time to halt.

Indeed, no matter what axiom system you use for math, as long as it has finitely many axioms and is consistent, you can never use it to correctly determine BB(N) for more than some finite number of cases.

So what do people know about BB(N)?

For starters, BB(0) = 0. At this point I should admit that people don’t count the halt state as one of our N states. This is just a convention. So, when we consider BB(0), we’re considering machines that only have a halt state. They instantly halt.

Next, BB(1) = 1.

Next, BB(2) = 6.

Next, BB(3) = 21. This was proved in 1965 by Tibor Radó and Shen Lin.

Next, BB(4) = 107. This was proved in 1983 by Allan Brady.

Next, BB(5). Nobody knows what BB(5) equals!

The current 5-state busy beaver champion was discovered by Heiner Marxen and Jürgen Buntrock in 1989. It takes 47,176,870 steps before it halts. So, we know

BB(5) ≥ 47,176,870.

People have looked at all the other 5-state Turing machines to see if any does better. But there are 43 machines that do very complicated things that nobody understands. It’s believed they never halt, but nobody has been able to prove this yet.

We may have hit the wall of ignorance here… but we don’t know.

That’s the spooky thing: the precise boundary between the knowable and the unknowable is unknown. It may even be unknowable… but I’m not sure we know that.

Next, BB(6). In 1996, Marxen and Buntrock showed it’s at least 8,690,333,381,690,951. In June 2010, Pavel Kropitz proved that

\displaystyle{ \mathrm{BB}(6) \ge 7.412 \cdot 10^{36,534} }

You may wonder how he proved this. Simple! He found a 6-state machine that runs for roughly

\displaystyle{ 7.412 \cdot 10^{36,534} }
steps and then halts!

Of course, I’m just kidding when I say this was simple. The machine is easy enough to describe, but proving it takes exactly this long to run takes real work! You can read about such proofs here:

• Pascal Michel, The Busy Beaver Competition: a historical survey.

I don’t understand them very well. All I can say at this point is that many of the record-holding machines known so far carry out processes reminiscent of the famous Collatz conjecture. The idea there is that you can start with any positive integer and keep doing two things:

• if it’s even, divide it by 2;

• if it’s odd, triple it and add 1.

The conjecture is that this process will always eventually reach the number 1. Here’s a graph of how many steps it takes, as a function of the number you start with:

Nice pattern! But this image shows how it works for numbers up to 10 million, and you’ll see it doesn’t usually take very long for them to reach 1. Usually less than 600 steps is enough!
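If you want to play with this yourself, here's a small R sketch (mine, not from any of the sources above) that counts the steps:

# Count how many Collatz steps it takes to get from n down to 1.
collatz.steps <- function(n) {
  steps <- 0
  while (n != 1) {
    n <- if (n %% 2 == 0) n / 2 else 3 * n + 1
    steps <- steps + 1
  }
  steps
}

collatz.steps(27)                       # 111: a famously slow starter
plot(sapply(1:10000, collatz.steps))    # a smaller version of the graph above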

So, to get a Turing machine that takes a long time to halt, you have to take this kind of behavior and make it much more long and drawn-out. Conversely, to analyze one of the potential winners of the Busy Beaver Game, people must take that long and drawn-out behavior and figure out a way to predict much more quickly when it will halt.

Next, BB(7). In 2014, someone who goes by the name Wythagoras showed that

\displaystyle{ \textrm{BB}(7) > 10^{10^{10^{10^{10^7}}}} }

It’s fun to prove lower bounds on BB(N). For example, in 1964 Milton Green constructed a sequence of Turing machines that implies

\textrm{BB}(2N) \ge 3 \uparrow^{N-2} 3

Here I’m using Knuth’s up-arrow notation, which is a recursively defined generalization of exponentiation, so for example

\textrm{BB}(10) \ge 3 \uparrow^{3} 3 = 3 \uparrow^2 3^{3^3} = 3^{3^{3^{3^{\cdot^{\cdot^\cdot}}}}}

where there are 3^{3^3} threes in that tower.
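If the notation is new to you, here's a tiny R sketch of the recursion (my own; the values overflow almost immediately, which is rather the point):

# Knuth's up-arrow: a up^1 b = a^b, and a up^n b = a up^(n-1) (a up^n (b-1)).
up.arrow <- function(a, n, b) {
  if (n == 1) return(a^b)
  if (b == 0) return(1)
  up.arrow(a, n - 1, up.arrow(a, n, b - 1))
}

up.arrow(3, 1, 3)   # 3^3 = 27
up.arrow(3, 2, 3)   # 3^27 = 7625597484987; anything much bigger won't fit in a double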

But it’s also fun to seek the smallest N for which we can prove BB(N) is unknowable! And that’s what people are making lots of progress on right now.

Sometime in April 2016, Adam Yedidia and Scott Aaronson showed that BB(7910) cannot be determined using the widely accepted axioms for math called ZFC: that is, Zermelo—Fraenkel set theory together with the axiom of choice. It’s a great story, and you can read it here:

• Scott Aaronson, The 8000th Busy Beaver number eludes ZF set theory: new paper by Adam Yedidia and me, Shtetl-Optimized, 3 May 2016.

• Adam Yedidia and Scott Aaronson, A relatively small Turing machine whose behavior is independent of set theory, 13 May 2016.

Briefly, Yedidia created a new programming language, called Laconic, which lets you write programs that compile down to small Turing machines. They took an arithmetic statement created by Harvey Friedman that’s equivalent to the consistency of the usual axioms of ZFC together with a large cardinal axiom called the ‘stationary Ramsey property’, or SRP. And they created a Turing machine with 7910 states that seeks a proof of this arithmetic statement using the axioms of ZFC.

Since ZFC can’t prove its own consistency, much less its consistency when supplemented with SRP, their machine will only halt if ZFC+SRP is inconsistent.

Since most set theorists believe ZFC+SRP is consistent, this machine probably doesn’t halt. But we can’t prove this using ZFC.

In short: if the usual axioms of set theory are consistent, we can never use them to determine the value of BB(7910).

The basic idea is nothing new: what’s new is the explicit and rather low value of the number 7910. Poetically speaking, we know the unknowable starts here… if not sooner.

However, this discovery set off a wave of improvements! On the Metamath newsgroup, Mario Carneiro and others started ‘logic hacking’, looking for smaller and smaller Turing machines that would only halt if ZF—that is, Zermelo–Fraenkel set theory, without the axiom of choice—is inconsistent.

By just May 15th, Stefan O’Rear seems to have brought the number down to 1919. He found a Turing machine with just 1919 states that searches for an inconsistency in the ZF axioms. Interestingly, this turned out to work better than using Harvey Friedman’s clever trick.

Thus, if O’Rear’s work is correct, we can only determine BB(1919) if we can determine whether ZF set theory is consistent. However, we cannot do this using ZF set theory—unless we find an inconsistency in ZF set theory.

For details, see:

• Stefan O’Rear, A Turing machine Metamath verifier, 15 May 2016.

I haven’t checked his work, but it’s available on GitHub.

What’s the point of all this? At present, it’s mainly just a game. However, it should have some interesting implications. It should, for example, help us better locate the ‘complexity barrier’.

I explained that idea here:

• John Baez, The complexity barrier, Azimuth, 28 October 2011.

Briefly, while there’s no limit on how much information a string of bits—or any finite structure—can have, there’s a limit on how much information we can prove it has!

This amount of information is pretty low, perhaps a few kilobytes. And I believe the new work on logic hacking can be used to estimate it more accurately!

El Niño Project (Part 5)

12 July, 2014


And now for some comic relief.

Last time I explained how to download some weather data and start analyzing it, using programs written by Graham Jones. When you read that, did you think “Wow, that’s easy!” Or did you think “Huh? Run programs in R? How am I supposed to do that?”

If you’re in the latter group, you’re like me. But I managed to do it. And this is the tale of how. It’s a blow-by-blow account of my first steps, my blunders, my fears.

I hope that if you’re intimidated by programming, my tale will prove that you too can do this stuff… provided you have smart friends, or read this article.

More precisely, this article explains how to:

• download and run software that runs the programming language R;

• download temperature data from the National Center for Atmospheric Research;

• use R to create a file of temperature data for a given latitude/longitude rectangle for a given time interval.

I will not attempt to explain how to program in R.

If you want to copy what I’m doing, please remember that a few details depend on the operating system. Since I don’t care about operating systems, I use a Windows PC. If you use something better, some details will differ for you.

Also: at the end of this article there are some very basic programming puzzles.

A sad history

First, let me explain a bit about my relation to computers.

I first saw a computer at the Lawrence Hall of Science in Berkeley, back when I was visiting my uncle in the summer of 1978. It was really cool! They had some terminals where you could type programs in BASIC and run them.

I got especially excited when he gave me the book Computer Lib/Dream Machines by Ted Nelson. It espoused the visionary idea that people could write texts on computers all around the world—“hypertexts” where you could click on a link in one and hop to another!

I did more programming the next year in high school, sitting in a concrete block room with a teletype terminal that was connected to a mainframe somewhere far away. I stored my programs on paper tape. But my excitement gradually dwindled, because I was having more fun doing math and physics using just pencil and paper. My own brain was easier to program than the machine. I did not start a computer company. I did not get rich. I learned quantum mechanics, and relativity, and Gödel’s theorem.

Later I did some programming in APL in college, and still later I did a bit in Mathematica in the early 1990s… but nothing much, and nothing sophisticated. Indeed, none of these languages would be the ones you’d choose to explore sophisticated ideas in computation!

I’ve just never been very interested… until now. I now want to do a lot of data analysis. It will be embarrassing to keep asking other people to do all of it for me. I need to learn how to do it myself.

Maybe you’d like to do this stuff too—or at least watch me make a fool of myself. So here’s my tale, from the start.

Downloading and running R

To use the programs written by Graham, I need to use R, a language currently popular among statisticians. It is not the language my programmer friends would want me to learn—they’d want me to use something like Python. But tough! I can learn that later.

To download R to my Windows PC, I cleverly type download R into Google, and go to the top website it recommends:

I click the big fat button on top saying

Download R 3.1.0 for Windows

and get asked to save a file R-3.1.0-win.exe. I save it in my Downloads folder; it takes a while to download since it was 57 megabytes. When I get it, I click on it and follow the easy default installation instructions. My Desktop window now has a little icon on it that says R.

Clicking this, I get an interface where I can type commands after a red

>
symbol. Following Graham’s advice, I start by trying

> 2^(1:8)

which generates a list of powers of 2 from 2^1 to 2^8, like this:

[1] 2 4 8 16 32 64 128 256

Then I try

> mean(2^(1:8))

which gives the arithmetic mean of this list. Somewhat more fun is

> plot(rnorm(20))

which plots a bunch of points, apparently 20 standard normal deviates.

When I hear “20 standard normal deviates” I think of the members of a typical math department… but no, those are deviants. Standard normal deviates are random numbers chosen from a Gaussian distribution of mean zero and variance 1.

Downloading climate data

To do something more interesting, I need to input data.

The papers by Ludescher et al use surface air temperatures in a certain patch of the Pacific, so I want to get ahold of those. They’re here:

NCEP/NCAR Reanalysis 1: Surface.

NCEP is the National Centers for Environmental Prediction, and NCAR is the National Center for Atmospheric Research. They have a bunch of files here containing worldwide daily average temperatures on a 2.5 degree latitude × 2.5 degree longitude grid (that’s 144 × 73 grid points), from 1948 to 2010. And if you go here, the website will help you get data from within a chosen rectangle in a grid, for a chosen time interval.

These are NetCDF files. NetCDF stands for Network Common Data Form:

NetCDF is a set of software libraries and self-describing, machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data.

According to my student Blake Pollard:

… the method of downloading a bunch of raw data via ftp (file transfer protocol) is a great one to become familiar with. If you poke around on or some other ftp servers maintained by government agencies you will find all the data you could ever want. Examples of things you can download for free: raw multispectral satellite images, processed data products, ‘re-analysis’ data (which is some way of combining analysis/simulation to assimilate data), sea surface temperature anomalies at resolutions much higher than 2.5 degrees (although you pay for that in file size). Also, believe it or not, people actually use NetCDF files quite widely, so once you know how to play around with those you’ll find the world quite literally at your fingertips!

I know about ftp: I’m so old that I know this was around before the web existed. Back then it meant “faster than ponies”. But I need to get R to accept data from these NetCDF files: that’s what scares me!

Graham said that R has a “package” called RNetCDF for using NetCDF files. So, I need to get ahold of this package, download some files in the NetCDF format, and somehow get R to eat those files with the help of this package.

At first I was utterly clueless! However, after a bit of messing around, I notice that right on top of the R interface there’s a menu item called Packages. I boldly click on this and choose Install Package(s).

I am rewarded with an enormous alphabetically ordered list of packages… obviously statisticians have lots of stuff they like to do over and over! I find RNetCDF, click on that and click something like “OK”.

I’m asked if I want to use a “personal library”. I click “no”, and get an error message. So I click “yes”. The computer barfs out some promising text:

trying URL ''
Content type 'application/zip' length 548584 bytes (535 Kb)
opened URL
downloaded 535 Kb

package ‘RNetCDF’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in


But now I need to figure out how to download a file and get R to eat it and digest it with the help of RNetCDF.
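Skipping ahead a little, the basic RNetCDF calls turn out to look something like the sketch below. I'm assuming here that the temperature variable in these files is called "air"; print.nc will tell you what's really inside.

# A quick peek at one of these files with RNetCDF (a sketch only).
library(RNetCDF)
nc <- open.nc("air.sig995.1950.nc")
print.nc(nc)                    # lists the dimensions and variables in the file
air <- var.get.nc(nc, "air")    # a 3-dimensional array: grid point by grid point by day
close.nc(nc)
dim(air)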

At this point my deus ex machina, Graham, descends from the clouds and says:

You can download the files from your browser. It is probably easiest to do that for starters. Put
into the browser, then right-click a file and Save link as…

This code will download a bunch of them:

for (year in 1950:1979) {
  download.file(url=paste0("", year, ".nc"), destfile=paste0("air.sig995.", year, ".nc"), mode="wb")
}

It will put them into the “working directory”, probably C:\Users\JOHN\Documents. You can find the working directory using getwd(), and change it with setwd(). But you must use / not \ in the filepath.

Compared to UNIX, the Windows operating system has the peculiarity of using \ instead of / in path names, but R uses the UNIX conventions even on Windows.

So, after some mistakes, in the R interface I type

> setwd("C:/Users/JOHN/Documents/My Backups/azimuth/el nino")

and then type

> getwd()

to see if I’ve succeeded. I’m rewarded with

[1] "C:/Users/JOHN/Documents/My Backups/azimuth/el nino"


Then, following Graham’s advice, I cut-and-paste this into the R interface:

for (year in 1950:1979) {
  download.file(url=paste0("", year, ".nc"), destfile=paste0("air.sig995.", year, ".nc"), mode="wb")
}

It seems to be working! A little bar appears showing how each year’s data is getting downloaded. It chugs away, taking a couple minutes for each year’s worth of data.

Using R to process NetCDF files

Okay, now I’ve got all the worldwide daily average temperatures on a 2.5 degree latitude × 2.5 degree longitude grid from 1950 to 1979.

The world is MINE!

But what do I do with it? Graham’s advice is again essential, along with a little R program, or script, that he wrote:

The R script netcdf-convertor.R from

will eat the file, digest it, and spit it out again. It contains instructions.

I go to this URL, which is on GitHub, a popular free web-based service for software development. You can store programs here, edit them, and GitHub will help you keep track of the different versions. I know almost nothing about this stuff, but I’ve seen it before, so I’m not intimidated.

I click on the blue thing that says netcdf-convertor.R and see something that looks like the right script. Unfortunately I can’t see how to download it! I eventually see a button I’d overlooked, cryptically labelled “Raw”. I realize that since I don’t want a roasted or oven-broiled piece of software, I should click on this. I indeed succeed in downloading netcdf-convertor.R this way. Graham later says I could have done something better, but oh well. I’m just happy nothing has actually exploded yet.

Once I’ve downloaded this script, I open it using a text processor and look at it. At top are a bunch of comments written by Graham:


# You should be able to use this by editing this
# section only.


lat.range <- 13:14
lon.range <- 142:143

firstyear <- 1957
lastyear <- 1958

outputfilename <- paste0("Scotland-", firstyear, "-", lastyear, ".txt")


# Explanation

# 1. Use setwd() to set the working directory
# to the one containing the .nc files such as
# Example:
# setwd("C:/Users/Work/AAA/Programming/ProgramOutput/Nino")

# 2. Supply the latitude and longitude range. The
# NOAA data is every 2.5 degrees. The ranges are
# supplied as the number of steps of this size.
# For latitude, 1 means North Pole, 73 means South
# Pole. For longitude, 1 means 0 degrees East, 37
# is 90E, 73 is 180, 109 is 90W or 270E, 144 is
# 2.5W.

# These roughly cover Scotland.
# lat.range <- 13:14
# lon.range <- 142:143

# These are the area used by Ludescher et al,
# 2013. It is 27x69 points which are then
# subsampled to 9 by 23.
# lat.range <- 24:50
# lon.range <- 48:116

# 3. Supply the years
# firstyear <- 1950
# lastyear <- 1952

# 4. Supply the output name as a text string.
# paste0() concatenates strings which you may find
# handy:
# outputfilename <- paste0("Pacific-", firstyear, "-", lastyear, ".txt")


# Example of output

# S013E142 S013E143 S014E142 S014E143
# Y1950P001 281.60000272654 281.570002727211 281.60000272654 280.970002740622
# Y1950P002 280.740002745762 280.270002756268 281.070002738386 280.49000275135
# Y1950P003 280.100002760068 278.820002788678 281.120002737269 280.070002760738
# Y1950P004 281.070002738386 279.420002775267 281.620002726093 280.640002747998
# ...
# Y1950P193 285.450002640486 285.290002644062 285.720002634451 285.75000263378
# Y1950P194 285.570002637804 285.640002636239 286.070002626628 286.570002615452
# Y1950P195 285.92000262998 286.220002623275 286.200002623722 286.620002614334
# ...
# Y1950P364 276.100002849475 275.350002866238 276.37000284344 275.200002869591
# Y1950P365 276.990002829581 275.820002855733 276.020002851263 274.72000288032
# Y1951P001 278.220002802089 277.470002818853 276.700002836064 275.870002854615
# Y1951P002 277.750002812594 276.890002831817 276.650002837181 275.520002862439
# ...
# Y1952P365 280.35000275448 280.120002759621 280.370002754033 279.390002775937

# There is one row for each day, and 365 days in
# each year (leap days are omitted). In each row,
# you have temperatures in Kelvin for each grid
# point in a rectangle.

# S13E142 means 13 steps South from the North Pole
# and 142 steps East from Greenwich. The points
# are in reading order, starting at the top-left
# (Northmost, Westmost) and going along the top
# row first.

# Y1950P001 means year 1950, day 1. (P because
# longer periods might be used later.)


The instructions are admirably detailed concerning what I should do, but they don't say where the output will appear when I do it. This makes me nervous. I guess I should just try it. After all, the program is not called DestroyTheWorld.

Unfortunately, at this point a lot of things start acting weird.

It's too complicated and boring to explain in detail, but basically, I keep getting a file missing error message. I don't understand why this happens under some conditions and not others. I try lots of experiments.

Eventually I discover that one year of temperature data failed to download—the year 1949, right after the first year available! So, I'm getting the error message whenever I try to do anything involving that year of data.

To fix the problem, I simply download the 1949 data by hand from here:

(You can open ftp addresses in a web browser just like http addresses.) I put it in my working directory for R, and everything is fine again. Whew!

By the time I get this file, I sort of know what to do—after all, I've spent about an hour trying lots of different things.

I decide to create a file listing temperatures near where I live in Riverside from 1948 to 1979. To do this, I open Graham's script netcdf-convertor.R in a word processor and change this section:

lat.range <- 13:14
lon.range <- 142:143
firstyear <- 1957
lastyear <- 1958
outputfilename <- paste0("Scotland-", firstyear, "-", lastyear, ".txt")

to this:

setwd("C:/Users/JOHN/Documents/My Backups/azimuth/el nino")
lat.range <- 23:23
lon.range <- 98:98
firstyear <- 1948
lastyear <- 1979
outputfilename <- paste0("Riverside-", firstyear, "-", lastyear, ".txt")

Why? Well, I want it to put the file in my working directory. I want the years from 1948 to 1979. And I want temperature data from where I live!

Googling the info, I see Riverside, California is at 33.9481° N, 117.3961° W. 34° N is about 56 degrees south of the North Pole, which is 22 steps of size 2.5°. And because some idiot decided everyone should count starting at 1 instead of 0 even in contexts like this, the North Pole itself is step 1, not step 0… so Riverside is latitude step 23. That's why I write:

lat.range <- 23:23

Similarly, 117.5° W is 242.5° E, which is 97 steps of size 2.5°… which counts as step 98 according to this braindead system. That's why I write:

lon.range <- 98:98
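Here's a little helper function (my own, not part of Graham's script) that does this arithmetic, in case you want to check the numbers for your own location:

# Convert an ordinary latitude/longitude to the 1-based grid steps used above.
grid.step <- function(lat, lon) {
  lat.step <- round((90 - lat) / 2.5) + 1     # step 1 is the North Pole
  lon.step <- round((lon %% 360) / 2.5) + 1   # step 1 is 0 degrees East
  c(lat = lat.step, lon = lon.step)
}

grid.step(33.9481, -117.3961)   # lat 23, lon 98, matching the values above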

Having done this, I save the file netcdf-convertor.R under another name, Riverside.R.

And then I do some stuff that it took some fiddling around to discover.

First, in my R interface I go to the menu item File, at far left, and click on Open script. It lets me browse around, so I go to my working directory for R and choose Riverside.R. A little window called R editor opens up in my R interface, containing this script.

I'm probably not doing this optimally, but I can now right-click on the R editor and see a menu with a choice called Select all. If I click this, everything in the window turns blue. Then I can right-click again and choose Run line or selection. And the script runs!


It huffs and puffs, and then stops. I peek in my working directory, and see that a file called

Riverside-1948-1979.txt
has been created. When I open it, it has lots of lines, starting with these:

Y1948P001 279.95
Y1948P002 280.14
Y1948P003 282.27
Y1948P004 283.97
Y1948P005 284.27
Y1948P006 286.97

As Graham promised, each line has a year and day label, followed by a vector… which in my case is just a single number, since I only wanted the temperature in one location. I’m hoping this is the temperature near Riverside, in kelvin.

A small experiment

To see if this is working, I’d like to plot these temperatures and see if they make sense. Unfortunately I have no idea how to get R to take a file containing data of the sort I have and plot it! I need to learn how, but right now I’m exhausted, so I use another method to get the job done— a method that’s too suboptimal and embarrassing to describe here. (Hint: it involves the word “Excel”.)

I do a few things, but here’s the most interesting one—namely, not very interesting. I plot the temperatures for 1963:

I compare it to some publicly available data, not from Riverside, but from nearby Los Angeles:

As you can see, there was a cold day on January 13th, when the temperature dropped to 33°F. That seems to be visible on the graph I made, and looking at the data from which I made the graph, I see the temperature dropped to 251.4 kelvin on the 13th: that’s -7°F, very cold for here. It does get colder around Riverside than in Los Angeles in the winter, since it’s a desert, with temperatures not buffered by the ocean. So, this does seem compatible with the public records. That’s mildly reassuring.

But other features of the graph don’t match, and I’m not quite sure if they should or not. So, all this is very tentative and unimpressive. However, I’ve managed to get over some of my worst fears, download some temperature data, and graph it! Now I need to learn how to use R to do statistics with this data, and graph it in a better way.


You can help me out by answering these puzzles. Later I might pose puzzles where you can help us write really interesting programs. But for now it’s just about learning R.

Puzzle 1. Given a text file with lots of lines of this form:

Y1948P001 279.95
Y1948P002 280.14
Y1948P003 282.27
Y1948P004 283.97

write an R program that creates a huge vector, or list of numbers, like this:

279.95, 280.14, 282.27, 283.97, ...

Puzzle 2: Extend the above program so that it plots this list of numbers, or outputs it to a new file.

If you want to test your programs, here’s the actual file:


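If you'd rather just peek at one possible shape of an answer, here's a minimal sketch, assuming the Riverside file created above (the output file name is only an example):

# One possible sketch of Puzzles 1 and 2, assuming the Riverside file made above.
lines <- readLines("Riverside-1948-1979.txt")
rows  <- grep("^Y", lines, value = TRUE)                    # keep the "Y1948P001 ..." lines
temps <- as.numeric(sapply(strsplit(rows, " +"), `[`, 2))   # second field = temperature

plot(temps, type = "l", xlab = "day", ylab = "temperature (K)")
write(temps, file = "Riverside-vector.txt", ncolumns = 1)   # output name is just an example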
More puzzles

If those puzzles are too easy, here are two more. I gave these last time, but everyone was too wimpy to tackle them.

Puzzle 3. Modify the software so that it uses the same method to predict El Niños from 1980 to 2013. You’ll have to adjust two lines in netcdf-convertor-ludescher.R:

firstyear <- 1948
lastyear <- 1980

should become

firstyear <- 1980
lastyear <- 2013

or whatever range of years you want. You’ll also have to adjust names of years in ludescher-replication.R. Search the file for the string 19 and make the necessary changes. Ask me if you get stuck.

Puzzle 4. Right now we average the link strength over all pairs (i,j) where i is a node in the El Niño basin defined by Ludescher et al and j is a node outside this basin. The basin consists of the red dots here:

What happens if you change the definition of the El Niño basin? For example, can you drop those annoying two red dots that are south of the rest, without messing things up? Can you get better results if you change the shape of the basin?

To study these questions you need to rewrite ludescher-replication.R a bit. Here’s where Graham defines the El Niño basin:

ludescher.basin <- function() {
  lats <- c( 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6)
  lons <- c(11,12,13,14,15,16,17,18,19,20,21,22,16,22)
  stopifnot(length(lats) == length(lons))

These are lists of latitude and longitude coordinates: (5,11), (5,12), (5,13), etc. A coordinate like (5,11) means the little circle that’s 5 down and 11 across in the grid on the above map. So, that’s the leftmost point in Ludescher’s El Niño basin. By changing these lists, you can change the definition of the El Niño basin.
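For instance, dropping the two southern dots just means deleting the entries with latitude 6. Something like this should work (a sketch only: I'm assuming the function simply returns the two lists, so compare with the actual ludescher-replication.R before using it):

# A sketch of a modified basin without the two southern points (lats == 6).
# I'm assuming the function returns the two coordinate lists as shown here.
ludescher.basin <- function() {
  lats <- c( 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5)
  lons <- c(11,12,13,14,15,16,17,18,19,20,21,22)
  stopifnot(length(lats) == length(lons))
  list(lats = lats, lons = lons)
}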

Next time I’ll discuss some criticisms of Ludescher et al’s paper, but later we will return to analyzing temperature data, looking for interesting patterns.

Category Theory for Better Spreadsheets

5 February, 2014

Since one goal of Azimuth is to connect mathematicians to projects that can more immediately help the world, I want to pass this on. It’s a press release put out by Jocelyn Paine, who has blogged about applied category theory on the n-Category Café. I think he’s a serious guy, so I hope we can help him out!

Spreadsheet researcher Jocelyn Ireson-Paine has launched an Indiegogo campaign to fund a project to make spreadsheets safer. It will show how to write spreadsheets that are easier to read and less error-prone than when written in Excel. This is important because spreadsheet errors have cost some companies millions of pounds, even causing resignations and share-price crashes. An error in one spreadsheet, an economic model written in 2010 by Harvard economists Carmen Reinhart and Kenneth Rogoff, has even been blamed for tax rises and public-sector cuts. If he gets funding, Jocelyn will re-engineer this spreadsheet. He hopes that, because of its notoriety, this will catch public attention.

Reinhart and Rogoff’s spreadsheet was part of a paper on the association between debt and economic growth. They concluded that in countries where debt exceeds 90% of gross domestic product, growth is notably lower. But in spring 2013, University of Massachusetts student Thomas Herndon found they had omitted data when calculating an average. Because their paper’s conclusion supported governments’ austerity programmes, much criticism followed. They even received hate email blaming them for tax rises and public-sector cuts.

Jocelyn said, “The error probably didn’t change the results much. But better software would have made the nature of the error clearer, as well as the economics calculations, thus averting ill-informed and hurtful media criticism. Indeed, it might have avoided the error altogether.”

Jocelyn’s project will use two ideas. One is “literate programming”. Normally, a programmer writes a program first, then adds comments explaining how it works. But in literate programming, the programmer becomes an essayist. He or she first writes the explanation, then inserts the calculations as if putting equations into a maths essay. In ordinary spreadsheets, you’re lucky to get any documentation at all; in literate spreadsheets, documentation comes first.

The other idea is “modularity”. This means building spreadsheets from self-contained parts which can be developed, tested, and documented independently of one another. This gives the spreadsheet’s author less to think about, making mistakes less likely. It also makes it easier to replace parts that do have mistakes.

Jocelyn has embodied these ideas in a piece of software named Excelsior. He said, “‘Excelsior’ means ‘higher’ in Latin, and ‘upwards!’ in Longfellow’s poem. I think of it as meaning ‘upwards from Excel’. In fact, though, it’s the name of a wonderful Oxford café where I used to work on my ideas.”

Jocelyn also wants to show how advanced maths benefits computing. Some of his inspiration came from a paper he found on a friend’s desk in the Oxford University Department of Computer Science. Written by professor Joseph Goguen, this used a branch of maths called category theory to elucidate what it means for something to be part of a system, and how the behaviour of a system arises from the behaviours of its parts. Jocelyn said, “The ideas in the paper were extremely general, applying to many different areas. And when you think of modules as parts, they even apply to spreadsheets. This shows the value of abstraction.”

For more

For pictures or more information, please contact Jocelyn Ireson-Paine:

Postal: 23 Stratfield Road, Oxford, OX2 7BG, UK.
Tel: 07768 534 091.

Campaign Web site:

Campaign blog:

Jocelyn’s bio:

Jocelyn’s personal website, for academic and general stuff:

Background information: links to all topics mentioned can be found at the end of Paine’s campaign text at

These include literate programming, modularity, the Reinhart-Rogoff spreadsheet, category theory, and many horror stories about the damage caused by spreadsheet errors.

The Selected Papers Network (Part 4)

29 July, 2013

guest post by Christopher Lee

In my last post, I outlined four aspects of walled gardens that make them very resistant to escape:

• walled gardens make individual choice irrelevant, by transferring control to the owner, and tying your one remaining option (to leave the container) to being locked out of your professional ecosystem;

• all competition is walled garden;

• walled garden competition is winner-take-all;

• even if the “good guys” win (build the biggest walled garden), they become “bad guys” (masters of the walled garden, whose interests become diametrically opposed to that of the people stuck in their walled garden).

To state the obvious: even if someone launched a new site with the perfect interface and features for an alternative system of peer review, it would probably starve to death both for lack of users and lack of impact. Even for the rare user who found the site and switched all his activity to it, he would have little or no impact because almost no one would see his reviews or papers. Indeed, even if the Open Science community launched dozens of sites exploring various useful new approaches for scientific communication, that might make Open Science’s prospects worse rather than better. Since each of these sites would in effect be a little walled garden (for reasons I outlined last time), their number and diversity would mainly serve to fragment the community (i.e. the membership and activity on each such site might be ten times less than it would have been if there were only a few such sites). When your strengths (diversity; lots of new ideas) act as weaknesses, you need a new strategy. is an attempt to offer such a new strategy. It represents only about two weeks of development work by one person (me), and has only been up for about a month, so it can hardly be considered the last word in the manifold possibilities of this new strategy. However, this bare bones prototype demonstrates how we can solve the four ‘walled garden dilemmas’:

Enable walled-garden users to ‘levitate’—be ‘in’ the walled garden but ‘above’ it at the same time. There’s nothing mystical about this. Think about it: that’s what search engines do all the time—a search engine pulls material out of all the world’s walled gardens, and gives it a new life by unifying it based on what it’s about. All does is act as a search engine that indexes content by what paper and what topics it’s about, and who wrote it.

This enables isolated posts by different people to come together in a unified conversation about a specific paper (or topic), independent of what walled gardens they came from—while simultaneously carrying on their full, normal life in their original walled garden.

Concretely, rather than telling Google+ users (for example) they should stop posting on Google+ and post only on instead (which would make their initial audience plunge to near zero), we tell them to add a few tags to their Google+ post so can easily index it. They retain their full Google+ audience, but they acquire a whole new set of potential interactions and audience (trivial example: if they post on a given paper, will display their post next to other people’s posts on the same paper, resulting in all sorts of possible crosstalk).

Some people have expressed concern that indexes Google+, rightly pointing out that Google+ is yet another walled garden. Doesn’t that undercut our strategy to escape from walled gardens? No. Our strategy is not to try to find a container that is not a walled garden; our strategy is to ‘levitate’ content from walled gardens. Google+ may be a walled garden in some respects, but it allows us to index users’ content, which is all we need.

It should be equally obvious that should not limit itself to Google+. Indeed, why should a search engine restrict itself to anything less than the whole world? Of course, there’s a spectrum of different levels of technical challenges for doing this. And this tends to produce an 80-20 rule, where 80% of the value can be attained by only 20% of the work. Social networks like Google+, Twitter etc. provide a large portion of the value (potential users), for very little effort—they provide open APIs that let us search their indexes, very easily. Blogs represent another valuable category for indexing.

More to the point, far more important than technology is building a culture where users expect their content to ‘fly’ unrestricted by walled-garden boundaries, and adopt shared practices that make that happen easily and naturally. Tagging is a simple example of that. By putting the key metadata (paper ID, topic ID) into the user’s public content, in a simple, standard way (as opposed to hidden in the walled garden’s proprietary database), tagging makes it easy for anyone and everyone to index it. And the more users get accustomed to the freedom and benefits this provides, the less willing they’ll be to accept walled gardens’ trying to take ownership (ie. control) of the users’ own content.

Don’t compete; cooperate: if we admit that it will be extremely difficult for a small new site (like to compete with the big walled gardens that surround it, you might rightly ask, what options are left? Obviously, not to compete. But concretely, what would that mean?

☆ enable users in a walled garden to liberate their own content by tagging and indexing it;

☆ add value for those users (e.g. for mathematicians, give them LaTeX equation support);

☆ use the walled garden’s public channel as your network transport—i.e. build your community within and through the walled garden’s community.

This strategy treats the walled garden not as a competitor (to kill or be killed by) but instead as a partner (that provides value to you, and that you in turn add value to). Moreover, since this cooperation is designed to be open and universal rather than an exclusive partnership (concretely, anyone could index posts, because they are public), we can best describe this as public data federation.

Any number of sites could cooperate in this way, simply by:

☆ sharing a common culture of standard tagging conventions;

☆ treating public data (i.e. viewable by anybody on the web) as public (i.e. indexable by anybody);

☆ drawing on the shared index of global content (i.e. when the index has content that’s relevant to your site’s users, let them see and interact with it).

To anyone used to the traditional challenges of software interoperability, this might seem like a tall order—it might take years of software development to build such a data federation. But consider: by using Google+’s open API, has de facto established such a data federation with Google+, one of the biggest players in the business. Following the checklist:

☆ offers a very simple tagging standard, and more and more Google+ users are trying it;

☆ Google+ provides the API that enables public posts to be searched and indexed. in turn assures that posts made on are visible to Google+ by simply posting them on Google+;

☆ users can see posts from (and have discussions with) Google+ users who have never logged into (or even heard of), and vice versa.

Now consider: what if someone set up their own site based on the open source code (or even wrote their own implementation of our protocol from scratch). What would they need to do to ensure 100% interoperability (i.e. our three federation requirements above) with Nothing. That federation interoperability is built into the protocol design itself. And since this is federation, that also means they’d have 100% interoperation with Google+ as well. We can easily do so also with Twitter, WordPress, and other public networks.

There are lots of relevant websites in this space. Which of them can we actually federate with in this way? This divides into two classes: those that have open APIs vs. those that don’t. If a walled garden has an API, you can typically federate with it simply by writing some code to use their API, and encouraging its users to start tagging. Everybody wins: the users gain new capabilities for free, and you’ve added value to that walled garden’s platform. For sites that lack such an API (typically smaller sites), you need more active cooperation to establish a data exchange protocol. For example, we are just starting discussions with arXiv and MathOverflow about such ‘federation’ data exchange.

To my mind, the most crucial aspect of this is sincerity: we truly wish to cooperate with (add value to) all these walled garden sites, not to compete with them (harm them). This isn’t some insidious commie plot to infiltrate and somehow destroy them. The bottom line is that websites will only join a federation if it benefits them, by making their site more useful and more attractive to users. Re-connecting with the rest of the world (in other walled gardens) accomplishes that in a very fundamental way. The only scenario I see where this would not seem advantageous, would be for a site that truly believes that it is going to achieve market dominance across this whole space (‘one walled garden to rule them all’). Looking over the landscape of players (big players like Google, Twitter, LinkedIn, Facebook, vs. little players focused on this space like Mendeley, ResearchGate, etc.), I don’t think any of the latter can claim this is a realistic plan—especially when you consider that any success in that direction will just make all other players federate together in self-defense.

Level the playing field: these considerations lead naturally to our third concern about walled gardens: walled garden competition strongly penalizes new, small players, and makes bigger players assume a winner-takes-all outcome. Concretely, (or any other new site) is puny compared with, say, Mendeley. However, the federation strategy allows us to turn that on its head. Mendeley is puny compared with Google+, and operates in de facto federation with Google+. How likely is it that Mendeley is going to crush Google+ as a social network where people discuss science? If a user could only post to other members (a small audience), then Mendeley wins by default. But that’s not how it works: a user has all of Google+ as his potential audience. In a federation strategy, the question isn’t how big you are, but rather how big your federation is. And in this day of open APIs, it is really easy to extend that de facto federation across a big fraction of the world’s social networks. And that is a level playing field.

Provide no point of control: our last concern about walled gardens was that they inevitably create a divergence of interests for the winning garden’s owner vs. the users trapped inside. Hence the best of intentions (great ideas for building a wonderful community) can truly become the road to hell—an even better walled garden. After all, that’s how the current walled garden system evolved (from the reasonable and beneficial idea of establishing journals). If any one site ‘wins’, our troubles will just start all over again. Is there any alternative?

Yes: don’t let any one site win; only build a successful federation. Since user data can flow freely throughout the federation, users can move freely within the federation, without losing their content, accumulated contacts and reputation, in short, their professional ecosystem. If a successful site starts making policies that are detrimental to users, they can easily vote with their feet. The data federation re-establishes the basis for a free market, namely unconstrained individual freedom of choice.

The key is that there is no central point of control. No one ‘owns’ (i.e. controls) the data. It will be stored in many places. No one can decide to start denying it to someone else. Anyone can access the public data under the rules of the federation. Even if multiple major players conspired together, anyone else could set up an alternative site and appeal to users: vote with your feet! As we know from history, the problem with senates and other central control mechanisms is that given enough time and resources, they can be corrupted and captured by both elites and dictators. Only a federation system with no central point of control has a basic defense: regardless of what happens at ‘the top’, all individuals in the system have freedom of choice between many alternatives, and anybody can start a new alternative at any time. Indeed, the key red flag in any such system is when the powers-that-be start pushing all sorts of new rules that hinder people from starting new alternatives, or freely migrating to alternatives.

Note that implicit in this is an assertion that a healthy ecosystem should contain many diverse alternative sites that serve different subcommunities, united in a public data federation. I am not advocating that should become the ‘one paper index to rule them all’. Instead, I’m saying we need one successful exemplar of a federated system, that can help people see how to move their content beyond the walled garden and start ‘voting with their feet’.

So: how do we get there? In my view, we need to use to prove the viability of the federation model in two ways:

☆ we need to develop the interface to be a genuinely good way to discuss scientific papers, and subscribe to others’ recommendations. It goes without saying that the current interface needs lots of improvements, e.g. to work past some of Google+’s shortcomings. Given that the current interface took only a couple of weeks of hacking by just one developer (yours truly), this is eminently doable.

☆ we need to show that is not just a prisoner of Google+, but actually an open federation system, by adding other systems to the federation, such as Twitter and independent blogs. Again, this is straightforward.

To Be or Not To Be?

All of which brings us to the real question that will determine our fates. Are you for a public data federation, or not? In my view, if you seriously want reform of the current walled garden system, federation is the only path forward that is actually a path forward (instead of to just another walled garden). It is the only strategy that allows the community to retain control over its own content. That is fundamental.

And if you do want a public data federation, are you willing to work for that outcome? If not, then I think you don’t really want it—because you can contribute very easily. Even just adding #spnetwork tags to your posts—wherever you write them—is a very valuable contribution that enormously increases the value of the federation ecosystem.

One more key question: who will join me in developing the platform (both the software, and federation alliances)? As long as is a one-man effort, it must fail. We don’t need a big team, but it’s time to turn the project into a real team. The project has solid foundations that will enable rapid development of new federation partnerships—e.g. exciting, open APIs like REST — and of seamless, intuitive user interfaces — such as the MongoDB noSQL database, and AJAX methods. A small, collaborative team will be able to push this system forward quickly in exciting, useful ways. If you jump in now, you can be one of the very first people on the team.

I want to make one more appeal. Whatever you think about as it exists today, forget about it.

Why? Because it’s irrelevant to the decision we need to make today: public data federation, yes or no? First, because the many flaws of the current have almost no bearing on that critical question (they mainly reflect the limitations of a version 0.1 alpha product). Second, because the whole point of federation is to ‘let a thousand flowers bloom’— to enable a diverse ecology of different tools and interfaces, made viable because they work together as a federation, rather than starving to death as separate, warring, walled gardens.

Of course, to get to that diverse, federated ecosystem, we first
have to prove that one federated system can succeed—and
liberate a bunch of minds in the process, starting with our own. We have to assemble a nucleus of users who are committed to making this idea succeed by using it, and a team of developers who are driven to build it. Remember, talking about the federation ideal will not by itself accomplish anything. We have to act, now; specifically, we have to quickly build a system that lets more and more people see the direct benefits of public data federation. If and when that is clearly successful, and growing sustainably, we can consider branching out, but not before.

For better or worse, in a world of walled gardens, this is the one effort (to my limited knowledge) to do exactly that. It may be ugly, and annoying, and alpha, but it offers people a new and different kind of social contract from that of the walled gardens. (If someone can point me to an equivalent effort to implement the same public data federation strategy, we will of course be delighted to work with them! That’s what federation means.)

The question now for the development of public data federation is whether we are working together to make it happen, or on the contrary whether we are fragmenting and diffusing our effort. I believe that public data federation is the Manhattan Project of the war for Open Science. It really could change the world in a fundamental and enduring way. Right now the world may seem headed the opposite direction (higher and higher walls), but it does not have to be that way. I believe that all of the required ingredients are demonstrably available and ready to go. The only remaining requirement is that we rise as a community and do it.

I am speaking to you, as one person to another. You as an individual do not even have the figleaf of saying “Well, if I do this, what’s the point? One person can’t have any impact.” You as an individual can change this project. You as an individual can change the world around you through what you do on this project.

Petri Net Programming (Part 3)

19 April, 2013

guest post by David Tanzer

The role of differential equations

Last time we looked at stochastic Petri nets, which use a random event model for the reactions. Individual entities are represented by tokens that flow through the network. We observed that when the token counts get large, they can be approximated by continuous quantities, which opens the door to the application of continuous mathematics to the analysis of network dynamics.

A key result of this approach is the “rate equation,” which gives a law of motion for the expected sizes of the populations. Equilibrium can then be obtained by solving for zero motion. The rate equations are applied in chemistry, where they give the rates of change of the concentrations of the various species in a reaction.

But before discussing the rate equation, here I will talk about the mathematical form of this law of motion, which consists of differential equations. This form is naturally associated with deterministic systems involving continuous magnitudes. This includes the equations of motion for the sine wave, and for the graceful ellipses that are traced out by the orbits of the planets around the sun.

This post provides some mathematical context to programmers who have not worked on scientific applications. My goal is to get as many of you on board as possible, before setting sail with Petri net programming.

Three approaches to equations: theoretical, formula-based, and computational

Let’s first consider the major approaches to equations in general. We’ll illustrate with a Diophantine equation

x^9 + y^9 + z^9 = 2

where x, y and z are integer variables.

In the theoretical approach (aka “qualitative analysis”), we start with the meaning of the equation and then proceed to reason about its solutions. Here are some simple consequences of this equation: x, y and z can’t all be zero, can’t all be positive, can’t all be negative, can’t all be even, and can’t all be odd.

In the formula-based approach, we seek formulas to describe the solutions. Here is an example of a formula (which does not solve our equation):

\{(x,y,z) \mid x = n^3,\ y = 2n - 4,\ z = 4n,\ 1 \leq n \leq 5 \}

Such formulas are nice to have, but the pursuit of them is diabolically difficult. In fact, for Diophantine equations, even the question of whether an arbitrarily chosen equation has any solutions whatsoever has been proven to be algorithmically undecidable.

Finally, in the computational approach, we seek algorithms to enumerate or numerically approximate the solutions to the equations.
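To make the computational approach concrete, here is a minimal sketch in Python (my own illustration, not part of the original discussion) that enumerates integer solutions of x^9 + y^9 + z^9 = 2 inside a bounded box. We cannot search all of the integers, but we can exhaustively check a finite region and report whatever solutions it contains:

def diophantine_solutions(bound):
    """Return all (x, y, z) with |x|, |y|, |z| <= bound satisfying x^9 + y^9 + z^9 = 2."""
    solutions = []
    for x in range(-bound, bound + 1):
        for y in range(-bound, bound + 1):
            for z in range(-bound, bound + 1):
                if x**9 + y**9 + z**9 == 2:
                    solutions.append((x, y, z))
    return solutions

print(diophantine_solutions(5))   # finds (0, 1, 1) and its permutations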

The three approaches to differential equations

Let’s apply the preceding classification to differential equations.

Theoretical approach

A differential equation is one that constrains the rates at which the variables are changing. This can include constraints on the rates at which the rates are changing (second-order equations), etc. The equation is ordinary if there is a single independent variable, such as time, otherwise it is partial.

Consider the equation stating that a variable increases at a rate equal to its current value. The bigger it gets, the faster it increases. Given a starting value, this determines a process — the solution to the equation — which here is exponential growth.

Let X(t) be the value at time t, and let’s initialize it to 1 at time 0. So we have:

X(0) = 1

X'(t) = X(t)

These are first-order equations, because the derivative is applied at most once to any variable. They are linear equations, because the terms on each side of the equations are linear combinations of either individual variables or derivatives (in this case all of the coefficients are 1). Note also that a system of differential equations may in general have zero, one, or multiple solutions. This example belongs to a class of equations which are proven to have a unique solution for each initial condition.

You could imagine more complex systems of equations, involving multiple dependent variables, all still depending on time. That includes the rate equations for a Petri net, which have one dependent variable for each of the population sizes. The ideas for such systems are an extension of the ideas for a single-variable system. Then, a state of the system is a vector of values, with one component for each of the dependent variables. For first-order systems, such as the rate equations, where the derivatives appear on the left-hand sides, the equations determine, for each possible state of the system, a “direction” and rate of change for the state of the system.

Now here is a simple illustration of what I called the theoretical approach. Can X(t) ever become negative? No, because it starts out positive at time 0, and in order to later become negative, it must be decreasing at a time t_1 when it is still positive. That is to say, X(t_1) > 0, and X'(t_1) < 0. But that contradicts the assumption X'(t) = X(t). The general lesson here is that we don’t need a solution formula in order to make such inferences.

For the rate equations, the theoretical approach leads to substantial theorems about the existence and structure of equilibrium solutions.

Formula-based approach

It is natural to look for concise formulas to solve our equations, but the results of this overall quest are largely negative. The exponential differential equation cannot be solved by any formula that involves a finite combination of simple operations. So the solution function must be treated as a new primitive, and given a name, say \exp(t). But even when we extend our language to include this new symbol, there are many differential equations that remain beyond the reach of finite formulas. So an endless collection of primitive functions is called for. (As standard practice, we always include exp(t), and its complex extensions to the trigonometric functions, as primitives in our toolbox.)

But the hard mathematical reality does not end here, because even when solution formulas do exist, finding them may call for an ace detective. Only for certain classes of differential equations, such as the linear ones, do we have systematic solution methods.

The picture changes, however, if we let the formulas contain an infinite number of operations. Then the arithmetic operators give a far-reaching base for defining new functions. In fact, as you can verify, the power series

X(t) = 1 + t + t^2/2! + t^3/3! + ...

which we view as an “infinite polynomial” over the time parameter t, exactly satisfies our equations for exponential motion, X(0) = 1 and X'(t) = X(t). This power series therefore defines \exp(t). By the way, applying it to the input 1 produces a definition for the transcendental number e:

e = X(1) = 1 + 1 + 1/2 + 1/6 + 1/24 + 1/120 + ... \approx 2.71828
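As a quick numerical check (a sketch of my own, not from the original text), we can sum the first few terms of this series and watch the partial sums approach e:

from math import e, factorial

def exp_series(t, terms):
    """Partial sum of 1 + t + t^2/2! + t^3/3! + ... with the given number of terms."""
    return sum(t**n / factorial(n) for n in range(terms))

for terms in (5, 10, 15):
    print(terms, exp_series(1.0, terms))   # 2.708..., 2.71828..., 2.7182818...
print("math.e:", e)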

Computational approach

Let’s leave aside our troubles with formulas, and consider the computational approach. For broad classes of differential equations, there are approximation algorithms that can be successfully applied.

For starters, any power series that satisfies a differential equation may work for a simple approximation method. If a series is known to converge over some range of inputs, then one can approximate the value at those points by stopping the computation after a finite number of terms.

But the standard methods work directly with the equations, provided that they can be put into the right form. The simplest one is called Euler’s method. It works over a sampling grid of points separated by some small number \epsilon. Let’s take the case where we have a first-order equation in explicit form, which means that X'(t) = f(X(t)) for some function f.

We begin with the initial value X(0). Applying f, we get X'(0) = f(X(0)). Then for the interval from 0 to \epsilon, we use a linear approximation for X(t), by assuming that the derivative remains constant at X'(0). That gives X(\epsilon) = X(0) + \epsilon \cdot X'(0). Next, X'(\epsilon) = f(X(\epsilon)), and X(2 \epsilon) = X(\epsilon) + \epsilon \cdot X'(\epsilon), etc. Formally,

X(0) = \textrm{initial}

X((n+1) \epsilon) = X(n \epsilon) + \epsilon f(X(n \epsilon))

Applying this to our exponential equation, where f(X(t)) = X(t), we get:

X(0) = 1

X((n+1) \epsilon) = X(n \epsilon) + \epsilon X(n \epsilon) = X(n \epsilon) (1 + \epsilon)

and hence, by induction,

X(n \epsilon) = (1 + \epsilon)^n

So the approximation method gives a discrete exponential growth, which converges to a continuous exponential in the limit as the mesh size goes to zero.
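Here is a minimal sketch of Euler's method in Python (again my own illustration, with hypothetical function names), applied to an explicit equation X'(t) = f(X(t)) and specialized to the exponential example f(X) = X, so that the result can be compared against the exact solution exp(t):

from math import exp

def euler(f, x0, epsilon, steps):
    """Euler's method: at each grid point, advance X by epsilon * f(X)."""
    x = x0
    for _ in range(steps):
        x = x + epsilon * f(x)
    return x

epsilon, steps = 0.01, 100                    # integrate from t = 0 to t = 1
approx = euler(lambda x: x, 1.0, epsilon, steps)
print("Euler estimate of X(1):", approx)      # (1 + 0.01)^100, about 2.7048
print("Exact value exp(1):    ", exp(1.0))    # about 2.7183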

Note, the case we just considered has more generality than might appear at first, because (1) the ideas here are easily extended to systems of explicit first order equations, and (2) higher-order equations that are “explicit” in an extended sense—meaning that the highest-order derivative is expressed as a function of time, of the variable, and of the lower-order derivatives—can be converted into an equivalent system of explicit first-order equations.
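To illustrate point (2) with a small example of my own: the second-order equation X''(t) = -X(t), whose solutions are the sine and cosine motions mentioned at the start of this post, can be rewritten as the first-order system X' = V, V' = -X, and then handled by the same Euler scheme:

from math import cos

def euler_system(f, state, epsilon, steps):
    """Euler's method for a first-order system: state' = f(state)."""
    for _ in range(steps):
        derivative = f(state)
        state = [s + epsilon * d for s, d in zip(state, derivative)]
    return state

# X'' = -X rewritten as the system X' = V, V' = -X
harmonic = lambda state: [state[1], -state[0]]

x, v = euler_system(harmonic, [1.0, 0.0], 0.001, 1000)   # X(0) = 1, V(0) = 0, integrate to t = 1
print("Euler estimate of X(1):", x)        # close to cos(1)
print("Exact value cos(1):    ", cos(1.0))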

The challenging world of differential equations

So, is our cup half-empty or half-full? We have no toolbox of primitive formulas for building the solutions to all differential equations by finite compositions. And even for those which can be solved by formulas, there is no general method for finding the solutions. That is how the cookie crumbles. But on the positive side, there is an array of theoretical tools for analyzing and solving important classes of differential equations, and numerical methods can be applied in many cases.

The study of differential equations leads to some challenging problems, such as the Navier-Stokes equations, which describe the flow of fluids.

These are partial differential equations involving flow velocity, pressure, density and external forces (such as gravity), all of which vary over space and time. There are non-linear (multiplicative) interactions between these variables and their spatial and temporal derivatives, which leads to complexity in the solutions.

At high flow rates, this complexity can produce chaotic solutions, which involve complex behavior at a wide range of resolution scales. This is turbulence. Here is an insightful portrait of turbulence, by Leonardo da Vinci, whose studies in turbulence date back to the 15th Century.

Turbulence, which has been described by Richard Feynman as the most important unsolved problem of classical physics, also presents a mathematical puzzle. The general existence of solutions to the Navier-Stokes equations remains unsettled. This is one of the “Millennium Prize Problems”, for which a one million dollar prize is offered: in three dimensions, given initial values for the velocity and pressure fields, does there exist a solution that is smooth and everywhere defined? There are also complications with grid-based numerical methods, which will fail to produce globally accurate results if the solutions contain details at a smaller scale than the grid mesh. So the ubiquitous phenomenon of turbulence, which is so basic to the movements of the atmosphere and the seas, remains an open case.

But fortunately we have enough traction with differential equations to proceed directly with the rate equations for Petri nets. There we will find illuminating equations, which are the subject of both major theorems and open problems. They are non-linear and intractable by formula-based methods, yet, as we will see, they are well handled by numerical methods.

Petri Net Programming (Part 2)

20 December, 2012

guest post by David A. Tanzer

An introduction to stochastic Petri nets

In the previous article, I explored a simple computational model called Petri nets. They are used to model reaction networks, and have applications in a wide variety of fields, including population ecology, gene regulatory networks, and chemical reaction networks. I presented a simulator program for Petri nets, but it had an important limitation: the model and the simulator contain no notion of the rates of the reactions. But these rates critically determine the character of the dynamics of the network.

Here I will introduce the topic of ‘stochastic Petri nets,’ which extends the basic model to include reaction dynamics. Stochastic means random, and it is presumed that there is an underlying random process that drives the reaction events. This topic is rich in both its mathematical foundations and its practical applications. A direct application of the theory yields the rate equation for chemical reactions, which is a cornerstone of chemical reaction theory. The theory also gives algorithms for analyzing and simulating Petri nets.

We are now entering the ‘business’ of software development for applications to science. The business logic here is nothing but math and science itself. Our study of this logic is not an academic exercise that is tangential to the implementation effort. Rather, it is the first phase of a complete software development process for scientific programming applications.

The end goals of this series are to develop working code to analyze and simulate Petri nets, and to apply these tools to informative case studies. But we have some work to do en route, because we need to truly understand the models in order to properly interpret the algorithms. The key questions here are when, why, and to what extent the algorithms give results that are empirically predictive. We will therefore be embarking on some exploratory adventures into the relevant theoretical foundations.

The overarching subject area to which stochastic Petri nets belong has been described as stochastic mechanics in the network theory series here on Azimuth. The theme development here will partly parallel that of the network theory series, but with a different focus, since I am addressing a computationally oriented reader. For an excellent text on the foundations and applications of stochastic mechanics, see:

• Darren Wilkinson, Stochastic Modelling for Systems Biology, Chapman and Hall/CRC Press, Boca Raton, Florida, 2011.

Review of basic Petri nets

A Petri net is a graph with two kinds of nodes: species and transitions. The net is populated with a collection of ‘tokens’ that represent individual entities. Each token is attached to one of the species nodes, and this attachment indicates the type of the token. We may therefore view a species node as a container that holds all of the tokens of a given type.

The transitions represent conversion reactions between the tokens. Each transition is ‘wired’ to a collection of input species-containers, and to a collection of output containers. When it ‘fires’, it removes one token from each input container, and deposits one token to each output container.

Here is the example we gave, for a simplistic model of the formation and dissociation of H2O molecules:

The circles are for species, and the boxes are for transitions.

The transition combine takes in two H tokens and one O token, and outputs one H2O token. The reverse transition is split, which takes in one H2O, and outputs two H’s and one O.
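As a reminder of how such a net can be coded up (a minimal sketch of my own, not the simulator from the previous article), the marking maps each species to its token count, and each transition lists its input and output species, with multiplicities given by repetition:

marking = {'H': 4, 'O': 2, 'H2O': 0}

transitions = {
    'combine': {'inputs': ['H', 'H', 'O'], 'outputs': ['H2O']},
    'split':   {'inputs': ['H2O'],         'outputs': ['H', 'H', 'O']},
}

def enabled(name, marking):
    """A transition can fire only if every input species has enough tokens."""
    needed = transitions[name]['inputs']
    return all(marking[s] >= needed.count(s) for s in set(needed))

def fire(name, marking):
    """Remove one token per input occurrence and add one per output occurrence."""
    assert enabled(name, marking), name + " is not enabled"
    for s in transitions[name]['inputs']:
        marking[s] -= 1
    for s in transitions[name]['outputs']:
        marking[s] += 1

fire('combine', marking)
print(marking)   # {'H': 2, 'O': 1, 'H2O': 1}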

An important application of Petri nets is to the modeling of biochemical reaction networks, which include the gene regulatory networks. Since genes and enzymes are molecules, and their binding interactions are chemical reactions, the Petri net model is directly applicable. For example, consider a transition that inputs one gene G, one enzyme E, and outputs the molecular form G • E in which E is bound to a particular site on G.

Applications of Petri nets may differ widely in terms of the population sizes involved in the model. In general chemistry reactions, the populations are measured in units of moles (where a mole is ‘Avogadro’s number’ 6.022 · 10^23 entities). In gene regulatory networks, on the other hand, there may only be a handful of genes and enzymes involved in a reaction.

This difference in scale leads to a qualitative difference in the modelling. With small population sizes, the stochastic effects will predominate, but with large populations, a continuous, deterministic, average-based approximation can be used.

Representing Petri nets by reaction formulas

Petri nets can also be represented by formulas used for chemical reaction networks. Here is the formula for the Petri net shown above:

H2O ↔ H + H + O

or the more compact:

H2O ↔ 2 H + O

The double arrow is a compact designation for two separate reactions, which happen to be opposites of each other.

By the way, this reaction is not physically realistic, because one doesn’t find isolated H and O atoms traveling around and meeting up to form water molecules. This is the actual reaction pair that predominates in water:

2 H2O ↔ OH- + H3O+

Here, a hydrogen nucleus H+, with one unit of positive charge, gets removed from one of the H2O molecules, leaving behind the hydroxide ion OH-. In the same stroke, this H+ gets re-attached to the other H2O molecule, which thereby becomes a hydronium ion, H3O+.

For a more detailed example, consider this reaction chain, which is of concern to the ocean environment:

CO2 + H2O ↔ H2CO3 ↔ H+ + HCO3-

This shows the formation of carbonic acid, namely H2CO3, from water and carbon dioxide. The next reaction represents the splitting of carbonic acid into a hydrogen ion and a negatively charged bicarbonate ion, HCO3-. There is a further reaction, in which a bicarbonate ion further ionizes into an H+ and a doubly negative carbonate ion CO3^2-. As the diagram indicates, for each of these reactions, a reverse reaction is also present. For a more detailed description of this reaction network, see:

• Stephen E. Bialkowski, Carbon dioxide and carbonic acid.

Increased levels of CO2 in the atmosphere will change the balance of these reactions, leading to a higher concentration of hydrogen ions in the water, i.e., a more acidic ocean. This is of concern because the metabolic processes of aquatic organisms are sensitive to the pH level of the water. The ultimate concern is that entire food chains could be disrupted, if some of the organisms cannot survive in a more acidic (lower pH) environment. See the Wikipedia page on ocean acidification for more information.

Exercise. Draw Petri net diagrams for these reaction networks.

Motivation for the study of Petri net dynamics

The relative rates of the various reactions in a network critically determine the qualitative dynamics of the network as a whole. This is because the reactions are ‘competing’ with each other, and so their relative rates determine the direction in which the state of the system is changing. For instance, if molecules are breaking down faster than they are being formed, then the system is moving towards full dissociation. When the rates are equal, the processes balance out, and the system is in an equilibrium state. Then, there are only temporary fluctuations around the equilibrium conditions.

The rate of the reactions will depend on the number of tokens present in the system. For example, if any of the input tokens are zero, then the transition can’t fire, and so its rate must be zero. More generally, when there are few input tokens available, there will be fewer reaction events, and so the firing rates will be lower.

Given a specification for the rates in a reaction network, we can then pose the following kinds of questions about its dynamics:

• Does the network have an equilibrium state?

• If so, what are the concentrations of the species at equilibrium?

• How quickly does it approach the equilibrium?

• At the equilibrium state, there will still be temporary fluctuations around the equilibrium concentrations. What are the variances of these fluctuations?

• Are there modes in which the network will oscillate between states?

This is the grail we seek.

Aside from actually performing empirical experiments, such questions can be addressed either analytically or through simulation methods. In either case, our first step is to define a theoretical model for the dynamics of a Petri net.

Stochastic Petri nets

A stochastic Petri net (with kinetics) is a Petri net that is augmented with a specification for the reaction dynamics. It is defined by the following:

• An underlying Petri net, which consists of species, transitions, an input map, and an output map. These maps assign to each transition a multiset of species. (Multiset means that duplicates are allowed.) Recall that the state of the net is defined by a marking function, which maps each species to its population count.

• A rate constant that is associated with each transition.

• A kinetic model, which gives the expected firing rate for each transition as a function of the current marking. Normally, this kinetic function will include the rate constant as a multiplicative factor.

A further ‘sanity constraint’ can be put on the kinetic function for a transition: it should give a positive value if and only if all of its inputs are positive.

• A stochastic model, which defines the probability distribution of the time intervals between firing events. This specific distribution of the firing intervals for a transition will be a function of the expected firing rate in the current marking.

This definition is based on the standard treatments found, for example, in:

• M. Ajmone Marsan, Stochastic Petri nets: an elementary introduction, in Advances in Petri Nets, Springer, Berlin, 1989, 1–23.

or Wilkinson’s book mentioned above. I have also added an explicit mention of the kinetic model, based on the ‘kinetics’ described here:

• Martin Feinberg, Lectures on chemical reaction networks.
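Putting the pieces of this definition into code—a sketch using my own naming conventions, not any standard library—a stochastic Petri net carries, for each transition, its input and output multisets, a rate constant, and a kinetic function from markings to expected firing rates:

from dataclasses import dataclass, field
from typing import Callable, Dict, List

Marking = Dict[str, int]          # species name -> population count

@dataclass
class Transition:
    inputs: List[str]             # input species, repeated according to multiplicity
    outputs: List[str]            # output species, repeated according to multiplicity
    rate_constant: float
    # kinetic model: expected firing rate as a function of this transition and the marking
    kinetics: Callable[["Transition", Marking], float]

@dataclass
class StochasticPetriNet:
    species: List[str]
    transitions: Dict[str, Transition] = field(default_factory=dict)

    def firing_rates(self, marking: Marking) -> Dict[str, float]:
        """Expected firing rate of every transition in the given marking."""
        return {name: t.kinetics(t, marking) for name, t in self.transitions.items()}

# A trivial kinetic model that respects the 'sanity constraint' above:
# the rate is positive exactly when every input species is present.
def constant_kinetics(t: Transition, marking: Marking) -> float:
    return t.rate_constant if all(marking[s] > 0 for s in set(t.inputs)) else 0.0

The stochastic model—the distribution of the firing intervals—is deliberately left out of this sketch, since it will be treated in a subsequent article.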

There is an implied random process that drives the reaction events. A classical random process is given by a container with ‘particles’ that are randomly traveling around, bouncing off the walls, and colliding with each other. This is the general idea behind Brownian motion. It is called a random process because the outcome results from an ‘experiment’ that is not fully determined by the input specification. In this experiment, you pour in the ingredients (particles of different types), set the temperature (the distributions of the velocities), give it a stir, and then see what happens. The outcome consists of the paths taken by each of the particles.

In an important limiting case, the stochastic behavior becomes deterministic, and the population sizes become continuous. To see this, consider a graph of population sizes over time. With larger population sizes, the relative jumps caused by the firing of individual transitions become smaller, and graphs look more like continuous curves. In the limit, we obtain an approximation for high population counts, in which the graphs are continuous curves, and the concentrations are treated as continuous magnitudes. In a similar way, a pitcher of sugar can be approximately viewed as a continuous fluid.

This simplification permits the application of continuous mathematics to the study of reaction network processes. It leads to the basic rate equation for reaction networks, which specifies the direction of change of the system as a function of the current state of the system.

In this article we will be exploring this continuous deterministic formulation of Petri nets, under what is known as the mass action kinetics. This kinetics is one implementation of the general specification of a kinetic model, as defined above. This means that it will define the expected firing rate of each transition, in a given marking of the net. The probabilistic variations in the spacing of the reactions—around the mean given by the expected firing rate—are part of the stochastic dynamics, and will be addressed in a subsequent article.

The mass-action kinetics

Under the mass action kinetics, the expected firing rate of a transition is proportional to the product of the concentrations of its input species. For instance, if the reaction were A + C → D, then the firing rate would be proportional to the concentration of A times the concentration of C, and if the reaction were A + A → D, it would be proportional to the square of the concentration of A.

This principle is explained by Feinberg as follows:

For the reaction A+C → D, an occurrence requires that a molecule of A meet a molecule of C in the reaction, and we take the probability of such an encounter to be proportional to the product [of the concentrations of A and C]. Although we do not presume that every such encounter yields a molecule of D, we nevertheless take the occurrence rate of A+C → D to be governed by [the product of the concentrations].

For an in-depth proof of the mass action law, see this article:

• Daniel Gillespie, A rigorous definition of the chemical master equation, 1992.

Note that we can easily pass back and forth between speaking of the population counts for the species, and the concentrations of the species, which is just the population count divided by the total volume V of the system. The mass action law applies to both cases, the only difference being that the constant factors of (1/V) used for concentrations will get absorbed into the rate constants.
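In code, the mass action kinetic function is short (a sketch with hypothetical names, in the same spirit as the definition above): multiply the rate constant by the concentration of each input species, once per occurrence in the input multiset:

def mass_action_rate(rate_constant, inputs, concentration):
    """Expected firing rate under mass action kinetics.

    `inputs` lists the input species with multiplicity, so A + A -> D
    is written ['A', 'A'] and contributes a factor [A]^2.
    """
    rate = rate_constant
    for species in inputs:
        rate *= concentration[species]
    return rate

print(mass_action_rate(2.0, ['A', 'C'], {'A': 0.5, 'C': 0.2}))   # A + C -> D: 2.0 * 0.5 * 0.2 = 0.2
print(mass_action_rate(2.0, ['A', 'A'], {'A': 0.5}))             # A + A -> D: 2.0 * 0.5^2 = 0.5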

The mass action kinetics is a basic law of empirical chemistry. But there are limits to its validity. First, as indicated in the proof in the Gillespie paper, the mass action law rests on the assumptions that the system is well-stirred and in thermal equilibrium. Further limits are discussed here:

• Georg Job and Regina Ruffler, Physical Chemistry (first five chapters), Section 5.2, 2010.

They write:

…precise measurements show that the relation above is not strictly adhered to. At higher concentrations, values depart quite noticeably from this relation. If we gradually move to lower concentrations, the differences become smaller. The equation here expresses a so-called “limiting law” which strictly applies only when c → 0.

In practice, this relation serves as a useful approximation up to rather high concentrations. In the case of electrically neutral substances, deviations are only noticeable above 100 mol m⁻³. For ions, deviations become observable above 1 mol m⁻³, but they are so small that they are easily neglected if accuracy is not of prime concern.

Why would the mass action kinetics break down at high concentrations? According to the book quoted, it is due to “molecular and ionic interactions.” I haven’t yet found a more detailed explanation, but here is my supposition about what is meant by molecular interactions in this context. Doubling the number of A molecules doubles the number of expected collisions between A and C molecules, but it also reduces the probability that any given A and C molecules that are within reacting distance will actually react. The reaction probability is reduced because the A molecules are ‘competing’ for reactions with the C molecules. With more A molecules, it becomes more likely that a C molecule will simultaneously be within reacting distance of several A molecules; each of these A molecules reduces the probability that the other A molecules will react with the C molecule. This is most pronounced when the concentrations in a gas get high enough that the molecules start to pack together to form a liquid.

The equilibrium relation for a pair of opposite reactions

Suppose we have two opposite reactions:

T: A + B \stackrel{u}{\longrightarrow} C + D

T': C + D \stackrel{v}{\longrightarrow} A + B

Since the reactions have exactly opposite effects on the population sizes, in order for the population sizes to be in a stable equilibrium, the expected firing rates of T and T' must be equal:

\mathrm{rate}(T') = \mathrm{rate}(T)

By mass action kinetics:

\mathrm{rate}(T) = u [A] [B]

\mathrm{rate}(T') = v [C] [D]

where [X] means the concentration of X.

Hence at equilibrium:

u [A] [B] = v [C] [D]


\displaystyle{ \frac{[A][B]}{[C][D]} = \frac{v}{u} = K }

where K is the equilibrium constant for the reaction pair.

Equilibrium solution for the formation and dissociation of a diatomic molecule

Let A be some type of atom, and let D = A2 be the diatomic form of A. Then consider the opposite reactions:

A + A \stackrel{u}{\longrightarrow} D

D \stackrel{v}{\longrightarrow} A + A

From the preceding analysis, at equilibrium the following relation holds:

u [A]^2 = v [D]

Let N(A) and N(D) be the population counts for A and D, and let

N = N(A) + 2 N(D)

be the total number of units of A in the system, whether they be in the form of atoms or diatoms.

The value of N is an invariant property of the system. The reactions cannot change it, because they are just shuffling the units of A from one arrangement to the other. By way of contrast, N(A) is not an invariant quantity.

Dividing this equation by the total volume V, we get:

[N] = [A] + 2 [D]

where [N] is the concentration of the units of A.

Given a fixed value for [N] and the rate constants u and v, we can then solve for the concentrations at equilibrium:

\displaystyle{u [A]^2 = v [D] = v ([N] - [A]) / 2 }

\displaystyle{2 u [A]^2 + v [A] - v [N] = 0 }

\displaystyle{[A] = (-v \pm \sqrt{v^2 + 8 u v [N]}) / 4 u }

Since [A] can’t be negative, only the positive square root is valid.

Here is the solution for the case where u = v = 1:

\displaystyle{[A] = (\sqrt{8 [N] + 1} - 1) / 4 }

\displaystyle{[D] = ([N] - [A]) / 2 }
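As a numerical check on this solution (my own sketch), we can compute [A] and [D] for given [N], u and v, and confirm that the equilibrium relation u[A]^2 = v[D] and the conservation relation [A] + 2[D] = [N] both hold:

from math import isclose, sqrt

def diatomic_equilibrium(N, u, v):
    """Equilibrium concentrations [A], [D] for A + A <-> D, given the total concentration [N] of A-units."""
    A = (-v + sqrt(v**2 + 8 * u * v * N)) / (4 * u)   # positive root of 2u[A]^2 + v[A] - v[N] = 0
    D = (N - A) / 2
    return A, D

A, D = diatomic_equilibrium(N=1.0, u=1.0, v=1.0)
print(A, D)                              # [A] = (sqrt(9) - 1)/4 = 0.5, [D] = 0.25
print(isclose(1.0 * A**2, 1.0 * D))      # u [A]^2 = v [D]  ->  True
print(isclose(A + 2 * D, 1.0))           # [A] + 2 [D] = [N]  ->  True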


We’ve covered a lot of ground, starting with the introduction of the stochastic Petri net model, followed by a general discussion of reaction network dynamics, the mass action laws, and calculating equilibrium solutions for simple reaction networks.

We still have a number of topics to cover on our journey into the foundations, before being able to write informed programs to solve problems with stochastic Petri nets. Upcoming topics are (1) the deterministic rate equation for general reaction networks and its application to finding equilibrium solutions, and (2) an exploration of the stochastic dynamics of a Petri net. These are the themes that will support our upcoming software development.

