Azimuth Backup Project (Part 1)

16 December, 2016

This blog page is to help organize the Azimuth Environmental Data Backup Project, or Azimuth Backup Project for short. This is part of the larger but decentralized, frantic and somewhat disorganized project discussed elsewhere:

Saving Climate Data (Part 2), Azimuth, 15 December 2016.

Here I’ll just say what we’re doing at Azimuth.

Jan Galkowski is a statistician and engineer at Akamai Technologies, a company in Cambridge Massachusetts whose content delivery network is one of the world’s largest distributed computing platforms, responsible for serving at least 15% of all web traffic. He has begun copying some of the publicly accessible US government climate databases. On 11 December he wrote:

John, so I have just started trying to mirror all of CDIAC [the Carbon Dioxide Information Analysis Center]. We’ll see. I’ll put it in a tarball, and then throw it up on Google. It should keep everything intact. Using WinHTTrack. I have coordinated with Eric Holthaus via Twitter, creating, per your suggestion, a new personal account which I am using exclusively to follow the principals.

Once CDIAC is done, and checked over, I’ll move on to other sites.

There are things beyond our control, such as paper records, or records which are online but are not within visibility of the public.

Oh, and I’ve formally requested time off from work for latter half of December so I can work this on vacation. (I have a number of other projects I want to work in parallel, anyway.)

By 14 December he was wanting some more storage space. He asked David Tanzer and me:

Do either of you have a large Google account, or the “unlimited storage” option at Amazon?

I’m using WebDrive, a commercial product. What I’m (now) doing is defining an FTP map at a .gov server, and then a map to my Amazon Cloud Drive. I’m using Windows 7, so these appear as standard drives (or mounts, in *nix terms). I navigate to an appropriate place on the Amazon Drive, and then I proceed to copy from .gov to Amazon.

There is no compression, and, in order to be sure I don’t abuse the .gov site, I’m deliberately passing this over a wireless network in my home, which limits the transfer rate. If necessary, and if the .gov site permits, I could hardwire the workstation to our FIOS router and get appreciably faster transfer. (I often do that for large work files.)

The nice thing is I get to work from home 3 days a week, so I can keep an eye on this. And I’m taking days off just to do this.

I’m thinking about how I might get a second workstation in the act.

The Web sites themselves I’m downloading, as mentioned, using HTTrack. I intended to tarball-up the site structure and then upload to Amazon. I’m still working on CDIAC at ORNL. For future sites, I’m going to try to get HTTrack to mirror directly to Amazon using one of the mounts.

I asked around for more storage space, and my request was kindly answer by Scott Maxwell. Scott lives in Pasadena California and he used to work for NASA: he even had a job driving a Mars rover! He is now a site reliability engineer at Google, and he works on Google Drive. Scott is setting up a 10-terabyte account on Google Drive, which Jan and others will be able to use.

Meanwhile, Jan noticed some interesting technical problems: for some reason WebDrive is barely using the capacity of his network connection, so things are moving much more slowly than they could in theory.

Most recently, Sakari Maaranen offered his assistance. Sakari is a systems architect at Ubisecure, a firm in Finland that specializes in identity management, advanced user authentication, authorization, single sign-on, and federation. He wrote:

I have several terabytes worth in Helsinki (can get more) and a gigabit connection. I registered my offer but they [the DataRefuge people] didn’t reply though. I’m glad if that means you have help already and don’t need a copy in Helsinki.﻿

I replied saying that the absence of a reply probably means that they’re overwhelmed by offers of help and are struggling to figure out exactly what to do. Scott said:

Hey, Sakari! Thank you for the generous offer!

I’m setting these guys up with Google Drive storage, as at least a short-term solution.

IMHO, our first order of business is just to get a copy of the data into a location we control—one that can’t easily be taken away from us. That’s the rationale for Google Drive: it fits into Jan’s existing workflow, so it’s the lowest-friction path to getting a copy of the data that’s under our control.

How about if I propose this: we let Jan go ahead with the plan of backing up the data in Drive. Then I’ll look evaluate moving it from there to whatever other location we come up with. (Or copying instead of moving! More copies is better. :-) How does that sound to you?

I admit I haven’t gotten as far as thinking about Web serving at all—and it’s not my area of expertise anyway. Maybe you’d be kind enough to elaborate on your thoughts there.

Sakari responded with some information about servers. In late January, U. C. Riverside may help me with this—until then they are busy trying to get more storage space, for wholly unrelated reasons. But right now it seems the main job is to identify important data and get it under our control.

There are a lot of ways you could help.

Computer skills. Personally I’m not much use with anything technical about computers, but the rest of the Azimuth Data Backup gang probably has technical questions that some of you out there could answer… so, I encourage discussion of those questions. (Clearly some discussions are best done privately, and at some point we may encounter unfriendly forces, but this is a good place for roaming experts to answer questions.)

Security. Having a backup of climate data is not very useful if there are also fake databases floating around and you can’t prove yours is authentic. How can we create a kind of digital certificate that our database matches what was on a specific website at a specific time? We should do this if someone here has the expertise.

Money. If we wind up wanting to set up a permanent database with a nice front end, accessible from the web, we may want money. We could do a Kickstarter campaign. People may be more interested in giving money now than later, unless the political situation immediately gets worse after January 20th.

Strategy. We should talk a bit about what to do next, though too much talk tends to prevent action. Eventually, if all goes well, our homegrown effort will be overshadowed by others, at least in sheer quantity. About 3 hours ago Eric Holthaus tweeted “we just got a donation of several petabytes”. If it becomes clear that others are putting together huge, secure databases with nice front ends, we can either quit or—better—cooperate with them, and specialize on something we’re good at and enjoy.

Globular

14 December, 2016

One of my goals is to turn category theory, and even higher category theory, into a practical tool for science. For this we need good scientific ideas—but we also need good software.

My friend Jamie Vicary has been developing some of this software, together with Aleks Kissinger and Krzysztof Bar and others. Jamie demonstrated it at the Simons Institute workshop on compositionality. You can watch his demonstration here:

But since Globular runs on a web browser, you can also try it out yourself here:

Globular.

You can see his talk slides:

• Jamie Vicary, Data structures for quasistrict higher categories. (Talk slides here.)

Abstract. Higher category theory is one of the most general approaches to compositionality, with broad and striking applications across computer science, mathematics and physics. We present a new, simple way to define higher categories, in which many important compositional properties emerge as theorems, rather than axioms. Our approach is amenable to computer implementation, and we present a new proof assistant we have developed, with a powerful graphical calculus. In particular, we will outline a substantial new proof we have developed in our setting.

And in December 2015, he wrote an article about this software on the n-Category Café. It’s been improved since then, but it can’t hurt to read what he wrote—so I append it here!

Globular: the basic idea

When you’re trying to prove something in a monoidal category, or a higher category, string diagrams are a really useful technique, especially when you’re trying to get an intuition for what you’re doing. But when it comes to writing up your results, the problems start to mount. For a complex proof, it’s hard to be sure your result is correct—a slip of the pen could lead to a false proof, and an error that’s hard to find. And writing up your results can be a huge time-sink, requiring weeks or months using a graphics package, all just for some nice pictures that tell you little about the correctness of the proof, and become useless if you decide to change your approach. Computers should be able help with all these things, in the way that proof assistants like Coq and Agda are allowing us to work with traditional syntactic proofs in a more sophisticated way.

The purpose of this post is to introduce Globular, a new proof assistant for working with higher-categorical proofs using string diagrams. It’s available at http://globular.science, with documentation on the nab. It’s web-based, so everything happens right in your browser: build formal proofs, visualize and step through them; keep your proofs private, share them with collaborators, or make them publicly available.

Before we get into the technical details, here’s a screenshot of Globular in action:

The main part of the screen shows a diagram, which in this case is 2-dimensional. It represents a composite 2-cell in a finitely-presented 2-category, with the blue and red regions representing objects, the lines representing 1-cells, and the vertices representing 2-cells. In fact, this 2d diagram is just an intermediate state of a 3d proof, through which we’re navigating with the ‘Slice’ controls in the top-right. The proof itself has been built up by composing the generators listed in the signature, down the left-hand side of the screen. (If you want to take a look at this proof yourself, you can go straight there; in the top-right, set ‘Project’ to 0, then increment the second ‘Slice’ counter to scroll through the proof.)

Globular has been developed so far in the Quantum Group in
the Oxford Computer Science department, by Krzysztof
Bar
, Katherine Casey, Aleks Kissinger, Jamie Vicary and Caspar Wylie. We haven’t quite got around to it yet, but Globular will be open-source, and we’re really keen for people to get involved and help build the software—there’s a huge amount to do! If you want to help out, get in touch.

Mathematical foundations

Globular is based on the theory of finitely-presented semistrict n-categories; at the moment, it works up to the level of 3-categories, with an extension to 4-categories actively in development. (You can build cells of any dimension, but from 4-cells and up, some structures are missing.)

Definitions of n-category vary in how strict they are; a definition is semistrict when it’s as strict as possible, while still having the property that every weak n-category satisfies it, up to equivalence. Definitions of semistrict n-category are not unique: in dimension 3, Gray categories put all the weak structure in the interchangers, while Simpson snucategories put it all in the unitors. Globular implements the axioms of a Gray category, because this is the most appropriate for the graphical calculus: the interchangers can be seen graphically, as changes in height of the components of the diagram. By the theory of k-tuply monoidal n-categories, this also lets you build proofs in a monoidal category, or a braided monoidal category, or a monoidal 2-category.

The only things that Globular understands are $k$-cells, for some value of $k$. So if you want to build an n-category where an equation $f=g$ holds between n-cells, you have to do it by adding $(n+1)$-cells $a:f \to g$ and $b:g \to f$. If you then build some composite $C(f)$ involving $f$, you can apply the cell $a$ to obtain $C(g)$, and we interpret this as the equation $C(f) = C(g)$. In a slogan, this is equality via rewriting. This is consistent with the basic premise of homotopy type theory: treat your proofs as first-order structures, which can in turn be reasoned about themselves.

Globular can also handle invertibility in a nice way. For a cell $F:A \to B$ to be invertible, indicated by ticking a box in the signature, means that there also exists an invertible cell $F^{-1}: B \to A$, and invertible cells $\text{id}_A \to F . F^{-1}$ and $\text{id}_B \to F^{-1} . F$. This is a coinductive definition (see Mike Shulman’s nice post on this topic), since we’re defining the notion of invertibility in terms of itself in a higher dimension. This sort of a definition is great for proof assistants to work with, as it allows a lot of structure to be generated from a single compact definition.

How it works

For a lot more details, take a look at the nLab page. Everything that happens in Globular involves in interaction between the signature on the left-hand side, and the diagram in the main part of the screen. The signature stores the ‘library’ of cells you have available, and the diagram is a particular composite of cells that you have constructed.

To construct a new diagram, clear whatever is currently displayed by clicking the ‘Clear’ button on the right, or pressing ‘c’. Then start by clicking the icon of a n-cell in your signature, which will make a diagram consisting just of that cell. Clicking on the icons of other $k$-generators for $0 < k \leq n$ will display a list of ways the cell can be attached, and when you choose one of these ways, the attachment will be performed, growing your n-diagram. (If you’re starting with a blank workspace you will only have a single 0-cell available, so you won’t be able to do this yet!) Clicking an $(n+1)$-cell $G$ displays a list of ways that your n-diagram $D$ can be rewritten, by identifying the source of $G$ as a subdiagram of $D$. Selecting one of these ways will implement the rewrite, by ‘cutting out’ the chosen subdiagram of $D$, and replacing it with the target of $G$.

Another way to modify the diagram is to click directly on it. Clicking near the edge of the diagram performs an attachment, while clicking in the interior of the diagram performs a rewrite. If more than one attachment or rewrite is consistent with your click, a little menu will pop up for you to choose what you want to do. When you move your mouse pointer over the diagram, a little label pops up to show you what your cursor is hovering over, which is helpful when choosing where to click.

You can also click-and-drag on the diagram. This will attach or rewrite with an interchanger, or naturality for an interchanger, or invertibility for an interchanger, depending on where you have clicked and the direction of the drag. Clicking and dragging is designed to work as if you were really ‘touching’ the strings. So if you want to braid one strand over another, click the strand to ‘grab’ it, and ‘pull’ it to one side. If you want to pull a vertex through a braiding, click the vertex to ‘grab’ it, and ‘pull’ it up or down through its adjacent braiding. Of course, Globular will only carry out the command if the move you are attempting to make is actually valid in that location.

Example theorems

Here are four worked examples of nontrivial proofs in Globular:

Frobenius implies associative: http://globular.science/1512.004. In a monoidal category, if multiplication and comultiplication morphisms are unital, counital and Frobenius, then they are associative and coassociative.

Strengthening an equivalence: http://globular.science/1512.007. In a 2-category, an equivalence gives rise to an adjoint equivalence, satisfying the snake equations.

Swallowtail comes for free: http://globular.science/1512.006. In a monoidal 2-category, a weakly-dual pair of objects gives rise to a strongly-dual pair, satisfying the swallowtail equations.

Pentagon and triangle implies $\lambda_I = \rho_I$: http://globular.science/1512.002. In a monoidal 2-category, if a pseudomonoid object satisfies pentagon and triangle equations, then it satisfies $\lambda_I = \rho_I$.

We’ll focus on the second example project “Strengthening an equivalence” listed above, and see how it was constructed. This project investigates the factthat every equivalence in a 2-category gives rise to an adjoint equivalence. To start, we therefore need the basic data that exhibits an equivalence in a 2-category: two 0-cells $A$ and $B$, and an invertible 1-cell $F:A \to B$, by the weak definition of ‘invertible’ discussed above. This gives us the following signature:

The 2-cells that witness invertibility of $F$ look like cups and caps in the graphical calculus, but they won’t satisfy the snake equations that define an adjoint equivalence. The idea of this proof is to define a new cap, built out of the invertible structure of $F$, which does satisfy the snake equations with the existing cup.

By starting with a diagram consisting of $F$ alone, pressing ‘i’ to take the identity diagram, and clicking-and-dragging, we build the following 2-diagram, out of the invertible structure associated to $F$:

This is our candidate for our redefined cup. Its source is the identity on $A$, and its target is $F$ composed with $F^{-1}$. It looks a bit like the curved end of a hockey stick.

To store it for later use, we now click the ‘Theorem’ button. Writing $D$ for the 2-diagram we have constructed, this does two things. First, it creates a 2-cell generator that we call “New Cup”, whose source is $s(D)$, and whose target is $t(D)$. This is the redefined cup that we can use in future expressions. Second, it creates an invertible 3-cell generator that we call “New Cup Definition”, with source given by “New Cup”, and with target given by our hockey-stick diagram $D$. This says what “New Cup” means in terms of our original structure. This adds the following cells to our signature:

Because “New Cup Definition” is a 3-cell, by default we see two little icons: one for its source, and one for its target. See how its source is a little picture of “New Cup”, and its target is a little picture of the hockey stick, just as we defined it.

We’re now ready to prove one of the snake equations. We start by building the snake composite, using “New Cup” for the cup, and the invertible structure of $F$ for the cap:

Now have to prove that this equals the identity. Since equality is implemented by rewriting, we must construct a 3-diagram whose source is this snake composite, and whose target is the identity on $F$. To start, we click the ‘Identity’ button to convert our diagram into an identity 3-diagram. The only apparent effect this has is to add a number scroller to the ‘Slice’ area of the controls in the top-right. At the moment we can set this to ‘0’ and ‘1’, representing the source and target of our identity 3-diagram respectively. We set it to ‘1’, because we want to compose things to the target.

We now build up our proof. First, we click on the pink vertex that represents “New Cup”. This will attach our 3-cell “New Cup Definition”, replacing “New Cup” with our hockey-stick picture. By clicking-and-dragging on the diagram, we obtain the following sequence
of pictures:

Pictures 3 to 10 were created by attaching interchangers, and pictures 11 to 15 were created by attaching higher structure generated by the invertibility of $F$. In all cases, this structure was attached just by clicking-and-dragging on the appropriate vertices of the diagram. We’ve turned the snake into the identity, so we’ve finished our proof, which required 14 3-cells. By using the ‘Slice’ control in the top-right, we can navigate through the 15 slices that make up our proof, and review what we just did. Even better, turning the ‘Project’ control to the value ‘1’ tells Globular to project out the lowest dimension. This means that our entire 3-diagram proof can be viewed as a single 2-dimensional diagram, as follows:

This is just like the Morse singularity graphics used by topologists to study the structure of higher-dimensional manifolds. In this picture, the vertices are 3-cells, the lines are 2-cells, and the regions are 1-cells (in fact, every region is the 1-cell $F$.) By moving your mouse pointer over the different parts of the diagram, you can see what the different components are. Interchangers are represented in this projection by braidings.

Now we can do something cool: we can modify our proof, by clicking-and-dragging on the Morse projection. For example, just to the right of centre, there is a crossing, out of which emerge two long vertical lines that travel up a long way before annihilating with one another. Our proof would be much simpler if these two lines just annihilated with each other right after the interchanger. So, we click the vertex at the top of the lines, and drag it down repeatedly, until it gets to where we want it:

We’ve simplified our proof. By clicking-and-dragging some more, you can change the proof in lots of different ways, although you probably won’t get it much simpler than this. Putting the ‘Project’ control back to ‘0’, and navigating through the stages of the proof with the ‘Slice’ control as we were doing before, we can see that our proof has indeed been modified.

This project has been in development for about 18 months, so it feels great to finally launch. We hope the whole community will get clicking-and-dragging, and let us know how easy it is to use, and what other features would be useful. There are certain to still be bugs, so let us know about them too, and we’ll get right on them.

Programming with Data Flow Graphs

5 June, 2016

Network theory is catching on—in a very practical way!

Google recently started a new open source library called TensorFlow. It’s for software built using data flow graphs. These are graphs where the edges represent tensors—that is, multidimensional arrays of numbers—and the nodes represent operations on tensors. Thus, they are reminiscent of the spin networks used in quantum gravity and gauge theory, or the tensor networks used in renormalization theory. However, I bet the operations involved are nonlinear! If so, they’re more general.

TensorFlow™ is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device with a single API. TensorFlow was originally developed by researchers and engineers working on the Google Brain Team within Google’s Machine Intelligence research organization for the purposes of conducting machine learning and deep neural networks research, but the system is general enough to be applicable in a wide variety of other domains as well.

What is a Data Flow Graph?

Data flow graphs describe mathematical computation with a directed graph of nodes & edges. Nodes typically implement mathematical operations, but can also represent endpoints to feed in data, push out results, or read/write persistent variables. Edges describe the input/output relationships between nodes. These data edges carry dynamically-sized multidimensional data arrays, or tensors. The flow of tensors through the graph is where TensorFlow gets its name. Nodes are assigned to computational devices and execute asynchronously and in parallel once all the tensors on their incoming edges becomes available.

TensorFlow Features

Deep Flexibility. TensorFlow isn’t a rigid neural networks library. If you can express your computation as a data flow graph, you can use TensorFlow. You construct the graph, and you write the inner loop that drives computation. We provide helpful tools to assemble subgraphs common in neural networks, but users can write their own higher-level libraries on top of TensorFlow. Defining handy new compositions of operators is as easy as writing a Python function and costs you nothing in performance. And if you don’t see the low-level data operator you need, write a bit of C++ to add a new one.

True Portability. TensorFlow runs on CPUs or GPUs, and on desktop, server, or mobile computing platforms. Want to play around with a machine learning idea on your laptop without need of any special hardware? TensorFlow has you covered. Ready to scale-up and train that model faster on GPUs with no code changes? TensorFlow has you covered. Want to deploy that trained model on mobile as part of your product? TensorFlow has you covered. Changed your mind and want to run the model as a service in the cloud? Containerize with Docker and TensorFlow just works.

Connect Research and Production. Gone are the days when moving a machine learning idea from research to product require a major rewrite. At Google, research scientists experiment with new algorithms in TensorFlow, and product teams use TensorFlow to train and serve models live to real customers. Using TensorFlow allows industrial researchers to push ideas to products faster, and allows academic researchers to share code more directly and with greater scientific reproducibility.

Auto-Differentiation. Gradient based machine learning algorithms will benefit from TensorFlow’s automatic differentiation capabilities. As a TensorFlow user, you define the computational architecture of your predictive model, combine that with your objective function, and just add data — TensorFlow handles computing the derivatives for you. Computing the derivative of some values w.r.t. other values in the model just extends your graph, so you can always see exactly what’s going on.

Language Options. TensorFlow comes with an easy to use Python interface and a no-nonsense C++ interface to build and execute your computational graphs. Write stand-alone TensorFlow Python or C++ programs, or try things out in an interactive TensorFlow iPython notebook where you can keep notes, code, and visualizations logically grouped. This is just the start though — we’re hoping to entice you to contribute SWIG interfaces to your favorite language — be it Go, Java, Lua, JavaScript, or R.

Maximize Performance. Want to use every ounce of muscle in that workstation with 32 CPU cores and 4 GPU cards? With first-class support for threads, queues, and asynchronous computation, TensorFlow allows you to make the most of your available hardware. Freely assign compute elements of your TensorFlow graph to different devices, and let TensorFlow handle the copies.

Who Can Use TensorFlow?

TensorFlow is for everyone. It’s for students, researchers, hobbyists, hackers, engineers, developers, inventors and innovators and is being open sourced under the Apache 2.0 open source license.

TensorFlow is not complete; it is intended to be built upon and extended. We have made an initial release of the source code, and continue to work actively to make it better. We hope to build an active open source community that drives the future of this library, both by providing feedback and by actively contributing to the source code.

Why Did Google Open Source This?

If TensorFlow is so great, why open source it rather than keep it proprietary? The answer is simpler than you might think: We believe that machine learning is a key ingredient to the innovative products and technologies of the future. Research in this area is global and growing fast, but lacks standard tools. By sharing what we believe to be one of the best machine learning toolboxes in the world, we hope to create an open standard for exchanging research ideas and putting machine learning in products. Google engineers really do use TensorFlow in user-facing products and services, and our research group intends to share TensorFlow implementations along side many of our research publications.

For more details, try this:

Very Long Proofs

28 May, 2016

In the 1980s, the mathematician Ronald Graham asked if it’s possible to color each positive integer either red or blue, so that no triple of integers $a,b,c$ obeying Pythagoras’ famous equation:

$a^2 + b^2 = c^2$

all have the same color. He offered a prize of \$100.

Now it’s been solved! The answer is no. You can do it for numbers up to 7824, and a solution is shown in this picture. But you can’t do it for numbers up to 7825.

To prove this, you could try all the ways of coloring these numbers and show that nothing works. Unfortunately that would require trying

3 628 407 622 680 653 855 043 364 707 128 616 108 257 615 873 380 491 654 672 530 751 098 578 199 115 261 452 571 373 352 277 580 182 512 704 196 704 700 964 418 214 007 274 963 650 268 320 833 348 358 055 727 804 748 748 967 798 143 944 388 089 113 386 055 677 702 185 975 201 206 538 492 976 737 189 116 792 750 750 283 863 541 981 894 609 646 155 018 176 099 812 920 819 928 564 304 241 881 419 294 737 371 051 103 347 331 571 936 595 489 437 811 657 956 513 586 177 418 898 046 973 204 724 260 409 472 142 274 035 658 308 994 441 030 207 341 876 595 402 406 132 471 499 889 421 272 469 466 743 202 089 120 267 254 720 539 682 163 304 267 299 158 378 822 985 523 936 240 090 542 261 895 398 063 218 866 065 556 920 106 107 895 261 677 168 544 299 103 259 221 237 129 781 775 846 127 529 160 382 322 984 799 874 720 389 723 262 131 960 763 480 055 015 082 441 821 085 319 372 482 391 253 730 679 304 024 117 656 777 104 250 811 316 994 036 885 016 048 251 200 639 797 871 184 847 323 365 327 890 924 193 402 500 160 273 667 451 747 479 728 733 677 070 215 164 678 820 411 258 921 014 893 185 210 250 670 250 411 512 184 115 164 962 089 724 089 514 186 480 233 860 912 060 039 568 930 065 326 456 428 286 693 446 250 498 886 166 303 662 106 974 996 363 841 314 102 740 092 468 317 856 149 533 746 611 128 406 657 663 556 901 416 145 644 927 496 655 933 158 468 143 482 484 006 372 447 906 612 292 829 541 260 496 970 290 197 465 492 579 693 769 880 105 128 657 628 937 735 039 288 299 048 235 836 690 797 324 513 502 829 134 531 163 352 342 497 313 541 253 617 660 116 325 236 428 177 219 201 276 485 618 928 152 536 082 354 773 892 775 152 956 930 865 700 141 446 169 861 011 718 781 238 307 958 494 122 828 500 438 409 758 341 331 326 359 243 206 743 136 842 911 727 359 310 997 123 441 791 745 020 539 221 575 643 687 646 417 117 456 946 996 365 628 976 457 655 208 423 130 822 936 961 822 716 117 367 694 165 267 852 307 626 092 080 279 836 122 376 918 659 101 107 919 099 514 855 113 769 846 184 593 342 248 535 927 407 152 514 690 465 246 338 232 121 308 958 440 135 194 441 048 499 639 516 303 692 332 532 864 631 075 547 542 841 539 848 320 583 307 785 982 596 093 517 564 724 398 774 449 380 877 817 714 717 298 596 139 689 573 570 820 356 836 562 548 742 103 826 628 952 649 445 195 215 299 968 571 218 175 989 131 452 226 726 280 771 962 970 811 426 993 797 429 280 745 007 389 078 784 134 703 325 573 686 508 850 839 302 112 856 558 329 106 490 855 990 906 295 808 952 377 118 908 425 653 871 786 066 073 831 252 442 345 238 678 271 662 351 535 236 004 206 289 778 489 301 259 384 752 840 495 042 455 478 916 057 156 112 873 606 371 350 264 102 687 648 074 992 121 706 972 612 854 704 154 657 041 404 145 923 642 777 084 367 960 280 878 796 437 947 008 894 044 010 821 287 362 106 232 574 741 311 032 906 880 293 520 619 953 280 544 651 789 897 413 312 253 724 012 410 831 696 803 510 617 000 147 747 294 278 502 175 823 823 024 255 652 077 422 574 922 776 413 427 073 317 197 412 284 579 070 292 042 084 295 513 948 442 461 828 389 757 279 712 121 164 692 705 105 851 647 684 562 196 098 398 773 163 469 604 125 793 092 370 432

possibilities. But recently, three mathematicians cleverly figured out how to eliminate most of the options. That left fewer than a trillion to check!

So they spent 2 days on a supercomputer, running 800 processors in parallel, and checked all the options. None worked. They verified their solution on another computer.

This is one of the world’s biggest proofs: it’s 200 terabytes long! That’s about equal to all the digitized text held by the US Library of Congress. There’s also a 68-gigabyte digital signature—sort of a proof that a proof exists—if you want to skim it.

It’s interesting that these 200 terabytes were used to solve a yes-or-no question, whose answer takes a single bit to state: no.

I’m not sure breaking the world’s record for the longest proof is something to be proud of. Mathematicians prize short, insightful proofs. I bet a shorter proof of this result will eventually be found.

Still, it’s fun that we can do such things. Here’s a story about the proof:

• Evelyn Lamb, Two-hundred-terabyte maths proof is largest ever, Nature, May 26, 2016.

and here’s the actual paper:

• Marijn J. H. Heule, Oliver Kullmann and Victor W. Marek, Solving and verifying the Boolean Pythagorean triples problem via cube-and-conquer.

The ‘cube-and-conquer’ paradigm is a “hybrid SAT method for hard problems, employing both look-ahead and CDCL solvers”. The actual benefit of this huge proof is developing such methods for solving big problems! In the comments to my G+ post, Roberto Bayardo explained:

CDCL == “conflict-directed clause learning”: when you hit a dead end in backtrack search, this is the process of recording a (hopefully small) clause that prevents you from making the same mistake again…a type of memorization, essentially.

Look-ahead: in backtrack search, you repeat the process of picking an unassigned variable and then picking an assignment for that variable until you reach a “dead end” (upon deriving a contradiction). Look-ahead involves doing some amount of processing on the remaining formula after each assignment in order to simplify it. This includes identifying variables for which one of its assignments can be quickly eliminated. “Unit propagation” is a type of look-ahead, though I suspect in this case they mean something quite a bit more sophisticated.﻿

Arnaud Spiwack has a series of blog posts introducting SAT solvers here:

A much longer proof

By the way, despite the title of the Nature article, in the comments to my G+ post about this, Michael Nielsen pointed out a much longer proof by Chris Jefferson, who wrote:

Darn, I had no idea one could get into the media with this kind of stuff.

I had a much larger “proof”, where we didn’t bother storing all the details, in which we enumerated 718,981,858,383,872 semigroups, towards counting the semigroups of size 10.

Uncompressed, it would have been about 63,000 terabytes just for the semigroups, and about a thousand times that to store the “proof”, which is just the tree of the search.

Of course, it would have compressed extremely well, but also I’m not sure it would have had any value, you could rebuild the search tree much faster than you could read it from a disc, and if anyone re-verified our calculation I would prefer they did it by a slightly different search, which would give us much better guarantees of correctness.

His team found a total of 12,418,001,077,381,302,684 semigroups of size 10. They only had to find 718,981,858,383,872 by a brute force search, which is 0.006% of the total:

• Andreas Distler, Chris Jefferson, Tom Kelsey, and Lars Kottho, The semigroups of order 10, in Principles and Practice of Constraint Programming, Springer Lecture Notes in Computer Science 7514, Springer, Berlin, pp. 883–899.

Insanely long proofs

All the proofs mentioned so far are downright laconic compared to those discussed here:

• John Baez, Insanely long proofs, 19 October 2012.

For example, if you read this post you’ll learn about a fairly short theorem whose shortest proof using Peano arithmetic contains at least

$\displaystyle{ 10^{10^{1000}} }$

symbols. This is so many that if you tried to write down the number of symbols in this proof—not the symbols themselves, but just the number of symbols—in ordinary decimal notation, you couldn’t do it if you wrote one digit on each proton, neutron and electron in the observable Universe!

The Busy Beaver Game

21 May, 2016

This month, a bunch of ‘logic hackers’ have been seeking to determine the precise boundary between the knowable and the unknowable. The challenge has been around for a long time. But only now have people taken it up with the kind of world-wide teamwork that the internet enables.

A Turing machine is a simple model of a computer. Imagine a machine that has some finite number of states, say N states. It’s attached to a tape, an infinitely long tape with lots of squares, with either a 0 or 1 written on each square. At each step the machine reads the number where it is. Then, based on its state and what it reads, it either halts, or it writes a number, changes to a new state, and moves either left or right.

The tape starts out with only 0’s on it. The machine starts in a particular ‘start’ state. It halts if it winds up in a special ‘halt’ state.

The Busy Beaver Game is to find the Turing machine with N states that runs as long as possible and then halts.

The number BB(N) is the number of steps that the winning machine takes before it halts.

In 1961, Tibor Radó introduced the Busy Beaver Game and proved that the sequence BB(N) is uncomputable. It grows faster than any computable function!

A few values of BB(N) can be computed, but there’s no way to figure out all of them.

As we increase N, the number of Turing machines we need to check increases faster than exponentially: it’s

$(4(n+1))^{2n}$

Of course, many could be ruled out as potential winners by simple arguments. But the real problem is this: it becomes ever more complicated to determine which Turing machines with N states never halt, and which merely take a huge time to halt.

Indeed, matter what axiom system you use for math, as long as it has finitely many axioms and is consistent, you can never use it to correctly determine BB(N) for more than some finite number of cases.

So what do people know about BB(N)?

For starters, BB(0) = 0. At this point I should admit that people don’t count the halt state as one of our N states. This is just a convention. So, when we consider BB(0), we’re considering machines that only have a halt state. They instantly halt.

Next, BB(1) = 1.

Next, BB(2) = 6.

Next, BB(3) = 21. This was proved in 1965 by Tibor Radó and Shen Lin.

Next, BB(4) = 107. This was proved in 1983 by Allan Brady.

Next, BB(5). Nobody knows what BB(5) equals!

The current 5-state busy beaver champion was discovered by Heiner Marxen and Jürgen Buntrock in 1989. It takes 47,176,870 steps before it halts. So, we know

BB(5) ≥ 47,176,870.

People have looked at all the other 5-state Turing machines to see if any does better. But there are 43 machines that do very complicated things that nobody understands. It’s believed they never halt, but nobody has been able to prove this yet.

We may have hit the wall of ignorance here… but we don’t know.

That’s the spooky thing: the precise boundary between the knowable and the unknowable is unknown. It may even be unknowable… but I’m not sure we know that.

Next, BB(6). In 1996, Marxen and Buntrock showed it’s at least 8,690,333,381,690,951. In June 2010, Pavel Kropitz proved that

$\displaystyle{ \mathrm{BB}(6) \ge 7.412 \cdot 10^{36,534} }$

You may wonder how he proved this. Simple! He found a 6-state machine that runs for

74120785350949561017417256114460496971828161169529
80089256690109516566242803284854935655097454968325
70980660176344529980240910175257246046044979228025
75771151854805208765058993515648321741354119777796
52792554935324476497129358489749784615398677842157
90591584199376184970716056712502662159444663041207
99923528301392867571069769762663780101572566619882
47506945421793112446656331449750558811894710601772
36559599539650767076706500145117029475276686016750
65295229541290448711078495182631695097472697111587
32776867610634089559834790714490552734463125404673
70809010860451321212976919019625935072889346503212
31429040253205457633515626107868771760119916923828
74680371458459216127416939655179359625797982162506
60314494227818293289779254614732935080486701454668
21939225145869908038228128045266110256571782631958
92689852569468996669587422751961691118179060839800
19742149153987715916968833647534774800748757776661
12880815431005997890623859440699663891591940342058
44534513595160016891606589527654143070266884025253
13506538908970519826326303859380836606399479857223
51182179370081120817877269116559398341767052162655
91720120243332830032375287823064283197180507239529
73532295517310483710218115976193011673158268680311
96305710080419119304779715796727850295295594397972
94500432643483372677378872480519292318129075141594
30017042374851948272409014919776223009518089664572
07992743507711148214208698063203898384663486444006
34378985820511533007233636175063664244348363429381
71686527767592717386843513084430760538597788497854
57039288736337621694394446002685937650424904397470
70469396499307353236961551408770635423051623551580
20502046433028802860624333066826817081190086396068
05449212705508208665911831651713621990622680221463
40355100698032539208500321737935916572824167109243
92179632770888698806462239286922365720234049353262
29319760109739336919551555856149717013936570835108
29337138019755841416703048425473095781543947284695
30557891048303296887105203445899799657005379321565
69516567462536723341045028533527590132444550041637
29329181785691615066366432555395455966924963740865
92347905851808900172725987566577427348338512074400
14671953393767922133480111674181667283600828299423
61956450241322000295203110123701834947807654204715
50872484529282734610014634854736508912653344685939
21925381546951641870418349782007870841424352960246
81621927943512800969148833336725244172361946188547
20963814880877462649154982581243612288332193203522
41878334479833109122807808425070272194370533803098
15576207436359412405607991428582586135325000600450
34044570755614842353801605755138514728637444538199
91270653752636827482388619627545586769702982563550
21579190594347001678124794147520737780545710725238
09263070578948251221705378404200557518123183429763
74391628221317569903581457033268409573939140317537
92951945222572832854376667354589981221872208463941
92173302390371597480313550832469764638860694385735
23920879420538715366934472880272772745254215764827
22658077210282649639911775387884460117481542574020
98604710597497363577816224906240529468176628001243
37027642430572009172724680494845807607875336391296
35595374936463756499152341721363955306855005063147
84058597424341392606532443545414393065035952175386
45638222669677018885647203581909266795843083183075
45078527771378321186170870661268143673661410440879
12940056479135404302810843179830761186081727156785
64098233869131431441387859096996327035057252704630
66502399928551829284912464275322457081097679293349
77256939108943396587781783827866809713105339479801
94252766851530607523746692117596241149131822801952
28443629054406029354014078168448839468854310977774
64971341943282207403959764566321636691138805567444
40040338242074160211898209536897949768268405300638
55020960995862149067127133242923955216689288381118
44058888209904636044250765206694970737949801463627
09477533118591401481473656166664409698471099509772
67427126852419476331478775678441642258692918630399
93094799728916927267392317125315003396591007151226
51178203821487177649041575923087270542299624201754
57070699334124035606469963629320951287401378625068
24738877579310019018071588005151785675631912063264
63879171290239696789072427830303321073398269847363
45629019926879365533487397619450023732220399774409
78878227032599708324913637134795947392057672257001
88982988598790346622775744604841009275542606005747
73489847857869077885071196141198763333605601699259
29619179821052298166718147760782068394323831299733
35022262733114475600303463447691380426320406458971
00672747110856632877598317707068109285992387448288
96303378384828434300860309575950421131368727514681
55719991530038357093718116282958853868614897146782
54967080333475258187514533923483088945298272830364
47705805899342687014494268126923276698359373141482
44674928751199880396209243410394589961770757345685
64543015202758133002674347175029218497929457573786
65467651289853262700253064391422801251703856704304
39933674129974352639333328279021853998732340493391
52439866339607669375777654460893584609291320819482
35450294931218519646257691473024784635292462805654
60565812545710189564825444516533104654549626307774
13759501791681585406819992391995410832229305735031
79743073679478401659929359532688412917629928632646
23531586811022948074724601440807980838830594224020
70309042157187760781779401504832688794481346645989
97848941467191367820110325917723306165886986506294
31831412363348418517790881203602332829570523258317
17949497171355624217480958863040591015617418162750
63724305524405091305861072355204855928475793642357
60246280346642123778396019497710295060768286004460
92308617300735231303196433523457326103470236858178
28414431828300312373660803542368392168889742383761
80821770347653926560938368076716961022633531512872
88504183987441820108573621090001961860938961602460
20885687763639795010335265588767970024085673382770
49445342795888876360045283740672969599921612120088
19088433242165085295954910707655578576243692034551
97614559215669274189870537223818067080510833662246
14856941296324679132997700908284769865178966760885
53161005611550761795254034078533136648007541637608
33678935806596748974138821535451231444972327449291
10024213166459016651306588184925060362024845259323
65173413805791200232437686840453953297502340211872
17360259379142737381790192340837264502917404521851
49249430081886421605763562692675510215455806969064
63544907025795646739155477402570724663876748602049
93436166161865545758100837460724945336759484902958
05760448798855031700989947223661484740412761111279
72682354237393559488179028185454100490417580488953
17216043021530545928659613553982781316650550537867
00952857558642344328764468061901146269419736379089
31879223575291135204858555802576520665674856000998
87879817374651045887072894889964276716532631796097
12769213622199519840449246106705404347452689144525
02026902997192020285019028617442866626487265731300
40640551447578058727968270741870141068657514616959
20677169731946885529018487968298771083015182717866
76088530960537199256279472232540485815576475023989
54150471113235298546458042675161613703655941414055
14600875711454906941647699297619365796793820814580
39814831564180619061865499399314083189554983356803
30767015598546845567749092553092817193708115207411
30039726042496587009377011208504194686832850560577
71332112325173133436614303950199710563675659743576
95759634767858347057063619337247953842775381829735
01515839943757098551455066444703952999001985213970
88527752763275564055679719982684702573490505326753
98021282801078182123611019529338931475797349672464
44059385046714080201178146810297989489194773283941
93746291180257355629914922000638130878140351659284
17485358899365619763286647381859607448645462954784
59233969830735095006570618760591874509688179105104
12456865774195864509969928029317962965086359995739
53548814859217251629847330330353163380838028768031
61203651855417256064885345072718378483364209631654
63908303387213995060644412231834039370317361626721
33451703923209110038936674869051927213642317876528
16991320616980772154778362044248178351052875315790
59407440453142692201342020196027082359311172043450
63647014817599680233754740166551904281645680384785
43484207291054858039349689807941261676589504440279
34675739151384342020909767349894791903987566481434
15238843747734338550634527977710083665707008945548
12980194777531956507697526221024482444025315826484
41683017177169605153673188813829296522594387128245
65901287931783268300595479085143271190752306807114
75945173848401996529051879487394089712075068830376
53688488908938496328770371731709863096656986444104
08201803469169112306001808844056491446464723441228
80657997624356240757329628378856760617602118493595
76037880180779827784647086182400197605967361763950
33673997643549829889211843819703562151131479726729
01802880267974602161706988320836675081916966310882
49095313995134342308211792489590022680899453895917
74944902836882433137054714577056337316253774263170
52294019030895743857006850581035142676723449462789
28264714750222749382953079695438542590392058673291
39096257478555758969160065468207714202648605111186
23874795826486193186947393782106560542813451140273
43780784300043577743478580825356921145531672555409
70149321650226039685363112051645618451238774970868
00014499436813245398575403118912847356591290588765
75653595733135742217401557961347275793910801273534
29151807248390239645759760752048078546363519519946
78919336290268584412830837345667768883056990324807
63173327386242877190676977493730442619179902126530
09880564208648705195942161723657549836039821614124
19940671663992530293860119765744117402843445712267
13596618543665796686329880389747123731107419115403
45207462458782741411300537230539221887374778233703
83605096596611557973677807766580326262439332267121
11801923744981394260402383843712773562047935390674
42122563315255005117319806544749303592608240355114
26104287190561845186438803878522806658022994173893
84778533884131972707349318882891948374079553673814
52850343823429237584760779384386062926213863866427
81961360138375377545545650428193002638139604593657
31337406162040202984032689042877403357513000261872
62895135097140548323692495424233162737319444152387
76746662103742710883076108190383757894072689353642
64969030654139724556329796612143291052231412044970
37933420967501497840698712388746483202475604070974
28823745682524686454563102344797165959894997090034
44051049030017408384964948728694659966590270138458
13453290239761803750653458018811684218923119008085
91762200196368137317672026076675105255423940046735
45014486702306358119905764811368254565294469758077
56929475913717952306191477036094256081328646465135
88173202952685000478508674285854433060409037157131
34973953490623757649011875332905719971957353757223
09860503547981930039783388507068159541449119588220
38967528182569374339481592073458636154289283223650
19534546781485375722855718519447096773869485755174
38100581579701010217915862939108556064322256968608
64061646106615054106258657747708850915917029017031
45625886387599673950676041820159666498173631174677
64716193096172970604794524963250374628801095983779
56073534151057391381495922022764751197295965014861
95636807693589605464925125373492393655880278853499
02202446706284772379836648849167504828201710381073
03329458670141724685763992293288349334389759164917
97309423042332708701046852013961258231103600450165
34505411303924779865224301545810028949776070035913
31854675753493005328299653077777661036596594334161
18324140736770437225608805982478257555946825524468
43743031331166759021115160275501148125345230987606
77278160953638658876659824121654739540845026103104
56833241900758722085690632764275681521803243648281
75973088546131339174823173625957848652825117498954
13479595716866331789714691850880571150460499972976
43306369801233196879814397180168695393291032751573
92506158006680468011940918143780196654320279894118
99753676684493284932246345070256837568979217094824
13674789294795851372211654038911266565104851609104
36913412156057173851727044502460820221614272608195
13894166831606579837662570513633907356874954240367
61054951791262883573121076674516756368643088470606
54365581172433912025679081223772154694705809408962
69112615546800941866814965362918061068014102459091
46087743968661858764328075373663888207948725625707
08747688827166843119576034872969317512228990778772
87050904881869406146583157468517895291237675578225
89024394102022506915147947735521950610377154115619
18117495558879264747451407598816062773916529397563
83562243551858599818978259130319451464037816343233
06633058393645640234765483567475994622676485484999
55277646683491016566937482924707993950782403274121
53574422245717762195720278348883131018490842980937
95687038636668821642422745084346013789736450796552
77272084564445898912543737560152633299759359959420
51990771767713693321032039215107832734360378720851
09136762349886344362420984720955074780889202541797
70896763846953214720553922486992302733707196483348
49045969833114793301657429813958969539406732142636
92502178082237337231815692752660602500625642690902
48328985111428648135448597379991004313275485637303
77657758862782442714471712782977203952113306637505
99526051279198553751017345873305211333857760858608
28684951160042807909556692892506555761946160549835
20303885701870763423255286037095591483157305753323
90742350781364515849011549346797178301358198056103
42477861647880999927308727218642361092720037983209
92492109020448284198786534092136303978056649046760
46986040724567578002859838619110484477846477503720
50610100383123165074971031850256994835659647733530
18102379912398920890647330875072013095255495682868
99218106145284129097154350663342836841523804131925
94548014661761166732470886782425501725751052498528
19766225123357293850179242144805633465265465905934
85940544983902174680151695063515178026513270513373
14567430101451394436010539789612192204784076741162
02598379558248660340254801433826543073346880809175
33794003463907669978751710212827335152650286849811
73373300353615348808238739424609173004246887262560
12109804136735339867925129597497616688796356759848
54311863756767089107147427840772086928447453283837
70900225008489928559780170100832519186908501709113
03429578260203366213647163963514273177141793258212
70129903691271912759011710088979532621678654470430
81224819170877249988068180433429813000194708364122
65880853306383812583157641813642029350388222120649
08993143533172049134282598134601427505321082386102
96641697114646047681037048275294879590675546238961
22511345901203153781514214990931579674719005995452
69360291389396593588517951759097436505189999310858
97899228473407418165051239458663407082579219063351
69221909938428450917040668276528126542834183887723
37308687968877323296188638808928460012302773700078
70065837663746381648888008236867292692703324810208
21191152713868278958122711338568954056108558339496
39688210557308597484533729528356864688107193747735
65682131726787955429457687066390524734027375900484
94014381838353463379587602239201589869366921779214
00650533231084382211721311712842354720530958884987
55043951760209301641240570251935929483202398985064
01127949408135378762836687221506379804091561420489
95319428463913516967083275489061836976589369361669
92232599237862885826800721062057774065285577658956
08567726912177628446395314028140718885884417469719
40995239084087377128760976350345980702249228280657
48985114060542624187997015459894041486547255798132
59016156893986505363351455934516022571657000511969
63364078453953442051819839062916836973503211443235
36474735357860570085136285632030631500478179833304
99800682580344897169351621775022943484116507698528
54499366926348099904655866479825346766284305965206
58548938005306528588644226075291218639460859881010
53843832526089853965121985896567112431020946618961
60263246569884467142995224200214985903732185320564
13645583944220113320571903734419681519647268176440
72268766271760668375452471249796875977741923733307
01446073918712502971204655640711174439878554539601
21956175234574438805654552770685162438998516074279
89283178151266467822868370900746468416658852633006
42510798459220886536081340421882720060398236598449
13841586932985432819518800346922653513675772280557
58466448137203029042196278568835120842164396569247
33319030990329406645032723523253309176296741474039
69491394804908661080074948439992404585193352085053
09330332512851734540875138034948953970107554992837
25469264930630900931579777556892751293755577688575
75357774429395052634349211740548847867692095519205
17699024598176549937631734736458995116976781129272
63202646609659513696443442087992621407581818380904
14744945867220301128198294741475121079329121236086
41323148039824447971931349360127464522185082768744
03180156798758715856884779512012055254508937236140
78939505326048765866117108935218142707910114350460
74353501694827332583764635697133794254802641888996
87913418861391825942398310699443294378064810421916
36641257207248956877756345317415840161437244655168
75304856274379634975849417916789483691875055902865
18398650158930683922703754119022369253524344945915
72155384819819961291221629082483354519614388928289
11148160489016637902881331691919347083264362558491
33955648462626291884858567370226518877896536048408
73744522109056681738413460309253412525678038038964
72501859470776876779209754549293563256277168162177
52667449731255992079828345654966814836569388956343
42904345058260363639446282809052087932690706797000
74240602327232553076811874826485551456431407715195
46095671918646666760239836781094793929785589321896
60105515770494189377077318842984857150688833466597
67873006834384199643085303023369579906932498775902
98298391813274179420482526911798729149720989952114
72503417489527247775311783591486298722278199904522
91317033049722862054204732050563957980743822757495
41170686057631593358748161282948487565036294435952
43802641826408557340015251538440908071972626469253
35669966286640071824184801009495201185896687340306
46711830686618180737245339363712327731476370208255
37354634496570455353470479079995111794315681691296
26684518805748751642847384627370638321580719294749
19778254463611140131322463114379893205517832231772
81658157349479812300869694103735382886808072545111
54980271730008673426149474830482385937373996048689
70676203938236347012289061965532898796528960064177
72592915524362701132166185105888514998283778233792
90683631740502877246805331618901384782387956924470
20430891474567033737676759364696487984277760319017
54124070029194325093600825837936415834980674275196
25567523839068737435232552721973963767167401533932
55164265174152406859802799708331567170008010888109
85633153836876717009088587847684157861719851890067
80601296164015477525961334994818419517562211350086
32568919833666392153524361473386539741964955984720
37448859861264953986984660523954497130466421921294
34352011971909145010038990480537675490804792798960
02335487981061723356671856618419111379350218406183
60014249461759955259245055371126596518713470579754
43833478288862803623021917201984348951536123102836
48317702248444053032752947990790393950632070253139
72349953184406861226132851468607167036974898023131
58647946195859301983184566013688170241533801527247
57011566582912032538695530313579328051739626585030
94812965683415715842324324576448306610483102691986
50122809162200977042104273569907096975843914647362
76686555266544360523628469436056445811575098216374
74650286221778007278738282039500933724220757198566
63808427422727677846423529592696120773721931205951
57552021549753607600855673968249852015294426475521
53999285492450486367556298997990634291469802900197
51054224681502319297181372108063268533489382824589
50809414454870209670524032770690571247423406528636
34733767166335777635943719498773398479154565023031
60854295033213421642837903387899795782579128072673
31503512096661498826616948582559435768175108716922
94224762979998504772149467096685583864190281516967
81779668091097801140992680071084513522266799153637
46461649141077551468131132063337637827362591828848
70277014659501753332070729727520807182116526879225
16203992098883918006142068688375571834634988318331
27687883618850734782939117519577624789959269179476
40872808507667548587249382853072119804227594688696
30698618906631685300257429734614228164160174853134
89712770338297146503773683082925441028484520858091
25796032704398598828302033251004062051314827302245
35015670388517203516426505963764666476551390887086
89180474461807362368873178978001667165892881421488
16167123834546655723492211848290831098097567359535
81721182323815119656265560935653361656815499161813
37807958386329323377296558389531527819719654901224
46058863417452675741010825146473478017543955579942
93030502263028520112745647216501845004319033153126
40371572959976276990660248497949726695076901784395
13662456254441684664490915853787199152311813509376
33218526610085160073867815791780631467048848912653
43577978665784964015551083273542683527148588024446
53500520955111872202196242770734919030521968391768
76189131349176826737907636627825350426345483048015
83056559247046664024511272868516591991130061230889
00768815881536735390908055571891184596448949113205
38012098809875184013749087806248011864416900847625
89290233796121789219019136221872572530277024302878
13598322471103754045955640650664405395216317543343
76356257437836390341758928019613119394530675826328
81699566127350927274329031500791260707239914957174
72342249190062454784782496508606891242518376613090
89810037493784203099325680519186496917658479437634
62474815362448251341106923834255537054760626728367
84353501846445608591994708782490713452600754620212
37654458397397721985685664235339024412284487030078
81366561248579779108938180966300065805899231548006
40281431634764708308767699284267674863099374345555
14448145668824159080936881969206315703386025432855
34122191331935947544223496910646305888272768507449
57407678258447637704095407352062349994042470258984
80500745308544533680238231870524495823289886265644
45483339843093524183279678300528876209455482159209
81025371405999175784413871996829844219531364845461
72478302176269972592695347623168082186092141710963
69966742354198221603563833915170134203447117943290
22675807314744316867133134879664829520234325266411
54874433858728761623211243214927783400558098748738
32413005042751326844407910029374506542074345413639
92806155664034624944129299554544315672751326958262
22958629783603884369424985934080781181681534347061
66252353598635687448737089275281731808347458828818
81798108175504255176583933602675803276606597052890
22401556052554902399988250724615836510691518810625
19396468323865173197592819431166745335939340346029
42406877883634318008197456920759904661940295501974
41351516607613670668265669570800796898495300707479
85746577660570305353376394163675575724153525786910
72101785199939424200395541386155250427435810475404
74266307925462192126166099869832098095253680036965
59071509487671036599777983368361311057929815125953
03576629595660226203279779989568868521030237646474
85882097407730271561694861947452566776214029789052
64707296525947137093271267011937747113840860616968
69663548225972442703314913144011499100926569149203
79129503359479838920742529432704431262443617230949
91813796420707303625383578796693661964853048148509
85738217203512564120768025383484445153971635623430
74971994537849060453745299007465225973424539915610
56004083714384400768047255172493253205806162075641
27165266908610075409838165767502239507666797334818
04200946983934751625339660004763238002545306283362
37620552930583636902933504511521335469194932829108
47433207804468636210836676395788367169193401870353
66281527039401611108473953726486066451730262680605
30951657640772770504057469408853154978103950989430
12288969060965013714183010710246952729794637740634
74650649505427764478357727163654262729228469677452
53866472835271598739571911536260545809120793301375
94385061132511498328259666284170774982319343715698
62596091372657528693640001255925324053452604625210
02192134608756468302251329172661133599304107462656
76421345618822019839480230808737713008574851995557
42902025967654909244725979460267807505055355544577
88183455568184623716406738484674881216810438947129
42424253822337942342396684870143701961847122730006
97527463110809874668231405700676923117228143434800
79705259561363371191589216659596460387088282208647
46659166755140183778373106734797631382064700839797
23608865167350848607794665023847421547511130680349
21836254329898910151704652266789364917284374866172
21409415510005400095663128111630786452888317869078
86705623439746521289503824460489905596734600860962
65869115157904993399593615143266963994605010126303
65264851279569995793606443577504674189470139948827
80732701526257819098598077288033523150930630993640
28554012527540423824478945178779671260604681648840
79243261254097596633683779443753577000979402674481
55411218315026598694889295524421405575254221786788
12689568960142588606563462298201550569280235135408
91551869416490705739820870162793930431835953020411
75538916881133186575549973802728942306922816124893
84328618794345894041147858850703842815435695955268
54604636601831720513984432052505883454118854211409
62277433851221051432369107658497370446927143429634
32741216798855689801402749961021214489079686952173
74816330003249658220010815208739003557117090058915
39260941647522717101001822461469198454404259691368
11160528258042790396182432982255302576424018487611
82108811718205167340130206882625265533608582477936
40996990541316946145071012448731864534347341565881
45952426468147168769076238720670076693996518128972
17661485695611456289875945575536246110132460342695
13949813045952981045256199502552725006062775090950
21112996278631717344378151528446608295456563985535
02863219231501970059165209018705673346441956078354
10978596286215047498049495879254996517157922009547
13467505279056362907045367065463054391330939433419
32561753337708995115018216500103133436822925515528
72470951859706413346437315061034278142063895247807
79638191708822808641043224144057548480066262263622
84865108600166566552198738373021273054540786632857
10237864460334373105775448913780176468769360231405
44809965105445004587746851390928690562100862404738
25351078769295888504866076475066807117317619797258
70216271969664859447019374500056442787633659026784
88468325873593027240906584026524160395492209392478
24998320213447280415528464974717582766127914504769
78394726173591548842520880672777589569857846665575
09359090955333626345267284481392018113699807502917
78270693692625828316377680602959448213821780078202
64095567581992141153215589034041050436693161866352
03375197313604619175053865815748662228641715403512
47176917247555655158933344596036086599469086198096
32887651893720128619987149952015200673431254965712
78516247107068007704240582566192894463092650904663
67920658737849797778283817394980396961898938022447
74033643867327139927264647335898116535609813029651
61318661546436639774687265777784346842026627963848
57747240127921180120799971167234434034055984879310
06316787429449101749577889555663153277445508121297
97893231849524617938827213409843654638497892446504
73899879967895057997435232065101118524861572857845
15334562067709732148950491306421966867582410017358
99912362374319420643955424880431537060598320632545
01556148893822628560302757136541039243390348268825
92297721155902090201990832611774621362130994392758
37332612891833890592973681544564182393712211047159
32804246255929723935396648344357182519336263026317
83406631373471407037905149014119097283352862388764
81302570090572565407034180269740775532616223613864
46417948717813908193216479439000687756820066709452
28599718946445752303322392055083844853454302374759
20681171563187929477821332835065499926274436773936
94186415342682904785931018063421143017613836359930
01007511264125201640231577870379328925877581650481
24708409654308131745014606613308037095758248803838
53703094351328499011986696617756357203317205482638
98018585859130767851534607651147641778753812085838
96509166738189451920411415869541186904652986515240
42746288863339499157257060527793168087082446034471
43167833456601302629965035077092142438499381068287
10596669075348236542190810034528337520828595848843
51797054043516394986261271860645371135668378744705
74439684442037114924923886084313957103406519224407
47022132643598215293327338839416343363171329200897
09730771857207311526370010430111977075941812602307
72376014916565189341889310193270446357645932285406
51675603176960581128605870085156306964338652918116
47470819414013086540664201041699067627359002476183
97441734031754830743452227112683715049815859152908
02862983168967228070833913519337198385661136554339
62851438948135974452279480136078914693182156818148
16297396478610087712467686268064886982681725715164
24929395190591315386573936767998999968870705845576
08893755357532620316432446928838057972884766228974
10131776538813050682735963180571500151178162311474
69990036265878868229927231767265567356432956517643
59103619542851170727473847169547858711173803896624
78801571561070123296147380608747007717085842780541
44314696513348875950813445481618617390462875640560
14223487405828994564351656056960117293612995285025
95365239779744792189608943593074512316856916984777
96309674486800136913387853328160702324267484311332
24204455544008718250784737266968027073730861657613
29014341798367958342778305666662518827856377071193
54004210396531585096287782401454883328134534630356
21887634170142130804926549347082096161768826532124
20946016310337676277784171017740238812684225399008
58075983464299660971438560230143092504564484906894
33859515325280873687522552440873389822259898572887
08616952425456830986513532384167684218033165913388
75193709124723078541516745612984826500055683548584
17852290843312089512362627423048016086699676471637
35147854111012464519653717464329869869173930623135
45980351110901360322158544586508241657603119286683
49558826020916663125429973494194055461581225633949
80703308633040999177305264352224174430826697774318
76369465566079898997510774644419064529504930651379
13908840809054542426918377991067744703455451227335
16122974857001011282482364400153502188778628986672
52014237402183844341392004432201798203108406472425
35265458437883895286526648490617313714847219300499
14728265224929738624430756990275707141233922070270
33003862028855177775324754018761369511084610482939
99966110560450228658375682663588269830548281134843
48593533341354255022753142468562466603367442116939
49111012266149800196830812162487975146084789262372
22250383995968198718352419711082802784003163572887
61976823017819013623813583857687425342963348092655
82299549543848306228309087395785215163121420804111
85414674824399818902039299598141119367854509189125
82509411930329585586279087316095471630032607570795
41798965557328074489221493475705896364007850102309
98608982137394898962877566829533222382120121277885
46286354339766910445850887107156199679184646499267
51994585788293143788095067363774602774785316545938
80586516906808069516586376041874835414910416438046
98020384720894486533059556453463277404919782891158
67249118745843018273227887282763135556994164394750
79236624098209838711196489146565110755004581971099
36519894915141581119449211885137869996513445924335
74179788395256600003697852315986680131174264650635
55422498449995716648520919145951012273733741715265
61886397525516905954642193344472414059672799512206
51343669584112913727903560817212679255453578098847
29619778656382752283703523073380043645305772938365
71382902731985761581552942189374425534261116939281
46939822047487536381747440387576628596190610523132
62526398714808427005031359405815456065589000738436
84875317248057834596408531829111584209126136971880
29730134243256309343213621070504158874966417544034
40637823514645022279848007360189376647467363278724
44164801997811912761889162074192834565275689346835
60397590580503621377726704540986029025118559388222
08650976977926409452158196552468257766254708041097
17625576428249908865033360778907337875043811148414
66965978224351886674427982057073646781057580910376
82496118558637199155904663531858354485468312877361
59016065276103471553526718665313781441486352447115
75933405758127776214650925846732615829278713641448
78065303195660876027013972950344124525648658281202
98644663126368482770781755280015743411976115069754
58885197224051756810220210344457105752159620520010
58901976514020860989473496121460553667610009208155
60641902463244064221782103519393453021339137535920
18219814516400137820973109685531195344121241497589
32648947132549657653880734314031250318397656798122
67800306483529552691880979740190819931928334439138
31036854955282952306101854660968166155042764270651
02479337471160875801455244803852479812754986345246
26439730846216309598240053255131870073015249288288
44150272316980035035371688072705937991233573103320
76810393162230532462449856153185470499629581779256
29737290506005000054376861872612136884705332860904
26237702503255705858711275434358903547610345778147
56042042166725129709808043190716588723380785336603
22966448060374352642763233712959166808891103525594
17462453611474123311826492400534657790529695637970
14940034362660479559404460065139243888015015079321
43881759139711791807591446947442401651321344563206
38703939480221794666579368899985060132503516636784
54849052507185881787900024915433075335630318016881
19535355859169933891361128774659358105685232413330
18206162177873663026713933468489708054602754322993
09338706902030891605919350179098941371298507942794
63178175281493035748693516046930020542835004496284
12437420573192673009625527488432138708155169004508
33487458990723468801118556524546582638316269527209
27569251690991029393597687960780818916005205512001
67977712443572110745259310850411512558482821094977
86737098252380505747502563657220320175399483837098
94871855977863419223979999039165469590933447472735
64390946693354609937656222506242038767369337494021
99273573670542945658981146062475248967563610321708
62760505309923165497906143508759602888497331721652
29370337857080366232139446761828793609673433032334
32585557536041970672179308034877273668488251635719
14432495327697411636009656282211309130203882825069
26676570351832671688238787401929833790450599655059
71332476576201495740387143194066286128191290185098
78041988386520042018089673233535007442736556268111
24565699383811116389175299087394994079229065025901
09932362839121349567025369695891762635747444753948
11323396655839159294220494170850891175875434376079
00912177562705938184442800001221268949974532769905
85186469046165318475130195596186195695693301361574
13092936825139939155744166752791806298991315142756
04751619971580109933777371206938349703659450545917
99086435789023264890032497734833661139947200260743
05805938712242853874676609731910812703733209567890
91914646607426909877903750638235352500143012922346
84207321742378981403305477034941755260627158862866
39547771864848075851449765843196982379504784274547
10166411077534765361442880635680994357957101057192
74648013603032719711174406107922040006156900695275
92189653858205255927880288280372077585984295829915
70356175070958821709085211649615848293841308860736
89513009850391223097508490164263520408717702361215
96530251705369377644672577502898564911413385213577
59655576051348180738326237828923361030508930002718
76498693801052511910466326786799575951334548987940
94767081107273610897653963259507032202194746026667
53556501284564352923685161312692205113923111965659
23809253379322987640218108966330037544515321606450
76565942988897950720262586988263749869310709817026
33625926769445884564762341943691593418228788258329
66608714346918244407463976272653026631998188220588
63943436931229964879033230091279601828273066188517
38227218390715490370390317107411134038467560864087
09529898615806032736061553938080612875467642224957
58609812612669466849788545193495345754592869245291
04827616552643571689209316367953691428498500020434
18738942352565376594741749369768853312898488149905
59914195951380269708251544052057214524548773659044
84797633817530336049853924925315149276545906051913
70313566666349292144343487834912587102830709775895
86302329433440274220160031259093667954707442785996
09895426614987890087606066195547820996024984086973
40197915444342987229441664529015714679475579971637
96345053175305295311487784317564157202424808898125
74030925458897973764040363008447298817079317321949
35828013381138804774850874697541473568955917253382
90875162194816133883653251152435404821931403571934
23083405143009825506375030953405042506624062447165
17113728845547477715226484387676705672853467409579
77663621447167449425124941685624394269650531262340
06717273036326293435607524697042540723616179395393
33893301349198923975607651537354928215382694266011
41757508273896892391736574632063784909558019481334
00865655804108829123890977650354951367777973842170
75868014247898823178741452147122239560266142666409
52374887226747227761434033018712705272938680809111
85289596582679848237670101390122656182959680517361
46076619939139036787876060299301996426764725427403
64011297421238171255442319681637731281152043413431
85060894391599524452878044334616995055050095307491
88403737341503747816446538950384710455426239295288
28998531209709108913722340683342671325423513366061
63474488522182030819462026736771976453958939935828
32050459497790182712782471389976977879426526570877
30339646201949991550918186788970663139748645058559
72718469140463791494759075349756092667385993283854
20778335173462088786041537266587943086744729710814
62579154103447702796046390902553975412328182470276
13052411058118309311147944260478608

steps and then halts!

Of course, I’m just kidding when I say this was simple. The machine is easy enough to describe, but proving it takes exactly this long to run takes real work! You can read about such proofs here:

• Pascal Michel, The Busy Beaver Competition: a historical survey.

I don’t understand them very well. All I can say at this point is that many of the record-holding machines known so far are similar to the famous Collatz conjecture. The idea there is that you can start with any positive integer and keep doing two things:

• if it’s even, divide it by 2;

• if it’s odd, triple it and add 1.

The conjecture is that this process will always eventually reach the number 1. Here’s a graph of how many steps it takes, as a function of the number you start with:

Nice pattern! But this image shows how it works for numbers up to 10 million, and you’ll see it doesn’t usually take very long for them to reach 1. Usually less than 600 steps is enough!

So, to get a Turing machine that takes a long time to halt, you have to take this kind of behavior and make it much more long and drawn-out. Conversely, to analyze one of the potential winners of the Busy Beaver Game, people must take that long and drawn-out behavior and figure out a way to predict much more quickly when it will halt.

Next, BB(7). In 2014, someone who goes by the name Wythagoras showed that

$\displaystyle{ \textrm{BB}(7) > 10^{10^{10^{10^{10^7}}}} }$

It’s fun to prove lower bounds on BB(N). For example, in 1964 Milton Green constructed a sequence of Turing machines that implies

$\textrm{BB}(2N) \ge 3 \uparrow^{N-2} 3$

Here I’m using Knuth’s up-arrow notation, which is a recursively defined generalization of exponentiation, so for example

$\textrm{BB}(10) \ge 3 \uparrow^{3} 3 = 3 \uparrow^2 3^{3^3} = 3^{3^{3^{3^{\cdot^{\cdot^\cdot}}}}}$

where there are $3^{3^3}$ threes in that tower.

But it’s also fun to seek the smallest N for which we can prove BB(N) is unknowable! And that’s what people are making lots of progress on right now.

Sometime in April 2016, Adam Yedidia and Scott Aaronson showed that BB(7910) cannot be determined using the widely accepted axioms for math called ZFC: that is, Zermelo—Fraenkel set theory together with the axiom of choice. It’s a great story, and you can read it here:

• Scott Aaronson, The 8000th Busy Beaver number eludes ZF set theory: new paper by Adam Yedidia and me, Shtetl-Optimized, 3 May 2016.

• Adam Yedidia and Scott Aaronson, A relatively small Turing machine whose behavior is independent of set theory, 13 May 2016.

Briefly, Yedidia created a new programming language, called Laconic, which lets you write programs that compile down to small Turing machines. They took an arithmetic statement created by Harvey Friedman that’s equivalent to the consistency of the usual axioms of ZFC together with a large cardinal axiom called the ‘stationary Ramsey property’, or SRP. And they created a Turing machine with 7910 states that seeks a proof of this arithmetic statement using the axioms of ZFC.

Since ZFC can’t prove its own consistency, much less its consistency when supplemented with SRP, their machine will only halt if ZFC+SRP is inconsistent.

Since most set theorists believe ZFC+SRP is consistent, this machine probably doesn’t halt. But we can’t prove this using ZFC.

In short: if the usual axioms of set theory are consistent, we can never use them to determine the value of BB(7910).

The basic idea is nothing new: what’s new is the explicit and rather low value of the number 7910. Poetically speaking, we know the unknowable starts here… if not sooner.

However, this discovery set off a wave of improvements! On the Metamath newsgroup, Mario Carneiro and others started ‘logic hacking’, looking for smaller and smaller Turing machines that would only halt if ZF—that is, Zermelo–Fraenkel set theory, without the axiom of choice—is inconsistent.

By just May 15th, Stefan O’Rear seems to have brought the number down to 1919. He found a Turing machine with just 1919 states that searches for an inconsistency in the ZF axioms. Interestingly, this turned out to work better than using Harvey Friedman’s clever trick.

Thus, if O’Rear’s work is correct, we can only determine BB(1919) if we can determine whether ZF set theory is consistent. However, we cannot do this using ZF set theory—unless we find an inconsistency in ZF set theory.

For details, see:

• Stefan O’Rear, A Turing machine Metamath verifier, 15 May 2016.

I haven’t checked his work, but it’s available on GitHub.

What’s the point of all this? At present, it’s mainly just a game. However, it should have some interesting implications. It should, for example, help us better locate the ‘complexity barrier’.

I explained that idea here:

• John Baez, The complexity barrier, Azimuth, 28 October 2011.

Briefly, while there’s no limit on how much information a string of bits—or any finite structure—can have, there’s a limit on how much information we can prove it has!

This amount of information is pretty low, perhaps a few kilobytes. And I believe the new work on logic hacking can be used to estimate it more accurately!

El Niño Project (Part 5)

12 July, 2014

And now for some comic relief.

Last time I explained how to download some weather data and start analyzing it, using programs written by Graham Jones. When you read that, did you think “Wow, that’s easy!” Or did you think “Huh? Run programs in R? How am I supposed to do that?”

If you’re in the latter group, you’re like me. But I managed to do it. And this is the tale of how. It’s a blow-by-blow account of my first steps, my blunders, my fears.

I hope that if you’re intimidated by programming, my tale will prove that you too can do this stuff… provided you have smart friends, or read this article.

• use R to create a file of temperature data for a given latitude/longitude rectangle for a given time interval.

I will not attempt to explain how to program in R.

If you want to copy what I’m doing, please remember that a few details depend on the operating system. Since I don’t care about operating systems, I use a Windows PC. If you use something better, some details will differ for you.

Also: at the end of this article there are some very basic programming puzzles.

First, let me explain a bit about my relation to computers.

I first saw a computer at the Lawrence Hall of Science in Berkeley, back when I was visiting my uncle in the summer of 1978. It was really cool! They had some terminals where you could type programs in BASIC and run them.

I got especially excited when he gave me the book Computer Lib/Dream Machines by Ted Nelson. It espoused the visionary idea that people could write texts on computers all around the world—“hypertexts” where you could click on a link in one and hop to another!

I did more programming the next year in high school, sitting in a concrete block room with a teletype terminal that was connected to a mainframe somewhere far away. I stored my programs on paper tape. But my excitement gradually dwindled, because I was having more fun doing math and physics using just pencil and paper. My own brain was more easy to program than the machine. I did not start a computer company. I did not get rich. I learned quantum mechanics, and relativity, and Gödel’s theorem.

Later I did some programming in APL in college, and still later I did a bit in Mathematica in the early 1990s… but nothing much, and nothing sophisticated. Indeed, none of these languages would be the ones you’d choose to explore sophisticated ideas in computation!

I’ve just never been very interested… until now. I now want to do a lot of data analysis. It will be embarrassing to keep asking other people to do all of it for me. I need to learn how to do it myself.

Maybe you’d like to do this stuff too—or at least watch me make a fool of myself. So here’s my tale, from the start.

To use the programs written by Graham, I need to use R, a language currently popular among statisticians. It is not the language my programmer friends would want me to learn—they’d want me to use something like Python. But tough! I can learn that later.

I click the big fat button on top saying

and get asked to save a file R-3.1.0-win.exe. I save it in my Downloads folder; it takes a while to download since it was 57 megabytes. When I get it, I click on it and follow the easy default installation instructions. My Desktop window now has a little icon on it that says R.

Clicking this, I get an interface where I can type commands after a red

>

symbol. Following Graham’s advice, I start by trying

> 2^(1:8)

which generates a list of powers of 2 from 21 to 28, like this:

[1] 2 4 8 16 32 64 128 256

Then I try

> mean(2^(1:8))

which gives the arithmetic mean of this list. Somewhat more fun is

> plot(rnorm(20))

which plots a bunch of points, apparently 20 standard normal deviates.

When I hear “20 standard normal deviates” I think of the members of a typical math department… but no, those are deviants. Standard normal deviates are random numbers chosen from a Gaussian distribution of mean zero and variance 1.

To do something more interesting, I need to input data.

The papers by Ludescher et al use surface air temperatures in a certain patch of the Pacific, so I want to get ahold of those. They’re here:

NCEP is the National Centers for Environmental Prediction, and NCAR is the National Center for Atmospheric Research. They have a bunch of files here containing worldwide daily average temperatures on a 2.5 degree latitude × 2.5 degree longitude grid (that’s 144 × 73 grid points), from 1948 to 2010. And if you go here, the website will help you get data from within a chosen rectangle in a grid, for a chosen time interval.

These are NetCDF files. NetCDF stands for Network Common Data Form:

NetCDF is a set of software libraries and self-describing, machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data.

According to my student Blake Pollard:

I know about ftp: I’m so old that I know this was around before the web existed. Back then it meant “faster than ponies”. But I need to get R to accept data from these NetCDF files: that’s what scares me!

Graham said that R has a “package” called RNetCDF for using NetCDF files. So, I need to get ahold of this package, download some files in the NetCDF format, and somehow get R to eat those files with the help of this package.

At first I was utterly clueless! However, after a bit of messing around, I notice that right on top of the R interface there’s a menu item called Packages. I boldly click on this and choose Install Package(s).

I am rewarded with an enormous alphabetically ordered list of packages… obviously statisticians have lots of stuff they like to do over and over! I find RNetCDF, click on that and click something like “OK”.

I’m asked if I want to use a “personal library”. I click “no”, and get an error message. So I click “yes”. The computer barfs out some promising text:

trying URL 'http://cran.stat.nus.edu.sg/bin/windows/contrib/3.1/RNetCDF_1.6.2-3.zip'
Content type 'application/zip' length 548584 bytes (535 Kb)
opened URL

package ‘RNetCDF’ successfully unpacked and MD5 sums checked

Success!

But now I need to figure out how to download a file and get R to eat it and digest it with the help of RNetCDF.

At this point my deus ex machina, Graham, descends from the clouds and says:

You can download the files from your browser. It is probably easiest to do that for starters. Put
ftp://ftp.cdc.noaa.gov/Datasets/ncep.reanalysis.dailyavgs/surface/
into the browser, then right-click a file and Save link as…

for (year in 1950:1979) {
}

It will put them into the “working directory”, probably C:\Users\JOHN\Documents. You can find the working directory using getwd(), and change it with setwd(). But you must use / not \ in the filepath.

Compared to UNIX, the Windows operating system has the peculiarity of using \ instead of / in path names, but R uses the UNIX conventions even on Windows.

So, after some mistakes, in the R interface I type

> setwd("C:/Users/JOHN/Documents/My Backups/azimuth/el nino")

and then type

> getwd()

to see if I’ve succeeded. I’m rewarded with

[1] "C:/Users/JOHN/Documents/My Backups/azimuth/el nino"

Good!

Then, following Graham’s advice, I cut-and-paste this into the R interface:

for (year in 1950:1979) {
}

It seems to be working! A little bar appears showing how each year’s data is getting downloaded. It chugs away, taking a couple minutes for each year’s worth of data.

Using R to process NetCDF files

Okay, now I’ve got all the worldwide daily average temperatures on a 2.5 degree latitude × 2.5 degree longitude grid from 1950 to 1970.

The world is MINE!

But what do I do with it? Graham’s advice is again essential, along with a little R program, or script, that he wrote:

The R script netcdf-convertor.R from

https://github.com/azimuth-project/el-nino/tree/master/R

will eat the file, digest it, and spit it out again. It contains instructions.

I go to this URL, which is on GitHub, a popular free web-based service for software development. You can store programs here, edit them, and GitHub will help you keep track of the different versions. I know almost nothing about this stuff, but I’ve seen it before, so I’m not intimidated.

I click on the blue thing that says netcdf-convertor.R and see something that looks like the right script. Unfortunately I can’t see how to download it! I eventually see a button I’d overlooked, cryptically labelled “Raw”. I realize that since I don’t want a roasted or oven-broiled piece of software, I should click on this. I indeed succeed in downloading netcdf-convertor.R this way. Graham later says I could have done something better, but oh well. I’m just happy nothing has actually exploded yet.

Once I’ve downloaded this script, I open it using an text processor and look at it. At top are a bunch of comments written by Graham:

######################################################
######################################################

# You should be able to use this by editing this
# section only.

setwd("C:/Users/Work/AAA/Programming/ProgramOutput/Nino")

lat.range <- 13:14
lon.range <- 142:143

firstyear <- 1957
lastyear <- 1958

outputfilename <- paste0("Scotland-", firstyear, "-", lastyear, ".txt")

######################################################
######################################################

# Explanation

# 1. Use setwd() to set the working directory
# to the one containing the .nc files such as
# air.sig995.1951.nc.
# Example:
# setwd("C:/Users/Work/AAA/Programming/ProgramOutput/Nino")

# 2. Supply the latitude and longitude range. The
# NOAA data is every 2.5 degrees. The ranges are
# supplied as the number of steps of this size.
# For latitude, 1 means North Pole, 73 means South
# Pole. For longitude, 1 means 0 degrees East, 37
# is 90E, 73 is 180, 109 is 90W or 270E, 144 is
# 2.5W.

# These roughly cover Scotland.
# lat.range <- 13:14
# lon.range <- 142:143

# These are the area used by Ludescher et al,
# 2013. It is 27x69 points which are then
# subsampled to 9 by 23.
# lat.range <- 24:50
# lon.range <- 48:116

# 3. Supply the years
# firstyear <- 1950
# lastyear <- 1952

# 4. Supply the output name as a text string.
# paste0() concatenates strings which you may find
# handy:
# outputfilename <- paste0("Pacific-", firstyear, "-", lastyear, ".txt")

######################################################
######################################################

# Example of output

# S013E142 S013E143 S014E142 S014E143
# Y1950P001 281.60000272654 281.570002727211 281.60000272654 280.970002740622
# Y1950P002 280.740002745762 280.270002756268 281.070002738386 280.49000275135
# Y1950P003 280.100002760068 278.820002788678 281.120002737269 280.070002760738
# Y1950P004 281.070002738386 279.420002775267 281.620002726093 280.640002747998
# ...
# Y1950P193 285.450002640486 285.290002644062 285.720002634451 285.75000263378
# Y1950P194 285.570002637804 285.640002636239 286.070002626628 286.570002615452
# Y1950P195 285.92000262998 286.220002623275 286.200002623722 286.620002614334
# ...
# Y1950P364 276.100002849475 275.350002866238 276.37000284344 275.200002869591
# Y1950P365 276.990002829581 275.820002855733 276.020002851263 274.72000288032
# Y1951P001 278.220002802089 277.470002818853 276.700002836064 275.870002854615
# Y1951P002 277.750002812594 276.890002831817 276.650002837181 275.520002862439
# ...
# Y1952P365 280.35000275448 280.120002759621 280.370002754033 279.390002775937

# There is one row for each day, and 365 days in
# each year (leap days are omitted). In each row,
# you have temperatures in Kelvin for each grid
# point in a rectangle.

# S13E142 means 13 steps South from the North Pole
# and 142 steps East from Greenwich. The points
# are in reading order, starting at the top-left
# (Northmost, Westmost) and going along the top
# row first.

# Y1950P001 means year 1950, day 1. (P because
# longer periods might be used later.)

######################################################
######################################################

The instructions are admirably detailed concerning what I should do, but they don't say where the output will appear when I do it. This makes me nervous. I guess I should just try it. After all, the program is not called DestroyTheWorld.

Unfortunately, at this point a lot of things start acting weird.

It's too complicated and boring to explain in detail, but basically, I keep getting a file missing error message. I don't understand why this happens under some conditions and not others. I try lots of experiments.

Eventually I discover that one year of temperature data failed to download—the year 1949, right after the first year available! So, I'm getting the error message whenever I try to do anything involving that year of data.

To fix the problem, I simply download the 1949 data by hand from here:

(You can open ftp addresses in a web browser just like http addresses.) I put it in my working directory for R, and everything is fine again. Whew!

By the time things I get this file, I sort of know what to do—after all, I've spent about an hour trying lots of different things.

I decide to create a file listing temperatures near where I live in Riverside from 1948 to 1979. To do this, I open Graham's script netcdf-convertor.R in a word processor and change this section:

setwd("C:/Users/Work/AAA/Programming/ProgramOutput/Nino")
lat.range <- 13:14
lon.range <- 142:143
firstyear <- 1957
lastyear <- 1958
outputfilename <- paste0("Scotland-", firstyear, "-", lastyear, ".txt")

to this:

setwd("C:/Users/JOHN/Documents/My Backups/azimuth/el nino")
lat.range <- 23:23
lon.range <- 98:98
firstyear <- 1948
lastyear <- 1979
outputfilename <- paste0("Riverside-", firstyear, "-", lastyear, ".txt")

Why? Well, I want it to put the file in my working directory. I want the years from 1948 to 1979. And I want temperature data from where I live!

Googling the info, I see Riverside, California is at 33.9481° N, 117.3961° W. 34° N is about 56 degrees south of the North Pole, which is 22 steps of size 2.5°. And because some idiot decided everyone should count starting at 1 instead of 0 even in contexts like this, the North Pole itself is step 1, not step 0… so Riverside is latitude step 23. That's why I write:

lat.range <- 23:23

Similarly, 117.5° W is 242.5° E, which is 97 steps of size 2.5°… which counts as step 98 according to this braindead system. That's why I write:

lon.range <- 98:98

Having done this, I save the file netcdf-convertor.R under another name, Riverside.R.

And then I do some stuff that it took some fiddling around to discover.

First, in my R interface I go to the menu item File, at far left, and click on Open script. It lets me browse around, so I go to my working directory for R and choose Riverside.R. A little window called R editor opens up in my R interface, containing this script.

I'm probably not doing this optimally, but I can now right-click on the R editor and see a menu with a choice called Select all. If I click this, everything in the window turns blue. Then I can right-click again and choose Run line or selection. And the script runs!

Voilà!

It huffs and puffs, and then stops. I peek in my working directory, and see that a file called

Riverside.1948-1979.txt

has been created. When I open it, it has lots of lines, starting with these:

S023E098
Y1948P001 279.95
Y1948P002 280.14
Y1948P003 282.27
Y1948P004 283.97
Y1948P005 284.27
Y1948P006 286.97

As Graham promised, each line has a year and day label, followed by a vector… which in my case is just a single number, since I only wanted the temperature in one location. I’m hoping this is the temperature near Riverside, in kelvin.

A small experiment

To see if this is working, I’d like to plot these temperatures and see if they make sense. Unfortunately I have no idea how to get R to take a file containing data of the sort I have and plot it! I need to learn how, but right now I’m exhausted, so I use another method to get the job done— a method that’s too suboptimal and embarrassing to describe here. (Hint: it involves the word “Excel”.)

I do a few things, but here’s the most interesting one—namely, not very interesting. I plot the temperatures for 1963:

I compare it to some publicly available data, not from Riverside, but from nearby Los Angeles:

As you can see, there was a cold day on January 13th, when the temperature dropped to 33°F. That seems to be visible on the graph I made, and looking at the data from which I made the graph, I see the temperature dropped to 251.4 kelvin on the 13th: that’s -7°F, very cold for here. It does get colder around Riverside than in Los Angeles in the winter, since it’s a desert, with temperatures not buffered by the ocean. So, this does seem compatible with the public records. That’s mildly reassuring.

But other features of the graph don’t match, and I’m not quite sure if they should or not. So, all this very tentative and unimpressive. However, I’ve managed to get over some of my worst fears, download some temperature data, and graph it! Now I need to learn how to use R to do statistics with this data, and graph it in a better way.

Puzzles

You can help me out by answering these puzzles. Later I might pose puzzles where you can help us write really interesting programs. But for now it’s just about learning R.

Puzzle 1. Given a text file with lots of lines of this form:

S023E098
Y1948P001 279.95
Y1948P002 280.14
Y1948P003 282.27
Y1948P004 283.97

write an R program that creates a huge vector, or list of numbers, like this:

279.95, 280.14, 282.27, 283.97, ...

Puzzle 2: Extend the above program so that it plots this list of numbers, or outputs it to a new file.

If you want to test your programs, here’s the actual file:

More puzzles

If those puzzles are too easy, here are two more. I gave these last time, but everyone was too wimpy to tackle them.

Puzzle 3. Modify the software so that it uses the same method to predict El Niños from 1980 to 2013. You’ll have to adjust two lines in netcdf-convertor-ludescher.R:

firstyear <- 1948
lastyear <- 1980

should become

firstyear <- 1980
lastyear <- 2013

or whatever range of years you want. You’ll also have to adjust names of years in ludescher-replication.R. Search the file for the string 19 and make the necessary changes. Ask me if you get stuck.

Puzzle 4. Right now we average the link strength over all pairs $(i,j)$ where $i$ is a node in the El Niño basin defined by Ludescher et al and $j$ is a node outside this basin. The basin consists of the red dots here:

What happens if you change the definition of the El Niño basin? For example, can you drop those annoying two red dots that are south of the rest, without messing things up? Can you get better results if you change the shape of the basin?

To study these questions you need to rewrite ludescher-replication.R a bit. Here’s where Graham defines the El Niño basin:

ludescher.basin <- function() {
lats <- c( 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6)
lons <- c(11,12,13,14,15,16,17,18,19,20,21,22,16,22)
stopifnot(length(lats) == length(lons))
list(lats=lats,lons=lons)
}

These are lists of latitude and longitude coordinates: (5,11), (5,12), (5,13), etc. A coordinate like (5,11) means the little circle that’s 5 down and 11 across in the grid on the above map. So, that’s the leftmost point in Ludescher’s El Niño basin. By changing these lists, you can change the definition of the El Niño basin.

Next time I’ll discuss some criticisms of Ludescher et al’s paper, but later we will return to analyzing temperature data, looking for interesting patterns.

5 February, 2014

Since one goal of Azimuth is to connect mathematicians to projects that can more immediately help the world, I want to pass this on. It’s a press release put out by Jocelyn Paine, who has blogged about applied category theory on the n-Category Café. I think he’s a serious guy, so I hope we can help him out!

Spreadsheet researcher Jocelyn Ireson-Paine has launched an Indiegogo campaign to fund a project to make spreadsheets safer. It will show how to write spreadsheets that are easier to read and less error-prone than when written in Excel. This is important because spreadsheet errors have cost some companies millions of pounds, even causing resignations and share-price crashes. An error in one spreadsheet, an economic model written in 2010 by Harvard economists Carmen Reinhart and Kenneth Rogoff, has even been blamed for tax rises and public-sector cuts. If he gets funding, Jocelyn will re-engineer this spreadsheet. He hopes that, because of its notoriety, this will catch public attention.

Reinhart and Rogoff’s spreadsheet was part of a paper on the association between debt and economic growth. They concluded that in countries where debt exceeds 90% of gross domestic product, growth is notably lower. But in spring 2013, University of Massachusetts student Thomas Herndon found they had omitted data when calculating an average. Because their paper’s conclusion supported governments’ austerity programmes, much criticism followed. They even received hate email blaming them for tax rises and public-sector cuts.

Jocelyn said, “The error probably didn’t change the results much. But better software would have made the nature of the error clearer, as well as the economics calculations, thus averting ill-informed and hurtful media criticism. Indeed, it might have avoided the error altogether.”

Jocelyn’s project will use two ideas. One is “literate programming”. Normally, a programmer writes a program first, then adds comments explaining how it works. But in literate programming, the programmer becomes an essayist. He or she first writes the explanation, then inserts the calculations as if putting equations into a maths essay. In ordinary spreadsheets, you’re lucky to get any documentation at all; in literate spreadsheets, documentation comes first.

The other idea is “modularity”. This means building spreadsheets from self-contained parts which can be developed, tested, and documented independently of one another. This gives the spreadsheet’s author less to think about, making mistakes less likely. It also makes it easier to replace parts that do have mistakes.

Jocelyn has embodied these ideas in a piece of software named Excelsior. He said, “‘Excelsior’ means ‘higher’ in Latin, and ‘upwards!’ in Longfellow’s poem. I think of it as meaning ‘upwards from Excel’. In fact, though, it’s the name of a wonderful Oxford café where I used to work on my ideas.”

Jocelyn also wants to show how advanced maths benefits computing. Some of his inspiration came from a paper he found on a friend’s desk in the Oxford University Department of Computer Science. Written by professor Joseph Goguen, this used a branch of maths called category theory to elucidate what it means for something to be part of a system, and how the behaviour of a system arises from the behaviours of its parts. Jocelyn said, “The ideas in the paper were extremely general, applying to many different areas. And when you think of modules as parts, they even apply to spreadsheets. This shows the value of abstraction.”

For more

Postal: 23 Stratfield Road, Oxford, OX2 7BG, UK.
Tel: 07768 534 091.