I’m always looking for great images, so if you know about one, please tell me about it! If not, you may still enjoy taking a look.
Here are three of my favorite images from that blog, and a bit about the people who created them.
I suspect that these images, and many more on Visual Insight, are all just different glimpses of the same big structure. I have a rough idea what that structure is. Sometimes I dream of a computer program that would let you tour the whole thing. Unfortunately, a lot of it lives in more than 3 dimensions.
Less ambitiously, I sometimes dream of teaming up with lots of mathematicians and creating a gorgeous coffee-table book about this stuff.
This picture drawn by Katherine Stange shows what happens when we apply fractional linear transformations

z ↦ (az + b)/(cz + d)

to the real line sitting in the complex plane, where a, b, c, d are Eisenstein integers: that is, complex numbers of the form

a + bω, where ω = e^{2πi/3}

and a, b are integers. The result is a complicated set of circles and lines called the ‘Schmidt arrangement’ of the Eisenstein integers. For more details go here.
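For readers who want to experiment, here is a small Python sketch of the idea. It is my own illustration, not Stange’s actual program: it applies one fractional linear transformation with Eisenstein-integer coefficients to the real line, and computes the image circle from three mapped points (a Möbius transformation always sends a line to a circle or a line).

```python
import cmath

OMEGA = cmath.exp(2j * cmath.pi / 3)  # primitive cube root of unity

def eisenstein(a, b):
    """The Eisenstein integer a + b*omega."""
    return a + b * OMEGA

def mobius(a, b, c, d, z):
    """Fractional linear transformation z -> (a z + b)/(c z + d)."""
    return (a * z + b) / (c * z + d)

def circle_through(z1, z2, z3):
    """Center and radius of the circle through three non-collinear points."""
    num = (abs(z1)**2 * (z2 - z3) + abs(z2)**2 * (z3 - z1)
           + abs(z3)**2 * (z1 - z2))
    den = (z1.conjugate() * (z2 - z3) + z2.conjugate() * (z3 - z1)
           + z3.conjugate() * (z1 - z2))
    center = num / den
    return center, abs(z1 - center)

# One example map, z -> 1/(z + omega), with Eisenstein-integer coefficients.
a, b, c, d = eisenstein(0, 0), eisenstein(1, 0), eisenstein(1, 0), eisenstein(0, 1)

# Three real points determine the image circle of the real line...
pts = [mobius(a, b, c, d, x) for x in (0.0, 1.0, -1.0)]
center, radius = circle_through(*pts)

# ...and any other real point must land on the same circle.
w = mobius(a, b, c, d, 7.0)
print(abs(abs(w - center) - radius))  # essentially 0
```

Looping over many choices of a, b, c, d and drawing all the resulting circles would begin to reproduce the Schmidt arrangement.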
Katherine Stange did her Ph.D. with Joseph H. Silverman, an expert on elliptic curves at Brown University. Now she is an assistant professor at the University of Colorado, Boulder. She works on arithmetic geometry, elliptic curves, algebraic and integer sequences, cryptography, arithmetic dynamics, Apollonian circle packings, and game theory.
This is the {7,3,3} honeycomb as drawn by Danny Calegari. The {7,3,3} honeycomb is built of regular heptagons in 3-dimensional hyperbolic space. It’s made of infinite sheets of regular heptagons in which 3 heptagons meet at each vertex. 3 such sheets meet at each edge of each heptagon, explaining the second ‘3’ in the symbol {7,3,3}.
The 3-dimensional regions bounded by these sheets are unbounded: they go off to infinity. They show up as holes here. In this image, hyperbolic space has been compressed down to an open ball using the so-called Poincaré ball model. For more details, go here.
Danny Calegari did his Ph.D. work with Andrew Casson and William Thurston on foliations of three-dimensional manifolds. Now he’s a professor at the University of Chicago, and he works on these and related topics, especially geometric group theory.
This picture, by Roice Nelson, is another view of the {7,3,3} honeycomb. It shows the ‘boundary’ of this honeycomb—that is, the set of points on the surface of the Poincaré ball that are limits of points in the {7,3,3} honeycomb.
Roice Nelson used stereographic projection to draw part of the surface of the Poincaré ball as a plane. The circles here are holes, not contained in the boundary of the {7,3,3} honeycomb. There are infinitely many holes, and the actual boundary, the region left over, is a fractal with area zero. The white region on the outside of the picture is yet another hole. For more details, and a different version of this picture, go here.
Roice Nelson is a software developer for a flight data analysis company. There’s a good chance the data recorded on the airplane from your last flight moved through one of his systems! He enjoys motorcycling and recreational mathematics, he has a blog with lots of articles about geometry, and he makes plastic models of interesting geometrical objects using a 3d printer.
100,000 years ago, some of my ancestors came out of Africa and arrived in the Middle East. 50,000 years ago, some of them reached Asia. But between those dates, about 70,000 years ago, two stars passed through the outer reaches of the Solar System, where icy comets float in dark space!
One was a tiny red dwarf called Scholz’s star. It’s only 90 times as heavy as Jupiter. Right now it’s 20 light years from us, so faint that it was discovered only in 2013, by Ralf-Dieter Scholz—an expert on nearby stars, high-velocity stars, and dwarf stars.
The other was a brown dwarf: a star so small that it doesn’t produce energy by fusion. This one is only 65 times the mass of Jupiter, and it orbits its companion at a distance of 80 AU.
(An AU, or astronomical unit, is the distance between the Earth and the Sun.)
A team of scientists has just computed that while some of my ancestors were making their way to Asia, these stars passed about 0.8 light years from our Sun. That’s not very close. But it’s close enough to penetrate the large cloud of comets surrounding the Sun: the Oort cloud.
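To get a feeling for the scale, here is a quick back-of-the-envelope conversion (my own arithmetic, not a figure from the paper):

```python
# How far into the Oort cloud does a 0.8 light-year flyby reach?
LY_IN_AU = 63_241          # astronomical units per light year

flyby_au = 0.8 * LY_IN_AU  # closest approach, in AU
print(round(flyby_au))     # about 50,000 AU
```

The Oort cloud is thought to extend out to something like 100,000 AU, so the stars did indeed pass well inside it.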
They say this event didn’t affect the comets very much. But if it shook some comets loose from the Oort cloud, they would take about 2 million years to get here! So, they won’t arrive for a long time.
At its closest approach, Scholz’s star would have had an apparent magnitude of about 11.4. This is a bit too faint to see, even with binoculars. So, don’t look for it in myths and legends!
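Here is roughly where that magnitude comes from, via the standard distance modulus. The absolute magnitude I plug in (about 19.4, appropriate for a very faint red dwarf) is my own assumed value, not a number taken from the paper:

```python
import math

def apparent_magnitude(abs_mag, distance_pc):
    """Distance modulus: m = M + 5 log10(d / 10 pc)."""
    return abs_mag + 5 * math.log10(distance_pc / 10)

d_pc = 0.8 / 3.2616             # 0.8 light years in parsecs
m = apparent_magnitude(19.4, d_pc)
print(round(m, 1))              # close to the quoted magnitude of about 11.4
```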
As usual, the paper that made this discovery is expensive in journals but free on the arXiv:
• Eric E. Mamajek, Scott A. Barenfeld, Valentin D. Ivanov, Alexei Y. Kniazev, Petri Vaisanen, Yuri Beletsky, Henri M. J. Boffin, The closest known flyby of a star to the Solar System.
It must be tough being a scientist named ‘Boffin’, especially in England! Here’s a nice account of how the discovery was made:
• University of Rochester, A close call of 0.8 light years, 16 February 2015.
The brown dwarf companion to Scholz’s star is a ‘class T’ star. What does that mean? It’s pretty interesting. Let’s look at an example just 7 light years from Earth!
Thanks to some great new telescopes, astronomers have been learning about weather on brown dwarfs! It may look like this artist’s picture. (It may not.)
Luhman 16 is a pair of brown dwarfs orbiting each other just 7 light years from us. The smaller one, Luhman 16B, is half covered by huge clouds. These clouds are hot—1200 °C—so they’re probably made of sand, iron or salts. Some of them have been seen to disappear! Why? Maybe ‘rain’ is carrying this stuff further down into the star, where it melts.
So, we’re learning more about something cool: the ‘L/T transition’.
Brown dwarfs can’t fuse ordinary hydrogen, but a lot of them fuse the isotope of hydrogen called deuterium that people use in H-bombs—at least until this runs out. The atmosphere of a hot brown dwarf is similar to that of a sunspot: it contains molecular hydrogen, carbon monoxide and water vapor. This is called a class M brown dwarf.
But as they run out of fuel, they cool down. The cooler class L brown dwarfs have clouds! But the even cooler class T brown dwarfs do not. Why not?
This is the mystery we may be starting to understand: the clouds may rain down, with material moving deeper into the star! Luhman 16B is right near the L/T transition, and we seem to be watching how the clouds can disappear as a brown dwarf cools. (Its larger companion, Luhman 16A, is firmly in class L.)
Finally, as brown dwarfs cool below 300 °C, astronomers expect that ice clouds start to form: first water ice, and eventually ammonia ice. These are the class Y brown dwarfs. Wouldn’t that be neat to see? A star with icy clouds!
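The sequence of classes sketched above amounts to a rough lookup by effective temperature. The boundaries in this sketch are approximate values I am assuming for illustration; the real classification is based on spectral features, not on temperature alone:

```python
# Very rough brown dwarf classification by effective temperature (kelvin).
# The boundary values are illustrative assumptions, not official cutoffs.
def dwarf_class(t_kelvin):
    if t_kelvin >= 2300:
        return 'M'   # hot: molecular hydrogen, CO and water vapor, like a sunspot
    elif t_kelvin >= 1300:
        return 'L'   # cooler: clouds of sand, iron or salts
    elif t_kelvin >= 500:
        return 'T'   # cooler still: the clouds have rained out
    else:
        return 'Y'   # coolest: water and eventually ammonia ice clouds expected

print([dwarf_class(t) for t in (3000, 1500, 900, 400)])  # ['M', 'L', 'T', 'Y']
```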
Could there be life on some of these stars?
Caroline Morley regularly blogs about astronomy. If you want to know more about weather on Luhman 16B, try this:
• Caroline Morley, Swirling, patchy clouds on a teenage brown dwarf, 28 February 2012.
She doesn’t like how people call brown dwarfs “failed stars”. I agree! It’s like calling a horse a “failed giraffe”.
For more, try:
• Brown dwarfs, Scholarpedia.
This summer there will be a conference on higher-dimensional algebra and rewrite rules in Warsaw. They want people to submit papers! I’ll give a talk about presentations of symmetric monoidal categories that arise in electrical engineering and control theory. This is part of the network theory program, which we talk about so often here on Azimuth.
There should also be interesting talks about combinatorial algebra, homotopical aspects of rewriting theory, and more:
• Higher-Dimensional Rewriting and Applications, 28-29 June 2015, Warsaw, Poland. Co-located with the RDP, RTA and TLCA conferences. Organized by Yves Guiraud, Philippe Malbos and Samuel Mimram.
Over recent years, rewriting methods have been generalized from strings and terms to richer algebraic structures such as operads, monoidal categories, and more generally higher dimensional categories. These extensions of rewriting fit in the general scope of higher-dimensional rewriting theory, which has emerged as a unifying algebraic framework. This approach allows one to perform homotopical and homological analysis of rewriting systems (Squier theory). It also provides new computational methods in combinatorial algebra (Artin-Tits monoids, Coxeter and Garside structures), in homotopical and homological algebra (construction of cofibrant replacements, Koszulness property). The workshop is open to all topics concerning higher-dimensional generalizations and applications of rewriting theory, including
• higher-dimensional rewriting: polygraphs / computads, higher-dimensional generalizations of string/term/graph rewriting systems, etc.
• homotopical invariants of rewriting systems: homotopical and homological finiteness properties, Squier theory, algebraic Morse theory, coherence results in algebra and higher-dimensional category theory, etc.
• linear rewriting: presentations and resolutions of algebras and operads, Gröbner bases and generalizations, homotopy and homology of algebras and operads, Koszul duality theory, etc.
• applications of higher-dimensional and linear rewriting and their interactions with other fields: calculi for quantum computations, algebraic lambda-calculi, proof nets, topological models for concurrency, homotopy type theory, combinatorial group theory, etc.
• implementations: the workshop will also be interested in implementation issues in higher-dimensional rewriting and will allow demonstrations of prototypes of existing and new tools in higher-dimensional rewriting.
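To see what ‘rewriting’ means on the very lowest rung of this ladder, here is a toy example of ordinary string rewriting (my illustration, unrelated to the workshop’s software). The one-rule system ba → ab is terminating and confluent, so every word over {a, b} has a unique normal form: all a’s before all b’s.

```python
def normal_form(word, rules=(("ba", "ab"),)):
    """Apply rewrite rules until none matches; return the normal form.

    The default system {ba -> ab} is terminating and confluent, so the
    normal form of a word over {a, b} is unique: its letters sorted.
    """
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs in word:
                word = word.replace(lhs, rhs, 1)  # one rewrite step
                changed = True
    return word

print(normal_form("babba"))  # aabbb
```

Higher-dimensional rewriting generalizes exactly this picture: instead of rules acting on strings, one has rules acting on diagrams in n-categories.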
Important dates:
• Submission: April 15, 2015
• Notification: May 6, 2015
• Final version: May 20, 2015
• Conference: 28-29 June, 2015
Submissions should consist of an extended abstract, approximately 5 pages long, in standard article format, in PDF. The page for uploading those is here. The accepted extended abstracts will be made available electronically before the workshop.
Program committee:
• Vladimir Dotsenko (Trinity College, Dublin)
• Yves Guiraud (INRIA / Université Paris 7)
• Jean-Pierre Jouannaud (École Polytechnique)
• Philippe Malbos (Université Claude Bernard Lyon 1)
• Paul-André Melliès (Université Paris 7)
• Samuel Mimram (École Polytechnique)
• Tim Porter (University of Wales, Bangor)
• Femke van Raamsdonk (VU University, Amsterdam)
Can red dwarf stars have Earth-like planets with life?
This is an important question, at least in the long run, because 80% of the stars in the Milky Way are red dwarfs, even though none are visible to the naked eye. 20 of the 30 nearest stars are red dwarfs! It would be nice to know if they can have planets with life.
Also, red dwarf stars live a long time! They’re small—and the smaller a star is, the longer it lives. Calculations show that a red dwarf one-tenth the mass of our Sun should last for 10 trillion years!
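Here is roughly where such numbers come from. A crude scaling argument says lifetime ∝ fuel/luminosity ∝ M/M^3.5 = M^(−2.5); the exponent 3.5 is a rough textbook mass–luminosity value I am assuming here, not a precise figure:

```python
# Back-of-the-envelope red dwarf lifetime from the crude scaling
# t ~ t_sun * (M / M_sun)**-2.5. The exponent is an assumed rough value.
T_SUN_YEARS = 1.0e10   # approximate main-sequence lifetime of the Sun

def lifetime_years(mass_solar):
    return T_SUN_YEARS * mass_solar ** -2.5

print(f"{lifetime_years(0.1):.1e}")  # a few trillion years
```

Even this naive estimate gives trillions of years; detailed models give still longer lifetimes, around 10 trillion years, partly because red dwarfs are fully convective and can burn nearly all of their hydrogen rather than just the core.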
So if life is possible on planets orbiting red dwarf stars—or if life could get there—we could someday have very, very old civilizations. That idea excites me. Imagine what a galactic civilization spanning the 80 billion red dwarfs in our galaxy could do in 10 trillion years!
(No: you can’t imagine it. You don’t have time to think of all the amazing things they could do.)
Let’s start close to home. Proxima Centauri, the nearest star to the Sun, is a red dwarf. If we ever explore interstellar space, we may stop by this star. So, it’s worth knowing a bit about it.
We don’t know if it has planets. But it could be part of a triple star system! The closest neighboring stars, Alpha Centauri A and B, orbit each other every 80 years. One is a bit bigger than the Sun, the other a bit smaller. They orbit in a fairly eccentric ellipse. At their closest, their distance is like the distance from Saturn to the Sun. At their farthest, it’s more like the distance from Pluto to the Sun.
Proxima Centauri is fairly far from both: a quarter of a light year away. That’s about 350 times the distance from Pluto to the Sun! We’re not even sure Proxima Centauri is gravitationally bound to the other stars. If it is, its orbital period could easily exceed 500,000 years.
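We can check that claim with Kepler’s third law, which in solar units reads P² = a³/M. The separation and total mass below are rough assumed values, not precise measurements:

```python
import math

# Kepler's third law in solar units: P[years]^2 = a[AU]^3 / M[solar masses].
def orbital_period_years(a_au, total_mass_solar):
    return math.sqrt(a_au ** 3 / total_mass_solar)

a_au = 0.25 * 63_241                      # a quarter light year, in AU
period = orbital_period_years(a_au, 2.0)  # Alpha Cen A + B: roughly 2 solar masses
print(f"{period:.2e}")                    # on the order of a million years
```

So even a crude estimate puts the orbital period well over 500,000 years.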
If Proxima Centauri had an Earth-like planet, there’s a bit of a problem: it’s a flare star.
You see, convection stirs up this star’s whole interior, unlike the Sun. Convection of charged plasma makes strong magnetic fields. Magnetic fields get tied in knots, and the energy gets released through enormous flares! They can become as large as the star itself, and get so hot that they radiate lots of X-rays.
This could be bad for life on nearby planets… especially since an Earth-like planet would have to be very close. You see, Proxima Centauri is very faint: just 0.17% the brightness of our Sun!
In fact many red dwarfs are flare stars, for the same reasons. Proxima Centauri is actually fairly tame as red dwarfs go, because it’s 4.9 billion years old. Younger ones are more lively, with bigger flares.
Proxima Centauri is just 4.24 light-years away. If we explore interstellar space, it may be a good place to visit. It’s actually getting closer: it’ll come within about 3 light-years of us in roughly 27,000 years, and then drift by. We should take advantage of this and go visit it soon, like in a few centuries!
Gliese 667C is a red dwarf just 1.4% as bright as our Sun. Unremarkable: such stars are a dime a dozen. But it’s famous, because we know it has at least two planets, one of which is quite Earth-like!
This planet, called Gliese 667 Cc, is one of the most Earth-like ones we know today. But it’s weirdly different from our home in many ways. Its mass is 3.8 times that of Earth. It should be a bit warmer than Earth—but dimly lit as seen by our eyes, since most of the light it gets is in the infrared.
Being close to its dim red dwarf star, its year is just 28 Earth days long. But there’s something even cooler about this planet. You can see it in the NASA artist’s depiction above. The red dwarf Gliese 667C is part of a triple star system!
The largest star in this system, Gliese 667 A, is three-quarters the mass of our Sun, but only 12% as bright. It’s an orange dwarf, intermediate between a red dwarf and our Sun, which is considered a yellow dwarf.
The second largest, Gliese 667 B, is also an orange dwarf, only 5% as bright as our sun.
These two orbit each other every 42 years. The red dwarf Gliese 667 C is considerably farther away, orbiting this pair.
What could the planet Gliese 667 Cc be like?
Since a planet needs to be close to a red dwarf to be warm enough for liquid water, such planets are likely to be tidally locked, with one side facing their sun all the time.
For a long time, this made scientists believe the day side of such a planet would be hot and dry, with all the water locked in ice on the night side, as shown above. People call this a water-trapped world. Perhaps not so good for life!
But a new paper argues that other kinds of worlds are likely too!
In a thin ice waterworld, an ocean covers most of the planet. It’s covered with ice on the night side, maybe 10 meters thick. The day side has open ocean. Ice melts near the edge of the ice, pours into the ocean on the day side… while on the night side, water freezes onto the bottom of the ice layer.
In an ice sheet-ocean world, there’s a big ocean on the day side and a big continent on the night side. As in the water-trapped world, a lot of ice forms on the night side, up to a kilometer thick. But if there’s enough geothermal heat, and enough water, not all the water gets frozen on the night side: enough melts to form an ocean on the day side.
Needless to say, these new scenarios are exciting because they could be more conducive to life!
Read more here:
• Jun Yang, Yonggang Liu, Yongyun Hu and Dorian S. Abbot, Water trapping on tidally locked terrestrial planets requires special conditions.
Abstract: Surface liquid water is essential for standard planetary habitability. Calculations of atmospheric circulation on tidally locked planets around M stars suggest that this peculiar orbital configuration lends itself to the trapping of large amounts of water in kilometers-thick ice on the night side, potentially removing all liquid water from the day side where photosynthesis is possible. We study this problem using a global climate model including coupled atmosphere, ocean, land, and sea-ice components as well as a continental ice sheet model driven by the climate model output.
For a waterworld we find that surface winds transport sea ice toward the day side and the ocean carries heat toward the night side. As a result, night-side sea ice remains about 10 meters thick and night-side water trapping is insignificant. If a planet has large continents on its night side, they can grow ice sheets about a kilometer thick if the geothermal heat flux is similar to Earth’s or smaller. Planets with a water complement similar to Earth’s would therefore experience a large decrease in sea level when plate tectonics drives their continents onto the night side, but would not experience complete day-side desiccation. Only planets with a geothermal heat flux lower than Earth’s, much of their surface covered by continents, and a surface water reservoir about 10% of Earth’s would be susceptible to complete water trapping.
From a technical viewpoint, what’s fun about this new paper is that it uses detailed climate models that have been radically hacked to deal with a red dwarf star. Paraphrasing:
We perform climate simulations with the Community Climate System Model version 3.0 (CCSM3) which was originally developed by the National Center for Atmospheric Research to study the climate of Earth. The model contains four coupled components: atmosphere, ocean, sea ice, and land. The atmosphere component calculates atmospheric circulation and parameterizes sub-grid processes such as convection, precipitation, clouds, and boundary-layer mixing. The ocean component computes ocean circulation using the hydrostatic and Boussinesq approximations. The sea-ice component predicts ice fraction, ice thickness, ice velocity, and energy exchanges between the ice and the atmosphere/ocean. The land component calculates surface temperature, soil water content, and evaporation.
We modify CCSM3 to simulate the climate of habitable planets around M stars following Rosenbloom et al., Liu et al., and Hu & Yang. The stellar spectrum we use is a blackbody with an effective temperature of 3400 K. We employ planetary parameters typical of a super-Earth: a radius of 1.5 R_{⊕}, gravity of 1.38 g_{⊕}, and an orbital period of 37 Earth-days. The orbital period of habitable zone planets around M stars is roughly 10–100 days. We set the insolation to 866 watts per square meter and both the obliquity and eccentricity to zero. The atmospheric surface pressure is 1.0 bar, including N_{2}, H_{2}O, and 355 parts per million CO_{2}.
And so on. Way cool! They consider a variety of different kinds of continents and oceans… including one where they’re just like those here on Earth—just because the data for that is easy to get!
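As a sanity check on the numbers quoted above, one can compute the simple radiative equilibrium temperature from the stated insolation of 866 W/m². The albedo here is my assumed Earth-like value; the paper’s surface temperatures of course come from the full climate model, not from this one-line formula:

```python
# Radiative equilibrium temperature: T = (S (1 - A) / (4 sigma))^(1/4).
# The Bond albedo A = 0.3 (Earth-like) is an assumption for illustration.
SIGMA = 5.670e-8   # Stefan-Boltzmann constant, W m^-2 K^-4

def equilibrium_temp(insolation, albedo):
    return (insolation * (1 - albedo) / (4 * SIGMA)) ** 0.25

print(round(equilibrium_temp(866.0, 0.3)))  # roughly 227 K
```

That is colder than Earth’s equilibrium temperature of about 255 K, which is why greenhouse warming and the details of circulation matter so much in these simulations.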
Here’s a question I don’t know the answer to. To what extent can models like Community Climate System Model version 3.0 be tweaked to handle different planets? And what are the main things we should worry about: ways Earth-like planets can be different enough to seriously throw off the models?
We live in exciting times, where just as we’re making huge progress trying to understand the Earth’s climate in time to make wise decisions, we’re discovering hundreds of new planets with their own very different climates.
This blog article is about the temperature data used in the reports of the Intergovernmental Panel on Climate Change (IPCC). I present the results of an investigation into the completeness of global land surface temperature records. There are noticeable gaps in the data records, but I leave discussion about the implications of these gaps to the readers.
The data used in the newest IPCC report, the Fifth Assessment Report (AR5), seems at the time of writing not yet to be available at the IPCC data distribution centre.
The temperature databases used for the previous report, AR4, are listed here on the website of the IPCC. These databases are:
• CRUTEM3,
• NCDC (probably, at a guess, using the data set GHCNM v3),
• GISTEMP, and
• the collection of Lugina et al.
The temperature collection CRUTEM3 was put together by the Climatic Research Unit (CRU) at the University of East Anglia. According to the CRU temperature page the CRUTEM3 data and in particular the CRUTEM3 land air temperature anomalies on a 5° × 5° grid-box basis has now been superseded by the so-called CRUTEM4 collection.
Since the CRUTEM collection appeared to be an important data source for the IPCC, I started by investigating the land air temperature data collection CRUTEM4. In what follows, only the availability of so-called land air temperature measurements will be investigated. (The collections often also contain sea surface temperature (SST) measurements.)
Usually only ‘temperature grid data’ or other averaged data is used for the climate assessments. Here ‘grid’ means that data is averaged over regions that cover the earth in a grid. However, the data is originally generated by temperature measuring stations around the world. So, I was interested in this original data and its quality. For the CRUTEM collection the latest station data is called the CRUTEM4 station data collection.
I downloaded the station data file, which is a simple text file, from the bottom of the CRUTEM4 station data page. At first glance I noticed that there are big gaps in the file in some regions of the world. The file is huge, though: it contains monthly measurements starting in January 1701 and ending in 2011, for 4634 stations altogether. Quickly finding a gap in such a huge file was a sufficiently disconcerting experience that it persuaded my husband Tim Hoffmann to help me investigate this station data in a more accessible way, via a visualization.
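To give a flavor of what such an investigation involves, here is a toy sketch of counting valid versus invalid readings. I am assuming a simplified format, rows of a year followed by 12 monthly values with −999 marking invalid data; the real CRUTEM4 file also interleaves per-station header lines, so this is only an illustration, not a parser for the actual file:

```python
# Toy sketch: count valid vs. invalid (-999) monthly readings in a
# simplified station record. The format below is an illustrative assumption.
SAMPLE = """\
1980  5.1  5.9  8.2 -999 14.0 17.3 19.1 18.8 15.2 11.0  7.4  5.5
1981 -999 -999  7.9 11.2 13.8 -999 19.4 18.5 -999 10.6  7.1  5.0
"""

def count_readings(text, missing=-999.0):
    valid = invalid = 0
    for line in text.strip().splitlines():
        fields = line.split()
        for value in fields[1:]:      # skip the year column
            if float(value) == missing:
                invalid += 1
            else:
                valid += 1
    return valid, invalid

print(count_readings(SAMPLE))  # (19, 5)
```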
The visualization takes a long time to load, and due to some unfortunate software configuration issues (not on our side) it sometimes doesn’t work at all. Please open it now in a separate tab while reading this article:
• Nadja Kutz and Tim Hoffmann, Temperature data from stations around the globe, collected by CRUTEM 4.
For those who are too lazy to explore the data themselves, or in case the visualization is not working, here are some screenshots from the visualization which documents the missing data in the CRUTEM4 dataset.
The images should speak for themselves. However, an additional explanation is provided after the images. One should in particular mention that it looks as if the deterioration of the CRUTEM4 data set has been greater in the years 2000-2009 than in the years 1980-2000.
Now you could say: okay, we know that there are budget cuts in the UK, and so probably the University of East Anglia was subject to those, but what about all these other collections in the world? This will be addressed after the images.
[Screenshots from the visualization: North America, Africa, Asia, Eurasia/Northern Africa, and the Arctic, for January of 1980, 2000 and 2009.]
These screenshots comprise various regions of the world for the month of January for the years 1980, 2000 and 2009. Each station is represented by a small rectangle around its coordinates. The color of a rectangle indicates the monthly temperature value for that station: blue is the coldest, red is the hottest. Black rectangles are what CRU calls ‘missing data’, denoted with -999 in the file. I prefer instead to call it ‘invalid’ data, in order to distinguish it from the missing data due to stations that have been closed down. In the visualization, closed down stations are encoded by a transparent rectangle and their markers are also present.
We couldn’t find the reasons for this invalid data. At the end of the post John Baez has provided some more literature on this question. It is worth noting that satellites can replace surface measurements only to a certain degree, as was highlighted by Stefan Rahmstorf in a blog post on RealClimate:
the satellites cannot measure the near-surface temperatures but only those overhead at a certain altitude range in the troposphere. And secondly, there are a few question marks about the long-term stability of these measurements (temporal drift).
Apart from the already mentioned collections, which were used in the IPCC’s AR4 report, there are actually some more institutional collections, and I also found some private weather collections. Among those private collections, however, I haven’t found any that goes back as far in time as CRUTEM4. It could be, though, that some of those private collections are more complete in terms of actual data than the collections that reach further back in time.
After discussing our visualization on the Azimuth Forum, it turned out that Nick Stokes, who runs the blog MOYHU in Australia, had the same idea as me—but already back in 2011. In that year he visualized station data. For his visualization he used Google Earth, and he drew on several different temperature collections.
If you have Google Earth installed then you can see his visualizations here:
• Nick Stokes, click here.
The link is from the documentation page of Nick Stokes’ website.
As far as we can tell, the major global collections of temperature data that go back to the 18th or 19th or at least early 20th century seem to be the following. First, there are the collections already mentioned, which are also used in the AR4 report:
• The CRUTEM collection from the University of East Anglia (UK).
• the GISTEMP collection from the Goddard Institute for Space Studies (GISS) at NASA (US).
• the collection of Lugina et al, which is a cooperative project involving NCDC/NOAA (US) (see also below), the University of Maryland (US), St. Petersburg State University (Russia) and the State Hydrological Institute, St. Petersburg, (Russia).
• the GHCN collection from NOAA.
Then there are these:
• the Berkeley Earth collection, called BEST
• The GSOD (Global Summary Of the Day) and Global Historical Climatology Network (GHCN) collections. Both these are run by the National Climatic Data Center (NCDC) at National Oceanic and Atmospheric Administration (NOAA) (US). It is not clear to me to what extent these two databases overlap with those of Lugina et al, which were made in cooperation with NCDC/NOAA. It is also not clear to me whether the GHCN collection had been used for the AR4 report (it seems so). There is currently also a very partially working visualization of the GSOD data here. The sparse data in specific regions (see images above) is also apparent in this visualization.
• There is a comparatively new initiative called International Surface Temperatures Initiative (ISTI) which gathers collections in a databank and seeks to provide temperature data “from hourly to century timescales”. As written on their blog, this data seems not to be quality controlled:
The ISTI dataset is not quality controlled, so, after re-reading section 3.3 of Lawrimore et al 2011, I implemented an extremely simple quality control scheme, MADQC.
As far as I understood, in the visualization by Nick Stokes—which you just opened—the collections BEST (before 1850-2010), GSOD (1921-2010), GHCN v2 from NOAA (before 1850-1990) and CRUTEM3 (before 1850-2000) are represented.
CRUTEM3 is also visualized in another way by Clive Best. In Clive Best’s visualization, however, it seems that apart from the station name one has no further access to other data, such as station temperatures. Moreover, it is not possible to set a recent time range, which is important for checking how much the dataset has changed in recent times.
Unfortunately this limited ability to set a time range also holds for two visualizations by Nick Stokes, here and here. In his first visualization, which is more exhaustive than the second, the following datasets are shown: GHCN v3 and an adjusted version of it (GADJ), a preliminary dataset from ISTI, BEST and CRUTEM 4. So his first visualization seems quite exhaustive also with respect to newer data. Unfortunately, as mentioned, setting the time range didn’t work properly (at least when I tested it). The same holds for his second visualization, of GHCN v3 data. So, I was only able to trace the deterioration of recent data manually (for example, by clicking on individual stations).
Tim and I visualized CRUTEM4, that is, the updated version of CRUTEM3.
Newer datasets after 2011/2012, for example from the aforementioned ISTI or from the private collections, are not visualized in the two visualizations you just opened.
Moreover, in the visualizations mentioned here, there is no coverage of the GISS collection, which however now uses NOAA’s GHCN v3 collections. The historical data of GISS could, however, be different from the other collections. The visualizations may also not cover the Lugina et al. collection, which was mentioned above in the context of the IPCC report. Lugina et al. could however be similar to GSOD (and GHCN) due to cooperation. Moreover, GHCN v3 could be substantially more exhaustive than CRUTEM or GHCN v2 (as shown in Nick Stokes’ visualization). However, this last collection was—like CRUTEM4—released in the spring of 2011.
GHCN v3 is also represented in Nick Stokes’ visualizations (here and here). Upon manually investigating it, it didn’t seem to contain much crucial additional data not found in CRUTEM4. Since this manual exploration was not exhaustive, I may be wrong—but I don’t think so.
Hence, to our knowledge, in the two visualizations you just opened, quite a lot of the available data is visualized—and as it seems “almost all” (?) of the far-back-reaching original quality controlled global surface temperature data collections as of 2011 or 2012. If you know of other similar collections please let us know.
As mentioned above, private collections and in particular the ISTI collection may contain much more data. At the time of writing we don’t know to what extent those newer collections will be taken into account for the new IPCC reports, and in particular for the AR5 report. Moreover, it is not so clear how quality control will be ensured for those newer collections.
In conclusion, the previous IPCC reports seem to have been informed by the collections described here. Thus the coverage problems you see here need to be taken into account in discussions about the scientific basis of previous climate descriptions.
Hopefully the visualizations from Nick Stokes and from Tim and me are ready for exploration! You can start to explore them yourself, and in particular see that the ‘deterioration of data’ is—just as in our CRUTEM4 visualization—also visible in Nick’s collections.
Note: I would like to thank people at the Azimuth Forum for pointing out references, and in particular Nick Stokes and Nathan Urban.
Supplement by John Baez
There have always been fewer temperature recording stations in Arctic regions than other regions. The following paper initiated a controversy over how this fact affects our picture of the Earth’s climate:
Here is some discussion:
• Kevin Cowtan, Robert Way, and Dana Nuccitelli, Global warming since 1997 more than twice as fast as previously estimated, new study shows, Skeptical Science, 13 November 2013.
• Stefan Rahmstorf, Global warming since 1997 underestimated by half, RealClimate, 13 November 2013 in which it is highlighted that satellites can replace surface measurements only to a certain degree.
• Anthony Watts’ protest about Cowtan, Way and the Arctic, HotWhopper, 15 November 2013.
• Victor Venema, Temperature trend over last 15 years is twice as large as previously thought, Variable Variability, 13 November 2013.
However, these posts seem to say little about the increasing amount of ‘missing data’.
Here are some notes from the back offices of the Azimuth project. After a long and productive stay as the Azimuth tech guy, Andrew Stacey is moving along and passing the baton to me. As part of this change, we’ve relocated the servers to a new Azimuth hosted account, and updated the forum software.
The forum is now at a new location:
This is where we collaborate on writing wiki and blog articles, on research and education projects, and on software development and systems issues. It’s also a fun place to chat with other professionals in a wide range of science-related fields.
So come on down to the forum! If you want to post, just apply for an account there. Acceptance criteria are minimal. A sincere desire to help goes a long way.
Important: please use your full name, in “camel case” capitalization, as your user ID: e.g. DavidTanzer. I will then put in the spaces. (We want the spaces, but the registration form blocks them.) The point is that we want to present ourselves as we really are.
This problem is famously difficult. So I’m happy to report some progress:
• John Baez, Karine Bagdasaryan and Philip Gibbs, Lebesgue’s universal covering problem.
But we’d like you to check our work! It will help if you’re good at programming. As far as the math goes, it’s just high-school geometry… carried to a fanatical level of intensity.
Here’s the story:
A subset of the plane has diameter 1 if the distance between any two points in this set is ≤ 1. You know what a circle of diameter 1 looks like. But an equilateral triangle with edges of length 1 also has diameter 1:
After all, two points in this triangle are farthest apart when they’re at two corners.
Note that this triangle doesn’t fit inside a circle of diameter 1:
There are lots of sets of diameter 1, so it’s interesting to look for a set that can contain them all.
In 1914, the famous mathematician Henri Lebesgue sent a letter to a pal named Pál. And in this letter he challenged Pál to find the convex set with smallest possible area such that every set of diameter 1 fits inside.
More precisely, he defined a universal covering to be a convex subset of the plane that can cover a translated, reflected and/or rotated version of every subset of the plane with diameter 1. And his challenge was to find the universal covering with the least area.
Pál worked on this problem, and 6 years later he published a paper on it. He found a very nice universal covering: a regular hexagon in which one can inscribe a circle of diameter 1. This has area √3/2 ≈ 0.86603.
But he also found a universal covering with less area, by removing two triangles from this hexagon—for example, the triangles C_{1}C_{2}C_{3} and E_{1}E_{2}E_{3} here:
Our paper explains why you can remove these triangles, assuming the hexagon was a universal covering in the first place. The resulting universal covering has area 2 − 2/√3 ≈ 0.84530.
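Both of Pál’s areas follow from high-school geometry, and it’s easy to check them numerically. Here is a quick sketch (plain Python, no dependencies; the inradius formula for a regular hexagon is standard):

```python
from math import sqrt

# Pál's hexagon: a regular hexagon in which a circle of diameter 1 can be
# inscribed, so its inradius is r = 1/2.  A regular hexagon with inradius r
# has area 2*sqrt(3)*r**2.
r = 1 / 2
hexagon_area = 2 * sqrt(3) * r**2      # = sqrt(3)/2 ≈ 0.866025

# Pál's reduced covering: removing the two corner triangles leaves
# area 2 - 2/sqrt(3) ≈ 0.845299.
reduced_area = 2 - 2 / sqrt(3)

print(hexagon_area, reduced_area)
```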
In 1936, Sprague went on to prove that more area could be removed from another corner of Pál’s original hexagon, giving a universal covering of area 0.844137708436.
In 1992, Hansen took these reductions even further by removing two more pieces from Pál’s hexagon. Each piece is a thin sliver bounded by two straight lines and an arc. The first piece is tiny. The second is downright microscopic!
Hansen claimed the areas of these regions were 4 · 10^{-11} and 6 · 10^{-18}. However, our paper redoes his calculation and shows that the second number is seriously wrong. The actual areas are 3.7507 · 10^{-11} and 8.4460 · 10^{-21}.
Philip Gibbs has created a Java applet illustrating Hansen’s universal cover. I urge you to take a look! You can zoom in and see the regions he removed:
• Philip Gibbs, Lebesgue’s universal covering problem.
I find that my laptop, a Windows machine, makes it hard to view Java applets because they’re a security risk. I promise this one is safe! To be able to view it, I had to go to the “Search programs and files” window, find the “Configure Java” program, go to “Security”, and add
to the “Exception Site List”. It’s easy once you know what to do.
And it’s worth it, because only the ability to zoom lets you get a sense of the puny slivers that Hansen removed! One is the region XE_{2}T here, and the other is T’C_{3}V:
You can use this picture to help you find these regions in Philip Gibbs’ applet. But this picture is not to scale! In fact the smaller region, T’C_{3}V, has length 3.7 · 10^{-7} and maximum width 1.4 · 10^{-14}, tapering down to a very sharp point.
That’s about a few atoms wide if you draw the whole hexagon on paper! And it’s about 30 million times longer than it is wide. This is the sort of thing you can only draw with the help of a computer.
Anyway, Hansen’s best universal covering had an area of
This tiny improvement over Sprague’s work led Klee and Wagon to write:
it does seem safe to guess that progress on [this problem], which has been painfully slow in the past, may be even more painfully slow in the future.
However, our new universal covering removes about a million times more area than Hansen’s larger region: a whopping 2.233 · 10^{-5}. So, we get a universal covering with area 0.8441153.
The key is to slightly rotate the dodecagon shown in the above pictures, and then use the ideas of Pál and Sprague.
There’s a lot of room between our number and the best lower bound on this problem, due to Brass and Sharifi:
So, one way or another, we can expect a lot of progress now that computers are being brought to bear.
Read our paper for the details! If you want to check our work, we’ll be glad to answer lots of detailed questions. We want to rotate the dodecagon by an amount that minimizes the area of the universal covering we get, so we use a program to compute the area for many choices of rotation angle:
• Philip Gibbs, Java program.
The program is not very long—please study it or write your own, in your own favorite language! The output is here:
• Philip Gibbs, Java program output.
and as explained at the end of our paper, the best rotation angle is about 1.3°.
• Workshop on Mathematical Trends in Reaction Network Theory, 1-3 July 2015, Department of Mathematical Sciences, University of Copenhagen. Organized by Elisenda Feliu and Carsten Wiuf.
This workshop focuses on current and new trends in mathematical reaction network theory, which we consider broadly to be the theory describing the behaviour of systems of (bio)chemical reactions. In recent years, new interesting approaches using theory from dynamical systems, stochastics, algebra and beyond, have appeared. We aim to provide a forum for discussion of these new approaches and to bring together researchers from different communities.
The workshop starts in the morning of Wednesday, July 1st, and finishes at lunchtime on Friday, July 3rd. In the morning there will be invited talks, followed by contributed talks in the afternoon. There will be a reception and poster session Wednesday in the afternoon, and a conference dinner Thursday. For those participants staying Friday afternoon, a sightseeing event will be arranged.
The workshop is organized by the research group on Mathematics of Reaction Networks at the Department of Mathematical Sciences, University of Copenhagen. The event is sponsored by the Danish Research Council, the Department of Mathematical Sciences and the Dynamical Systems Interdisciplinary Network, which is part of the UCPH Excellence Programme for Interdisciplinary Research.
• Nicolette Meshkat (North Carolina State University, US)
• Alan D. Rendall (Johannes Gutenberg Universität Mainz, Germany)
• János Tóth (Budapest University of Technology and Economics, Hungary)
• Sebastian Walcher (RWTH Aachen, Germany)
• Gheorghe Craciun (University of Wisconsin, Madison, US)
• David Doty (California Institute of Technology, US)
• Manoj Gopalkrishnan (Tata Institute of Fundamental Research, India)
• Michal Komorowski (Institute of Fundamental Technological Research, Polish Academy of Sciences, Poland)
• John Baez (University of California, Riverside, US)
Abstract submission for posters and contributed talks: March 15, 2015.
Notification of acceptance: March 26, 2015.
Registration deadline: May 15, 2015.
Conference: July 1-3, 2015.
The organizers are Elisenda Feliu and Carsten Wiuf at the Department of Mathematical Sciences of the University of Copenhagen.
They’ve written some interesting papers on reaction networks, including some that discuss chemical reactions with more than one stationary state. This is a highly nonlinear regime that’s very important in biology:
• Elisenda Feliu and Carsten Wiuf, A computational method to preclude multistationarity in networks of interacting species, Bioinformatics 29 (2013), 2327-2334.
Motivation. Modeling and analysis of complex systems are important aspects of understanding systemic behavior. In the lack of detailed knowledge about a system, we often choose modeling equations out of convenience and search the (high-dimensional) parameter space randomly to learn about model properties. Qualitative modeling sidesteps the issue of choosing specific modeling equations and frees the inference from specific properties of the equations. We consider classes of ordinary differential equation (ODE) models arising from interactions of species/entities, such as (bio)chemical reaction networks or ecosystems. A class is defined by imposing mild assumptions on the interaction rates. In this framework, we investigate whether there can be multiple positive steady states in some ODE models in a given class.
Results. We have developed and implemented a method to decide whether any ODE model in a given class cannot have multiple steady states. The method runs efficiently on models of moderate size. We tested the method on a large set of models for gene silencing by sRNA interference and on two publicly available databases of biological models, KEGG and Biomodels. We recommend that this method is used as (i) a pre-screening step for selecting an appropriate model and (ii) for investigating the robustness of non-existence of multiple steady state for a given ODE model with respect to variation in interaction rates.
Availability and Implementation. Scripts and examples in Maple are available in the Supplementary Information.
• Elisenda Feliu, Injectivity, multiple zeros, and multistationarity in reaction networks, Proceedings of the Royal Society A.
Abstract. Polynomial dynamical systems are widely used to model and study real phenomena. In biochemistry, they are the preferred choice for modelling the concentration of chemical species in reaction networks with mass-action kinetics. These systems are typically parameterised by many (unknown) parameters. A goal is to understand how properties of the dynamical systems depend on the parameters. Qualitative properties relating to the behaviour of a dynamical system are locally inferred from the system at steady state. Here we focus on steady states that are the positive solutions to a parameterised system of generalised polynomial equations. In recent years, methods from computational algebra have been developed to understand these solutions, but our knowledge is limited: for example, we cannot efficiently decide how many positive solutions the system has as a function of the parameters. Even deciding whether there is one or more solutions is non-trivial. We present a new method, based on so-called injectivity, to preclude or assert that multiple positive solutions exist. The results apply to generalised polynomials and variables can be restricted to the linear, parameter-independent first integrals of the dynamical system. The method has been tested in a wide range of systems.
You can see more of their papers on their webpages.
Google’s Goal: Renewable Energy Cheaper than Coal
Creates renewable energy R&D group and supports breakthrough technologies
Mountain View, Calif. (November 27, 2007) – Google (NASDAQ: GOOG) today announced a new strategic initiative to develop electricity from renewable energy sources that will be cheaper than electricity produced from coal. The newly created initiative, known as RE<C, will focus initially on advanced solar thermal power, wind power technologies, enhanced geothermal systems and other potential breakthrough technologies. RE<C is hiring engineers and energy experts to lead its research and development work, which will begin with a significant effort on solar thermal technology, and will also investigate enhanced geothermal systems and other areas. In 2008, Google expects to spend tens of millions on research and development and related investments in renewable energy. As part of its capital planning process, the company also anticipates investing hundreds of millions of dollars in breakthrough renewable energy projects which generate positive returns.
But in 2011, Google shut down the program. I never heard why. Recently two engineers involved in the project have given a good explanation:
• Ross Koningstein and David Fork, What it would really take to reverse climate change, 18 November 2014.
Please read it!
But the short version is this. They couldn’t find a way to accomplish their goal: producing a gigawatt of renewable power more cheaply than a coal-fired plant — and in years, not decades.
And since then, they’ve been reflecting on their failure and they’ve realized something even more sobering. Even if they’d been able to realize their best-case scenario — a 55% carbon emissions cut by 2050 — it would not bring atmospheric CO_{2} back below 350 ppm during this century.
This is not surprising to me.
What would we need to accomplish this? They say two things. First, a cheap dispatchable, distributed power source:
Consider an average U.S. coal or natural gas plant that has been in service for decades; its cost of electricity generation is about 4 to 6 U.S. cents per kilowatt-hour. Now imagine what it would take for the utility company that owns that plant to decide to shutter it and build a replacement plant using a zero-carbon energy source. The owner would have to factor in the capital investment for construction and continued costs of operation and maintenance—and still make a profit while generating electricity for less than $0.04/kWh to $0.06/kWh.
That’s a tough target to meet. But that’s not the whole story. Although the electricity from a giant coal plant is physically indistinguishable from the electricity from a rooftop solar panel, the value of generated electricity varies. In the marketplace, utility companies pay different prices for electricity, depending on how easily it can be supplied to reliably meet local demand.
“Dispatchable” power, which can be ramped up and down quickly, fetches the highest market price. Distributed power, generated close to the electricity meter, can also be worth more, as it avoids the costs and losses associated with transmission and distribution. Residential customers in the contiguous United States pay from $0.09/kWh to $0.20/kWh, a significant portion of which pays for transmission and distribution costs. And here we see an opportunity for change. A distributed, dispatchable power source could prompt a switchover if it could undercut those end-user prices, selling electricity for less than $0.09/kWh to $0.20/kWh in local marketplaces. At such prices, the zero-carbon system would simply be the thrifty choice.
But “dispatchable”, they say, means “not solar”.
Second, a lot of carbon sequestration:
While this energy revolution is taking place, another field needs to progress as well. As Hansen has shown, if all power plants and industrial facilities switch over to zero-carbon energy sources right now, we’ll still be left with a ruinous amount of CO_{2} in the atmosphere. It would take centuries for atmospheric levels to return to normal, which means centuries of warming and instability. To bring levels down below the safety threshold, Hansen’s models show that we must not only cease emitting CO_{2} as soon as possible but also actively remove the gas from the air and store the carbon in a stable form. Hansen suggests reforestation as a carbon sink. We’re all for more trees, and we also exhort scientists and engineers to seek disruptive technologies in carbon storage.
How to achieve these two goals? They say government and energy businesses should spend 10% of employee time on “strange new ideas that have the potential to be truly disruptive”.
You can click on any of the pictures to see where it came from or get more information.
I’m very flattered to be invited to speak here. I was probably invited because of my abstract mathematical work on networks and category theory. But when I got the invitation, instead of talking about something I understood, I thought I’d learn about something a bit more practical and talk about that. That was a bad idea. But I’ll try to make the best of it.
I’ve been trying to learn climate science. There’s a subject called ‘complex networks’ where people do statistical analyses of large graphs like the worldwide web or Facebook and draw conclusions from it. People are trying to apply these ideas to climate science. So that’s what I’ll talk about. I’ll be reviewing a lot of other people’s work, but also describing some work by a project I’m involved in, the Azimuth Project.
The Azimuth Project is an all-volunteer project involving scientists and programmers, many outside academia, who are concerned about environmental issues and want to use their skills to help. This talk is based on the work of many people in the Azimuth Project, including Jan Galkowski, Graham Jones, Nadja Kutz, Daniel Mahler, Blake Pollard, Paul Pukite, Dara Shayda, David Tanzer, David Tweed, Steve Wenner and others. Needless to say, I’m to blame for all the mistakes.
Okay, let’s get started.
You’ve probably heard about the ‘global warming pause’. Is this a real thing? If so, is it due to ‘natural variability’, heat going into the deep oceans, some combination of both, a massive failure of our understanding of climate processes, or something else?
Here is a chart of global average air temperatures at sea level, put together by NASA’s Goddard Institute for Space Studies:
You can see a lot of fluctuations, including a big dip after 1940 and a tiny dip after 2000. That tiny dip is the so-called ‘global warming pause’. What causes these fluctuations? That’s a big, complicated question.
One cause of temperature fluctuations is a kind of cycle whose extremes are called El Niño and La Niña.
A lot of things happen during an El Niño. For example, in 1997 and 1998, a big El Niño, we saw all these events:
El Niño is part of an irregular cycle that happens every 3 to 7 years, called the El Niño Southern Oscillation or ENSO. Two strongly correlated signs of an El Niño are:
1) Increased sea surface temperatures in a patch of the Pacific called the Niño 3.4 region. The temperature anomaly in this region—how much warmer it is than usual for that time of year—is called the Niño 3.4 index.
2) A decrease in air pressures in the western side of the Pacific compared to those further east. This is measured by the Southern Oscillation Index or SOI.
You can see the correlation here:
El Niños are important because they can cause billions of dollars of economic damage. They also seem to bring heat stored in the deeper waters of the Pacific into the atmosphere. So, one reason for the ‘global warming pause’ may be that we haven’t had a strong El Niño since 1998. The global warming pause might end with the next El Niño. For a while it seemed we were due for a big one this fall, but that hasn’t happened.
The ENSO cycle is just one of many cycles involving teleconnections: strong correlations between weather at distant locations, typically thousands of kilometers. People have systematically looked for these teleconnections using principal component analysis of climate data, and also other techniques.
The ENSO cycle shows up automatically when you do this kind of study. It stands out as the biggest source of climate variability on time scales greater than a year and less than a decade. Some others include:
• The Pacific-North America Oscillation.
• The Pacific Decadal Oscillation.
• The North Atlantic Oscillation.
• The Arctic Oscillation.
For example, the Pacific Decadal Oscillation is a longer-period relative of the ENSO, centered in the north Pacific:
Recently people have begun to study teleconnections using ideas from ‘complex network theory’.
What’s that? In complex network theory, people often start with a weighted graph: that is, a set of nodes and for any pair of nodes a weight which can be any nonnegative real number.
Why is this called a weighted graph? It’s really just a matrix of nonnegative real numbers!
The reason is that we can turn any weighted graph into a graph by drawing an edge from node i to node j whenever w_{ij} > 0. This is a directed graph, meaning that we should draw an arrow pointing from i to j. We could have an edge from i to j but not vice versa! Note that we can also have an edge from a node to itself.
Conversely, if we have any directed graph, we can turn it into a weighted graph by choosing the weight w_{ij} = 1 when there’s an edge from i to j, and w_{ij} = 0 otherwise.
For example, we can make a weighted graph where the nodes are web pages and w_{ij} is the number of links from the ith web page to the jth web page.
People in complex network theory like examples of this sort: large weighted graphs that describe connections between web pages, or people, or cities, or neurons, or other things. The goal, so far, is to compute numbers from weighted graphs in ways that describe interesting properties of these complex networks—and then formulate and test hypotheses about the complex networks we see in real life.
Here’s a very simple example of what we can do with a weighted graph. For any node we can sum up the weights of edges going into
This is called the degree of the node For example, if lots of people have web pages with lots of links to yours, your webpage will have a high degree. If lots of people like you on Facebook, you will have a high degree.
So, the degree is some measure of how ‘important’ a node is.
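These definitions take only a few lines of code. Here is a minimal sketch using a made-up 3-node weight matrix (the numbers are purely illustrative):

```python
# A small weighted graph, stored as a matrix of nonnegative weights w[i][j]
# (hypothetical numbers, just to illustrate the definitions above).
w = [
    [0.0, 2.0, 0.0],
    [0.0, 0.0, 1.5],
    [3.0, 0.5, 0.0],
]
n = len(w)

# Turn the weighted graph into a directed graph: an edge i -> j iff w[i][j] > 0.
edges = [(i, j) for i in range(n) for j in range(n) if w[i][j] > 0]

# Degree of node j: the sum of the weights of edges going into j.
degree = [sum(w[i][j] for i in range(n)) for j in range(n)]

print(edges)    # [(0, 1), (1, 2), (2, 0), (2, 1)]
print(degree)   # [3.0, 2.5, 1.5]
```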
People have constructed climate networks where the nodes are locations on the Earth’s surface, and the weight w_{ij} measures how correlated the weather is at the ith and jth locations. Then, the degree says how ‘important’ a given location is for the Earth’s climate—in some vague sense.
For example, in Complex networks in climate dynamics, Donges et al take surface air temperature data on a grid and compute the correlation between grid points.
More precisely, let T_{i}(t) be the temperature at the ith grid point at month t, after the average for that month in all years under consideration has been subtracted off, to eliminate some seasonal variations. They compute the Pearson correlation of T_{i} and T_{j} for each pair of grid points i, j. The Pearson correlation is the simplest measure of linear correlation, normalized to range between -1 and 1.
We could construct a weighted graph this way, and it would be symmetric, or undirected:
However, Donges et al prefer to work with a graph rather than a weighted graph. So, they create a graph where there is an edge from i to j (and also from j to i) when the correlation of T_{i} and T_{j} exceeds a certain threshold, and no edge otherwise.
They can adjust this threshold so that any desired fraction of pairs actually have an edge between them. After some experimentation they chose this fraction to be 0.5%.
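In code, this recipe looks roughly as follows. The anomaly series below are made-up stand-ins for real gridded temperature data, and the threshold is a fixed number here rather than being tuned so that 0.5% of pairs get an edge:

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)

# Toy monthly anomaly series at two grid points (invented numbers).
t1 = [0.1, -0.2, 0.3, 0.0, -0.1, 0.2]
t2 = [0.2, -0.1, 0.4, 0.1, -0.2, 0.1]

r = pearson(t1, t2)
threshold = 0.5   # arbitrary here; Donges et al tune it so 0.5% of pairs get an edge
has_edge = abs(r) > threshold
print(r, has_edge)
```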
A certain patch dominates the world! This is the El Niño basin. The Indian Ocean comes in second.
(Some details, which I may not say:
The Pearson correlation is the covariance ⟨T_{i}T_{j}⟩ − ⟨T_{i}⟩⟨T_{j}⟩, normalized by dividing by the standard deviation of T_{i} and the standard deviation of T_{j}.
The reddest shade of red in the above picture shows nodes that are connected to 5% or more of the other nodes. These nodes are connected to at least 10 times as many nodes as average.)
The Pearson correlation detects linear correlations. A more flexible measure is mutual information: how many bits of information knowing the temperature at time t at grid point i tells you about the temperature at the same time at grid point j.
Donges et al create a climate network this way as well, putting an edge between nodes if their mutual information exceeds a certain cutoff. They choose this cutoff so that 0.5% of node pairs have an edge between them, and get the following map:
The result is almost indistinguishable in the El Niño basin. So, this feature is not just an artifact of focusing on linear correlations.
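One simple way to estimate mutual information from data is to bin both series and compare the joint histogram with the product of the marginals. The crude binned estimator below is only illustrative, and is not necessarily the estimator Donges et al use:

```python
from collections import Counter
from math import log2

def mutual_information(x, y, bins=2):
    """Crude histogram estimate of the mutual information (in bits)
    between two equal-length series."""
    def binned(v):
        lo, hi = min(v), max(v)
        width = (hi - lo) / bins or 1.0
        return [min(int((a - lo) / width), bins - 1) for a in v]
    bx, by = binned(x), binned(y)
    n = len(x)
    pxy = Counter(zip(bx, by))          # joint histogram
    px, py = Counter(bx), Counter(by)   # marginal histograms
    return sum((c / n) * log2((c / n) / ((px[i] / n) * (py[j] / n)))
               for (i, j), c in pxy.items())

# Two toy series: y tracks x closely, so knowing one tells you a lot
# about the other.
x = [0.0, 0.1, 0.9, 1.0, 0.05, 0.95]
y = [0.1, 0.0, 1.0, 0.9, 0.0, 1.0]
print(mutual_information(x, y))   # exactly 1 bit for this perfectly separable toy data
```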
We can also look at how climate networks change with time—and in particular, how they are affected by El Niños. This is the subject of a 2008 paper by Tsonis and Swanson, Topology and predictability of El Niño and La Niña networks.
They create a climate network in a way that’s similar to the one I just described. The main differences are that they:
• create a link between grid points when their Pearson correlation has absolute value greater than 0.5;
• only use temperature data from November to March in each year, claiming that summertime introduces spurious links.
They get this map for La Niña conditions:
and this map for El Niño conditions:
They conclude that “El Niño breaks climate links”.
This may seem to contradict what I just said a minute ago. But it doesn’t! While the El Niño basin is a region where the surface air temperatures are highly correlated to temperatures at many other points, when an El Niño actually occurs it disrupts correlations between temperatures at different locations worldwide—and even in the El Niño basin!
For the rest of the talk I want to focus on a third claim: namely, that El Niños can be predicted by means of an increase in correlations between temperatures within the El Niño basin and temperatures outside this region. This claim was made in a recent paper by Ludescher et al. I want to examine it somewhat critically.
People really want to predict El Niños, because they have huge effects on agriculture, especially around the Pacific ocean. However, it’s generally regarded as very hard to predict El Niños more than 6 months in advance. There is also a spring barrier: it’s harder to predict El Niños through the spring of any year.
It’s controversial how much of the unpredictability in the ENSO cycle is due to chaos intrinsic to the Pacific ocean system, and how much is due to noise from outside the system. Both may be involved.
There are many teams trying to predict El Niños, some using physical models of the Earth’s climate, and others using machine learning techniques. There is a kind of competition going on, which you can see at a National Oceanic and Atmospheric Administration website.
The most recent predictions give a sense of how hard this job is:
When the 3-month running average of the Niño 3.4 index exceeds 0.5°C for 5 months, we officially declare that there is an El Niño.
As you can see, it’s hard to be sure if there will be an El Niño early next year! However, the consensus forecast is yes, a weak El Niño. This is the best we can do, now. Right now multi-model ensembles have better predictive skill than any one model.
The Azimuth Project has carefully examined a 2013 paper by Ludescher et al called Very early warning of next El Niño, which uses a climate network for El Niño prediction.
They build their climate network using correlations between daily surface air temperature data between points inside the El Niño basin and certain points outside this region, as shown here:
The red dots are the points in their version of the El Niño basin.
(Next I will describe Ludescher’s procedure. I may omit some details in the actual talk, but let me include them here.)
The main idea of Ludescher et al is to construct a climate network that is a weighted graph, and to say an El Niño will occur if the average weight of edges between points in the El Niño basin and points outside this basin exceeds a certain threshold.
As in the other papers I mentioned, Ludescher et al let T_{i}(t) be the surface air temperature at the ith grid point at time t minus the average temperature at that location at that time of year in all years under consideration, to eliminate the most obvious seasonal effects.
They consider a time-delayed covariance between temperatures at different grid points:
⟨T_{i}(t) T_{j}(t − τ)⟩ − ⟨T_{i}(t)⟩ ⟨T_{j}(t − τ)⟩
where τ is a time delay, and the angle brackets denote a running average over the last year, that is:
⟨f(t)⟩ = (1/365) ∑_{d=1}^{365} f(t − d)
where t is the time in days.
They normalize this to define a correlation that ranges from -1 to 1.
Next, for any pair of nodes i and j, and for each time t, they determine the maximum, the mean and the standard deviation of this correlation as the delay τ ranges from -200 to 200 days.
They define the link strength S_{ij}(t) as the difference between the maximum and the mean value of this correlation, divided by its standard deviation.
Finally, they let S(t) be the average link strength, calculated by averaging S_{ij}(t) over all pairs (i, j) where i is a grid point inside their El Niño basin and j is a grid point outside this basin, but still in their larger rectangle.
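Here is a sketch of the last two steps for a single pair of grid points. The delayed-correlation curves below are made-up numbers; the point is simply that a correlation with a sharp peak at some delay yields a large link strength, while a flat curve does not:

```python
from statistics import mean, pstdev

def link_strength(corrs):
    """Link strength for one pair of grid points: the maximum of the
    time-delayed correlation minus its mean, divided by its standard
    deviation, over whatever range of delays is passed in."""
    return (max(corrs) - mean(corrs)) / pstdev(corrs)

# Toy delayed-correlation curves for two pairs of grid points.
peaked = [0.1] * 10 + [0.9] + [0.1] * 10   # sharp peak at one delay
flat = [0.10, 0.12, 0.11, 0.13, 0.10, 0.12]

print(link_strength(peaked) > link_strength(flat))   # True: the peak stands out
```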
Here is what they get:
The blue peaks are El Niños: episodes where the Niño 3.4 index is over 0.5°C for at least 5 months.
The red line is their ‘average link strength’. Whenever this exceeds a certain threshold and the Niño 3.4 index is not already over 0.5°C, they predict an El Niño will start in the following calendar year.
Ludescher et al chose their threshold for El Niño prediction by training their algorithm on climate data from 1948 to 1980, and tested it on data from 1981 to 2013. They claim that with this threshold, their El Niño predictions were correct 76% of the time, and their predictions of no El Niño were correct in 86% of all cases.
On this basis they claimed—when their paper was published in February 2014—that the Niño 3.4 index would exceed 0.5 by the end of 2014 with probability 3/4.
The latest data as of 1 December 2014 seems to say: yes, it happened!
Graham Jones of the Azimuth Project wrote code implementing Ludescher et al’s algorithm, as best as we could understand it, and got results close to theirs, though not identical. The code is open-source; one goal of the Azimuth Project is to do science ‘in the open’.
More interesting than the small discrepancies between our calculation and theirs is the question of whether ‘average link strengths’ between points in the El Niño basin and points outside are really helpful in predicting El Niños.
Steve Wenner, a statistician helping the Azimuth Project, noted some ambiguities in Ludescher et al‘s El Niño prediction rules and disambiguated them in a number of ways. For each way, he used Fisher’s exact test to compute the p-value of the null hypothesis that Ludescher et al‘s El Niño prediction does not improve the odds that what they predict will occur.
The best he got (that is, the lowest p-value) was 0.03. This is just a bit more significant than the conventional 0.05 threshold for rejecting a null hypothesis.
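For reference, a one-sided Fisher exact test can be computed from scratch with hypergeometric probabilities. The 2×2 table below is hypothetical; it is not Wenner’s actual tally of predictions versus outcomes:

```python
from math import comb

def fisher_exact_one_sided(a, b, c, d):
    """One-sided Fisher exact test for the 2x2 table [[a, b], [c, d]]:
    the probability, under the null hypothesis of no association, of a
    top-left count of a or more, with the table margins held fixed."""
    n = a + b + c + d
    row1, col1 = a + b, a + c
    def prob(x):   # hypergeometric probability of top-left count x
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)
    return sum(prob(x) for x in range(a, min(row1, col1) + 1))

# Hypothetical counts: rows = El Niño predicted yes/no,
# columns = El Niño occurred yes/no.
p = fisher_exact_one_sided(9, 3, 5, 17)
print(p)
```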
Do high average link strengths between points in the El Niño basin and points elsewhere in the Pacific really increase the chance that an El Niño is coming? It is hard to tell from the work of Ludescher et al.
One reason is that they treat El Niño as a binary condition, either on or off depending on whether the Niño 3.4 index for a given month exceeds 0.5 or not. This is not the usual definition of El Niño, but the real problem is that they are only making a single yes-or-no prediction each year for 65 years: does an El Niño occur during this year, or not? 31 of these years (1950-1980) are used for training their algorithm, leaving just 34 retrodictions and one actual prediction (1981-2013, and 2014).
So, there is a serious problem with small sample size.
We can learn a bit by taking a different approach, and simply running some linear regressions between the average link strength and the Niño 3.4 index for each month. There are 766 months from 1950 to 2013, so this gives us more data to look at. Of course, it’s possible that the relation between average link strength and Niño is highly nonlinear, so a linear regression may not be appropriate. But it is at least worth looking at!
Daniel Mahler and Dara Shayda of the Azimuth Project did this and found the following interesting results.
Here is a scatter plot showing the Niño 3.4 index as a function of the average link strength on the same month:
(Click on these scatter plots for more information.)
The coefficient of determination, R², is 0.0175. In simple terms, this means that the average link strength in a given month explains just 1.75% of the variance of the Niño 3.4 index. That’s quite low!
Here is a scatter plot showing the Niño 3.4 index as a function of the average link strength six months earlier:
Now R² is 0.088. So, the link strength explains 8.8% of the variance in the Niño 3.4 index 6 months later. This is still not much, but interestingly, it’s much more than when we try to relate them at the same moment in time! And the p-value is small enough that the effect is statistically significant.
Of course, we could also try to use Niño 3.4 to predict itself. Here is the Niño 3.4 index plotted against the Niño 3.4 index six months earlier:
Now R² is 0.16. So, this is better than using the average link strength!
That doesn’t sound good for average link strength. But now let’s try to predict Niño 3.4 using both itself and the average link strength 6 months earlier. Here is a scatter plot showing that:
Here the horizontal axis is an optimally chosen linear combination of average link strength and Niño 3.4: one that maximizes R².
In this case we get R² = 0.22.
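Ordinary least squares with two lagged predictors automatically finds the variance-maximizing linear combination. Here is a sketch on synthetic stand-ins (again, not the real series):

```python
# Sketch of the two-predictor fit: predict the index from its own
# lagged value plus the lagged link strength, via ordinary least
# squares. Data are synthetic stand-ins, NOT the real series.
import numpy as np

rng = np.random.default_rng(1)
n = 766
link = rng.normal(size=n)                 # hypothetical link strength
nino = rng.normal(size=n) + 0.4 * np.roll(link, 6)  # hypothetical index

# Predict nino[t] from nino[t - 6] and link[t - 6].
y = nino[6:]
X = np.column_stack([np.ones(n - 6), nino[:-6], link[:-6]])

# Least squares picks the linear combination maximizing R^2.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
r2 = 1 - resid.var() / y.var()
print(f"combined R^2 = {r2:.3f}")
```

The same code works for any number of candidate predictors; adding a column to `X` can only increase the in-sample R².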
What can we conclude from this?
Using a linear model, the average link strength in a given month accounts for only 8% of the variance of the Niño 3.4 index 6 months in the future. That sounds bad, and indeed it is.
However, there are more interesting things to say than this!
Both the Niño 3.4 index and the average link strength can be computed from the surface air temperature of the Pacific during some window in time. The Niño 3.4 index explains 16% of its own variance 6 months into the future; the average link strength explains 8%, and taken together they explain 22%. So, these two variables contain a fair amount of independent information about the Niño 3.4 index 6 months in the future.
Furthermore, they explain a surprisingly large amount of its variance for just 2 variables.
For comparison, Mahler used a random forest variant called ExtraTreesRegressor to predict the Niño 3.4 index 6 months into the future from much larger collections of data. Out of the 778 months available he trained the algorithm on the first 400 and tested it on the remaining 378.
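The train/test protocol described here can be sketched as follows, using scikit-learn’s `ExtraTreesRegressor` on synthetic data in place of the real gridded fields. The grid size and signal structure below are invented for illustration; only the chronological 400/378 split mirrors the text.

```python
# Sketch of the chronological train/test protocol with an
# extremely randomized trees regressor. The "grid" data here are
# synthetic, NOT real surface temperature or pressure fields.
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.default_rng(2)
n_months, n_gridpoints = 778, 50          # small stand-in grid
X = rng.normal(size=(n_months, n_gridpoints))
# Hypothetical target: driven by a few grid points, plus noise.
y = X[:, :5].sum(axis=1) * 0.6 + rng.normal(size=n_months)

# Train on the first 400 months, test on the remaining 378 --
# a chronological split, not a random one.
model = ExtraTreesRegressor(n_estimators=200, random_state=0)
model.fit(X[:400], y[:400])
r2 = model.score(X[400:], y[400:])        # coefficient of determination
print(f"test R^2 = {r2:.3f}")
```

A chronological split matters for time series: a random split would let the model peek at months adjacent to its test cases.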
The result: using a full world-wide grid of surface air temperature values at a given moment in time explains only 23% of the Niño 3.4 index 6 months into the future. A full grid of surface air pressure values does considerably better, but still explains only 34% of the variance. Using twelve months of the full grid of pressure values only gets around 37%.
From this viewpoint, explaining 22% of the variance with just two variables doesn’t look so bad!
Moreover, while the Niño 3.4 index is maximally correlated with itself at the same moment in time, for obvious reasons, the average link strength is maximally correlated with the Niño 3.4 index 10 months into the future:
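Finding the lead time of maximum correlation is a simple sweep: slide one series against the other and record the correlation at each lag. The sketch below uses synthetic series deliberately built with a 10-month lead, just to show the computation.

```python
# Sketch of a lag-correlation sweep. The series are synthetic,
# constructed so the "link strength" leads the "index" by 10 months.
import numpy as np

rng = np.random.default_rng(3)
n = 766
link = rng.normal(size=n)
index = 0.5 * np.roll(link, 10) + rng.normal(size=n)

def corr_at_lag(lag):
    """Correlation of link[t] with index[t + lag]."""
    return np.corrcoef(link[:n - lag], index[lag:])[0, 1]

lags = range(0, 25)
correlations = [corr_at_lag(k) for k in lags]
best = max(lags, key=lambda k: correlations[k])
print(f"correlation peaks at a lead of {best} months")
# -> correlation peaks at a lead of 10 months
```

Applied to the real data, the same sweep is what produces a plot like the one above, with the peak at a 10-month lead.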
(The lines here occur at monthly intervals.)
However, we have not tried to determine if the average link strength as Ludescher et al define it is optimal in this respect. Graham Jones has shown that simplifying their definition of this quantity doesn’t change it much. Maybe modifying their definition could improve it. There seems to be a real phenomenon at work here, but I don’t think we know exactly what it is!
My talk has avoided discussing physical models of the ENSO, because I wanted to focus on very simple, general ideas from complex network theory. However, it seems obvious that really understanding the ENSO requires a lot of ideas from meteorology, oceanography, physics, and the like. I am not advocating a ‘purely network-based approach’.