guest post by Jan Galkowski
5. Trends Are Tricky
Trends as a concept are easy. But trends as objective measures are slippery. Consider the Keeling Curve, the record of atmospheric carbon dioxide concentration first begun by Charles Keeling in the 1950s and continued in the face of great obstacles. This curve is reproduced in Figure 8, and there presented in its original, and then decomposed into three parts, an annual sinusoidal variation, a linear trend, and a stochastic remainder.
The question is, which component represents the true trend, long term or otherwise? Are linear trends superior to all others? The importance of a trend is tied up with to what use it will be put. A pair of trends, like the sinusoidal and the random residual of the Keeling, might be more important for predicting its short term movements. On the other hand, explicating the long term behavior of the system being measured might feature the large scale linear trend, with the seasonal trend and random variations being but distractions.
Consider the global surface temperature anomalies of Figure 5 again. What are some ways of determining trends? First, note that by “trends” what’s really meant are slopes. In the case where there are many places to estimate slopes, there are many slopes. When, for example, a slope is estimated by fitting a line to all the points, there’s just a single slope such as in Figure 9. Local linear trends can be estimated from pairs of points in differing sizes of neighborhoods, as depicted in Figures 10 and 11. These can be averaged, if you like, to obtain an overall trend.
Lest the reader think constructing lots of linear trends on varying neighborhoods is somehow crude, note it has a noble history, being used by Boscovich to estimate Earth’s ellipticity about 1750, as reported by Koenker.
There is, in addition, a question of what to do if local intervals for fitting the little lines overlap, since these are then (on the face of it) not independent of one another. There are a number of statistical devices for making them independent. One way is to do clever kinds of random sampling from a population of linear trends. Another way is to shrink the intervals until they are infinitesimally small, and, so, necessarily independent. That definition is just the point slope of a curve going through the data, or its first derivative. Numerical methods for estimating these exist—and to the degree they succeed, they obtain estimates of the derivative, even if in doing do they might use finite intervals.
One good way of estimating derivatives involves using a smoothing spline, as sketched in Figure 6, and estimating the derivative(s) of that. Such an estimate of the derivative is shown in Figure 12 where the instantaneous slope is plotted in orange atop the data of Figure 6. The value of the derivative should be read using the scale to the right of the graph. The value to the left shows, as before, temperature anomaly in degrees. The cubic spline itself is plotted in green in that figure. Here it’s smoothing parameter is determined by generalized cross-validation, a principled means of taking the subjectivity out of the choice of smoothing parameter. That is explained a bit more in the caption for Figure 12. (See also Cr1979.)
What else might we do?
We could go after a really good approximation to the data of Figure 5. One possibility is to use the Bayesian Rauch-Tung-Striebel (“RTS”) smoother to get a good approximation for the underlying curve and estimate the derivatives of that. This is a modification of the famous Kalman filter, the workhorse of much controls engineering and signals work. What that means and how these work is described in an accompanying inset box.
Using the RTS smoother demands variances of the signal be estimated as priors. The larger the ratio of the estimate of the observations variance to the estimate of the process variance is, the smoother the RTS solution. And, yes, as the reader may have guessed, that makes the result dependent upon initial conditions, although hopefully educated initial conditions.
The RTS smoother result for two process variance values of 0.118 ± 002 and high 0.59 ± 0.02 is shown in Figure 13. These are 3 and 15 times the decorrelated variance value for the series of 0.039 ± 0.001, estimated using the long term variance for this series and others like it, corrected for serial correlation. One reason for using two estimates of the process variance is to see how much difference that makes. As can be seen from Figure 13, it does not make much.
Combining all six methods of estimating trends results in Figure 14, which shows the overprinted densities of slopes.
Note the spread of possibilities given by the 5 year local linear fits. The 10 year local linear fits, the spline, and the RTS smoother fits have their mode in the vicinity of the overall slope. The 10 year local linear fits slope has broader support, meaning it admits more negative slopes in the range of temperature anomalies observed. The RTS smoother results have peaks slightly below those for the spline, the 10 year local linear fits, and the overall slope. The kernel density estimator allows the possibility of probability mass below zero, even though the spline, and two RTS smoother fits never exhibit slopes below zero. This is a Bayesian-like estimator, since the prior is the real line.
Local linear fits to HadCRUT4 time series were used by Fyfe, Gillet, and Zwiers in their 2013 paper and supplement. We do not know the computational details of those trends, since they were not published, possibly due to Nature Climate Change page count restrictions. Those details matter. From these calculations, which, admittedly, are not as comprehensive as those by Fyfe, Gillet, and Zwiers, we see that robust estimators of trends in temperature during the observational record show these are always positive, even if the magnitudes vary. The RTS smoother solutions suggest slopes in recent years are near zero, providing a basis for questioning whether or not there is a warming “hiatus”.
The Rauch-Tung-Striebel smoother is an enhancement of the Kalman filter. Let denote a set of univariate observations at equally space and successive time steps . Describe these as follows:
The multivariate is called a state vector for index . and are given, constant matrices. Equations (5.3) and (5.4) say that the noise component of observations and states are distributed as zero mean Gaussian random variables with variance and covariance , respectively. This simple formulation in practice has great descriptive power, and is widely used in engineering and data analysis. For instance, it is possible to cast autoregressive moving average models (“ARMA”) in this form. (See Kitigawa, Chapter 10.) The key idea is that equation (5.1) describes at observation at time as the result of a linear regression on coefficients , where is the corresponding design matrix. Then, the coefficients themselves change with time, using a Markov-like development, a linear regression of the upcoming set of coefficients, , in terms of the current coefficients, , where is the design matrix. For the purposes here, a simple version of this is used, something called a local level model (Chapter 2) and occasionally a Gaussian random walk with noise model (Section 12.3.1). In that instance, and are not only scalars, they are unity, resulting in the simpler with scalar variances and . In either case, the Kalman filter is a way of calculating , given , values for and , and estimates for and . Choices for and are considered a model for the data. Choices for and are based upon experience with and the model. In practice, and within limits, the bigger the ratio the smoother the solution for over successive . Now, the Rauch-Tung-Striebel extension of the Kalman filter amounts to (a) interpreting it in a Bayesian context, and (b) using that interpretation and Bayes Rule to retrospectively update with the benefit of information through and the current state . Details won’t be provided here, but are described in depth in many texts, such as Cowpertwait and Metcalfe, Durbin and Koopman, and Särkkä. Finally, commenting on the observation regarding subjectivity of choice in the ratio of variances, mentioned in Section 5 at the discussion of their choice “smoother” here has a specific meaning. If this ratio is smaller, the RTS solution tracks the signal more closely, meaning its short term variability is higher. A small ratio has implications for forecasting, increasing the prediction variance. |
6. Internal Decadal Variability
The recent IPCC AR5 WG1 Report sets out the context in its Box TS.3:
Hiatus periods of 10 to 15 years can arise as a manifestation of internal decadal climate variability, which sometimes enhances and sometimes counteracts the long-term externally forced trend. Internal variability thus diminishes the relevance of trends over periods as short as 10 to 15 years for long-term climate change (Box 2.2, Section 2.4.3). Furthermore, the timing of internal decadal climate variability is not expected to be matched by the CMIP5 historical simulations, owing to the predictability horizon of at most 10 to 20 years (Section 11.2.2; CMIP5 historical simulations are typically started around nominally 1850 from a control run). However, climate models exhibit individual decades of GMST trend hiatus even during a prolonged phase of energy uptake of the climate system (e.g., Figure 9.8; Easterling and Wehner, 2009; Knight et al., 2009), in which case the energy budget would be balanced by increasing subsurface-ocean heat uptake (Meehl et al., 2011, 2013a; Guemas et al., 2013).
Owing to sampling limitations, it is uncertain whether an increase in the rate of subsurface-ocean heat uptake occurred during the past 15 years (Section 3.2.4). However, it is very likely that the climate system, including the ocean below 700 m depth, has continued to accumulate energy over the period 1998-2010 (Section 3.2.4, Box 3.1). Consistent with this energy accumulation, global mean sea level has continued to rise during 1998-2012, at a rate only slightly and insignificantly lower than during 1993-2012 (Section 3.7). The consistency between observed heat-content and sea level changes yields high confidence in the assessment of continued ocean energy accumulation, which is in turn consistent with the positive radiative imbalance of the climate system (Section 8.5.1; Section 13.3, Box 13.1). By contrast, there is limited evidence that the hiatus in GMST trend has been accompanied by a slower rate of increase in ocean heat content over the depth range 0 to 700 m, when comparing the period 2003-2010 against 1971-2010. There is low agreement on this slowdown, since three of five analyses show a slowdown in the rate of increase while the other two show the increase continuing unabated (Section 3.2.3, Figure 3.2). [Emphasis added by author.]
During the 15-year period beginning in 1998, the ensemble of HadCRUT4 GMST trends lies below almost all model-simulated trends (Box 9.2 Figure 1a), whereas during the 15-year period ending in 1998, it lies above 93 out of 114 modelled trends (Box 9.2 Figure 1b; HadCRUT4 ensemble-mean trend per decade, CMIP5 ensemble-mean trend per decade). Over the 62-year period 1951-2012, observed and CMIP5 ensemble-mean trends agree to within per decade (Box 9.2 Figure 1c; CMIP5 ensemble-mean trend per decade). There is hence very high confidence that the CMIP5 models show long-term GMST trends consistent with observations, despite the disagreement over the most recent 15-year period. Due to internal climate variability, in any given 15-year period the observed GMST trend sometimes lies near one end of a model ensemble (Box 9.2, Figure 1a, b; Easterling and Wehner, 2009), an effect that is pronounced in Box 9.2, Figure 1a, because GMST was influenced by a very strong El Niño event in 1998. [Emphasis added by author.]
The contributions of Fyfe, Gillet, and Zwiers (“FGZ”) are to (a) pin down this behavior for a 20 year period using the HadCRUT4 data, and, to my mind, more importantly, (b) to develop techniques for evaluating runs of ensembles of climate models like the CMIP5 suite without commissioning specific runs for the purpose. This, if it were to prove out, would be an important experimental advance, since climate models demand expensive and extensive hardware, and the number of people who know how to program and run them is very limited, possibly a more limiting practical constraint than the hardware.
This is the beginning of a great story, I think, one which both advances an understanding of how our experience of climate is playing out, and how climate science is advancing. FGZ took a perfectly reasonable approach and followed it to its logical conclusion, deriving an inconsistency. There’s insight to be won resolving it.
FGZ try to explicitly model trends due to internal variability. They begin with two equations:
is the model membership index. is the index of the model’s ensemble. runs over bootstrap samples taken from HadCRUT4 observations. Here, and are trends calculated using models or observations, respectively. and denote the “true, unknown, deterministic trends due to external forcing” common to models and observations, respectively. and are the perturbations to trends due to internal variability of models and observations. denotes error in climate model trends for model . denotes the sampling error in the sample. FGZ assume are exchangeable with each other as well, at least for the same time . (See [Di1977, Di1988, Ro2013c, Co2005] for more on exchangeability.) Note that while the internal variability of climate models varies from model to model, run to run, and time to time, the ‘internal variability of observations’, namely , is assumed to only vary with time.
The technical innovation FGZ use is to employ bootstrap resampling on the observations ensemble of HadCRUT4 and an ensemble of runs of 38 CMIP5 climate models to perform a two-sample comparison [Ch2008, Da2009, ]. In doing so, they explicitly assume, in the framework above, exchangeability of models. (Later, in the same work, they also make the same calculation assuming exchangeability of models and observations, an innovation too detailed for this present exposition.)
So, what is a bootstrap? In its simplest form, a bootstrap is a nonparametric, often robust, frequentist technique for sampling the distribution of a function of a set of population parameters, generally irrespective of the nature or complexity of that function, or the number of parameters. Since estimates of the variance of that function are themselves functions of population parameters, assuming the variance exists, the bootstrap can also be used to estimate the precision of the first set of samples, where “precision” is the reciprocal of variance. For more about the bootstrap, see the inset below..
In the case in question here, with FGZ, the bootstrap is being used to determine if the distribution of surface temperature trends as calculated from observations and the distribution of surface temperature trends as calculated from climate models for the same period have in fact similar means. This is done by examining differences of paired trends, one coming from an observation sample, one coming from a model sample, and assessing the degree of discrepancy based upon the variances of the observations trends distribution and of the models trends distribution.
The equations (6.1) and (6.2) can be rewritten:
moving the trends in internal variability to the left, calculated side. Both and are not directly observable. Without some additional assumptions, which are not explicitly given in the FGZ paper, such as
we can’t really be sure we’re seeing or , or at least less the mean of . The same applies to and . Here equations (6.5) and (6.6) describe internal variabilities as being multivariate but zero mean Gaussian random variables. and are covariances among models and among observations. FGZ essentially say these are diagonal with their statement “An implicit assumption is that sampling uncertainty in [observation trends] is independent of uncertainty due to internal variability and also independent of uncertainty in [model trends]“. They might not be so, but it is reasonable to suppose their diagonals are strong, and that there is a row-column exchange operator on these covariances which can produce banded matrices.
7. On Reconciliation
The centerpiece of the FGZ result is their Figure 1, reproduced here as Figure 15. Their conclusion, that climate models do not properly capture surface temperature observations for the given periods, is based upon the significant separation of the red density from the grey density, even when measuring that separation using pooled variances. But, surely, a remarkable feature of these graphs is not only the separation of the means of the two densities, but the marked difference in size of the variances of the two densities.
Why are climate models so less precise than HadCRUT4 observations? Moreover, why do climate models disagree with one another so dramatically? We cannot tell without getting into CMIP5 details, but the same result could be obtained if the climate models came in three Gaussian populations, each with a variance 1.5x that of the observations, but mixed together. We could also obtain the same result if, for some reason, the variance of HadCRUT4 was markedly understated.
That brings us back to the comments about HadCRUT4 made at the end of Section 3. HadCRUT4 is noted for “drop outs” in observations, where either the quality of an observation on a patch of Earth was poor or the observation was missing altogether for a certain month in history. (To be fair, both GISS and BEST have months where there is no data available, especially in early years of the record.) It also has incomplete coverage [Co2013]. Whether or not values for patches are imputed in some way, perhaps using spatial kriging, or whether or not supports to calculate trends are adjusted to avoid these omissions are decisions in use of these data which are critical to resolving the question [Co2013, Gl2011].
As seen in Section 5, what trends you get depends a lot on how they are done. FGZ did linear trends. These are nice because means of trends have simple relationships with the trends themselves. On the other hand, confining trend estimation to local linear trends binds these estimates to being only supported by pairs of actual samples, however sparse these may be. This has the unfortunate effect of producing a broadly spaced set of trends which, when averaged, appear to be a single, tight distribution, close to the vertical black line of Figure 14, but erasing all the detail available by estimating the density of trends with a robust function of the first time derivative of the series. FGZ might be improved by using such, repairing this drawback and also making it more robust against HadCRUT4’s inescapable data drops. As mentioned before, however, we really cannot know, because details of their calculations are not available. (Again, this author suspects this fault lies not with FGZ but a matter of page limits.)
In fact, that was indicated by a recent paper from Cowtan and Way, arguing that the limited coverage of HadCRUT4 might explain the discrepancy Fyfe, Gillet, and Zwiers found. In return Fyfe and Gillet argued that even admitting the corrections for polar regions which Cowtan and Way indicate, the CMIP5 models fall short in accounting for global mean surface temperatures. What could be wrong? In the context of ensemble forecasts depicting future states of the atmosphere, Wilks notes (Section 7.7.1):
Accordingly, the dispersion of a forecast ensemble can at best only approximate the [probability density function] of forecast uncertainty … In particular, a forecast ensemble may reflect errors both in statistical location (most or all ensemble members being well away from the actual state of the atmosphere, but relatively nearer to each other) and dispersion (either under- or overrepresenting the forecast uncertainty). Often, operational ensemble forecasts are found to exhibit too little dispersion …, which leads to overconfidence in probability assessment if ensemble relative frequencies are interpreted as estimating probabilities.
In fact, the IPCC reference, Toth, Palmer and others raise the same caution. It could be that the answer to why the variance of the observational data in the Fyfe, Gillet, and Zwiers graph depicted in Figure 15 is so small is that ensemble spread does not properly reflect the true probability density function of the joint distribution of temperatures across Earth. These might be “relatively nearer to each other” than the true dispersion which climate models are accommodating.
If Earth’s climate is thought of as a dynamical system, and taking note of the suggestion of Kharin that “There is basically one observational record in climate research”, we can do the following thought experiment. Suppose the total state of the Earth’s climate system can be captured at one moment in time, no matter how, and the climate can be reinitialized to that state at our whim, again no matter how. What happens if this is done several times, and then the climate is permitted to develop for, say, exactly 100 years on each “run”? What are the resulting states? Also suppose the dynamical “inputs” from the Sun, as a function of time, are held identical during that 100 years, as are dynamical inputs from volcanic forcings, as are human emissions of greenhouse gases. Are the resulting states copies of one another?
No. Stochastic variability in the operation of climate means these end states will be each somewhat different than one another. Then of what use is the “one observation record”? Well, it is arguably better than no observational record. And, in fact, this kind of variability is a major part of the “internal variability” which is often cited in these literature, including by FGZ.
Setting aside the problems of using local linear trends, FGZ’s bootstrap approach to the HadCRUT4 ensemble is an attempt to imitate these various runs of Earth’s climate. The trouble is, the frequentist bootstrap can only replicate values of observations actually seen. (See inset.) In this case, these replications are those of the HadCRUT4 ensembles. It will never produce values in-between and, as the parameters of temperature anomalies are in general continuous measures, allowing for in-between values seems a reasonable thing to do.
No algorithm can account for a dispersion which is not reflected in the variability of the ensemble. If the dispersion of HadCRUT4 is too small, it could be corrected using ensemble MOS methods (Section 7.7.1.) In any case, underdispersion could explain the remarkable difference in variances of populations seen in Figure 15. I think there’s yet another way.
Consider equations (6.1) and (6.2) again. Recall, here, denotes the model and denotes the run of model . Instead of , however, a bootstrap resampling of the HadCRUT4 ensembles, let run over all the 100 ensemble members provided, let run over the 2592 patches on Earth’s surface, and let run over the 1967 monthly time steps. Reformulate equations (6.1) and (6.2), instead, as
Now, is a common trend at time tick and and are deflections from from that trend due to modeling error and internal variability in the model, respectively, at time tick . Similarly, denotes deflections from the common trend baseline due to internal variability as seen by the HadCRUT4 observational data at time tick , and denotes the deflection from the common baseline due to sampling error in the patch at time tick . are indicator variables. This is the setup for an analysis of variance or ANOVA, preferably a Bayesian one (Sections 14.1.6, 18.1). In equation (7.1), successive model runs for model are used to estimate and for every . In equation (7.2), different ensemble members are used to estimate and for every . Coupling the two gives a common estimate of . There’s considerable flexibility in how model runs or ensemble members are used for this purpose, opportunities for additional differentiation and ability to incorporate information about relationships among models or among observations. For instance, models might be described relative to a Bayesian model average [Ra2005]. Observations might be described relative to a common or slowly varying spatial trend, reflecting dependencies among patches. Here, differences between observations and models get explicitly allocated to modeling error and internal variability for models, and sampling error and internal variability for observations.
More work needs to be done to assess the proper virtues of the FGZ technique, even without modification. A device like that Rohde used to compare BEST temperature observations with HadCRUT4 and GISS, one of supplying the FGZ procedure with synthetic data, would be perhaps the most informative regarding its character. Alternatively, if an ensemble MOS method were devised and applied to HadCRUT4, it might better reflect a true spread of possibilities. Because a dataset like HadCRUT4 records just one of many possible observational records the Earth might have exhibited, it would be useful to have a means of elaborating what those other possibilities were, given the single observational trace.
Regarding climate models, while they will inevitably disagree from a properly elaborated set of observations in the particulars of their statistics, in my opinion, the goal should be to strive to match the distributions of solutions these two instruments of study on their first few moments by improving both. While, statistical equivalence is all that’s sought, we’re not there yet. Assessing parametric uncertainty of observations hand-in-hand with the model builders seems to be a sensible route. Indeed, this is important. In review of the Cowtan and Way result, one based upon kriging, Kintisch summarizes the situation as reproduced in Table 1, a reproduction of his table on page 348 of the reference [Co2013, Gl2011, Ki2014]:
TEMPERATURE TRENDS | |
---|---|
1997-2012 | |
Source | Warming (/decade) |
Climate models | 0.102-0.412 |
NASA data set | 0.080 |
HadCRUT data set | 0.046 |
Cowtan/Way | 0.119 |
Table 1. Getting warmer.
New method brings measured temperatures closer to projections. Added in quotation: “Climate models” refers to the CMIP5 series. “NASA data set” is GISS. “HadCRUT data set” is HadCRUT4. “Cowtan/Way” is from their paper. Note values are per decade, not per year. |
Note that these estimates of trends, once divided by 10 years/decade to convert to a per year change in temperature, all fall well within the slope estimates depicted in the summary Figure 14. Note, too, how low the HadCRUT trend is.
If the FGZ technique, or any other, can contribute to this elucidation, it is most welcome.
As an example Lee reports how the GLOMAP model of aerosols was systematically improved using such careful statistical consideration. It seems likely to be a more rewarding way than “black box” treatments. Incidently, Dr Lindsay Lee’s article was runner-up in the Significance/Young Statisticians Section writers’ competition. It’s great to see bright young minds charging in to solve these problems!
The bootstrap is a general name for a resampling technique, most commonly associated with what is more properly called the frequentist bootstrap. Given a sample of observations, , the bootstrap principle says that in a wide class of statistics and for certain minimum sizes of , the sampling density of a statistic from a population of all , where is a single observation, can be approximated by the following procedure. Sample times with replacement to obtain samples each of size called , . For each , calculate so as to obtain . The set so obtained is an approximation of the sampling density of from a population of all . Note that because is sampled, only elements of that original set of observations will ever show up in any . This is true even if is drawn from an interval of the real numbers. This is where a Bayesian bootstrap might be more suitable.
In a Bayesian bootstrap, the set of possibilities to be sampled are specified using a prior distribution on [Da2009, Section 10.5]. A specific observation of , like , is use to update the probability density on , and then values from are drawn in proportion to this updated probability. Thus, values in never in might be drawn. Both bootstraps will, under similar conditions, preserve the sampling distribution of . |
8. Summary
Various geophysical datasets recording global surface temperature anomalies suggest a slowdown in anomalous global warming from historical baselines. Warming is increasing, but not as fast, and much of the media attention to this is reacting to the second time derivative of temperature, which is negative, not the first time derivative, its rate of increase. Explanations vary. In one important respect, 20 or 30 years is an insufficiently long time to assess the state of the climate system. In another, while the global surface temperature increase is slowing, oceanic temperatures continue to soar, at many depths. Warming might even decrease. None of these seem to pose a challenge to the geophysics of climate, which has substantial support both from experimental science and ab initio calculations. An interesting discrepancy is noted by Fyfe, Gillet, and Zwiers, although their calculation could be improved both by using a more robust estimator for trends, and by trying to integrate out anomalous temperatures due to internal variability in their models, because much of it is not separately observable. Nevertheless, Fyfe, Gillet, and Zwiers may have done the field a great service, making explicit a discrepancy which enables students of datasets like the important HadCRUT4 to discover an important limitation, that their dispersion across ensembles does not properly reflect the set of Earth futures which one might wish they did and, in their failure for users who think of the ensemble as representing such futures, give them a dispersion which is significantly smaller than what we might know.
The Azimuth Project can contribute, and I am planning subprojects to pursue my suggestions in Section 7, those of examining HadCRUT4 improvements using MOS ensembles, a Bayesian bootstrap, or the Bayesian ANOVA described there. Beyond trends in mean surface temperatures, there’s another more challenging statistical problem involving trends in sea levels which awaits investigation [Le2012b, Hu2010].
Working out these kinds of details is the process of science at its best, and many disciplines, not least mathematics, statistics, and signal processing, have much to contribute to the methods and interpretations of these series data. It is possible too much is being asked of a limited data set, and perhaps we have not yet observed enough of climate system response to say anything definitive. But the urgency to act responsibly given scientific predictions remains.
Bibliography
- Credentials. I have taken courses in geology from Binghamton University, but the rest of my knowledge of climate science is from reading the technical literature, principally publications from the American Geophysical Union and the American Meteorological Society, and self-teaching, from textbooks like Pierrehumbert. I seek to find ways where my different perspective on things canhelp advance and explain the climate science enterprise. I also apply my skills to working local environmental problems, ranging from inferring people’s use of energy in local municipalities, as well as studying things like trends in solid waste production at the same scales using Bayesian inversions. I am fortunate that techniques used in my professional work and those in these problems overlap so much. I am a member of the American Statistical Association, the American Geophysical Union, the American Meteorological Association, the International Society for Bayesian Analysis, as well as the IEEE and its signal processing society.
- [Yo2014] D. S. Young, “Bond. James Bond. A statistical look at cinema’s most famous spy”, CHANCE Magazine, 27(2), 2014, 21-27, http://chance.amstat.org/2014/04/james-bond/.
- [Ca2014a] S. Carson, Science of Doom, a Web site devoted to atmospheric radiation physics and forcings, last accessed 7 February 2014.
- [Pi2012] R. T. Pierrehumbert, Principles of Planetary Climate, Cambridge University Press, 2010, reprinted 2012.
- [Pi2011] R. T. Pierrehumbert, “Infrared radiative and planetary temperature”, Physics Today, January 2011, 33-38.
- [Pe2006] G. W. Petty, A First Course in Atmospheric Radiation, 2^{nd} edition, Sundog Publishing, 2006.
- [Le2012a] S. Levitus, J. I. Antonov, T. P. Boyer, O. K. Baranova, H. E. Garcia, R. A. Locarnini, A. V. Mishonov, J. R. Reagan, D. Seidov, E. S. Yarosh, and M. M. Zweng, “World ocean heat content and thermosteric sea level change (0-2000 m), 1955-2010″, Geophysical Research Letters, 39, L10603, 2012, http://dx.doi.org/10.1029/2012GL051106.
- [Le2012b] S. Levitus, J. I. Antonov, T. P. Boyer, O. K. Baranova, H. E. Garcia, R. A. Locarnini, A. V. Mishonov, J. R. Reagan, D. Seidov, E. S. Yarosh, and M. M. Zweng, “World ocean heat content and thermosteric sea level change (0-2000 m), 1955-2010: supplementary information”, Geophysical Research Letters, 39, L10603, 2012, http://onlinelibrary.wiley.com/doi/10.1029/2012GL051106/suppinfo.
- [Sm2009] R. L. Smith, C. Tebaldi, D. Nychka, L. O. Mearns, “Bayesian modeling of uncertainty in ensembles of climate models”, Journal of the American Statistical Association, 104(485), March 2009.
- Nomenclature. The nomenclature can be confusing. With respect to observations, variability arising due to choice of method is sometimes called structural uncertainty [Mo2012, Th2005].
- [Kr2014] J. P. Krasting, J. P. Dunne, E. Shevliakova, R. J. Stouffer (2014), “Trajectory sensitivity of the transient climate response to cumulative carbon emissions”, Geophysical Research Letters, 41, 2014, http://dx.doi.org/10.1002/2013GL059141.
- [Sh2014a] D. T. Shindell, “Inhomogeneous forcing and transient climate sensitivity”, Nature Climate Change, 4, 2014, 274-277, http://dx.doi.org/10.1038/nclimate2136.
- [Sh2014b] D. T. Shindell, “Shindell: On constraining the Transient Climate Response”, RealClimate, http://www.realclimate.org/index.php?p=17134, 8 April 2014.
- [Sa2011] B. M. Sanderson, B. C. O’Neill, J. T. Kiehl, G. A. Meehl, R. Knutti, W. M. Washington, “The response of the climate system to very high greenhouse gas emission scenarios”, Environmental Research Letters, 6, 2011, 034005,
http://dx.doi.org/10.1088/1748-9326/6/3/034005. - [Em2011] K. Emanuel, “Global warming effects on U.S. hurricane damage”, Weather, Climate, and Society, 3, 2011, 261-268, http://dx.doi.org/10.1175/WCAS-D-11-00007.1.
- [Sm2011] L. A. Smith, N. Stern, “Uncertainty in science and its role in climate policy”, Philosophical Transactions of the Royal Society A, 269, 2011 369, 1-24, http://dx.doi.org/10.1098/rsta.2011.0149.
- [Le2010] M. C. Lemos, R. B. Rood, “Climate projections and their impact on policy and practice”, WIREs Climate Change, 1, September/October 2010, http://dx.doi.org/10.1002/wcc.71.
- [Sc2014] G. A. Schmidt, D. T. Shindell, K. Tsigaridis, “Reconciling warming trends”, Nature Geoscience, 7, 2014, 158-160, http://dx.doi.org/10.1038/ngeo2105.
- [Be2013] “Examining the recent “pause” in global warming”, Berkeley Earth Memo, 2013, http://static.berkeleyearth.org/memos/examining-the-pause.pdf.
- [Mu2013a] R. A. Muller, J. Curry, D. Groom, R. Jacobsen, S. Perlmutter, R. Rohde, A. Rosenfeld, C. Wickham, J. Wurtele, “Decadal variations in the global atmospheric land temperatures”, Journal of Geophysical Research: Atmospheres, 118 (11), 2013, 5280-5286, http://dx.doi.org/10.1002/jgrd.50458.
- [Mu2013b] R. Muller, “Has global warming stopped?”, Berkeley Earth Memo, September 2013, http://static.berkeleyearth.org/memos/has-global-warming-stopped.pdf.
- [Br2006] P. Brohan, J. Kennedy, I. Harris, S. Tett, P. D. Jones, “Uncertainty estimates in regional and global observed temperature changes: A new data set from 1850″, Journal of Geophysical Research—Atmospheres, 111(D12), 27 June 2006, http://dx.doi.org/10.1029/2005JD006548.
- [Co2013] K. Cowtan, R. G. Way, “Coverage bias in the HadCRUT4 temperature series and its impact on recent temperature trends”, Quarterly Journal of the Royal Meteorological Society, 2013, http://dx.doi.org/10.1002/qj.2297.
- [Fy2013] J. C. Fyfe, N. P. Gillett, F. W. Zwiers, “Overestimated global warming over the past 20 years”, Nature Climate Change, 3, September 2013, 767-769, and online at http://dx.doi.org/10.1038/nclimate1972.
- [Ha2013] E. Hawkins, “Comparing global temperature observations and simulations, again”, Climate Lab Book, http://www.climate-lab-book.ac.uk/2013/comparing-observations-and-simulations-again/, 28 May 2013.
- [Ha2014] A. Hannart, A. Ribes, P. Naveau, “Optimal fingerprinting under multiple sources of uncertainty”, Geophysical Research Letters, 41, 2014, 1261-1268, http://dx.doi.org/10.1002/2013GL058653.
- [Ka2013a] R. W. Katz, P. F. Craigmile, P. Guttorp, M. Haran, Bruno Sansó, M.L. Stein, “Uncertainty analysis in climate change assessments”, Nature Climate Change, 3, September 2013, 769-771 (“Commentary”).
- [Sl2013] J. Slingo, “Statistical models and the global temperature record”, Met Office, May 2013, http://www.metoffice.gov.uk/media/pdf/2/3/Statistical_Models_Climate_Change_May_2013.pdf.
- [Tr2013] K. Trenberth, J. Fasullo, “An apparent hiatus in global warming?”, Earth’s Future, 2013,
http://dx.doi.org/10.1002/2013EF000165. - [Mo2012] C. P. Morice, J. J. Kennedy, N. A. Rayner, P. D. Jones, “Quantifying uncertainties in global and regional temperature change using an ensemble of observational estimates: The HadCRUT4 data set”, Journal of Geophysical Research, 117, 2012, http://dx.doi.org/10.1029/2011JD017187. See also http://www.metoffice.gov.uk/hadobs/hadcrut4/data/current/download.html where the 100 ensembles can be found.
- [Sa2012] B. D. Santer, J. F. Painter, C. A. Mears, C. Doutriaux, P. Caldwell, J. M. Arblaster, P. J. Cameron-Smith, N. P. Gillett, P. J. Gleckler, J. Lanzante, J. Perlwitz, S. Solomon, P. A. Stott, K. E. Taylor, L. Terray, P. W. Thorne, M. F. Wehner, F. J. Wentz, T. M. L. Wigley, L. J. Wilcox, C.-Z. Zou, “Identifying human infuences on atmospheric temperature”, Proceedings of the National Academy of Sciences, 29 November 2012, http://dx.doi.org/10.1073/pnas.1210514109.
- [Ke2011a] J. J. Kennedy, N. A. Rayner, R. O. Smith, D. E. Parker, M. Saunby, “Reassessing biases and other uncertainties in sea-surface temperature observations measured in situ since 1850, part 1: measurement and sampling uncertainties”, Journal of Geophysical Research: Atmospheres (1984-2012), 116(D14), 27 July 2011, http://dx.doi.org/10.1029/2010JD015218.
- [Kh2008a] S. Kharin, “Statistical concepts in climate research: Some misuses of statistics in climatology”, Banff Summer School, 2008, part 1 of 3. Slide 7, “Climatology is a one-experiment science. There is basically one observational record in climate”, http://www.atmosp.physics.utoronto.ca/C-SPARC/ss08/lectures/Kharin-lecture1.pdf.
- [Kh2008b] S. Kharin, “Climate Change Detection and Attribution: Bayesian view”, Banff Summer School, 2008, part 3 of 3, http://www.atmosp.physics.utoronto.ca/C-SPARC/ss08/lectures/Kharin-lecture3.pdf.
- [Le2005] T. C. K. Lee, F. W. Zwiers, G. C. Hegerl, X. Zhang, M. Tsao, “A Bayesian climate change detection and attribution assessment”, Journal of Climate, 18, 2005, 2429-2440.
- [De1982] M. H. DeGroot, S. Fienberg, “The comparison and evaluation of forecasters”, The Statistician, 32(1-2), 1983, 12-22.
- [Ro2013a] R. Rohde, R. A. Muller, R. Jacobsen, E. Muller, S. Perlmutter, A. Rosenfeld, J. Wurtele, D. Groom, C. Wickham, “A new estimate of the average Earth surface land temperature spanning 1753 to 2011″, Geoinformatics & Geostatistics: An Overview, 1(1), 2013, http://dx.doi.org/10.4172/2327-4581.1000101.
- [Ke2011b] J. J. Kennedy, N. A. Rayner, R. O. Smith, D. E. Parker, M. Saunby, “Reassessing biases and other uncertainties in sea-surface temperature observations measured in situ since 1850, part 2: Biases and homogenization”, Journal of Geophysical Research: Atmospheres (1984-2012), 116(D14), 27 July 2011, http://dx.doi.org/10.1029/2010JD015220.
- [Ro2013b] R. Rohde, “Comparison of Berkeley Earth, NASA GISS, and Hadley CRU averaging techniques on ideal synthetic data”, Berkeley Earth Memo, January 2013, http://static.berkeleyearth.org/memos/robert-rohde-memo.pdf.
- [En2014] M. H. England, S. McGregor, P. Spence, G. A. Meehl, A. Timmermann, W. Cai, A. S. Gupta, M. J. McPhaden, A. Purich, A. Santoso, “Recent intensification of wind-driven circulation in the Pacific and the ongoing warming hiatus”, Nature Climate Change, 4, 2014, 222-227, http://dx.doi.org/10.1038/nclimate2106. See also http://www.realclimate.org/index.php/archives/2014/02/going-with-the-wind/.
- [Fy2014] J. C. Fyfe, N. P. Gillett, “Recent observed and simulated warming”, Nature Climate Change, 4, March 2014, 150-151, http://dx.doi.org/10.1038/nclimate2111.
- [Ta2013] Tamino, “el Niño and the Non-Spherical Cow”, Open Mind blog, http://tamino.wordpress.com/2013/09/02/el-nino-and-the-non-spherical-cow/, 2 September 2013.
- [Fy2013s] Supplement to J. C. Fyfe, N. P. Gillett, F. W. Zwiers, “Overestimated global warming over the past 20 years”, Nature Climate Change, 3, September 2013, online at http://www.nature.com/nclimate/journal/v3/n9/extref/nclimate1972-s1.pdf.
- Ionizing. There are tiny amounts of heating due to impinging ionizing radiation from space, and changes in Earth’s magnetic field.
- [Ki1997] J. T. Kiehl, K. E. Trenberth, “Earth’s annual global mean energy budget”, Bulletin of the American Meteorological Society, 78(2), 1997, http://dx.doi.org/10.1175/1520-0477(1997)0782.0.CO;2.
- [Tr2009] K. Trenberth, J. Fasullo, J. T. Kiehl, “Earth’s global energy budget”, Bulletin of the American Meteorological Society, 90, 2009, 311–323, http://dx.doi.org/10.1175/2008BAMS2634.1.
- [IP2013] IPCC, 2013: Climate Change 2013: The Physical Science Basis. Contribution of Working Group I to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change [Stocker, T.F., D. Qin, G.-K. Plattner, M. Tignor, S.K. Allen, J. Boschung, A. Nauels, Y. Xia, V. Bex and P.M. Midgley (eds.)]. Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA, 1535 pp. Also available online at https://www.ipcc.ch/report/ar5/wg1/.
- [Ve2012] A. Vehtari, J. Ojanen, “A survey of Bayesian predictive methods for model assessment, selection and comparison”, Statistics Surveys, 6 (2012), 142-228, http://dx.doi.org/10.1214/12-SS102.
- [Ge1998] J. Geweke, “Simulation Methods for Model Criticism and Robustness Analysis”, in Bayesian Statistics 6, J. M. Bernardo, J. O. Berger, A. P. Dawid and A. F. M. Smith (eds.), Oxford University Press, 1998.
- [Co2006] P. Congdon, Bayesian Statistical Modelling, 2^{nd} edition, John Wiley & Sons, 2006.
- [Fe2011b] D. Ferreira, J. Marshall, B. Rose, “Climate determinism revisited: Multiple equilibria in a complex climate model”, Journal of Climate, 24, 2011, 992-1012, http://dx.doi.org/10.1175/2010JCLI3580.1.
- [Bu2002] K. P. Burnham, D. R. Anderson, Model Selection and Multimodel Inference, 2^{nd} edition, Springer-Verlag, 2002.
- [Ea2014a] S. Easterbrook, “What Does the New IPCC Report Say About Climate Change? (Part 4): Most of the heat is going into the oceans”, 11 April 2014, at the Azimuth blog, http://johncarlosbaez.wordpress.com/2014/04/11/what-does-the-new-ipcc-report-say-about-climate-change-part-4/.
- [Ko2014] Y. Kostov, K. C. Armour, and J. Marshall, “Impact of the Atlantic meridional overturning circulation on ocean heat storage and transient climate change”, Geophysical Research Letters, 41, 2014, 2108–2116, http://dx.doi.org/10.1002/2013GL058998.
- [Me2011] G. A. Meehl, J. M. Arblaster, J. T. Fasullo, A. Hu.K. E. Trenberth, “Model-based evidence of deep-ocean heat uptake during surface-temperature hiatus periods”, Nature Climate Change, 1, 2011, 360–364, http://dx.doi.org/10.1038/nclimate1229.
- [Me2013] G. A. Meehl, A. Hu, J. M. Arblaster, J. Fasullo, K. E. Trenberth, “Externally forced and internally generated decadal climate variability associated with the Interdecadal Pacific Oscillation”, Journal of Climate, 26, 2013, 7298–7310, http://dx.doi.org/10.1175/JCLI-D-12-00548.1.
- [Ha2010] J. Hansen, R. Ruedy, M. Sato, and K. Lo, “Global surface temperature change”, Reviews of Geophysics, 48(RG4004), 2010, http://dx.doi.org/10.1029/2010RG000345.
- [GISS-BEST] 3.667 (GISS) versus 3.670 (BEST).
- Spar. The smoothing parameter is a constant which weights a penalty term proportional to the second directional derivative of the curve. The effect is that if a candidate spline is chosen which is very bumpy, this candidate is penalized and will only be chosen if the data demands it. There is more said about choice of such parameters in the caption of Figure 12.
- [Ea2009] D. R. Easterling, M. F. Wehner, “Is the climate warming or cooling?”, Geophysical Research Letters, 36, L08706, 2009, http://dx.doi.org/10.1029/2009GL037810.
- Hiatus. The term hiatus has a formal meaning in climate science, as described by the IPCC itself (Box TS.3).
- [Ea2000] D. J. Easterbrook, D. J. Kovanen, “Cyclical oscillation of Mt. Baker glaciers in response to climatic changes and their correlation with periodic oceanographic changes in the northeast Pacific Ocean”, 32, 2000, Proceedings of the Geological Society of America, Abstracts with Program, page 17, http://myweb.wwu.edu/dbunny/pdfs/dje_abstracts.pdf, abstract reviewed 23 April 2014.
- [Ea2001] D. J. Easterbrook, “The next 25 years: global warming or global cooling? Geologic and oceanographic evidence for cyclical climatic oscillations”, 33, 2001, Proceedings of the Geological Society of America, Abstracts with Program, page 253, http://myweb.wwu.edu/dbunny/pdfs/dje_abstracts.pdf, abstract reviewed 23 April 2014.
- [Ea2005] D. J. Easterbrook, “Causes and effects of abrupt, global, climate changes and global warming”, Proceedings of the Geological Society of America, 37, 2005, Abstracts with Program, page 41, http://myweb.wwu.edu/dbunny/pdfs/dje_abstracts.pdf, abstract reviewed 23 April 2014.
- [Ea2006a] D. J. Easterbrook, “The cause of global warming and predictions for the coming century”, Proceedings of the Geological Society of America, 38(7), Astracts with Programs, page 235, http://myweb.wwu.edu/dbunny/pdfs/dje_abstracts.pdf, abstract reviewed 23 April 2014.
- [Ea2006b] D. J. Easterbrook, 2006b, “Causes of abrupt global climate changes and global warming predictions for the coming century”, Proceedings of the Geological Society of America, 38, 2006, Abstracts with Program, page 77, http://myweb.wwu.edu/dbunny/pdfs/dje_abstracts.pdf, abstract reviewed 23 April 2014.
- [Ea2007] D. J. Easterbrook, “Geologic evidence of recurring climate cycles and their implications for the cause of global warming and climate changes in the coming century”, Proceedings of the Geological Society of America, 39(6), Abstracts with Programs, page 507, http://myweb.wwu.edu/dbunny/pdfs/dje_abstracts.pdf, abstract reviewed 23 April 2014.
- [Ea2008] D. J. Easterbrook, “Correlation of climatic and solar variations over the past 500 years and predicting global climate changes from recurring climate cycles”, Proceedings of the International Geological Congress, 2008, Oslo, Norway.
- [Wi2007] J. K. Willis, J. M. Lyman, G. C. Johnson, J. Gilson, “Correction to ‘Recent cooling of the upper ocean”‘, Geophysical Research Letters, 34, L16601, 2007, http://dx.doi.org/10.1029/2007GL030323.
- [Ra2006] N. Rayner, P. Brohan, D. Parker, C. Folland, J. Kennedy, M. Vanicek, T. Ansell, S. Tett, “Improved analyses of changes and uncertainties in sea surface temperature measured in situ since the mid-nineteenth century: the HadSST2 dataset”, Journal of Climate, 19, 1 February 2006, http://dx.doi.org/10.1175/JCLI3637.1.
- [Pi2006] R. Pielke, Sr, “The Lyman et al paper ‘Recent cooling in the upper ocean’ has been published”, blog entry, September 29, 2006, 8:09 AM, https://pielkeclimatesci.wordpress.com/2006/09/29/the-lyman-et-al-paper-recent-cooling-in-the-upper-ocean-has-been-published/, last accessed 24 April 2014.
- [Ko2013] Y. Kosaka, S.-P. Xie, “Recent global-warming hiatus tied to equatorial Pacific surface cooling”, Nature, 501, 2013, 403–407, http://dx.doi.org/10.1038/nature12534.
- [Ke1998] C. D. Keeling, “Rewards and penalties of monitoring the Earth”, Annual Review of Energy and the Environment, 23, 1998, 25–82, http://dx.doi.org/10.1146/annurev.energy.23.1.25.
- [Wa1990] G. Wahba, Spline Models for Observational Data, Society for Industrial and Applied Mathematics (SIAM), 1990.
- [Go1979] G. H. Golub, M. Heath, G. Wahba, “Generalized cross-validation as a method for choosing a good ridge parameter”, Technometrics, 21(2), May 1979, 215-223, http://www.stat.wisc.edu/~wahba/ftp1/oldie/golub.heath.wahba.pdf.
- [Cr1979] P. Craven, G. Wahba, “Smoothing noisy data with spline functions: Estimating the correct degree of smoothing by the method of generalized cross-validation”, Numerische Mathematik, 31, 1979, 377-403, http://www.stat.wisc.edu/~wahba/ftp1/oldie/craven.wah.pdf.
- [Sa2013] S. Särkkä, Bayesian Filtering and Smoothing, Cambridge University Press, 2013.
- [Co2009] P. S. P. Cowpertwait, A. V. Metcalfe, Introductory Time Series With R, Springer, 2009.
- [Ko2005] R. Koenker, Quantile Regression, Cambridge University Press, 2005.
- [Du2012] J. Durbin, S. J. Koopman, Time Series Analysis by State Space Methods, Oxford University Press, 2012.
- Process variance. Here, the process variance was taken here to be of the observations variance.
- Probabilities. “In this Report, the following terms have been used to indicate the assessed likelihood of an outcome or a result: Virtually certain 99-100% probability, Very likely 90-100%, Likely 66-100%, About as likely as not 33-66$%, Unlikely 0-33%, Very unlikely 0-10%, Exceptionally unlikely 0-1%. Additional terms (Extremely likely: 95-100%, More likely than not 50-100%, and Extremely unlikely 0-5%) may also be used when appropriate. Assessed likelihood is typeset in italics, e.g., very likely (see Section 1.4 and Box TS.1 for more details).”
- [Ki2013] E. Kintsch, “Researchers wary as DOE bids to build sixth U.S. climate model”, Science 341 (6151), 13 September 2013, page 1160, http://dx.doi.org/10.1126/science.341.6151.1160.
- Inez Fung. “It’s great there’s a new initiative,” says modeler Inez Fung of DOE’s Lawrence Berkeley National Laboratory and the University of California, Berkeley. “But all the modeling efforts are very short-handed. More brains working on one set of code would be better than working separately””.
- Exchangeability. Exchangeability is a weaker assumption than independence. Random variables are exchangeable if their joint distribution only depends upon the set of variables, and not their order [Di1977, Di1988, Ro2013c]. Note the caution in Coolen.
- [Di1977] P. Diaconis, “Finite forms of de Finetti’s theorem on exchangeability”, Synthese, 36, 1977, 271-281.
- [Di1988] P. Diaconis, “Recent progress on de Finetti’s notions of exchangeability”, Bayesian Statistics, 3, 1988, 111-125.
- [Ro2013c] J.C. Rougier, M. Goldstein, L. House, “Second-order exchangeability analysis for multi-model ensembles”, Journal of the American Statistical Association, 108, 2013, 852-863, http://dx.doi.org/10.1080/01621459.2013.802963.
- [Co2005] F. P. A. Coolen, “On nonparametric predictive inference and objective Bayesianism”, Journal of Logic, Language and Information, 15, 2006, 21-47, http://dx.doi.org/10.1007/s10849-005-9005-7. (“Generally, though, both for frequentist and Bayesian approaches, statisticians are often happy to assume exchangeability at the prior stage. Once data are used in combination with model assumptions, exchangeability no longer holds ‘post-data’ due to the influence of modelling assumptions, which effectively are based on mostly subjective input added to the information from the data.”).
- [Ch2008] M. R. Chernick, Bootstrap Methods: A Guide for Practitioners and Researches, 2^{nd} edition, 2008, John Wiley & Sons.
- [Da2009] A. C. Davison, D. V. Hinkley, Bootstrap Methods and their Application, first published 1997, 11^{th} printing, 2009, Cambridge University Press.
- [Mu2007] M. Mudelsee, M. Alkio, “Quantifying effects in two-sample environmental experiments using bootstrap condidence intervals”, Environmental Modelling and Software, 22, 2007, 84-96, http://dx.doi.org/10.1016/j.envsoft.2005.12.001.
- [Wi2011] D. S. Wilks, Statistical Methods in the Atmospheric Sciences, 3^{rd} edition, 2011, Academic Press.
- [Pa2006] T. N. Palmer, R. Buizza, R. Hagedon, A. Lawrence, M. Leutbecher, L. Smith, “Ensemble prediction: A pedagogical perspective”, ECMWF Newsletter, 106, 2006, 10–17.
- [To2001] Z. Toth, Y. Zhu, T. Marchok, “The use of ensembles to identify forecasts with small and large uncertainty”, Weather and Forecasting, 16, 2001, 463–477, http://dx.doi.org/10.1175/1520-0434(2001)0162.0.CO;2.
- [Le2013a] L. A. Lee, K. J. Pringle, C. I. Reddington, G. W. Mann, P. Stier, D. V. Spracklen, J. R. Pierce, K. S. Carslaw, “The magnitude and causes of uncertainty in global model simulations of cloud condensation nuclei”, Atmospheric Chemistry and Physics Discussion, 13, 2013, 6295-6378, http://www.atmos-chem-phys.net/13/9375/2013/acp-13-9375-2013.pdf.
- [Gl2011] D. M. Glover, W. J. Jenkins, S. C. Doney, Modeling Methods for Marine Science, Cambridge University Press, 2011.
- [Ki2014] E. Kintisch, “Climate outsider finds missing global warming”, Science, 344 (6182), 25 April 2014, page 348, http://dx.doi.org/10.1126/science.344.6182.348.
- [GL2011] D. M. Glover, W. J. Jenkins, S. C. Doney, Modeling Methods for Marine Science, Cambridge University Press, 2011, Chapter 7.
- [Le2013b] L. A. Lee, “Uncertainties in climate models: Living with uncertainty in an uncertain world”, Significance, 10(5), October 2013, 34-39, http://dx.doi.org/10.1111/j.1740-9713.2013.00697.x.
- [Ur2014] N. M. Urban, P. B. Holden, N. R. Edwards, R. L. Sriver, K. Keller, “Historical and future learning about climate sensitivity”, Geophysical Research Letters, 41, http://dx.doi.org/10.1002/2014GL059484.
- [Th2005] P. W. Thorne, D. E. Parker, J. R. Christy, C. A. Mears, “Uncertainties in climate trends: Lessons from upper-air temperature records”, Bulletin of the American Meteorological Society, 86, 2005, 1437-1442, http://dx.doi.org/10.1175/BAMS-86-10-1437.
- [Fr2008] C. Fraley, A. E. Raftery, T. Gneiting, “Calibrating multimodel forecast ensembles with exchangeable and missing members using Bayesian model averaging”, Monthly Weather Review. 138, January 2010, http://dx.doi.org/10.1175/2009MWR3046.1.
- [Ow2001] A. B. Owen, Empirical Likelihood, Chapman & Hall/CRC, 2001.
- [Al2012] M. Aldrin, M. Holden, P. Guttorp, R. B. Skeie, G. Myhre, T. K. Berntsen, “Bayesian estimation of climate sensitivity based on a simple climate model fitted to observations of hemispheric temperatures and global ocean heat content”, Environmentrics, 2012, 23, 253-257, http://dx.doi.org/10.1002/env.2140.
- [AS2007] “ASA Statement on Climate Change”, American Statistical Association, ASA Board of Directors, adopted 30 November 2007, http://www.amstat.org/news/climatechange.cfm, last visited 13 September 2013.
- [Be2008] L. M. Berliner, Y. Kim, “Bayesian design and analysis for superensemble-based climate forecasting”, Journal of Climate, 21, 1 May 2008, http://dx.doi.org/10.1175/2007JCLI1619.1.
- [Fe2011a] X. Feng, T. DelSole, P. Houser, “Bootstrap estimated seasonal potential predictability of global temperature and precipitation”, Geophysical Research Letters, 38, L07702, 2011, http://dx.doi.org/10.1029/2010GL046511.
- [Fr2013] P. Friedlingstein, M. Meinshausen, V. K. Arora, C. D. Jones, A. Anav, S. K. Liddicoat, R. Knutti, “Uncertainties in CMIP5 climate projections due to carbon cycle feedbacks”, Journal of Climate, 2013, http://dx.doi.org/10.1175/JCLI-D-12-00579.1.
- [Ho2003] T. J. Hoar, R. F. Milliff, D. Nychka, C. K. Wikle, L. M. Berliner, “Winds from a Bayesian hierarchical model: Computations for atmosphere-ocean research”, Journal of Computational and Graphical Statistics, 12(4), 2003, 781-807, http://www.jstor.org/stable/1390978.
- [Jo2013] V. E. Johnson, “Revised standards for statistical evidence”, Proceedings of the National Academy of Sciences, 11 November 2013, http://dx.doi.org/10.1073/pnas.1313476110, published online before print.
- [Ka2013b] J. Karlsson, J., Svensson, “Consequences of poor representation of Arctic sea-ice albedo and cloud-radiation interactions in the CMIP5 model ensemble”, Geophysical Research Letters, 40, 2013, 4374-4379, http://dx.doi.org/10.1002/grl.50768.
- [Kh2002] V. V. Kharin, F. W. Zwiers, “Climate predictions with multimodel ensembles”, Journal of Climate, 15, 1 April 2002, 793-799.
- [Kr2011] J. K. Kruschke, Doing Bayesian Data Analysis: A Tutorial with R and BUGS, Academic Press, 2011.
- [Li2008] X. R. Li, X.-B. Li, “Common fallacies in hypothesis testing”, Proceedings of the 11^{th} IEEE International Conference on Information Fusion, 2008, New Orleans, LA.
- [Li2013] J.-L. F. Li, D. E. Waliser, G. Stephens, S. Lee, T. L’Ecuyer, S. Kato, N. Loeb, H.-Y. Ma, “Characterizing and understanding radiation budget biases in CMIP3/CMIP5 GCMs, contemporary GCM, and reanalysis”, Journal of Geophysical Research: Atmospheres, 118, 2013, 8166-8184, http://dx.doi.org/10.1002/jgrd.50378.
- [Ma2013b] E. Maloney, S. Camargo, E. Chang, B. Colle, R. Fu, K. Geil, Q. Hu, x. Jiang, N. Johnson, K. Karnauskas, J. Kinter, B. Kirtman, S. Kumar, B. Langenbrunner, K. Lombardo, L. Long, A. Mariotti, J. Meyerson, K. Mo, D. Neelin, Z. Pan, R. Seager, Y. Serra, A. Seth, J. Sheffield, J. Stroeve, J. Thibeault, S. Xie, C. Wang, B. Wyman, and M. Zhao, “North American Climate in CMIP5 Experiments: Part III: Assessment of 21st Century Projections”, Journal of Climate, 2013, in press, http://dx.doi.org/10.1175/JCLI-D-13-00273.1.
- [Mi2007] S.-K. Min, D. Simonis, A. Hense, “Probabilistic climate change predictions applying Bayesian model averaging”, Philosophical Transactions of the Royal Society, Series A, 365, 15 August 2007, http://dx.doi.org/10.1098/rsta.2007.2070.
- [Ni2001] N. Nicholls, “The insignificance of significance testing”, Bulletin of the American Meteorological Society, 82, 2001, 971-986.
- [Pe2008] G. Pennello, L. Thompson, “Experience with reviewing Bayesian medical device trials”, Journal of Biopharmaceutical Statistics, 18(1), 81-115).
- [Pl2013] M. Plummer, “Just Another Gibbs Sampler”, JAGS, 2013. Plummer describes this in greater detail at “JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling”, Proceedings of the 3^{rd} International Workshop on Distributed Statistical Computing (DSC 2003), 20-22 March 2003, Vienna. See also M. J. Denwood, [in review] “runjags: An R package providing interface utilities, parallel computing methods and additional distributions for MCMC models in JAGS”, Journal of Statistical Software, and http://cran.r-project.org/web/packages/runjags/. See also J. Kruschke, “Another reason to use JAGS instead of BUGS”, http://doingbayesiandataanalysis.blogspot.com/2012/12/another-reason-to-use-jags-instead-of.html, 21 December 2012.
- [Po1994] D. N. Politis, J. P. Romano, “The Stationary Bootstrap”, Journal of the American Statistical Association, 89(428), 1994, 1303-1313, http://dx.doi.org/10.1080/01621459.1994.10476870.
- [Sa2002] C.-E. Särndal, B. Swensson, J. Wretman, Model Assisted Survey Sampling, Springer, 1992.
- [Ta2012] K. E. Taylor, R.J. Stouffer, G.A. Meehl, “An overview of CMIP5 and the experiment design”, Bulletin of the American Meteorological Society, 93, 2012, 485-498, http://dx.doi.org/10.1175/BAMS-D-11-00094.1.
- [To2013] A. Toreti, P. Naveau, M. Zampieri, A. Schindler, E. Scoccimarro, E. Xoplaki, H. A. Dijkstra, S. Gualdi, J, Luterbacher, “Projections of global changes in precipitation extremes from CMIP5 models”, Geophysical Research Letters, 2013, http://dx.doi.org/10.1002/grl.50940.
- [WC2013] World Climate Research Programme (WCRP), “CMIP5: Coupled Model Intercomparison Project”, http://cmip-pcmdi.llnl.gov/cmip5/, last visited 13 September 2013.
- [We2011] M. B. Westover, K. D. Westover, M. T. Bianchi, “Significance testing as perverse probabilistic reasoning”, BMC Medicine, 9(20), 2011, http://www.biomedcentral.com/1741-7015/9/20.
- [Zw2004] F. W. Zwiers, H. Von Storch, “On the role of statistics in climate research”, International Journal of Climatology, 24, 2004, 665-680.
- [Ra2005] A. E. Raftery, T. Gneiting , F. Balabdaoui , M. Polakowski, “Using Bayesian model averaging to calibrate forecast ensembles”, Monthly Weather Review, 133, 1155–1174, http://dx.doi.org/10.1175/MWR2906.1.
- [Ki2010] G. Kitagawa, Introduction to Time Series Modeling, Chapman & Hall/CRC, 2010.
- [Hu2010] C. W. Hughes, S. D. P. Williams, “The color of sea level: Importance of spatial variations in spectral shape for assessing the significance of trends”, Journal of Geophysical Research, 115, C10048, 2010, http://dx.doi.org/10.1029/2010JC006102.
Great article. Thanks for the many references. (*) Will need quite some time to study.
But…
The pictures look fishy to me: Math overkill! (Well except the figure 8: I strongly suspect the Keeling curve cries for exponential, not linear.)
I bet Figures 12 and 13 will look ridiculous in a few years: I don’t need math nor climate models to see it is a good bet. It suffices to eyeball http://data.giss.nasa.gov/gistemp/graphs_v3/Fig.A2.gif (use a transparent ruler to draw the trend line from circa 1970 (!!!)). (I note there’s a big El Nino in the making, so I can safely bet on early 2016. Anyhow, one can guess from inspecting Fig.A2.gif that a fluctuation upwards is to be expected soon.)
Methinks that linear trend line from 1970 is all you can seriously infer and discuss, given the data. Methinks any rumors of a hiatus or slowdown are pure statistical nonsense. See http://tamino.wordpress.com/2014/01/30/global-temperature-the-post-1998-surprise/
I find the FGZ Figure 1 particularly suspicious: Using 19 resp 14 years of data to check climate models? Perhaps I’m missing something, but Fig.A2.gif tells me they should include some more of those speed bumps visualized by the red line.
———-
(*) I could have needed this article, at least the inspiration and references, 2 years back: I had 1/2y stint doing the daily electricity load prognosis for a German company that shall not be named. At the end I didn’t try to improve their Kalman filters, but did things using Excel by eyeball selection of suitable historical data and some weighted average tuned by hand. That beat the Kalman filters (and just took 1 hour longer). Further improvement would have needed more accurate solar prognosis (not just Munich airport for all of Bavaria) – but they decided to ditch the job, continue dragging their heels in the German Energiewende, and waste a million something peanuts…
Thanks, Martin. Well, the particular Kalman-RTS smoother in use here, the local level model, is especially simple, and has a state with but one component and coefficients of unity. It’s often used in conjunction with Nile River data to try to pinpoint the structural break which corresponds to the construction of the Aswan HIgh Dam, both as a tutorial device and for serious work. I wouldn’t call it “math overkill”, but that’s personal taste.
Of course the other techniques for fitting the temperature series, like Figure 9, are intended to be illustrations only.
I don’t think FGZ should be downplayed in their contribution. Their assessment is based upon the entire HadCRUT4 ensemble, 100 separate variations on the surface temperature theme. Their work and paper can be criticized for sure, but it demands listening to, in my opinion.
Echoing a post-article synopsis from the first installment, the smoothing spline is just one of many possibilities for estimating trends, and that breadth of possibility is in itself a problem. Without some criterion like the AIC, DIC, or WAIC mentioned in the comments for the first installment, or Bayes factors, it’s not clear which model is preferable. Moreover, I hope to make the case that taking a particular observational time series for temperature as here, even if it is global, and picking an “absolutely best model” is a kind of overfitting in a way which the correction terms in xICs don’t and cannot compensate. Namely, a single temperature history, even if global, is in many ways just a single observation of many possible ones. That more than anything else has convinced me that the only reasonable approach to this business is a Bayesian one, where we build a likelihood carefully, using the best we know of physics and conceivably including empirical components, posit a number of informed priors, take the single observation history, and produce a time-by-space-by-probability surface or posterior.
While awaiting a medical test this morning, I thought up and convinced myself I’m doing to do a follow-up article, at least for arXiv.org, which does a straight Bayesian assessment of two things in these data (and conceivably others):
(1) trends, and
(2) the support in time used to estimate them
This will be a Bayesian hierarchical model with priors on things like the bandwidth of the support estimating a trend, and the usual hyperpriors. I’ll probably use JAGS.
Some comments on part 1 suggested that El Ninos could be seen in the data, and perhaps should be removed somehow. I think this is a case of people seeing what they expect to see.
I used the full data set (1880-2012), removed the long term trend using lowess (which gives a very smoothed result) and plotted the residuals. I also plotted, twice, random Gaussian noise with the same standard deviation. The results are here (click to enlarge):
One of them is El Niños plus volcanoes plus whatever. The other two are pure noise. I can’t tell which is which, can you?
I’ll try to put the code below.
Are the unlabeled datasets corresponding to the plots available someplace? This is to avoid the possible but delaying step of having to digitize the figures.
I’m not sure exactly what you want, but I’ve uploaded the R code to the same page as the graphs (the code is mostly yours with my bit above).
I understand now, Graham. Thanks. Sorry for the bother of the reply. I thought the data were a special case of someplace else, not just a replication produced by the quoted code.
Graham wrote:
I don’t understand your point. If El Niños and volcanos can be seen in the temperature data, one of your graphs should be more correlated to the Southern Oscillation Index than the other two. Are you saying we should guess which one that is without looking at the Southern Oscillation Index?
It seems more fair to run some sort of correlation with the Southern Oscillation Index, to see if the real residuals are more strongly correlated than random noise is.
But I can’t resist a challenge, even an unfair one. The second of your three graphs looks different than the other two: it seems to show more low-frequency oscillations and less high-frequency jitter. Is that the real one?
The ENSO signal definitely resides in the global temperature series.
That is bioth good and bad. Good in that it can be removed from the series, revealing a more secular warming trend. But it is bad in the sense that it is hard to predict what ENSO will do in the future.
John wrote: “Are you saying we should guess which one that is without looking at the Southern Oscillation Index?”
That was the game I had in mind, yes. And ‘game’ is a good word, as you said on the forum.
I started off with a more serious idea, that is, how to do the kind of curve fitting that Jan has been doing, ‘properly’. I don’t have much experience with time-series, but on general principles, I would start with a model for the signal, and a model for the noise. These models provide your definition of ‘signal’ and ‘noise’. In this case the signal we’re interested in (or at least the one I’m interested in) is a long term trend, longer than El Nino oscillations and volcanic effects, so the latter are ‘noise’. So then I wanted a rough idea of what the noise looked like, and how reasonable it would be to assume the noise was just independent guassian.
You got the answer right! And yes, looking at it again, it seems less jittery. I don’t detect any quasi-periodicity. Next game: what seems like a reasonable model for this ‘noise’?
Only got to this now. So, I took a careful look at the comparisons from lowess done by Graham Jones. The residuals returned by the lowess do indeed look very Gaussian, except that the tails of the residuals population are a bit too well behaved,
Before getting there, though, a remark on the calculation of standard deviation. In this case, there is little serial correlation in the residuals, but there might have been. Thus, the proper way to calculate s.d. is to correct for that, either by estimating the first serial correlation coefficient (lag 1 for the series), or by using a stationary bootstrap to do the estimate. I have done the latter and put all the code along with figures in the file at http://azimuth.ch.mm.st/WarmingSlowdownQ/GrahamJones/GrahamJonesWarmingSlowdown.tar.gz. The naive s.d. comes out as 0.1013. The adjusted one from the bootstrap is 0.0995. The simulated residuals using the revised look no different than Graham’s originals above:
Now, how to tell whether Gaussian or not? The standard diagnostic is a Q-Q plot:
This plots actual data versus theoretical values assuming a Gaussian distribution at its quantile point in the population. Roughly, the closer the points hug the line, the more Gaussian. Except, as you’ll note, the leftmost and rightmost plots, generated from R‘s built-in and impeccable Gaussian stray from the line at either extreme. The actual residuals don’t do that so much. In fact, it’s only if the population of simulated points is trebled do the Q-Q plots really look alike:
There’s something subtle going on there, possibly having to do with the lowess algorithm.
Finally, to estimate the derivative of the lowess, I repeated what was done in the article and obtained:
The fit of the lowess, as Graham indicated, looks pretty convincing, and the derivative is boringly flat, with a slight uptick. Notably, there is no evidence at all in this for a cooling spell before 1940.
Later today, I hope to go after the trends in these data directly using a Bayesian approach. Don’t know, however, if the write-up will be finished today. I will post any summarizing figures here, however.
By boringly flat for the derivative, I mean it is positive but has not changed much in value.
How was the “spar” (smoothing parameter) of Graham’s lowess fit chosen? I don’t see it showing up in his code, but my impression is that lowess requires choosing a value of the smoothing parameter:
Is there a way to say why it’s so much smoother than this other fit you created as a comment on the last post, copied below? Obviously it’s smoother because you’ve smoothed it more, but maybe there’s something more enlightening to say? If GCV ‘optimizes’ the spar, the extra wiggles are ‘really there’?
lowess() has a smoothing parameter called f, and f has a default value of 2/3. So the choice is arbitrary, but presumably a value that has proved reasonable on other data sets.I don’t understand lowess(). The documentation says the algorithm is complicated.
Thanks! I just added a link to the Wikipedia explanation of lowess. It looks readable but I haven’t had time to digest it… and now I must run off and go to a workshop on chemical reaction networks!
On why the spar value chosen by GCV is so small (0.5) on the longer data set. I guess that the GCV attempts to evaluate the accuracy of interpolated values, when some values are removed, and chooses a spar that makes those interpolated values as good as possible. It might do that by fitting the wrinkles or by fitting a long term trend. But for this data we are really interested in extrapolation, or say, estimating the derivative in 2013. So a GCV that evaluated accuracy of the derivatives at the endpoints of ranges seems better.
I have grabbed the code which actually calculates the spline. It’s FORTRAN, and while I do not (yet) understand the algorithm in detail, I have put that code here:
http://azimuth.ch.mm.st/WarmingSlowdownQ/Pspline.f.gz
There is a final step in the SPLCAL subroutine where the gcv value is calculated. It is directly proportional to a sum-of-squared errors and to the number of data points, and inversely proportional to the number of variables (whatever that means in this context) and the trace of a certain matrix. How the FMM subroutine acts when it receives this is something I haven’t yet figured out.
*ugh* I have gradually lengthened the number of years provided to smooth.pspline while keeping “method=3″ or generalized cross validation. Two observations. First, if the spar value is supplied anyway, even if it is supposed to be ignored except when “method=1″, apparently there is inconsistent behavior coming from that function. Second, and more troubling, the returned gcv and cv values gradually change from about 0.0069 up to 0.0075 as the series is gradually lengthened. But, astonishingly, the spar value goes 3.9, 138.2, 219, 1761, 2127 for the same amount of change. I don’t know the FORTRAN code, but I’m re-reading Chapter 4 of Wahba to see if this behavior is expected according to her equations. Whether or not it is, this is not very nice in practice. Perhaps smooth.pspline has a bug?
With the possibility that the SPAR-selecting g.c.v. code in smooth.pspline of R is broken, I redid the figure above using the same SPAR as was used originally in Figure 6. Click to enlarge:
I don’t have much in the way of documentation, but I did a Bayesian calculation of the posterior for the overall trend in degrees Celsius per year:
This was done using R and the JAGS Gibbs sampler facility, code being http://azimuth.ch.mm.st/WarmingSlowdownQ/BayesianTrending.R
The full plots of the run are available at http://azimuth.ch.mm.st/WarmingSlowdownQ/BayesianTrending–revA–Jags-20140608-235559.pdf
The JAGS output include CODA diagnostics is available at http://azimuth.ch.mm.st/WarmingSlowdownQ/201406082357TemperatureTrendsExcerpt.Rhistory
Note: The R code requires JAGS be installed in addition to obtaining the rjags, runjags, and coda packages.
The Bayesian hierarchical model fits a Gaussian with the mean being the overall trend, with the data being slopes from linear fits of different supports, ranging from 5 to 80 years. Of the 132 years from 1881-2012, only 1961-2012 are used, since the support of 80 would then not have data if 1960 or earlier were admitted. The precision of the Gaussian model in each case is the reciprocal of the variance of the fitted residuals. Even though up to 80 years was allowed, the model suggests short supports, as little as 5 years, are more consistent.
The Bayesian calculation results in an overall trend per decade of 0.097 degrees Celsius with a standard deviation of 0.029 degrees Celsius. Comparing these with the Table 1 results above shows favorable agreement, but the posterior density should be consulted for more information.
10 chains were run, with a burn-in of 20000, adaptation of 5000, and sampling of 50000. There were actually 150000 generated, since the MCMC was thinned, taking one out of every three. Chain autocorrelations were very low, and the Gelman-Rubin PSRF were all close to unity. The run took 1.4 minutes using all 4 cores of a 3.2 GHz 64-bit AMD processor, under Win7.
[…] 2014/06/05: JCBaez: Warming Slowdown? (Part 2) by Jan Galkowski […]
[…] Anyone the least bit familiar with either (1) the spewings of climate deniers, or (2) those who might accept climate change, and even its anthropogenic origins, but who dispute the forecast because of the poor quality of climate model projections can realize that the solution to this problem is to improve climate models. Indeed, this is the upshot of the pair of blog posts I made, with Professor John Carlos Baez’ help, here and here. […]
S. Lovejoy, “Return periods of global climate fluctuations and the pause”, http://dx.doi.org/10.1002/2014GL060478, with Abstract:
“An approach complementary to General Circulation Models (GCM’s), using the anthropogenic CO2 radiative forcing as a linear surrogate for all anthropogenic forcings [Lovejoy, 2014], was recently developed for quantifying human impacts. Using pre-industrial multiproxy series and scaling arguments, the probabilities of natural fluctuations at time lags up to 125 years were determined. The hypothesis that the industrial epoch warming was a giant natural fluctuation was rejected with 99.9% confidence. In this paper, this method is extended to the determination of event return times. Over the period 1880-2013, the largest 32 year event is expected to be 0.47 K, effectively explaining the postwar cooling (amplitude 0.42 – 0.47 K). Similarly, the “pause” since 1998 (0.28 – 0.37 K) has a return period of 20-50 years (not so unusual). It is nearly cancelled by the pre-pause warming event (1992-1998, return period 30-40 years); the pause is no more than natural variability.”
This work builds on earlier work by Lovejoy reported in Climate Dynamics, http://dx.doi.org/10.1007/s00382-014-2128-2 , having Abstract:
“Although current global warming may have a large anthropogenic component, its quantification relies primarily on complex General Circulation Models (GCM’s) assumptions and codes; it is desirable to complement this with empirically based methodologies. Previous attempts to use the recent climate record have concentrated on “fingerprinting” or otherwise comparing the record with GCM outputs. By using CO2 radiative forcings as a linear surrogate for all anthropogenic effects we estimate the total anthropogenic warming and (effective) climate sensitivity finding: ΔT anth = 0.87 ± 0.11 K, λ2xCO2,eff=3.08±0.58K. These are close the IPPC AR5 values ΔT anth = 0.85 ± 0.20 K and λ2xCO2=1.5−4.5K (equilibrium) climate sensitivity and are independent of GCM models, radiative transfer calculations and emission histories. We statistically formulate the hypothesis of warming through natural variability by using centennial scale probabilities of natural fluctuations estimated using scaling, fluctuation analysis on multiproxy data. We take into account two nonclassical statistical features—long range statistical dependencies and “fat tailed” probability distributions (both of which greatly amplify the probability of extremes). Even in the most unfavourable cases, we may reject the natural variability hypothesis at confidence levels >99 %.”
I’d much rather see posterior densities than reports of significance tests, but these’ll do.
Two technical discussions regarding climate internal variability are offered by Dr Isaac Held in his blog, namely, “Heat uptake and internal variability” (from 2011), and “Heat uptake and internal variability — part II” (from 2014).
There is a just-released article which argues (quantitatively, of course) that climate model predictions are dependent, thus invalidating one of the assumptions, that of exchangeability of models, in the Fyfe, Gillett, and Zwiers paper discussed here.