## Lies, damned lies and statistics!

There’s rather a strange post over on Bishop Hill called “Met Office admits claims of significant temperature rise untenable”. The post discusses the Met Office’s response to a parliamentary question by Lord Donoughue in which he asked

…Her Majesty’s Government … whether they consider a rise in global temperature of 0.8 degrees Celsius since 1880 to be significant.

The response from the Met Office was that “the temperature rise since about 1880 is statistically significant”.

The basic argument in the post seems to be that the response requires an explanation of the statistical model that they’ve used, and hence allows for the possibility that another statistical model would be equally valid. So, is there any merit in this claim? Well, the technique that the Met Office is using to claim that the rise in global surface temperatures is “statistically significant” is known as linear regression. Basically, it is a technique that determines the best-fit straight line through a dataset (in this case the temperature anomaly data). It also allows you to determine the error on the best-fit line.

In the case of temperature anomaly data, the data points are correlated (the anomaly values depend on previous values), so determining the error is somewhat more complicated than for basic linear regression (you can download a basic linear regression routine from the internet). You also need to decide on the confidence level at which to quote the error. Typically, for temperature anomaly data the error is quoted at 2σ, which means that there is roughly a 95% chance that the trend is between trend − error and trend + error.

So, what do you get if you use linear regression to analyse the Met Office’s temperature anomaly data? The data from 1880 to 2013 has a best-fit linear trend of 0.062 °C per decade with a 2σ error of 0.008 °C per decade. Therefore there is a 95% chance that the trend is between 0.053 and 0.071 °C per decade. So, when the Met Office responded that it was “statistically significant”, what they meant is that there is virtually no chance that there has been no underlying linear warming trend in the temperature anomaly data since 1880.
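To make the idea concrete, here is a minimal sketch of a trend fit with a 2σ error that allows for correlated data. It is not the Met Office’s code, and I don’t know exactly which correction they apply. I am assuming the residuals behave like AR(1) noise and shrinking the number of independent points accordingly (the function names, the `n_eff` formula and the synthetic data are all my own choices for illustration). The data are made up: a straight line plus correlated noise, not real temperature anomalies.

```python
import math
import random


def ols_fit(t, y):
    """Least-squares straight line: returns (slope, intercept, residuals, sxx)."""
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    residuals = [yi - (intercept + slope * ti) for ti, yi in zip(t, y)]
    return slope, intercept, residuals, sxx


def lag1_autocorrelation(x):
    """Lag-1 autocorrelation of a series (0.0 if the series is constant)."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((xi - mean) ** 2 for xi in x)
    if denom == 0.0:
        return 0.0
    num = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1))
    return num / denom


def trend_with_2sigma(t, y):
    """Return (slope, naive 2-sigma error, AR(1)-adjusted 2-sigma error)."""
    n = len(t)
    slope, _, residuals, sxx = ols_fit(t, y)
    rss = sum(e * e for e in residuals)
    naive_se = math.sqrt(rss / (n - 2) / sxx)
    # Assumption: residuals behave like AR(1) noise, so the number of
    # independent points shrinks to n_eff = n * (1 - r1) / (1 + r1).
    r1 = max(lag1_autocorrelation(residuals), 0.0)
    n_eff = n * (1.0 - r1) / (1.0 + r1)
    if n_eff <= 2.0:
        raise ValueError("too few effective samples for an error estimate")
    adjusted_se = math.sqrt(rss / (n_eff - 2.0) / sxx)
    return slope, 2.0 * naive_se, 2.0 * adjusted_se


def synthetic_anomalies(years, trend_per_year, phi, noise_sd, seed):
    """Made-up data: a straight line plus AR(1) noise (not real observations)."""
    rng = random.Random(seed)
    noise = 0.0
    series = []
    for i, _ in enumerate(years):
        noise = phi * noise + rng.gauss(0.0, noise_sd)
        series.append(trend_per_year * i + noise)
    return series


years = list(range(1880, 2014))
anomalies = synthetic_anomalies(years, 0.0062, 0.6, 0.1, seed=1)
slope, naive_err, adjusted_err = trend_with_2sigma(years, anomalies)
print(f"trend: {10 * slope:.3f} +/- {10 * adjusted_err:.3f} per decade "
      f"(naive error: {10 * naive_err:.3f})")
```

The point to notice is that the adjusted error is never smaller than the naive one: correlated data contain less independent information, so the uncertainty on the trend grows. The trend is “statistically significant” at the 2σ level if the interval trend ± error excludes zero.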

So, is there any merit to the suggestion that some alternative statistical technique could have been used to analyse this data and that the actual long-term trend is different to what the Met Office obtained using linear regression? In a sense, yes. By simply applying linear regression to the temperature anomaly data, we have no idea why the data has behaved in the way that it has since 1880. The Bishop Hill blog also goes on to quote a Met Office statistician (Doug McNeall) as saying that the trending autoregressive model is “simply inadequate”. What the Bishop Hill post fails to clarify is that he actually says it is inadequate to capture all of the timescales that are apparent in the Earth system. Indeed, I agree. Linear regression simply determines the underlying linear trend. It tells you nothing about other variations that may be present in the data. This is precisely the point. You need to do more than simply fit a line or a curve to a dataset to understand the underlying physical processes.

So, what do you do to try and explain the underlying linear trend in the Met Office temperature anomaly data? Well, you may notice that there has been a significant rise in atmospheric CO₂ levels since the mid-1800s. This is a greenhouse gas and so increasing its concentration in the atmosphere could lead to a rise in global surface temperatures. You could also notice that the ocean heat content has risen by about 2.5 × 10²³ J since 1960 (about 5 × 10²¹ J per year). You eventually get satellite data that tells you that, for at least the last 10 to 20 years, the Earth has been receiving 0.5 to 1 W m⁻² more energy from the Sun than it loses into space. This indicates that the Earth has been receiving between 2 and 4 × 10²¹ J of excess energy every year (very similar to the increase in ocean heat content). You could then conclude that there is evidence to suggest that the underlying rising linear trend in the global surface temperature anomaly is part of a process of global warming driven by the increase in atmospheric CO₂ resulting from our use of fossil fuels. There are other processes producing variations in the global temperature anomalies, but the underlying warming trend seems as though it is being driven by enhanced global warming due to increasing atmospheric CO₂ concentrations.
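The conversion from an energy imbalance in W m⁻² to joules per year is simple arithmetic, sketched below. The Earth radius and the length of the year are values I have assumed, not numbers from the post. One caveat: the answer depends on which area the flux is referred to. The 2 to 4 × 10²¹ J per year quoted above is what you get with the Earth’s cross-sectional area (πR²); if the imbalance is instead expressed per unit of the whole surface (4πR²), which I believe is the usual convention, the yearly total is four times larger. Either way it is the same order of magnitude as the rise in ocean heat content.

```python
import math

EARTH_RADIUS_M = 6.371e6                 # assumed mean radius of the Earth
SECONDS_PER_YEAR = 365.25 * 24 * 3600    # assumed length of a year


def joules_per_year(flux_w_m2, area_m2):
    """Energy accumulated in one year by a constant flux over a given area."""
    return flux_w_m2 * area_m2 * SECONDS_PER_YEAR


disc_area = math.pi * EARTH_RADIUS_M ** 2         # cross-section facing the Sun
surface_area = 4 * math.pi * EARTH_RADIUS_M ** 2  # whole surface of the sphere

for flux in (0.5, 1.0):
    print(f"{flux} W/m^2: "
          f"{joules_per_year(flux, disc_area):.1e} J/yr over the disc, "
          f"{joules_per_year(flux, surface_area):.1e} J/yr over the surface")

# Ocean heat content figures from the text: 2.5e23 J over roughly 50 years.
print(f"ocean heat content: {2.5e23 / 50:.1e} J/yr")
```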

Now, one could apply a different statistical technique to analyse the global surface temperature anomalies. Maybe see if one can fit some kind of sinusoidal variation in which the data from 1880 to 2013 is on the rising part of the sinusoid. I’m sure that this is possible. What you have to do next, however, is determine what physical process explains this long-term sinusoidal variation. I can’t think of one, but maybe the author of the Bishop Hill post is cleverer than me.

The Bishop Hill post finishes by saying that the basis for the claim that the warming is statistically significant

has now been effectively acknowledged to be untenable. Possibly there is some other basis for the claim, but that seems extremely implausible: the claim does not seem to have any valid basis.

Plainly, then, the Met Office should now publicly withdraw the claim. That is, the Met Office should admit that the warming shown by the global-temperature record since 1880 (or indeed 1850) might be reasonably attributed to natural random variation. Additionally, the Met Office needs to reassess other claims that it has made about statistically significant climatic changes.

Really? I don’t think that the Met Office made any claim about statistically significant climatic changes. All they claimed is that the underlying linear trend in their temperature anomaly data is statistically significant at the 2σ level. Any claims about the significance of global warming (and consequently climate change) are based on extensive research aimed at understanding why there is an underlying warming trend in the temperature anomaly data. If the author of the Bishop Hill post wants to claim that it is simply natural random variation, they need to do the research that attempts to understand what this random natural variation is. Even a random natural variation has to obey the basic laws of physics, and if you can’t find some physical process that produces such a random natural variation then the chance that it is driving the observed warming seems rather small.

This post has become quite long, but the basic point is that our understanding of global warming and associated climate change is based not only on linear regression analysis of the global temperature anomalies, but on a host of other measurements, analyses and modelling. If someone is to claim that they can use a different statistical method to analyse the global temperature anomaly data, that’s fine. That, in itself, isn’t enough to disprove the basic tenets of anthropogenic global warming. They need to make all the other measurements, analyses and modelling that provide evidence for their alternative. Until they’ve done that, their alternative has no intrinsic value.


### 13 Responses to Lies, damned lies and statistics!

1. BBD says:

That would be GWPF trustee Lord Donoughue. No less.

Fans of Climategate (the Clue is in the Frame™) may remember the name.

2. Fragmester says:

Watching the deniers at work with this sort of thing makes me wonder what they might make of the Dead Sea Scrolls. Can we club together and buy them a dictionary so they can read plain English and understand what is being said?

3. jyyh says:

OHC to 700 m vs. to 2000 m is yet another proof of the origin of the warming, that is, the atmosphere, and specifically greenhouse gases. Once they’d acknowledge the fact, they might be better at downplaying Arctic amplification :-p. I guess the main theory of the deniers of the greenhouse effect of CO2 is still deep ocean heating (or, in Baron Monckhofen’s words, ‘underground volcanoes!’), as somewhat supporting babble occasionally comes out also in RealClimate.

4. Yes, I should have guessed as much.

5. I just have a sense that they don’t really understand the difference between “global warming” and “climate change”. They don’t seem to get that global warming is about energy, not simply about global surface temperatures. Of course, it may be that they simply don’t want to understand the distinction.

6. I wouldn’t really want them to be attempting to interpret anything particularly complicated.

7. cRR Kampen says:

It is simply so: the Met Office made a very clear statement, so the merchants of doubt really had to act: deny, deny, deny!!

8. Indeed. Something I didn’t do in the post, but was tempted to do, was to try and illustrate the complete twisting of what was said. The Bishop Hill post is essentially saying “you said it was statistically significant and therefore it is not”. If you read some of the comments on WUWT there are some who are actually claiming that the Met Office has stated that their analysis of the temperature anomaly data is not statistically significant – exactly the opposite of what they actually said. Quite remarkable really.

9. Anonymous says:

They’re suggesting that the Met Office has admitted that a different model would produce non-significant results, and the alternative they’re suggesting is from this op-ed in the WSJ which claims to have found a model that fits the data better, and is non-significant. The background material for the model is here (pdf).

Unfortunately, their “alternative model” is a dog’s breakfast. It suffers from multiple flaws, that can be inferred from the pdf:

1. They misinterpret the paper that they base their analysis on: it uses a suite of ARFIMA models, but they interpret it as recommending the use of a single ARIMA model, the ARIMA(3,1,0), which is a model with a difference (of 1) that is clearly not used in the paper that justifies their modeling
2. They don’t justify choice of ARIMA(3,1,0) through, eg. partial autocorrelations or autocorrelations, but just present it as a fait accompli
3. In their associated code they don’t show the coefficients they claim are non-significant
4. In any case those coefficients are completely unrelatable to the standard linear model with correlated residuals used by the Met Office, so if one of them is non-significant that doesn’t mean that there is no significant warming
5. they are probably wrong to use an ARIMA(3,1,0) model when the original paper suggests a linear model with ARFIMA residuals: I don’t know but I’m guessing the only time a linear model with ARIMA residuals and an ARIMA model are equivalent is at ARIMA(1,1,0)
6. they calculate AICs using a formula that seems a bit weird, rather than just taking direct AICs from R – superficially (ie without running their code) it appears they have incorporated the degrees of freedom into the AIC calculation twice
7. they appear to get the number of degrees of freedom in the GLS model wrong, I guess it should be 3 (one for the constant, one for the slope, one for the AR term) but they have given it four.
8. they use degrees of freedom from a GLS model with AR(1) residuals but compare the resulting AIC with that of an ARIMA model with no gls fit – generally these comparisons can only be done within nested models, not different classes of models
9. they calculate the likelihood ratio using AIC instead of likelihood so it is wrong by a factor of (in their case) 2.8 but probably more like 2.8^2 depending on the correct degrees of freedom
10. comparing likelihood ratios the way they do is extremely unusual in the time series literature: they should instead calculate a difference of -2log likelihoods, which is known to be chi-squared, and compare it to a distribution with (probably) 3 degrees of freedom. This difference probably has a significance of p=0.01, but it’s hard to say given the shoddiness of their comparisons
11. their model has three AR terms (at lag 1,2,3). They need to use a model-building strategy to show that all their AR terms are significant – which is highly unlikely, but they don’t show their results.

Plus, of course, their supplementary material is very sloppily written in general.

This is the basis on which they claim that there is no significant warming since 1880, and the model they claim the Met has now admitted might be valid. I guess that both the silly lord who asked the question and the WSJ lack the scientific skill to assess the validity of the alternative model. I think the WSJ should be asking themselves whether it is ethical to publish articles that should be subject to peer-review, when they are not capable of that peer review.
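The comparison described in point 10 above (a difference of −2 log-likelihoods referred to a chi-squared distribution) can be sketched as follows. This is only an illustration of the mechanics, not a reanalysis of the op-ed’s models: the log-likelihood values are hypothetical, the 3 degrees of freedom follow the commenter’s “probably 3”, and the test is only valid when one model is nested in the other. The function names are my own; the closed-form tail probability used here applies to 3 degrees of freedom only.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def chi2_sf_3df(x):
    """Upper-tail probability of a chi-squared variable with 3 degrees of freedom."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


def likelihood_ratio_test(loglik_restricted, loglik_full):
    """Return (statistic, p-value), assuming nested models differing by 3 parameters."""
    statistic = 2.0 * (loglik_full - loglik_restricted)
    return statistic, chi2_sf_3df(statistic)


# Hypothetical log-likelihoods, chosen only to exercise the functions.
stat, p_value = likelihood_ratio_test(loglik_restricted=100.0, loglik_full=105.0)
print(f"statistic = {stat:.2f}, p = {p_value:.4f}")
```

Note that the AIC already includes the parameter count once (the `2 * n_params` term), which is why adding a degrees-of-freedom penalty a second time, as point 6 suspects, would distort the comparison.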

10. Thanks for the comment and the clarifications. As you’re indicating, most of what they’ve written doesn’t really make much sense to me, so I’ve had trouble really understanding what they’re suggesting. One impression I did get (and maybe you can clarify) is that their proposed model essentially starts by removing the linear trend. If so, then clearly they can fit what remains with some kind of random fluctuations, but that’s only because the linear trend has been removed.

11. assman says:

What they are suggesting is extremely simple. A random walk has spurious trends. I can produce 50 realizations of a random walk and you will spot trends in a large number of them. But random walks have no trends. Random walks are trendless. You were fooled by randomness. You thought you saw a trend but in reality it was just random variation.

The measure of statistical significance is hugely affected by the choice of model. Trend tests will spuriously find trends in random walks and make them appear statistically significant. So then the question is: is the climate system a random walk? The truth is that a random walk can’t be ruled out. This was already argued a long time ago by Woodward and Gray:

http://journals.ametsoc.org/doi/pdf/10.1175/1520-0442(1993)006%3C0953%3AGWATPO%3E2.0.CO%3B2

12. Indeed, I agree that one could produce data using a random walk that will appear to have a trend over some time interval. Presumably if one were to continue that random walk for a sufficiently long period, the trend will eventually disappear. However, over the period initially considered, the trend is real (i.e., whatever data is being presented has actually, on average, increased – for example – over that time interval). Just because the process is random, doesn’t mean the trend isn’t actually there.

What I presume people are arguing is that since the temperature anomaly data can be reproduced using some kind of random walk, that the whole process is simply random and hence over a sufficiently long time period the trend will eventually disappear. The problem I have with this suggestion is that some physical process has to have produced the underlying rising trend seen in the temperature anomaly data since 1880. If it is some random process, what is it? The global surface temperature changes have to be due to actual physical processes. They don’t simply change randomly. The process itself could, possibly, be behaving in some random way but we should still be able to represent this using actual physical models.
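The purely statistical part of this exchange is easy to check with a small simulation. The sketch below is my own illustration (the function names, the number of series and the 134-point length, chosen to match 1880 to 2013, are all assumptions). It generates random walks and uncorrelated noise, fits a straight line to each, and counts how often a naive test that ignores correlation would call the trend significant at 2σ.

```python
import math
import random


def naive_t_statistic(y):
    """Slope divided by its ordinary least-squares standard error."""
    n = len(y)
    t_mean = (n - 1) / 2.0
    y_mean = sum(y) / n
    sxx = sum((i - t_mean) ** 2 for i in range(n))
    sxy = sum((i - t_mean) * (yi - y_mean) for i, yi in enumerate(y))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    rss = sum((yi - intercept - slope * i) ** 2 for i, yi in enumerate(y))
    return slope / math.sqrt(rss / (n - 2) / sxx)


def white_noise(n, rng):
    """Independent Gaussian values with no trend."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def random_walk(n, rng):
    """Cumulative sum of independent Gaussian steps."""
    position, path = 0.0, []
    for _ in range(n):
        position += rng.gauss(0.0, 1.0)
        path.append(position)
    return path


def fraction_flagged(generator, n_series=200, n_points=134, seed=0):
    """Fraction of simulated series whose naive trend exceeds 2 sigma."""
    rng = random.Random(seed)
    flagged = sum(
        abs(naive_t_statistic(generator(n_points, rng))) > 2.0
        for _ in range(n_series)
    )
    return flagged / n_series


print("white noise:", fraction_flagged(white_noise))
print("random walk:", fraction_flagged(random_walk))
```

For uncorrelated noise the naive test should flag only a few percent of the series, while for random walks it flags most of them. That confirms the statistical point that the choice of noise model matters. It does not address the physical question: a simulation like this says nothing about whether the climate system can actually behave as a random walk while still conserving energy.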