Debunking the Hockey Stick

This may be old news to many who read my posts, but a comment on one of my recent posts has made me think that I should try to address the claim that the Hockey Stick has been debunked. The basic story is that Mann, Bradley & Hughes published a paper in 1999 called Northern Hemisphere Temperatures During the Past Millennium: Inferences, Uncertainties, and Limitations, which extended their 1998 reconstruction (MBH98) back to AD 1000. The paper presented reconstructions of the temperature history for the last 1000 years that indicated that temperatures were reasonably constant up until the mid 1800s and then rose sharply. The shape of the temperature profile was, therefore, referred to as a Hockey Stick.

In 2005, Stephen McIntyre and Ross McKitrick published a paper called Hockey Sticks, Principal Components, and Spurious Significance. In this paper they conclude by saying

…., the effect of the transformation is so strong
that a hockey-stick shaped PC1 is nearly always generated
from (trendless) red noise with the persistence properties of
the North American tree ring network.

What they are suggesting is that the analysis used by Mann, Bradley & Hughes can produce hockey stick shaped temperature reconstructions even if the data is simply random red noise. The conclusion is, therefore, that the Mann, Bradley & Hughes analysis is flawed and that the Hockey Stick is not real.

Now, this is all rather odd, partly because the instrumental record shows that surface temperatures have indeed risen dramatically in the last 100 years or so, and partly because numerous other studies have reproduced the hockey stick profile. One explanation I have heard is that McIntyre & McKitrick did indeed produce hockey sticks from their red-noise data, but only when – by chance – their red-noise data had an underlying hockey stick.

The comment I referred to at the beginning of this post was made by caerbannog666 and provides a slightly different explanation. You can read caerbannog666’s comment here. What is claimed is that McIntyre & McKitrick produced their red-noise by using the data from Mann, Bradley & Hughes, but forgot to de-trend, or remove the underlying hockey stick shape. Therefore, the reason their analysis produced hockey sticks was not because the Mann, Bradley & Hughes analysis was flawed, but because there was a hockey stick profile in their data. This certainly seems at least partly consistent with what McIntyre & McKitrick did, as they say in their paper

We generated the red noise network for Monte Carlo
simulations as follows. We downloaded and collated the
NOAMER tree ring site chronologies used by MBH98 from
M. Mann’s FTP site and selected the 70 sites used in the
AD1400 step.

Caerbannog666 provides a link to the code used by McIntyre & McKitrick to do their analysis. I haven’t understood the code completely, but it has two main parts. The first part has the various functions, the second is the Narrative portion to generate figures and statistics. It does appear as though the part of the code where the Mann, Bradley & Hughes data is read-in does not do anything to remove the underlying hockey stick. There is a detrend function in part one, but it’s not clear if or where this is used (if at all).

Now, maybe this has already been addressed somewhere else and everyone already knows whether McIntyre & McKitrick properly detrended their data or not. Also, I’ve misunderstood how other people’s codes work on many occasions in the past, so I don’t want to claim that McIntyre & McKitrick have made some kind of fundamental mistake. However, this does seem like something that could be addressed very easily. Either they removed the underlying hockey stick when they produced the red-noise data that they used to test the Mann, Bradley & Hughes analysis, or they didn’t.

So, my basic question to either Stephen McIntyre or Ross McKitrick is: did you properly remove any underlying hockey stick profile from the data you used to produce the red-noise that you used in your 2005 paper? If so, my next question is: where in the code is this procedure applied, and how does it work? These are fairly straightforward questions that should be easy enough to answer. Given that the debunking of the hockey stick is one of the mainstays of the typical skeptic’s argument, it would seem that clarifying this would be very important.


33 Responses to Debunking the Hockey Stick

  1. Very nice. I have never delved into the original M&M attack, because the fact that the results of Mann et al were replicated over and over by multiple teams using multiple methods was good enough for me. It certainly sounds like the careful work we’ve come to expect from the pioneers of “blog science”!

  2. Just a few more comments…

    This problem was flagged by others (with much stronger scientific/technical credentials than I have) years ago; that’s how I found out about it.

    This is really old news, and the only reason I piped up about it now is that this “hockey sticks from random noise” claim is once again being recycled by the “skeptic” noise-machine.

    R code is pretty opaque, and if you are only minimally familiar with R (like I am) you will want to have reference/tutorial material handy as you wade through the code. Spend enough time perusing the code, and you will see that autocorrelations are computed from the tree-ring data exactly as it is read in from the data file (unless I’ve really missed something, in which case I’ve really shot myself in both feet — quite possible, but I don’t think I did). The properties of the synthetic red noise used to generate “noise hockey sticks” will depend on those autocorrelation outputs.

    If the long-term climate signal (the “hockey stick”, if you will) isn’t filtered from the data prior to computing the autocorrelations, you will introduce long-duration autocorrelation components into your synthetic noise.

    This doesn’t mean that you will automatically see “hockey sticks” pop out of your synthetic noise, but the long autocorrelation times introduced by the inclusion of the long-term climate signal in the data will greatly increase the chance of seeing spurious trends (or other low-frequency “features”) in what is supposed to be “trendless” noise, especially if you “cherry pick” your results from large ensembles of trials.

    For a truly valid “noise only” test of Mann’s procedure, you need to pre-filter the tree-ring data to eliminate the “climate signal” before you use the data as a “template” for your random noise model. That does not at all appear to have been done (from my reading of the code).
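The effect described in this comment can be illustrated with a toy example. The sketch below is not taken from the M&M code; it is a minimal illustration in Python, assuming AR(1) noise with a coefficient of 0.2 and a made-up "climate signal" that is flat and then ramps up over the last 100 steps. It estimates the lag-1 autocorrelation once from the noise alone and once from the noise plus signal, which is roughly what happens if the signal is not filtered out before the noise model is fitted.

```python
import numpy as np

def ar1_noise(n, phi, rng):
    """Generate n samples of zero-mean AR(1) noise with coefficient phi."""
    x = np.zeros(n)
    eps = rng.standard_normal(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + eps[t]
    return x

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a series."""
    d = x - x.mean()
    return float(np.dot(d[:-1], d[1:]) / np.dot(d, d))

rng = np.random.default_rng(0)
n = 600
noise = ar1_noise(n, phi=0.2, rng=rng)

# Hypothetical "climate signal": flat, then a ramp over the last 100 steps.
signal = np.zeros(n)
signal[-100:] = np.linspace(0.0, 5.0, 100)

r_noise = lag1_autocorr(noise)                  # persistence of the noise alone
r_contaminated = lag1_autocorr(noise + signal)  # persistence with the signal left in

print(f"noise only: {r_noise:.2f}, noise + signal: {r_contaminated:.2f}")
```

In this toy setup the second estimate comes out noticeably higher than the first, so a noise model fitted to the unfiltered series would be "redder" than the underlying noise. The ramp size and the AR(1) coefficient are arbitrary choices for illustration.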

  3. I should add that John Mashey addressed the problems with McIntyre’s work in great detail, especially the problems associated with “cherry picking” of hockey-stick features from large numbers of trials (this is over and above the signal vs. noise issue described above).

  4. Thanks. Indeed I can’t find anything in the code that seems to remove any signal. However, I have explicitly written this post in a manner that indicates that I’m more than happy for them to indicate where this procedure is done. Personally, I think this is a perfectly reasonable question to ask that should be easy to answer. Given how often it is claimed that the hockey stick has been debunked (which is odd – as TheTracker mentions above – given how often it’s been reproduced) it would seem crucial to establish that they’ve generated appropriate red-noise for their test.

  5. Yes, I had heard about the cherry-picking before.

  6. vvenema says:

    You could also treat the code as a black box.

    See how often it produces hockey sticks the way it is written now.

    Then truncate the tree ring time series to stop in 1800.

    See how often the code now produces hockey sticks.

    Compare both rates.

  7. Yes, that’s a very good suggestion. I had wondered if I could get the code to work, but don’t know R very well. What you say makes sense though. If red-noise can produce hockey sticks, it shouldn’t matter where the data is truncated.

  8. And reproduced using proxy data other than tree rings (i.e. ice sheet samples, mud cores etc).

  9. I’ve just found this post on Deep Climate that seems to be saying something similar to what I’ve said here but is clearly written by someone who understands this much better than I do.

  10. Pingback: Around the NZIHL; Dunedin Thunder's Imported talent. | GET REAL HOCKEY

  11. mandas says:

    There was a study published in Nature Geoscience only last week which, once again, confirmed the accuracy of the ‘hockey stick’.

  12. I’ve dealt with this time and time again in my content and comment sections. And no matter how often you point out that the hockey stick isn’t flawed, that the criticisms almost always weren’t valid, and that there are now multiple studies confirming the original hockey stick, they won’t concede the point.

    Often it leads to going down further into the rabbit hole.

  13. Indeed, one could ignore McIntyre & McKitrick simply because so many other studies have reproduced the original hockey stick. It’s just amazing that so many still repeat the mantra that it’s been debunked. If M&M did fail to remove the hockey stick from their test data, that would be a nice simple way of illustrating that their study was the one that was flawed and – in an ideal world – would end this debate once and for all. Of course, we don’t live in an ideal world 🙂

  14. Yes, many have already addressed this and it still doesn’t change what some people believe. What I find amazing, as I say above, is that if there are fundamental problems with McIntyre & McKitrick’s analysis, it should be straightforward to point this out and end the debate once and for all. Of course, that would only work if people actually wanted the debate to end.

  15. Martin Vermeer says:

    To be absolutely clear (and I’m not sure you grasp this clearly wottsupwiththatblog) the problem is that the hockey-stick shape if not removed contaminates the noise model. That is, you get much “redder” (more serially persistent) noise than if you did it correctly. This makes finding spurious hockey sticks much more likely. But the synthetic data generated using this noise model is still noise, i.e., it itself does not contain a hockey-stick signature.

  16. Thanks for the comment and I’m sure you’re correct. I am no expert at this and make no claim to be. Probably wasn’t necessary to point out that you’re not sure that I grasp this clearly as your comment would have clarified things quite nicely by itself, but thanks anyway. Good to be clear I guess.

  17. Also, just to be clear (and I’m not sure you grasp this clearly Martin Vermeer) one of the reasons I allow comments on my blog is so that those who know more than me can correct where I’ve gone wrong.

  18. bratisla says:

    I decided a short while ago to begin to use R, in order to use basic tools. But Lord do I regret this decision when I see how unclear R can be when it’s more complicated. I do not like Python, but the difference is striking.

    Anyway, I looked at the first function sd.detrend. It is not a detrending tool. What it does is the following :
    – takes the input x
    – fits a linear model to it
    – computes the residual
    – calculates the standard deviation of the residual
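The four steps listed above can be sketched in Python. This is a paraphrase of the description in this comment, not a port of the R function: the name `sd_detrend`, the use of `np.polyfit` for the linear fit, and the choice of `ddof=1` (to mirror the sample standard deviation that R's `sd()` returns) are all assumptions made for illustration.

```python
import numpy as np

def sd_detrend(x):
    """Standard deviation of the residuals of a straight-line fit to x.

    Note that this returns a single number (a scale), not a detrended series.
    """
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x))                    # time index 0, 1, 2, ...
    slope, intercept = np.polyfit(t, x, 1)   # least-squares linear fit
    residual = x - (slope * t + intercept)   # what is left after removing the line
    return float(residual.std(ddof=1))       # sample standard deviation (n - 1)

# A series that is exactly linear has (numerically) zero residual spread.
line = 3.0 * np.arange(50) + 2.0
print(sd_detrend(line))
```

As the comment says, a function like this only uses the linear fit to compute a scaling factor; nothing is detrended in the data that is passed on.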

    sd.detrend is then used in mannomatic (what a classy name …). For what ? This is where it becomes unclear. I think mannomatic and sd.detrend are here to replicate MBH98, in order to compare with arfima noise. But I’m not sure.

    But still, R is a mess. I will reconsider my first move and check what numpy has in its belly.
    (sure, R is far better than IDL, but *anything* will be better than IDL. Including stepping on a lego).

  19. I quite like IDL, but then I only use it to produce figures from my data, not to do any actual computations. How hard would it be to do what Victor has suggested? Completely replicate what M&M have done but only use the MBH data up until about 1800. I can see in the code where one would truncate the year, but I don’t actually know how to run an R code.

  20. bratisla says:

    There are some “hardcoded” values, such as the number of years to consider to calibrate the proxies (from what I guess), so it seems difficult for me to do what was suggested before (run for data before 1800), although this idea has many advantages.

    Personally, if I had time I would restart from scratch, trying to replicate instead of audit :
    – generate red noise “proxies” with characteristics from truly detrended proxy supposed noise (highpass filter the data for example)
    – calibrate together the thus generated proxies
    – see what happens with PCA
    Main advantage : you can code with the language you wish, especially since from what I saw the R tools used can be easily found elsewhere (GSL, numpy, etc.)
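A from-scratch experiment along these lines could start from something like the sketch below. It is only a skeleton under stated assumptions, not a replication of MBH98 or of M&M: the noise is plain AR(1) with a hypothetical coefficient of 0.2, there is no calibration step, and the sizes (581 years, 70 series, a 79-year "blade") are chosen to loosely echo the AD1400 step. The `leading_pc` function lets the centering window be chosen, so that centering on the whole record and centering on only the last rows can be compared, and `hockey_stick_index` is one simple way of scoring how hockey-stick-like a series is.

```python
import numpy as np

def ar1_matrix(n_years, n_series, phi, rng):
    """Matrix whose columns are independent AR(1) series with coefficient phi."""
    x = np.zeros((n_years, n_series))
    for t in range(1, n_years):
        x[t] = phi * x[t - 1] + rng.standard_normal(n_series)
    return x

def leading_pc(data, center_rows):
    """PC1 time series after removing each column's mean over center_rows."""
    centered = data - data[center_rows].mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, 0] * s[0]

def hockey_stick_index(pc, blade):
    """(mean of the last `blade` values - overall mean) / overall std."""
    return (pc[-blade:].mean() - pc.mean()) / pc.std()

rng = np.random.default_rng(2)
n_years, n_series, blade = 581, 70, 79   # illustrative sizes, see the note above
noise = ar1_matrix(n_years, n_series, phi=0.2, rng=rng)

pc_full = leading_pc(noise, slice(None))            # centre on the whole record
pc_short = leading_pc(noise, slice(-blade, None))   # centre on the last `blade` rows

# The sign of a principal component is arbitrary, so compare magnitudes.
print(abs(hockey_stick_index(pc_full, blade)),
      abs(hockey_stick_index(pc_short, blade)))
```

Repeating this over many random seeds, and with noise parameters estimated from properly high-pass-filtered proxies, would be the next step; a single trial like this one says very little on its own.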

    But this is a first guess without any big work on the data/problem and after a glance at a code written in a language I do not master, so I may be badly mistaken both ways …

  21. Yes, I tried to clarify this a bit above.

    Failure to remove the hockey-stick signature from the tree-ring data before using it as a noise-model “template” will not cause an actual “hockey stick” to be injected into the noise. It will, as you pointed out, introduce a much longer autocorrelation length into the noise. If that autocorrelation length is a significant fraction of the data record length, then the likelihood of a spurious “hockey-stick” trend/signature showing up is greatly increased.

    Rather like putting half of a sine-wave period into a data record — it could easily be mistaken for a “trend” simply due to the fact that your data record length is too short relative to the period of the sine-wave (oversimplified example, but folks here should get the point).

    Unfortunately, this is all a bit too complicated to fit into a boilerplate “talking point” of the sort that your average “skeptic” can comprehend.

    Coming up with a simple “talking point” that refutes the “hockey-sticks from random noise” claim, and that is both technically accurate and easily understood by someone with a Wattsian-level comprehension of the science, is not easy.

    Perhaps a talking-point like “McIntyre goofed — he generated red-noise that was contaminated with hockey-stick signal *statistics*” would work. The talking point makes it clear how McIntyre messed up with his noise model without propagating the erroneous notion that his red noise was not actually trendless.

  22. Yes, I realised that your comment had already covered this and, in retrospect, I don’t think what I said was all that incompatible with this picture (e.g., removed the underlying hockey stick when they produced the red-noise data). I’m not an expert at this so Martin’s probably correct that I don’t quite grasp this (and I’m not sure that I quite get it yet) but, as you say, if you don’t remove the signal from the data that you use to generate the noise, then it seems likely that that will influence the result that you get (or at least I think that’s what you’re saying :-)).

  23. Another quick followup:

    Something else to consider is the tree-ring data eigenvalue *magnitudes* vs the red-noise eigenvalue *magnitudes*. (Note: eigenvalues are just singular-values squared — they are equivalent in the information they contain.)

    IIRC, the leading eigenvalue magnitude for Mann’s tree-ring data was much greater than the leading eigenvalue magnitudes for McIntyre’s red-noise trials. (This held true for both full-centered and short-centered SVD runs).

    One thing folks should remember — if the SVD (Singular Value Decomposition) produces a hockey-stick, the SVD output will tell you two things about it: its shape, and its *size*. The leading singular vector (aka leading principal component) will tell you about the hockey-stick *shape*, while the leading eigenvalue will tell you about its “size” (i.e. how much of the data it represents).

    Although it’s not a black-and-white call, a large “hockey stick” eigenvalue magnitude would indicate that the data contains a real hockey-stick signal, whereas a small eigenvalue magnitude would indicate that it’s likely just a noise artifact.
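The relationship between singular values, eigenvalues and "size" can be made concrete with a short sketch. The example below is illustrative only and uses made-up data: one matrix of pure white noise, and one where the same hypothetical ramp signal has been added to every column. The function squares the singular values to get eigenvalues and reports each as a fraction of the total.

```python
import numpy as np

def explained_variance(data):
    """Fraction of total variance carried by each principal component."""
    centered = data - data.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)  # singular values only
    eigenvalues = s ** 2                           # eigenvalues = singular values squared
    return eigenvalues / eigenvalues.sum()

rng = np.random.default_rng(3)
n_years, n_series = 500, 20
noise = rng.standard_normal((n_years, n_series))

# Hypothetical shared signal: flat, then a ramp over the last 100 rows.
signal = np.zeros(n_years)
signal[-100:] = np.linspace(0.0, 5.0, 100)
with_signal = noise + signal[:, None]   # same signal added to every column

frac_noise = explained_variance(noise)
frac_signal = explained_variance(with_signal)
print(f"leading fraction, noise only:     {frac_noise[0]:.2f}")
print(f"leading fraction, noise + signal: {frac_signal[0]:.2f}")
```

In this toy case the leading component of the noise-only matrix carries only a small share of the variance, while the shared signal dominates the second matrix. That is the kind of difference in eigenvalue magnitude the comment is pointing to.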

    One thing that McIntyre did not do (or at least I didn’t see any indication that he did do this) is plot his red-noise eigenvalues against Mann’s tree-ring eigenvalues. Had he done that, the differences between his red-noise and Mann’s tree-ring data would have been very clear.

    Show someone familiar with the SVD the full SVD outputs for McIntyre red-noise vs Mann’s tree-ring data, and he/she would be able to tell you which is which in about 5 seconds.

  24. Just a quickie drive-by:

    Michael Mann just tweeted a link to this page, so that gives me more confidence that the discussion here is on the right track.

  25. John Mashey says:

    I think this argument is true … but accidentally, a red herring.

    As per Deep Climate’s diligent work, followed by Nick Stokes’ Effects of selection in the Wegman Report, I would claim:

    1) Lack of detrending could indeed lead to positive hockey sticks.

    2) But they used overly-persistent noise (compared to real life), which means that in a set of 10,000, more will look hockey-stickish (positive or negative), even if there were no trend in data at all. Although the stats aren’t the same, think of generating random-walks … at least some will part fairly far from the mean. Noel Cressie even pointed out to Wegman that he needed to show negative hockey sticks, too (but too late, and Wegman never backed off).

    3) Then, they sorted from most positive to most negative hockey-sticks, selected the top 100 (which you can see in the R Code, as DC showed), and then sampled from that. Wegman used the same code, just ported. That’s a 100:1 cherry-pick, explicit in the code.
    In academe, such things may be called falsification or fabrication.

    Read Nick’s post carefully.

    This is like measuring the heights of men in a town … by attending an NBA game and selecting the men who happen to be on the basketball court.
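The selection effect in point 3) can be shown with a toy example. The sketch below is not the M&M or Wegman code; it is a simplified illustration that scores 10,000 plain AR(1) series directly (rather than their PC1s), using a hypothetical coefficient of 0.9 and a simple hockey-stick index, and then keeps only the 100 most positive, as described above.

```python
import numpy as np

rng = np.random.default_rng(1)
n_series, n_years, blade = 10_000, 581, 79
phi = 0.9   # hypothetical, deliberately high persistence

# Each column is one trendless AR(1) series.
x = np.zeros((n_years, n_series))
for t in range(1, n_years):
    x[t] = phi * x[t - 1] + rng.standard_normal(n_series)

# Simple hockey-stick index: (mean of last `blade` years - overall mean) / std.
hsi = (x[-blade:].mean(axis=0) - x.mean(axis=0)) / x.std(axis=0)

order = np.argsort(hsi)[::-1]   # sort from most positive to most negative
top_100 = hsi[order[:100]]      # the 100:1 selection

print(f"mean index, all 10,000 series: {hsi.mean():+.2f}")
print(f"mean index, selected top 100:  {top_100.mean():+.2f}")
```

The full set is centred near zero, with roughly as many downward-pointing as upward-pointing cases, while the selected subset is by construction entirely made up of strong upward "blades". Sampling examples from that subset says nothing about what a typical trial looks like.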

  26. Thanks John, your name has come up a number of times as someone who knows about this stuff. I read the Deep Climate post The Wegman Report see Red but have not read the one you mention above nor Nick’s. I shall do so shortly. What you say certainly makes sense. I’m no expert at the techniques used here but, as a physicist, I find it odd that someone could take a data set that may have a signal (hockey stick), use that dataset to produce another dataset (the red-noise that M&M then tested) and then claim that because their analysis can produce the signal that others think is real, that this now indicates that the signal is not real. Logically, this just seems completely flawed, even if it isn’t quite the reason why their analysis was producing hockey sticks.

  27. John Mashey says:

    Of course it’s flawed.
    Several orthogonal issues are found here.
    1) Essex & McKitrick got involved with Fred Singer in 2001, and McIntyre was added by 2003.

    2) During 2000-2004, there was a concerted effort to attack the hockey stick, trying all sorts of figures and claims … which really didn’t stick too well.

    3) The M&M piece appeared in 2005, with lots of publicity from the Wall Street Journal, National Post, Inhofe, etc. Very few people have the background and patience to sort out what was there, but that was irrelevant. Physics arguments had failed, now people went off into statistics, and a common tactic is to take an argument deep enough that most people cannot follow it, but allowing one to claim a simple message, akin to the Cheshire Cat’s lingering smile, unsupported by reality. The simple message was: statistical problems generate false hockey sticks out of random noise.

    4) That was followed up by Fred Singer, David Deming, M&M’s exhumation of the IPCC(1990) Fig.7.1(c) “flat-earth map,” talks in Washington, and the Barton harassment of Mann, Bradley & Hughes … and then the Wegman Report, designed to ratify M&M’s statistics and the social network claims. See FOIA Facts series for how bad that was.

    SO, the issue at this point is that M&M2005 has many problems, but how does one explain that to a general audience without getting bogged down in details?

  28. BBD says:

    Fake controversy. It’s almost as good as the real thing!

  29. The link to this has been tweeted around (thanks, Dr. Mann) and otherwise publicized a fair bit, so it may show up in future Google/Bing/whatever searches. So I figure that I ought to wrap this up with a simple “talking-point-friendly” summary for the benefit of future Internet searchers:

    1) McIntyre relied on a 100:1 cherry-pick to get his “random noise” hockey sticks.

    2) His random noise was badly contaminated with “hockey-stick” signal statistics (i.e. he didn’t filter the hockey-stick out of the tree-ring data before he used it as a template for his random noise).

    3) In spite of (1) and (2) above, McIntyre’s random-noise hockey sticks were *much* smaller than Mann’s genuine tree-ring hockey-stick. No competent analyst would confuse a McIntyre random-noise hockey stick with a genuine Mann tree-ring hockey stick.

    The magnitude of the problems with McIntyre’s “noise from hockey-sticks” study can pretty much be summarized as: “Other than that, Mrs. Lincoln, how did you like the play?”

  30. Pingback: Richard Tol and the 97% consensus | Wotts Up With That Blog

  31. Pingback: The Post-Normal Times , Archive » Rhetorical hyperbole? Or reckless disregard for truth?

  32. metzomagic says:

    Late to the party as usual. I found this as a result of a comment from caerbannog over on Hot Whopper, where Robert Wilson’s recent dissing of Mann is being discussed. Anyway…

    Though I agree with all the other points made above, I don’t think failing to de-trend the data before adding noise to it is a problem:

    1) If you use an appropriate noise model
    2) You are trying to determine if your PCA algorithm can still extract the climate signal from the signal + noise

    As I understand it, the idea here is to simulate non-climatic effects on trees of things like disease or insect infestations. Dendro folks seem to agree that these would run their course in just a few years. Indeed, if you use AR1 with a coefficient of .2, that gives an average auto-correlation period for the noise of: (1 + .2)/(1 – .2) = 1.5 years. But McIntyre used ARFIMA, which is akin to using AR1 with a very high coefficient of .9 (this is discussed in Deep Climate’s: “Replication and due diligence, Wegman style”, which is linked to above). This gives an extremely unrealistic average de-correlation period of: (1 + .9)/(1 – .9) = 19 years. It degraded the signal so much that it caused a lot of the simulations to ‘run away’ (in either direction). So that’s where the ‘hockey sticks out of nowhere’ really come from. The above points are well discussed here:

    It’s in relation to a flawed analysis by von Storch, but same idea. The take-away quote is:

    “The added noise was purportedly designed to represent non-climatic effects such as disease or insect infestation. This simulated ‘noisy’ world then can be used as a test-bed for the reconstruction methodology. A given analysis procedure is validated if it successfully recovers the original AOGCM noise free results and could be rejected if it fails to recover the original results. Of course such testing only makes sense if the simulated test world has characteristics similar to the real-world.”

    Of course, McIntyre did not have my point no. 2) above in mind at all. He purported to get hockey sticks from nowhere out of so-called ‘trendless red noise’ by claiming that they were purely artefacts of Mann’s ‘flawed’ PCA algorithm. So for his purpose, he should have de-trended the data first. But it’s all about getting rid of the hockey stick at any cost as far as McIntyre is concerned. Physical basis for anything he does be damned. The man is not an honest broker.
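The de-correlation periods quoted in this comment follow from the AR(1) autocorrelation function, which at lag k is the coefficient raised to the power |k|. Summing that over all lags gives (1 + φ)/(1 – φ). The short sketch below checks the closed form against a direct sum for the two coefficients mentioned; the function names are made up for illustration.

```python
def decorrelation_time(phi):
    """Closed form (1 + phi) / (1 - phi) for an AR(1) process with |phi| < 1."""
    if not -1.0 < phi < 1.0:
        raise ValueError("phi must lie strictly between -1 and 1")
    return (1.0 + phi) / (1.0 - phi)

def summed_autocorrelation(phi, max_lag=1000):
    """Sum of phi**|k| over lags -max_lag..max_lag (lag 0 contributes 1)."""
    return 1.0 + 2.0 * sum(phi ** k for k in range(1, max_lag + 1))

for phi in (0.2, 0.9):
    closed = decorrelation_time(phi)
    summed = summed_autocorrelation(phi)
    print(f"phi = {phi}: closed form {closed:.2f}, direct sum {summed:.2f}")
```

For a coefficient of 0.2 both come out at about 1.5 time steps, and for 0.9 at about 19, matching the figures in the comment.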

  33. I suspect that you’re right. The de-trending issue was just something that I found odd. I don’t really understand precisely how this all works but, as a physicist, I just find it very odd that M&M would use the MBH data to produce the noise. Surely you would want to make sure that what you were using to produce the noise had no chance of actually having hockey sticks.

Comments are closed.