Watt about Richard Tol?

John Cook (with others) recently published a paper called Quantifying the consensus on anthropogenic global warming in the scientific literature. The paper used Web of Science to search for papers on “global warming” or “global climate change”. They restricted their search to the physical sciences and to articles only, and it returned 12474 results. They then analysed the abstracts of these papers and scored them according to whether they endorsed anthropogenic global warming (AGW), rejected it, or took no position on it. Their results indicate that, of those papers that addressed AGW (in the abstract at least), 97.1% endorsed AGW while only 2.9% rejected AGW or were uncertain about it.

According to Watts Up With That (WUWT), Richard Tol has statistically deconstructed the 97% consensus. The basic claim attributed to Richard Tol is that by searching for articles on “global climate change” rather than “climate change” the Cook et al. study ignored 75% of relevant papers and changed the disciplinary distribution. Maybe so, but is this at all relevant? Let’s consider how one might design a study such as that carried out by Cook et al.

Well, typically one would consider the resources available and hence how many papers one can realistically analyse. Next, one needs to determine how to extract the sample. One could do a very broad search (such as “climate change” rather than “global climate change”) and then discover that the sample was too large to be analysed given the resources. One could then choose to change the search terms, or to select randomly from within the bigger sample. Both could be valid ways of extracting a suitable sample. The advantage of using a randomly selected subgroup of a big sample is that the large sample may be – in a sense – the “correct” full sample of all possible papers. The randomly selected subgroup would then be a representative sub-sample of all possible papers.

Alternatively, one could argue that the full sample includes papers that aren’t actually relevant. In this case, the interest is in whether or not man is responsible for “global warming” or “global climate change”. A broad search for “global warming” or “climate change” may well return many papers that aren’t really relevant (or that one could argue aren’t relevant). By making the search terms more specific, you can extract a full sample of papers that covers the topics of interest and that is of a manageable size.

Essentially I’m suggesting that although Richard Tol’s strategy may be fine, there’s nothing fundamentally wrong – in my opinion at least – with the strategy adopted by Cook et al. It’s certainly a perfectly reasonable strategy that returned a large number (12474) of relevant articles, roughly one third (about 4000) of which directly – in their abstracts – address AGW. If Richard Tol really wants to show that there is a problem with this sampling strategy, he could do so quite easily. His two claims are that Cook et al.’s strategy ignored 75% of relevant papers, and that it changed the disciplinary distribution. Richard Tol would need to do two things, neither of which should be too onerous. First, randomly select a sample of papers from his larger sample and then apply the same criteria as applied by Cook et al. One could then assess whether the larger sample was likely to produce a different result. The second thing would be to do the same – but individually – for some of the different disciplines. This would indicate whether there was a discipline dependence.
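
To make that concrete, here is a minimal sketch in Python of the kind of check I mean, using entirely made-up ratings and discipline labels (I obviously don’t have Richard Tol’s larger sample to hand): draw a random sub-sample from the larger pool, rate it with the same criteria, and compare the endorsement fraction with that of the full sample, both overall and per discipline.

    import random
    from collections import defaultdict

    # Hypothetical rated papers from the *larger* search: (discipline, rating).
    # In a real check the ratings would come from applying Cook et al.'s
    # criteria to abstracts returned by the broader query.
    papers = [("oceanography", "endorse"), ("economics", "no position"),
              ("meteorology", "endorse"), ("geosciences", "reject")] * 250  # toy data

    def endorsement_fraction(sample):
        """Fraction of position-taking abstracts that endorse AGW."""
        taking_position = [rating for _, rating in sample if rating != "no position"]
        if not taking_position:
            return float("nan")
        return sum(rating == "endorse" for rating in taking_position) / len(taking_position)

    # 1. Overall: compare a random sub-sample against the full larger sample.
    random.seed(1)
    subsample = random.sample(papers, 200)
    print("full sample:", endorsement_fraction(papers))
    print("random sub-sample:", endorsement_fraction(subsample))

    # 2. By discipline: does the endorsement fraction depend on the discipline?
    by_discipline = defaultdict(list)
    for discipline, rating in papers:
        by_discipline[discipline].append((discipline, rating))
    for discipline, group in sorted(by_discipline.items()):
        print(discipline, endorsement_fraction(group))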

If Richard Tol did this and showed that the results were quite different, then maybe his argument would have some merit. But saying that the “sampling strategy is a load of nonsense” simply because he can get a different sample by doing a different search, and because that different search produces a different disciplinary distribution, doesn’t really have much merit without a study showing how the results are affected by these differences. One of the most extreme variations between the two searches seems to be for his own papers. The Cook et al. search produced 10 papers by Richard Tol, while Richard Tol’s search produced almost 90 of his own papers. Given that Richard Tol is actually an economist, it seems quite reasonable to me that only a few of his papers are relevant to a study of whether climate science papers endorse AGW.


21 Responses to Watt about Richard Tol?

  1. Hello,

    Richard published a draft:

    https://twitter.com/RichardTol/status/341144213162962945

    Comments welcome.

  2. Rachel says:

    I think the link Willard meant to provide is this one – http://richardtol.blogspot.co.uk/2013/06/draft-comment-on-97-consensus-paper.html

    Honestly though, how much fuss is there about this paper? I agree that, rather than sit down and cry about it, deniers should conduct their own study and provide some concrete evidence for their position. I think they would get the same result: although the methodology in this study was different from that of the Oreskes study, both ended up with more or less the same results. Have there been any others, do you know?

  3. Thanks. Will have a look.

  4. Yes, I did find the link eventually. I agree. It’s one thing to point out some potential issues with a study, but if you want to claim that it is horribly flawed, you would typically need to do some study of your own to show how the results will change if the design of the study is changed.

  5. The published data say 98%. The paper says 97%. The difference is due to data that are hidden from the reader. 40 papers were reclassified into a fourth rating. The paper is vague about whether this was 40 out of 1000 (and so 319 out of 7970) or 5 out of 1000 (and so 40 out of 7970). The latter case corresponds to 97%, the former to 91%.

    Reworking data violates the survey protocol.

  6. Thanks for commenting. Here’s how I see it. The Cook et al. paper claims that there were 4014 (3896 + 78 + 40) articles – from their sample – whose abstracts took some position with regard to AGW. Of those papers, 97.1% endorsed AGW. Your claim seems to be that the 40 articles that were uncertain should not be included, as they were determined by reclassifying papers that were originally classified as having no position. This, you claim, violates the survey protocol. I’m not an expert in designing and carrying out surveys, so maybe there are some strict rules that they have violated. Given that they explain this extra process in the paper and that it has been peer-reviewed, it seems unlikely that it is something that is absolutely wrong. I’m happy to be corrected, but it seems to me that your view is somewhat extreme.

    I will grant you something though. Having reread parts of the Cook et al. paper, I’m now slightly confused about the 40 articles that are ‘uncertain’. It isn’t clear if this is 40 out of the randomly selected sample of 1000 (and hence 319 out of 7970) or 5 out of 1000 (and hence 40 out of 7970). As you say, this makes it uncertain as to whether the headline figure is 97% or 91%. Ideally, they should have made this clearer.
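
    To spell out the arithmetic behind those two figures, here is a quick sketch using the counts quoted above (3896 endorsing and 78 rejecting, with either 40 or 319 uncertain, 319 being the reading in which 40 out of the sub-sample of 1000 is scaled up to the 7970 ‘no position’ abstracts):

        endorse, reject = 3896, 78

        for uncertain in (40, 319):  # 40: as published; 319: if 40/1000 is scaled up to 7970
            total = endorse + reject + uncertain
            print(uncertain, round(100 * endorse / total, 1))
        # gives roughly 97.1 and 90.8 (i.e. ~97% and ~91%)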

    I have read some of your draft response to Cook et al. I was going to write a post about this, but maybe I’ll just add some comments here. I do find that quite a bit of what you say in the draft isn’t particularly well justified. Why, for example, were there certain papers that had to be rated as neutral (I didn’t quite understand how you defined or determined this)? The significance of the discipline dependence also wasn’t quite clear.

    I also found the last paragraph of your conclusion rather unfortunate. You seem to be saying that the consensus they get is correct and that there is no doubt that the literature on climate science overwhelmingly supports the hypothesis that global warming is caused by humans, and yet you state that for the Cook et al. study the “conclusions are unfounded” and you accuse the Cook et al. authors of being “secretive” and “incompetent”. To me, this comes across as very adversarial and is not a style that I would recognise in my field. It’s one thing to accuse others of being incompetent in discussions with colleagues. It’s quite another thing to do it in published work. In my opinion, this reflects more negatively on you than on them and doesn’t make a particularly positive contribution to the discussion.

  7. I don’t know where the 40 comes from. It is not in the data. It is rather strange that you rank 12,000 papers on a scale from 1-7, and then go back to rerank 1,000 papers on a scale 1-4a,b-7.

    I have three choices:
    a. shut up
    b. destructive comment
    c. constructive comment

    a. is wrong
    c. is not an option. I don’t have the resources to redo what they did, and I think it is silly to search a large number of papers that are off-topic; there are a number of excellent surveys of the relevant literature already, so there is no point in me replicating that.

    that leaves b

  8. Fair enough. At least you’re honest 🙂 I do think that this is a rather unfortunate manner in which to respond and does go against what I would normally regard as standard scientific/academic discourse. However, I guess there aren’t actually any rules, so – in some sense – anything goes.

  9. On Twitter, Dana has confirmed that it was 5 out of 1000, extrapolated to 40 out of 7970. Although this could have been clarified in the paper, it would suggest – to me – that the study indicates that 97% of articles stating a position on AGW endorse AGW.

  10. Dana’s been saying all sorts of things.

    I’ve asked for the data. If true, it is easy to demonstrate.

    I’ve also asked for the survey protocol to see whether the fourth rating was planned from the beginning.

  11. I would tend to agree that making data available so that others can check is the right thing to do.

    Having said that, however, you do appear to have made up your mind. You’ve referred to the survey strategy as a “load of nonsense”, the results are “unfounded”, and accused the authors of being effectively “incompetent”. That’s without seeing the data. Is it likely that you would look at the data objectively and in an unbiased way? Possibly, but from what you’ve written about this already, it’s certainly my opinion that it’s highly unlikely.

  12. Dear Wott,

    Richard did make up his mind quite soon, when he said that the study undersampled seminal papers:

    But we can see that Richard unmade (down?) his mind too, when he realized that the study oversampled seminal papers:

    We can observe that, in this case, Richard’s mind has behaved in a homoscedastic manner.

  13. Dear Richard,

    I don’t think your trichotomy captures your options very well. You don’t have to reproduce Cook & al’s experiment to satisfy your (c). That is, you have forgotten about this option:

    (c2) Prescribe how to redo that research by clearly stating a specification you’d consider valid.

    You do have the resources to do that. Or at least, you choose to invest your resources in less constructive endeavours. See for instance this morning’s tweet, where you took the trouble to find duplicate records in the data.

    No such specification of a study that would satisfy your demands closes your formal comment. Thus, your criticism offers no explicit way to improve future research. Your focus is on destroying a paper you consider silly, and that’s about it. This behaviour, and your own testimony to Wott here, are not reconcilable with a declaration of interest from someone who, presumably (h/t your own comment), shares your concerns about “correct methods” or the “quality of research”.

    Instead of focusing on these concerns, your comment begins with an editorial on the consensus argument, an editorial that has nothing to do with your comment. An editorial that clearly reveals, if we’re to follow its conclusion, that you are just doing a political hit job. To that effect, please consider including your affiliation to the GWPF, at least for honesty’s sake.

    You do have a choice, Richard. You should own what you’re doing. And what you’re doing right now does not hold up to the implicit standards to which you are holding your target.

    I still hope you will take the necessary steps to correct that situation. To that effect, I will continue to tweet my comments about your drafts.

  14. Bernard J. says:

    Perhaps Richard Muller and BEST could have a bash at it. Surely if they found the same result the denialists would finally accept it.

    Oh, that’s right…

  15. Bernard Murphy says:

    I am a follower of many climate scientists, environmental journalists, etc. on Twitter, including both Dana Nuccitelli and Richard Tol. What appears to have been the driver in this debacle is Dana basically calling Richard a denier because he dared to question the methodology used in Cook et al. If Dana is really starting to believe that anyone who questions science is now somehow “anti-science” or a “science denier”, then he really needs to take a step back, or better still, in the interests of scientific discourse as it was intended to be, he needs to take a break.

  16. And Richard Tol appears to be trying to publish a paper in which he claims the survey strategy used by Cook et al. is “complete nonsense”, the results are “unfounded” and the authors are “incompetent”. Bit of a chicken-and-egg situation going on here, I think.

  17. Bernard Murphy says:

    I actually agree that Richard will not come out of this sorry saga smelling of roses either.

  18. Tom Curtis says:

    wottsuptwiththatblog says of Tol, “At least you’re honest”

    You are giving Tol more credit than he deserves. His claim that he does not have the time for a constructive criticism is sheer bunk.

    Taking one example, he corrected his first draft claim that:

    “In fact, the paper by Cook et al. may strengthen the belief that all is not well in climate research. For starters, their headline conclusion is wrong. According to their data and their definition, 98%, rather than 97%, of papers endorse anthropogenic climate change. While the difference between 97% and 98% may be dismissed as insubstantial, it is indicative of the quality of manuscript preparation and review.”
    (My emphasis)

    by adding the footnote that:

    “1 Cook et al. arrive at 97% by splitting the neutral rate 4 into 4a and 4b, but only for 1,000 of the 7,970 papers rated 4; data are not provided. It is unclear whether they found 40 in the sample of 1,000, or 5 and scaled it up to 40 for the 7,970 neutral abstract. If the former is true, then 319 should have been reclassified. The headline endorsement rate would be 91% in that case. No survey protocol was published, so it is unclear whether the 4 ad hoc addition.”

    So, on the evidence available to him, all he knows is that Cook et al’s headline result may be the result of a correct projection from a subsidiary survey, and is therefore in no way indicative of poor manuscript preparation or review. These details, however, are consigned to a footnote, while the original attempt at condemnation remains in the body of the text. The difference between 97% and 98% is only indicative of Tol’s failure to read the paper accurately, and his insistence on retaining the original text clearly marks his intention to condemn regardless of the merits of his argument.

    His proper course of action, given the additional information, should have been to remove the original paragraph from the manuscript. Discussion of the “issue”, if included, should have been consigned to an additional item in the body of the text. Even then, the unverified suggestion that Cook et al failed to perform a simple and obvious projection from the subsidiary survey is unwarranted. Cook et al should have made their method clearer by including the data from the subsidiary survey in the SI, but that is a quibble having no impact on the headline result. Pointing this out would have taken no more time or effort than Tol’s chosen course of retaining the implicit slur while adding a footnote that completely undercuts the point he tries to make.

    Another example is Tol’s comment that:

    “The Web of Science provides aggregate statistics for any query results. Figure 2 compares the disciplinary composition of the larger sample to that of the smaller sample. There are large differences. Particularly, the narrower query undersamples papers in meteorology (by 0.7%), geosciences (2.9%), physical geography (1.9%) and oceanography (0.4%), disciplines that are particularly relevant to the causes of climate change.”
    (My emphasis)

    This restrained comment contrasts with his clear statements, in other cases, that a detected skew in the sample is likely to bias the results in favour of endorsements, e.g.:

    “The data behind Figures 3 and 4 suggest that the smaller sample favoured influential authors and papers, who overwhelmingly support the hypothesis of anthropogenic climate change.”

    The reason for the restraint in the former case is revealed in an email to me in which Tol states:

    “Cook et al. undersampled meteorology, oceanography, and geophysics journals, which suggests that they underestimated endorsement.”

    It is evident that when Tol discovers a skew in the sample that he thinks will bias the result in favour of endorsement, he says so up front. In contrast, when he thinks the skew will bias the result against endorsement, he merely mentions the skew and not (what he considers to be) the probable consequences. Again, it takes no more effort to mention a negative bias than it does to mention a positive bias. The negativity, then, is by construction. It represents a deliberate policy based on political intentions, not time constraints.

    A third example comes from his analysis of the skewness of the sample relative to disciplines in WoS. Using data Tol has provided me, I have estimated the number of papers in the Cook et al survey from disciplines which are over-represented relative to Tol’s preferred search terms (5883) and those which are under-represented (5985). (The sum is 76 less than the 11,944 papers rated but not excluded as per Cook et al. This is due to rounding errors and the fact that some disciplines are not represented in both samples, making scaling of the results difficult. The difference should not be significant.) It is also possible to estimate the number of excess abstracts in disciplines which are over-represented (1711) and those which are under-represented (1714).

    These data should have been included by Tol in his analysis. The near equality of the figures means it is almost impossible that the skew in subjects has resulted in a bias in the headline result. In fact, given that the subjects which are over-represented cannot have more than 100% endorsements (excluding abstracts rated 4), it is impossible for papers from subjects that are under-represented to have less than 96% endorsements in aggregate. That means that the maximum variation in endorsement percentages resulting from the skewness Tol draws attention to is between 97.4% and 98.6%.

    This is something highly relevant to Tol’s critique of Cook et al. It only takes about half an hour to calculate these facts, so Tol’s failure to do so is not due to time constraints. Again the simplest explanation is a straightforward bias towards including only negative criticisms; and towards excluding context that allows assessment of the impact of those criticisms.
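
    To illustrate the kind of calculation involved, here is a sketch of the reweighting arithmetic with placeholder discipline names, counts and rates (not the actual Cook et al or WoS figures, which I have not reproduced here): reweight each discipline’s endorsement rate by its share in the broader query and see how far the headline figure moves.

        # Sketch of a reweighting check with placeholder numbers (not real data):
        # does a disciplinary skew move the headline endorsement rate?
        sample = {
            # discipline: (position-taking abstracts in the rated sample, endorsement rate)
            "meteorology": (300, 0.98),
            "geosciences": (900, 0.97),
            "economics":   (200, 0.96),
        }
        # Hypothetical share of each discipline in the broader query.
        broad_share = {"meteorology": 0.35, "geosciences": 0.45, "economics": 0.20}

        n_total = sum(n for n, _ in sample.values())
        as_sampled = sum(n * rate for n, rate in sample.values()) / n_total

        # Reweight each discipline's rate by its share in the broader query.
        reweighted = sum(broad_share[d] * rate for d, (_, rate) in sample.items())

        print(f"as sampled: {as_sampled:.3f}, reweighted: {reweighted:.3f}")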

    A fourth and final example comes from Tol’s new and unsurprising discovery that the self-rating respondents do not match the rated papers in proportion. Unsurprising, because people with strong positions (endorsement or rejection) are more likely to want their opinions registered and hence more likely to respond. Given this, the result is as likely to reflect a bias in the rate of response as to show that the abstract ratings are in error. The direct comparison between abstract ratings and self-ratings is not straightforwardly projectable, but it does clearly show the abstract ratings to be conservative, ie, biased towards a rating of (4).

    Though they are not straightforwardly projectable, we can project them on the assumption that the self-ratings are representative. Doing so shows that if there were no skewness between abstract and self-rating numbers, the abstract ratings would have reported 96.6% endorsing the consensus, with 3.4% rejecting or uncertain about the consensus. In other words, the skewness identified by Tol would have had an impact of only 0.5% on the headline result. Again, calculating this result is straightforward and requires minimal time. Reporting it would have been very useful in placing the skewness reported in table 5 of the paper in context, but it destroys that data as a useful negative talking point. Therefore, Tol could not find the time for this simple analysis.

    These four examples do not address the major flaws in Tol’s critique. In fact, were I to do so, it would be simple to show that Tol’s critique is based on superficial data analysis and a fundamental misunderstanding of basic terms in the paper. These examples show, however, that the negativity of Tol’s critique is based on a predetermined desire to undermine the paper, whose results he finds politically inconvenient. His choice to be destructive in his criticism is not because of time constraints, but because he needs to generate and disseminate “talking points” that allow those so inclined not to think about the implications of Cook et al.

    That clear motive, evident in both his tweets and his comment, strongly suggests that corrections of his errors will not be incorporated into his comment. Certainly his comment will not include estimates of the likely impact of the skewness he identifies on the headline result, except where (as with the footnote mentioned in the first example) absurd suppositions allow him to quote a large impact.

  19. Tom Curtis says:

    Bernard, the “driver” of this debacle is Tol’s initial intemperate and unjustified criticism of Cook et al. His opening sally was to treat a low sample poll (three scientists) from a biased sample as more significant than the self ratings reported in the paper. His analysis has not improved.

  20. Yes, I noticed some of those tweets. Seemed that Richard had accepted an error in his analysis and then seemed to ignore that he had made such an acknowledgement. Found it all a little confusing.

  21. Bernard Murphy – that is not an accurate representation of the series of events. On Twitter, Tol had misrepresented our study (equating the abstract ratings with author self-ratings), then re-Tweeted deniers like Marc Morano misrepresenting our paper. As a result of this behavior, I commented that “I didn’t peg [Tol] as a denier before”, my intent being to note that he was behaving as a denier (which he clearly was).

    It’s not remotely accurate to say I believe that anyone who questions science is anti-science or a science denier. Had Tol said “I think Cook et al. misclassified these papers”, I would have been happy to explain the situation to him. That would be questioning science. That’s not what Tol did – he misrepresented our paper (and I first asked him if he had read it, giving him the benefit of the doubt that he was just misinformed by secondhand information) and then encouraged various denier websites’ misrepresentations of our paper.

Comments are closed.