Robert G. Brown (who posts as rgbatduke) has a new “sticky” post at Watts Up With That (WUWT) called “the ensemble of models is completely meaningless, statistically”.
What Robert G. Brown says in this post is:

by forming a mean and standard deviation over model projections and then using the mean as a “most likely” projection and the variance as representative of the range of the error, one is treating the differences between the models as if they are uncorrelated random variates causing deviation around a true mean!
What I take this to indicate is that if you have a set of different models, each with a most-likely projection, you can’t simply average those projections to get a mean (which you then interpret as the most likely result) and use the variance across the models to determine the error in the model estimates. Indeed, I would agree with this. It is not the correct way to determine either the most likely result or the likely range.
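To make the objection concrete, here is a minimal sketch of the procedure being criticised. The four projection values are made up purely for illustration and don’t come from any real model:

```python
import numpy as np

# Hypothetical "most likely" warming projections (deg C) from four
# structurally different models -- illustrative numbers only, not taken
# from any real model output.
model_projections = np.array([2.1, 2.8, 3.4, 4.0])

# The procedure being criticised: treat the four projections as if they
# were independent random draws scattered around a single true value.
naive_mean = model_projections.mean()
naive_std = model_projections.std(ddof=1)

print(f"naive ensemble mean: {naive_mean:.2f} C")
print(f"naive ensemble std:  {naive_std:.2f} C")

# The standard deviation here measures how much the models disagree with
# one another. Because the models share physics, code and inputs, their
# errors are correlated, so nothing guarantees the true value lies within
# mean +/- std, which is what reading the spread as an "error" implies.
```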
What would be the correct procedure? Well, as far as I understand it, you would normally consider a single model. That single model will involve various calculations associated with the known physics and chemistry, and various parameters that have a range of uncertainty. A single run of the model will produce a single result for a given set of parameters. If the parameters are uncertain (as they almost certainly are) then one would normally run the model many times with different choices of parameters. One can then combine the different results to get a mean and a range of likely results. Of course, some parameter values are more likely than others, so some model results are more likely than others, and this has to be taken into account when combining the results.
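As a rough illustration of that procedure, here is a sketch of a perturbed-parameter ensemble for a single, deliberately toy model. The model, the log-normal parameter distribution, and all the numbers are assumptions invented for the example; sampling the parameter from its assumed distribution is one simple way of giving more likely parameter values more weight:

```python
import numpy as np

rng = np.random.default_rng(42)

def toy_model(sensitivity, co2_ratio=3.5):
    """A deliberately toy stand-in for a climate model: equilibrium
    warming for a given CO2 ratio, driven by one uncertain parameter."""
    return sensitivity * np.log2(co2_ratio)

# Assumed distribution for the uncertain parameter (a hypothetical
# climate-sensitivity-like number, median 3 deg C per doubling).
# Sampling from this distribution means more likely parameter values
# appear more often, so the runs are weighted accordingly.
sensitivity_samples = rng.lognormal(mean=np.log(3.0), sigma=0.25, size=10_000)

# One model run per parameter draw.
results = toy_model(sensitivity_samples)

# Because every run really is a draw from the same distribution, the
# spread of these results is a defensible uncertainty range for this
# one model.
mean = results.mean()
lo, hi = np.percentile(results, [5, 95])
print(f"most likely result: {mean:.2f} C")
print(f"90% range:          {lo:.2f} to {hi:.2f} C")
```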
When I look at the typical figure from the leaked draft of the IPCC’s AR5 report – shown below – I see four different models, each with a range indicating the uncertainty in that model’s predictions. I don’t know for certain that they’ve done the mean and error calculations correctly, but this does seem consistent with what I would expect.
I don’t know if all climate model predictions published in the literature have done their error analysis correctly (it is something that scientists do get wrong at times), but I’m unaware of an example where they’ve simply averaged the mean results from a large ensemble of different models. Actually, I have seen this somewhere. Where was it? Oh yes, it was in a WUWT post by Roy Spencer called “Climate modelling epic FAIL – Spencer: the day of reckoning has arrived”. In this post Roy includes the figure below, which appears to be simply an ensemble of 73 model runs that have been averaged to get a mean, with the implication that the range of the different results gives an indication of the error.
I think I agree with Robert G. Brown that doing what Roy Spencer has done in order to compare measurements with model results is an “abuse of statistics”. It’s good to see WUWT including posts that criticise bad practice on their own pages. To be honest, I would quite like to see more of this.