Yamal Emulation I
UPDATE: Not following? Try Yamal III, a summary and update.
Steve McIntyre has once again stirred the hornet’s nest of online climate change denial with a hasty modification of the Yamal tree ring data published by Keith Briffa and colleagues in 2008 as part of a paper in Philosophical Transactions of the Royal Society (Phil. Trans. R. Soc. B (2008) 363, 2271–2284). Normally, I ignore McIntyre’s blog because of the juvenile name calling, repetitive nonsense, and the general misunderstanding of huge swaths of proxy paleoclimatology. However, I knew when Roger Pielke Jr. jumped in with support for his collaborator, it merited some attention [insert smiley face emoticon here].
Here, I’m actually interested in the data and the science. The first thing was to emulate the steps that McIntyre had performed (an audit, if you will), leaving aside for the moment whether they are even proper steps from a data point-of-view. McIntyre has rolled his own Regional Curve Standardization code in R, strangely eschewing the freely available software used by dendrochronologists, so I wanted first to ensure there was no significant error in his approach.
I downloaded the original Yamal data from here, and the Khadyta River data from the ITRDB here. I used ARSTAN to first emulate the original chronology used in Briffa et al. 2008. My regional curve standardized chronology differed slightly from the published version available here, probably because Briffa et al. 2008 used a time-varying spline for the regional curve, but the main features, including the increasing values in the 20th century, are essentially the same. All these data and programs are publicly available. You can check these results for yourself.
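For readers who have not met regional curve standardization before, here is a minimal Python sketch of the basic idea. It is not ARSTAN and it is not McIntyre's R code. It assumes every series starts at the pith (cambial age 0 in its first year) and it uses the raw mean ring width at each age as the regional curve, with no smoothing or time-varying spline. The function name `rcs_chronology` and the input layout are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean


def rcs_chronology(series):
    """Toy regional curve standardization.

    series: dict mapping a tree id to (first_year, [ring widths]).
    Assumption: the first measured ring has cambial age 0 (no pith offset),
    and all ring widths are positive.
    Returns a dict mapping calendar year -> mean ring width index.
    """
    # Step 1: align all trees by ring age and average the widths at each age.
    # This unsmoothed mean stands in for the regional curve.
    widths_by_age = defaultdict(list)
    for first_year, widths in series.values():
        for age, width in enumerate(widths):
            widths_by_age[age].append(width)
    regional_curve = {age: mean(ws) for age, ws in widths_by_age.items()}

    # Step 2: divide each measurement by the expected width for its age,
    # then file the resulting index under its calendar year.
    index_by_year = defaultdict(list)
    for first_year, widths in series.values():
        for age, width in enumerate(widths):
            index_by_year[first_year + age].append(width / regional_curve[age])

    # Step 3: the chronology is the mean index in each calendar year.
    return {year: mean(v) for year, v in sorted(index_by_year.items())}


# Two made-up trees with the same age trend, starting one year apart.
example = {"tree_a": (1900, [2.0, 1.0]), "tree_b": (1901, [2.0, 1.0])}
chronology = rcs_chronology(example)
```

Because a single curve is shared by all trees, adding a new batch of series changes both the regional curve and the yearly averages, which is why the choice of which sites to pool matters.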
I then added the Khadyta River raw data (which shows evidence of the ‘divergence problem’) to the set of raw Yamal data, and recalculated the master chronology using regional curve standardization (because I am positive that McIntyre would insist on using all the data). Again, I am not yet addressing here whether it is appropriate to add these data or appropriate to not also add other or different data. Here is a comparison of the two versions:
Devastating, I know.
The real differences of course arise at the end, where the modern, relatively short series from Khadyta influence the final chronology.
Adding the Khadyta River series reduces the level of the chronology through the 1970s and 1980s and into the early 1990s, when those data end. But if one includes both data sets, the series terminates similarly to the original Yamal chronology, of course (because the last few years are only present in the modern trees from Yamal). These changes are potentially important, and the actual scientific questions are interesting (as opposed to the political expedience of selecting certain findings to attack one’s political enemies). But the actual impact on the chronology is still far less than being implied by non-scientist partisans on one side. Why is that?
Part of the difference appears to be McIntyre’s use of a 21-year Gaussian low pass filter. The issue of how to smooth data series to avoid misleading end effects is not a trivial one. I can replicate the strong upturn in the modern era in McIntyre’s graph by using reflected end points. This creates the illusion of a massively unprecedented rise in ring width:
But as the close-up view shows, one influence of the filter is such that it helps create the appearance of a massive rise, when annual values in mid-century are actually similar to those in the late 20th century.
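To make the end-point issue concrete, here is a small Python sketch of a 21-point Gaussian filter with reflected end points. Several details are my assumptions rather than anything taken from McIntyre's code: the kernel width (`sigma` set to one sixth of the window length), the plain mirror-in-time padding that does not repeat the end value, and the function names `gaussian_weights` and `smooth_reflected`.

```python
import math


def gaussian_weights(n_points=21, sigma=None):
    """Normalized Gaussian weights for an odd-length window.

    Assumption: sigma defaults to n_points / 6, so the window spans
    roughly three standard deviations on each side.
    """
    if sigma is None:
        sigma = n_points / 6.0
    half = n_points // 2
    raw = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-half, half + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def smooth_reflected(values, n_points=21):
    """Gaussian low pass filter with mirrored padding at both ends."""
    half = n_points // 2
    if len(values) <= half:
        raise ValueError("series is too short to reflect for this window")
    weights = gaussian_weights(n_points)

    # Pad each end with the nearest `half` values in reverse order, so the
    # filter "sees" a mirror image of the series beyond its first and last
    # points. These padded values are not observations.
    left = values[half:0:-1]
    right = values[-2:-half - 2:-1]
    padded = list(left) + list(values) + list(right)

    # Weighted average over the window centred on each original point.
    return [
        sum(w * padded[i + k] for k, w in enumerate(weights))
        for i in range(len(values))
    ]


# A steadily rising made-up series: away from the ends the filter returns
# the ramp itself, but near the ends the mirrored padding takes over.
ramp = [float(x) for x in range(30)]
smoothed = smooth_reflected(ramp)
```

The last ten smoothed values depend partly on padded values invented from the final ten real years, so a few unusual years at the end of a chronology shape the smoothed curve more than the same years would in mid-series. Other padding choices would give a different ending to the same data.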
There are actually interesting scientific questions (as opposed to the utterly uninteresting partisan griping) at play here that deal with the ‘divergence problem’. I’ll address those in the next post.
UPDATE [10/06/09]: I should emphasize that this isn’t a comparison of standard RCS software vs. McIntyre’s home grown code. I might fire up R and do that comparison at some point, but I expect any differences to be minor.
UPDATE [10/09/10]: In case it still isn’t clear, my point about smoothing is not that there is anything wrong per se with a 21 point Gaussian filter using reflected endpoints. Rather, I’m pointing out one of the reasons that the initial graphs, posted at Climate Audit but that I first saw at Deltoid, convey such a dramatic rise in the last several years compared to mid-century is the behavior of this particular method.