Keith Briffa and Tom Melvin have posted an interesting and thorough examination of the Yamal data here:
This now supersedes much, if not all, of the additional analysis I had considered for possible future posts.
First off, thanks to delayed.oscillator for inviting me to participate. I’m excited to be involved. Secondly, please forgive some of the formatting – I’m still new to this whole blogging thing. And away we go …
A recent paper (Dengel et al), published in the usually rigorous journal New Phytologist, purports to show a statistically significant positive correlation between Galactic Cosmic Ray (GCR) fluxes and the growth of Sitka spruce (Picea sitchensis) from a plantation or managed forest in Scotland. Their hypothesis is as follows: high GCR fluxes stimulate aerosol nucleation in the atmosphere, leading to higher aerosol concentrations. The increased aerosol concentrations increase scattering of radiation, raising the ratio of diffuse to direct radiation. The increased diffuse radiation penetrates more deeply into tree canopies, stimulating photosynthesis and growth.
Others have already shown that links between GCR fluxes and the Earth’s climate are quite dubious. Here we show that the conclusions drawn by the authors of this study are likely based on errors in their statistical analysis and faults in their study design.
The primary evidence in this paper for the GCR/diffuse radiation link to tree growth rests on two Pearson correlations. One is a correlation between tree growth and March-August total diffuse radiation (n=45, r=+0.29, P=0.05). The second is a correlation between tree growth and GCR fluxes (n=45, r=+0.39, P=0.008). On the surface, both correlations appear to pass the traditionally accepted significance level of P<=0.05, supporting the authors' hypothesis.
The authors use a nominal sample size equal to 45, the total number of years in their dataset. This is fine, assuming that the underlying observations represent independent samples, e.g. a ‘white noise’ time series. To test this, we digitized the tree ring growth anomaly time series from the paper and estimated the lag-1 autocorrelation. If each observation were independent, the autocorrelation should be near 0. For this series, the autocorrelation is actually quite high (r=0.484) and significant (P=0.0008), indicating this time series has significant persistence (‘red’). The individual observations are not independent, and the significance of a Pearson correlation with n=45 will be overestimated.
This is easy enough to account for, however, by computing an effective sample size:
$$n' = n\,\frac{1 - r}{1 + r}$$
where n is the original sample size, r is the lag-1 autocorrelation of the underlying time series, and n' is the new (effective) sample size. Plugging in n = 45 and r = 0.484, we find that the Sitka spruce tree growth time series has an effective sample size of only about 16 (15.65 before rounding), instead of 45.
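The lag-1 autocorrelation and the effective-sample-size adjustment can be sketched in a few lines of Python. This assumes the common lag-1 estimator (products of successive deviations from the series mean, divided by the total sum of squares). The function names are hypothetical, and because the digitized series is not reproduced here, the example plugs in the reported r = 0.484 directly.

```python
def lag1_autocorrelation(x):
    """Lag-1 autocorrelation: sum of products of successive deviations
    from the mean, divided by the total sum of squared deviations."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    num = sum(dev[t] * dev[t + 1] for t in range(n - 1))
    den = sum(d * d for d in dev)
    return num / den


def effective_sample_size(n, r1):
    """n' = n * (1 - r1) / (1 + r1) for a series with lag-1 autocorrelation r1."""
    return n * (1.0 - r1) / (1.0 + r1)


# Values reported in the text for the Sitka spruce growth anomaly series
n_eff = effective_sample_size(45, 0.484)
print(round(n_eff, 2))  # 15.65, i.e. about 16
```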
This change will not influence the actual correlation, but it will affect the test statistic used to determine statistical significance. For both correlations, we can recalculate the test statistic and significance level with the new effective sample size of 16:
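Correlation with GCR flux (r = +0.39):
Original: n = 45, t statistic = 2.7773, P = 0.008
Revised: n = 16, t statistic = 1.5647, P = 0.139
Correlation with diffuse radiation (r = +0.29):
Original: n = 45, t statistic = 1.9870, P = 0.050
Revised: n = 16, t statistic = 1.1195, P = 0.281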
In both cases, the correlations no longer meet the conventional criterion (P<=0.05) for statistical significance.
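The revised values can be checked with a short Python sketch. It uses the usual test statistic for a Pearson correlation, t = r·sqrt(n − 2)/sqrt(1 − r²) with n − 2 degrees of freedom, and integrates the Student's t density numerically (Simpson's rule) so that no statistics library is needed; scipy.stats.t would serve equally well. The t values above are matched closely only when the unrounded effective sample size (about 15.65) is used: with n' = 16 exactly, the GCR t statistic comes out near 1.58. That is an inference from the arithmetic, not something stated in the text, and computed P values may differ from the table in the third decimal.

```python
import math


def pearson_t(r, n):
    """t statistic for a Pearson correlation r with sample size n (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def t_pdf(x, df):
    """Student's t probability density; df may be non-integer."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=2000):
    """Two-sided P value, 1 - 2 * (integral of the t density from 0 to |t|),
    using composite Simpson's rule (steps must be even)."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * total * h / 3.0


# Unrounded effective sample size, n' = 45 * (1 - 0.484) / (1 + 0.484)
n_eff = 45 * (1 - 0.484) / (1 + 0.484)

for label, r in [("GCR", 0.39), ("Radiation", 0.29)]:
    for n in (45, n_eff):
        t = pearson_t(r, n)
        print(f"{label:<10} n={n:6.2f}  t={t:.4f}  P={two_sided_p(t, n - 2):.3f}")
```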
It is also very clear from reading the paper that the researchers were considering many, many possible associations, looking for a statistical correlation between the tree rings and some variable regardless of whether there was a strong a priori theoretical basis to expect one. Some of their correlations make sense: correlating growth with boreal summer temperatures concurrent with the growth year, for example, is reasonable, since trees growing in Scotland could plausibly be sensitive to temperatures during the growing season. Others are more tenuous, as when the authors attempt to correlate tree growth with diffuse radiation from the previous year.
In cases where one is data mining (as in this paper), it is important to guard against significant correlations that may arise simply from the sheer number of statistical tests made. Conceptually, it helps to think of the significance threshold as the chance of a 'false positive': the chance that a correlation passes the test by chance alone when there is no real relationship. For most purposes, a threshold of P=0.05 (a 5% chance of a false positive) or less is considered acceptable. Each additional test, however, adds another opportunity for this type of error. So, if two independent tests are conducted at P=0.05, the chance of at least one false positive is actually 9.75% (1 - 0.95^2). In the case of total diffuse radiation, Dengel et al compared their tree growth time series against total diffuse radiation for each month of the current and previous growth years, a total of 24 tests (a 70.8% chance of at least one false positive, again assuming independent tests). This makes it very likely that some of the correlations will be significant by chance alone.
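The arithmetic behind the 9.75% and 70.8% figures fits in a short Python sketch. Like the text, it assumes the tests are independent; the function name is hypothetical.

```python
def familywise_error_rate(alpha, k):
    """Chance of at least one false positive among k independent tests,
    each run at significance level alpha: 1 - (1 - alpha) ** k."""
    return 1.0 - (1.0 - alpha) ** k


for k in (1, 2, 24):
    print(k, round(familywise_error_rate(0.05, k), 4))
# 1 0.05
# 2 0.0975
# 24 0.708
```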
To adjust for this, researchers may use some flavor of what is called a Bonferroni correction, which modifies the significance threshold to account for multiple comparisons. One simple way to do this is to divide the original threshold by the number of tests conducted, raising the burden of proof for accepting a significant result. In the diffuse radiation example (tree growth versus diffuse radiation), a threshold of P=0.05 becomes P=0.05/24≈0.002. In other words, to keep the overall chance of a false positive at 5%, each individual correlation must meet the much stricter threshold of P≈0.002. This burden is not met, either in the original analysis or with the adjusted sample size. The Bonferroni correction leads to fewer false positives (Type I errors), although it may increase the rate of false negatives (Type II errors).
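A minimal Python sketch of the simple Bonferroni adjustment described above, applied to the diffuse-radiation P values quoted in the text (the function name is hypothetical):

```python
def bonferroni_threshold(alpha, k):
    """Per-test threshold that keeps the chance of any false positive
    among k tests at or below alpha."""
    return alpha / k


threshold = bonferroni_threshold(0.05, 24)
print(f"per-test threshold: {threshold:.4f}")   # 0.0021

# Diffuse-radiation P values from the text: original n = 45 and effective n
for label, p in [("original n", 0.050), ("effective n", 0.281)]:
    verdict = "significant" if p <= threshold else "not significant"
    print(label, p, verdict)
```

Dividing by the number of tests is the most conservative variant; it guarantees the family-wise rate stays at or below alpha even when the tests are not independent.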
Aside from not accounting for these features of their data and analysis, there are also some methodological oddities in the study design, where Dengel et al deviate from standard practices in dendrochronology. First, they do not report many of the standard statistics used to assess the quality of a tree ring chronology, e.g. the mean interseries correlation (the common signal among trees) or the mean sensitivity (a measure of how variable or complacent the ring width series are). This makes it impossible for a reader to assess how well the trees correlate with each other, and whether the chronology really represents a common signal among the trees rather than simply an assemblage of noise.
Second, they appear to use essentially randomly sampled trees from a managed forest. Normally, to find a tree (or set of trees) with a strong climate response, researchers target trees where climate is the most limiting factor for growth. This often means trees near their climatic limits, where they may be stressed by temperature, moisture, or even radiation. Plantations or managed forests are typically not limited by these factors, because the goal is to manage growth for some purpose (e.g., timber, wildlife, etc.). Often this management can be quite intensive, involving fertilizer application, protection from pests, and thinning. Even at a location where tree growth is limited by climate, trees are not simply sampled at random to find a climate signal. In particular, trees that may be influenced by competitive interactions with other trees (shading, etc.) are generally avoided, because their rings will not necessarily best represent a response to larger-scale climate variability. Randomly sampling trees throughout an even-aged stand, as appears to have been done in this study, will likely produce a large-scale signal that is equivocal or, worse, misleading.
This is clearly evident in Figure 2 of the paper, where the authors attempted (largely unsuccessfully) to correlate tree growth against precipitation, temperature, and vapor pressure deficit. Dengel et al also use very short time series (only 45 years of growth), truncate the trees such that the juvenile growth is partially removed, and appear to use a relatively stiff spline in their detrending (potentially problematic in trees that may have experienced growth changes related to stand dynamics).
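For readers unfamiliar with the two chronology statistics mentioned above, here is a minimal Python sketch of how they are commonly computed. It assumes equal-length ring-width or index series that all cover the same years. Dedicated dendrochronology software handles partial overlap and other details this sketch ignores, and the function names are hypothetical.

```python
from itertools import combinations


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5


def mean_interseries_correlation(series):
    """Average Pearson correlation over all pairs of series
    (assumes every series covers the same years)."""
    pairs = list(combinations(series, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


def mean_sensitivity(widths):
    """Mean of |2 * (x[t+1] - x[t]) / (x[t+1] + x[t])| over successive rings."""
    terms = [abs(2 * (b - a) / (b + a)) for a, b in zip(widths, widths[1:])]
    return sum(terms) / len(terms)
```

A high mean interseries correlation indicates a strong common signal; a mean sensitivity near zero indicates 'complacent' rings that change little from year to year.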
The criticisms I have made are not particularly nuanced or obscure; they concern practices that are largely standard in climatology and dendrochronology. The lack of adherence to these practices likely led Dengel et al astray. I suspect their results will not be reproducible in other studies with a more typical design and analysis, but time will tell.
Please note that from now on I will summarily ‘unapprove’ any comment that wanders off topic from the post to which it is attached. I’ve previously stated I have no interest in revisiting battles that some of you have been fighting on the internet for years now. You have many venues for that. This place will be different. This is my house, and I will not tolerate it. Full stop.
A second editor and contributor, Transient Eddy, now joins the team at Delayed Oscillator. Please treat him with all the courtesy and respect you’ve directed toward me thus far.
Via Deep Climate, I found this post by Jeff Id at The Air Vent. Comments there and elsewhere lead me to believe there is some confusion about the related topic of regional curve standardization and why sample size matters in dendrochronology, and in dendroclimatology in particular. While this post is only an indirect commentary on Jeff Id's post, hopefully it will be more broadly useful or stimulate some interesting technical discussion. For more information on regional curve standardization, this book chapter [PDF] is currently your best bet.
Jeff Id fits two separate exponential growth curves to the most recent 12 trees in the Yamal chronology and to the full Yamal series, and notes that they are different. Let’s emulate this here. Let me note first of all that this is an emulation — the published Yamal series uses a time-varying spline fit that I haven’t integrated in my own code.
What I’ve done is align the full Yamal set (blue) and the most recent 12 ring width series (red) by age, assuming no pith offset (that is, assuming the innermost ring in the core or cross section was the innermost ring in the tree). The heavy lines are the mean regional curves. The black line is the Khadyta River mean regional curve.
There are a few interesting features, some of which I believe Jeff notes in his post. The more recent Yamal trees had a somewhat lower growth rate when they were young than the average of the full set of living and subfossil trees; however, they are not at all out of the range of the full Yamal population. The regional curve for just these twelve is therefore lower than the regional curve for the full population. Another feature to note is that the red line (the regional curve for the 12 trees alone) rises at 100 years of age and again near 300 years of age, since these correspond to the most recent, wider rings of several of the individual samples in this small set. Finally, note that the Khadyta River trees as a whole are relatively young, and their growth falls mostly below the regional curves for the full and recent Yamal subsets.
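A rough Python sketch of the age alignment and curve fitting described above follows. It assumes no pith offset and fits a simple exponential decay by least squares on log ring width. This is neither Jeff Id's code nor the time-varying spline used for the published Yamal series, and the function names are hypothetical.

```python
import math


def regional_curve(series_by_age):
    """Mean ring width at each cambial age.

    Each series is a list of ring widths ordered from the innermost ring
    outward, i.e. aligned by age assuming no pith offset."""
    max_age = max(len(s) for s in series_by_age)
    curve = []
    for age in range(max_age):
        widths = [s[age] for s in series_by_age if age < len(s)]
        curve.append(sum(widths) / len(widths))
    return curve


def fit_exponential(curve):
    """Least-squares fit of width = a * exp(-b * age) via log(width).

    A simplification: the modified negative exponential often used in
    dendrochronology adds a constant term, which this fit omits.
    All widths must be positive."""
    ages = range(len(curve))
    logs = [math.log(w) for w in curve]
    n = len(curve)
    mean_age, mean_log = sum(ages) / n, sum(logs) / n
    sxy = sum((x - mean_age) * (y - mean_log) for x, y in zip(ages, logs))
    sxx = sum((x - mean_age) ** 2 for x in ages)
    slope = sxy / sxx
    return math.exp(mean_log - slope * mean_age), -slope
```

Calling fit_exponential(regional_curve(subset)) separately for the full set and for a small recent subset yields two different (a, b) pairs, the kind of comparison discussed above.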
UPDATE: I’m adding here the spline curve fits
So what is the consequence of performing a separate RCS on the recent Yamal series only vs. the full Yamal set?
The 12-series-only chronology is somewhat noisier overall, since it also excludes 5 other tree ring series that extend into the second half of the 20th century but not all the way into the 1990s. The influence of sample size on chronology variance can probably also be seen in the 1600s, where the year-to-year variability is reproduced but the small number of series (prior to about 1660 there are only 3 cores) affects the variance. The influence of the different regional curves shown above is more difficult to detect, since it is intermingled with the influence of losing the other 5 cores, but slightly higher levels in the 12-tree-only chronology in the 1600s and parts of the 1700s might reflect it. Perhaps the most notable difference is that the recent-tree-only chronology is slightly lower than the full Yamal chronology starting in the early 1980s.
So what is going on? What you are seeing is the importance of overall sample size in the specific case of Regional Curve Standardization. In dendrochronology, and specifically when applying Regional Curve Standardization, sample replication serves two different (but complementary) purposes: (1) adequate sample replication overall, so as to accurately estimate the 'true' regional growth curve, and (2) adequate sample replication through time, so as to estimate the transient climate signal. Remember that the goal of regional curve standardization is to remove a common age-related growth trend while preserving low-frequency climate variability. To have any hope of doing this, you need a large number of trees whose actual periods of growth are well distributed over time, so that the climate signal of interest is not intermingled with the age-related growth trend. An age-related growth trend estimated from trees of roughly the same age that grew at roughly the same time could intermingle time-related environmental signals with age-related geometric growth patterns. On the face of it, then, Yamal is a good candidate for RCS, since it has a large number of trees whose actual times of growth are well distributed over the length of the chronology. Isolating the 12 most recent trees, however, risks intermingling recent patterns of temperature variability with the trees' common growth signal. The 'regional curve' from just these twelve trees is unlikely to be representative of the mean regional growth pattern associated with tree age.
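The intermingling problem can be seen in a toy example. The Python sketch below implements three basic RCS steps (pool rings by age to build a regional curve, divide each ring by the curve value for its age, then average the indices by calendar year) and applies them to synthetic trees whose growth is a hypothetical age trend multiplied by a hypothetical post-1950 'warming' factor. It ignores pith offsets and does not smooth the regional curve, both of which real implementations handle, and all function names are illustrative.

```python
from collections import defaultdict


def rcs_chronology(trees):
    """Minimal Regional Curve Standardization.

    trees: list of (first_year, widths) pairs, widths ordered from the
    innermost ring outward (no pith offset assumed).
    Returns {calendar_year: mean index}."""
    # 1. Regional curve: mean ring width at each cambial age, pooled over all trees
    by_age = defaultdict(list)
    for _, widths in trees:
        for age, w in enumerate(widths):
            by_age[age].append(w)
    curve = {age: sum(ws) / len(ws) for age, ws in by_age.items()}

    # 2. Index each ring against the curve, 3. re-align by calendar year and average
    by_year = defaultdict(list)
    for first_year, widths in trees:
        for age, w in enumerate(widths):
            by_year[first_year + age].append(w / curve[age])
    return {year: sum(ix) / len(ix) for year, ix in sorted(by_year.items())}


# Synthetic example: growth = (hypothetical age trend) * (hypothetical climate)
def age_trend(age):
    return 2.0 / (1 + age)


def climate(year):
    return 1.0 + 0.01 * max(0, year - 1950)   # "warming" after 1950


def make_tree(first_year, n_rings):
    return first_year, [age_trend(a) * climate(first_year + a) for a in range(n_rings)]


same_period = [make_tree(1900, 100) for _ in range(5)]          # all grew 1900-1999
staggered = [make_tree(y, 100) for y in range(1700, 1901, 50)]  # spread over 1700-1999

print(round(rcs_chronology(same_period)[1999], 3))  # 1.0 (warming absorbed)
print(round(rcs_chronology(staggered)[1999], 3))    # 1.357 (true climate factor: 1.49)
```

When every tree grew over the same years, the regional curve absorbs the shared warming and every index collapses to 1. With growth periods spread over time, much (though not all) of the late warming survives in the chronology.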
The full Yamal regional growth curve is therefore likely to be a much better estimate of the 'true' regional growth curve common to trees from the region than a growth curve from a small number of trees growing over a period of anthropogenic climate change, because the climate signal of interest is a common feature of the growth of many of those trees. The lower values of the recent-only chronology (red, above) are a consequence of at least part of the temperature signal being subtracted: it is intermingled with the age trend when the regional curve is calculated from only a few trees growing, at the end of their lives, in a warming world. Jeff's post is a little hard to parse in places (for one thing, he keeps referring to 'climatology', but I think he means 'climatologists' or the 'climatology community'), but reading carefully, I think he might recognize this as a potential problem.
Now, the other important purpose of having multiple samples from the same site is to maximize the signal-to-noise ratio (for our purposes, the signal is climate) at any given time. Dendrochronologists have traditional ways of estimating whether their chronology is sufficiently well replicated and contains a common signal, including the Expressed Population Signal (EPS) and the Subsample Signal Strength. Using 20-year windows with 10 years of overlap, the Expressed Population Signal for the 12-series-only RCS chronology is consistently above the (arbitrary but historical) 0.85 level back to the 18th century (and, indeed, most of the way back to the earlier parts of the chronology before sample size is reduced to a few cores). For 10-year windows with a 5-year overlap (not something I would consider particularly stable, but it allows us to look at a very small slice of time), the EPS exceeds 0.85 from at least 1990 back to the beginning of the chronology, except for two decades in the 18th and 19th centuries with low interseries correlation. Note that these windows are shorter than we normally use.
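EPS, as it is usually written, depends only on the number of series and their mean interseries correlation (rbar). This is why a small but strongly correlated set of trees can reach the 0.85 level. The minimal Python sketch below inverts the formula to find how many series a given rbar requires. The rbar values in the example are illustrative, not values computed from Yamal, and the windowing (20- or 10-year segments) is left out.

```python
def expressed_population_signal(n_series, rbar):
    """EPS = n * rbar / (1 + (n - 1) * rbar), where rbar is the mean
    interseries correlation and n the number of series in the window."""
    return n_series * rbar / (1 + (n_series - 1) * rbar)


def series_needed(rbar, target=0.85):
    """Smallest number of series whose EPS reaches the target level."""
    if rbar <= 0:
        raise ValueError("EPS cannot reach the target when rbar <= 0")
    n = 1
    while expressed_population_signal(n, rbar) < target:
        n += 1
    return n


for rbar in (0.2, 0.35, 0.5):
    print(rbar, series_needed(rbar))
# 0.2 23
# 0.35 11
# 0.5 6
```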
My take-home message is this, and it is intended to be general: it is important to understand the two complementary roles of sample size in developing RCS chronologies for climate reconstruction. Lots of ring width series are necessary to develop an accurate regional curve. The number of series needed at any given point in time to capture the transient climate signal can be estimated using EPS. Strong average interseries correlation between cores can mean that even relatively few trees collectively capture a significant portion of the climate variance (again, as estimated from established metrics) and allow for adequate signal-to-noise ratios in the mean chronology. Replication gives us increased confidence in the value of the mean chronology, but a strong common signal is an important part of the equation.