Seeing Red (Noise): Galactic Cosmic Ray Fluxes and Tree Rings
First off, thanks to delayed.oscillator for inviting me to participate. I’m excited to be involved. Secondly, please forgive some of the formatting – I’m still new to this whole blogging thing. And away we go …
A recent paper (Dengel et al), published in the usually rigorous journal New Phytologist, purports to show a statistically significant positive correlation between Galactic Cosmic Ray (GCR) fluxes and tree growth of Sitka spruce (Picea sitchensis) from a plantation or managed forest in Scotland. Their hypothesis follows: High GCR fluxes stimulate aerosol nucleation in the atmosphere, leading to higher aerosol concentrations. The increased aerosol concentrations increase scattering of radiation, increasing the ratio of diffuse to direct radiation fluxes. The increased diffuse radiation penetrates more deeply into tree canopies, stimulating photosynthesis and growth.
Others have already shown that links between GCR fluxes and the Earth’s climate are quite dubious. Here we show that the conclusions drawn by the authors of this study are likely based on errors in their statistical analysis and faults in their study design.
The primary evidence in this paper for the GCR/diffuse radiation link to tree growth is tied to two Pearson correlations. One is a correlation between tree growth and March-August total diffuse radiation (n=45, r=+0.29, P=0.05). The second is a correlation between tree growth and GCR fluxes (n=45, r=+0.39, P=0.008). On the surface, both correlations appear to pass the traditionally accepted significance levels of P<=0.05, supporting the authors’ hypothesis.
The authors use a nominal sample size equal to 45, the total number of years in their dataset. This is fine, assuming that the underlying observations represent independent samples, i.e. a ‘white noise’ time series. To test this, we digitized the tree ring growth anomaly time series from the paper and estimated the lag-1 autocorrelation. If each observation were independent, the autocorrelation should be near 0. For this series, the autocorrelation is actually quite high (r=0.484) and significant (P=0.0008), indicating that the time series has significant persistence (it is ‘red’ rather than ‘white’). The individual observations are therefore not independent, and the significance of a Pearson correlation computed with n=45 will be overestimated.
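For readers who want to try this check themselves, here is a minimal sketch of a lag-1 autocorrelation estimate. The digitized tree ring series is not reproduced in this post, so the example runs on simulated series instead; the function names, the AR(1) simulator, and the seed are all my own assumptions for illustration, not anything taken from Dengel et al.

```python
import random

def lag1_autocorrelation(series):
    """Standard lag-1 autocorrelation estimate of a sequence of numbers."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    # Covariance between each value and the next one, scaled by the variance.
    numerator = sum(dev[t] * dev[t + 1] for t in range(n - 1))
    denominator = sum(d * d for d in dev)
    return numerator / denominator

def simulate_ar1(phi, n, seed=0):
    """Hypothetical test data: x[t] = phi * x[t-1] + Gaussian noise."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x

white = simulate_ar1(0.0, 5000)  # independent samples ('white' noise)
red = simulate_ar1(0.5, 5000)    # persistent series ('red' noise)
print("white noise lag-1 r:", round(lag1_autocorrelation(white), 3))
print("red noise lag-1 r:  ", round(lag1_autocorrelation(red), 3))
```

The white series should come out near 0 and the red series near 0.5, which is roughly the level of persistence found in the tree growth series.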
This is easy enough to account for, however, by adjusting the sample size for the lag-1 autocorrelation:
$$n' = n \, \frac{1 - r}{1 + r}$$
where n is the original sample size, r is the lag-1 autocorrelation of the underlying time series, and n' is the new (effective) sample size. Plugging into this equation (n=45, r=0.484), we find that the Sitka spruce tree growth time series has an effective sample size of only about 16 (15.6 before rounding), instead of 45.
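The adjustment is a one-liner. The sketch below assumes the common lag-1 form n' = n(1 - r)/(1 + r), which reproduces the effective sample size of about 16 quoted above; the function name is mine.

```python
def effective_sample_size(n, r1):
    """Effective sample size for a series with lag-1 autocorrelation r1.

    Assumes the simple AR(1) adjustment n' = n * (1 - r1) / (1 + r1).
    """
    return n * (1 - r1) / (1 + r1)

n_eff = effective_sample_size(45, 0.484)
print(round(n_eff, 2))  # about 15.65, i.e. roughly 16
```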
This change will not influence the actual correlation, but it will affect the test statistic used to determine statistical significance. For both correlations, we can recalculate the test statistic and significance level with the new effective sample size of 16:
Correlation            n    T Statistic   Significance
GCR, original          45   2.7773        0.008
GCR, revised           16   1.5647        0.139
Radiation, original    45   1.9870        0.050
Radiation, revised     16   1.1195        0.281
In both cases, the correlations no longer meet the criterion (P<=0.05) to be considered statistically significant.
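The recalculation can be sketched with nothing beyond the Python standard library. This is my own illustration rather than the code used for the numbers above: it assumes the usual t test for a Pearson correlation, t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom, uses the unrounded effective sample size, and gets the two-sided P value by numerically integrating the t density (a statistics library would normally do this part for you). Only the GCR correlation is shown.

```python
import math

def t_statistic(r, n):
    """t statistic for a Pearson correlation r with (effective) sample size n."""
    return r * math.sqrt((n - 2) / (1 - r * r))

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=2000):
    """Two-sided P value, using Simpson's rule on the density over [0, |t|]."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = 2 * total * h / 3  # probability that |T| <= |t|
    return 1 - central_mass

r_gcr = 0.39
n_eff = 45 * (1 - 0.484) / (1 + 0.484)  # unrounded effective sample size
for label, n in [("original", 45), ("revised", n_eff)]:
    t = t_statistic(r_gcr, n)
    p = two_sided_p(t, n - 2)
    print(f"GCR {label}: n={n:.1f}, t={t:.3f}, P={p:.3f}")
```

The correlation itself never changes between the two rows; only the degrees of freedom shrink, and the P value rises accordingly.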
It is also very clear, from reading the paper, that the researchers were considering many, many possible associations, looking for a statistical correlation between the tree rings and some variable, regardless of whether there was a strong a priori theoretical basis to think there should be one. Some of their correlations make sense: boreal summer temperatures concurrent with the growth year, for example, are a reasonable candidate, since one would expect trees growing in Scotland could (possibly) be sensitive to temperatures during the growing season. Others are more tenuous, as when the authors attempt to correlate tree growth to diffuse radiation from the previous year.
In cases where one is data mining (as in this paper), it is important to guard against significant correlations that may be due simply to the overwhelming number of statistical tests made. Conceptually, it helps to think of the P value as your percent chance of a ‘false positive’, the chance that a statistically significant correlation is due to chance, rather than something meaningful. For most purposes, a P value of 0.05 (a 5% chance of a false positive) or less is considered acceptable. Each time you attempt another test, however, you increase the opportunity for this type of error. So, if two tests are conducted, the chance of at least one false positive, at a P=0.05 acceptable threshold, is actually 9.75% (assuming independent comparisons). In the case of total diffuse radiation, Dengel et al compared their tree growth time series against total diffuse radiation for each month of the current and previous growth years, a total of 24 tests (a 70.8% chance of at least one false positive). This makes it very likely that some of the correlations will be significant due to chance alone.
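The percentages above follow from a one-line formula: with m independent tests at threshold alpha, the chance of at least one false positive is 1 - (1 - alpha)^m. A quick sketch (the function name is mine):

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests

for m in (1, 2, 24):
    rate = familywise_error_rate(0.05, m)
    print(f"{m:2d} tests: {100 * rate:.2f}% chance of a false positive")
```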
To adjust for this, researchers may use some flavor of what is called a Bonferroni correction, modifying the original P value threshold to account for multiple comparisons. One simple way to do this is to divide your original acceptable threshold by the number of tests you conduct, increasing the burden of proof to accept a significant result. In the diffuse radiation example (tree growth versus diffuse radiation), that means the P=0.05 threshold should actually be P=0.05/24=0.002. In other words, to keep the overall chance of a false positive at 0.05, each individual test needs to meet a much stricter threshold, P=0.002. This burden is not met, either in the original analysis or with the adjusted sample size. The Bonferroni correction leads to fewer false positives (Type I errors), although it may increase the rate of false negatives (Type II errors).
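As a small sketch of the simple Bonferroni version described above, here is the check applied to the two diffuse radiation P values from the table earlier in the post (0.050 with n=45 and 0.281 with the effective sample size). The function names are mine.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that keeps the overall false positive rate near alpha."""
    return alpha / n_tests

def passes_bonferroni(p_value, alpha, n_tests):
    """True if a single test still counts as significant after correction."""
    return p_value <= bonferroni_threshold(alpha, n_tests)

threshold = bonferroni_threshold(0.05, 24)
print(f"corrected threshold: {threshold:.4f}")
for label, p in [("original (n=45)", 0.050), ("revised (n'=16)", 0.281)]:
    print(label, "significant:", passes_bonferroni(p, 0.05, 24))
```

Neither value comes close to the corrected threshold of roughly 0.002.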
Aside from not accounting for these features of their data and analysis, there are also some methodological oddities in the study design, where Dengel et al deviate from some standard practices in dendrochronology. First, they do not report many of the standard statistics used to assess the quality of a tree ring chronology, e.g. the mean interseries correlation (the common signal amongst trees) or the mean sensitivity (a measure of how variable vs. complacent the ring width series are). This makes it impossible for a reader to assess how well the trees correlate with each other, and whether the chronology really represents a common signal among the trees rather than simply an assemblage of noise. Second, they appear to use essentially randomly sampled trees from a managed forest. Normally, to find a tree (or set of trees) with a strong climate response, researchers target trees where climate is the most limiting factor for growth. This often means trees near their climatic limits, where they may be stressed by temperature, moisture, or even radiation. Plantations or managed forests are typically not limited by these factors, because the goal is to manage growth for some purpose (e.g., timber, wildlife, etc.). Often this management can be quite intensive, involving fertilizer application, protection from pests, and thinning. Even at a location where tree growth is limited by climate, trees are not simply randomly sampled to try to find a climate signal. In particular, trees that may be influenced by competitive interactions with other trees (shading, etc.) are generally avoided, because the signal in the tree rings will not necessarily best represent a response to larger scale climate variability. Randomly sampling trees through an even aged stand, as appears to be done in this study, will likely result in a large-scale signal that is equivocal or, worse, misleading.
This is evidenced clearly by Figure 2 in the paper, where the authors attempted (largely unsuccessfully) to correlate tree growth against precipitation, temperature, and vapor pressure deficit. Dengel et al also use very short time series (only 45 years of growth), truncate the trees such that the juvenile growth is partially removed, and appear to use a relatively stiff spline in their detrending (potentially problematic in trees that may have experienced growth changes related to stand dynamics).
The criticisms I have made are not particularly nuanced or obscure; the checks they call for are largely standard practice in climatology and dendrochronology. The lack of adherence to these practices likely led Dengel et al astray. I suspect their results will not be reproducible in other studies with a more typical design and analysis, but time will tell.