Yamal IV: Growth Curves and Sample Size
Via Deep Climate, I found this post by Jeff Id at The Air Vent. Comments there and elsewhere lead me to believe there is some confusion about regional curve standardization and about why sample size matters in dendrochronology, and in dendroclimatology in particular. While this post is only an indirect commentary on Jeff Id’s post, I hope it will be more broadly useful or stimulate some interesting technical discussion. For more information on regional curve standardization, this book chapter [PDF] is currently your best bet.
Jeff Id fits two separate exponential growth curves to the most recent 12 trees in the Yamal chronology and to the full Yamal series, and notes that they are different. Let’s emulate this here. Let me note first of all that this is an emulation — the published Yamal series uses a time-varying spline fit that I haven’t integrated in my own code.
What I’ve done is align the full Yamal set (blue) and the most recent 12 ring width series (red) by age, assuming no pith offset (that is, assuming the innermost ring in the core or cross section was the innermost ring in the tree). The heavy lines are the mean regional curves. The black line is the Khadyta River mean regional curve.
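For readers who want to try the age alignment themselves, here is a minimal sketch of the idea, not the code behind the figure. It assumes ring widths are stored as plain lists ordered from the innermost measured ring outward, it assumes no pith offset (so the first value is treated as age 1), and it takes an unsmoothed arithmetic mean at each age. The function name `regional_curve`, the core IDs, and the toy widths are all made up for illustration.

```python
from collections import defaultdict

def regional_curve(series):
    """Mean ring width at each age across all series.

    `series` maps a (hypothetical) core ID to a list of ring widths ordered
    from the innermost measured ring outward. With no pith offset assumed,
    list index 0 is treated as age 1.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for widths in series.values():
        for age, width in enumerate(widths, start=1):
            sums[age] += width
            counts[age] += 1
    # Unsmoothed mean per age; fewer series contribute at older ages.
    return {age: sums[age] / counts[age] for age in sorted(sums)}

# Toy ring widths (mm), not Yamal measurements.
toy = {
    "core_a": [1.0, 0.8, 0.6, 0.5],
    "core_b": [0.6, 0.6, 0.4],
    "core_c": [0.8, 0.7],
}
curve = regional_curve(toy)
for age, width in curve.items():
    print(age, round(width, 3))
```

Note that the oldest ages are averaged over fewer and fewer series, which is one reason the tail of a regional curve built from a small set can jump around.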
There are a few interesting features, some of which I believe Jeff notes in his post. The more recent Yamal trees had a somewhat lower growth rate when they were young than the average of the full set of living and subfossil trees; however, it is not at all out of the range of the full Yamal population. The regional curve for just these twelve is therefore lower than the regional curve for the full population. Another feature to note is that the red line (the regional curve for the 12 trees alone) rises at 100 years of age and again near 300 years of age, since these are the ages at which several of the individual samples in this small set have their most recent, wider rings. Finally, note that the Khadyta River trees as a whole are relatively young, and their growth falls for the most part below the regional curves for the full and recent Yamal subsets.
UPDATE: I’m adding here the spline curve fits
So what is the consequence of performing a separate RCS on the recent Yamal series only vs. the full Yamal set?
The 12-series-only chronology is somewhat noisier overall, since it also excludes 5 other tree ring series that extend into the second half of the 20th century but not all the way into the 1990s. The influence of sample size on the chronology variance can probably also be seen in the 1600s, where the year-to-year variability is reproduced but the small number of series (prior to 1660 or so, there are only 3 cores) affects the variance. The influence of the different regional curves (shown above) is more difficult to detect, since it is intermingled with the influence of losing the other 5 cores, but slightly higher levels in the 12-tree-only chronology in the 1600s and parts of the 1700s might reflect it. The most notable difference is therefore perhaps that the recent-tree-only chronology is slightly lower than the full Yamal chronology starting in the early 1980s.
So what is going on? In fact, you are witnessing the importance of overall sample size in the specific case of Regional Curve Standardization. Sample replication serves two different (but of course complementary) purposes in dendrochronology, specifically when applying Regional Curve Standardization: (1) adequate sample replication overall, so as to accurately estimate the ‘true’ regional growth curve, and (2) sample replication through time adequate to estimate the transient climate signal.
Remember that the goal of regional curve standardization is to remove a common age-related growth trend while preserving low frequency climate variability. To have any hope of estimating this, you need a large number of trees whose actual periods of growth are well distributed over time, because you need to avoid intermingling your climate signal of interest with your age-related growth trend. You can imagine that an age-related growth trend estimated from trees of more or less the same age that grew at more or less the same time could intermingle the time-related environmental signals with the age-related geometric growth patterns. On the face of it, then, Yamal is a good candidate for RCS, since it has a large number of total trees whose actual time of growth is well distributed over the length of the chronology. Isolating the 12 most recent trees, however, runs the risk of intermingling recent patterns of temperature variability with the trees’ common growth signal. The ‘regional curve’ from just these twelve trees is therefore unlikely to be very representative of the mean regional growth pattern associated with tree age.
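A toy simulation makes this intermingling concrete. What follows is a sketch under loudly stated assumptions, not an analysis of the Yamal data: every tree shares one hypothetical age curve (`age_trend`), the ‘climate’ is a hypothetical steady rise (`climate`), ring width is simply their product with no noise, and indices are formed as ratios of ring width to the regional curve. One set of trees has staggered start dates; the other consists of 12 trees that all started growing in the same year.

```python
import math
from collections import defaultdict

def age_trend(age):
    # Hypothetical age-related growth curve: wide rings when young.
    return 0.5 + math.exp(-0.01 * age)

def climate(year):
    # Hypothetical slow "climate" forcing: a steady rise over 400 years.
    return 1.0 + 0.002 * year

def make_trees(start_years, lifespan=200):
    # Each tree is {calendar year: ring width}, width = age trend * climate.
    return [{s + a: age_trend(a) * climate(s + a) for a in range(lifespan)}
            for s in start_years]

def rcs_chronology(trees):
    # Step 1: regional curve = mean ring width at each age (age 0 = first ring).
    by_age = defaultdict(list)
    for tree in trees:
        first = min(tree)
        for year, width in tree.items():
            by_age[year - first].append(width)
    curve = {age: sum(w) / len(w) for age, w in by_age.items()}
    # Step 2: index = width / regional curve at that age, averaged by year.
    by_year = defaultdict(list)
    for tree in trees:
        first = min(tree)
        for year, width in tree.items():
            by_year[year].append(width / curve[year - first])
    return {year: sum(v) / len(v) for year, v in sorted(by_year.items())}

spread = rcs_chronology(make_trees(range(0, 201, 20)))  # staggered start dates
same_time = rcs_chronology(make_trees([200] * 12))      # 12 contemporaries

print("staggered:      ", round(spread[100], 3), round(spread[300], 3))
print("contemporaneous:", round(same_time[210], 3), round(same_time[390], 3))
```

In this toy setup the contemporaneous trees give a flat chronology: age and calendar year are perfectly confounded, so the regional curve absorbs the entire climate rise. The staggered set keeps a rising chronology, although with only 11 trees it still understates the rise I built in, which is itself a reminder of why overall replication matters.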
The full Yamal regional growth curve is therefore likely to be a much better estimate of the ‘true’ regional growth curve common to trees from the region than a growth curve from a small number of trees growing over a period of anthropogenic climate change, because in the small set the climate signal of interest is a common feature of the growth of many of the trees. The lower values in the recent-only chronology (in red above) are a consequence of at least part of the temperature signal being removed, because it is intermingled with the age trend when the regional curve is calculated over only a few trees growing, at the end of their lives, in a warming world. Jeff’s post is a little hard to parse in places (for one thing, he keeps referring to ‘climatology’, but I think he means ‘climatologists’ or the ‘climatology community’), but reading carefully, I think he might recognize this as a potential problem.
Now, the other important part of having multiple samples from the same site is to maximize the signal-to-noise ratio (for our purposes, the signal is climate) at any given time. Dendrochronologists have traditional ways of estimating whether their chronology is sufficiently well replicated and contains a common signal, including the Expressed Population Signal (EPS) and the Subsample Signal Strength. Using 20 year windows with 10 years of overlap, the EPS for the 12-series-only RCS chronology is consistently above the (arbitrary but historical) 0.85 level back to the 18th century (and, indeed, most of the way back to the earlier parts of the chronology, before sample size is reduced to a few cores). For 10 year windows with a 5 year overlap (not something I would consider particularly stable, but it allows us to look at very small slices of time), the EPS exceeds 0.85 from at least 1990 back to the beginning of the chronology, with the exception of two decades in the 18th and 19th centuries that have low interseries correlation. Note that these windows are shorter than we normally use.
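For reference, EPS is usually written in terms of the number of series n and the mean interseries correlation r̄ as EPS = n·r̄ / (1 + (n − 1)·r̄). The sketch below computes it for a single window; it is not the windowed calculation behind the numbers above. The helper names and the three toy index series are invented for illustration, and I assume every series covers the whole window.

```python
import math
from itertools import combinations

def pearson(x, y):
    # Plain Pearson correlation between two equal-length sequences.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mean_interseries_correlation(series):
    # Average correlation over all distinct pairs of series (rbar).
    pairs = list(combinations(series, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)

def eps(n_series, rbar):
    # Expressed Population Signal for n series with mean correlation rbar.
    return n_series * rbar / (1 + (n_series - 1) * rbar)

# Toy ring width indices for one 10-year window (not Yamal data).
window = [
    [1.0, 1.2, 0.9, 1.1, 1.3, 0.8, 1.0, 1.2, 0.9, 1.1],
    [0.9, 1.3, 0.8, 1.0, 1.4, 0.7, 1.1, 1.1, 1.0, 1.2],
    [1.1, 1.1, 1.0, 1.2, 1.2, 0.9, 0.9, 1.3, 0.8, 1.0],
]
rbar = mean_interseries_correlation(window)
eps_value = eps(len(window), rbar)
print(round(rbar, 3), round(eps_value, 3))
```

A windowed version would slide this over the chronology (for example, 20 year windows with 10 years of overlap), using only the series present in each window.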
My take-home message is this, and it is intended to be general: it is important to understand the two complementary reasons that sample size matters when developing RCS chronologies for climate reconstruction. Lots of ring width series are necessary to develop an accurate regional curve. The number of chronologies needed at any given point in time to capture the transient climate signal can be estimated using EPS. Strong average interseries correlation between cores can mean that even relatively few trees collectively capture a significant portion (again, as estimated from established metrics) of the climate variance and allow for adequate signal-to-noise ratios in the mean chronology. Replication gives us increased confidence in the value of the mean chronology, but a strong common signal is an important part of the equation.
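To put rough numbers on that last point, the standard EPS expression, n·r̄ / (1 + (n − 1)·r̄), can be turned around to ask how many series are needed to reach a given threshold for a given mean interseries correlation. This is a sketch only: `series_needed` is a hypothetical helper, the r̄ values are illustrative rather than measured from Yamal, and 0.85 is the conventional (arbitrary) threshold.

```python
def eps(n_series, rbar):
    # Expressed Population Signal for n series with mean correlation rbar.
    return n_series * rbar / (1 + (n_series - 1) * rbar)

def series_needed(rbar, target=0.85, max_series=1000):
    # Smallest number of series whose EPS reaches the target, or None
    # if the target is not reached within max_series.
    if not 0 < rbar <= 1:
        raise ValueError("rbar must be in (0, 1]")
    for n in range(1, max_series + 1):
        if eps(n, rbar) >= target:
            return n
    return None

# Illustrative mean interseries correlations, not values from Yamal.
for rbar in (0.3, 0.5, 0.7):
    print(rbar, series_needed(rbar))
```

The stronger the common signal, the fewer cores it takes to clear the threshold, which is the sense in which a dozen well-correlated trees can still carry a usable signal.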