On dating certainty

What thresholds to use when dating?
When you run a correlation analysis between the ring widths of a sample and your reference curve, you get a short list of matching points as shown in the picture above.

There is a marked match at relative year 313 with a correlation value of 0.80.
The next best match is down at 0.31.
The difference between the best match and the next best match (BNDiff) is 0.49.

This seems to be an obvious match - and surely it is!
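
As a minimal sketch of how such a list and its BNDiff could be computed: the code below slides a sample along a reference, calculates the correlation at every position with full overlap, and takes the difference between the two best values. The function names (pearson, best_matches) are made up for this illustration, the input is assumed to be already normalized, and this is not CDendro's actual implementation.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series has no defined correlation
    return sxy / math.sqrt(sxx * syy)

def best_matches(sample, reference, top=5):
    """Return the `top` best (offset, correlation) pairs and the BNDiff.

    Assumption: only positions where the sample lies completely
    inside the reference are tested.
    """
    results = []
    for offset in range(len(reference) - len(sample) + 1):
        window = reference[offset:offset + len(sample)]
        results.append((offset, pearson(sample, window)))
    results.sort(key=lambda item: item[1], reverse=True)
    # BNDiff = best correlation minus next best correlation
    bndiff = results[0][1] - results[1][1] if len(results) > 1 else None
    return results[:top], bndiff
```

A high best value together with a high BNDiff, as in the 0.80 versus 0.31 case, is what makes a match look obvious.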

What would have happened if we had not had such a long reference curve?
We would have got this list:

  237  0.31   90
   43  0.31   90
  275  0.30   90
  279  0.26   90
  143  0.26   90

We cannot honestly say that this looks like a hit, especially as there is no difference in correlation value between the best and the next best match.

So we have to say "we cannot date that sample"!

Now, let us look at another sample!
A correlation test gives the table above.

This does not look like a good match, but it is nevertheless the best one, with a difference between the best and the next best value of only BNDiff = 0.06.

Later we extend our reference curve and then we find the actual match as shown above.

How do we know where to set the thresholds for correlation value and BNDiff to be sure that our datings can be trusted?

Using an uncorrelated curve as reference to find correlation values for could-be datings

Uncorrelated curves: Nämdö (red) towards Torne Träsk
When we compare samples from one reference curve with a reference curve which is not correlated to the first, then all matches found represent possible incorrect datings! This gives us a way to see where correlation value thresholds should be set to avoid incorrect datings!

If we compare the whole Nämdö reference of 400 years with that of Torne Träsk of 1500 years, we get a correlation value of only 0.02 at the correct dating position, while the best matching points where the curves fully overlap have correlation values around 0.17.

For the rest of this demonstration, we will use three collections of samples which you can download from the ITRDB:
swed019w.rwl (Torne Träsk), swed302.rwl (Nämdö) and swed022w.rwl (Gotland).

In Settings/Options for normalization and matching, see that the Blocklength is set at 100, that the block distance is at 20 and that the Least overlap in years... is set at 100.
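
To make these settings concrete, here is a small sketch of how a sample might be cut into blocks of a given length at a given distance. How CDendro actually places its blocks (for instance whether a final block is aligned to the end of the sample) is not described here, so the layout below is an assumption, and block_starts is a made-up name.

```python
def block_starts(sample_length, block_length=100, block_distance=20):
    """Start indices (0-based) of all full blocks cut from a sample.

    Assumption: blocks start at 0 and are spaced `block_distance`
    years apart; partial blocks at the end are not used.
    """
    if sample_length < block_length:
        return []  # too short: no block of the required length fits
    return list(range(0, sample_length - block_length + 1, block_distance))
```

With these assumptions a 180 years long sample gives five blocks starting at years 0, 20, 40, 60 and 80, while a sample shorter than 100 years gives none, which is why the too short samples are unchecked in a later step.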

Use the command Collections/Create reference curve from big decadal file to open the file swed019w.rwl
Select it as the reference curve.

Use the command Collections/Create new collection from decadal file to open the file swed302.rwl

To uncheck those samples shorter than 100 years, click the command Collections/Uncheck too short samples

See that "With block checking" is checked and click the button "Test towards reference".
At the end of the report you will get this table:

The top half-table shows matches grouped by correlation values.
The "0.50 column" tells that two blocks matched the reference curve with correlation values in the range 0.50-0.55 and with a difference between best and next best correlation values around 0.13.
Surely this could easily have been mistaken as a dating!

The bottom half-table shows matches grouped by difference between best and next best correlation values (BNDiff).
The "0.10 column" has matches with a BNDiff in the interval 0.10-0.15.
In this group we have 8 out of 148 tested blocks, with a mean correlation value of 0.46.
The highest correlation value found in this group was 0.52.
These are all incorrect dating proposals!
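
The grouping in such a half-table can be sketched as follows, assuming that each tested block has been reduced to a pair (best correlation value, BNDiff). The function name and the data layout are assumptions made for this illustration only.

```python
from collections import defaultdict

def group_by_bndiff(results, bin_width=0.05):
    """Group (corr, bndiff) pairs into BNDiff intervals.

    Returns {lower_bound: (count, mean_corr, max_corr)}.
    """
    bins = defaultdict(list)
    for corr, bndiff in results:
        # small epsilon so that e.g. 0.15 lands in the 0.15 bin despite float rounding
        index = int(bndiff / bin_width + 1e-9)
        bins[index].append(corr)
    summary = {}
    for index in sorted(bins):
        corrs = bins[index]
        lower = round(index * bin_width, 2)
        summary[lower] = (len(corrs), sum(corrs) / len(corrs), max(corrs))
    return summary
```

Each key is the lower bound of a BNDiff interval, so the entry for 0.10 corresponds to the "0.10 column" described above.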

If the Torne Träsk curve had been longer, we would probably have seen even more matches with quite high correlation values and high BNDiff values.

This means: if we only accept "datings" with a correlation value higher than 0.50, we will have few incorrect datings, provided that our samples are at least 100 years long. Then we can expect the BNDiff value to be above 0.15.

If our reference curve is short, we may find incorrect dating proposals like this with the same correlation value but with a much higher BNDiff, as the chance of finding another quite good match is probably proportional to the length of the reference curve. So we cannot rely on a high BNDiff alone. However, when we know that the dating of our sample surely falls within the range of the reference curve, the BNDiff value is an important indicator.
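
The effect of the reference length can be tried out with a small simulation. The sketch below uses pure random noise instead of real ring-width data, so every match is incorrect by construction. This is a strong simplification (real normalized curves are not white noise), and all function names are made up for the illustration. It only shows the tendency: the longer the reference, the higher the best incorrect correlation value we should expect.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def best_false_match(sample_len, reference_len, rng):
    """Best correlation between an unrelated random sample and reference."""
    sample = [rng.gauss(0, 1) for _ in range(sample_len)]
    reference = [rng.gauss(0, 1) for _ in range(reference_len)]
    return max(
        pearson(sample, reference[o:o + sample_len])
        for o in range(reference_len - sample_len + 1)
    )

def mean_best_false_match(sample_len, reference_len, trials=20, seed=0):
    """Average of the best false correlation over several random trials."""
    rng = random.Random(seed)  # fixed seed: repeatable runs
    total = sum(best_false_match(sample_len, reference_len, rng)
                for _ in range(trials))
    return total / trials
```

Calling mean_best_false_match(100, 200) and mean_best_false_match(100, 1500) gives an impression of how much a long reference raises the level of the best incorrect match, and thereby lowers the BNDiff we can expect for a correct one.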

Note: The tables above are only printed when With block checking is ON (is checked)!

Now, let us look at one of those two best incorrect dating proposals!
(Year number 20-119 of NM014 in swed302.rwl)

 303  0.52  100 (1677)
1260  0.38  100 (720) 
 424  0.35  100       
 766  0.34  100       
1302  0.33  100       

With a 150 years long block, the correlation value for the best match goes down for this sample:
 303  0.43  150 (1677)
 424  0.34  150 (1556)
1260  0.32  150       
1298  0.28  150       
 852  0.28  150       

Now let us look at what happens if we have much shorter samples - only 60 years in length.
We still use the same samples with a length of at least 100 years, but with the overlap setting at 60 years.
In the tables above, 2.4% (5 out of 206) of the samples had their best matching points (all incorrect dating proposals!) at correlation values in the range 0.51 - 0.62 with BNDiff values in the range 0.10 - 0.21.

I.e. short samples give quite high correlation values even though the datings are incorrect!

A low BNDiff value for a match is a good indicator that something is wrong with that match.

These curves (Blockwise/Crossdating quality functions) show the same case as the table above but from a slightly different point of view. Here we have plotted the correlation values (red) and TTest values (blue) with a bold curve at the correct matching points (very low values, as the Nämdö curve and the Torne Träsk curve are not related). In addition, we have plotted the best incorrect match (thinner lines) at every point in the reference curve.
Note especially the TTest values near 6 based on incorrect datings!

Note: If we use thresholds which give 4% incorrect datings, we may sometimes end up with very unreliable datings.
Consider a 60 years long sample found in a ship. When it does not date towards the reference curve from the local area, we try to date it towards 10 different distant places. If the risk of an incorrect dating towards each reference curve is 4%, then the risk of getting an incorrect dating towards at least one of them is roughly one in three (1 - 0.96^10 is about 0.34)!
When we try to date a sample towards many different reference curves, we have to be very careful!
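
The arithmetic behind this warning can be written out in a few lines. The sketch assumes that the tests towards the different reference curves are independent of each other, which is a simplification, and the function name is made up.

```python
def risk_of_any_false_dating(per_curve_risk, curves):
    """Probability of at least one incorrect dating among `curves` tests.

    Assumption: every test has the same risk and the tests are independent.
    """
    return 1.0 - (1.0 - per_curve_risk) ** curves

# 4% risk per reference curve, 10 reference curves tried
risk = risk_of_any_false_dating(0.04, 10)  # about 0.34, i.e. roughly one in three
```

Under the same assumptions, a 1% risk per curve gives a total risk of just under 10% for 10 curves, which shows why stricter thresholds are needed when many references are tried.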

Now let's date the Nämdö samples towards the Nämdö reference curve itself:
1. Set blocklength and least overlap to 100 in Settings/Options for normalization and matching.
2. Click Check all and then Collections/Uncheck too short samples
3. See that With block checking is checked.
4. Click the button Test towards rest of collection.

84% of the blocks tested had a correlation value above 0.55 at their best matching points with a difference between that correlation value and the next best correlation value (BNDiff) of at least 0.16

89% of the blocks tested had a BNDiff-value above 0.20 and no next best (incorrect dating) correlation value above 0.51.

Usually there is one very best (correct) dating and several next best dating proposals. So even when the correct dating is outside of the reference curve, the BNDiff value is quite low for any (incorrect) dating inside the reference curve.

Now, let us use the Gotland curve to date our samples from Nämdö.
Do remember from the run towards the Torne Träsk curve that we will get lots of incorrect datings if we accept correlation values around 0.4. We have to set a threshold for correlation values at 0.50 for blocklength 100 to avoid more than 1% incorrect datings!

This indicates that only 11% of our 100 years long blocks from Nämdö can be dated towards this "nearby-curve" from Gotland.

The only way to get better dating in this case is to have several samples from Nämdö, match these together into quite a long sum-curve and correlate that towards the Gotland curve.

To see how this works:
1. Select the command Collections/Create new collection
2. Click the button "Create a sample with width mean values and normalized data" in the Nämdö (swed302) collection.
3. Save that "sum-sample" as swed302.d12  (Samples/Save normalized data As)
4. Click the button Add to target collection and accept 0 as the offset.
5. Close the swed302.d12 window.
6. Set the block distance to 10 in Settings/Options for normalization and matching
7. In the new target collection, see that With block checking is checked and click the button Test towards reference

We have created a sum-curve from Nämdö and correlated 100 years long blocks from that curve towards the Gotland curve.
91% of these blocks were datable towards the Gotland curve with correlation values above 0.50 and with lowest BNDiff at 0.12.

Compare that 91% to the mere 11% of single 100 years long samples/blocks that were datable!
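
The idea behind a sum-curve can be sketched as a year-by-year mean of several already dated (and normalized) samples. Individual peculiarities of single trees tend to cancel out in the mean, while the common signal remains. How CDendro builds its own "sum-sample" in detail is not covered here; the function below, with its made-up name and data layout, is only an illustration of the principle.

```python
def mean_curve(samples):
    """Year-by-year mean of dated samples.

    samples: list of (first_year, values) with one value per year.
    Returns (first_year, means, counts); a year covered by no sample
    gets the mean None.
    """
    start = min(first for first, _ in samples)
    end = max(first + len(values) for first, values in samples)
    sums = [0.0] * (end - start)
    counts = [0] * (end - start)
    for first, values in samples:
        for i, value in enumerate(values):
            sums[first - start + i] += value
            counts[first - start + i] += 1
    means = [s / c if c else None for s, c in zip(sums, counts)]
    return start, means, counts
```

The counts show how many samples support each year of the mean curve, which is worth checking before relying on the thinly covered ends of a sum-curve.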

These curves (Blockwise/Crossdating quality functions) show the same case as the table above but from a slightly different point of view. Here we have plotted the correlation values (red) and TTest values (blue) with a bold curve at correct matching points. In addition to that we have plotted the best incorrect match (thinner lines) at every point in the reference curve.

Correct datings usually have a correlation value above 0.5 and a TTest value mostly above 6.

These curves reveal something more: The oldest 190 years of the 400 years long Nämdö curve match much better towards the Gotland curve (TTest = 9-10) than the more recent data. I do not know what causes this. One interpretation is that the more recent data from Gotland are based on a very local population of trees, meaning that a broader selection of Gotland samples would match better towards the Nämdö data - but this is only speculation.

Let us check the limits for that way of doing dating!

We will run blocks from the sum of the Nämdö curve towards the Torne Träsk curve and see how many incorrect dating proposals we get at two different block lengths - 100 years and 150 years.

1. Click the Torne Träsk curve (swed019w.d12) and select it as the reference.

2. See that blocklength and overlap are set to 100 and blockdistance to 10 in Settings/Options for normalization...

3. Click the target collection. See that With block checking is checked.

4. Click the Test towards reference button.

Once again we see that we can only accept a dating proposal when the correlation value for a 100 years long block is above 0.50!
Otherwise we will get too many incorrect dating proposals! A correlation coefficient of 0.50 with a 100 years long block corresponds to a TTest value of 5.7.
With the block length at 150 years, we may accept dating proposals when the correlation value is well above 0.40 (the highest incorrect proposal is at 0.39). (A correlation coefficient of 0.40 with block length 150 corresponds to TTest = 5.3.)
Note that the BNDiff values are all below 0.08 for the incorrect dating proposals in the table above!
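
The TTest figures quoted above are consistent with the usual t statistic for a correlation coefficient r over n overlapping years, t = r * sqrt(n - 2) / sqrt(1 - r^2). Whether CDendro computes its TTest exactly this way is an assumption, and the function name is made up for this sketch.

```python
import math

def t_value(corr, n):
    """Student's t for a correlation coefficient `corr` over `n` years."""
    return corr * math.sqrt(n - 2) / math.sqrt(1.0 - corr * corr)

t_100 = t_value(0.50, 100)  # about 5.7
t_150 = t_value(0.40, 150)  # about 5.3
```

The formula makes clear why a lower correlation value can be accepted for a longer block: the same t value is reached with a smaller r when n grows.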

In this test we had only 27 blocks, which also overlapped each other to a high degree. So this test is not very significant statistically! But it gives an indication of what we may accept as a dating and especially it indicates what we may never accept!

To make sure that our datings are correct we should demand
high correlation values
TTest values well above 6
high BNDiff values and
long block lengths
for any dating proposal!

To get an intuitive feeling for what is a proper dating, you should experiment like this with your own data.

If we have wood from places where trees grow more complacently - e.g. because they always have sufficient water - the reference curves with their standard deviations do not look as distinct as those shown above. They are more "fat and flat", which makes dating more difficult than shown here.

Trees standing side by side may have very different ways of growing. If the roots of one tree always reach into water and another tree is standing on a dry place, then the first tree will not be sensitive to dry periods which will make the trees grow differently. So even if you feel sure that a sample "should match somewhere" that may not be the case!

A note on the normalization algorithm: All values shown above are based on correlation calculations on normalized growth curves which are created with the default normalization algorithm in CDendro, i.e. "proportion of last two years growth", see "Settings/Options for normalization and matching".

If you use another algorithm to create normalized growth curves, you will not get the same correlation values as shown above.

Plotting the quality of matching
See that Crossdating quality test functions are enabled!
Correlation curves, TTest curves and a sort of quality curves can be plotted.
Correlation coefficients and TTest values for each compared point from a correlation analysis.
Crossdating quality functions

The "Blockwise. Correct and best incorrect match" curve plots the correct-match-correlation value for each possible block of a certain length together with the next best (incorrect) correlation values. These curves show how well a correct match can be distinguished from incorrect matches.

Extension of the crossdating quality test function

The blockwise correct and best incorrect match plot has been extended.
The diagram below shows how 100 years long blocks of a short (sum-)sample match towards a longer reference.
There are six curves shown, i.e. at a vertical line crossing all these six curves, we have six values.

Curves 1,3,5 show correlation coefficient values, curves 2,4,6 show TTest values.

Curve 1 (bold red) and 2 (bold blue): Correlation and TTest values of a 100 years long block from the sample matched towards the reference at the correct position.

Curve 3 (thin red) and 4 (thin blue): Highest value found when other blocks are matched towards this same point, i.e. highest incorrect match found for this point.

Curve 5 (orange) and 6 (violet): Highest value found when the block from "1 and 2" above is matched towards all other possible positions, i.e. next best value found.

The distances between curve 1 (bold red) and curve 5 (thin orange) give an indication of the certainty of the matchings between the sample curve and the reference, i.e. correct and next best match. If the curves lie near each other there is no good discrimination between correct and incorrect matches.

When the orange curve lies above the bold red curve, the reference curve alone is not usable for crossdating of the sample curve, as the highest correlation values are then related to incorrect matching points.
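
The comparison of the bold red and the orange curve can also be done numerically. As a minimal sketch, assume that we have two equally long lists with the correct-match correlation value and the next best value for each block (curve 1 and curve 5 above). The function name and the margin parameter are made up for this illustration.

```python
def undatable_blocks(correct_corr, next_best_corr, min_margin=0.0):
    """Indices of blocks whose correct match does not stand out.

    A block is reported when the correct correlation value exceeds
    the next best value by no more than `min_margin`.
    """
    return [
        i
        for i, (correct, next_best) in enumerate(zip(correct_corr, next_best_corr))
        if correct - next_best <= min_margin
    ]
```

With min_margin at 0.0 only the blocks where the orange curve reaches or passes the bold red curve are reported; a higher margin also reports the blocks where the two curves lie near each other.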



Copyright © 2008, Cybis Elektronik & Data AB, www.cybis.se