Normalization

From Cybis Wiki

Normalization is, within the context of the CDendro program, the process of transforming (preparing) data before a Correlation analysis. Running a correlation analysis directly on untransformed ring width data does not give reliable crossdating results. Nor does a Detrending transformation produce data that works with the correlation analysis.

Proportion of last two years growth

The default transformation of CDendro is to divide the ring width of the current year by the sum of the ring widths of the last two years (the current year and the year before it). This gives a series of numbers between 0 and 1, i.e.

$$\mathrm{P2Yrs}(y) = \frac{w(y)}{w(y) + w(y-1)}$$

where y is a year number like 2009, 2008, 2007 etc. and w(y) is the ring width of year y.

In other words, this transformation calculates, for each year, that year's proportion of the ring width growth during the last two years. Within CDendro this transformation is named "Proportion of last two years growth" or "P2Yrs". The effect of the P2Yrs transformation is that low-frequency variation, i.e. long-term variation in growth rate, is removed from the curves to be crossdated. Another property of the transformation is that the resulting curve has no extreme peaks, which is essential so that the correlation analysis is not fooled.
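As a minimal sketch of the calculation described above (not CDendro's actual code), the following Python function computes the P2Yrs value for each ring that has a predecessor. The function name `p2yrs`, the oldest-first ordering of the input, and the handling of two consecutive zero-width rings are assumptions made for this illustration.

```python
def p2yrs(widths):
    """Return the P2Yrs value for every ring except the first.

    widths: ring widths in chronological order, oldest first (assumed).
    """
    result = []
    for prev, cur in zip(widths, widths[1:]):
        total = prev + cur
        # Assumption: two zero-width rings in a row leave the proportion
        # undefined, so NaN is returned for that year.
        result.append(cur / total if total > 0 else float("nan"))
    return result


widths = [1.20, 0.80, 0.40, 0.60, 1.50]
print([round(v, 3) for v in p2yrs(widths)])  # [0.4, 0.333, 0.6, 0.714]
```

Every value stays between 0 and 1 however much the ring widths vary, which is the "no extreme peaks" property mentioned above.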

The P2Yrs transformation was first implemented in the predecessor of CDendro, a DOS-based program from 1995-1997. It is related to earlier transformations used for crossdating, such as the Hollstein and Baillie/Pilcher transformations, which give almost equally good results but are mathematically somewhat more complicated and conceptually less clear than the P2Yrs transformation.

Extensive testing [1] has shown that the P2Yrs transformation gives good crossdating results - i.e. a correlation analysis on P2Yrs data will usually sort out the correct match between two curves.

Other moving average transformations

The P2Yrs transformation can be considered a member of a class of "moving average transformations", where the current ring width is divided by the mean value of the surrounding ring widths inside a moving frame. The parameters of this class are then:

  • the width of that frame (divisor frame length),
  • where that frame starts in relation to the current ring (divisor frame offset),
  • whether the current ring is included in the average,
  • whether we should take the natural logarithm of the result of the division.

Within CDendro there is a "Toolbox for normalization algorithm based on sliding frame" where you can experiment with variations of the normalization algorithm.
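The four parameters listed above can be sketched as one generic Python function. This is an illustration only, not the code behind the CDendro toolbox: the name `frame_normalize`, its argument names, the convention that the offset counts from the current ring to the first ring of the frame, and the use of None for rings that cannot be normalized are all assumptions.

```python
import math


def frame_normalize(widths, frame_length, frame_offset,
                    include_current=True, use_log=False):
    """Divide each ring width by the mean of a moving frame of ring widths.

    The frame for ring i covers indices i + frame_offset up to
    i + frame_offset + frame_length - 1 (assumed convention).
    Rings whose frame does not fit inside the series get None.
    """
    n = len(widths)
    out = []
    for i in range(n):
        start = i + frame_offset
        stop = start + frame_length
        if start < 0 or stop > n:
            out.append(None)
            continue
        # Optionally leave the current ring out of the average.
        frame = [widths[j] for j in range(start, stop)
                 if include_current or j != i]
        mean = sum(frame) / len(frame) if frame else 0.0
        if mean <= 0 or (use_log and widths[i] <= 0):
            out.append(None)  # division or logarithm undefined
            continue
        value = widths[i] / mean
        out.append(math.log(value) if use_log else value)
    return out


widths = [1.0, 2.0, 1.0, 4.0, 2.0, 1.0]
p2yrs_like = frame_normalize(widths, 2, -1)                  # frame = previous + current ring
previous_ring_log = frame_normalize(widths, 1, -1, use_log=True)
centred_five_log = frame_normalize(widths, 5, -2, use_log=True)
```

With a frame of length 2 starting one ring back, the result is exactly twice the P2Yrs value, because the divisor is the mean rather than the sum of the two rings. A constant factor like this does not change a correlation coefficient.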

Hollstein normalization

Ernst Hollstein introduced the concept of "Wuchswert" [2] (growth value), where he basically divided this year's ring width by the ring width of the previous year. When you plot such a curve of ring width quotients you will often get a number of high peaks, which make the crossdating analysis difficult. Apparently to avoid this problem, Hollstein took the natural logarithm of each quotient to get his "Wuchswert". (Taking the logarithm by hand, as Hollstein certainly did, is not difficult if you use semi-logarithmic paper for the conversion.)

With w(y) denoting the ring width of year y, this can also be written as:

$$\ln\frac{w(y)}{w(y-1)} = \ln w(y) - \ln w(y-1)$$
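The Wuchswert calculation as described in this section can be sketched in Python as follows. The function name `hollstein` and the rejection of non-positive ring widths are assumptions for this illustration; it follows the description here rather than Hollstein's original procedure in every detail.

```python
import math


def hollstein(widths):
    """Natural log of each ring width divided by the previous ring width.

    widths: ring widths in chronological order, oldest first (assumed).
    """
    if any(w <= 0 for w in widths):
        # Assumption: the logarithm of a quotient involving a zero ring
        # is undefined, so such input is rejected.
        raise ValueError("all ring widths must be positive")
    return [math.log(cur / prev) for prev, cur in zip(widths, widths[1:])]


widths = [1.0, 2.0, 1.0, 4.0]
values = hollstein(widths)
# The quotient form and the difference-of-logarithms form agree:
differences = [math.log(c) - math.log(p) for p, c in zip(widths, widths[1:])]
```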

Baillie/Pilcher normalization

The Baillie/Pilcher transformation (presented as early as 1973) divides the current year's ring width by the mean value of the five surrounding ring widths, including the current ring. The natural logarithm of that value is then taken, as with the Hollstein transformation. [3]
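A minimal Python sketch of this description follows. It assumes that "surrounding" means a frame centred on the current ring (two rings before, the current ring, two rings after) and that all ring widths are positive; the function name `baillie_pilcher` is chosen for this illustration only.

```python
import math


def baillie_pilcher(widths):
    """Log of each ring width over the mean of its centred five-ring frame.

    The first two and last two rings have no complete frame and are skipped.
    """
    out = []
    for i in range(2, len(widths) - 2):
        frame = widths[i - 2:i + 3]       # five rings, current one in the middle
        mean = sum(frame) / 5
        out.append(math.log(widths[i] / mean))
    return out


widths = [1.0, 2.0, 1.0, 4.0, 2.0, 1.0]
values = baillie_pilcher(widths)          # two values, for the 3rd and 4th ring
```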

When just "plain t-values" are mentioned in literature or reports, without any further discussion or explanation, they are often t-values calculated on data with Baillie/Pilcher normalization.

A weakness with the Hollstein and Baillie/Pilcher transformations

When there are one or more extremely narrow rings in two curves being compared, the correlation analysis will be fooled by the "overall shape" of the curves, where the effect of the narrow rings dominates. Normally this does not happen with actual measurement data, but it may happen if e.g. a missing ring is replaced by a very small number. It may also happen if data created with the Corridor method is offset to make all values positive (so that the smallest value becomes very small) and the data is then used as if it were ring width data. See also "How to get fooled by your normalization method and some too narrow ring widths".
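The effect can be illustrated with a small made-up series in which one ring has been replaced by a very small number. The data below is invented for this sketch, and the log quotient of consecutive rings stands in for the Hollstein-type transformation.

```python
import math

# Invented ring widths; the fourth ring stands in for a missing ring
# that has been replaced by a very small number.
widths = [1.0, 1.2, 0.9, 0.01, 1.1, 1.0, 0.8]

pairs = list(zip(widths, widths[1:]))
log_ratio = [math.log(cur / prev) for prev, cur in pairs]   # Hollstein-type
p2yrs = [cur / (prev + cur) for prev, cur in pairs]         # P2Yrs

print([round(v, 2) for v in log_ratio])
print([round(v, 2) for v in p2yrs])
```

In the log quotient series the two values around the narrow ring are roughly -4.5 and +4.7, while all other values lie within about ±0.3, so those two points dominate any correlation. The P2Yrs values stay between 0 and 1 throughout.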

Besancon Index E normalization

The frame length is 7 and centred around the current ring width (w). The current ring width does not take part in the mean value calculation. The minimum and the maximum values within the frame are excluded from the mean value (m). The normalized value normV is calculated as normV = w / m
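The calculation of m and normV can be sketched in Python as below. The description leaves open whether the minimum and maximum are taken over the whole frame or only over the six neighbouring rings; this sketch assumes the six neighbours, so m is the mean of the four values that remain. The function name `besancon_index_e` is hypothetical.

```python
def besancon_index_e(widths):
    """normV = w / m for every ring that has three neighbours on each side."""
    out = []
    for i in range(3, len(widths) - 3):
        # Six neighbours: three rings before and three after the current ring.
        neighbours = list(widths[i - 3:i]) + list(widths[i + 1:i + 4])
        # Assumption: drop one minimum and one maximum of the neighbours.
        trimmed = sorted(neighbours)[1:-1]
        m = sum(trimmed) / len(trimmed)
        out.append(widths[i] / m if m > 0 else None)
    return out


widths = [1.0, 2.0, 3.0, 10.0, 5.0, 6.0, 7.0]
print(besancon_index_e(widths))  # [2.5]
```

In the example the neighbours of the ring of width 10 are 1, 2, 3, 5, 6 and 7; dropping 1 and 7 leaves a mean of 4, so normV = 10 / 4 = 2.5.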

A logarithm may optionally be applied in one of two ways:

A. Log on normal value:
normV = log(normV)
If normV = 0 Then normV = 0.001 (to avoid this value being mistaken for a zero ring)

B. Besancon variant of applying logarithm:
valueForLog = 100 * ( w / m - 1 )
If abs(valueForLog) < 0.001 Then valueForLog = 0.001
If valueForLog > 0 then normV = log(valueForLog) else normV = - log(-valueForLog)
If abs(normV) < 0.001 Then normV = 0.001
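The two logarithm variants can be transcribed into Python as follows. This is a direct reading of the pseudocode above, assuming that log means the natural logarithm and that w and m are both positive; the function names are chosen for this sketch only.

```python
import math


def log_variant_a(w, m):
    """Variant A: logarithm of the normalized value w / m."""
    norm_v = math.log(w / m)
    if norm_v == 0:
        norm_v = 0.001   # keep the result from looking like a zero ring
    return norm_v


def log_variant_b(w, m):
    """Variant B: signed logarithm of the percentage deviation from m."""
    value_for_log = 100 * (w / m - 1)
    if abs(value_for_log) < 0.001:
        value_for_log = 0.001
    if value_for_log > 0:
        norm_v = math.log(value_for_log)
    else:
        norm_v = -math.log(-value_for_log)
    if abs(norm_v) < 0.001:
        norm_v = 0.001
    return norm_v


print(round(log_variant_b(2.0, 1.0), 3))   # 4.605, ring 100 % above m
print(round(log_variant_b(1.0, 2.0), 3))   # -3.912, ring 50 % below m
```

Under these assumptions, a ring width within about 1 % of m gives a variant B result whose sign is opposite to that of w - m; for example `log_variant_b(1.005, 1.0)` is negative although the ring is wider than m.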

The Besancon variant seems to give very high standard deviation values when normalized values for a reference curve (a sum-curve) are created as mean values of the normalized values of the collection members.

Notes

  1. Torbjörn Axelson and Lars-Åke Larsson: "What is a good TTest value".
  2. Ernst Hollstein: "Mitteleuropäische Eichenchronologie". Verlag Philipp von Zabern, Mainz am Rhein, 1980, ISBN 3-8053-00964. Page 14.
  3. M. G. L. Baillie and J. R. Pilcher: "A simple crossdating program for tree-ring research". Tree-Ring Bulletin 33:7-14, 1973.