The 1988 Shroud of Turin Radiocarbon Tests Reconsidered

Bryan J Walsh

Shroud of Turin Center

Richmond, Virginia, USA

The radiocarbon dating performed on the Shroud of Turin in 1988 by laboratories located in Oxford, Tucson and Zurich concluded with 95% probability that the linen cloth of the Shroud of Turin dated from between 1260 and 1390 AD. A re-analysis of the data used to derive this range of dates suggests that the statistical tests performed earlier assumed ¹⁴C homogeneity in the samples and as a result may have led to a misleading range of dates. A different series of statistical evaluations has been applied to the radiocarbon date data, leading to the conclusion that the Shroud subsamples each contained differing levels of ¹⁴C. An evaluation of this conclusion was conducted and found to be statistically supportable. Further analysis revealed that the sample dates observed were directly related to the physical location of the sample on the Shroud linen. This necessarily implies that the linen samples were non-homogeneous as regards ¹⁴C and that the radiocarbon date derived for the Shroud samples is of questionable validity. The hypothesis of a relationship between the sample location on the Shroud cloth and the date measured was evaluated and found to be statistically significant.

The results of the radiocarbon dating of the Shroud of Turin(1) led many to conclude that the cloth which was tested could no longer be considered the authentic burial cloth of Jesus of Nazareth. Further, a number of researchers who had devoted a substantial portion of their research time to issues related to the Shroud saw their funding evaporate or lost interest. For a number of years, interest in the Shroud waned, only to revive in the late 1990s in anticipation of two public viewings of the Shroud, one in 1998 and the other in 2000. In addition, several proposals in recent years(2)(3) have raised the possibility that the radiocarbon dating may have been distorted by physical or biological agents.

In 1997 one researcher(4) critically analyzed the statistical evaluation performed in the original dating and found the assumptions made and statistical conclusions drawn to be of questionable certainty. As noted in the original research, “the spread of measurements for sample 1 (Shroud of Turin sample) is somewhat greater than would be expected from the errors quoted”. Applying a χ² test to their data, the authors noted “that it is unlikely that the errors quoted by the laboratories for sample 1 (Shroud of Turin Sample) fully reflect the overall scatter”. This paper has been written to reevaluate the data collected in 1988 and to offer alternative and statistically significant explanations for the 1988 measurements.

Statistical Analysis

Radiocarbon dating laboratories attempt to ascertain, using standard scientific methods, the ratios of three carbon isotopes - ¹²C, ¹³C and ¹⁴C - in carbon-bearing samples extracted from objects whose chronological age is to be determined. Since the abundance of ¹⁴C in any object is measured in parts per trillion, radiocarbon laboratories take great care in attempting to remove all exogenous sources of contamination so as to permit precise measurements of the carbon isotopic mix. Once all contaminants have been removed, measurements of ¹²C, ¹³C and ¹⁴C are taken through a sampling procedure in which a series of runs are performed using small sections of the sample provided. The measurements taken on these subsamples are then generalized, using statistical techniques, to obtain conclusions about the whole population - i.e. the object to be dated. These data are then reported along with the statistical and systematic errors observed.

Therefore, a statistical evaluation of the reported data is important in understanding the validity and significance of the conclusions reached in dating an object. In practice, radiocarbon dates are always accompanied by confidence intervals that are mathematical expressions of the range of statistical certainty for the date itself. The precision of any particular radiocarbon date can thus be estimated using statistical techniques. Consequently, evaluating the precision and accuracy of any radiocarbon date involves an analysis of the statistical methodology applied in deriving the radiocarbon date of the object.

To perform an independent statistical assessment of the data reported for the radiocarbon testing of the Shroud of Turin, a series of standard statistical evaluations was performed on the radiocarbon dates for samples AA-3367, 2575 and ETH-3883 - the Shroud of Turin samples. The data originally reported, along with the laboratory that produced them, are shown in Table 1.

Table 1. Radiocarbon dating results by Laboratory

Laboratory	Sample	Run	Radiocarbon age (yr BP)	Quoted error (+/-)
Arizona	AA-3367	A1.1b	591	30
Arizona	AA-3367	A1.2b	690	35
Arizona	AA-3367	A1.3a	606	41
Arizona	AA-3367	A1.4a	701	33
Oxford	2575	O1.1u	795	65
Oxford	2575	O1.2b	730	45
Oxford	2575	O1.1b	745	55
Zurich	ETH-3883	Z1.1u	733	61
Zurich	ETH-3883	Z1.1w	722	56
Zurich	ETH-3883	Z1.1s	635	57
Zurich	ETH-3883	Z1.2w	639	45
Zurich	ETH-3883	Z1.2s	679	51

A preliminary review of the data reveals that the quoted error reported for the Arizona lab was only about 5/8 the size of the quoted error reported by the other two labs. Quoted error, as described in the Nature paper, is the combined measurement of statistical error, the scatter of results for the standards and blanks used in benchmarking each test run, and the uncertainty of the δ¹³C determination. Why the Arizona quoted error was only 5/8 of the error reported by the other labs was not immediately obvious. So, to evaluate the significance of the apparent difference in quoted error, several statistical tests were performed.

The initial test, the Kruskal-Wallis One-Way Analysis of Variance, is a nonparametric statistical analysis employed to determine whether or not 2 or more independent samples are from different populations. Since it is non-parametric, the statistical distribution of the underlying population of radiocarbon dates need not be known. A further advantage is that this technique can be employed when there are differing group sample sizes as there were in the case of the reported radiocarbon dates from each of the labs.
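As a rough illustration of the test just described, the sketch below computes the Kruskal-Wallis H statistic for the quoted errors listed in Table 1. It is not the computation used for this paper: the function names are ours, and the p-value relies on the chi-square approximation with 2 degrees of freedom (for which the tail probability is simply exp(-H/2)), an approximation that is only indicative for groups this small.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis(groups):
    """Return (H, p) for exactly three groups, with the usual tie correction.

    Assumption: p comes from the chi-square approximation with 2 degrees of
    freedom, whose survival function is exp(-H / 2).
    """
    if len(groups) != 3:
        raise ValueError("closed-form p-value assumes exactly three groups")
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    tie_term = sum(c ** 3 - c for c in (pooled.count(v) for v in set(pooled)))
    h /= 1 - tie_term / (n ** 3 - n)
    return h, math.exp(-h / 2)


# Quoted errors from Table 1, grouped by laboratory.
quoted_errors = {
    "Arizona": [30, 35, 41, 33],
    "Oxford": [65, 45, 55],
    "Zurich": [61, 56, 57, 45, 51],
}
h_stat, p_value = kruskal_wallis(list(quoted_errors.values()))
print(f"H = {h_stat:.2f}, p = {p_value:.3f}")
```

Under these assumptions H comes out near 7.4 with p near 0.025. The same function can be applied unchanged to the radiocarbon dates in Table 1.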

The result of this evaluation required rejecting the hypothesis that the quoted errors reported by each of the labs have the same mean value and are thus from the same statistical population. To evaluate the source of the difference noted, a series of Bonferroni pair-wise T-tests was performed using the unpooled variances of each of the lab error observations. The Bonferroni procedure makes the critical probability required before results can be considered statistically significant more stringent, to allow for the number of pair-wise comparisons being made.

These evaluations confirmed the initial observation that the Arizona quoted error measurements were statistically significantly different from the quoted error measurements reported by either of the other labs. Several years later it was reported(4) that the Arizona lab had actually produced eight separate measurements rather than the four indicated in the Nature paper. It was revealed that, in 1988, the Arizona lab had been requested by the British Museum - the coordinator of the Shroud radiocarbon test - to combine measurement “runs” into one “independent” measurement run if they were conducted on the same day using the same standards and blanks. This procedure reduced the number of test observations reported by Arizona from 8 to 4. The statistical technique used to combine these observations was based on statistical procedures developed by Ward and Wilson(5). One of the assumptions in these statistics is that the samples being evaluated come from a homogeneous population.

The observed quoted error that was reported as a result of using this procedure was different from the arithmetic average of the quoted error as is shown below:

Arizona test results - Quoted Error term

Original run errors	Weighted error reported	Arithmetic mean error
41, 45	30	43
51, 49	35	50
57, 59	41	58
47, 47	33	47
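The relationship between the columns above can be checked with a short sketch. It assumes that the runs were combined by standard inverse-variance pooling, in which the combined error is 1/sqrt(sum(1/e²)); that formula is our assumption about the procedure rather than something stated in the Nature paper, although it does reproduce the reported values when rounded to the nearest year. The function name is ours.

```python
import math


def weighted_error(errors):
    """Combined error under inverse-variance pooling (assumed procedure)."""
    return 1.0 / math.sqrt(sum(1.0 / e ** 2 for e in errors))


# Quoted errors of the original Arizona runs, paired as they were combined.
run_errors = [(41, 45), (51, 49), (57, 59), (47, 47)]

weighted = [round(weighted_error(pair)) for pair in run_errors]
arithmetic = [round(sum(pair) / len(pair)) for pair in run_errors]

for pair, w, a in zip(run_errors, weighted, arithmetic):
    print(f"runs {pair}: weighted {w}, arithmetic mean {a}")
```

Pooling two runs of similar precision shrinks the quoted error by roughly a factor of 1/sqrt(2), which is close to the 5/8 ratio between the Arizona errors and those of the other two labs.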

Unfortunately, the combination of measurement runs and the application of this statistical procedure to combine runs was not mentioned in the original report and it may have had some bearing on the validity of the conclusions reached. The reason for concern about the application of this statistical procedure is best illustrated by an example.

Suppose you conduct a series of ten measurements of two observations each to determine the radiocarbon age of a sample. The results of the measurements are shown below:

Applying standard statistical measurements to this population of observations results in the following:

Arithmetic Mean RC age: 600

Error: +/- 3.44

Applying the statistical techniques employed in dating the Shroud in 1988 results in the following statistical measures:

Mean RC age: 600

Error: +/- 3.35

Next, assume the last five measurements are modified in the manner shown below:

The standard statistical data now become:

Arithmetic Mean RC age: 615

Error: +/- 4.87

Using the Ward & Wilson statistical techniques employed in the 1988 dating of the Shroud sample yields:

Mean RC age: 615

Error: +/- 3.35

Note that there has been no change in the error statistic of the type used in the 1988 dating, while the error statistic has increased substantially using more conventional statistical tools. The change in the conventional error term reflects the change in population characteristics. The Ward & Wilson statistics correctly reflect the value of the measurement error but underestimate the population error, since the population is no longer homogeneous.
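The effect described in this illustration can be reproduced with a small sketch. The twenty values below are hypothetical: they were chosen by us to be consistent with the summary figures quoted in the text and are not the observations from the original tables. The quoted error of 15 per observation and the inverse-variance pooling formula are likewise assumptions.

```python
import math
import statistics

QUOTED_ERROR = 15  # hypothetical quoted error attached to every observation


def pooled_quoted_error(errors):
    """Error of the weighted mean, using only the quoted errors (assumed)."""
    return 1.0 / math.sqrt(sum(1.0 / e ** 2 for e in errors))


def standard_error_of_mean(values):
    """Conventional standard error, driven by the observed scatter."""
    return statistics.stdev(values) / math.sqrt(len(values))


# Ten measurements of two observations each (hypothetical, homogeneous).
homogeneous = [585, 615] * 10
# The last five measurements (ten observations) are shifted by 30 years.
shifted = homogeneous[:10] + [v + 30 for v in homogeneous[10:]]
errors = [QUOTED_ERROR] * len(homogeneous)

for label, values in (("homogeneous", homogeneous), ("shifted", shifted)):
    print(
        f"{label}: mean {statistics.mean(values):.0f}, "
        f"scatter-based error {standard_error_of_mean(values):.2f}, "
        f"pooled quoted error {pooled_quoted_error(errors):.2f}"
    )
```

Because the pooled quoted error depends only on the quoted errors and never on the dates themselves, it cannot respond to a shift in half of the observations, whereas the scatter-based error does.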

In this illustration, the assumption of homogeneity leads to an underestimate of the population variance when Ward & Wilson statistics are utilized and could lead to incorrect conclusions if applied to determine point estimate confidence intervals, for example. Further, the Ward & Wilson procedure utilized in the 1988 statistical evaluation of the Shroud sample measurements requires that each subsample be drawn from a homogeneous population if the correct measurement of population error is to be derived. If the deposition of ¹⁴C on the Shroud linen were non-homogeneous, then the statistical approach employed in radiocarbon dating the Shroud sample would underestimate the error involved.

Because combining results by means of the weighted quoted error assumes that the error terms have the same statistical distribution for each subsample measured, its use appears to have imposed a presumption of ¹⁴C homogeneity on the measurements taken, since the technique requires that the samples all be drawn from the same population. The fact that the quoted errors reported for Arizona were statistically significantly different from those of either of the other labs indicated either that the statistical technique employed to combine the error terms was creating a problem or that there was a major difference in the statistical characteristics of the quoted error measurements themselves. To determine the underlying source of the differences observed, we performed additional statistical evaluations on the radiocarbon date measurements themselves.

The radiocarbon date measurements, as reported in Nature and shown in Table 1, were evaluated utilizing the same series of statistical tests as were employed to evaluate the error terms.

In this non-parametric test, the hypothesis that the sample dates are statistically indistinguishable must be rejected. The implication of this is that the labs were measuring different mean dates. The source of the difference in the radiocarbon dates was then ascertained using two other tests:

This Bonferroni pairwise T-Test revealed that the Oxford measurements were statistically significantly different from the Arizona measurements at the 95% confidence level.

A randomized Analysis of Variance was also performed on the radiocarbon date measurements to further ascertain if there were any statistically significant differences among the various labs:

The between lab variability was statistically significant at the 96% confidence level. This evaluation serves to confirm the other analyses that there was a statistically significant difference among the lab date measurements and indicates that there is a statistically significant probability that the dates the laboratories reported were actually different. The implication of this is that the labs were actually measuring differing levels of radiocarbon in their respective samples.
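For readers who wish to check the order of magnitude of this result, the sketch below runs an ordinary one-way Analysis of Variance on the dates in Table 1. This is a parametric F test and not necessarily the randomized procedure used above, whose details are not reproduced here; the function name is ours, and the closed-form p-value is valid only when there are three groups (2 numerator degrees of freedom).

```python
def one_way_anova(groups):
    """Return (F, p) for a one-way ANOVA on exactly three groups.

    Assumption: with 2 numerator degrees of freedom the F tail probability
    has the closed form (d2 / (d2 + 2 * F)) ** (d2 / 2), where d2 is the
    within-group degrees of freedom.
    """
    if len(groups) != 3:
        raise ValueError("closed-form p-value assumes exactly three groups")
    values = [x for g in groups for x in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    p_value = (df_within / (df_within + 2 * f_stat)) ** (df_within / 2)
    return f_stat, p_value


# Radiocarbon dates from Table 1, grouped by laboratory.
dates = {
    "Arizona": [591, 690, 606, 701],
    "Oxford": [795, 730, 745],
    "Zurich": [733, 722, 635, 639, 679],
}
f_stat, p_value = one_way_anova(list(dates.values()))
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")
```

Under these assumptions F is about 4.7 with p close to 0.04, which is in line with the 96% confidence level quoted above.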

To determine the effects of the different measurement error terms on the likelihood that the samples were statistically different, it was decided to incorporate the measurements of the sample quoted error into the statistical analysis scheme by using measures for the 1σ quoted error, effectively doubling the number of observations. Thus, the Arizona measurement of 591 +/- 30 was replaced by the two measurements 561 and 621. This process was applied to each of the measurements made at each of the labs. The statistical results are shown in the tables that follow:

By this measure, the radiocarbon dates incorporating the quoted error as part of the measurement show a statistically significant likelihood, at the 97% confidence level, that the sample date measurements from the three labs are actually different and should not be combined into one date. Once again, a series of Bonferroni T-tests was employed to ascertain the source of this:

Here, the Oxford results are statistically significantly different from the Arizona results at the 95% confidence level. Finally, a randomized Analysis of Variance was conducted on this data:

Once again the hypothesis that there are no statistically significant differences among the lab measurements was rejected.
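The doubling step used for these last evaluations, in which each measurement is replaced by the pair date - 1σ and date + 1σ, can be sketched as follows. The data are the Table 1 values; the function and variable names are ours.

```python
# (date, quoted error) pairs from Table 1, grouped by laboratory.
measurements = {
    "Arizona": [(591, 30), (690, 35), (606, 41), (701, 33)],
    "Oxford": [(795, 65), (730, 45), (745, 55)],
    "Zurich": [(733, 61), (722, 56), (635, 57), (639, 45), (679, 51)],
}


def expand_by_quoted_error(pairs):
    """Replace each (date, error) by the two values date - error, date + error."""
    expanded = []
    for date, error in pairs:
        expanded.extend([date - error, date + error])
    return expanded


doubled = {lab: expand_by_quoted_error(pairs)
           for lab, pairs in measurements.items()}

for lab, values in doubled.items():
    print(lab, values)
```

The expansion leaves each lab's mean date unchanged but widens the within-lab scatter in proportion to the quoted errors, so labs with larger quoted errors contribute more spread to the tests.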

Since the samples used to derive these dates were drawn from the same parent sample and the labs exercised great care in taking their measurements, either the combination of pre-treatments the samples had been exposed to had somehow affected the residual amount of radiocarbon on the cloth - a possibility not supported by a statistical analysis of the differences in treatment modalities and unlikely considering the lack of any such effects on the control samples used - or the radiocarbon itself must be non-uniformly distributed through the fabric sample.

Whatever its source, a statistical analysis of the radiocarbon dates reported in 1988 shows significant statistical differences between the Oxford observations and the Arizona observations. It should be noted that the head of one of the labs that originally conducted the radiocarbon tests on the Shroud subsamples agrees with the determination that the Oxford lab results were statistically significantly different from the others. Robert E.M. Hedges of Oxford University’s Laboratory for Archaeology, who was a participant in the original radiocarbon dating of the Shroud, noted in 1997(6) that,

“in taking only the shroud result, there was a just statistically significant difference between Oxford's result and the other two laboratories (this is most likely to be due to an underestimate - of 5-10 years - of the errors by the laboratories; in any case, in the context of the question whether the Shroud date could be in error by centuries, the difference is negligible)”.

Dr. Hedges's comment, while recognizing the statistical fact that the Oxford results were significantly different, appears to misinterpret the statistical nature of this difference. Statistically, the sample tested should have been rejected as being non-homogeneous. If the difference observed was related to an understatement of the error by the laboratories, as Dr. Hedges asserted, then the 95% confidence interval originally noted for the Shroud dating must necessarily be incorrect, and an amended date and confidence interval should have been produced. Further, the statistical results appear to suggest that the samples evaluated in 1988 were non-homogeneous as far as their ¹⁴C content was concerned. The reason for this is not readily apparent until one examines the location on the Shroud cloth from which the sample(s) were extracted.

References

1. Jull, A.J.T., and others, Radiocarbon dating of the Shroud of Turin, Nature, 337, 611-615, 16 February 1989.

2. Kouznetsov, D.A., and others, A Re-evaluation of the Radiocarbon Date of the Shroud of Turin Based on Biofractionation of Carbon Isotopes and a Fire-Simulating Model, Archaeological Chemistry - Organic, Inorganic, and Biochemical Analysis, Mary Virginia Orna, Editor, American Chemical Society, (1996).

3. Garza-Valdes, L. A., Scientific Analysis of the Shroud of Turin, Proceedings of the Texas Medieval Association, (September 11, 1993).

4. Van Haelst, R., Radiocarbon Dating the Shroud - A Critical Statistical Analysis, Barrie Schwortz website: http://www.shroud.com/vanhels3.htm, (1999).

5. Ward, G.K. & Wilson, S.R., Archaeometry 20, 19-31 (1978).

6. Hedges, R.E.M., A Note Concerning the Application of Radiocarbon Dating to the Turin Shroud, Approfondimento Sindone, 1, 1 (1997).

Note: additional references are listed at the end of Part II.