The 1988 Shroud of Turin Radiocarbon Tests Reconsidered
Bryan J. Walsh
Shroud of Turin Center
Richmond, Virginia USA
Copyright 1999
All Rights Reserved
The radiocarbon dating performed on the Shroud of Turin in 1988 by
laboratories located in Oxford, Tucson and Zurich concluded, with 95%
probability, that the linen cloth of the Shroud dated from between 1260
and 1390 AD. A re-analysis of the data used to derive this range of dates
suggests that the statistical tests performed earlier assumed 14C
homogeneity in the samples and as a result may have led to a misleading
range of dates. A different series of statistical evaluations has been
applied to these radiocarbon date data, leading to the conclusion that the
Shroud subsamples each contained differing levels of 14C. An evaluation of
this conclusion was conducted and found to be statistically supportable.
Further analysis revealed that the sample dates observed were directly
related to the physical location of the sample on the Shroud linen. This
necessarily implies that the linen samples were non-homogeneous as regards
14C and that the radiocarbon date derived for the Shroud samples is of
questionable validity. The hypothesis of a relationship between the sample
location on the Shroud cloth and the date measured was evaluated and found
to be statistically significant.
The results
of the radiocarbon dating of the Shroud of Turin(1) caused many to
believe that the cloth which was tested could no longer be considered the
authentic burial cloth of Jesus of Nazareth. Further, a number of researchers who
had devoted a substantial portion of their research time to issues related to
the Shroud saw their funding evaporate or lost interest. For a number
of years, interest in the Shroud waned, only to revive in the late 1990s in
anticipation of two public viewings of the Shroud, one in 1998 and the other in
2000. In addition, there have been several proposals in recent years(2)(3)
suggesting that the radiocarbon dating may have been
distorted by physical or biological agents.
In 1997 one
researcher(4) critically analyzed the statistical evaluation performed
in the original dating and found the assumptions made and statistical
conclusions drawn to be of questionable certainty. As noted in the original
research, “the spread of measurements
for sample 1 (Shroud of Turin sample) is somewhat greater than would be
expected from the errors quoted”. Applying a χ² test to their data, the authors noted “that
it is unlikely that the errors quoted by the laboratories for sample 1 (Shroud
of Turin Sample) fully reflect the overall scatter”. This paper has been
written to reevaluate the data collected in 1988 and to offer alternative and
statistically significant explanations for the 1988 measurements.
Statistical Analysis
Radiocarbon
dating laboratories attempt to ascertain, using standard scientific methods,
the ratios of three carbon isotopes - 12C, 13C and 14C - in
carbon-bearing samples extracted from objects whose chronological age is to be
determined. Since the abundance of 14C in any object is measured in parts per
trillion, radiocarbon laboratories take great care in attempting to remove all
exogenous sources of contamination so as to permit precise measurements of the
carbon isotopic mix. Once all contaminants have been removed, measurements of 12C,
13C and 14C are taken
through a sampling procedure in which a series of runs are performed using
small sections of the sample provided. The measurements taken on these
subsamples are then generalized, using statistical techniques, to obtain conclusions
about the whole population - i.e. the object to be dated. These data are
then reported along with the statistical and systematic errors observed.
Therefore, a
statistical evaluation of the reported data is important in understanding the
validity and significance of the conclusions reached in dating an object. In
practice, radiocarbon dates are always accompanied by confidence intervals that
are mathematical expressions of the range of statistical certainty for the date
itself. The precision of any particular radiocarbon date can thus be estimated
using statistical techniques. Consequently, evaluating the precision and
accuracy of any radiocarbon date involves an analysis of the statistical
methodology applied in deriving the radiocarbon date of the object.
To perform an
independent statistical assessment on the data reported for the radiocarbon
testing of the Shroud of Turin, a series of standard statistical evaluations
was performed on the radiocarbon dates for samples AA-3367, 2575 and ETH-3883 -
the Shroud of Turin samples. The data originally reported, along with the
laboratory that produced them, are shown in Table 1.
Table 1
Arizona (Sample AA-3367)  |  Oxford (Sample 2575)  |  Zurich (Sample ETH-3883)
A1.1b   591 +/- 30        |  O1.1u   795 +/- 65    |  Z1.1u   733 +/- 61
A1.2b   690 +/- 35        |  O1.2b   730 +/- 45    |  Z1.1w   722 +/- 56
A1.3a   606 +/- 41        |  O1.1b   745 +/- 55    |  Z1.1s   635 +/- 57
A1.4a   701 +/- 33        |                        |  Z1.2w   639 +/- 45
                          |                        |  Z1.2s   679 +/- 51
A
preliminary review of the data reveals that the quoted error reported for the
Arizona lab was only about 5/8 the size of the quoted error reported by the
other two labs. Quoted error, as
described in the Nature paper, is the combined measurement of
statistical error, the scatter of results for the standards and blanks used in
benchmarking each test run, and the uncertainty of the δ13C determination.
Why the Arizona quoted error was only 5/8 of the error reported by the other
labs was not immediately obvious. So, to evaluate the significance of the
apparent difference in quoted error, several statistical tests were performed.
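The "about 5/8" figure can be checked directly against the quoted errors in Table 1. The following is a minimal sketch, assuming the comparison is between Arizona's mean quoted error and the mean of the Oxford and Zurich errors pooled together (the text does not say exactly how the ratio was formed); the variable names are illustrative only.

```python
from statistics import mean

# Quoted errors (the "+/-" values) transcribed from Table 1.
quoted_errors = {
    "Arizona": [30, 35, 41, 33],
    "Oxford": [65, 45, 55],
    "Zurich": [61, 56, 57, 45, 51],
}

arizona_mean = mean(quoted_errors["Arizona"])
# Assumption: the "other two labs" are pooled into a single group.
others_mean = mean(quoted_errors["Oxford"] + quoted_errors["Zurich"])
ratio = arizona_mean / others_mean

for lab, errors in quoted_errors.items():
    print(f"{lab}: mean quoted error = {mean(errors):.2f}")
print(f"Arizona / others = {ratio:.3f} (5/8 = {5 / 8:.3f})")
```

Under this pooling assumption the ratio comes out close to 0.64, which is consistent with the "about 5/8" description.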
The initial
test, the Kruskal-Wallis One-Way Analysis of Variance, is a nonparametric
statistical analysis employed to determine whether or not 2 or more independent
samples are from different populations. Since it is non-parametric, the
statistical distribution of the underlying population of radiocarbon dates need
not be known. A further advantage is that this technique can be employed when
there are differing group sample sizes as there were in the case of the
reported radiocarbon dates from each of the labs.
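The rank-based test described above can be sketched in a few lines of Python. This is an illustration only, not the computation used for this paper: the function name kruskal_wallis is hypothetical, the quoted errors are transcribed from Table 1, and the p-value relies on the chi-square approximation, which is rough for samples this small.

```python
import math

def kruskal_wallis(groups):
    """Return (H, p) for three samples, with the standard tie correction.

    The p-value uses the chi-square approximation. For exactly three
    groups (2 degrees of freedom) its survival function is exp(-H / 2).
    """
    if len(groups) != 3:
        raise ValueError("this sketch only handles three groups")
    pooled = sorted(v for g in groups for v in g)
    n_total = len(pooled)
    rank_of = {}   # value -> average rank (tied values share a rank)
    tie_term = 0   # sum of t^3 - t over each run of t tied values
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        t = j - i
        tie_term += t**3 - t
        i = j
    rank_sums_sq = sum(
        sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups
    )
    h = 12 / (n_total * (n_total + 1)) * rank_sums_sq - 3 * (n_total + 1)
    h /= 1 - tie_term / (n_total**3 - n_total)
    return h, math.exp(-h / 2)

# Quoted errors from Table 1: Arizona, Oxford, Zurich.
errors = [[30, 35, 41, 33], [65, 45, 55], [61, 56, 57, 45, 51]]
h, p = kruskal_wallis(errors)
print(f"H = {h:.2f}, approximate p = {p:.3f}")
```

Under these assumptions the sketch gives H of about 7.4 and an approximate p of about 0.025, which is consistent with rejecting the hypothesis that the three sets of quoted errors come from one population.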
The result of
this evaluation required rejecting the hypothesis that the quoted errors
reported by each of the labs have the same mean value and are thus from the
same statistical population. To evaluate the source of the
difference noted, a series of Bonferroni pair-wise t-tests was performed using
the unpooled variances of each of the lab error observations. This Bonferroni
procedure makes more stringent the critical probability required before
results can be considered statistically significant, to allow for the number
of comparisons made:

These
evaluations confirmed the initial observation that the Arizona quoted error
measurements were statistically significantly different from the quoted error
measurements reported by either of the other labs. Several years later it was
reported(4) that the Arizona lab had actually produced eight separate
measurements rather than the four indicated in the Nature paper. It was
revealed that, in 1988, the Arizona lab had been requested by the British
Museum - the coordinator of the Shroud radiocarbon test - to combine
measurement “runs” if they were conducted on the same day using the same
standards and blanks into one “independent” measurement run. This procedure
reduced the number of test observations reported by Arizona from 8 to 4. The
statistical technique used to combine these observations was based on
statistical procedures developed by Ward and Wilson(5). One
of the assumptions in these statistics is that the samples being evaluated come
from a homogeneous population.
The observed
quoted error that was reported as a result of using this procedure was different
from the arithmetic average of the quoted error as is shown below:
Arizona test results - Quoted Error term
Original run errors  |  Weighted error reported  |  Arithmetic mean error
41, 45               |  30                       |  43
51, 49               |  35                       |  50
57, 59               |  41                       |  58
47, 47               |  33                       |  47
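The reported combined errors in the table above can be reproduced with the usual inverse-variance weighting. The following is a minimal sketch, assuming that this is the combination rule that was applied (the helper name weighted_error is illustrative and not taken from the original analysis).

```python
import math

# Pairs of same-day run errors from the Arizona quoted-error table.
run_error_pairs = [(41, 45), (51, 49), (57, 59), (47, 47)]

def weighted_error(errors):
    """Error of an inverse-variance weighted mean: 1 / sqrt(sum(1 / s^2))."""
    return 1 / math.sqrt(sum(1 / s**2 for s in errors))

for pair in run_error_pairs:
    combined = weighted_error(pair)
    average = sum(pair) / len(pair)
    print(f"{pair}: weighted = {combined:.1f}, arithmetic mean = {average:.0f}")
```

Rounded to whole years, the weighted values match the reported 30, 35, 41 and 33. The combined error depends only on the individual quoted errors and not on how far apart the two measured dates are, which is why the assumption that both runs sample one homogeneous population matters.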
Unfortunately,
the combination of measurement runs and the application of this statistical
procedure to combine runs was not mentioned in the original report and it may
have had some bearing on the validity of the conclusions reached. The reason
for concern about the application of this statistical procedure is best
illustrated by an example.
Suppose you
conduct a series of ten measurements of two observations each to determine the
radiocarbon age of a sample. The results of the measurements are shown below:
Applying standard statistical measurements to this population of
observations results in the following:

Arithmetic Mean RC age: 600   Error: +/- 3.44

Applying the statistical techniques employed in dating the Shroud in 1988
results in the following statistical measures:

Mean RC age: 600   Error: +/- 3.35

Next, assume the last five measurements are modified in the manner shown
below:

The standard statistical data now become:

Arithmetic Mean RC age: 615   Error: +/- 4.87

Using the Ward & Wilson statistical techniques employed in the 1988 dating
of the Shroud sample yields:

Mean RC age: 615   Error: +/- 3.35

Note that there has been no change in the error statistic of the type used
in the 1988 dating, while the error statistic has increased substantially
using more conventional statistical tools. The change in the error term
reflects the change in population characteristics using conventional
statistical measurements. The Ward & Wilson statistics correctly reflect the
value of measurement error but underestimate the population error, since the
population is no longer homogeneous.

In this illustration, the assumption of homogeneity leads to an
underestimate of population variance when Ward & Wilson statistics are
utilized, and could lead to incorrect conclusions if applied to determine
point estimate confidence intervals, for example.

Further, the Ward & Wilson procedure utilized in the 1988 statistical
evaluation of the Shroud sample measurements requires that each subsample be
drawn from a homogeneous population if the correct measurement of population
error is to be derived. If the deposition of 14C on the Shroud linen were
non-homogeneous, then the statistical approach employed in radiocarbon dating
the Shroud sample would underestimate the error involved.

Since the combination of error results using the weighted quoted error
assumes the error terms have the same statistical distribution for each
sub-sample measured, its use would appear to have placed on the measurements
taken a presumption of 14C homogeneity, since the technique employed requires
that the samples are each drawn from the same population. The fact that the
quoted errors reported for Arizona were statistically significantly different
from those of either of the other labs was an indicator that either the
statistical technique employed to combine the error terms was creating a
problem or there was a major difference in the statistical characteristics of
the quoted error measurements themselves.

To determine the underlying source of the differences observed, we performed
additional statistical evaluations on the radiocarbon date measurements
themselves.

The radiocarbon date measurements, as reported in Nature in Table 1, were
evaluated utilizing the same series of statistical tests as were employed to
evaluate the error terms.

In this non-parametric test, the hypothesis that the sample dates are
statistically indistinguishable must be rejected. The implication of this is
that the labs were measuring different mean dates. The source of the
difference in the radiocarbon dates was then ascertained using two other
tests:

This Bonferroni pairwise t-test revealed that the Oxford measurements were
statistically significantly different from the Arizona measurements at the
95% confidence level.

A randomized Analysis of Variance was also performed on the radiocarbon date
measurements to further ascertain if there were any statistically significant
differences among the various labs:

The between-lab variability was statistically significant at the 96%
confidence level.
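The between-lab comparison of the dates can be approximated with a classic one-way Analysis of Variance on the Table 1 values. This is a sketch under stated assumptions, not the paper's own computation: it uses the ordinary fixed-effects F test rather than the randomization variant named in the text, the function name one_way_anova is hypothetical, and the closed-form p-value is valid only because three groups give 2 numerator degrees of freedom.

```python
def one_way_anova(groups):
    """Return (F, p) for a classic one-way ANOVA on exactly three groups."""
    if len(groups) != 3:
        raise ValueError("closed-form p-value below assumes three groups")
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means)
    )
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    # With 2 numerator degrees of freedom, P(F > f) = (1 + 2f/d2)^(-d2/2).
    p_value = (1 + 2 * f_stat / df_within) ** (-df_within / 2)
    return f_stat, p_value

# Radiocarbon dates from Table 1: Arizona, Oxford, Zurich.
dates = [
    [591, 690, 606, 701],
    [795, 730, 745],
    [733, 722, 635, 639, 679],
]
f, p = one_way_anova(dates)
print(f"F = {f:.2f}, p = {p:.3f}, confidence = {100 * (1 - p):.0f}%")
```

Under these assumptions the sketch gives F of about 4.7 and p of about 0.04, which is in line with the 96% confidence level reported for the between-lab variability.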
This evaluation serves to confirm the other analyses, which found a
statistically significant difference among the lab date measurements, and
indicates that there is a statistically significant probability that the dates
the laboratories reported were actually different. The implication of this is
that the labs were actually measuring differing levels of radiocarbon in their
respective samples.

To determine the effects of the different measurement error terms on the
likelihood that the samples were statistically different, it was decided to
incorporate the measurements of the sample quoted error into the statistical
analysis scheme by using measures for the 1σ quoted error, effectively
doubling the number of observations. Thus, the Arizona measurement of
591 +/- 30 was replaced by the two measurements 561 and 621. This process was
applied to each of the measurements made at each of the labs. The statistical
results are shown on the tables that follow:

By this measure, the radiocarbon dates incorporating the quoted error as part
of the measurement show a statistically significant likelihood, at the 97%
confidence level, that the sample date measurements from the three labs are
actually different and should not be combined into one date. Once again, a
series of Bonferroni t-tests was employed to ascertain the source of this:

Here, the Oxford results are statistically significantly different from the
Arizona results at the 95% confidence level. Finally, a randomized Analysis of
Variance was conducted on these data:

Once again the hypothesis that there are no statistically significant
differences among the lab measurements was rejected.

Since the samples used to derive these dates were drawn from the same parent
sample and the labs exercised great care in taking their measurements, either
the combination of pre-treatments the samples had been exposed to had somehow
affected the residual amount of radiocarbon on the cloth - a possibility not
supported by a statistical analysis of the differences in treatment modalities
and unlikely considering the lack of any such effects on the control samples
used - or the radiocarbon itself must be non-uniformly distributed through the
fabric sample.

Whatever its source, a statistical analysis of the radiocarbon dates reported
in 1988 shows significant statistical differences between the Oxford
observations and the Arizona observations. It should be noted that the head of
one of the labs that originally conducted the radiocarbon tests on the Shroud
subsamples agrees with the determination that the Oxford lab results were
statistically significantly different from the others. Robert E.M. Hedges of
Oxford University’s Laboratory for Archaeology, who was a participant in the
original radiocarbon dating of the Shroud, noted in 1997(6) that, “in taking
only the shroud result, there was a just statistically significant difference
between Oxford's result and the other two laboratories (this is most likely to
be due to an underestimate - of 5-10 years - of the errors by the
laboratories; in any case, in the context of the question whether the Shroud
date could be in error by centuries, the difference is negligible)”.

Dr. Hedges' comment, while recognizing the statistical fact that the Oxford
results were significantly different, appears to misinterpret the statistical
nature of this difference. Statistically, the sample tested should have been
rejected as being non-homogeneous. If the difference observed was related to
an understatement of the error by the other laboratories, as Dr. Hedges
asserted, then the 95% confidence interval originally noted for the Shroud
dating must necessarily be incorrect, and an amended date and confidence
interval should have been produced. Further, the statistical results appear to
suggest that the samples evaluated in 1988 were non-homogeneous as far as
their 14C content was concerned. The reason for this is not readily apparent
until one examines the location on the Shroud cloth from which the sample(s)
were extracted.
_________________________________________________________________________________________________________
References
1. Jull, A.J.T., and others, Radiocarbon dating of the Shroud of Turin, Nature, 337, 611-615, 16 February 1989.
2. Kouznetsov, D.A., and others, A Re-evaluation of the Radiocarbon Date of the Shroud of Turin Based on Biofractionation of Carbon Isotopes and a Fire-Simulating Model, Archaeological Chemistry - Organic, Inorganic, and Biochemical Analysis, Mary Virginia Orna, Editor, American Chemical Society, (1996).
3. Garza-Valdes, L. A., Scientific Analysis of the Shroud of Turin, Proceedings of the Texas Medieval Association, (September 11, 1993).
4. Van Haelst, R., Radiocarbon Dating the Shroud - A Critical Statistical Analysis, Barrie Schwortz website: http://www.shroud.com/vanhels3.htm, (1999).
5. Ward, G.K. & Wilson, S.R., Archaeometry 20, 19-31 (1978).
6. Hedges, R.E.M., A Note Concerning the Application of Radiocarbon Dating to the Turin Shroud, Approfondimento Sindone, 1, 1 (1997).
Note: additional references are listed at the end of Part II.