Introduction

Autoantibodies against islet cell antigens are important markers of type 1 diabetes-associated autoimmunity, and insulin autoantibodies (IAA) were the first beta cell specific autoantibody described [1]. IAA are important in the prediction of type 1 diabetes, especially in young children, as their level and prevalence at diagnosis correlate inversely with age [2, 3]. Measurement of IAA was initially limited by the high serum volume required for the early immunoprecipitation assays, which used polyethylene glycol to separate immune complexes [4]. The microassay format using protein A/G-Sepharose for precipitation allowed a major reduction of the amount of serum used and has improved assay specificity [5, 6]. IAA measurement using the microassay method is now performed in many laboratories and standardisation of assays is needed to compare the results of studies throughout the world.

The Diabetes Antibody Standardization Program (DASP), a collaboration of the Immunology of Diabetes Society (IDS) and the US Centers for Disease Control and Prevention (CDC), was established as an extension of previous IDS antibody workshops to improve and standardise the measurement of diabetes-associated autoantibodies [5–8]. The objective of the programme is to help laboratories improve their methods by providing technical support, training and information, and to assess proficiency in order to harmonise antibody measurement between key laboratories worldwide. DASP proficiency evaluations since 2000 have shown that the majority of participating laboratories achieved high concordance in the measurement of antibodies to GAD and islet antigen-2 (IA-2) using the new WHO international reference reagent [7–9]. In contrast, the first DASP proficiency testing demonstrated wide variation among IAA microassays, with poor overall performance and low sensitivity [8]. Nonetheless, this first workshop also showed that some laboratories were able to achieve high sensitivity and specificity with good levels of concordance.

We now report the results for IAA assays of the follow-up proficiency-evaluation rounds performed in 2002, 2003 and 2005, and compare them with those of DASP 2000. The format of the evaluations, which involved a relatively large number of coded serum samples from patients and controls, allowed a comprehensive comparison of different laboratories and assays. We used the same format for each round, allowing analysis of changes in assay performance over time. The main aims of these workshops were to allow participating laboratories to determine the sensitivity and specificity of their IAA assays, to assess concordance between laboratories and to examine whether the overall performance of assays had improved. We also evaluated the concordance between laboratories in quantifying antibody levels by ranking and by using common reference standards and a potential insulin antibody (IA) standard curve based on dilutions of sera from two patients with long-standing insulin-treated type 1 diabetes.

Methods

Study design

Participating laboratories received uniquely coded sets of frozen 100 μl aliquots of sera from 50 patients with newly diagnosed type 1 diabetes and 100 healthy controls as described by Törn et al. [9]. Sets were distributed to 26–32 laboratories in up to 14 countries in each round (see Electronic supplementary material [ESM] for a list of participating laboratories). In addition to the proficiency samples, participating laboratories received the DASP IAA standard (sample 686) and the DASP IAA-negative serum. In 2005, serial dilutions of two potential standards, prepared from the sera of IA-positive patients with type 1 diabetes (IA standards IB4.4, IB4.6–IB4.10 and IC9.3, IC9.5–IC9.9), were also distributed. Laboratories were asked to test the sera using their usual method and to provide some details of the local assay. Results were reported as raw data, as units calculated according to the local protocol, and as the classification of each serum sample as IAA negative or positive using the local cut-off. Informed consent was obtained from all patients who donated samples for DASP, and the investigations were carried out in accordance with the Declaration of Helsinki as revised in 2000.

Data analysis

Data were analysed as in previous DASP proficiency evaluations [8, 9]. Laboratory-assigned sensitivity and specificity were based on the local cut-off. Receiver operating characteristic (ROC) curves were used to evaluate the performance of each assay in discriminating disease from non-disease on the basis of the area under the curve (AUC). To facilitate comparison between laboratories, the coordinates of the ROC curve were used to determine the level of sensitivity that corresponded to a specificity of 95%, defined as the adjusted sensitivity 95 (AS95). The combined ROC curve was compiled from the median values for each patient and control sample from all assay measurements in DASP 2005.
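As a rough illustration of the AUC and AS95 calculations described above, the following Python sketch computes both for a single assay. The function names, the convention that a sample is positive when its value lies strictly above the threshold, and the example values are assumptions made for illustration; they are not taken from the DASP analysis itself.

```python
def roc_auc(patients, controls):
    """AUC as the probability that a patient value exceeds a control value.

    Ties between a patient and a control count as half a win.
    """
    wins = 0.0
    for p in patients:
        for c in controls:
            if p > c:
                wins += 1.0
            elif p == c:
                wins += 0.5
    return wins / (len(patients) * len(controls))


def adjusted_sensitivity(patients, controls, specificity=0.95):
    """Return (sensitivity, threshold) at the lowest threshold that reaches
    the target specificity. Assumption: positive means value > threshold."""
    if not patients or not controls:
        raise ValueError("both sample groups must be non-empty")
    for threshold in sorted(set(controls)):
        spec = sum(c <= threshold for c in controls) / len(controls)
        # Small tolerance guards against floating-point rounding.
        if spec + 1e-12 >= specificity:
            sens = sum(p > threshold for p in patients) / len(patients)
            return sens, threshold
    raise ValueError("target specificity cannot be reached")


# Hypothetical assay values (arbitrary units), for illustration only.
example_patients = [25, 30, 3, 50]
example_controls = list(range(1, 21))
print(roc_auc(example_patients, example_controls))               # 0.78125
print(adjusted_sensitivity(example_patients, example_controls))  # (0.75, 19)
```

Fixing the specificity before reading off the sensitivity removes the effect of each laboratory's choice of cut-off, which is why AS95 is easier to compare across assays than laboratory-assigned sensitivity.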

The reported relative levels of autoantibody in different assays in DASP 2005 were compared by ranking the patient sera for each assay. Concordance was assessed by linear regression of individual assay rank against the median rank for all sera from patients with type 1 diabetes.
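The ranking approach can be sketched in Python as follows. This is a minimal illustration under stated assumptions: ties receive average ranks, the regression pools all assays, and the "variance of the regression" is taken to be the residual variance (residual sum of squares divided by n − 2). The function names and input layout are hypothetical.

```python
from statistics import mean, median


def average_ranks(values):
    """Rank values from 1 (lowest) upwards; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def rank_concordance(assays):
    """assays: one list of patient-serum levels per assay, sera in the same order.

    Regresses each assay's rank on the median rank across assays and
    returns (r_squared, residual_variance).
    """
    ranks = [average_ranks(levels) for levels in assays]
    n_sera = len(assays[0])
    median_rank = [median(r[s] for r in ranks) for s in range(n_sera)]
    x = [median_rank[s] for r in ranks for s in range(n_sera)]
    y = [r[s] for r in ranks for s in range(n_sera)]
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return 1 - ss_res / syy, ss_res / (len(x) - 2)
```

Because only ranks enter the regression, assays that report in different units can be compared directly; assays that order the sera identically give r² = 1 regardless of scale.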

To evaluate the use of the IAA index and IA standard curve in antibody quantification, the input for each assay was the reported Δcpm, the difference in counts per minute (cpm) measured without and with unlabelled insulin, for assays that included competition, and the cpm without unlabelled insulin for assays that did not.

The IAA index for each serum in each laboratory compared with the DASP IAA standard and DASP negative serum was calculated as follows:

$$ {\text{IAA}}\ {\text{index}} = 100 \times \frac{{[\Delta {\text{cpm}}\ {\text{unknown}} - \Delta {\text{cpm}}\ {\text{negative}}\ {\text{serum}}]}}{{[\Delta {\text{cpm}}\ {\text{positive}}\ {\text{standard}} - \Delta {\text{cpm}}\ {\text{negative}}\ {\text{serum}}]}} $$

The IA index for each serum in each laboratory was calculated from the IB4.4 standard dilution, which was reported positive by all laboratories, and the DASP negative serum as follows:

$$ {\text{IA}}\ {\text{index}} = 100 \times \frac{{[\Delta {\text{cpm}}\,{\text{unknown}} - \Delta {\text{cpm}}\,{\text{negative}}\,{\text{serum}}]}}{{[\Delta {\text{cpm}}\,{\text{IB4}}{\text{.4}}\,{\text{standard}} - \Delta {\text{cpm}}\,{\text{negative}}\,{\text{serum}}]}} $$
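Both indices share the same form and differ only in the positive standard used (the DASP IAA standard or the IB4.4 dilution). A minimal Python sketch is given below; the function name and the numerical values are hypothetical and serve only to illustrate the arithmetic.

```python
def antibody_index(unknown, negative, standard):
    """Index = 100 * (unknown - negative) / (standard - negative).

    All arguments are delta-cpm values from the same assay run:
    the test serum, the negative serum and the positive standard.
    """
    denominator = standard - negative
    if denominator <= 0:
        raise ValueError("positive standard must bind more than the negative serum")
    return 100.0 * (unknown - negative) / denominator


# Hypothetical delta-cpm readings, for illustration only.
print(antibody_index(unknown=600, negative=100, standard=1100))  # 50.0
```

By construction the negative serum has an index of 0 and the positive standard an index of 100, so results from assays with different absolute counts are placed on a common scale.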

In addition, IA units were derived from a logarithmic standard curve constructed from the dilutions of the IB4 standard provided. The IA units assigned to the dilutions were: 125 IA units for IB4.4; 31.25 IA units for IB4.6; 15.6 IA units for IB4.7; 7.8 IA units for IB4.8; 3.9 IA units for IB4.9; and 1.95 IA units for IB4.10. Values outside the range of the standard curve were calculated by extrapolation.
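One way to derive units from such a curve is sketched below in Python. The exact curve model is not specified in the text, so this example assumes a least-squares straight line fitted to log10(IA units) against log10(Δcpm); the function names and any Δcpm readings passed in are hypothetical. Only the unit assignments come from the text above.

```python
import math

# IA units assigned to each IB4 dilution (values taken from the text).
IA_UNITS = {
    "IB4.4": 125.0,
    "IB4.6": 31.25,
    "IB4.7": 15.6,
    "IB4.8": 7.8,
    "IB4.9": 3.9,
    "IB4.10": 1.95,
}


def fit_log_standard_curve(delta_cpm_by_dilution):
    """Fit log10(units) = intercept + slope * log10(delta cpm) by least squares.

    delta_cpm_by_dilution maps each dilution name in IA_UNITS to the
    delta-cpm value measured for it in one assay (must be positive).
    """
    xs = [math.log10(delta_cpm_by_dilution[name]) for name in IA_UNITS]
    ys = [math.log10(units) for units in IA_UNITS.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def ia_units(delta_cpm, slope, intercept):
    """Read units off the fitted line; values beyond the standards are
    extrapolated along the same line."""
    if delta_cpm <= 0:
        raise ValueError("delta cpm must be positive for a logarithmic curve")
    return 10 ** (intercept + slope * math.log10(delta_cpm))
```

Samples whose binding is at or below that of the negative serum cannot be placed on a logarithmic curve, and extrapolated values inherit any error in the fitted slope, which grows with distance from the measured standards.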

Differences in inter-laboratory concordance between ranking by laboratory-reported IAA levels, calculated IAA and IA indices and standard-curve-derived IA units were analysed by comparing the variance of the regression using the F test.

Non-parametric tests were used to compare antibody levels in patient and control samples, and for comparisons between laboratories, workshops and assay methods. Assays for which results for more than 10% of samples were missing were not included in inter-laboratory comparisons. These are indicated in Table 2. For all statistical analyses, two-tailed p values <0.05 were considered significant.

Results

IAA assay characteristics in DASP proficiency testing

The results of IAA determination in DASP 2000, 2002, 2003 and 2005 are summarised in Table 1. Twenty-three laboratories reported the results of 23 assays in 2000, 32 reported results of 35 assays in 2002, 28 reported results of 28 assays in 2003, and 26 reported results of 30 assays in 2005. Data-reporting errors resulted in poor performance for two laboratories in DASP 2003, and incomplete results (<90% of samples) were reported by two laboratories in 2002, two in 2003 and one in 2005. In DASP 2005, 25 laboratories used competitive assays with displacement of IAA binding with unlabelled insulin and five laboratories used non-competitive assays. As shown in Table 1, the median AUC improved progressively from DASP 2000 to DASP 2005 for all participating laboratories (p = 0.001; Fig. 1a), and in laboratories participating in three or four workshops (p = 0.011; Table 1). There was no overall difference in AS95 between the 2002, 2003 and 2005 workshops (p = 0.268; Fig. 1b). Laboratory-assigned sensitivity using local thresholds also improved between workshops (p < 0.0001). In particular, the median sensitivity was up to fourfold higher in 2005 than in 2000 in laboratories that participated in three or four workshops (53% [IQR 33–58%] vs 14% [IQR 9–31%]; p = 0.0001). In contrast, the median laboratory-assigned specificity decreased from 2000 to 2005 (p < 0.0001), and this also occurred in the subset of laboratories that participated in three or four workshops (p = 0.0009). Full results for individual laboratories are given in Table 2.

Table 1 Results of IAA determinations in DASP proficiency evaluations
Fig. 1 The changes in area under the ROC curve (p = 0.001) (a) and AS95 (b) for IAA in DASP 2000–2005. In DASP 2000, only 50 control samples were circulated and the AS95 was therefore not calculated

Table 2 DASP results for insulin autoantibody assays

Of 22 laboratories with assay performance below the median AUC in 2002 and/or 2003, ten did not register for DASP 2005 (five participants in 2002, one in 2003 and four in 2002 and in 2003), and the performance of a further five laboratories remained below the median AUC in DASP 2005.

Assay format

In-house radioimmunoassays vs commercial kits

In every DASP workshop, the highest laboratory-assigned sensitivity, specificity, AUC and AS95 for IAA were achieved by laboratories using in-house radioimmunoassays (RIAs). In DASP 2002, two commercial RIA kits, one time-resolved immunofluorometric assay and one ELISA kit were tested in five different laboratories, but achieved lower sensitivity, specificity, AUC and AS95 (Table 2). In DASP 2003, three commercial RIA kits and one time-resolved immunofluorometric assay were tested in four laboratories, and in DASP 2005, six laboratories tested commercial RIA kits. The results obtained with the six commercial kits are shown together with those of the 26 in-house RIAs in Fig. 2.

Fig. 2 The effects of IAA assay format on AUC (a) and AS95 (b) in DASP 2005. A wide variation was seen in the results for both commercial kits and in-house assays. Commercial RIA kits using a competitive assay format (black circles) achieved assay performance comparable with that of the in-house RIA, but those using the non-competitive assay format (white circles) had a low assay performance

Variation between commercial kits

The performance of the kits was variable. In DASP 2005, the median laboratory-assigned sensitivity for assays using kits was 33% (IQR 18–49%) vs 52% (IQR 25–58%) for in-house RIA (p = 0.147), median specificity was 96% (IQR 58.5–99%) vs 98% (IQR 96–99%; p = 0.35), median AUC was 0.78 (IQR 0.48–0.86) vs 0.81 (IQR 0.72–0.83; p = 0.539) and median AS95 was 37% (IQR 14–65%) vs 47% (IQR 33–63%; p = 0.351). In DASP 2002–2005, only one commercial RIA kit (laboratory 132) achieved sensitivity, specificity, AUC and/or AS95 above the median values of all participating laboratories. One RIA kit (laboratory 209) achieved the highest AUC and AS95 of all assays in DASP 2005, but the laboratory-assigned sensitivity was only 22%. Of note, the two RIA kits with lowest AUC and AS95 values used the non-competitive assay format without displacement of IAA binding with unlabelled insulin (Fig. 2a, b; white circles). In DASP 2005, four assays (laboratories 121, 150, 153 and 209) reported values for both AUC and AS95 in the upper quartile.

Concordance of laboratory-reported measurements

In DASP 2005, serum samples from nine patients and one healthy control were reported positive in ≥75% of assays. An additional 12 patient samples, but none of the control samples, were reported positive in ≥50% of assays, and an additional nine patient samples and another two control samples were positive in ≥25% of assays. There was agreement on positive/negative status in ≥75% of assays for 108 samples (nine patient samples and 99 control samples; ESM Fig. 1a, b). In three of four laboratories with assay performances for both AUC and AS95 in the upper quartile, there was agreement for either positivity or negativity in 127 samples (27 patient samples and 100 control samples; data not shown).

The concordance of ranking of the IAA level in the patient samples between all laboratories by linear regression analysis was highly significant (r² = 0.642, variance = 73.7, p < 0.0001; Fig. 3). As expected, concordance in ranking of patient samples was lower between the assays with both AUC and AS95 below the 25th centile (n = 5 assays, r² = 0.392, variance = 126) than between the assays with AUC and AS95 between the 25th and 75th centile (n = 21 assays, r² = 0.669, variance = 67.9; p < 0.0001) and between assays with AUC and AS95 above the 75th centile (n = 4 assays, r² = 0.861, variance = 29.3; p < 0.0001 vs lower 25th centile, and p < 0.0001 vs 25th–75th centiles using the F test).

Fig. 3 IAA in samples from 50 patients with newly diagnosed type 1 diabetes in the DASP 2005 proficiency evaluation. The rank of individual samples in each assay is plotted against the median rank obtained for all 30 participating assays

Concordance of laboratory-reported IAA levels, common IAA index and common IA index

IAA and IA indices were calculated in 27 of the 30 assays in DASP 2005. Three laboratories failed to include the standards in their measurements. The DASP IAA standard was reported positive in all assays. The median IAA index of the IB4.4 IA standard was 56.1 (IQR 42.5–82.1) and the median IAA index of the IC9.3-IA standard was 43.3 (IQR 22.4–53.1; p = 0.003); IB4.4 was reported positive in all 27 assays and IC9.3 was reported positive in 26 assays. Further analyses were therefore based on units derived from the IB4.4 standard curve.

The ranking of patient samples by laboratory-reported IAA levels varied greatly between the 27 assays (r² = 0.088, variance 128,000; p < 0.0001) and also between the four assays with AUC and AS95 performances in the upper quartile (r² = 0.467, variance 468,000; p < 0.0001). The overall concordance of ranking was markedly improved by expressing results as an index in relation to either the IAA or IA common standard (r² = 0.779, variance 385, p < 0.0001, and r² = 0.747, variance 1,100, p < 0.0001, respectively; F test IAA index and IA index vs laboratory-reported IAA level, p < 0.0001; Fig. 4a, c). This was particularly apparent in the four laboratories with AUC and AS95 performances above the 75th centile (IAA index: r² = 0.904, variance 173, p < 0.0001; IA index: r² = 0.918, variance 356, p < 0.0001, respectively; F test, IAA index and IA index vs laboratory-reported IAA level, p < 0.0001; Fig. 4b, d). In all assays, the variance of ranking was lower using the IAA index compared with the IA index (F test, p < 0.0001). The ranking by units derived from the complete IB4 standard curve did not improve the inter-laboratory concordance of all assays, or of the four assays with AUC and AS95 in the upper quartile (r² = 0.147, variance 204, p < 0.0001, and r² = 0.786, variance 711, p < 0.0001; F test IAA index and IA index vs IB4-IA units, p < 0.0001; Fig. 4e, f).

Fig. 4 IAA index (a, b), IA index (c, d) and standard-curve-derived IA units (e, f) in samples from 50 patients with newly diagnosed type 1 diabetes in the DASP 2005 proficiency evaluation. Samples are arranged in order of ascending median rank of 27 assays (a, c, e) and of the four assays with AUC and AS95 performances in the upper quartile (b, d, f). Boxes represent the median and interquartile range

Combined ROC curve

The median IAA index values for each patient and control sample, compiled from the 27 assays in DASP 2005 that included the standards provided, were used to construct a combined ROC curve with an AUC of 0.89 (95% CI 0.824–0.957, p < 0.0001; Fig. 5). Using this combined curve, the AS95 was 70%. The cut-off IAA index value of 1.5 corresponded to a specificity of 98% and a sensitivity of 54%. In comparison, for autoantibodies to GAD (GADA), the AUC was 0.95 (95% CI 0.91–1.0), and at a specificity of 98% the sensitivity was 88%. For autoantibodies to IA-2 (IA-2A), the AUC was 0.86 (95% CI 0.78–0.94), and at a specificity of 98% the sensitivity was 74% [9].

Fig. 5 Generalised ROC curve for IAA (solid line), GADA (dotted line), and IA-2A (dashed line) in DASP 2005. These were compiled from 27 IAA assays, 50 GADA assays and 50 IA-2A assays of all proficiency samples. For IAA, the AUC was 0.89 (95% CI 0.82–0.96), and a cut-off IAA index value of 1.5 corresponded to a specificity of 98% and a sensitivity of 54%. For GADA, the AUC was 0.95 (95% CI 0.91–1.0), and at a specificity of 98% the sensitivity was 88%. For IA-2A, the AUC was 0.86 (95% CI 0.78–0.94), and at a specificity of 98% the sensitivity was 74%

Discussion

In the first DASP proficiency testing performed in 2000 we reported that, in contrast to GADA and IA-2A assays, IAA microassays generally achieved low sensitivity, with poor inter-laboratory concordance [8]. The overall performance of the IAA microassays, however, improved in the three subsequent proficiency-testing rounds. In particular, there was a stepwise increase in the laboratory-assigned sensitivity, which was fourfold higher in DASP 2005 than in DASP 2000. Although there was some reduction in specificity, the improved sensitivity was associated with an improvement in the overall ability to discriminate between diabetes and control sera, as demonstrated by ROC curve analysis.

Although there was still wide variation between assays in the ranking of IAA levels in different serum samples, there was overall concordance. As expected, this varied with assay sensitivity; assays with the highest AUC and adjusted sensitivity were the most concordant and, as the overall performance of IAA microassays has improved, concordance between laboratories in reporting patients as positive has also been enhanced. Whereas in DASP 2000 only three patients were reported positive in ≥75% of assays and six patients were reported positive in ≥50% of assays [8], in DASP 2005, nine patients were reported positive by ≥75% and 21 by ≥50%. Three out of four laboratories with AUC and AS95 values in the upper quartile concordantly reported 27 patients as positive.

Some caveats are needed in analysing changes between workshops. First, there were some differences in the laboratories taking part. The improvements in sensitivity and AUC were, however, also seen in the subset of laboratories that took part in three or four workshops. Second, only a minority of serum samples were included in more than one workshop, and improvements in sensitivity could potentially be an artefact of differences between the serum samples included.

In all three workshops the highest laboratory-assigned sensitivity, specificity, AUC and AS95 values were achieved in laboratories using in-house radioimmunoassays. This is in accordance with the outcomes from the 4th International Workshop on the Standardization of Insulin Autoantibody Measurements and the first DASP workshop [8, 10]. Time-resolved immunofluorometric assays, an ELISA kit and the majority of commercial RIA kits performed less well, although one RIA kit did achieve sensitivity and specificity comparable with those of the reliable in-house RIAs. One further commercial RIA kit achieved the highest AUC and AS95 of all DASP 2005 IAA assays, but the laboratory-assigned sensitivity was low (22%), suggesting that the threshold for positivity was inappropriately high. Some kits did not include competition with unlabelled insulin and these performed relatively poorly. Displacement of IAA binding with unlabelled insulin is strongly recommended for laboratories using RIA kits.

The format of the workshops also allowed us to evaluate the potential benefits of introducing an IAA reference standard preparation, as has been done for GADA and IA-2A [11]. The DASP IAA standard and negative sera were tested in all DASP workshops. Using this common standard to quantify levels as an IAA index clearly improved concordance between assays, particularly among those with the highest sensitivity. Sources of this standard are, however, very limited and, because IAA are usually found in young children, there will always be problems with obtaining a sample of large volume from a single patient, particularly at the time of diagnosis. We therefore included serial dilutions of sera obtained from two insulin-treated patients in DASP 2005 to see whether the same improvement could result from using an index derived from an IA-positive serum and to evaluate whether units derived from an IA standard curve by regression analysis could further improve concordance between laboratories. Such serum samples are more readily available, can be diluted, and can therefore be used in several workshops. The IB4.4 dilution was selected because the antibody level was closest to the IAA standard and was designated positive by all participating laboratories. Using the IA index based on this standard and the negative serum, the concordance between assays was clearly improved, though to a lesser extent than was achieved when using an index based on the IAA standard. This may be explained by the lower antibody levels in the selected standard dilution resulting in a higher variance of the calculated indices. The lower IAA levels in the alternative standard (IC9.3) meant that it was reported negative by one laboratory and was thus unsuitable for use as a common standard. Although in a previous workshop we demonstrated that concordance between laboratories was improved by reporting IAA results in units derived from a standard curve based on the DASP IAA standard (P. Bingley, unpublished data), reporting results in units derived from the standard curve based on dilutions of the IA-positive standard, IB4, did not improve concordance between laboratories. This difference may be due to the lower antibody levels in the top standard of the IA curve, with high variability in the lower binding range, particularly in less sensitive assays, leading to lower concordance. In addition, extrapolation to values outside the range of the standard curve will have exaggerated differences between laboratories.

The combined ROC curves for IAA, GADA and IA-2A in the 50 patients and 100 control samples in DASP 2005 show that IAA were less sensitive than GADA or IA-2A at the level giving 98% specificity, perhaps reflecting the age distribution of the patients in DASP, which generally includes few young children. As demonstrated in the DASP Insulin Autoantibody Affinity Workshop, the combined measurement of affinity and titre has the potential to markedly improve the sensitivity, specificity and concordance of IAA measurement [12]. At present, the techniques and analyses to determine affinity are cumbersome, but this issue will be addressed again in future DASP workshops.

In summary, the reported DASP proficiency evaluations, involving blinded testing of large numbers of serum samples in a wide range of laboratories, have shown that insulin autoantibody measurement by microassay methods has markedly improved since the first DASP workshop in 2000. Comprehensive comparison of assay performance has shown that both in-house RIAs and commercial RIA kits can achieve high levels of sensitivity, specificity and reproducibility as well as good inter-laboratory concordance, although there is still a wide variation in the reported data. Further improvement in inter-laboratory concordance might be achieved by using the competitive assay format in all laboratories as well as by re-defining the threshold for positivity used in some assays. The introduction of a common IAA index, based on the IAA-positive standard and the negative control serum provided, considerably improved the concordance of results between laboratories. The use of an appropriate dilution of an IA-positive standard with binding characteristics similar to the IAA-positive standard serum may achieve comparable concordance. These results are of great importance when comparing and interpreting data reported from different studies in type 1 diabetes research.