Statistical analysis of highly skewed immune response data

doi:10.1016/S0022-1759(96)00216-5

Journal of Immunological Methods

Volume 201, Issue 1, 14 February 1997, Pages 99-114

Abstract

This paper considers methods of statistical analysis for highly skewed immune response data. Observations from population studies of immunological variables are rarely normally distributed between individuals; typically the distribution shows extreme levels of skewness. In some situations, skewness remains considerable even after transforming the data. Using resampling techniques, applied to several actual datasets of ELISA assay data, we consider the robustness of normal parametric methods, e.g. t tests and linear regression. Despite the skewness of the transformed data, we demonstrate that such methods are quite robust, depending on the number of observations, type of analysis and severity of skewness. We also illustrate how bootstrap resampling can be used to provide a valid alternative method of analysis that can be used either for checking normal parametric analysis or as a direct method of analysis. We illustrate this combined approach by analysing real data to test for association between human serum antibodies to malaria merozoite surface proteins, MSP1 and MSP2, and resistance to clinical malaria. We confirm the protective effect of antibodies to MSP1 and demonstrate a similar protective effect for some antibodies to MSP2.

Introduction

Population surveys of naturally acquired immune responses can be used to obtain quantitative and qualitative data for a number of immunological variables which can be related to clinical variables such as susceptibility to disease, disease progression or prognosis and individual variables such as age, sex, previous medical history, etc. From our own work, and the published data of many other research groups, we have observed that immune response data collected in such studies are often distributed, between individuals, in a very uneven or asymmetrical pattern. A rapid search of the published literature has revealed a number of serological datasets for a variety of different parasitic organisms, all of which showed some degree of skewness (Nutman et al., 1985; Gabra et al., 1986; Perlmann et al., 1989; Tenter et al., 1991; Deplazes et al., 1992; Helbeig et al., 1993; Van Gelder et al., 1993; Paranhos-Bacalla et al., 1994; Ferrari et al., 1995); additional examples can be found in a recent review by Muller et al. (1995). The degree of skewness appears to depend, to some extent, on the type of antigen used in the assay: responses to defined single antigens, typically represented by recombinant proteins or synthetic peptides, are highly positively skewed, whereas responses to crude parasite extracts, which are mixtures of many different antigens, tend to be less skewed. For example, severely skewed data were obtained for purified antigens of Echinococcus granulosus (Helbeig et al., 1993) and Plasmodium falciparum (Gabra et al., 1986; Perlmann et al., 1989) and moderately skewed data for defined Toxoplasma gondii (Tenter et al., 1991; Van Gelder et al., 1993) and Trypanosoma cruzi (Paranhos-Bacalla et al., 1994) antigens. In contrast, serological responses to crude extracts of a number of helminth parasites, including Echinococcus and Schistosoma, are only slightly skewed (Nutman et al., 1985; Deplazes et al., 1992; Ferrari et al., 1995).

A recent paper (Bennett and Riley, 1992) confirmed that skewness across a range of datasets is more the norm than the exception and that there is no consensus among immunologists about how such data should be analysed. Skewness may be so pronounced that the application of standard analysis can be rather problematic. One common approach to analysis is to ignore the skewness in the data and apply normal parametric methods such as t tests, analysis of variance (or covariance) or standard linear regression, with or without applying a transformation (usually logarithmic) to the data to reduce or remove skewness. As will be confirmed in this paper, transforming immune response data does not always normalize the distribution and may only have a minor impact on reducing the level of skewness (Bennett and Riley, 1992). At first sight, using normal parametric methods on skewed data may be expected to be invalid: confidence intervals intended to provide 95% probability of coverage (of the population value) may actually only attain a lower level of coverage and perhaps give misleading `statistically significant' results. However, because of the `Central Limit Theorem' (CLT) of statistics (Ross, 1976), normal methods may still provide approximately valid confidence intervals (or, equivalently, hypothesis test p values) provided that (a) the sample size is large enough and (b) the distribution is not too severely skewed. A second approach is to use standard non-parametric methods, e.g. Mann-Whitney or Kruskal-Wallis significance tests. However, these methods do not make full use of the available quantitative data and are limited when a multivariate analysis (e.g. multiple regression) is required, which is often the case with epidemiological data. One aim of this paper is to investigate the range of values relating to (a) and (b) which will permit the use of normal parametric methods for sero-epidemiologic studies.

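
Conditions (a) and (b) can be explored with a small simulation. The sketch below is only an illustration and not the resampling procedure applied to the real datasets in this paper: it assumes lognormal data as a stand-in for skewed OD values, and the function name `t_interval_coverage`, the sample sizes and the values of `sigma` are all hypothetical choices. It estimates how often a nominal 95% t-based confidence interval for the mean actually covers the true mean.

```python
import math
import random
import statistics

# Two-sided 95% critical values of Student's t for the sample sizes supported
# below (df = n - 1), hard-coded so that only the standard library is needed.
T_CRIT = {10: 2.262, 30: 2.045, 100: 1.984}


def t_interval_coverage(n, sigma, n_sim=2000, seed=1):
    """Fraction of simulated samples whose t-based 95% CI covers the true mean.

    Data are drawn from a lognormal distribution (an assumed stand-in for
    skewed OD values); a larger sigma gives heavier right skew.
    n must be one of the keys of T_CRIT.
    """
    rng = random.Random(seed)
    true_mean = math.exp(sigma ** 2 / 2)  # mean of lognormal(mu=0, sigma)
    hits = 0
    for _ in range(n_sim):
        sample = [rng.lognormvariate(0.0, sigma) for _ in range(n)]
        mean = statistics.fmean(sample)
        half_width = T_CRIT[n] * statistics.stdev(sample) / math.sqrt(n)
        if mean - half_width <= true_mean <= mean + half_width:
            hits += 1
    return hits / n_sim


if __name__ == "__main__":
    for sigma in (0.5, 1.5):
        for n in (10, 30, 100):
            cov = t_interval_coverage(n, sigma)
            print(f"sigma={sigma}, n={n}: estimated coverage={cov:.3f}")
```

Under these assumptions, coverage should fall further below the nominal 95% as skewness increases and as the sample size decreases, which is the trade-off between (a) and (b) described above.
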
A third approach to the analysis of immune response data is to avoid analysing the original quantitative data altogether but instead to analyse corresponding binary data after categorising each individual as a `responder' or a `non-responder', depending on whether the measured response is above or below some estimated cut-off value. Standard statistical methods such as the chi-squared test or logistic regression can then be validly applied. Although this approach may make biological sense, it is not ideal. Firstly, defining a cut-off value when, from plots of the data, there may be no obvious division of individuals into two groups, is potentially unjustified and a rather arbitrary process. Results from such analysis may also be sensitive to the particular choice of cut-off, especially for small studies. In addition, it has been shown previously (Bennett and Riley, 1992), that statistical power will be decreased when qualitative rather than quantitative data are analysed.
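
The loss of power from dichotomising can be illustrated with a rough simulation. This is a sketch under stated assumptions, not an analysis from the paper or from Bennett and Riley (1992): log-scale responses are assumed normal with unit standard deviation, and the function name `simulate_power`, the group size, the `shift` between groups and the `cutoff` are hypothetical. Both tests use large-sample normal critical values for simplicity.

```python
import math
import random
import statistics


def simulate_power(n=50, shift=0.5, cutoff=0.25, n_sim=1000, seed=1):
    """Estimate power of a quantitative vs. a dichotomised two-group comparison.

    Returns (power_quantitative, power_binary), each the fraction of simulated
    studies in which the two-sided 5% level test rejects the null hypothesis.
    """
    rng = random.Random(seed)
    reject_quant = 0
    reject_binary = 0
    for _ in range(n_sim):
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        b = [rng.gauss(shift, 1.0) for _ in range(n)]

        # Quantitative analysis: difference in means divided by its standard
        # error, compared with the large-sample normal critical value.
        se = math.sqrt(statistics.variance(a) / n + statistics.variance(b) / n)
        z_quant = (statistics.fmean(b) - statistics.fmean(a)) / se
        if abs(z_quant) > 1.96:
            reject_quant += 1

        # Binary analysis: classify each individual as responder or
        # non-responder, then compare proportions with a two-proportion z test
        # (its square is the uncorrected 2x2 chi-squared statistic).
        pa = sum(x > cutoff for x in a) / n
        pb = sum(x > cutoff for x in b) / n
        pooled = (pa + pb) / 2  # equal group sizes
        var = pooled * (1 - pooled) * 2 / n
        if var > 0 and abs(pb - pa) / math.sqrt(var) > 1.96:
            reject_binary += 1
    return reject_quant / n_sim, reject_binary / n_sim


if __name__ == "__main__":
    power_quant, power_binary = simulate_power()
    print(f"quantitative: {power_quant:.2f}, dichotomised: {power_binary:.2f}")
```

With these settings the quantitative comparison should reject more often than the dichotomised one; changing `cutoff` also shows how sensitive the binary analysis can be to the choice of threshold.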

A relatively new approach to constructing confidence intervals, without making any parametric assumptions about the shape of the underlying population of data, is the bootstrap resampling method, first introduced by Efron (1979). A very readable account of the method is provided by Efron and Tibshirani (1993). The bootstrap method involves Monte Carlo sampling and is computationally intensive. It differs from traditional formula-driven methods in that the original data are resampled, many times, to simulate sampling variability and to provide an approximation for the unknown sampling distribution of the statistic of interest. This sampling distribution is used to provide estimates of standard error and to estimate confidence intervals. Traditional normal methods are based on the assumption that the underlying data distribution is normal so that the sampling variability of the statistic of interest is also normal and so predictable. The bootstrap method does not require assumptions about the underlying population of data, and therefore, intuitively, it is appropriate for analysing immune response data regardless of, for example, severe skewness. The bootstrap is a non-parametric method of analysis and provides a `gold standard' against which other analyses can be compared.
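
The resampling idea described above can be sketched in a few lines. The example below is a minimal illustration, assuming the simple percentile interval, which is one of several bootstrap interval constructions; this section does not state which variant was used in the paper. The function name `bootstrap_percentile_ci` and the simulated data are hypothetical.

```python
import random
import statistics


def bootstrap_percentile_ci(data, stat=statistics.fmean, n_boot=2000,
                            alpha=0.05, seed=1):
    """Percentile bootstrap confidence interval and standard error for `stat`.

    The observed data are resampled with replacement n_boot times; the spread
    of the recomputed statistic approximates its sampling distribution.
    """
    rng = random.Random(seed)
    n = len(data)
    # Each replicate is the statistic computed on one resample of size n.
    reps = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    # Simple order-statistic approximation to the alpha/2 and 1 - alpha/2
    # percentiles of the bootstrap distribution.
    lower = reps[round(alpha / 2 * n_boot)]
    upper = reps[round((1 - alpha / 2) * n_boot) - 1]
    std_error = statistics.stdev(reps)
    return lower, upper, std_error


if __name__ == "__main__":
    # Simulated right-skewed values standing in for OD readings (not real data).
    rng = random.Random(0)
    od = [rng.lognormvariate(-2.0, 1.2) for _ in range(50)]
    lower, upper, std_error = bootstrap_percentile_ci(od)
    print(f"mean={statistics.fmean(od):.3f}, "
          f"95% CI=({lower:.3f}, {upper:.3f}), SE={std_error:.3f}")
```

Passing a different function as `stat`, for example `statistics.median`, gives an interval for that statistic without any new formula, which is why the same approach can be used to check a normal parametric analysis.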

This paper aims to: (i) consider the robustness and limitations of normal parametric methods for the analysis of highly skewed immune response data, and (ii) explore the use of the bootstrap method as a tool for checking normal analysis and also as a direct method of analysis.

We shall use as examples datasets obtained from enzyme-linked immunosorbent assays (ELISA) of human sera tested for the presence of antibodies to the malaria parasite, Plasmodium falciparum. The data obtained are examined in relation to both clinical and demographic covariates in order to test for any statistical association between immune responsiveness and malarial disease.

Section snippets

Data and immunological methods

Serum samples were collected from children aged 3–8 years living in a highly malaria endemic area of West Africa. Details of the study area, malaria endemicity and demography have been published previously (Greenwood et al., 1987; Riley et al., 1990). Sera were tested for antibodies to recombinant malaria antigens by ELISA (Egan et al., 1995; Taylor et al., 1995). The recombinant antigens used were produced as glutathione-S-transferase fusion proteins in pGex expression vectors, as described

Normal parametric methods

Three different and commonly encountered situations were considered to assess the robustness of normal parametric analysis.

(a) The 1-sample situation of estimating the population average immune response. The relevant normal method is a one-sample t test (or an equivalent t-based confidence interval). This problem can arise in practice when paired data, i.e. two OD measurements per individual are involved and the within-subject differences are calculated for analysis.

(b) The 2-sample situation of

Descriptive analysis

Fig. 1 shows histograms for five datasets of OD data ordered from left to right by level of skewness (upper histograms). Histograms for the corresponding log transformed data are also shown (lower histograms). A standard skewness statistic (see formula in Fig. 1) was calculated for each set of data and all values were significantly greater than zero (two standard errors ≅ 0.34, p < 0.05). For untransformed data, skewness ranged from extreme levels of 6.15 (MAD2) down to moderate levels of 1.02

Discussion

Using several real immune response datasets we have considered (i) the robustness of normal parametric methods for analysing skewed immune response data and (ii) the use of bootstrap resampling as an alternative method of analysis. These data are typical of highly skewed data which are often obtained in immuno-epidemiological studies. Although some datasets, for particularly immunogenic antigens, may give normal or near normal distributions, for less immunogenic antigens, where the proportion

Acknowledgements

We are very grateful to Andrea Egan and Rachel Taylor for providing ELISA data, Bruce Worton for helpful discussions, and to the Wellcome Trust for providing financial support. Steve Bennett was supported by the Medical Research Council.

References (25)

Aitken, M. (1990) Statistical Modelling in GLIM. Oxford Science Publications,...
Armitage, P. and Berry, G. (1994) Statistical Methods in Medical Research. Blackwell Scientific Publications,...
Bennett, S. and Riley, E. (1992) The statistical analysis of data from immunoepidemiological studies. J. Immunol....
Critchfield, G. et al. (1992) Nonparametric assessment of toxicologic assay linearity by bootstrap analysis. J. Analyt....
Deplazes, P. et al. (1992) Detection of Echinococcus coproantigens by enzyme-linked immunosorbent assay in dogs, dingos...
Efron, B. (1979) Bootstrap methods: another look at the jackknife. Ann. Stat. 7,...
Efron, B. and Tibshirani, R. (1993) An Introduction to the Bootstrap. Chapman and Hall,...
Egan, A. et al. (1995) Serum antibodies from malaria-exposed people recognize conserved epitopes formed by the two...
Egan, A. et al. (1996) Clinical immunity to Plasmodium falciparum malaria is associated with serum antibodies to the...
Ferrari, T. et al. (1995) The value of an enzyme-linked immunosorbent assay for the diagnosis of schistosomiasis...
Fulford, A.C. (1994) Dispersion and Bias: Can we trust geometric means? Parasitol. Today 10,...
Gabra, M. et al. (1986) Defined Plasmodium falciparum antigens in malaria serology. Bull. WHO 64,...

Cited by (41)

Patient global assessment and inflammatory markers in patients with idiopathic inflammatory myopathies – A longitudinal study
2024, Seminars in Arthritis and Rheumatism
To explore if patient global assessment (PGA) is associated with inflammation over time and if associations are explained by other measures of disease activity and function in patients with idiopathic inflammatory myopathies (IIM).
PGA and systemic inflammatory markers prospectively collected over five years were retrieved from the International MyoNet registry for 1200 patients with IIM. Associations between PGA, erythrocyte sedimentation rate (ESR), C-reactive protein (CRP) and creatine kinase (CK) were analyzed using mixed models. Mediation analysis was used to test if the association between PGA and inflammatory markers during the first year of observation could be explained by measures of disease activity and function.
PGA improved, and inflammatory markers decreased during the first year of observation. In the mixed models, high levels of inflammatory markers were associated with worse PGA in both men and women across time points during five years of observation. In men, but not in women, the association between elevated ESR, CRP and poorer PGA was explained by measures of function and disease activity. With a few exceptions, the association between improved PGA and reduced inflammatory markers was partially mediated by improvements in all measures of function and disease activity.
Increased levels of systemic inflammation are associated with poorer PGA in patients with IIM. In addition to known benefits of lowered inflammation, these findings emphasize the need to reduce systemic inflammation to improve subjective health in patients with IIM. Furthermore, the results demonstrate the importance of incorporating PGA as an outcome measure in clinical practice and clinical trials.

The impact of BCG strains and repeat vaccinations on immunodiagnostic tests in Eurasian badgers (Meles meles)
2022, Vaccine
Bacille Calmette-Guerin (BCG) is a potential tool in the control of Mycobacterium bovis in European badgers (Meles meles). A five year Test and Vaccinate or Remove (TVR) research intervention project commenced in 2014 using two BCG strains (BCG Copenhagen 1331 (Years 1–3/ BadgerBCG) and BCG Sofia SL2222 (Years 4–5)). Badgers were recaptured around 9 weeks after the Year 5 vaccination and then again a year later.
The Dual-Path Platform (DPP) Vet TB assay was used to detect serological evidence of M. bovis infection. Of the 48 badgers, 47 had increased Line 1 readings (MPB83 antigen) between the Year 5 vaccination and subsequent recapture. The number of BCG Sofia vaccinations influenced whether a badger tested positive to the recapture DPP VetTB assay Line 1 (p < 0.001) while the number of BadgerBCG vaccinations did not significantly affect recapture Line 1 results (p = 0.59). Line 1 relative light units (RLU) were more pronounced in tests run with sera than whole blood. The results from an in-house MPB83 ELISA indicated that the WB DPP VetTB assay may not detect lower MPB83 IgG levels as well as the serum DPP VetTB assay. Changes in interferon gamma assay (IFN-γ) results were seen in 2019 with significantly increased CFP-10 and PPDB readings.
Unlike BadgerBCG, BCG Sofia induces an immune response to MPB83 (the immune dominant antigen in M. bovis badger infection) that then affects the use of immunodiagnostic tests. The use of the DPP VetTB assay in recaptured BCG Sofia vaccinated badgers within the same trapping season is precluded and caution should be used in badgers vaccinated with BCG Sofia in previous years. The results suggest that the DPP VetTB assay can be used with confidence in badgers vaccinated with BadgerBCG as a single or repeated doses.

Fatigue and sleepiness responses to experimental inflammation and exploratory analysis of the effect of baseline inflammation in healthy humans
2020, Brain, Behavior, and Immunity
Citation Excerpt:
All analyses were performed using Stata 14 (StataCorp, College Station, Texas) using an alpha level of 0.05. Following recommendations by (McGuinness et al., 1997), cytokine data were log-transformed and analyses were performed using bootstrap resampling method (1000 resamples). The data files are freely available on the OSF repository: [https://osf.io/u78h6/?view_only=a8a203227c854e35a496597d40e2cf5e].
Inflammation is believed to be a central mechanism in the pathophysiology of fatigue. While it is likely that dynamic of the fatigue response after an immune challenge relates to the corresponding cytokine release, this lacks evidence. Although both fatigue and sleepiness are strong signals to rest, they constitute distinct symptoms which are not necessarily associated, and sleepiness in relation to inflammation has been rarely investigated. Here, we have assessed the effect of an experimental immune challenge (administration of lipopolysaccharide, LPS) on the development of both fatigue and sleepiness, and the associations between increases in cytokine concentrations, fatigue and sleepiness, in healthy volunteers. In addition, because chronic-low grade inflammation may represent a risk factor for fatigue, we tested whether higher baseline levels of inflammation result in a more pronounced development of cytokine-induced fatigue and sleepiness. Data from four experimental studies was combined, giving a total of 120 subjects (LPS N = 79, 18 (23%) women; Placebo N = 69, 12 (17%) women). Administration of LPS resulted in a stronger increase in fatigue and sleepiness compared to the placebo condition, and the development of both fatigue and sleepiness closely paralleled the cytokine responses. Individuals with stronger increases in cytokine concentrations after LPS administration also suffered more from fatigue and sleepiness (N = 75), independent of gender. However, there was no support for the hypothesis that higher baseline inflammatory markers moderated the responses in fatigue or sleepiness after an inflammatory challenge. The results demonstrate a tight connection between the acute inflammatory response and development of both fatigue and sleepiness, and motivates further investigation of the involvement of inflammation in the pathophysiology of central fatigue.

Maternal BCG scar is associated with increased infant proinflammatory immune responses
2017, Vaccine
Citation Excerpt:
Cytokine and chemokine concentrations showed skewed distributions. Results were transformed to log10 (cytokine concentration + 1) for graphical representation using GraphPad Prism v6.0c (GraphPad software, Inc., La Jolla, CA, USA) and for analysis by linear regression using bootstrapping [33] using STATA v. 13.1 (College Station, TX, USA). Results from regression analyses are presented as adjusted geometric mean ratios (aGMR) [95% confidence interval (CI)].
Prenatal exposures such as infections and immunisation may influence infant responses. We had an opportunity to undertake an analysis of innate responses in infants within the context of a study investigating the effects of maternal mycobacterial exposures and infection on BCG vaccine-induced responses in Ugandan infants.
Maternal and cord blood samples from 29 mother-infant pairs were stimulated with innate stimuli for 24 h and cytokines and chemokines in supernatants were measured using the Luminex® assay. The associations between maternal latent Mycobacterium tuberculosis infection (LTBI), maternal BCG scar (adjusted for each other’s effect) and infant responses were examined using linear regression. Principal Component Analysis (PCA) was used to assess patterns of cytokine and chemokine responses. Gene expression profiles for pathways associated with maternal LTBI and with maternal BCG scar were examined using samples collected at one (n = 42) and six (n = 51) weeks after BCG immunisation using microarray.
Maternal LTBI was positively associated with infant IP-10 responses with an adjusted geometric mean ratio (aGMR) [95% confidence interval (CI)] of 5.10 [1.21, 21.48]. Maternal BCG scar showed strong and consistent associations with IFN-γ (aGMR 2.69 [1.15, 6.17]), IL-12p70 (1.95 [1.10, 3.55]), IL-10 (1.82 [1.07, 3.09]), VEGF (3.55 [1.07, 11.48]) and IP-10 (6.76 [1.17, 38.02]). Further assessment of the associations using PCA showed no differences for maternal LTBI, but maternal BCG scar was associated with higher scores for principal component (PC) 1 (median level of scores: 1.44 in scar-positive versus −0.94 in scar-negative, p = 0.020) in the infants. PC1 represented a controlled proinflammatory response. Interferon and inflammation response pathways were up-regulated in infants of mothers with LTBI at six weeks, and in infants of mothers with a BCG scar at one and six weeks after BCG immunisation.
Maternal BCG scar had a stronger association with infant responses than maternal LTBI, with an increased proinflammatory immune profile.

Factors associated with tuberculosis infection, and with anti-mycobacterial immune responses, among five year olds BCG-immunised at birth in Entebbe, Uganda
2015, Vaccine
Citation Excerpt:
Logistic regression was used to examine associations with LTBI at age five. Cytokine responses were transformed to log10 (concentration + 1) and then analysed using linear regression with bootstrapping to estimate bias corrected accelerated confidence intervals [31]; results were back-transformed to give geometric mean ratios. Spearman's correlation coefficients between cytokines responses at one and five years were calculated.
BCG is used widely as the sole licensed vaccine against tuberculosis, but it has variable efficacy and the reasons for this are still unclear. No reliable biomarkers to predict future protection against, or acquisition of, TB infection following immunisation have been identified. Lessons from BCG could be valuable in the development of effective tuberculosis vaccines.
Within the Entebbe Mother and Baby Study birth cohort in Uganda, infants received BCG at birth. We investigated factors associated with latent tuberculosis infection (LTBI) and with cytokine response to mycobacterial antigen at age five years. We also investigated whether cytokine responses at one year were associated with LTBI at five years of age.
Blood samples from age one and five years were stimulated using crude culture filtrates of Mycobacterium tuberculosis in a six-day whole blood assay. IFN-γ, IL-5, IL-13 and IL-10 production was measured. LTBI at five years was determined using T-SPOT.TB® assay. Associations with LTBI at five years were assessed using multivariable logistic regression. Multiple linear regression with bootstrapping was used to determine factors associated with cytokine responses at age five years.
LTBI prevalence was 9% at age five years. Only urban residence and history of TB contact/disease were positively associated with LTBI. BCG vaccine strain, LTBI, HIV infection, asymptomatic malaria, growth z-scores, childhood anthelminthic treatment and maternal BCG scar were associated with cytokine responses at age five. Cytokine responses at one year were not associated with acquisition of LTBI by five years of age.
Although multiple factors influenced anti-mycobacterial immune responses at age five, factors likely to be associated with exposure to infectious cases (history of household contact, and urban residence) dominated the risk of LTBI.

The influence of BCG vaccine strain on mycobacteria-specific and non-specific immune responses in a prospective cohort of infants in Uganda
2012, Vaccine
Citation Excerpt:
Random effects were used to account for potential between-lot variability (since several lots of vaccine were administered within each BCG strain group). As some cytokine results remained skewed after log10 transformation, analyses were bootstrapped [33] with 10,000 repeats to calculate bias-corrected accelerated confidence intervals. Cytokine responses of infants with and without a BCG scar were compared using the same methods but without random effects (being independent of potential between-lot variability).
Globally, BCG vaccination varies in efficacy and has some non-specific protective effects. Previous studies comparing BCG strains have been small-scale, with few or no immunological outcomes and have compared TB-specific responses only. We aimed to evaluate both specific and non-specific immune responses to different strains of BCG within a large infant cohort and to evaluate further the relationship between BCG strain, scarring and cytokine responses.
Infants from the Entebbe Mother and Baby Study (ISRCTN32849447) who received BCG-Russia, BCG-Bulgaria or BCG-Denmark at birth, were analysed by BCG strain group. At one year, interferon-gamma (IFN-γ), interleukin (IL)-5, IL-13 and IL-10 responses to mycobacteria-specific antigens (crude culture filtrate proteins and antigen 85) and non-mycobacterial stimuli (tetanus toxoid and phytohaemagglutinin) were measured using ELISA. Cytokine responses, scar frequency, BCG associated adverse event frequency and mortality rates were compared across groups, with adjustments for potential confounders.
Both specific and non-specific IFN-γ, IL-13 and IL-10 responses in 1341 infants differed between BCG strain groups including in response to stimulation with tetanus toxoid. BCG-Denmark immunised infants showed the highest cytokine responses. The proportion of infants who scarred differed significantly, with BCG scars occurring in 52.2%, 64.1% and 92.6% of infants immunised with BCG Russia, BCG-Bulgaria and BCG-Denmark, respectively (p < 0.001). Scarred infants had higher IFN-γ and IL-13 responses to mycobacterial antigens only than infants without a scar. The BCG-Denmark group had the highest frequency of adverse events (p = 0.025). Mortality differences were not significant.
Both specific and non-specific immune responses to the BCG vaccine differ by strain. Scarring after BCG vaccination is also strain-dependent and is associated with higher IFN-γ and IL-13 responses to mycobacterial antigens. The choice of BCG strain may be an important factor and should be evaluated when testing novel vaccine strategies that employ BCG in prime–boost sequences, or as a vector for other vaccine antigens.
