
Context and disease when disease risk is low: the case of type 1 diabetes in Sweden
  1. K F Lynch1,2,
  2. S V Subramanian3,
  3. H Ohlsson1,
  4. B Chaix1,4,
  5. Å Lernmark2,5,
  6. J Merlo1
  1. 1Unit of Social Epidemiology, Department of Clinical Sciences, Malmö University Hospital, Lund University, Malmö, Sweden
  2. 2Unit of Diabetes and Celiac Disease, Department of Clinical Sciences, Malmö University Hospital, Lund University, Malmö, Sweden
  3. 3Department of Society, Human Development and Health, Harvard School of Public Health, Boston, Massachusetts, USA
  4. 4Inserm U707, UMR-S Inserm-Université Pierre et Marie Curie-Paris6, Paris, France
  5. 5Department of Medicine, University of Washington, Seattle, Washington, USA
  1. Correspondence to Kristian F Lynch, Department of Clinical Sciences, CRC, Entrance 72, House 28, Floor 12, Malmö University Hospital, 205 02 Malmö, Sweden; kristian.lynch{at}med.lu.se

Abstract

Background Several European studies have found significant small area variation in the risk of childhood onset (type 1) diabetes (T1D) which has been interpreted as evidence for contextual determinants of T1D. However, this conclusion may be fallacious since the limited number of newborn infants and the low risk for T1D is a source of spurious variability not properly handled by usual statistical methods. This study investigates the existence of contextual effects in the genesis of T1D, compares conclusions in previous reports with results obtained in a multilevel regression framework and highlights analysis of variance as a useful approach in public health.

Methods All singletons born in Sweden between 1987 and 1991 were identified in the Medical Birth Registry (n=560 766) and followed for diabetes until age 14 using the Hospital Discharge Registry. Area variation in the cumulative incidence of T1D was estimated by different statistical methods including multilevel logistic regression.

Results The risk of T1D ranged from 4.3 to 6.5 per 1000 newborns across the counties (n=24) and from 0.0 to 19.2 per 1000 newborns across the municipalities (n=284). These differences were significant in standard statistical tests (counties, p=0.02; municipalities, p=0.007). However, according to multilevel analyses, the risk of T1D ranged from 4.7 to 5.7 and from 4.4 to 6.0 per 1000 newborns in counties and municipalities, respectively, and the area variation was small and without practical relevance (counties, σ2=0.006; municipalities, σ2=0.017).

Conclusions Previous reports based on standard statistical tests are misleading. According to multilevel analysis, administrative areas have minor relevance for individual risk of T1D in Sweden.

  • Multilevel modelling
  • diabetes
  • variations
  • small area epidemiology
  • diabetes DI

The epidemiology of childhood onset (type 1) diabetes (T1D) suggests that this disease has clear contextual determinants.1 Previous ecological analyses have shown a wide geographical variation in the incidence of T1D that probably cannot be explained by area differences in individual genetic background2 or by case-finding procedures. For example, in Europe, where data are rather complete and reliable, the incidence ranges from 3.6/100 000 person years in the former Yugoslav Republic of Macedonia to 43.9/100 000 in Finland.3 Geographical variation has also been described within countries. For instance, a number of European studies,4–16 including several performed in Sweden (table 1),4 5 13 16 have applied straightforward statistical analyses and found significant variation in the incidence of T1D between counties and municipalities. However, these results may be fallacious since, when analysing smaller areas, the limited number of newborns and the expected low risk for T1D are a source of spurious variability not properly accounted for by usual statistical methods.

Table 1

Previous studies that have examined overall geographical variation in point estimates of the incidence of childhood type 1 diabetes (T1D) across administrative areas of Sweden

Within the last two decades the importance of filtering out random noise from the area-specific variation has slowly been recognised and some recent European studies have considered the amount of information in the small administrative areas by using a Bayesian approach to obtain T1D disease rates. However, these studies have so far only rendered a visual quantification of the geographical variation.17–21 Most commonly, the observed small area variation is still evaluated by traditional χ2 tests 8–10 12–15 or likelihood ratio tests in fixed effect regression analyses7 11 that have been interpreted as evidence of the existence of important geographical/contextual determinants for T1D.7 9 13 While this interpretation might seem naive, it deserves serious consideration as it may have relevant repercussions for a correct understanding of the aetiology of T1D and for planning strategies of prevention at the appropriate intervention level (ie, individual or areas).22

While international ecological analyses may actually suggest the existence of contextual effects, it is possible that the conclusions drawn from international studies are less valid within countries, where contextual factors are more homogeneously distributed than between countries. We have previously observed this phenomenon when studying contextual determinants of blood pressure in countries23 and in city areas.24

A more suitable approach to examine area variation would be to employ a hierarchical multilevel regression.25 26 Multilevel regression modelling has several advantages compared with traditional ecological analysis as it distinguishes variation between areas from variation within areas. This methodology also takes into account the uncertainty derived from the small amount of information that might be present in some areas and produces more accurate estimates of incidence by considering the overall information existing in the sample.25 27 28

The aim of this study was to quantify and assess the overall area variation in the risk of T1D between counties and municipalities in Sweden, and thereby evaluate the possible relevance of general contextual effects (ie, the context as a whole) in the genesis of T1D. To perform this evaluation, common statistical approaches, previously applied to investigate small area variation in T1D, were compared with a multilevel regression framework. In the multilevel analysis we also investigated whether any possible small area variance could be explained by compositional factors (ie, differences in individual risk factors for T1D) or by contextual factors (ie, the urban type of municipality).

Methods

Study population

Using the Swedish Medical Birth Registry29 we identified a cohort comprising all 560 766 live singletons born between 31 December 1986 and 1 January 1992 to mothers living in one of 284 municipalities from the 24 counties that existed at that time in Sweden.

Identification of T1D cases

All cases were identified using the Swedish National Hospital Discharge Registry. Children were followed from birth until 14 years of age or until the first discharge diagnosis of diabetes mellitus (ie, the outcome in this study) defined according to the International Classification of Diseases, Ninth (ICD9) and Tenth (ICD10) Editions, codes 250 and codes E10–E14 respectively. In Sweden, more than 98.5% of children diagnosed with diabetes have the type 1 classification.30

T1D risk score

A score that estimated the risk of a newborn developing T1D was included in the multilevel analyses to account for individual compositional factors of the areas. The risk score was constructed using perinatal factors previously reported to be associated with T1D.31 In short, perinatal factors that were significantly associated (p<0.10) with T1D were retained in a multiple logistic regression analysis to obtain the predicted probability of T1D (see table 2). The calculated predicted probabilities were categorised into quartiles so that those newborns within a higher quartile had an increased risk of developing T1D. (Extended information on the risk score calculation and properties is available on request).

Table 2

Multiple logistic regression model used to calculate type 1 diabetes (T1D) risk score*

Contextual variable

Type of municipality was examined for its association with risk of T1D. This specific contextual variable was based on structural characteristics such as population size, commuting patterns and the structure of the business world according to the definition provided by the Swedish Association of Local Authorities and Regions in 1996. Municipalities were classified as metropolitan (n=3), suburban (n=36), large town (n=25), medium-sized town (n=39), industrial towns (n=53) and the remaining were classified as non-urban or rural (n=128).

Literature review

Using the PubMed service from the US National Library of Medicine and the National Institutes of Health, USA, a literature review for published studies between 1980 and 2007 on geographical and administrative area variation of childhood diabetes was conducted using the following search terms: ‘diabetes’ and ‘geographical variation’ and either ‘counties’, ‘municipalities’ or ‘districts’. An inclusion criterion was based on whether the authors examined geographical variation of childhood T1D across administrative areas of Sweden. Publications in the reference lists of obtained articles that met the inclusion criteria were also included in our review.

Statistical analysis

Differences in observed cumulative incidence of T1D between counties and between municipalities were first investigated using traditional statistical approaches such as the asymptotic Pearson χ2 test and the exact Pearson χ2 test in SPSS. Counties and municipalities were examined separately.

Also, using a simple logistic regression model, the probability of T1D was modelled as a function of the area in which the mother resided at birth (ie, area was included as a fixed effect in the model). Differences in log odds from the reference administrative area were globally tested using a likelihood ratio test.32

Cumulative incidence was calculated as the number of cases of T1D divided by the number of newborns born between 1987 and 1991. The incidence was expressed as cases of T1D per 1000 children.

Multilevel analysis

Multilevel logistic regression models were fitted with newborns (level 1) nested within administrative areas (level 2). Analyses were performed separately for individuals within counties and for individuals within municipalities.

The first simplest model or ‘empty model’ did not include any variable other than a random term for the area level. In the second model we investigated if any possible area variation was due to differences in their newborn composition. We therefore adjusted for known risk factors for T1D (see table 2) by including the T1D risk score in the empty model. Finally, in the third model we included the variable ‘type of municipality’. We aimed to investigate if this contextual characteristic explained a possible residual area variance.
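As a reading aid, the three nested models can be written in standard random-intercept notation. The symbols below (π, β, γ, u, x, z) are our own shorthand and are assumed rather than taken from the original analysis; the sketch shows the municipality-level case.

```latex
% Newborn i in area j; \pi_{ij} is the probability of T1D by age 14.
\begin{align*}
\text{Model 1 (empty):}\quad
  & \operatorname{logit}(\pi_{ij}) = \beta_0 + u_j,
  \qquad u_j \sim N(0, \sigma^2) \\
\text{Model 2 (compositional):}\quad
  & \operatorname{logit}(\pi_{ij}) = \beta_0 + \boldsymbol{\beta}_1^{\top}\mathbf{x}_{ij} + u_j \\
\text{Model 3 (contextual):}\quad
  & \operatorname{logit}(\pi_{ij}) = \beta_0 + \boldsymbol{\beta}_1^{\top}\mathbf{x}_{ij}
    + \boldsymbol{\gamma}^{\top}\mathbf{z}_{j} + u_j
\end{align*}
% \mathbf{x}_{ij}: indicators for the T1D risk score quartile of newborn i
% \mathbf{z}_{j}:  indicators for the type of municipality j
% \sigma^2:        the area-level variance reported for each model
```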

The proportional (percentage) change in variance in the simplest model compared with the model with more terms was calculated as:

$$\mathrm{PCV} = \frac{\sigma_b^2 - \sigma_a^2}{\sigma_a^2} \times 100$$

where $\sigma_a^2$ is the variance of the simplest model and $\sigma_b^2$ is the variance of the model with more terms.
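The calculation can be illustrated with a minimal Python sketch. It assumes the sign convention (σ²b − σ²a)/σ²a, so that a negative value denotes a reduction in variance, and it assumes that the empty model is the reference for both comparisons. The function and variable names are ours; the variances are the municipality-level values reported in the Results.

```python
def proportional_change_in_variance(var_simple: float, var_extended: float) -> float:
    """Percentage change in area variance relative to the simpler model.

    Assumed convention: (var_extended - var_simple) / var_simple * 100,
    so a negative result means the added terms reduced the area variance.
    """
    if var_simple <= 0:
        raise ValueError("the reference variance must be positive")
    return (var_extended - var_simple) / var_simple * 100.0


# Municipality-level variances reported in the Results section.
var_empty = 0.017       # empty model
var_risk_score = 0.011  # empty model + T1D risk score
var_context = 0.003     # + type of municipality

pcv_risk_score = proportional_change_in_variance(var_empty, var_risk_score)
pcv_context = proportional_change_in_variance(var_empty, var_context)

print(f"risk score:           {pcv_risk_score:.1f}%")
print(f"type of municipality: {pcv_context:.1f}%")
```

Under these assumptions, the risk score accounts for roughly a third of the municipality variance and the full model for roughly four-fifths of it.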

Area variation was also calculated as the median odds ratio (MOR)33 34:

$$\mathrm{MOR} \approx \exp(0.95\,\sigma)$$

where σ is the square root of the area variance.
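A short Python sketch of this conversion follows. The constant 0.95 is the rounded product of √2 and the 75th centile of the standard normal distribution, which the code computes directly. Function names are ours, and the variances are those reported in the Results. Because the published MORs were obtained by MCMC, small differences from these values in the last digit should be expected.

```python
import math
from statistics import NormalDist

# sqrt(2) * 75th centile of the standard normal, approximately 0.95.
MOR_CONSTANT = math.sqrt(2.0) * NormalDist().inv_cdf(0.75)


def median_odds_ratio(area_variance: float) -> float:
    """Median odds ratio implied by an area-level variance on the logit scale."""
    if area_variance < 0:
        raise ValueError("a variance cannot be negative")
    return math.exp(MOR_CONSTANT * math.sqrt(area_variance))


reported_variances = {
    "counties, empty model": 0.007,
    "municipalities, empty model": 0.017,
    "municipalities, + risk score": 0.011,
    "municipalities, + type of municipality": 0.003,
}

for label, variance in reported_variances.items():
    print(f"{label}: MOR = {median_odds_ratio(variance):.2f}")
```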

Conceptually, the MOR can be understood as the increased risk of T1D onset (in the median case) had a child been born in another area with a higher risk. A MOR equal to 1 would indicate no overall area variation in the cumulative incidence of T1D between administrative areas.34

Although not as appropriate to use in the logistic model (dichotomous response variable) as in the linear model (normally distributed continuous response variable), we also expressed the area variance as a percentage of the total variance by calculating the intraclass correlation (ICC)33 34 according to the latent variable method25 34:

$$\mathrm{ICC} = \frac{\sigma^2}{\sigma^2 + 3.29}$$
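The constant 3.29 is π²/3, the variance of the standard logistic distribution, which the latent variable method takes as the individual-level variance. A minimal sketch, with names of our own choosing and the empty-model variances reported in the Results:

```python
import math

# Variance of the standard logistic distribution, approximately 3.29.
LOGISTIC_RESIDUAL_VARIANCE = math.pi ** 2 / 3


def latent_icc(area_variance: float) -> float:
    """Intraclass correlation under the latent variable method."""
    return area_variance / (area_variance + LOGISTIC_RESIDUAL_VARIANCE)


icc_county = latent_icc(0.007)        # county-level variance, empty model
icc_municipality = latent_icc(0.017)  # municipality-level variance, empty model

print(f"counties:       {icc_county * 100:.1f}%")
print(f"municipalities: {icc_municipality * 100:.1f}%")
```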

Analysis was also performed for individuals within municipalities nested within counties, but there was little or no area variation in the risk of T1D between the counties. The results of the two-level analyses on counties were still reported so they could be compared with results from the traditional statistical techniques.

Parameters were estimated by Markov Chain Monte Carlo (MCMC) methods.35 Statistical analyses were performed using SPSS Version 15 and MLwiN Version 2.02; p values <0.05 were considered statistically significant. Observed and predicted cumulative incidences of T1D in administrative areas were plotted using GraphPad Version 4.

Results

Overall, the observed cumulative incidence of T1D in Sweden was 5.2 per 1000 children. The median number of newborns in the counties was 17 945 (range 3932–114 027) and observed cumulative incidence ranged from 4.3 to 6.5 per 1000 children. Across municipalities, the median number of newborns was smaller at 993 (range 150–44 788) and the range in observed cumulative incidence was larger than across counties, ranging from 0.0 to 19.2 per 1000 children (figure 1).
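The width of the municipality range can be put in perspective with a simple binomial calculation. The sketch below is illustrative only: it assumes that every child has the same risk (the overall 5.2 per 1000) and that outcomes are independent, and asks how much the observed incidence would vary between areas purely by chance. The function names are ours; the area sizes are the smallest and median municipalities and the median county reported above.

```python
import math

OVERALL_RISK = 5.2 / 1000  # overall cumulative incidence in Sweden


def prob_zero_cases(n_newborns: int, risk: float = OVERALL_RISK) -> float:
    """Probability that an area of this size observes no case at all."""
    return (1.0 - risk) ** n_newborns


def incidence_sd_per_1000(n_newborns: int, risk: float = OVERALL_RISK) -> float:
    """Binomial standard deviation of the observed incidence, per 1000 children."""
    return 1000.0 * math.sqrt(risk * (1.0 - risk) / n_newborns)


areas = [
    ("smallest municipality", 150),
    ("median municipality", 993),
    ("median county", 17945),
]

for label, n in areas:
    print(
        f"{label} (n={n}): "
        f"P(no cases) = {prob_zero_cases(n):.3f}, "
        f"chance SD = {incidence_sd_per_1000(n):.2f} per 1000"
    )
```

Under these assumptions, a municipality with 150 newborns has a probability of about 0.46 of observing no case, and the chance standard deviation of its observed incidence (about 5.9 per 1000) exceeds the overall incidence itself.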

Figure 1

Observed cumulative incidence (dashed line) of early onset type 1 diabetes (T1D) among 560 766 newborn infants born between 1987 and 1991 in Sweden, and predicted cumulative incidence (solid line) as estimated by multilevel logistic regression for (A) counties and (B) municipalities.

As tested by Pearson χ2 tests, statistically significant differences in the observed cumulative incidence were seen across counties (p=0.02) and municipalities (p=0.007) (table 3).

Table 3

Area variation in cumulative incidence of type 1 diabetes (T1D) analysed by traditional statistical tests

Logistic regression models were fitted with administrative areas as a dummy variable. The odds of T1D were compared with the county of Skaraborgs (number of children 17 945, cumulative incidence 5.2 per 1000 children) and with the municipality of Sandviken (number of children 2317, cumulative incidence 5.2 per 1000 children) as their cumulative incidence was closest to the cumulative incidence in Sweden as a whole (overall cumulative incidence). A statistically significant difference in log odds was observed across counties (p=0.02) and municipalities (p=0.007).

As estimated in the multilevel analysis, the range in estimated area cumulative incidence was smaller and similar for both counties (4.7–5.7 per 1000 children) and municipalities (4.4–6.0 per 1000 children; figure 1). Some estimated area cumulative incidences showed a large ‘shrinkage’ towards the overall cumulative incidence, with the degree of shrinkage greater in areas with fewer children, as expected (data not shown). The area variation in the risk of T1D was small, with a MOR close to 1 across counties (σ2=0.007, MOR=1.08, 95% credible interval 1.03 to 1.14) and municipalities (σ2=0.017, MOR=1.13, 95% credible interval 1.07 to 1.19; table 4).

Table 4

Multilevel hierarchical logistic regression analysis of risk of type 1 diabetes (T1D) across municipalities of Sweden

Expressing the area variance as a percentage of the total variance, we observed that a very small percentage of the individual propensity of T1D was at the county (0.2%) and at the municipality (0.5%) level (ie, ≤0.5% of the individual variation in the propensity of T1D was at the area level).

The minor municipality variance observed decreased when the T1D risk score was included as a compositional variable (σ2=0.011, MOR=1.11, 95% credible interval 1.05 to 1.17). The inclusion of type of municipality as a contextual variable further reduced this variance (σ2=0.003, MOR=1.05, 95% credible interval 1.00 to 1.08) and showed an association between the type of municipality and the risk of T1D. Compared with children born in the metropolitan municipalities of Stockholm, Gothenburg and Malmö, children born in all other municipality types showed an increased risk of T1D, especially those living in industrial municipalities (OR=1.39, 95% credible interval 1.19 to 1.62; table 4).

Discussion

Using a very large population of newborn children in Sweden, we investigated the overall area variation in the risk of T1D between counties and municipalities and thereby evaluated the possible relevance of general contextual effects (ie, the context as a whole) in the genesis of T1D. A naïve interpretation of the variance observed by common statistical methods might lead to the conclusion that there were large and highly significant geographical differences across counties and municipalities. In contrast, the more appropriate multilevel analysis showed that these differences were actually very small.

In performing the multilevel analyses, we applied analytical procedures that considered the hierarchical structure of the data and estimated the area cumulative incidence of T1D based on the empirical Bayes' estimates also called ‘shrunken’ residuals.25 From a statistical point of view, the observed area cumulative incidence of T1D is the actual estimate of the true area cumulative incidence. However, by calculating shrunken residuals, we minimise the bias produced by random noise affecting small areas. In our data we observed greater shrinkage in small areas with fewer children where the cumulative incidence is expected to be unstable. With larger numbers within counties, the observed cumulative incidences were more stable, the shrinkage smaller but variance remained small.
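The dependence of shrinkage on area size can be sketched with a simple approximation. The code below is not the MCMC estimation performed in MLwiN. It is a rough normal-approximation analogue in which the observed incidence is pulled towards the overall incidence by a factor B = σ²/(σ² + v), where v is the approximate sampling variance of an area's log odds (delta method). The function names and the example areas are ours and purely illustrative.

```python
def shrinkage_factor(n_newborns: int, area_variance: float, overall_risk: float) -> float:
    """Weight given to an area's own data (0 = full shrinkage, 1 = none).

    Approximation: the sampling variance of the area's log odds is taken as
    1 / (n * p * (1 - p)), evaluated at the overall risk p.
    """
    sampling_variance = 1.0 / (n_newborns * overall_risk * (1.0 - overall_risk))
    return area_variance / (area_variance + sampling_variance)


def shrunken_incidence(observed: float, n_newborns: int,
                       area_variance: float, overall_risk: float) -> float:
    """Observed incidence pulled towards the overall incidence (linear approximation)."""
    weight = shrinkage_factor(n_newborns, area_variance, overall_risk)
    return overall_risk + weight * (observed - overall_risk)


OVERALL_RISK = 5.2 / 1000
MUNICIPALITY_VARIANCE = 0.017  # empty-model variance reported in the Results

# Hypothetical small area with no observed cases, and the largest municipality size.
small = shrunken_incidence(0.0, 150, MUNICIPALITY_VARIANCE, OVERALL_RISK)
weight_small = shrinkage_factor(150, MUNICIPALITY_VARIANCE, OVERALL_RISK)
weight_large = shrinkage_factor(44788, MUNICIPALITY_VARIANCE, OVERALL_RISK)

print(f"n=150:    weight on own data = {weight_small:.3f}, "
      f"shrunken incidence = {small * 1000:.2f} per 1000")
print(f"n=44 788: weight on own data = {weight_large:.3f}")
```

Under this approximation, an area of 150 newborns contributes almost nothing of its own observed incidence to the estimate, whereas the largest municipality retains most of it.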

It is possible that the risk of T1D in Sweden as a whole is conditioned by a general contextual effect as suggested in the international studies,3 36 but such small area variance within Sweden indicates that the geographical boundaries that define counties and municipalities have little relevance for the individual risk of T1D in the country. Our findings do not exclude the possibility that, when using boundaries other than administrative ones, the context where the children were born might play a more notable role in understanding T1D risk.37 A cluster scan analysis38 (http://www.satscan.org) (data not shown), however, failed to identify any appreciable spatial clustering of T1D in Sweden. Nevertheless, a consideration when using empirical Bayes' estimates is that smoothing towards the overall cumulative incidence could conceal a true localised effect in a very small area.39 40 In any case, if these localised effects exist, it would be difficult to truly identify them as there would not be enough information to draw reliable conclusions.

We found an association between the type of municipality and risk of T1D, with a lower risk in the metropolitan areas. Our results therefore agree with previous ecological studies in Sweden reporting a lower incidence of T1D in urban-type areas.16 41 Since factors related to hygiene (ie, lifestyle habits, exposure to infections, diet) are widely considered to be involved in the aetiology of T1D, it is possible that the effect of type of municipality was mediated by differences in these hygienic factors. However, investigation of these possible mechanisms is beyond the scope of the present study.

An innovative approach in contextual epidemiology is the distinction between ‘components of health variation’ and ‘measures of association’.22 28 Measures of variance and clustering (eg, ICC, median OR,33 interquartile OR,42 spatial autocorrelation and spatial range of correlation43) are pertinent for understanding the relevance of specific context for different disease outcomes. Explicitly, before investigating whether a contextual characteristic is associated with disease risk, it is useful to estimate the extent to which individual risk is conditioned by the context of the area as a whole. Considering this approach in multilevel as well as in spatial analyses,44 we can assess the scale on which contextual influences operate (eg, local neighbourhoods, parishes, municipalities).

The approach of ‘components of health variation’ might have relevance for both aetiological research and especially for planning strategies of prevention when it comes to deciding whether public health resources should be directed to individuals or to specific communities.22 Therefore, from a public health perspective, our results show that possible strategies for the prevention of T1D should be focused on individual factors all over Sweden. A naïve allocation of resources to counties or municipalities based on the raw estimation of incidence and statistical tests of significance would be erroneous. Administrative areas could still be the appropriate arena for individual level interventions for practical reasons, but not because of differences in the incidence of T1D between areas.

A common observation in multilevel analyses is the possibility of detecting conclusive associations between contextual variables and the individual outcome even in the presence of minor area variance.37 In fact, we observed that, in spite of the very small municipality variance observed (σ2=0.017), being born in a non-metropolitan municipality was associated with an increased risk of T1D. There is a fundamental difference between measures of variation and measures of association. While the variance and its interpretation are constrained by time (ie, when the study was performed) and space (ie, the geographical setting of the study), measures of association can bear causal information that can be generalised to other contexts beyond the one where the study is performed. A contextual association can be detected as long as there is enough contrast between the exposure categories, but has less to do with the overall variance.23 28 This distinction is still not widespread in social epidemiology, and although several authors do report measures of variance45 even to the degree of focusing directly on them,23 24 46–49 many do not.50 Confounded interpretations of measures of association are also frequently found in the literature; for example, in the present investigation we could have erroneously concluded that the municipalities in Sweden are relevant to the risk of T1D since there is an association between being in a non-metropolitan area and T1D risk. However, we can only say there is an association. For quantifying the relevance of the municipalities we need to investigate variance.

Therefore, since the analysis of variance is always limited to the place and time of the study, our conclusions are necessarily specific to Sweden. We cannot exclude the possibility that a similar investigation in other countries may find a different general collective effect from that in Sweden. A detailed discussion of this concept is provided elsewhere.37 Nevertheless, most previous studies that have reported an association between specific area characteristics and T1D do not quantify the overall area variation.51 52 This may cause confusion between the information obtained by measures of association and that provided by measures of variance, as discussed above. However, it is also possible that authors who have reached a similar conclusion to ours (ie, low relevance of the geographical units at the time of birth for understanding T1D risk) have not published their results. Publication bias must always be considered in the presence of negative results.

It is also possible for the area variance to increase, rather than decrease, when considering individual level variables in the logistic regression models; that is, the geographical variance could actually be hidden by the different individual composition of areas. This circumstance has been explained by Snijders and Bosker25 and also by Fielding.53 However, we investigated the influence on the area variance of each of the individual variables included in the risk score and did not find any increase in variance.

A final caveat is that our study was more concerned with geographical factors at the time of birth than at the time of clinical diagnosis, as in most previous studies. However, our conclusions are methodologically relevant in both cases.

In summary, our study shows that multilevel regression analysis is a more suitable approach than the analyses commonly performed hitherto for investigating area variation in T1D. Among other aspects, multilevel regression takes into account the uncertainty derived from the small amount of information that might be available in some areas. When quantifying geographical variation of a disease with a low risk such as T1D, failing to account for random noise inflates estimates of area variation and may lead to erroneous conclusions on the role of contextual factors in disease risk.

Concerning T1D in Sweden, it appears that previous reports based on standard statistical tests were misleading. According to multilevel analysis, administrative areas such as counties and municipalities seem irrelevant for the individual risk of developing T1D in childhood.

What is already known on this subject

Previous studies have found small area variation in the incidence of type 1 diabetes (T1D) which has been interpreted as evidence of the existence of important geographical/contextual determinants for T1D. However, these results may be fallacious since, when analysing smaller areas, the limited number of newborn infants and the expected low risk for T1D is a source of spurious variability not properly accounted for by usual statistical methods.

What this paper adds

It appears that previous reports based on standard statistical tests were misleading. According to the more appropriate multilevel analysis, administrative areas such as counties and municipalities seem irrelevant for the individual risk of developing T1D in Sweden. Both measures of association (such as odds ratios) and of variance (such as median odds ratios or intraclass correlation) should be reported and interpreted in multilevel analyses.

Acknowledgments

The authors thank the Centre for Epidemiology (National Board of Health and Welfare), Statistics Sweden and Region Skåne.

Footnotes

  • Funding This investigation was supported by a Doctoral Grant from the Faculty of Medicine at Lund University to KFL, by the Swedish Research Council to JM (DNR 2007-1772), the Center for Economic Demography, the Swedish Diabetes Association to ÅL and by a government Grant ALF Research program to JM (DNR M: B 39 977).

  • Competing interests None.

  • Ethics approval This study was conducted with the approval of the regional ethical review board in Southern Sweden (DNR 71/2006).

  • Provenance and peer review Not commissioned; externally peer reviewed.