## Footnotes

Contributors FS and WM designed the study. FS performed the analysis and wrote the initial version of the manuscript. CK gave statistical advice, provided the greedy maximization algorithm and gave support preparing the manuscript. WM provided the German Index of Multiple Deprivation (GIMD), supervised the study and gave support preparing the manuscript. JF advised on deprivation indices and gave support preparing the manuscript. All authors read and approved the final manuscript.

Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

Competing interests None declared.

Provenance and peer review Not commissioned; externally peer reviewed.

Data sharing statement Extra data regarding the GIMD is available by emailing Werner Maier (werner.maier@helmholtz-muenchen.de).

Patient consent for publication Not required.

### Strengths and limitations of this study

There is only limited literature on the application of different weighting approaches of deprivation indices—this study adds to that body of work.

Our study provides an overview of established weighting approaches for deprivation indices used in Europe.

Sensitivity testing of deprivation indices is particularly important as there seems to be no gold standard.

We compare a broad range of normative and empirical weighting approaches for the domains of an Index of Multiple Deprivation.

Limitations of the study concern the selection of weighting methods resulting from restricted data access at regional level.

## Introduction

Indices of deprivation are increasingly being used to investigate health and, in some countries, as tools of public policy.1–5 Therefore, it is important that these indices are transparent and rigorous in their construction so that confidence and understanding in their use are maintained.

In the 2000s, a series of deprivation indices with a multidimensional structure were introduced in the UK. These ‘Indices of Multiple Deprivation’ (IMDs) have been updated regularly ever since.6 The domains of deprivation were identified from the literature and were a result of the availability of data at the time. A key aspect to consider when constructing such indices is the weighting and consolidation of the different deprivation domains that produce the final overall index.

Transparency and availability of data used in the indices mean that indicators and weightings can be adapted to particular demands by researchers. Adaptation may be needed, for example, to prevent autocorrelation effects where a component of the index is also related to the independent variable under consideration.

An IMD for Germany has been developed based on the methodology according to Noble *et al*.6 It was first applied in the German federal state of Bavaria (‘Bavarian Index of Multiple Deprivation’, BIMD) and subsequently as a nationwide IMD (‘German Index of Multiple Deprivation’, GIMD).7 8 For the construction of the German deprivation indices, domains from the British IMDs were partly used (eg, income and employment), and additional domains for social capital and municipal revenue were introduced. The GIMD includes both aspects, material deprivation (eg, income) as well as social deprivation (eg, social capital).

The GIMD has been used repeatedly for analyses regarding the relationship between area deprivation and morbidity, mortality and healthcare provision in Germany, and a persistent positive association has been shown between area deprivation and health outcomes.9–11

One crucial point in building IMDs involves the weighting of the different deprivation domains. So far, weightings of IMDs have been based mainly on the literature on multiple deprivation and on expert consultation.12 Regarding the domain weights of the English IMD, alternative empirical weightings were carried out by C Dibben, which led to a recommendation to adjust the weights.13 However, that analysis comprised only two empirical methods, and it did not lead to a change in the weighting of subsequent IMDs, as user surveys ‘did not reveal significant support for moving to new weights’.12

Besides the IMDs in the UK and Germany, several alternative approaches to the development and weighting of deprivation indices have been developed in other European countries14 as well as non-European countries.15 16 These approaches consist of a variety of (empirical) weighting approaches, which have not been applied to the British IMDs. However, it seems that almost all the approaches to weight deprivation indices are based on single methods, and sensitivity analysis regarding the application of different methods to a specific deprivation index has not been done. Additionally, literature regarding the application of different weighting procedures to a deprivation index is lacking.

As the GIMD was weighted by experts following the model of the British IMDs, we conducted a sensitivity analysis for the domain weighting of the GIMD following the example of Dibben *et al*.13 The aim of this study was to test the stability of the GIMD to different weighting approaches by conducting correlation analyses with mortality as a key health outcome. We decided to examine several alternative weighting approaches for the domains of the GIMD by stepwise comparison:

From a literature review, we obtained an overview of weighting approaches for deprivation indices in Europe and selected methods that can be used for alternative weighting approaches to the domains of the GIMD.

Regarding the weighting of the domains and the distribution of the GIMD scores, we analysed the results of the different weighting approaches and compared them with each other.

We compared the associations of these new versions of the GIMD with total mortality (all age groups) and premature mortality in Germany (<65 years) in order to conduct a sensitivity analysis concerning the different approaches.

Finally, we identified the weighting set that maximises the association between the GIMD and mortality.

A conceptual distinction between the different weighting methods was established with the identification of normative and empirically based approaches.

## Methods

### Data for the statistical analysis

In order to construct different GIMD versions using alternative weighting approaches, we used regional data from the original GIMD from 2010 (GIMD 2010) for the domain and composite scores of the 412 districts in Germany.17 For the construction of the original GIMD, Maier *et al*7 17 standardised nine deprivation indicators and assigned them to seven deprivation domains, which represent different dimensions of deprivation: income, employment, education, environment, security, municipal revenue and social capital (online supplementary 1). Each district is provided with a deprivation score for every single domain. The domain score is a statistical measure for the extent of area deprivation in a regional unit. The higher the deprivation within a district, the higher the domain score for the district. Subsequently, the domain scores are weighted based on a theoretical foundation and expert consultation and summed for an overall deprivation score for every district. For further details, see Maier *et al*.7 17

For the analysis of the relationship between area deprivation and both total mortality and premature mortality, we used raw mortality data and population data from 2010 at the district level, derived from the German Federal Statistical Office.18 The districts were identified by official district code numbers. Using the mortality and population data, we calculated indirectly standardised mortality ratios (SMRs) for both total mortality (SMR ‘total’) and premature mortality (SMR ‘premature’). This was necessary to compare districts because of their highly varying population size.19 For details on the calculation of the SMR, see online supplementary 2.
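As a rough illustration of indirect standardisation, the sketch below divides the observed deaths in a district by the deaths that would be expected if reference rates applied to the district's population. The age bands, rates and population figures are hypothetical, and the authors' exact procedure (online supplementary 2) may differ.

```python
def indirect_smr(observed_deaths, district_pop_by_age, reference_rates_by_age):
    """Observed deaths divided by the deaths expected under reference rates."""
    expected_deaths = sum(
        reference_rates_by_age[age_band] * population
        for age_band, population in district_pop_by_age.items()
    )
    return observed_deaths / expected_deaths


# Hypothetical inputs: two age bands only, invented rates and populations.
reference_rates = {"0-64": 0.002, "65+": 0.05}   # deaths per person per year
district_pop = {"0-64": 100_000, "65+": 25_000}  # inhabitants per age band
observed = 1522                                   # deaths in the district

smr = indirect_smr(observed, district_pop, reference_rates)
print(f"SMR = {smr:.3f}")  # values above 1 mean more deaths than expected
```

Because the expected deaths scale with the district's population, the resulting ratio can be compared across districts of very different size.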

We used the variable ‘available living space per inhabitant’ from the German Federal Statistical Office from 201018 as a proxy of deprivation. We reversed the polarity of its values to make it more comparable to the GIMD scores.

### Methods for the weighting and methods for the statistical analysis

In addition to the original weighting of the GIMD 2010, we decided to use four methods for the weighting of the GIMD domains found in a literature review (table 1). We searched relevant literature in the databases PubMed and Embase (eg, keywords used in PubMed: (deprivation OR deprived) AND (index OR indices) AND (area* OR region* OR neighborhood OR neighbourhood), limits: English OR German OR French OR Italian OR Spanish).

Besides the equal weighting of the domains, we used two commonly used empirical methods and an additional greedy maximization algorithm method. The purpose of the empirical approaches was to extract relative weights for the domain scores from an empirical dataset. The extracted coefficients of the methods were used as relative weights for the domain scores, which should sum to 1 (or 100%), before the summation of the domains to an overall deprivation score.
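The step from extracted coefficients to an overall score can be sketched as follows. This is a minimal illustration, assuming that absolute values of the coefficients are taken before normalisation; the domain names, coefficients and scores are hypothetical and are not taken from the study.

```python
def normalise_weights(coefficients):
    """Scale the absolute coefficients so that the weights sum to 1."""
    magnitudes = {domain: abs(value) for domain, value in coefficients.items()}
    total = sum(magnitudes.values())
    if total == 0:
        raise ValueError("all coefficients are zero; weights are undefined")
    return {domain: value / total for domain, value in magnitudes.items()}


def composite_score(domain_scores, weights):
    """Weighted sum of the domain scores of one district."""
    return sum(weights[domain] * domain_scores[domain] for domain in weights)


# Hypothetical coefficients (eg, from a regression or a factor analysis).
coefficients = {"income": -0.5, "employment": 1.5, "education": 0.5}
weights = normalise_weights(coefficients)

# Hypothetical domain scores of a single district.
district = {"income": 10.0, "employment": 20.0, "education": 30.0}
score = composite_score(district, weights)
print(weights, score)
```

Normalising first keeps the overall scores of differently weighted index versions on a comparable scale.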

*Original weighting of the domains of the GIMD through theoretical foundation and expert opinion according to Maier et al.*8 For weights used, see online supplementary 1.

*Equal weighting of the seven GIMD domains; thus, each domain weighted with 1/7.* This approach was originally used for deprivation indices by Carstairs and Townsend.20 21 To date, this approach is still used for deprivation indices consisting of just single deprivation indicators.21 22 For this approach, an equal importance of all deprivation indicators is assumed. In our analysis, we transferred this approach to the domain level.

*Weighting of the domains by the coefficients of a linear regression analysis with a proxy for deprivation (‘available living space per inhabitant’) as the dependent variable and the GIMD domains as the independent variables.* We had to choose a dependent variable for the linear regression that had not been used for the construction of the GIMD domains and could be considered as an indicator of deprivation.13 Townsend, Carstairs and Jarman considered overcrowding of living space as an indicator of deprivation.20 21 23 We assumed that the availability of living space per inhabitant in an area could act as a proxy for area deprivation: the more deprived the area, the less living space is available per inhabitant.24–26 For this approach, we calculated the absolute value of the regression coefficients and then used them as relative weights for the specific domains. Subsequently, the weighted domain scores were summed to an overall score. Linear models for the extraction of weights for a deprivation score have already been used in several studies.13 27 28 Because of the normal distribution of the dependent variable, we conducted an ordinary least squares regression.

*Weighting of the GIMD domains using a greedy maximization algorithm (Kurz C, Maier W, Rink C. A greedy stacking algorithm for model ensembling and domain weighting. Working paper. 2019).* This yields weights for the domains close to the maximum possible correlation between the GIMD 2010 and mortality as a relevant outcome of deprivation (online supplementary 3). The weighted domain scores of the GIMD were then added together to an overall index for both total mortality and premature mortality. This method, which we added to those identified in the literature search, aimed to extract the weights that maximise the Spearman correlation between GIMD and mortality and can thus be seen as an outcome-specific approach with mortality as the dependent variable. Complete circularity was present because mortality had already been used for the extraction of the weights. In contrast, the other methods could be seen as general weighting approaches for deprivation indices.

*Weighting of the domains according to the results of an exploratory factor analysis (EFA).* We chose a principal axis factoring (PAF) approach for the extraction of the factors. PAF is a commonly used extraction method for factor analysis and requires no specific distribution of the entered variables. This non-parametric approach was necessary because of the exponentially transformed domains.29 A priori, we specified the extraction of one factor (as a latent factor, measuring ‘multiple deprivation’) out of the seven domains. The absolute values of the factor loadings of the different domains were used as relative weights for the domains. Again, the weighted domains were added together to an overall deprivation score.
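The working paper describing the authors' greedy maximization algorithm is not reproduced in this article, so the following is only a generic sketch of one common greedy scheme (forward selection with replacement), not the authors' implementation. At each step, it adds one unit of weight to whichever domain most increases the Spearman correlation between the composite score and the outcome. The function names, the number of steps and all data are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def greedy_weights(domains, outcome, steps=20):
    """Greedy forward selection with replacement over the domain scores."""
    counts = {name: 0 for name in domains}
    running = [0.0] * len(outcome)  # unnormalised sum of the selected domains
    best_rho = float("-inf")
    for _ in range(steps):
        best_name, best_rho = None, float("-inf")
        for name, scores in domains.items():
            candidate = [r + s for r, s in zip(running, scores)]
            rho = spearman(candidate, outcome)  # rank-based, so scale-free
            if rho > best_rho:
                best_name, best_rho = name, rho
        counts[best_name] += 1
        running = [r + s for r, s in zip(running, domains[best_name])]
    weights = {name: count / steps for name, count in counts.items()}
    return weights, best_rho


# Hypothetical domain scores and SMRs for eight districts.
domains = {
    "income": [2.0, 4.0, 1.0, 5.0, 3.0, 6.0, 8.0, 7.0],
    "employment": [1.0, 3.0, 2.0, 6.0, 5.0, 4.0, 7.0, 8.0],
    "environment": [5.0, 1.0, 7.0, 2.0, 8.0, 3.0, 4.0, 6.0],
}
mortality = [0.90, 0.95, 0.92, 1.05, 1.00, 1.02, 1.15, 1.12]

weights, rho = greedy_weights(domains, mortality, steps=20)
print(weights, round(rho, 3))
```

Because the weights are tuned to the same outcome that is later used for evaluation, the resulting correlation is optimistic by construction, which is the circularity noted in the text.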

### Correlation analysis and statistical software

Subsequently, we performed a sensitivity analysis of the newly weighted GIMD versions. We conducted correlation analyses in order to calculate the relationship between the different GIMD versions and both total and premature mortality (in terms of SMRs) and compared their results. For the analysis, we used Spearman’s rank correlation coefficient (ρ) as a robust approach. This was required, in our opinion, as the GIMD score could be interpreted as an ordinal variable because of the ranking of the districts during the generation of the domain scores.7 30 Correlation analyses were each performed with a GIMD version and both total mortality and premature mortality. We also tested for significance of these bivariate correlation coefficients at an α-level of 5%.31 To compare the bivariate correlations with each other, we performed t-tests for paired correlations, using Williams’s t-test for the comparison of correlations from dependent samples.32 We compared two correlation coefficients at a time, for both total and premature mortality, at an α-level of 5%. For the statistical analysis, we used the software R 3.2.3.33
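Williams's t-test compares two correlations that share one variable (here, mortality) and are measured on the same districts. The sketch below implements the standard formula with n − 3 degrees of freedom. The two correlations with total mortality are the values reported in this article for the algorithm-weighted and the original GIMD, whereas the correlation between the two index versions (0.95) is a hypothetical placeholder, so the resulting statistic is illustrative only.

```python
import math


def williams_t(r12, r13, r23, n):
    """Williams's t for H0: rho12 == rho13, where variable 1 is shared.

    r12, r13: correlations of the shared variable with variables 2 and 3
    r23:      correlation between variables 2 and 3
    n:        number of observations
    Returns the t statistic and its degrees of freedom (n - 3).
    """
    # Determinant of the 3x3 correlation matrix.
    det = 1 - r12**2 - r13**2 - r23**2 + 2 * r12 * r13 * r23
    r_mean = (r12 + r13) / 2
    numerator = (n - 1) * (1 + r23)
    denominator = 2 * ((n - 1) / (n - 3)) * det + r_mean**2 * (1 - r23) ** 3
    t_stat = (r12 - r13) * math.sqrt(numerator / denominator)
    return t_stat, n - 3


# 0.615 and 0.578 are the reported correlations with total mortality;
# 0.95 between the two index versions is an assumed value.
t_stat, df = williams_t(r12=0.615, r13=0.578, r23=0.95, n=412)
print(f"t = {t_stat:.2f}, df = {df}")
```

The more strongly the two index versions correlate with each other, the smaller the determinant and the larger the test statistic for a given difference, which is why even small differences between highly correlated indices can reach significance.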

### Patient and public involvement

Patients and/or public were not involved in this study.

## Results

### Population size of the districts and estimation of the SMRs

The population size of the 412 districts varied, with a median of 139 188 inhabitants, an interquartile range (IQR) of 130 170 persons, a minimum of 33 944 and a maximum of 3 460 725 persons. Raw mortality of the 412 districts varied, with a median of 1522 deaths, an IQR of 1347 deaths, a minimum of 413 and a maximum of 32 234 deaths. The reference date of the data was 31 December 2010. We estimated total mortality by calculating SMR ‘total’ for the districts, with a mean of 1.0175 (SE: 0.004), and premature mortality by calculating SMR ‘premature’, with a mean of 1.0165 (SE: 0.004).

### Weights of the domains of the alternative approaches

An overview of the identified weighting methods for deprivation indices is given in table 1. Alongside a description of the weighting and the construction of the deprivation indices, the table lists selected advantages and disadvantages of the methods, together with selected examples. From this table, we chose four approaches in addition to the original weighting of the GIMD.

We found considerable differences between the domain weights resulting from the different approaches (table 2).

The weights for employment deprivation showed the largest variation with a range of 34 percentage points. The other deprivation domains showed a range of at least 13 percentage points. Educational deprivation within the maximization algorithm and income deprivation within the linear regression showed very small weights compared with the weights of the original GIMD 2010. Municipal revenue deprivation resulted in a weight twice as high as the original weight within the linear regression. Concerning the algorithm, the weight for social capital deprivation was three times the original weight. Concerning premature mortality, the weight for employment deprivation was twice as high as the original weight for the GIMD. Deprivation domains for social capital and district income showed consistently higher weights for the empirical approaches compared with the two normative methods. The different GIMD versions revealed different distributions of the overall deprivation scores (table 3).

Assumptions for the linear regression were generally met, and the model had significant explanatory power (adjusted R²=0.33). Five of the seven domains showed a significant effect on the deprivation proxy. Heteroscedasticity was present; thus, we presented robust SEs. Additionally, we provided tests of the assumptions of the linear regression model (online supplementary 4). The factor analysis generally had significant explanatory power (χ²: 584.65, p<0.0001), but showed low reliability (Tucker-Lewis index=0.50), and an RMSEA of 0.32 with tight CIs (0.30 to 0.34) indicated that this one factor was not a good fit to the data (online supplementary 5).

### Results of the statistical analysis

Correlation analysis between the differently weighted deprivation indices and mortality showed different results (table 4).

The deprivation indices with domains weighted by the maximization algorithm showed the maximum correlation with total mortality (ρ=0.615) and premature mortality (ρ=0.832). Correlations between the original GIMD and both total and premature mortality were ρ=0.578 and 0.767, respectively. Correlations between the equally weighted GIMD and mortality were the lowest with ρ=0.535 and 0.699. All correlations were significant concerning both total and premature mortality (p<0.001). Additionally, bivariate correlations between all indices were significant (ρ between 0.86 and 0.98).

Pairwise comparisons of the correlation coefficients with Williams’s t-tests showed a differentiated result (see online supplementary 6). Almost every pairwise difference in the correlation coefficients was significant at the 5% α-level. One exception was the difference in the coefficients between the original GIMD and the GIMD weighted by linear regression concerning total mortality. The other exception was the difference between the original GIMD and the GIMD weighted by factor analysis concerning premature mortality. Neither difference was significant, whether tested one-sided or two-sided. Maximum correlation coefficients of the GIMD, weighted by the algorithm, differed significantly from all the correlation coefficients of the other methods regarding both total and premature mortality. When we corrected for the multiple comparisons of the differences between the correlations of the GIMD versions, the significance of some results changed slightly (online supplementary 7).

## Discussion

The central objective of the study was to explore whether alternative weighting approaches had an influence on the relationship between area deprivation and mortality when applied to the GIMD. Thereby, different weighting methods were selected if they were, on the one hand, applicable to the domain-based construction of the GIMD and, on the other hand, seemed feasible in the course of an application of a multimethodical approach. The four different methods were applied to the weighting of the domains of the GIMD 2010. The selected approaches and the original method were compared concerning both the weighting of the domains of the GIMD and the relationship between GIMD and mortality.

There was little evidence in the literature concerning the application of different weighting methods for multidimensional deprivation indices. However, a summary of different weighting approaches and their classification was presented by Noble *et al*.6 They briefly assessed the specific procedures of the methods (eg, empirical approaches) and were in favour of a weighting driven by literature considerations on multiple deprivation. Regarding the application of empirical weighting approaches for the English IMD 2004, we want to emphasise Dibben’s work.13 He recommended new weights for the domains of the IMD, as the empirical weighting approaches indicated a higher weighting of the health domain and a lower weighting of the employment domain. However, this suggested swapping of weights was not eventually applied to the subsequent versions of the English IMD. The maintenance of the weights was justified by a consultation of IMD users and stable results of the IMD with either existing or suggested weights.12

In this study, we pursued a multimethodical approach for the weighting of the GIMD, including empirical methods. Owing to the different inherent intentions of the selected methods, we integrated the approaches as follows:

*Normative approaches:* the original weighting of the domains according to Maier *et al*7 17 through theory and experts’ opinion. We used the term ‘normative’ because weights for the domains must be selected a priori subjectively before they can be validated with data.

*Specific empirical approaches:* concerning the maximization algorithm with the dependent variable mortality, a weighting of the domains has been sought that was in line with the relationship between area deprivation and both total mortality and premature mortality and should maximise the correlation between them.

*General empirical approaches:* in contrast to the specific empirical approaches, the weighting of the domains was realised according to the results of a linear regression model or according to a factor analysis to generate generally applicable indices, which can also be used for the analysis of other health outcomes.

A further distinction of the methods can be made regarding their conceptual aspects. Factor analysis and principal components analysis are unsupervised methods that require no prior judgements and construct deprivation solely based on the domain knowledge. On the other hand, linear regression and the maximization algorithm are supervised or predictive methods considering deprivation based on a specific proxy and assuming a relationship between this proxy and deprivation.

### Assessment of the alternative weighting approaches

The high weighting of the deprivation domains income and employment (50% altogether) within the original GIMD was approximately confirmed by the empirical weighting of the factor analysis, as was the weighting of the environment deprivation domain. Educational deprivation was weighted considerably lower by the factor analysis and algorithm than by the original GIMD. Deprivation domains for district income and social capital were consistently weighted much higher by the empirical approaches than by the approach of the original GIMD. The shift in the weighting of the domains can be explained by the data dependency of the empirical approaches and should be reviewed using alternative data. Should the higher weighting of the district income and social capital domains be confirmed, an adjustment in the domain weights could be considered. Perhaps those context variables have a higher relevance concerning area deprivation than expected by Maier *et al*.8

The low weighting of the deprivation domains of income by the linear regression and education by the maximization algorithm can barely be reconciled with existing evidence regarding the positive relationship of these two deprivation domains and mortality.34 35 The high weighting of the employment deprivation domain (49%) by the algorithm, concerning mortality, could reflect the high relevance of unemployment relating to premature mortality.

### Relationship of the GIMD versions and mortality

Throughout the analysis, we could not find a weighting method that could be seen as superior compared with the other approaches or could even be recommended as a gold standard. Even though almost all GIMD weighting approaches differed significantly in their correlation with mortality, using only significance as a method of evaluation for the approaches seemed inappropriate. The correlation coefficient between the different GIMD versions was already very high (ρ>0.89), so that even small non-relevant differences could have produced significant results. All correlations of the GIMD versions with mortality were highly significant and showed rather small differences in respect to absolute values (ρ between 0.54 and 0.62). Since we conducted multiple paired t-tests, type-1 error inflation was present. In an additional analysis, we corrected for multiple testing with Benjamini and Yekutieli adjustment.36 When we corrected for the correlation of the GIMD versions with mortality, the significance of the results did not change (table 4). When we corrected for the multiple comparison of the difference of the correlation between the GIMD versions (online supplementary 6), there was a slight difference present in the significance (online supplementary 7).
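The Benjamini and Yekutieli adjustment mentioned above can be sketched as follows. It scales each ordered p value by m·c(m)/rank, where c(m) is the harmonic sum 1 + 1/2 + … + 1/m, and then enforces monotonicity from the largest p value downwards. The p values in the example are hypothetical and are not results of this study.

```python
def benjamini_yekutieli(p_values):
    """Benjamini-Yekutieli adjusted p values, returned in the input order."""
    m = len(p_values)
    harmonic = sum(1 / k for k in range(1, m + 1))  # c(m)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p value to the smallest, keeping a running minimum.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        candidate = p_values[index] * m * harmonic / rank
        running_min = min(running_min, candidate)
        adjusted[index] = running_min
    return adjusted


# Hypothetical raw p values from four pairwise comparisons.
raw = [0.001, 0.02, 0.04, 0.30]
adjusted = benjamini_yekutieli(raw)
for p, q in zip(raw, adjusted):
    print(f"raw p = {p:.3f} -> adjusted p = {q:.4f}")
```

Unlike the Benjamini-Hochberg procedure, this variant remains valid under arbitrary dependence between tests, which may matter here because all comparisons reuse the same districts and highly correlated index versions.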

The empirical weighting of the GIMD by an EFA represented an adequate alternative to the theory-based weighting of the domains, on account of the simple operability and the highly significant association of this GIMD version with mortality. Thereby, a general applicability of the GIMD for the analysis of implications for other health outcomes can be ensured, and the results of different datasets can be compared by model fit measures.29 Despite the significant correlation, the application of equal weighting of the domains could be considered as obsolete, as this would produce an implicit weighting of the domains depending on the availability of indicators for each domain.6

### Strengths and limitations of the study

Using a multimethodological strategy, we were able to cover a broad range of weighting approaches. As there seems to be no gold standard for the weighting of deprivation indices, we consider sensitivity testing of the GIMD to be particularly important. An equal weighting as well as an EFA for the weighting of multiple deprivation domains were carried out in this study for the first time. A factor analysis of the IMD domains was advised by Deas *et al*,22 but has not been implemented to date. Furthermore, we provided an outcome-specific weighting approach in the form of a greedy maximization algorithm: this method produced a domain weighting of the GIMD that maximised a specific measure concerning one health outcome (in this case, the correlation between GIMD and mortality). The algorithm can readily be transferred to other areas of interest, for any selected measure and a given dataset, but its results should be used mainly for orientation.

Limitations of the study concerned the selection of weighting methods: approaches such as revealed preferences or Bayesian factor analysis (cf table 1) could not be applied because of restricted data access at a regional level. Empirical methods are always data dependent and are restricted concerning a possible comparison over time, especially with the use of cross-sectional data. This could be addressed by using longitudinal data, which would enable us to measure ‘within variation’ (ie, the same location over time) in addition to ‘between variation’ (ie, over different locations).

Using correlation coefficients to evaluate the association between different GIMD versions and mortality does not necessarily imply a causal association between area deprivation and mortality. Additionally, overfitting is present when the greedy maximization algorithm is used as a weighting approach, since it yields, by construction, the weights for the maximum correlation between the GIMD and mortality. However, the GIMD can be considered reliable for evaluating total and premature mortality, since the correlation between the GIMD and mortality is very stable over time (GIMD scores from 2006 to 2010 yield very similar correlations with mortality). Another point was the lack of literature regarding the application of different weighting procedures. This limitation could partly be counterbalanced with the input of expert interviews. With regard to the linear regression, the selection of the deprivation proxy should be reconsidered ex post, as the deprivation measure based on living space per inhabitant showed a rather weak (yet significant) positive correlation with overall deprivation (ρ=0.35). This could be explained by the idea that, in less deprived cities such as Hamburg and Munich, there can, in general, be less available living space because of a very competitive housing market. So, there could be a partial negative correlation between deprivation regarding available living space and overall deprivation in some areas. Unfortunately, multidimensional proxies at district level were not available for Germany. We tested other measures such as the overall gross domestic product (GDP) per district and the GDP per employed person per district. Their correlation with the original GIMD was similar to or lower than that of the living space variable, and using them had some major drawbacks. We understand that the use of a one-dimensional proxy is a limitation of our work. However, given the very restricted variety of appropriate variables at the district level in Germany, the selection of this proxy was a pragmatic approach to test a weighting approach based on a linear regression.

We are aware that the stability of the GIMD could have also been tested by applying systematic changes to the weighting of the GIMD domains without using a framework of different weighting approaches. The correlation between some deprivation domains (eg, income or employment) is relatively high and thus any weighting scheme would likely give highly correlated results with mortality. A recent study from the UK showed that 94% of the variance in the English IMD could be explained by the income and employment domains alone, even though they had weights of 22.5% each in the overall index. The authors stated that even if the weights for the other domains had been zero, there would have been very little impact on the overall index.37 Nevertheless, the aim of our study was to provide a conceptual framework of weighting approaches (normative and empirical) for an index of multiple deprivation and to combine the results of the literature search with a sensitivity analysis based on the GIMD.

## Conclusion

The variation in the domain weights of the GIMD did not have a large measurable impact on the relationship between area deprivation and mortality. The correlation between the GIMD and both total mortality and premature mortality proved to be very stable, regardless of the application of the different weighting approaches and the resulting different sets of domain weights. The GIMD versions produced relatively stable results with regard to the central distribution measures of the overall scores (table 3).

The theory-based weighting of Maier *et al* can be interpreted ex post as more conservative than the empirical weighting approaches, as the weighting of the income and employment domains is relatively strong at 50% in contrast to the empirical methods. Nevertheless, a theory-based selection of domains seems to be more meaningful than an empirically based selection because the results of the empirical methods are restricted, as discussed above. The stability with respect to the scores and the relationship to mortality supports this advice. A modelling of the GIMD with a confirmatory factor analysis could be considered as a promising empirical approach with the prospect of temporal comparability in future studies.
