
Original research
Development and internal validation of prognostic models to predict negative health outcomes in older patients with multimorbidity and polypharmacy in general practice
  1. Beate S Müller1,
  2. Lorenz Uhlmann2,
  3. Peter Ihle3,
  4. Christian Stock2,
  5. Fiona von Buedingen1,
  6. Martin Beyer1,
  7. Ferdinand M Gerlach1,
  8. Rafael Perera4,
  9. Jose Maria Valderas5,
  10. Paul Glasziou6,
  11. Marjan van den Akker1,7,
  12. Christiane Muth1
  1. Institute of General Practice, Goethe University Frankfurt, Frankfurt am Main, Hessen, Germany
  2. Institute of Medical Biometry and Informatics, University of Heidelberg, Heidelberg, Baden-Württemberg, Germany
  3. PMV Research Group, Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Nordrhein-Westfalen, Germany
  4. Nuffield Department of Primary Care Health Sciences, University of Oxford, Oxford, Oxfordshire, UK
  5. APEx Collaboration for Academic Primary Care, University of Exeter Medical School, Exeter, Devon, UK
  6. Centre for Research in Evidence-Based Practice, Faculty of Health Sciences and Medicine, Bond University, Gold Coast, Queensland, Australia
  7. Department of Family Medicine, School CAPHRI, Maastricht University, Maastricht, Limburg, The Netherlands
Correspondence to Dr Beate S Müller; b.mueller{at}


Background Polypharmacy interventions are resource-intensive and should be targeted to those at risk of negative health outcomes. Our aim was to develop and internally validate prognostic models to predict health-related quality of life (HRQoL) and the combined outcome of falls, hospitalisation, institutionalisation and nursing care needs, in older patients with multimorbidity and polypharmacy in general practices.

Methods Design: two independent data sets, one comprising health insurance claims data (n=592 456), the other data from the PRIoritising MUltimedication in Multimorbidity (PRIMUM) cluster randomised controlled trial (n=502). Population: ≥60 years, ≥5 drugs, ≥3 chronic diseases, excluding dementia. Outcomes: combined outcome of falls, hospitalisation, institutionalisation and nursing care needs (after 6, 9 and 24 months) (claims data); and HRQoL (after 6 and 9 months) (trial data). Predictor variables in both data sets: age, sex, morbidity-related variables (disease count), medication-related variables (European Union-Potentially Inappropriate Medication list (EU-PIM list)) and health service utilisation. Predictor variables exclusively in trial data: additional socio-demographics, morbidity-related variables (Cumulative Illness Rating Scale, depression), Medication Appropriateness Index (MAI), lifestyle, functional status and HRQoL (EuroQol EQ-5D-3L). Analysis: mixed regression models, combined with stepwise variable selection, 10-fold cross validation and sensitivity analyses.

Results Most important predictors of EQ-5D-3L at 6 months in best model (R² 0.507) were depressive symptoms (−2.73 (95% CI: −3.56 to −1.91)), MAI (−0.39 (95% CI: −0.7 to −0.08)) and baseline EQ-5D-3L (0.55 (95% CI: 0.47 to 0.64)). Models based on claims data and those predicting long-term outcomes based on both data sets produced low R² values. In claims data-based model with highest explanatory power (R²=0.16), previous falls/fall-related injuries, previous hospitalisations, age, number of involved physicians and disease count were most important predictor variables.

Conclusions Best trial data-based model predicted HRQoL after 6 months well and included parameters of well-being not found in claims. Performance of claims data-based models and models predicting long-term outcomes was relatively weak. For generalisability, future studies should refit models by considering parameters representing well-being and functional status.

  • primary care
  • therapeutics
  • geriatric medicine
  • health services administration & management

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:


Strengths and limitations of this study

  • We developed our predictive models using two completely different data sets—claims data and data primarily collected in a cluster-randomised trial.

  • The claims data contained a large number of cases, enabling our models to include many possible predictors without any convergence issues.

  • The trial data provided a rich set of potential predictor variables of high data quality and included data on patient-reported outcome measures, such as well-being and functional status.

  • Both data sets have their own methodological limitations, such as imprecise claims data (collected for reimbursement purposes) and the trial’s small sample size.

  • The nature of the data meant neither data set could be used to validate a predictive model based on the other.


Currently, up to 80% of primary care consultations involve patients with multiple chronic conditions (multimorbidity).1 A multiplicity of disorders in patients is associated with polypharmacy. Both multimorbidity and polypharmacy are recognised as major challenges facing healthcare systems.2–5 Polypharmacy can increase the risk of mortality, hospitalisation6 7 and falls and fall-related injuries with resulting disability and loss of autonomy.8 9 It can also reduce cognitive and physical function, as well as health-related quality of life (HRQoL).2 10

The number of drugs increases the probability of adverse drug reactions, but the relationship is inconsistent, suggesting that the number of medications alone may not adequately indicate the quality of an individual’s medication regimen.11 12 The kind of drugs prescribed plays an important role in the type of reaction, with certain medication classes, such as benzodiazepines, demonstrating a significant association with falls, and medications with anti-cholinergic properties being associated with impaired cognitive and physical function in elderly individuals.13 14 At a physician level, the cause of these negative health outcomes of polypharmacy may be inappropriate prescribing, including undertreatment.15–18 At a patient level, a high number of drugs and the complexity of a drug regimen are often associated with poor adherence,19 which may be exacerbated by the presence of depression and/or cognitive impairment.20 Moreover, polypharmacy may also result in an accumulation of potentially inappropriate medications (PIMs).

Several complex interventions have been developed to optimise (inappropriate) polypharmacy. However, despite their evidence-based rationale, they have led to inconsistent improvements in process parameters of care and failed to impact patient-relevant outcomes.21 22 One possible reason for this is that the included populations are too heterogeneous in terms of their baseline risk and potentially achievable intervention effects. For example, the majority of the study population included in the PRIMUM (PRIoritising MUltimedication in Multimorbidity) trial showed very good quality of life and functional status at baseline, even though participants had at least three chronic conditions affecting two or more organ systems, five or more chronic drug prescriptions and were 60 years of age or older. The authors therefore concluded that there was not enough room for improvement.23 This highlights current difficulties in defining inclusion criteria in polypharmacy trials in such a way that selected populations have a considerable baseline risk and can be expected to benefit from the intervention. Moreover, as polypharmacy interventions tend to address inappropriate prescribing, healthcare coordination, and so on, they are generally complex.21 22 As the complex interventions are also resource-intensive, it would be preferable for a stratified approach to address patients that are at high risk of negative health outcomes and most likely to benefit from them.24

The course of multimorbidity (and associated polypharmacy) has been characterised by a decline in well-being (eg, functional decline or worsening of quality of life due to inappropriate prescriptions and/or deterioration in one or more chronic diseases), interrupted by adverse events (eg, exacerbations of chronic diseases or adverse drug reactions).25 26 In order to identify a population at high risk, it is therefore necessary to predict a wide array of possible negative health outcomes. Several prognostic models have predicted single outcomes, mainly mortality or unplanned hospital (re-)admission and to a lesser extent a future decline in quality of life, but no studies have investigated the risk for the above-mentioned combined endpoints, or involved polypharmacy-related predictors.27

The aim of this exploratory study was to develop and internally validate prognostic models to predict the risk of adverse events or a decline in well-being in general practice patients with multimorbidity and polypharmacy, and to operationalise these negative health outcomes in terms of hospitalisation, falls, level of required nursing care, institutionalisation and HRQoL. The models were based on morbidity and medication-related variables, as well as socio-demographic characteristics and parameters of healthcare utilisation.


We developed and internally validated prognostic models to identify key health problems linked with multimorbidity and associated polypharmacy (decline in well-being and adverse events: figure 1). (1) Based on claims data, we predicted the combined endpoint of hospitalisation, falls/fall-related injuries, need for nursing care, deterioration in the required level of care (nursing level) or institutionalisation, after 6, 9 and 24 months. (2) We predicted HRQoL after 6 and 9 months based on data from a cluster-randomised trial.23

Figure 1

Predicted outcomes with regard to general trajectories of well-being and quality of life over time. HRQoL, health-related quality of life.

Design and setting/study samples

Two data sets were used in modelling:

Claims data obtained from the Techniker Krankenkasse (TK) statutory health insurance company between January 2012 and December 2014. TK is the largest statutory health insurer in Germany and provided health insurance to 8.1 million persons in 2012.28 In accordance with Social Code book V, all statutory health insurance companies in Germany collect basic data on socio-demographics, details of pharmacological and non-pharmacological prescriptions and information on other health services utilisation and data on morbidity.

Trial data from the cluster-randomised PRIMUM trial23 conducted in general practices in Hesse, Germany, from August 2010 to February 2012.


Claims-based models: We aimed to use the same inclusion criteria for both data sets as far as possible. We therefore included health insurance claims data of older patients (≥60 years) with multimorbidity (at least three documented chronic diseases, from a list of 46 diagnoses and conditions, from 01 January 2012 to 31 December 2012)29 and polypharmacy (at least five documented and concurrent prescriptions from 01 July 2012 to 31 December 2012). Included patients had to have been continuously insured by TK from 01 January 2012 to 31 December 2014 (except in case of death at any time after 31 December 2012) and had to have contacted a primary care provider at least once in 2012. Patients were excluded if they were diagnosed with dementia (International Classification of Diseases, 10th Edition (ICD-10): F00-03, F05.1, G30-31, R54) or under guardianship from 01 January 2012 to 31 December 2012.
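The claims-based inclusion and exclusion criteria above can be read as a simple filter. The following is a minimal sketch only: the record layout and all field names are hypothetical and are not taken from the TK data, and the check of ICD-10 codes, prescription dates and insurance periods is assumed to have been done upstream.

```python
from dataclasses import dataclass


@dataclass
class PatientRecord:
    """Hypothetical, simplified summary of one insured person's 2012 claims."""
    age: int
    n_chronic_diseases: int          # count from the list of 46 diagnoses and conditions
    n_concurrent_prescriptions: int  # concurrent prescriptions, July to December 2012
    has_dementia_code: bool          # any of ICD-10 F00-03, F05.1, G30-31, R54
    under_guardianship: bool
    continuously_insured: bool       # 2012 to 2014, or died after 31 December 2012
    primary_care_contact: bool       # at least one contact in 2012


def is_eligible(p: PatientRecord) -> bool:
    """Apply the inclusion and exclusion criteria described in the text."""
    return (
        p.age >= 60
        and p.n_chronic_diseases >= 3
        and p.n_concurrent_prescriptions >= 5
        and p.continuously_insured
        and p.primary_care_contact
        and not p.has_dementia_code
        and not p.under_guardianship
    )
```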

Trial data-based models: We included data from patients that participated in the cluster-randomised PRIMUM trial (n=502, intervention group: n=252, control group n=250).23 Patients with multimorbidity and polypharmacy were included in the study if they were at least 60 years old, had at least three chronic diseases from two or more chapters of ICD-10 and at least five prescriptions. Patients were excluded if they were cognitively impaired (defined as a score lower than or equal to 26 on the Mini-Mental Status Exam30), had an alcohol or drug addiction or were not able to participate in telephone interviews, fill in questionnaires or express their own free will. Four out of the 502 patients (0.8%) died during the 9-month follow-up period.


Models based on claims data: We predicted the combined endpoint of hospitalisation, falls/fall-related injuries, institutionalisation in a long-term care facility, newly recognised need for nursing care or a worsening in the level of care (‘Pflegestufe’) at 6-month, 9-month and 24-month follow-up. We treated the parameters of health service use (hospitalisation, level of nursing care and institutionalisation) as surrogate parameters for a decline in functional status and well-being, as details of these are not included in German claims data. Outcomes were operationalised as follows:

  • Hospitalisation: We included all-cause hospitalisations, as our data did not permit us to differentiate between unplanned and elective hospitalisations.

  • Falls and fall-related injuries: We included all fractures and injuries coded in ICD-10 chapters ‘S’ and ‘T’. We excluded ICD codes for severe body injuries such as S31 (‘open wound of abdomen, lower back and pelvis’), which we assessed as related to severe bodily impact, rather than drug-related falls (see online supplemental additional file 1 for all excluded ICD codes). We also excluded osteoporosis-related fractures (ICD-10 M80).

  • Institutionalisation was defined as the admission of a patient to a long-term care facility for at least 28 days (in Germany, this is the maximum length of time considered as ‘short-term care’ in such facilities).

  • Level of (nursing) care (‘Pflegestufe’) referred to dependency on care. In the period under review, the German nursing care insurance system recognised four levels of care (‘1’ – lowest level to ‘3’ – highest level, and ‘H’, which was mainly used for people with mental illnesses who are in need of support). The onset of care and any increase in care level were taken into consideration.
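The operationalisation above amounts to a binary indicator that is positive as soon as any component occurs during follow-up. The sketch below illustrates this under stated assumptions: the class and field names are hypothetical, care levels are restricted to the numeric levels 1 to 3 (level ‘H’ is left out because its ordering relative to the numeric levels is not specified in the text), and the classification of ICD-10 codes into falls and hospitalisations is assumed to have been done beforehand.

```python
from dataclasses import dataclass
from typing import Optional

# At least 28 days in a long-term care facility counts as institutionalisation.
INSTITUTIONALISATION_MIN_DAYS = 28


@dataclass
class FollowUp:
    """Hypothetical per-patient summary of one follow-up window."""
    hospitalised: bool                   # any all-cause hospitalisation
    fall_or_fall_injury: bool            # any included ICD-10 'S'/'T' code
    long_term_care_days: int             # days spent in a long-term care facility
    care_level_before: Optional[int]     # None = no recognised care level, else 1..3
    care_level_after: Optional[int]


def care_level_worsened(before: Optional[int], after: Optional[int]) -> bool:
    """True for onset of care or any increase in the numeric care level."""
    if after is None:
        return False
    if before is None:
        return True  # onset of care
    return after > before


def combined_outcome(f: FollowUp) -> bool:
    """True if at least one component of the combined endpoint occurred."""
    institutionalised = f.long_term_care_days >= INSTITUTIONALISATION_MIN_DAYS
    return (
        f.hospitalised
        or f.fall_or_fall_injury
        or institutionalised
        or care_level_worsened(f.care_level_before, f.care_level_after)
    )
```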

Models based on trial data: We predicted HRQoL 6 and 9 months after baseline. HRQoL was measured using the EQ-5D-3L index score.31–33 The EQ-5D-3L index score is a weighted summary score of five different dimensions of health (mobility, self-care, usual activities, pain/discomfort and anxiety/depression). Each dimension has three levels. The index score is calculated based on time trade-off (TTO) norm values and ranges from 0 to 1, with ‘0’ signifying death and ‘1’ signifying full health. Patients who died during follow-up were assigned the value ‘0’.

Potential predictors

The potential predictors initially used in the two modelling approaches were those available in both claims and trial data (‘core predictors’, see figure 2). To compare the two models, we first used these ‘core predictors’ (all variables were continuous unless stated otherwise).

Figure 2

Models and sensitivity analyses with regard to data source and predictor set. CRT, cluster-randomised controlled trial; †Best Model.

  • Socio-demographics: Age (in years), sex (male/female, binary)

  • Morbidity-related (excluding dementia): Number of chronic diseases (based on a modified list of 46 diagnoses and conditions),29 Charlson comorbidity index,34 number of specific chronic conditions according to Diederichs’ list35 consisting of 17 chronic diseases identified in a systematic review of existing comorbidity indices. As dementia was excluded, the final list contained 16 diagnoses. (All instruments including ICD-10 codes are provided in online supplemental additional file 2)

  • Medication: Number of prescriptions (defined as Anatomical Therapeutic Chemical (ATC) agents using fifth-level coding, ATC version 2011 to 2014), excluding drugs for topical applications and drug groups that were irrelevant to our research question, for example, contrast agents (ATC V-08, three-digit level).

  • Potentially inappropriate medication: We constructed two patient co-variables: (1) exposure to any PIM (yes/no) and (2) number of PIMs between 01 July 2012 and 31 December 2012 (claims-based models) and at baseline (trial data-based models). We used the following two lists to identify PIMs:

    • Modified EU-PIM list36: The list of PIMs for the elderly contains 282 chemical substances or drug classes divided into 34 therapeutic groups.

    • Modified PRISCUS list37: The German list of PIMs for the elderly includes 83 chemical substances from a total of 18 drug classes.

      We excluded from the lists PIMs that referred to specific doses, treatment duration and disease severity, as valid information on these could not be obtained from the claims data. (All instruments including ATC codes are provided in online supplemental additional file 3)

  • Anticholinergic drug burden: Scores were calculated based on all prescribed drugs with anticholinergic properties per patient. Despite substantial differences between existing scales, associations with adverse clinical outcomes, such as hospital admissions, fall-related hospitalisations, length of hospital stay, and general practitioner (GP) visits, have been found for all of them.38 As the evidence does not support the preferred use of any particular scale, we tested the following (all instruments including ATC codes are provided in online supplemental additional file 3):

    • Anticholinergic Drug Scale (ADS)39 : The ADS weights anticholinergic properties per drug from ‘0’ – no anticholinergic activity, ‘1’ – mild, ‘2’ – moderate and ‘3’ – strong anticholinergic activity. The overall anticholinergic burden per patient was calculated as a sum score for the entire medication regimen.

    • Modified Anticholinergic Drug Burden Index (DBI)13 : The DBI comprises drugs with sedative effects (which form the sedative burden (BS)), and drugs with anticholinergic or both sedative and anticholinergic effects (which form the anticholinergic burden (BAC)). As claims data do not provide dosages, the cumulative number of sedative and anticholinergic drugs was calculated (modified DBI score).

  • Healthcare utilisation: For each patient, we obtained information on all-cause hospitalisations (yes/no), falls and fall-related injuries (yes/no) and the number of physicians involved in ambulatory healthcare, between 01 January 2012 and 31 December 2012 for models based on claims data, and in the 6 months previous to baseline for models based on trial data.
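The two anticholinergic burden measures described in the list above reduce to a weighted sum (ADS) and a count (modified DBI). A minimal sketch follows. The drug names and weights are placeholders invented for illustration and do not reproduce the published ADS or DBI drug lists, and treating unlisted drugs as weight 0 is an assumption of this sketch.

```python
from typing import Dict, Iterable, Set

# Placeholder weights (0 = none, 1 = mild, 2 = moderate, 3 = strong).
# These are NOT the published ADS weights; real ATC codes would be used instead.
EXAMPLE_ADS_WEIGHTS: Dict[str, int] = {
    "drug_a": 0,
    "drug_b": 1,
    "drug_c": 2,
    "drug_d": 3,
}

# Placeholder set of drugs with sedative and/or anticholinergic effects.
EXAMPLE_DBI_DRUGS: Set[str] = {"drug_c", "drug_d", "drug_e"}


def ads_sum_score(regimen: Iterable[str], weights: Dict[str, int]) -> int:
    """Sum the ADS weights over the whole regimen; unlisted drugs count as 0."""
    return sum(weights.get(drug, 0) for drug in regimen)


def modified_dbi_score(regimen: Iterable[str], dbi_drugs: Set[str]) -> int:
    """Count the sedative and anticholinergic drugs (dose information is not used)."""
    return sum(1 for drug in regimen if drug in dbi_drugs)
```

For a regimen of `["drug_b", "drug_c", "drug_d"]`, the placeholder weights give an ADS sum score of 6 and a modified DBI score of 2.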

Additional potential predictor variables were used exclusively to re-fit models based on trial data, as they were only available in these data (‘additional predictors’, see figure 2; all variables were continuous variables unless stated otherwise):

  1. Socio-demographics: Education (CASMIN40) and number of persons living in the household.

  2. Lifestyle: Alcohol consumption (AUDIT-C, categorical variables on number of drinking occasions and amount of alcohol consumed),41 smoking status (smoker/non-smoker, binary) and body mass index.

  3. Inappropriateness of medication: MAI consists of 10 items (indication, effectiveness, correctness of dosage, correctness of direction, practicality of direction, drug–drug interactions, drug–disease interactions, unnecessary drug duplications, correctness of treatment duration and costs).42 The MAI item on cost was omitted because variable discount contracts between pharmaceutical companies and statutory health insurers preclude cost comparisons in Germany. The medication reviews were conducted by a trained clinical pharmacologist (SH), who rated nine items for each prescription. Values ranged from ‘0’ (appropriate) to ‘2’ (inappropriate) whereby ‘1’ represented a middle rating of uncertain appropriateness. The assigned values were summed to give an MAI score between 0 and 18 for each prescription and across the entire medication regimen of the patient.23

  4. Morbidity-related: Severity of multimorbidity, as measured using the CIRS (the CIRS differentiates between 14 organ systems, which are assessed on a 5-point Likert scale according to severity of impairment, with the ratings ranging from no impairment to extreme impairment),43 with scores calculated as the total sum score, the number of affected organ systems and the HRQoL-CI (HRQoL-CI consists of a mental and a physical subscale, whereby the presence of certain diseases are assigned weights from ‘1’ to ‘3’, see online supplemental additional file 2).44

  5. Depressive symptoms, as measured using the Geriatric Depression Scale (GDS) with 15 items.45

  6. HRQoL at baseline, as measured using the EQ-5D-3L index score.31–33
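The MAI scoring described in item 3 of the list above (nine items per prescription, each rated 0, 1 or 2, summed per prescription and across the regimen) can be sketched as follows. The function names are ours, and the example ratings in the usage note are invented for illustration.

```python
from typing import Iterable, Sequence

N_MAI_ITEMS = 9          # the ten MAI items minus the omitted cost item
VALID_RATINGS = (0, 1, 2)  # 0 = appropriate, 1 = uncertain, 2 = inappropriate


def prescription_mai(ratings: Sequence[int]) -> int:
    """MAI score of one prescription: the sum of nine item ratings (0 to 18)."""
    if len(ratings) != N_MAI_ITEMS:
        raise ValueError(f"expected {N_MAI_ITEMS} item ratings, got {len(ratings)}")
    if any(r not in VALID_RATINGS for r in ratings):
        raise ValueError("each item rating must be 0, 1 or 2")
    return sum(ratings)


def regimen_mai(prescriptions: Iterable[Sequence[int]]) -> int:
    """MAI score of a patient: the sum of the scores of all prescriptions."""
    return sum(prescription_mai(r) for r in prescriptions)
```

A patient with one fully appropriate prescription (nine ratings of 0) and one prescription rated 2 on a single item and 1 on another would receive a regimen score of 3.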

Missing values and imputation

There were no missing values in the claims data, so no imputation was carried out in models that were based on them. In models based on trial data, imputation of missing values in predictors and outcomes was conducted using multiple imputation via chained equations.46 47 We used a fully conditional specification approach by setting up an appropriate conditional density for each variable. In the imputation process, we included all variables that were used in each model. We imputed m=50 data sets and combined the results using ‘Rubin’s rules’.46
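Combining results across imputed data sets with Rubin's rules means averaging the point estimates and adding the within-imputation and between-imputation variances. The sketch below shows the standard pooling formulae for a single coefficient. It is an illustration only, not the authors' R code, and the function name is ours.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def pool_rubin(estimates: Sequence[float], std_errors: Sequence[float]) -> Tuple[float, float]:
    """Pool one coefficient over m imputed data sets.

    Returns the pooled estimate and its pooled standard error.
    """
    m = len(estimates)
    if m < 2 or len(std_errors) != m:
        raise ValueError("need at least two imputations and matching lengths")
    pooled_estimate = mean(estimates)
    within = mean([se ** 2 for se in std_errors])  # average sampling variance
    between = variance(estimates)                  # sample variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled_estimate, sqrt(total)
```

With m=50 imputations, as used in the study, the correction factor (1 + 1/m) is 1.02, so the between-imputation variance enters almost unscaled.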

Statistical analyses

In both models, we first investigated the core predictors that were available in both data sets, including socio-demographics, morbidity-related and medication-related variables and variables for healthcare utilisation. We then refitted the trial data-based models using the additional predictors that were exclusively available for trial data, such as variables for lifestyle and well-being (figure 2).

Models based on claims data: In order to develop a prediction model for the binary combined outcome (containing all-cause hospitalisation, falls/fall-related injuries, institutionalisation or level of (nursing) care required) at 6-month, 9-month and 24-month follow-up, we performed multiple logistic regression analyses with the occurrence of at least one of the components at 6-month, 9-month and 24-month follow-up as the dependent variable. As patients were not always assigned to a single general practice,48 we did not account for clustering by practice in the claims data.

Models based on trial data: In order to develop a prediction model for the continuous outcome HRQoL at 6-month and 9-month follow-up, we performed multiple linear regression analyses using the EQ-5D-3L index score at 6-month and 9-month follow-up as the dependent variable. The cluster structure of the data was taken into account by including a random intercept to produce a mixed regression model. We assumed a compound symmetry structure when estimating the covariance matrix.
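A random-intercept model of the kind described above can be written as follows. The notation is ours and is meant only as a sketch: patient i is nested in practice j, y is the EQ-5D-3L index score at follow-up, and x collects the baseline predictors.

```latex
y_{ij} = \beta_0 + \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_j + \varepsilon_{ij},
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)

\operatorname{Var}(y_{ij} \mid \mathbf{x}_{ij}) = \sigma_u^2 + \sigma^2,
\qquad
\operatorname{Cov}(y_{ij}, y_{i'j} \mid \mathbf{x}_{ij}, \mathbf{x}_{i'j}) = \sigma_u^2
\quad (i \neq i')
```

Under this formulation, any two patients from the same practice share the same covariance, which is what a compound symmetry structure expresses.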

Univariate analyses in both claims and trial data: Prior to conducting regression analyses, we performed univariate analyses to identify any associations between our potential predictors (at baseline) and the outcomes (at 6-month, 9-month and 24-month follow-up).

Regression analyses and variable selection: To find out which predictor variables influence the outcome variables, we used a stepwise variable selection procedure (combining forward and backward steps). We started with the full model and all potential predictor variables. After this, we used a selection procedure based on p values.49 In the backward selection step, we deleted the variable with the highest p value from the model if its p value was greater than 0.157. In the forward selection step, the variable with the lowest p value was included in the model if its p value was less than 0.156. As long as each covariate had only one df, the use of these boundaries led to the same results as variable selection using the Akaike Information Criterion.50 The resulting models are presented by providing the estimated regression coefficients (models based on trial data) or ORs (models based on claims) with 95% CIs and corresponding p values. As we expected the large sample size of claims-based models to result in low p values, we calculated additional z values and continuous net reclassification indices to gain information on the predictive power of each variable.51 Multi-collinearity was assessed using the variance inflation factor (VIF).52 In the models based on trial data, we did not account for the clustering structure when we calculated the VIF.
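The link between a p value threshold of about 0.157 and the Akaike Information Criterion can be checked numerically. The AIC penalises each additional parameter by 2, so a covariate with one df improves the AIC only if its likelihood ratio statistic exceeds 2, and the upper tail probability of a chi-square distribution with one df at 2 is approximately 0.157. The sketch below uses only the standard library, and the helper name is ours.

```python
from math import erfc, sqrt


def chi2_sf_1df(x: float) -> float:
    """Upper tail probability P(X > x) for a chi-square variable with 1 df.

    Because X is the square of a standard normal variable Z,
    P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return erfc(sqrt(x / 2.0))


# A likelihood ratio statistic of 2 is the break-even point under the AIC.
p_threshold = chi2_sf_1df(2.0)
print(round(p_threshold, 3))  # 0.157
```

For covariates with more than one df, the break-even p value differs, which is why the equivalence holds only as long as each covariate has one df.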

Performance of the models

We calculated R² for linear models based on trial data (according to Nakagawa and Schielzeth53), and Nagelkerke’s R² for logistic models (according to Steyerberg and Nagelkerke54 55) based on claims data. Furthermore, in order to assess performance more realistically and to internally validate the models, we used the AUC (area under the receiver operating characteristic curve, equivalent to the concordance index) to validate the logistic regression model based on claims data, and R² to validate the linear regression model based on randomised controlled trial data, in combination with 10-fold cross-validation.56 R² and Nagelkerke’s R² are measures of the variance explained by the overall model. The AUC measures the model’s ability to discriminate between patients who experience the outcome and those who do not.
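The two performance measures used for the logistic models can be illustrated with a short sketch. The AUC is computed here as the proportion of case and non-case pairs in which the case receives the higher predicted risk (ties count as one half), and Nagelkerke's R² rescales the Cox and Snell R² by its maximum attainable value. This is a plain-Python illustration with function names of our choosing, not the authors' R implementation, and the pairwise loop would be too slow for a data set as large as the claims sample.

```python
from math import exp
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Concordance index for a binary outcome (labels are 0 or 1)."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("both outcome classes must be present")
    concordant = 0.0
    for c in cases:
        for d in controls:
            if c > d:
                concordant += 1.0
            elif c == d:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


def nagelkerke_r2(loglik_null: float, loglik_model: float, n: int) -> float:
    """Nagelkerke's R-squared from the null and fitted log-likelihoods."""
    cox_snell = 1.0 - exp(2.0 * (loglik_null - loglik_model) / n)
    max_cox_snell = 1.0 - exp(2.0 * loglik_null / n)
    return cox_snell / max_cox_snell
```

In a 10-fold cross-validation, these measures would be computed on each held-out fold using a model fitted to the remaining nine folds.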

Sensitivity analyses

Using sensitivity analysis, we applied two further modelling approaches (at first separately and then in combination): (1) modelling without multiple imputation and (2) modelling without variable selection.

Software: We made use of different statistical packages to analyse the data in R.47 57–63

We used TRIPOD reporting guidelines (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) in the preparation of this manuscript.64

Patient and public involvement statement

Neither patients nor the public were involved in this study.



Claims data

The total sample of those ≥60 years that were continuously insured by TK from 01 January 2012 to 31 December 2014, and had at least one primary care contact during 2012, amounted to 1 377 917 persons. Overall, 592 456 patients met the pre-specified criteria and were included in the analyses (see study flow-chart, online supplemental additional file 4).

Trial data

Of the 505 patients that participated in the PRIMUM trial, all but 3 were 60 years or older. The final analyses therefore included 502 patients.

Key characteristics of study participants are shown in table 1.

Table 1

Characteristics of study participants

Univariate analyses

In the claims data, univariate analyses revealed significant associations between the combined outcome and the following predictors: Age, sex, disease count, Charlson Comorbidity Index, EU-PIMs, ADS, DBI, previous hospitalisations, previous falls and number of physicians involved in the patient’s care at all follow-ups (after 6, 9 and 24 months) (online supplemental additional file 5). In the trial data, HRQoL was significantly correlated with the shared predictor variables disease count, number of chronic prescriptions, previous falls and sex and the additional predictors depression and HRQoL at baseline (online supplemental additional file 6).

Prognostic models

Claims data

The model predicting the combined endpoint at 6 months had the highest C-statistic (AUC with 10-fold cross validation: 0.71, see table 2), but a low explanation of variance (Nagelkerke’s R² without cross validation: 0.16). Variables in the model with the highest predictive power were previous falls/fall-related injuries and previous hospitalisations, as well as age, number of involved physicians, and number of chronic diseases (‘disease count’) (table 3). The models predicting the combined outcome at 9 and 24 months had AUCs calculated with 10-fold cross validation of 0.68 (R² without cross validation: 0.15) and 0.69 (R² without cross validation: 0.13) respectively. The VIF (to assess any multi-collinearity) showed moderate values (maximum 7.5).

Table 2

Comparison of models

Table 3

Best performing models per data set and set of predictors

Trial data

All results presented in this section are based on the modelling approach that involved multiple imputation of missing values and the variable selection procedure. Models predicting the HRQoL endpoint at 6 months that were based on core predictors available in both claims and trial data showed low predictive accuracy (R² with 10-fold cross validation: 0.111) (table 3, model 2.4). HRQoL at 6 months was best predicted when additional predictors that were exclusively available in the trial data were also included (R² with 10-fold cross validation: 0.507). The variables with the highest predictive power were depressive symptoms (GDS) and EQ-5D-3L Index Score (Baseline). MAI was also predictive (table 3, model 3.4). The VIF showed small values (maximum 2.2).

Comparison of model quality and sensitivity analyses

The shorter the time span of the prediction, the better the explained variance and hence, the performance of the model. However, model performance remained fair to poor when it only included predictor variables that were available for both claims and trial data. Sensitivity analyses confirmed these results (table 2).


Main results

Our best overall prognostic model predicted HRQoL after 6 months in older general practice patients with multimorbidity and polypharmacy. It performed well, was based on trial data and explained more than half of the variance. The most important predictors were depressive symptoms, the initial level of HRQoL and MAI, all of which were only available as ‘additional predictors’ in trial data. Prognostic models in trial data that were developed exclusively from ‘core predictors’ (available in both data sets) performed worse, as did claims-based models and models in either data set that had longer forecast periods (9 months or more). In both trial data-based and claims-based models, outcome components at baseline had a relatively high impact (ie, HRQoL at baseline in the trial data-based model and previous hospitalisation and previous falls/fall-related injuries in claims-based models). Although this is unsurprising and is often the case in prognostic models,65 it nonetheless seems reasonable to retain the variables in the model. In addition, we identified further predictors, such as depressive symptoms and medication appropriateness, which had a relatively high predictive power.

Comparison with the literature

The presented results are consistent with results from other studies. The AUC values in our claims-based models (AUC 0.68 to 0.71) are comparable to those of 23 prognostic models for case finding in elderly patients in primary care. These models predicted (re)hospitalisation, functional impairment, institutionalisation and death.65 The quality of models with a low risk of bias was AUC 0.60 to 0.78, but no explanation of variance was provided. The best model for predicting death within 4 years (AUC: 0.82) included 12 predictors comprising age, sex, body mass index, chronic diseases, smoking status and functional parameters.65 Models that included additional trial data (eg, clinical data) predicted endpoints better than models based only on claims data.65–67 In many models described in other studies, healthcare utilisation parameters, and especially previous hospitalisations, were predictive of (re)hospitalisations, emergency admissions and functional impairment.66 68 69 The predictive power of sex is inconsistent: in 18/27 risk models, sex was included in the final model;66 in 7/23 risk models, male sex was predictive,65 while a further 25 studies found sex to have no influence.68 69 Model quality also improved in studies that included multimorbidity and polypharmacy parameters.66 68 70 However, the parameters and instruments used in modelling (eg, CIRS, Charlson Comorbidity Index and disease count, as reported here) varied considerably among studies. They were neither consistently predictive, nor were certain parameters or instruments better than others.66 69 70

Most published models were developed to predict the risk of hospitalisation.66 68–74 Other models predicted functional outcomes,70 while four models predicted adverse drug reactions.74 So far, little is known about the predictive power of polypharmacy parameters and the appropriateness of prescriptions. In particular, the MAI has never been used in prognostic models. Furthermore, no models have yet been developed to predict HRQoL in patients with multimorbidity and polypharmacy in general practice.27 70

Strengths and limitations

One strength of our study is that we could use two data sources with differing advantages in our exploratory analysis: claims data contained a large number of cases, and trial data provided additional high-quality patient data including functional status and HRQoL. Both data sets also have their limitations, since claims are documented for billing purposes and are therefore imprecise, whereas our trial data set consisted of only a limited number of observations. Thus, each data set allows its own endpoints to be modelled. Risk modelling is especially complex in multimorbid patients with polypharmacy, as predictor variables in this patient population are often associated with one another (eg, diagnoses and prescriptions). In addition, comparable risk situations can lead to different endpoints, as risk often depends on context. For example, a drug-induced fall may have no health-related consequences or may lead to impairment and institutionalisation.

Further to these key limitations, our results need careful interpretation for several reasons. First, the combined endpoint in the claims-based models yielded a high event rate, which may have resulted in overoptimistic results in our logistic regression. However, other approaches would not have resolved this problem to suit our purposes either. Additionally, we still had enough cases in both categories of the dependent variable to conduct a valid model estimation. Nonetheless, the low performance of the claims model may have been because predictors acted in different ways on the different elements of the combined outcome, thus resulting in greater heterogeneity.75 Second, the small sample size of the trial population may have led to some overfitting of the model. At the same time, the VIF (used to assess multicollinearity) showed at most moderate values. The application of shrinkage methods would have been a possible alternative to address this limitation.76 However, there is an ongoing debate about whether shrinkage solves such problems, and a recent study has suggested that although shrinkage can result in improved calibration, it may not be superior in terms of reducing overfitting.77 Furthermore, shrinkage models lead to biased estimates of the regression coefficients, thus making results more difficult to interpret. Third, in our modelling approach we tested disease-based indicators such as the Charlson Comorbidity Index and CIRS that were developed and validated for other purposes. However, we chose indicators that showed a strong association with negative health outcomes.35
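To illustrate what the VIF measures, the sketch below computes variance inflation factors for three synthetic predictors. It relies on the standard identity that the VIF of predictor j equals the j-th diagonal element of the inverse of the predictors' correlation matrix, which is equivalent to 1/(1 − R²) from regressing predictor j on the others. All names and data are hypothetical assumptions made for illustration: the 'diagnoses' and 'prescriptions' columns are simulated to be strongly related, and nothing here reproduces the study's data or software.

```python
import math
import random


def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equally long columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    k = len(columns)
    return [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
         for j in range(k)]
        for i in range(k)
    ]


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def variance_inflation_factors(columns):
    """VIF of each predictor: diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


# Synthetic data: 'prescriptions' is built to track 'diagnoses' closely,
# while 'age' is simulated independently of both.
rng = random.Random(0)
diagnoses = [rng.gauss(0, 1) for _ in range(500)]
age = [rng.gauss(0, 1) for _ in range(500)]
prescriptions = [d + 0.1 * rng.gauss(0, 1) for d in diagnoses]

vifs = variance_inflation_factors([diagnoses, age, prescriptions])
```

In this simulated setting the two strongly related predictors receive large VIFs, while the independent one stays close to 1. Thresholds for what counts as 'moderate' vary between sources and are not specified in the text above.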

Relevance for primary care and research implications

As the models derived in our study have not been externally validated and our methods have some limitations, we do not claim to have developed comprehensive prognostic models to identify older general practice patients with multimorbidity and polypharmacy at risk of negative health outcomes. For this reason, we plan to conduct an individual patient data-based meta-analysis to further develop and externally validate the models presented here (PROSPERO ID: CRD42018088129).

It is, however, very likely that baseline components of our predicted endpoints are important predictors, especially considering that these results are unsurprising and entirely plausible. A decline in HRQoL, a previous hospitalisation and previous falls/fall-related injuries can therefore be seen as warning parameters ('red flags') that may help general practitioners to recognise older patients with multimorbidity and polypharmacy at high risk of adverse health outcomes. These patients are therefore more likely to benefit from an intervention than others with low or no risk.24 Hence, researchers evaluating polypharmacy interventions, such as medication reviews, may wish to consider our models when deciding on selection and inclusion criteria for a study population.


Conclusion

This study provides prognostic models to identify older general practice patients with multimorbidity and polypharmacy at high risk of deterioration in HRQoL, hospitalisation, falls/fall-related injuries, institutionalisation and a need for nursing care. Baseline outcome components, such as previous falls, hospital stays and reduced HRQoL, as well as depression, were important predictors of these negative health outcomes in our models. They can be seen as warning signs of future worsening and an indication that these patients are likely to benefit from interventions to optimise their medication. Future studies should externally validate the models and evaluate the effectiveness of polypharmacy interventions in high-risk patients.

Ethics approval and consent to participate

Claims may be analysed by statutory health insurance companies in accordance with § 284 of Social Code Book V. For the research questions of this project, claims data were analysed by Cologne University, Goethe-University Frankfurt and Heidelberg University in collaboration with Techniker Krankenkasse (TK). When claims are anonymously analysed in accordance with Good Practice in Claims Data Analysis,78 no further ethics approval is required. Regarding our trial data, the ethics commission of the medical faculty of the Johann Wolfgang Goethe University, Frankfurt/Main approved the PRIMUM trial (resolution number E 46/10, file number 123/10, date: 20 May 2010) and all of the participants gave their written informed consent before taking part.


The authors would like to thank Phillip Elliott for the language review of the paper.



  • Twitter @rafaoxford

  • Contributors MB, FMG and CM designed the study. PI, LU and CS analysed the data. BSM, LU, PI, CS, FvB, MB, FMG, RP, JMV, PG, MvdA and CM contributed to the interpretation of the data. CM and BSM drafted the manuscript, and all authors critically revised it and subsequent versions for important intellectual content. All authors approved the version to be submitted for publication. LU, CS and CM had full access to all data and are responsible for the integrity and the accuracy of the data analysis.

  • Funding This study was supported by the German Statutory Healthcare Insurance Company Techniker Krankenkasse.

  • Competing interests FMG, BSM, MB and CM received grants from the German Statutory Healthcare Insurance Company Techniker Krankenkasse during the course of the study. CS has been employed by Boehringer Ingelheim GmbH & Co KG since October 2019.

  • Patient consent for publication Not required.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data availability statement The data sets generated and analysed in the current study are not publicly available, as further analyses are ongoing.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.