Introduction

Computer simulation is a method of modelling the progression of type 2 diabetes mellitus and predicting long-term outcomes of the disease. Since the publication of the first United Kingdom Prospective Diabetes Study (UKPDS) Outcomes Model (UKPDS-OM1) [1], the use of simulation modelling in diabetes has increased, with at least eight models in use worldwide [2, 3], many of which use the published equations from UKPDS-OM1 [3]. UKPDS-OM1 is a multi-application model and has been used in a wide variety of applications, including cost-effectiveness analyses [4, 5] and prediction of life expectancy [6].

The UKPDS-OM1 has been tested alongside several other diabetes simulation models at Mount Hood Challenge meetings. A general conclusion was that the models performed reasonably well in terms of predicting the relative risk of interventions vs control treatments, but less well in predicting absolute risk [3]. Additionally, a temporal validation study found that UKPDS-OM1 over-predicted the probability of events for high-risk subgroups [7].

Model building is an iterative process and models need to be updated as new information becomes available [8]. The UKPDS-OM1 was based on trial data collected in the UKPDS up until 1997. Additional information collected during the UKPDS 10 year post-trial monitoring (PTM) period [9] provided an opportunity to update the simulation model and to incorporate data on new risk factors and outcomes that were unavailable when the UKPDS-OM1 was constructed.

Our aim was to build a new model, Outcomes Model version 2 (UKPDS-OM2), based on the larger dataset, which included additional, more recent data that were also more clinically relevant because participants were no longer in a clinical trial. This involved: (1) re-estimating, over a longer duration of follow-up, the seven original risk equations for complications (myocardial infarction [MI], ischaemic heart disease [IHD], stroke, congestive heart failure [CHF], amputation, blindness and renal failure); (2) estimating new equations, not in the original model, for diabetic ulcer and some second events; (3) developing new equations for all-cause mortality; and (4) exploring the use of new risk factors, such as microalbuminuria, which have been shown to be predictive of diabetes-related complications.

We also present internal validation of the UKPDS-OM2 over 25 years of follow-up, carry out a sensitivity analysis and compare predictions from the original and new models using a contemporary patient-level input dataset.

Methods

The model—UKPDS-OM2

In common with UKPDS-OM1 [1], the major objective of the new model is to simulate major diabetes-related complications over a lifetime and to calculate health outcomes such as life expectancy. It is a patient-level epidemiological model for a target population of adults aged 30 and over with any duration of diabetes. UKPDS-OM2 integrates separate risk equations for eight diabetes-related complications and death. Figure 1 shows the sequence of modelling events. The inputs to the model are individual patient characteristics (demographic, clinical risk factors and complication history); outputs include annual incidence of death or complications (MI, stroke, IHD, CHF, amputation, blindness, renal failure and ulcer), including second events for MI, stroke and amputation, life expectancy and quality-adjusted life years (QALYs). In each annual cycle the probability of death or of experiencing one or more complications is calculated for each patient according to the appropriate risk equation. Each probability is compared with a random number drawn from a uniform distribution ranging from zero to one to determine whether an event actually occurs for this patient. Equations for complications are executed in random order and if an event is predicted to occur in a given cycle it will inform the remaining set of equations still to be estimated in the same cycle. Probability of death is based on whether complications have occurred, and which complications have occurred in the current annual cycle. If the model predicts that an individual dies, their total years lived and QALYs are calculated and the individual exits the simulation; if the individual survives that cycle, the age, years of diabetes, clinical risk factor values and event histories are updated and carried forward to the next annual cycle. Clinical risk factors can either be updated from a known patient-level dataset or be modelled using risk factor time-path equations. 
Quality of life decrements based on the complications experienced can also be calculated; details of these decrements and new time-path equations will be reported elsewhere.
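The annual cycle described above can be sketched in a few lines of Python. This is a minimal illustration rather than the UKPDS-OM2 implementation: the function name, the dictionary-based patient state and the callables `event_equations` and `death_probability` are hypothetical placeholders for the fitted risk equations, and second events, risk factor time-paths and QALYs are left out for brevity.

```python
import random

def simulate_patient(patient, event_equations, death_probability,
                     max_years=70, rng=None):
    """Run annual cycles for one patient; return (cycles_lived, events).

    event_equations: dict mapping an event name to a hypothetical callable
        (state, history) -> annual probability of a first event.
    death_probability: hypothetical callable
        (state, prior_history, events_this_year) -> annual probability of death.
    """
    rng = rng or random.Random()
    state = dict(patient)                    # copy so the input is not mutated
    history = set(state.pop("history", ()))  # events from earlier years
    events = []
    for year in range(max_years):
        events_this_year = set()
        order = list(event_equations)
        rng.shuffle(order)                   # equations run in random order
        for name in order:
            if name in history:
                continue                     # sketch covers first events only
            # Events already predicted in this cycle inform later equations.
            p = event_equations[name](state, history | events_this_year)
            if rng.random() < p:             # compare with a uniform(0, 1) draw
                events_this_year.add(name)
                events.append((year, name))
        if rng.random() < death_probability(state, history, events_this_year):
            return year + 1, events          # patient exits the simulation
        history |= events_this_year          # carry event history forward
        state["age"] += 1
        state["diabetes_duration"] += 1
    return max_years, events
```

Averaging the first return value over many patients and many replications would give a crude life expectancy estimate under these assumptions.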

Fig. 1 Flowchart showing structure of simulation model

Derivation of risk model equations

Study subjects and measurement of outcomes

Model equations were based on patient-level data for the 5,102 UKPDS participants with newly diagnosed type 2 diabetes mellitus, aged 25–65 years, recruited between 1977 and 1991 [10]. These patients were followed until the trial concluded in 1997. All 4,031 surviving participants entered the 10 year PTM observational study [9], during which time they returned to their community or hospital-based diabetes care providers, with no attempt made to maintain their previous allocated trial regimen. All patients provided written informed consent. Approval was obtained from the ethics committees at all 23 clinical centres, and the study conformed to the Declaration of Helsinki guidelines.

During the main trial, patients were seen three or four times each year in UKPDS clinics. During PTM, patients were seen annually for 5 years in UKPDS clinics, with continued standardised collection of outcome data plus clinical examination every 3 years. In years 6–10, patient and general practitioner questionnaires were used to follow patients remotely, since funding for clinic visits was not available. The vital status of all patients who were still living in the UK was obtained from the Office for National Statistics.

Outcomes were adjudicated exactly as in the original trial, by the UKPDS endpoint committee, which was blinded to study groups. The definitions of the outcomes used in the UKPDS-OM2 match adjudicated trial endpoints and the original UKPDS-OM1 outcomes, except for vascular cardiac events, where CHF and other IHD now include both fatal and non-fatal events. Additional outcomes of diabetic ulcer of the lower limb, and second events for MI, stroke and amputation, were derived. Definitions of outcomes by ICD-9 are detailed in electronic supplementary material (ESM) Table 1.

Clinical risk factors

We used a set of clinical risk factors as candidate predictor variables that were similar to those used in UKPDS-OM1 (i.e. systolic blood pressure [SBP], HbA1c, lipids) but with the following modifications: HDL and LDL cholesterol were included separately; BMI, peripheral vascular disease (PVD) and atrial fibrillation used updated rather than just baseline values. We also included risk factors shown in recent studies to be potentially predictive of diabetic complications: micro- or macroalbuminuria [11], estimated GFR (eGFR) [12], heart rate [13] and white blood cell count [14]. Haemoglobin was also included, as it has been shown to be an independent predictor of mortality in patients with CHF [15].

Statistical analysis

Risk equations for first occurrence of eight diabetes complications and three additional second event equations for MI, stroke and amputation were developed. Multivariate semi-parametric proportional hazards survival models were derived with time to event determined in continuous time from onset of diabetes, using the censor date of death or the date of last contact with the patient. The set of candidate covariates for each equation included time invariant factors (e.g. sex, age at diagnosis of diabetes), time varying clinical risk factors (e.g. HbA1c and SBP) and time varying comorbidities (e.g. history of stroke). One year lagged values were used for clinical risk factors to avoid possible confounding of risk factors measured post complication. A full description of risk factor covariates is presented in ESM Table 2.

Risk equations for all-cause mortality were developed to take account of patients’ complication status in different years. These included logistic models to capture the high mortality in the year of a complication, and Gompertz proportional hazards survival models for years in which there were no complications. Thus, in any patient-year, only one of the four mutually exclusive equations for prediction of absolute annual risk of mortality would be used. Preliminary analysis showed that all complications except blindness and ulcer were associated with mortality in the current year (p < 0.05). Hence, logistic regression models were used to estimate the probability of death in the year of any MI, stroke, amputation (first or second), CHF, IHD or renal failure. As a result of testing to best fit and to maximise transparency, we derived two separate logistic equations for patients with and without a history of complications.

The two remaining equations for death are multivariate Gompertz proportional hazards survival models to estimate the hazard of death in years without any of the complications defined above. Time to death was determined in continuous time, with time at risk modelled by a patient’s current age in order to allow extrapolation beyond the observed follow-up period [16]. The censor date for deaths was 30 September 2007, the date of linkage to the national mortality database from the Office for National Statistics, or the date of emigration from the UK, which represents date lost to follow-up in the national statistics.
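To make concrete how such equations yield an absolute annual probability of death, the sketch below converts a Gompertz proportional hazards model and a logistic model into one-year probabilities. It assumes the common parameterisation h(age) = exp(xb) · exp(gamma · age), with current age as the time scale; the function names and any coefficient values passed in are illustrative and are not taken from the fitted equations.

```python
import math

def gompertz_annual_death_prob(xb, gamma, age):
    """Probability of dying between `age` and `age + 1`, given survival to `age`.

    Assumes hazard h(t) = exp(xb) * exp(gamma * t), so the cumulative hazard
    is H(t) = exp(xb) / gamma * (exp(gamma * t) - 1).
    """
    cum_hazard_diff = math.exp(xb) / gamma * (
        math.exp(gamma * (age + 1)) - math.exp(gamma * age)
    )
    return 1.0 - math.exp(-cum_hazard_diff)

def logistic_death_prob(xb):
    """Probability of death in the year of a complication (logistic model)."""
    return 1.0 / (1.0 + math.exp(-xb))
```

Here `xb` stands for the linear predictor (constant plus coefficients multiplied by covariate values) of whichever of the four equations applies in that patient-year.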

Proportional hazards models for complications and death were derived using a consistent process to select significant covariates from the candidate risk predictors. First, binary covariates were excluded from particular event equations if cross-tabulations indicated that they occurred rarely (fewer than ten occurrences). Then a multivariate model was fitted with all remaining covariates, and any with p > 0.3 were dropped. The significant covariates of the final risk model were selected in a backwards stepwise regression at p < 0.05. The parametric form of the underlying hazard was examined graphically and models were chosen by consideration of Akaike’s information criterion for exponential, Weibull and Gompertz parametric forms. The proportional hazards assumption was tested by examination of Schoenfeld residuals [17] in comparable Cox models and through Cox–Snell semi-log plots. If the effect of any covariate was identified as non-linear, it was modelled either as a categorical variable or as a continuous spline function with suitable knot points. We specifically investigated any U-shaped HbA1c effect [18] using continuous splines. All analyses were carried out using Stata version 12.0 software (Stata, College Station, TX, USA).
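As an illustration of the comparison of parametric forms, the following sketch ranks candidate models by Akaike's information criterion, AIC = 2k − 2 ln L, where k is the number of estimated parameters and L the maximised likelihood. The helper names are hypothetical, and in practice the log-likelihoods would come from the fitted survival models rather than being supplied by hand.

```python
def aic(log_likelihood, n_params):
    """Akaike's information criterion; lower values indicate a better trade-off."""
    return 2 * n_params - 2 * log_likelihood

def select_by_aic(fits):
    """Return the name of the fit with the lowest AIC.

    fits: dict mapping a model name to (log_likelihood, n_params).
    """
    return min(fits, key=lambda name: aic(*fits[name]))
```

For example, calling `select_by_aic` on a dictionary with keys such as "exponential", "weibull" and "gompertz" returns whichever form has the smallest criterion value; in the analysis described above this numerical comparison was combined with graphical examination of the hazard.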

Handling uncertainty and heterogeneity

Modelled outcomes are subject to several sources of uncertainty, which are important to report [19]. Two forms of uncertainty are addressed within UKPDS-OM2. (1) Monte Carlo or ‘first order’ uncertainty arises as a result of comparing probabilities from risk equations against a random number to determine whether events take place at a patient level. Thus, in any model cycle, two identical patients may have different outcomes due to chance. We minimise this uncertainty by using large numbers of Monte Carlo replications until the mean of the outcome of interest is stable. (2) Parameter or ‘second order’ uncertainty in the estimated coefficients of the equations arises as a result of natural variation in the patient sample and limitations in the sample size for deriving the equations. Model parameters cannot be known with certainty but only within a certain parameter distribution. We captured parameter uncertainty by bootstrapping (with replacement) the UKPDS patient-level data and re-estimating all equations to derive sets of fully correlated regression coefficients. Parameter uncertainty was then propagated by using, in turn, these sets of regression coefficients to estimate different outcomes, thus providing a distribution from which CIs can be derived. This approach conforms to the American Diabetes Association guidelines on computer simulation modelling in diabetes [20].
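The bootstrap step for parameter uncertainty can be sketched as follows. This is a simplified illustration under stated assumptions: the callable `estimate` is a hypothetical stand-in for the full pipeline of re-estimating all equations on a resampled dataset and running the simulation with the resulting coefficients, and the interval is a simple percentile interval.

```python
import random

def bootstrap_ci(patients, estimate, n_boot=1000, alpha=0.05, seed=0):
    """Percentile confidence interval for an outcome via patient-level bootstrap.

    patients: list of patient records (any type).
    estimate: hypothetical callable mapping a resampled patient list to one
        outcome value (e.g. refit equations, then simulate life expectancy).
    """
    rng = random.Random(seed)
    n = len(patients)
    outcomes = []
    for _ in range(n_boot):
        # Resample whole patients with replacement, keeping each patient's
        # data together so re-estimated coefficients stay jointly consistent.
        sample = [patients[rng.randrange(n)] for _ in range(n)]
        outcomes.append(estimate(sample))
    outcomes.sort()
    lower_index = max(0, int((alpha / 2) * n_boot))
    upper_index = min(n_boot - 1, int((1 - alpha / 2) * n_boot) - 1)
    return outcomes[lower_index], outcomes[upper_index]
```

Resampling at the patient level, rather than drawing each coefficient independently, is what preserves the correlation between coefficients across equations.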

Patient heterogeneity is reflected through individual patient-level simulation, where it is possible to simulate whole populations, one patient at a time, and aggregate their outcomes. Each individual has a unique set of risk factors for estimation of their probability of events. Simulations presented in this manuscript use real data on 5,102 (UKPDS) and 3,984 (Lipids in Diabetes Study [LDS]) unique patients.

Internal validation of the model using the UKPDS trial population

Internal validation is a necessary step in the development of a model, providing confidence that model equations have been correctly specified and coded [8]. We carried out internal validation of the simulation model by testing its performance in replicating the incidence of complications and mortality over 25 years of follow-up. This involved using the observed clinical risk factor profiles of all 5,102 UKPDS patients over 25 years, with risk factors carried forward when missing or at the end of follow-up. We compared simulated cumulative failure of each of the major outcomes of the model with the observed (Kaplan–Meier) cumulative failure of events under the assumption adopted in many clinical studies that death as well as date of last contact are censoring events.
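For readers unfamiliar with the observed quantity used in this comparison, the sketch below computes Kaplan–Meier cumulative failure (1 minus the survival estimate) from follow-up times and event indicators, treating death and last contact as censoring. The function name and input layout are illustrative; standard statistical software would normally be used.

```python
def km_cumulative_failure(times, events):
    """Return [(time, cumulative_failure)] at each distinct event time.

    times: follow-up time for each patient.
    events: 1 if the complication occurred at that time, 0 if censored
        (e.g. death or last contact).
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = n_leaving = 0
        # Group all patients whose follow-up ends at the same time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1 - n_events / at_risk
            curve.append((t, 1 - survival))
        at_risk -= n_leaving
    return curve
```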

External validation, which tests model output against independent data, is beyond the scope of this manuscript and will be fully addressed in future publications. We present, instead, simulated outcomes using an external patient-level dataset as inputs to check on the consistency and face validity of the model.

Comparisons of outcomes from UKPDS-OM1 and UKPDS-OM2 simulations

We compared UKPDS-OM1 and UKPDS-OM2 predictions using as model inputs data on 3,984 patients with non-missing risk factors from a contemporary external dataset, the LDS [21]. We used both models to predict 10 year cumulative event rates and remaining life expectancy for selected age groups. There are no observed event rates for the LDS, due to the study being stopped early. Given the illustrative nature of these applications we assumed clinical risk factors to remain constant over the 10 years and did not apply a discount rate. For comparisons of life expectancy, we ran both models to age 100 for selected age groups: 50–54 years, 60–64 years and 70–74 years. Point estimates of life expectancy were derived from 1,000 Monte Carlo replications and 95% CIs were determined from non-parametric bootstrapping.

Sensitivity analysis

We carried out one-way deterministic sensitivity analysis to increase understanding of the relationship between model inputs and outputs [20, 22] and to determine the relative importance of patient characteristics in driving aggregate outcomes of life expectancy. Using patient-level data from the LDS as inputs, we investigated the impact on remaining life expectancy of individually changing continuous risk factors by ±1 SD of the mean and of doubling and halving the rates of binary variables such as smoking.
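A one-way analysis of this kind can be sketched as below for a single continuous risk factor. The function and argument names are hypothetical, and `predict_life_expectancy` stands in for a full run of the simulation model on one patient record.

```python
from statistics import mean

def one_way_sensitivity(cohort, predict_life_expectancy, factor, sd):
    """Change in mean predicted life expectancy when one factor shifts by 1 SD.

    cohort: list of dicts of patient characteristics.
    predict_life_expectancy: hypothetical callable mapping a patient dict
        to remaining life expectancy in years.
    factor: key of the continuous risk factor to vary (e.g. "sbp").
    sd: standard deviation of that factor in the cohort.
    """
    base = mean(predict_life_expectancy(p) for p in cohort)
    changes = {}
    for label, shift in (("-1 SD", -sd), ("+1 SD", sd)):
        # Shift only the chosen factor; all other inputs stay at observed values.
        shifted = [{**p, factor: p[factor] + shift} for p in cohort]
        changes[label] = mean(predict_life_expectancy(p) for p in shifted) - base
    return changes
```

Repeating the call for each risk factor and sorting by the size of the change gives the ordering shown in a tornado plot.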

Results

Risk equations

The median follow-up time of patients was 17.6 years and up to 89,760 patient-years of data were available for model fitting. Numbers of events and average event rates observed during the UKPDS trial and PTM are shown in Table 1. For each outcome we had at least twice the number of events as used in the estimation of the UKPDS-OM1 risk equations (e.g. 504 stroke events compared with 157 in UKPDS-OM1).

Table 1 Number of events and average event rates observed during UKPDS and PTM for 5,102 participants with newly diagnosed type 2 diabetes mellitus

We observed many linkages between events (e.g. having a history of IHD increases the probability of having an MI), shown schematically in ESM Fig. 1. The new model has more linkages between equations: in UKPDS-OM1 there were only five linkages across seven event equations, whilst in UKPDS-OM2 there are 15 linkages between the same seven equations (ESM Table 3).

In general, there were more significant covariates in the new set of event equations. Comparing the seven common event equations across both models, UKPDS-OM1 equations had approximately five covariates per equation, whereas UKPDS-OM2 equations have a mean of 11 (ESM Table 3). The new risk factors such as eGFR and micro- or macroalbuminuria were associated with a number of outcomes (ESM Fig. 1), including several types of vascular events (e.g. MI). White blood cell count, an indicator of inflammation, was also associated with a wide range of complications (MI, stroke, blindness, amputation and renal failure). A description of the risk equation covariates, including units, transformations and interpretation of hazard ratios, is presented in ESM Table 2. All fully specified risk equations, including constants, significant coefficients and standard errors, are provided in ESM Tables 4–6, and worked examples of how to calculate the absolute risk of an event occurring are in the ESM text.

A diagram summarising the four mutually exclusive death equations and how they relate to patients’ complications status in different years is presented in ESM Fig. 2. Smoking was a significant predictor in three of the four mortality equations, but the classic risk factors HbA1c and SBP were not independently significant predictors of mortality.

Internal validation

The results of the internal validation are presented in Fig. 2. These graphs compare the simulated and actual cumulative failure (calculated from 1 minus Kaplan–Meier survival) for each diabetes complication and death up to 25 years after diagnosis of diabetes. The model used actual time-paths of clinical risk factors observed in the UKPDS data, but event histories were simulated by the model. The predicted curves were within the 95% CIs of the actual cumulative failure curves for all events and death. For these simulations with an input dataset of 5,102 patients, the predictions stabilised at around 200 Monte Carlo replications.

Fig. 2 Comparison of simulated (red squares) and actual (black lines) Kaplan–Meier cumulative failure (CF) and 95% CI (grey lines) of diabetes complications and death over 25 years. (a) First MI; (b) first stroke; (c) CHF; (d) IHD; (e) first amputation; (f) renal failure; (g) blindness; (h) ulcer; (i) second MI; (j) second stroke; (k) second amputation; (l) death from any cause. Graphs (i–l) have different y-axis scales. Large CIs for second events reflect the low numbers of patients at risk, particularly in the early years of diabetes. Ulcer data only available to 20 years

Comparison of UKPDS-OM1 and UKPDS-OM2

Estimates of 10 year event rates for the entire LDS cohort (n = 3,984) and for three selected age cohorts are shown in Table 2. For the cohort as a whole, fewer events were predicted by UKPDS-OM2 than by UKPDS-OM1 (e.g. UKPDS-OM2 predicted that 10% have MI compared with 21% predicted by UKPDS-OM1), with the differences most pronounced in the 70–74 year age group for all events. Predictions of all-cause mortality were also lower at 22.5% from UKPDS-OM2 compared with 31.6% from UKPDS-OM1. The new model predicted less than half the number of renal failure events predicted by UKPDS-OM1 (0.5% vs 1.3%, respectively), but stroke, blindness and amputation were similar in both versions of the outcomes model.

Table 2 Comparisons of simulated percentage of patients with events at 10 years from UKPDS-OM1 and UKPDS-OM2

Consistent with lower predictions of all-cause mortality, life expectancy predictions from UKPDS-OM2 were greater than from UKPDS-OM1 for each age group examined (Table 3). For example, remaining life expectancy for the 50- to 54-year-old cohort was predicted at 25 years from UKPDS-OM2 compared with only 20 years from UKPDS-OM1. Notably, the 95% CIs of life expectancy predictions were narrower from UKPDS-OM2.

Table 3 Simulated life expectancy (95% CI) for three age cohorts using patient-level data from the LDS cohort

Sensitivity analysis

A tornado plot of the results of the sensitivity analysis (Fig. 3) shows the relative importance of the classic risk factors (SBP, HbA1c and lipids) in predicting life expectancy. It also shows the importance of many of the novel risk factors, in particular eGFR, micro- or macroalbuminuria, heart rate and white blood cell count. By contrast, haemoglobin and atrial fibrillation were significant in few equations and had less impact on aggregate model outcomes.

Fig. 3 Tornado plot showing one-way sensitivity analysis of change in life expectancy of a cohort of 3,984 LDS patients arising from +1 SD (grey) and −1 SD (white) change in continuous clinical risk factors and a doubling and halving of binary risk factors. To convert values for HbA1c in % into mmol/mol, subtract 2.15 and multiply by 10.929 or use the conversion calculator at www.hba1c.nu/eng/
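The HbA1c unit conversion stated in the caption can be written as a one-line helper; the function name is illustrative only.

```python
def hba1c_percent_to_mmol_mol(percent):
    """Convert HbA1c from % to mmol/mol: subtract 2.15, multiply by 10.929."""
    return (percent - 2.15) * 10.929
```

For instance, an HbA1c of 7.0% corresponds to approximately 53 mmol/mol.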

Discussion

We have developed a substantially enhanced UKPDS outcomes model that uses an additional 38,000 patient-years of observational data (primarily from the PTM period), almost doubling the follow-up time used to develop the original model. During the extra follow-up many additional complications were observed, including second events, as the patients were older and had a longer duration of diabetes; this was also possibly because the patients were no longer participating in a clinical trial. The new outcomes model has a number of important enhancements: re-estimation of the event equations to include many additional risk factors; inclusion of an additional outcome of lower extremity ulcer; prediction of second events for MI, stroke and amputation; inclusion of additional linkages between complications; and substantial changes to modelling mortality. The greater number of linkages and the greater number of significant clinical risk factors in the equations compared with UKPDS-OM1 reflect the greater statistical power of a much larger dataset. For example, in the original model, being diagnosed with IHD elevated the subsequent risk of MI; but, in UKPDS-OM2, IHD also elevated the risk of stroke and blindness. We note that these interrelationships between complications may be due to other common factors not currently captured in the model.

Internal validation demonstrated a high degree of consistency between simulated and observed events over a long time period. This reliable representation of the epidemiology of diabetes complications is pertinent to future use of the model in cost-effectiveness analyses, as outcomes are usually modelled over a lifetime horizon. Sensitivity analysis confirmed the importance of the classic risk factors in driving model outcomes, but also demonstrated the importance of the new risk factors eGFR, micro- and macroalbuminuria, heart rate and white blood cells. The relative importance of clinical risk factors in predicting life expectancy depends on the number of risk equations in which they are significant and their associated hazard ratios.

In the head-to-head comparison of both simulation models, UKPDS-OM2 predicted fewer macrovascular events and higher survival over a 10 year period. There are a number of explanations. UKPDS-OM1 equations were derived from shorter durations of diabetes (up to 10 years) and represent greater out-of-sample extrapolation when evaluated over longer durations of diabetes. Also, the simplifying assumption in this example, of clinical risk factors remaining at baseline levels, confers additional reductions in risk for UKPDS-OM2 that are not captured in UKPDS-OM1, but the models diverge in their predictions in year 1 when these assumptions are not made (ESM Table 7). Finally, there were some changes in definitions in some endpoints (such as the removal of vascular death from MI) that limited the degree to which results can be directly compared. Life expectancy projections were also longer using UKPDS-OM2 for all age cohorts. These are consistent with downward secular trends in cardiovascular disease and improvements in mortality and are in line with previously reported estimates of the reduction in life expectancy due to diabetes [23].

The major strength of our model is that it is based on data from the longest follow-up study of patients with type 2 diabetes, including both clinical trial and observational data in patients with a long duration of diabetes who were considerably older than usual clinical trial participants (up to age 90 years). The explicit modelling of second events will enable the model to be used more confidently in secondary prevention for patients with already complicated diabetes [7]. This comprehensive model of type 2 diabetes mellitus incorporates detailed modelling at a patient level and has the capacity to inform individualised medicine and analyses for patient subgroups.

There are a number of limitations to the simulation model in its current form. While the range of modelled outcomes has been extended, complications such as hyper- and hypoglycaemic episodes are not included. These were collected during the UKPDS, but not as true frequencies beyond a small number of deaths from hypoglycaemia. By expanding the number of outcomes and the number of input risk factors, the model has increased in complexity, but in many ways this reflects the nature of a disease that is characterised by so many different complications, the occurrence of which is often determined by interrelationships between risk factors and the patient’s clinical history. Finally, as we have used stepwise regression, it would be useful to explore the relationship between risk factors and outcomes in other populations to test further the associations observed here.

There are a number of areas requiring further development. In its present form, the model requires individual patient time-paths of clinical risk factors or assumptions regarding the time-paths of baseline risk factors. We are currently developing models to predict these time-paths, which can be easily integrated as additional sub-models to reflect diabetes management practices in the populations of interest. We also need to demonstrate external validity and the applicability of the model to other populations such as those in South East Asia, which are known to have a different profile of complications [24]. Further comparisons with other diabetes simulation models and stand-alone equations such as the UKPDS risk engine [25] will allow us to assess whether the new outcomes model has improved predictions in different diabetic populations. Finally, the use of this new outcomes model for cost-effectiveness analysis will require derivation of QALY weights for events including ulcer and second events, and estimation of costs associated with complications. These enhancements will be addressed in future work.

A common criticism of many computer simulations is that they are a ‘black box’ with users having little understanding of the underlying relationships between input values and outcomes of the model. By contrast, the UKPDS-OM2 takes a completely transparent approach [6, 26], in which we have fully reported its development, the equations that determine all outcomes and the algorithm used to bring the elements of the model together.

The model will contribute to a greater understanding of the progression of diabetes and its complications and is likely to be used widely by epidemiologists, health economists and trialists. It will play a major role in comparative effectiveness, in cost-effectiveness analyses and in the evaluation of strategies for the management of diabetes in the future.