Development of an enhanced scoring system to predict ICU readmission or in-hospital death within 24 hours using routine patient data from two NHS Foundation Trusts

Rationale Intensive care units (ICUs) admit the most severely ill patients. Once these patients are discharged from the ICU to a step-down ward, they continue to have their vital signs monitored by nursing staff, with Early Warning Score (EWS) systems being used to identify those at risk of deterioration. Objectives We report the development and validation of an enhanced continuous scoring system for predicting adverse events, which combines vital signs measured routinely on acute care wards (as used by most EWS systems) with a risk score of a future adverse event calculated on discharge from the ICU. Design A modified Delphi process identified candidate variables commonly available in electronic records as the basis for a ‘static’ score of the patient’s condition immediately after discharge from the ICU. L1-regularised logistic regression was used to estimate the in-hospital risk of future adverse event. We then constructed a model of physiological normality using vital sign data from the day of hospital discharge. This is combined with the static score and used continuously to quantify and update the patient’s risk of deterioration throughout their hospital stay. Setting Data from two National Health Service Foundation Trusts (UK) were used to develop and (externally) validate the model. Participants A total of 12 394 vital sign measurements were acquired from 273 patients after ICU discharge for the development set, and 4831 from 136 patients in the validation cohort. Results Outcome validation of our model yielded an area under the receiver operating characteristic curve of 0.724 for predicting ICU readmission or in-hospital death within 24 hours. It showed an improved performance with respect to other competitive risk scoring systems, including the National EWS (0.653). Conclusions We showed that a scoring system incorporating data from a patient’s stay in the ICU has better performance than commonly used EWS systems based on vital signs alone. 
Trial registration number ISRCTN32008295.


ICU-based feature representation:
To estimate the risk of a patient experiencing an adverse event immediately after discharge from ICU, we considered candidate variables that were available electronically in our databases. The list of candidate variables also included demographic and administrative variables, such as age at admission, gender, number of hours between hospital admission and ICU admission, and the Index of Multiple Deprivation score (1), derived for patients with a valid postcode. We included variables related to procedures and treatments, such as the number of vasoactive drugs administered, total fluid balance, administration of insulin, enteral and parenteral nutrition feeding, and the use of mechanical ventilation, tracheostomy and central venous catheters.
To determine the risk of a future compound outcome after discharge from the ICU, we derived several features from all candidate variables acquired during the patients' stay in the ICU. These features are based on the extremes of the variables considered. We generated maximum, minimum and variability (as given by the standard deviation) features for the physiological variables and laboratory test results from different periods of the ICU stay (including the first 24 hours of the patient in the ICU, their last 72, 48, and 24 hours, and/or their entire ICU stay). Additional dichotomous variables were generated if the values were in the upper or lower 5th percentile of the observed corresponding ranges, in order to account for potential non-linear associations of the variables with the adverse outcomes. Procedure and treatment variables were converted to dichotomous features indicating whether a given procedure was performed or not, or whether a given medication was administered or not, over the entire ICU stay or in the last 24 hours of the ICU stay. This procedure resulted in 161 candidate features (including features from demographic information) for building a prediction model.
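The feature derivation described above can be illustrated with a minimal sketch. The function names, the inclusive window bounds and the linear-interpolation percentile rule are assumptions made for illustration and are not taken from the study code.

```python
import math
from statistics import stdev

def window_features(times_h, values, start_h, end_h):
    """Maximum, minimum and standard deviation of the values recorded
    between start_h and end_h (hours since ICU admission, inclusive)."""
    window = [v for t, v in zip(times_h, values) if start_h <= t <= end_h]
    if not window:
        return None  # no measurement in this period
    sd = stdev(window) if len(window) > 1 else 0.0
    return {"max": max(window), "min": min(window), "sd": sd}

def percentile(values, q):
    """Percentile with linear interpolation (an assumed convention)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = math.floor(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (pos - lower) * (ordered[upper] - ordered[lower])

def extreme_flags(value, reference_values):
    """Dichotomous features: is the value in the lower or upper 5th
    percentile of the observed range of this variable?"""
    low = value <= percentile(reference_values, 5)
    high = value >= percentile(reference_values, 95)
    return low, high

# Hypothetical heart-rate series for a 72-hour ICU stay.
times = [0, 6, 12, 30, 60, 70]
heart_rate = [80, 120, 90, 100, 70, 110]
first_24h = window_features(times, heart_rate, 0, 24)
last_24h = window_features(times, heart_rate, 48, 72)
```

In this sketch the percentile thresholds would be computed once from the development data and then reused unchanged for new patients.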

Post-ICU feature representation:
All vital-sign observations performed after discharge from ICU, as part of routine patient monitoring on acute care wards and collected prospectively for this study, were considered for analysis. Each set of vital signs includes heart rate, systolic blood pressure, respiratory rate, body temperature, neurological status assessment using the Alert-Verbal-Painful-Unresponsive (AVPU) scale, peripheral oxygen saturation from pulse oximetry (SpO2), a record of whether the patient was receiving supplemental oxygen at the time of the SpO2 measurement, and the date and time of the observation. Vital-sign measurements are typically recorded every 4 or 6 hours throughout the patient's stay on the ward. At each measurement timestamp, if the vital-sign observation set was incomplete, we used the most recent value of each missing variable (i.e., the last measurement was carried forward).
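The carry-forward rule for incomplete observation sets can be sketched as follows. The field names (`hr`, `sbp`, `rr`, `temp`, `spo2`) and the dictionary layout are hypothetical and chosen only for this example.

```python
VITALS = ("hr", "sbp", "rr", "temp", "spo2")  # hypothetical field names

def carry_forward(observations):
    """Fill missing vital signs with the most recent recorded value.

    `observations` is a list of dicts sorted by time; a missing vital sign
    is either absent or None. A value stays None until first measured.
    """
    last_seen = {}
    filled = []
    for obs in observations:
        row = {"time": obs["time"]}
        for name in VITALS:
            value = obs.get(name)
            if value is not None:
                last_seen[name] = value
            row[name] = last_seen.get(name)
        filled.append(row)
    return filled

ward_obs = [
    {"time": 0, "hr": 88, "sbp": 118, "rr": 18, "temp": 37.1, "spo2": 96},
    {"time": 4, "hr": 92, "rr": 20},               # incomplete set
    {"time": 8, "sbp": 110, "temp": 37.8},         # incomplete set
]
complete_obs = carry_forward(ward_obs)
```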

Pre-processing and missing data
We identified obvious deviations from expected distributions and ranges of the data features using frequency graphs for all numerical and dichotomous features. Possible physiological ranges for the numerical features were defined according to clinical review and expert panel expertise, and values outside these ranges were not included in the analyses. Missing values were imputed with median and mode values from the feature distributions of numerical and dichotomous features, respectively. While other methods were considered, such as multiple imputation, the use of the median and mode was simpler and was deemed sufficient for this work, in which the amount of missing data was low.
All numerical features were then standardised using a zero-mean, unit-variance transformation (i.e., using the mean and standard deviations from the feature distributions). This prevents features with relatively small changes in their units of measurement (such as temperature) from being dominated by features with relatively large changes (such as blood pressure), thus ensuring that all features have similar dynamic ranges.
For both imputation and normalisation, the parameter values (median, mode, mean and standard deviation) found for the development dataset were used.
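A minimal sketch of this pre-processing is given below. It assumes that the mean and standard deviation are computed from the observed (non-missing) development values; the text does not state whether they were computed before or after imputation. All names are hypothetical.

```python
from statistics import mean, median, mode, stdev

def fit_preprocessing(dev_rows, numerical, dichotomous):
    """Learn imputation and scaling parameters from the development set only."""
    params = {}
    for name in numerical:
        observed = [r[name] for r in dev_rows if r[name] is not None]
        params[name] = {"fill": median(observed),
                        "mean": mean(observed),   # assumption: observed values only
                        "sd": stdev(observed)}
    for name in dichotomous:
        observed = [r[name] for r in dev_rows if r[name] is not None]
        params[name] = {"fill": mode(observed)}
    return params

def apply_preprocessing(row, params, numerical, dichotomous):
    """Impute and standardise one row using the development-set parameters."""
    out = {}
    for name in numerical:
        p = params[name]
        value = p["fill"] if row[name] is None else row[name]
        out[name] = (value - p["mean"]) / p["sd"]
    for name in dichotomous:
        out[name] = params[name]["fill"] if row[name] is None else row[name]
    return out

dev = [{"temp": 36.0, "vent": 1}, {"temp": 37.0, "vent": 0},
       {"temp": 38.0, "vent": 1}, {"temp": None, "vent": None}]
params = fit_preprocessing(dev, ["temp"], ["vent"])
validation_row = apply_preprocessing({"temp": 39.0, "vent": None},
                                     params, ["temp"], ["vent"])
```

The same `params` object is applied to the validation data, so no statistic is re-estimated on the external cohort.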

ICU-based model (RS1)
We used all candidate features derived from the variables acquired during the patients' stay in the ICU to build the first model. An L1-regularised logistic regression model was fitted using the "glmnet" package in Matlab (Mathworks, Natick, Massachusetts, USA) to predict the compound outcome (in-hospital death or re-admission to ICU). L1 regularisation shrinks the coefficients of the less important features to zero, thus effectively removing those features that are deemed to be uninformative with respect to the outcome variable. We estimated the regularisation parameter using LASSO (Least Absolute Shrinkage and Selection Operator) regression in a 5-fold cross-validation on the development set.
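To illustrate how an L1 penalty drives uninformative coefficients to exactly zero, the following sketch fits a logistic regression by proximal gradient descent (soft-thresholding). This is not the glmnet implementation used in the study, which relies on coordinate descent; the toy data, step size and penalty value are assumptions, and in practice the penalty would be chosen by cross-validation as described above.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(w, t):
    """Proximal operator of the L1 penalty: shrink w towards zero by t."""
    return math.copysign(max(abs(w) - t, 0.0), w)

def fit_l1_logistic(X, y, lam, step=0.1, n_iter=1000):
    """Minimise mean logistic loss + lam * ||w||_1 (intercept unpenalised)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        b -= step * grad_b
        # Gradient step followed by soft-thresholding for each coefficient.
        w = [soft_threshold(wj - step * gj, step * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b

# Toy data: feature 0 separates the classes, feature 1 carries no signal.
X = [[-2.0, 1.0], [-1.0, -1.0], [-1.5, -1.0], [-0.5, 1.0],
     [0.5, 1.0], [1.0, -1.0], [1.5, -1.0], [2.0, 1.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
weights, intercept = fit_l1_logistic(X, y, lam=0.05)
```

With this toy data the coefficient of the uninformative feature remains exactly zero, while a sufficiently large penalty removes every feature.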

Vital-sign based model (RS2)
For the second scoring system, RS2, we used the vital signs (heart rate, respiratory rate, systolic blood pressure, oxygen saturation, and temperature) recorded in hospital after discharge from the ICU. For this model, rather than a supervised learning approach, we considered a novelty detection method (or one-class classification method), which does not require an outcome variable for development (2). This is useful where event rates are extremely low, or total sample sizes are constrained. Similar approaches have been used in our previous research (3)(4)(5)(6) and underpin a commercial, clinically-used system, described in (7).
BMJ Publishing Group Limited (BMJ) disclaims all liability and responsibility arising from any reliance placed on this supplemental material which has been supplied by the author(s).
We considered the construction of a multivariate, (possibly) multimodal model of normality, based on the vital-sign observation sets recorded during the 24 hours that preceded discharge home. If the patient had not died, had not been re-admitted to the ICU and had not been discharged by the 14th post-ICU day, we used the data recorded on that day. These observation sets are thus assumed to contain the vital-sign values from the most stable period of the patient's hospital stay after discharge from ICU. This set of "normal" data contains $N = 1{,}082$ five-dimensional vital-sign vectors, $\{\mathbf{x}_1, \ldots, \mathbf{x}_N\} \subset \mathbb{R}^5$, which were subsequently used for the construction of our model of normality.
A kernel density estimate, or KDE (2), was used to estimate the probability density function (pdf) of the set of five vital signs. This is a non-parametric technique in which no a priori assumptions are made about the form of the underlying probability distribution. Our notation follows that reported previously (5,6). The data pdf $p(\mathbf{x})$ was estimated using the set of $N = 1{,}082$ observations, as shown in (SM1-1).
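For orientation, the standard Gaussian-kernel (Parzen window) estimator with a single isotropic bandwidth is written out below. Equation (SM1-1) itself is not reproduced in this text, so this form is an assumption about its general shape and may differ in detail from the authors' expression. Here $N$ is the number of training vectors $\mathbf{x}_i$, $D = 5$ is the number of vital signs and $\sigma$ is the kernel width.

```latex
p(\mathbf{x}) = \frac{1}{N \, (2\pi)^{D/2} \, \sigma^{D}}
  \sum_{i=1}^{N} \exp\!\left( -\frac{\lVert \mathbf{x} - \mathbf{x}_i \rVert^{2}}{2\sigma^{2}} \right)
```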
The nearest-neighbour method was used to estimate the variance. Briefly, this method involves determining the squared Euclidean distance ($\Delta$) for each observation $\mathbf{x}_i$ to its 10 nearest neighbours (NNs), as shown in (SM1-2).
This quantity, $\Delta$, is then used to estimate the variance $\sigma^2$, as shown in (SM1-3).
Estimation of the underlying pdf of normal vital-sign data provides a means of quantifying the degree to which a given set of observations is abnormal. The likelihood $p(\mathbf{x} \mid \mathbf{x}_i, \sigma)$, a measure which represents the probability of observing a set of measurements given a pdf, can be used for this purpose. Thus, we define the novelty score as in (SM1-4).

$$z(\mathbf{x}) = -\log p(\mathbf{x} \mid \mathbf{x}_i, \sigma) \qquad \text{(SM1-4)}$$
For normal data, the new observation $\mathbf{x}$ will be similar to previously-seen normal observations $\mathbf{x}_i$, and so the likelihood will be high. Consequently, the negative log-likelihood will be low, and so the novelty score $z(\mathbf{x})$ will be low. Conversely, for abnormal data, the data will be dissimilar and the likelihood will be low, and consequently the novelty score $z(\mathbf{x})$ will be high. This procedure ensures that the novelty score can be interpreted in the same way as most early warning scores (EWS), in that high scores are associated with higher abnormality of the vital signs.
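The novelty score can be sketched numerically as below. The Gaussian kernel and the bandwidth rule (the mean squared distance to the k nearest neighbours, averaged over the training set) are assumptions, because equations (SM1-1) to (SM1-3) are not reproduced in this text. The two-dimensional toy data stand in for the five vital signs.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def nn_bandwidth(train, k=10):
    """ASSUMED rule: sigma^2 is the mean squared distance from each training
    point to its k nearest neighbours, averaged over all training points."""
    total = 0.0
    for i, xi in enumerate(train):
        dists = sorted(sq_dist(xi, xj) for j, xj in enumerate(train) if j != i)
        nearest = dists[:k]
        total += sum(nearest) / len(nearest)
    return math.sqrt(total / len(train))

def novelty_score(x, train, sigma):
    """z(x) = -log p(x) for a Gaussian-kernel density estimate."""
    dim = len(x)
    log_norm = (math.log(len(train)) + 0.5 * dim * math.log(2 * math.pi)
                + dim * math.log(sigma))
    exponents = [-sq_dist(x, xi) / (2 * sigma ** 2) for xi in train]
    peak = max(exponents)  # log-sum-exp trick avoids underflow far from the data
    log_sum = peak + math.log(sum(math.exp(e - peak) for e in exponents))
    return log_norm - log_sum

# Hypothetical "normal" data: heart rate near 70, respiratory rate near 16.
normal = [[70 + i, 16 + j] for i in range(-2, 3) for j in range(-2, 3)]
sigma = nn_bandwidth(normal, k=4)
z_typical = novelty_score([70, 16], normal, sigma)
z_abnormal = novelty_score([120, 35], normal, sigma)
```

In this example the abnormal observation receives the higher score, which matches the interpretation given above.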
In short, this model allows the generation of a score corresponding to an assessment of whether a vital-sign set should be deemed "stable" with respect to the development dataset (i.e., the observation sets used for training the model). By extension, it further allows the model to estimate the degree of abnormality of a given vital-sign observation set.

Risk Score Index (RSI)
An overall risk score, the risk score index (RSI), was subsequently determined using a simple time-dependent linear combination of the two constituent risk scores (Equation 1), where $t$ corresponds to the elapsed time (in hours) since the patient was discharged from ICU and has a maximum value of $T$ hours. The time-dependent weighting function allows the contribution of RS1 to gradually decrease with time after discharge from the ICU. The patient's risk of future adverse events thus becomes increasingly determined by the values of the vital-sign measurements taken on the acute ward (i.e., RS2). The parameter $\alpha$ is used to adjust the weight of RS1 with respect to the time since discharge from ICU, and the parameter $T$ corresponds to the time at which RS1 no longer affects RSI; i.e., when $t \geq T$, then RSI = RS2.
The parameters $\alpha$ and $T$ were determined using a grid-search algorithm with 3-fold cross-validation, by defining a grid of possible values of $\alpha = [0.1, 0.2, 0.3, \ldots, 200]$ and $T = [12, 24, 36, \ldots, 336]$. We selected the values of $\alpha$ and $T$ that corresponded to the highest mean area under the receiver operating characteristic curve (AUROC) across all cross-validation folds, using the compound outcome of in-hospital death or ICU re-admission within the next 24 hours of a vital-sign observation.
During the development of RSI, values of $\alpha = 100.2$ and $T = 96$ hours, for the linear time-dependent weighting function (see Equation 1), were obtained and used to calculate the risk score index.
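The time-dependent combination can be illustrated with the sketch below. Equation 1 is not reproduced in this text, so the exact functional form is an assumption: the weight on RS1 is taken to decay linearly from its initial value to zero at the cut-off time, which is consistent with the statements that the weighting is linear and that RSI equals RS2 once the cut-off is reached. The parameter names `alpha` and `t_max` are hypothetical labels for the two fitted parameters (100.2 and 96 hours).

```python
def risk_score_index(rs1, rs2, t_hours, alpha=100.2, t_max=96.0):
    """ASSUMED form of the combination: linearly decaying weight on RS1.

    rs1     : static risk estimated at ICU discharge
    rs2     : novelty score of the current vital-sign observation set
    t_hours : elapsed time since ICU discharge, in hours
    """
    weight = alpha * max(0.0, 1.0 - t_hours / t_max)  # zero once t >= t_max
    return weight * rs1 + rs2

# The same patient (RS1 = 0.2) with an unchanged vital-sign score (RS2 = 3.0)
# at discharge, after two days, and after the cut-off time.
at_discharge = risk_score_index(0.2, 3.0, t_hours=0)
after_two_days = risk_score_index(0.2, 3.0, t_hours=48)
after_cutoff = risk_score_index(0.2, 3.0, t_hours=120)
```

A grid search over the two parameters would simply evaluate the AUROC of this score for each candidate pair within each cross-validation fold.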

Model validation and statistical analysis
To validate the first model, RS1, its discrimination and calibration were analysed. Discrimination is defined as the ability of the model to separate non-event patients from patients who had an adverse event after ICU discharge, and was assessed using the AUROC metric. Calibration assesses the degree of correspondence between the estimated probability of occurrence of an adverse event and that actually observed. This was tested using a goodness-of-fit test, the Hosmer-Lemeshow "C" statistic (8). When the predicted probability of adverse events of the prognostic model differs significantly from the observed pattern, the calibration ability of the model is deemed to be poor. As the Hosmer-Lemeshow test does not measure the magnitude of miscalibration and is sensitive to sample size (9), calibration was also assessed with the Brier score and Cox's calibration regression. The latter assesses the degree of miscalibration by fitting a logistic regression of observed survival to the predicted log-odds of survival from the risk model (10). The performance of the first model was examined both for the compound outcome and for each adverse event (in-hospital death and ICU re-admission) individually. The ability of the first model to predict future adverse events at increasing intervals from ICU discharge was also examined by calculating the AUROC for future events by day after discharge (up to 120 days).
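Two of the metrics named above can be computed directly from predicted probabilities and observed outcomes. The sketch below uses the rank-based (Mann-Whitney) definition of the AUROC and the mean squared error definition of the Brier score; the example predictions are invented for illustration.

```python
def auroc(scores, labels):
    """Probability that a randomly chosen event case scores higher than a
    randomly chosen non-event case (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def brier_score(probabilities, labels):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(probabilities, labels)) / len(labels)

# Hypothetical predicted risks for four patients; 1 = adverse event observed.
predicted = [0.10, 0.40, 0.35, 0.80]
observed = [0, 0, 1, 1]
discrimination = auroc(predicted, observed)
calibration_error = brier_score(predicted, observed)
```

A lower Brier score indicates better agreement between predicted and observed risk, whereas a higher AUROC indicates better separation of event and non-event patients.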
The final model, RSI, was validated using the AUROC for the derived outcome of in-hospital death or ICU re-admission within the next N hours of a vital-sign observation performed after ICU discharge, in line with previous studies evaluating EWS systems (11,12). We evaluated the model for different values of N, with $N \in \{12, 24, 36, 48, 72\}$ hours. We note that in this case the AUROC represents how well the scoring system RSI discriminates between observation sets followed by an adverse event and those with no subsequent adverse outcome within the next N hours. Therefore, the unit of analysis is a vital-sign observation set rather than a patient admission, which was the unit of analysis for the validation of the first model.
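The observation-level outcome can be sketched as a simple labelling step. The function name, the use of hours since ICU discharge as the time axis and the inclusive boundary at exactly N hours are assumptions made for this example.

```python
def label_observations(obs_times_h, event_time_h, horizon_h):
    """Label each observation set 1 if an adverse event occurs within the
    next `horizon_h` hours, else 0. `event_time_h` is None if the patient
    had no adverse event."""
    labels = []
    for t in obs_times_h:
        followed_by_event = (event_time_h is not None
                             and 0 <= event_time_h - t <= horizon_h)
        labels.append(1 if followed_by_event else 0)
    return labels

# Hypothetical patient: observations at these hours, re-admitted at hour 44.
obs_times = [0, 6, 12, 30, 40]
labels_24h = label_observations(obs_times, event_time_h=44, horizon_h=24)
labels_48h = label_observations(obs_times, event_time_h=44, horizon_h=48)
```

Under this labelling a single patient contributes several observation sets, some positive and some negative, which is why the unit of analysis differs from that of the first model.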
We also considered each individual adverse event separately. To further understand the feasibility of implementing the risk scoring systems in this setting, we also evaluated the burden of "triggered" observation sets, i.e., the number of observation sets flagged by the risk scoring system for every correctly identified observation set followed by an adverse event within 24 hours.
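One way to quantify this burden is the ratio of all triggered observation sets to correctly triggered ones at a given threshold. The sketch assumes that an observation set triggers when its score is at or above the threshold; the threshold and data are invented for illustration.

```python
def triggers_per_true_positive(scores, labels, threshold):
    """Number of triggered observation sets per triggered set that was
    followed by an adverse event. Returns None if no true positive exists."""
    triggered = [y for s, y in zip(scores, labels) if s >= threshold]
    true_positives = sum(triggered)
    if true_positives == 0:
        return None
    return len(triggered) / true_positives

# Hypothetical scores; label 1 = adverse event within the next 24 hours.
scores = [1, 5, 6, 7, 2, 8]
labels = [0, 0, 1, 0, 0, 1]
burden = triggers_per_true_positive(scores, labels, threshold=5)
```

In this example four observation sets trigger and two of them precede an event, so two triggers are generated per correctly identified observation set.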
We report the cross-validation results using the development dataset. This gives an estimate of how our models perform on a random set of samples from the OUH Trust that were not used for developing the model. We also report the external validation results using data from the RBH Trust. Confidence intervals were estimated using the percentile bootstrap method with 500 resamples (13).
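The percentile bootstrap can be sketched as follows. The 95% level, the fixed random seed and the use of the sample mean as the statistic are assumptions for this example; for an AUROC the resampled items would be (score, outcome) pairs.

```python
import random
from statistics import mean

def percentile_bootstrap_ci(data, statistic, n_resamples=500, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for `statistic(data)`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    estimates = sorted(
        statistic([data[rng.randrange(n)] for _ in range(n)])  # resample with replacement
        for _ in range(n_resamples)
    )
    lower_idx = int((1 - level) / 2 * n_resamples)
    upper_idx = int((1 + level) / 2 * n_resamples) - 1
    return estimates[lower_idx], estimates[upper_idx]

# Hypothetical data: the integers 1 to 100, whose mean is 50.5.
sample = list(range(1, 101))
ci_low, ci_high = percentile_bootstrap_ci(sample, mean)
```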
The final list of candidate variables included the following physiological variables:
o Heart rate (HR), measured in beats per minute
o Respiratory rate (RR), in breaths per minute
o Systolic (SBP), diastolic and mean blood pressure, in mmHg
o Temperature, in degrees Celsius
o Peripheral oxygen saturation (SpO2), in %
o Level of consciousness, using the Glasgow Coma Scale (GCS)
The following laboratory test results were included in the list:
o Alveolar-arterial oxygen partial pressure gradient (AaDO2), in kPa
o Albumin, in g/L
o Bilirubin (total), in µmol/L
o Calcium (adjusted), in mmol/L
o Creatinine, in mmol/L
o C-reactive protein (CRP), in mg/L
o Haematocrit (HCT), in %
o Haemoglobin (HGB), in g/dL
o Lactate, in mmol/L
o Mean corpuscular volume (MCV), in fL
o Ratio of partial pressure of oxygen to fraction of inspired oxygen (PaO2/FiO2), in kPa
o pH
o Platelet count, in 10^9 cells/L
o Potassium, in mmol/L
o Sodium, in mmol/L
o Urea, measured in mmol/L
o Urine output, in mL
o White blood cell count (WBC), in 10^9 cells/L