Systematic review of prediction models for delirium in the older adult inpatient
  1. Heidi Lindroth1,2,
  2. Lisa Bratzke3,
  3. Suzanne Purvis4,
  4. Roger Brown3,
  5. Mark Coburn5,
  6. Marko Mrkobrada6,
  7. Matthew T V Chan7,
  8. Daniel H J Davis8,
  9. Pratik Pandharipande9,
  10. Cynthia M Carlsson1,14,15,16,17,
  11. Robert D Sanders1
  1. 1 Department of Anesthesiology, University of Wisconsin Madison School of Medicine and Public Health, Madison, Wisconsin, USA
  2. 2 School of Nursing, University of Wisconsin Madison, Madison, Wisconsin, USA
  3. 3 School of Nursing, University of Wisconsin-Madison, Madison, Wisconsin, USA
  4. 4 Department of Nursing, University Hospital, Madison, Wisconsin, USA
  5. 5 Department of Anesthesiology, University Hospital RWTH Aachen, Aachen, Germany
  6. 6 Department of Medicine, Western University, London, Ontario, Canada
  7. 7 Anesthesia and Intensive Care, The Chinese University of Hong Kong, Shatin, Hong Kong
  8. 8 MRC Unit for Lifelong Health and Ageing, University College London, London, UK
  9. 9 Division of Anesthesiology Critical Care Medicine, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
  10. 14 Department of Medicine, Division of Geriatrics, University of Wisconsin School of Medicine and Public Health, Madison, Wisconsin, USA
  11. 15 Geriatric Research, Education, and Clinical Center (GRECC), William S. Middleton Memorial Veterans Hospital, Madison, Wisconsin, USA
  12. 16 Wisconsin Alzheimer’s Disease Research Center, Madison, Wisconsin, USA
  13. 17 Wisconsin Alzheimer’s Institute, Madison, Wisconsin, USA
  1. Correspondence to Heidi Lindroth; hlindroth{at}


Objective To identify existing prognostic delirium prediction models and evaluate their validity and statistical methodology in the older adult (≥60 years) acute hospital population.

Design Systematic review.

Data sources and methods PubMed, CINAHL, PsycINFO, SocINDEX, Cochrane, Web of Science and Embase were searched from 1 January 1990 to 31 December 2016. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses and CHARMS Statement guided protocol development. Inclusion criteria: age ≥60 years, inpatient, developed/validated a prognostic delirium prediction model. Exclusion criteria: alcohol-related delirium, sample size ≤50. The primary performance measures were calibration and discrimination statistics. Two authors independently conducted the search and extracted data. The first author synthesised the data. Disagreement was resolved by the mentoring author.

Results The initial search resulted in 7,502 studies. Following full-text review of 192 studies, 33 were excluded based on age criteria (<60 years) and 27 met the defined criteria. Twenty-three delirium prediction models were identified, 14 were externally validated and 3 were internally validated. The following populations were represented: 11 medical, 3 medical/surgical and 13 surgical. The assessment of delirium was often non-systematic, resulting in varied incidence. The 14 externally validated models had an area under the receiver operating characteristic curve (AUROC) ranging from 0.52 to 0.94. Limitations in design, data collection methods and reporting of model metrics were identified.

Conclusions Delirium prediction models for older adults show variable and typically inadequate predictive capabilities. Our review highlights the need for development of robust models to predict delirium in older inpatients. We provide recommendations for the development of such models.

  • delirium
  • geriatric medicine
  • statistic

This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See:

Strengths and limitations of this study

  • This study used the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Statement and the CHARMS checklist to develop a protocol involving comprehensive search terms and databases.

  • The assembled interprofessional authorship team contributed different perspectives on delirium prediction models and statistical methodology.

  • This review focused on a narrow population, older adult inpatients, and could be expanded to include all ages and settings including palliative care, long-term care and the emergency room.


Delirium is an acute disturbance of consciousness and cognition precipitated by an acute event such as sudden illness, infection or surgery. This syndrome is a serious public health concern, as up to 50% of hospitalised older adults will experience delirium in medical and surgical populations.1–3 Delirium has been independently associated with increased mortality, morbidity in terms of impaired cognition and functional disability along with an estimated annual US expenditure of $152 billion.4–9 Prediction models allow clinicians to forecast which individuals are at a higher risk for the development of a particular disease process and target specific interventions at the identified risk profile.10–13 At present, an extensive list of modifiable and non-modifiable, predisposing and precipitating delirium risk factors encumbers clinicians, hindering the ability to select the most important or contributing risk factor.1 14 An accurate and timely delirium prediction model would formalise the highest impact risk factors into a powerful tool, facilitating early implementation of prevention measures.11 This systematic review expands on previous published reviews on delirium prediction models by integrating both medical and surgical populations while examining statistical aspects of each study including reporting metrics and includes recently published models.


Our aim was to provide important recommendations on study design for future delirium prediction models while integrating knowledge gained from the study of both medical and surgical populations. We conducted a systematic review of the literature focusing on the identification and subsequent validity of existing prognostic delirium prediction models in the older adult (≥60 years old) acute hospital population.


This systematic review followed the protocol developed from the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Statement and the CHARMS checklist (online supplementary appendix A).15 16 A delirium prediction model was defined as a statistical model that either stratified individuals for their level of delirium risk, or assigned a risk score to an individual based on the number and/or weighted value of predetermined modifiable and non-modifiable risk factors of delirium present. This review included studies focused on (1) older adult (≥60 years) population, (the US Centers for Disease Control and Prevention and United Nations define an older adult as 60 years of age and older),17 18 (2) inpatient hospital setting, (3) publication dates of 1 January 1990–31 December 2016 and (4) developed and/or validated delirium prediction models. Studies were excluded if they (1) studied a different patient population (ie, emergency department, skilled nursing facilities, palliative care and hospice) as these are unique patient populations with characteristics requiring specific foci and are not readily generalisable to a medical or surgical inpatient hospital setting. Furthermore, recommended therapies for treatment of delirium symptoms vary between the populations,19 20 (2) related to alcohol withdrawal, or delirium tremens, as the presence of alcohol withdrawal complicates delirium assessment and (3) had a sample size of ≤50 for methodological reasons (ie, underpowered). All study designs were included. Studies were not limited by time frame of delirium development (prevalent vs incident); however, only prognostic statistics were discussed.

Supplementary file 1

The search terms were as follows: (‘Delirium’ OR ‘postoperative delirium’ OR ‘ICU delirium’ OR ‘ICU psychosis’ OR ‘ICU syndrome’ OR ‘acute confusional state’ OR ‘acute brain dysfunction’) AND (‘inpatient’ OR ‘hospital*’ OR ‘postoperative’ OR surg* OR ‘critical care unit’ OR ‘intensive care unit’ OR CCU OR ICU) AND (‘predict*’ model OR risk*). Electronic databases of PubMed, CINAHL, PsycINFO, Cochrane Database of Systematic Reviews, SocINDEX, Web of Science and Embase were searched. Studies using a language other than English were included if translation was available through the University of Wisconsin-Madison Health Sciences Librarian. Bibliographies of identified studies were hand-searched for additional references. Study quality was assessed through the Newcastle-Ottawa Scale (NOS)21 for case–control and cohort studies. Risk of bias was assessed through the Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies (CHARMS) checklist.15 Two authors (HL and SP) independently performed data collection and data extraction and assessed study quality, with any disagreement resolved by RDS.


Data extracted included: (1) study characteristics (study design, population and sample size), (2) outcome measure (method of identification and diagnosis, frequency and length of screening), (3) model performance information including the diagnostic accuracy of the delirium prediction models, calibration metrics and events per variable (EPVs), (4) characteristics of the models (variables used in model and scoring/stratification system), (5) cognitive measures used in the study and (6) statistical methods applied for analysis. Five authors were contacted for missing or incomplete data. Four responses were received.


Model performance was assessed through calibration and classification metrics.15 The AUROC was the primary measure collected to evaluate the discriminatory ability of the delirium prediction models. Clinical utility statistics such as sensitivity, specificity, positive predictive values, negative predictive values, ORs, relative risk statistics and use of decision curve analysis or clinical utility curve analysis were also collected from each delirium prediction model in reference to the model’s reported cut-off value. Model calibration refers to the agreement between observed outcomes and predictions.22 Goodness-of-fit statistics including χ2 and Hosmer-Lemeshow tests were collected to evaluate effective model calibration. Studies were also assessed for the inclusion of calibration plots and slopes. Secondary preplanned outcome measures included cognitive assessments and predictive variable use per model.
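As a rough illustration of the discrimination and cut-off statistics described above, the following Python sketch computes an AUROC as the probability that a randomly chosen patient with delirium receives a higher risk score than a randomly chosen patient without delirium, and then tabulates sensitivity, specificity and predictive values at a chosen cut-off. The function names, the risk scores, the outcomes and the cut-off of 3 are illustrative assumptions; none of them is taken from the reviewed models.

```python
from itertools import product


def auroc(scores, outcomes):
    """Probability that a random case outscores a random non-case (ties count 0.5)."""
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    controls = [s for s, y in zip(scores, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    wins = sum(1.0 if c > n else 0.5 if c == n else 0.0
               for c, n in product(cases, controls))
    return wins / (len(cases) * len(controls))


def _ratio(numerator, denominator):
    # Return NaN rather than raising when a denominator is zero.
    return numerator / denominator if denominator else float("nan")


def cutoff_metrics(scores, outcomes, cutoff):
    """Treat score >= cutoff as 'high risk' and summarise the 2x2 table."""
    pairs = list(zip(scores, outcomes))
    tp = sum(1 for s, y in pairs if s >= cutoff and y == 1)
    fp = sum(1 for s, y in pairs if s >= cutoff and y == 0)
    fn = sum(1 for s, y in pairs if s < cutoff and y == 1)
    tn = sum(1 for s, y in pairs if s < cutoff and y == 0)
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


# Hypothetical risk scores (0-5 points) and outcomes (1 = delirium, 0 = none).
risk_scores = [0, 1, 1, 2, 3, 3, 4, 5]
delirium = [0, 0, 0, 1, 0, 1, 1, 1]

print(auroc(risk_scores, delirium))
print(cutoff_metrics(risk_scores, delirium, cutoff=3))
```

The pairwise definition is used here because it makes the meaning of the AUROC explicit; for large cohorts a rank-based calculation would be more efficient.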

Role of the funding source

The funding sources named had no role in this study. All authors had full access to all the data in the study and shared responsibility for the decision to submit the publication.

Patient and public involvement

Neither patients nor the public were involved with the development or design of this study.


Twenty-seven studies were identified for inclusion.23–47 The initial search resulted in 7,502 citations, with 192 studies chosen for full-text review as detailed in the PRISMA diagram (figure 1). We did not identify any relevant, unpublished studies for this review. The inclusion criteria were modified for two studies that developed models in younger populations, but these models were externally validated in the target population of this review (age ≥60 years).25 40

Figure 1

PRISMA diagram: study selection. PRISMA, Preferred Reporting Items for Systematic Reviews and Meta-Analyses.

Twenty-three delirium prediction models were developed, 14 were externally validated23 27 29–31 33–35 41 43–46 48 and three were internally validated.24 37 42 Prospective cohort design was used in 24 studies.23 25–31 33–35 37–49 Retrospective design was used in four studies.24 32 36 44 Nineteen studies used consecutive sampling methods,23 25–31 33 34 38 40–42 44 45 47–49 two of these were part of a randomised control trial.34 41 Eleven studies focused on the medical population,23 25 29–33 40 42 45 48 3 included medical and surgical24 43 44 and 13 recruited a surgical population (seven orthopaedic,26–28 34 38 41 49 one cardiac,46 two non-cardiac,37 47 one general surgery35 and two oncological36 39). None of the identified studies focused on critical care patients. Data collection occurred on admission in 17 studies23 25 27 29–31 33–35 40–45 48 49; participants were approached within 48 hours of admission. Seven studies collected data preoperatively then followed participants postoperatively.26 28 37–39 46 47 Data collection overlapped with delirium assessments in three studies.27 32 35 The average NOS quality ranking for included cohort studies was seven; six studies received the maximum of nine stars. Risk of bias was assessed using the CHARMS checklist,15 and results are shown in figure 2. Further characteristics of studies are listed in table 1.

Figure 2

This displays the CHARMS risk of bias assessment on all included studies. Study participants: design of included study, sampling method and inclusion/exclusion criteria. Predictors: definition, timing and measurement. Outcome: definition, timing and measurement. Sample size and missing data: number of participants in study, events per variable and missing data. Statistical analysis: selection of predictors, internal validation and type of external validation.

Table 1

Displays the 27 studies that were identified for inclusion in this review.

Delirium assessment

The outcome variable was measured using the Confusion Assessment Method in 21 studies.23 25–31 33–40 43 46–49 The frequency of delirium assessment varied from two or more assessments daily (3 studies),26 35 41 to once daily (12 studies),25 28 30 32 34 36–38 44–46 49 every other day (8 studies),23 27 29 31 33 42 43 48 once following surgery47 and undefined (3 studies).24 39 40 All of the studies that assessed delirium twice or more daily relied on ward nurse observations or telephone interview with the nurse to identify delirium symptoms.26 35 41 The principal investigator confirmed the presence of delirium following the nurse report of symptoms.26 35 Twenty-one studies used trained research or clinical personnel to conduct the delirium assessments.23 25–27 29–31 33–40 43–48 Three studies relied on delirium diagnosis, or keywords designated as representing delirium, to identify the outcome measure through retrospective chart review.24 32 44 Three studies relied on clinical staff to recognise and chart delirium symptoms.28 41 49 One of these studies retrospectively confirmed the diagnosis of delirium through consensus review of two authors; disagreement was resolved by a psychiatrist.41 One study did not report details on personnel performing delirium assessments.42

Model design and statistical methods

Various statistical techniques were employed by the 23 included studies. Twelve used univariate or bivariate analyses and selected variables with a predetermined statistical value (range from p<0.05 to p<0.25) for inclusion in the model.23–26 32 35–37 40 42 43 46 Five of these models paired bivariate analyses with a bootstrapping technique to address lower sample and event size.24 25 37 38 46 Four models based their variable selection from a literature review of risk factors for delirium.27 28 41 44 48 49 Two used proportional hazards regression modelling paired with bivariate analyses and included variables with either a p value <0.25 (reference 32) or a relative risk of ≥1.50 (reference 30). Six studies published their power analysis.24 25 33 35 40 41 Sixteen studies employed a form of logistic regression. Twelve of these models applied a stepwise regression approach.23 25 26 29 30 35–37 42 43 46 47 Three applied a stepwise forward selection process,23 25 30 two employed a stepwise backward selection process35 46 and one used a combined approach.29 Statistical methods used for model building are further outlined in table 1.

Per TRIPOD reporting guidelines, validation studies were categorised into type; narrow validation refers to the same investigators subsequently collecting an additional patient cohort, following the development cohort, and broad validation refers to a validation cohort sampled from a different hospital or country.50–52 As interpretation of validation studies is dependent on case-mix,53 it is important to note that 8 of the 14 externally validated models are categorised as narrow validations.23 27 29–31 35 41 46 Further information is outlined in table 2.


Figure 3 demonstrates the frequency of variable use in the 14 externally validated delirium prediction models. Baseline cognitive impairment was the most frequently used variable. Six models defined baseline cognitive impairment as a cognitive test score at or below the level of dementia.27 30 34 43 48 This cognitive test was administered on study enrolment or extracted from past medical records.48 Two studies additionally evaluated chronic cognitive impairment through family or caregiver interview with the modified Blessed Dementia Rating Scale (mBDRS).30 31 Four models combined the cognitive test score derived on enrolment with a history of dementia to define baseline cognitive impairment.31 33 41 44 History of dementia was defined as follows: two studies: family or caregiver report supplemented with documented history in medical record,33 41 one study: medical record review and interview with mBDRS31 and one study: dementia billing codes or prescription information.44 One study defined baseline cognitive impairment as a prespecified key term in the electronic health record.45 Table 2 details cognitive tests used in the externally validated delirium prediction models.

Figure 3

This displays the mean frequency of variable use in the 14 externally validated delirium prediction models. ‘(P)’ indicated a precipitating risk factor used in a delirium prediction model. The following variables were used twice and are not represented in the figure: BUN/Cr ratio (Blood Urea Nitrogen/Creatinine ratio), comorbidities, history of delirium, depression, medications (1: upon admission, 1: added during hospital stay), restraint use and malnutrition (1: altered albumin level, 1: malnutrition scale). The following variables were used once and are not represented in the figure: bladder catheter use, C reactive protein, emergency surgery, presence of fracture on admission, history of cerebrovascular accident, iatrogenic event, intensive care unit admission and open surgery.

Functional impairment was defined as follows: (1 study) needing assistance with any basic activities in daily living (ADL),27 (1 study) domestic help, help with meals or physical care41 and (2 studies) residence in nursing facility or at home with caregivers,33 and (2 studies) requiring a home care package with professional caregivers or residence in a care home.33 48 The latter was obtained on admission from medical records.33 48 Two studies used validated functional assessment tools (Instrumental Activities of Daily Living (iADL) and Barthel Index) and evaluated functional status 2 weeks prior to hospitalisation.23 31

Externally validated delirium prediction models are detailed in table 2.

Table 2

Detailed description of the externally validated DPMs.

Predictive ability

Reported AUROC in externally validated delirium prediction models ranged from 0.52 to 0.94 (figure 4). Of these models, the highest performing model (AUROC 0.94, 95% CI 0.91 to 0.97) was developed and validated in a surgical population.35 Two models reported an external validation AUROC above 0.80, indicating moderate predictive ability.33 48 Both were developed and validated in medical populations and share similarities with variable use including pre-existing cognitive impairment and presence of infection.

Figure 4

This shows the published AUROC statistic for the 14 externally validated delirium prediction models. #D/N: number of confirmed delirium cases in study/overall sample size. DPM: delirium prediction model name. The corresponding number references the different AUROCs calculated based on different cognitive tests applied to the model by the authors. Squares with error bars: size of square corresponds to sample size of study. AUROC: reported area under the receiver operating characteristic curve statistic, with 95% CIs.

Model calibration

Six of the 14 externally validated delirium prediction models reported calibration metrics.29–31 34 43 45 The reported χ2 statistics were significant in five prognostic models29–31 34 43 and did not reach significance in one model.45 Four of the 23 studies that developed models reported calibration statistics.32 37 40 42 None of the included studies reported calibration plots or slopes.

Risk of overfitting

EPVs were examined in each of the 14 externally validated models. Models with fewer than 10 events per estimated parameter (a 1:10 ratio of parameters to events) are at risk of statistical overfitting, potentially leading to overly optimistic model performance.22 54–57 In 14 models with external validation, four had fewer than optimum events for the number of parameters estimated in the development stage of the models.25 29 30 49 Five had fewer than optimum events in the external validation stage.23 29–31 45 Two models did not reach optimum events for the number of parameters in either the development or the external validation studies.29 30 Various statistical techniques such as shrinkage procedures, the use of lasso or penalised regression and internal validation methods are suggested to counter the effects of lower EPV.15 54 58 None of the identified studies reported use of statistical shrinkage procedures. Five studies applied internal validation techniques in the development stage of their model to account for stability within their model.24 25 37 38 46
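The EPV check can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the number of "events" is taken as the size of the smaller outcome group, the threshold of 10 follows the rule of thumb cited above, and the function names and cohort numbers are hypothetical rather than drawn from any reviewed study.

```python
def events_per_variable(n_events, n_non_events, n_parameters):
    """EPV, assuming the smaller outcome group is the limiting sample size."""
    if n_parameters <= 0:
        raise ValueError("n_parameters must be positive")
    return min(n_events, n_non_events) / n_parameters


def at_risk_of_overfitting(epv, threshold=10.0):
    """Flag models that fall below the conventional 10 events per parameter."""
    return epv < threshold


# Hypothetical development cohort: 40 delirium cases among 300 patients,
# with 6 candidate parameters estimated during model building.
epv = events_per_variable(n_events=40, n_non_events=260, n_parameters=6)
print(round(epv, 2), at_risk_of_overfitting(epv))
```

Counting every candidate parameter considered during model building, and not only those retained in the final model, gives the more conservative estimate.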

Clinical utility

Clinical utility of a prediction model may be evaluated through several different statistical metrics including ORs, relative risk, sensitivity and specificity, receiver operator curves, R2 and integrated discrimination improvement indices as well as the clinical utility curve statistic and the decision curve analysis.57 59 Six externally validated delirium prediction model studies reported ORs or relative risk statistics evaluating the highest risk stratification cut-off point.29–31 34 46 48 Seven studies reported sensitivity and specificity,23 27 33 35 41 43 48 and one study reported the rate of true positives and false positives.44 None of the identified studies reported decision curve analysis or clinical utility curve analysis. While the majority of studies selected variables that were either routinely used in practice or were feasible to administer, two studies developed delirium prediction models based on data routinely entered into the electronic health record to increase feasibility of use.24 44 Pendlebury et al adapted variable definition and use to match routine clinical assessment while externally validating four delirium prediction models and creating an additional risk stratification tool.33 48 Moerman et al reported feasibility and reliability statistics following the incorporation of the risk prediction tool into practice.41
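For readers unfamiliar with decision curve analysis, the sketch below shows the net benefit calculation on which it is commonly based: true positives per patient minus false positives per patient, weighted by the odds of the chosen risk threshold. It is offered only as an illustration of the technique, since none of the reviewed studies reported it; the predicted probabilities, outcomes, threshold and function names are hypothetical.

```python
def net_benefit(probabilities, outcomes, threshold):
    """Net benefit of intervening on patients with predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(outcomes)
    pairs = list(zip(probabilities, outcomes))
    tp = sum(1 for p, y in pairs if p >= threshold and y == 1)
    fp = sum(1 for p, y in pairs if p >= threshold and y == 0)
    # Each false positive is weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(outcomes, threshold):
    """Reference strategy: intervene on every patient regardless of risk."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical predicted delirium risks and observed outcomes (1 = delirium).
predicted = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.60, 0.80]
observed = [0, 0, 0, 1, 0, 1, 1, 1]

print(net_benefit(predicted, observed, threshold=0.2))
print(net_benefit_treat_all(observed, threshold=0.2))
```

A decision curve repeats this calculation over a range of thresholds and compares the model against the treat-all strategy and a treat-none strategy, whose net benefit is zero.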


This review identified moderate predictive ability (AUROC 0.52–0.94) in 14 externally validated delirium prediction models with 8 out of 14 models using narrow validation. However, three main limitations were identified. First, study design, application and reporting of statistical methods appear inadequate. Data collection overlapped with the initial diagnosis of delirium in the highest performing model as well as in two other included studies, likely exaggerating model performance.15 27 32 35 Low EPV combined with limited application of internal validation techniques contributed to an increased risk of bias and likely the creation of overly optimistic models.15 50–52 Second, broad variable definitions, particularly in functional and cognitive abilities, may have led to overlapping data capture. For example, Pendlebury et al demonstrated this possible effect in the development of the Susceptibility Score: model performance did not improve with the addition of functional impairment to a model that already included cognitive impairment and age.48 Lastly, assessment of the outcome variable, delirium, was largely non-systematic, once daily and avoided weekends. In the studies that assessed delirium more than once per day, the assessment was performed by routine clinical staff, decreasing consistency. This is a major limitation for an acute condition that fluctuates, may occur suddenly and is dependent on precise, objective assessment. While case-mix between populations may impact observed delirium rates, we believe it would be advantageous for future studies to incorporate systematic, frequent and consistent delirium assessments.

As delirium is a multifactorial syndrome representing an inter-relationship between premorbid and precipitating factors,29 the time course of data collection is important. Nine of the 14 externally validated delirium prediction models incorporate precipitating factors into their predictive model; two models29 31 are intentionally constructed in this manner. The inclusion of a precipitating factor into a premorbid delirium prediction model may provide important predictive power if designed in the appropriate manner, as demonstrated by Inouye et al.30 However, if variables are collected after the onset of delirium, this would exaggerate model performance (eg, ICU admission). As an example, one delirium prediction model has a robust AUROC of 0.94 (95% CI 0.91 to 0.97).35 This study excluded those with an MMSE <23 and prevalent delirium. Data collection occurred within the first 24 hours following surgery; however, delirium assessment began immediately after surgery, with a 50% delirium prevalence on the day of surgery. This overlap of data collection and delirium assessment likely exaggerated model performance for this outlier study. Seven externally validated models included data about the precipitating factor present on admission and either excluded those with prevalent delirium or calculated separate AUROCs for prevalent delirium versus incident delirium.23 30 33 44 48

Model underperformance may be explained by low powered studies, insufficient EPV as well as the use of univariate analyses and stepwise regression to select predictive variables for inclusion into models. Although these are common methods to use for model development and may counter the effects of insufficient EPV, each approach has significant drawbacks.60 Univariate analysis may reduce predictive ability by inclusion of variables that are not independent of each other, and stepwise regression disadvantages include conflation of p values and a biased estimation of coefficients.15 22 50 61 While EPV was originally adapted to ensure stability in regression covariates, it has been identified as an important component to predictive model stability and reproducibility due to the result of overfitting.15 50 62 Ogundimu et al demonstrate this effect by simulating models with EPV of 2, 5, 10, 15, 20, 25 and 50. Stability of models increased as the EPV increased and models including predictors with low population prevalence required >20 EPV.63 The degree of model overfitting should be assessed through calibration statistics and forms of internal validation such as bootstrapping. Future studies should consider the use of statistical methods to counter low EPV including the application of statistical shrinkage techniques and penalised regression using ridge or lasso regression.15 22 56 60 64 Furthermore, future studies may benefit from the incorporation of advanced statistical techniques such as Bayesian Networks and machine learning that have been shown to improve the performance of previous prediction models that were built using standard logistic regression.65 66 These methods facilitate the exploration of complex interactions between risk factors as well as adapt to changing patient conditions, allowing for a dynamic model.
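To make the idea of penalised regression concrete, the following sketch fits a logistic regression by plain gradient descent with an optional ridge penalty on the coefficients, and shows that the penalised coefficients are pulled towards zero relative to the unpenalised fit. It is a toy illustration under stated assumptions, not a method used by any reviewed study: the two binary predictors, the 12 outcomes, the penalty of 0.5 and the function name are all hypothetical, and a real analysis would choose the penalty by cross-validation using an established statistics package.

```python
import math


def fit_logistic(X, y, ridge=0.0, lr=0.5, n_iter=2000):
    """Gradient descent on the mean negative log-likelihood plus
    (ridge / 2) * sum(beta_j ** 2); the intercept is not penalised."""
    n, p = len(X), len(X[0])
    intercept, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        grad0, grad = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            z = intercept + sum(b * v for b, v in zip(beta, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted risk minus outcome
            grad0 += err / n
            for j in range(p):
                grad[j] += err * xi[j] / n
        intercept -= lr * grad0
        beta = [b - lr * (g + ridge * b) for b, g in zip(beta, grad)]
    return intercept, beta


# Hypothetical data: two binary risk factors and delirium outcomes (1 = delirium).
X = [[0, 0]] * 4 + [[1, 0]] * 2 + [[0, 1]] * 2 + [[1, 1]] * 4
y = [0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0]

_, beta_unpenalised = fit_logistic(X, y, ridge=0.0)
_, beta_ridge = fit_logistic(X, y, ridge=0.5)
print(beta_unpenalised, beta_ridge)
```

Shrinking coefficients in this way trades a small amount of bias for lower variance, which is the rationale for recommending it when EPV is low.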

Increasing age, pre-existing cognitive impairment and functional and sensory impairments were the most frequently used variables in the externally validated delirium prediction models. However, many studies employed different definitions for these variables, making comparisons difficult between models and limiting generalisability across populations. Functional and physical impairments were broadly defined, resulting in the inability to discern whether impairments resulted from truly physical origins or if the noted decrease in function was related to cognitive impairment, leading to an overlap in data collection. Age may not be a relevant risk factor when considering an older cohort of patients; for example, a recent study found that global cognition may mediate the relationship between age and postoperative delirium67; therefore, the inclusion of age in a delirium prediction model may not add to the overall performance of the model if cognition is adequately captured or if only elderly patients are included in the study. This effect was demonstrated by Pendlebury et al: an improved AUROC resulted when age was removed from the prediction model (0.81 to 0.84).48 As the inclusion of age, functional, physical and cognitive impairments may result in an overlap of data collection, future models may want to explore variables that have not been frequently used in delirium prediction yet are highly predictive of mortality, surgical complications and depression. An example would be the self-rated health question. This is a single-item question evaluating an individual’s perception of their own health and has been found to be a significant predictor of subjective memory complaints, depression and mortality.68–74 Furthermore, this variable is feasible as it takes minimal time and no training. Incorporation of variables such as self-rated health may increase both predictive ability and feasibility, thus improving clinical utility.

The highest performing delirium prediction model excluded those with pre-existing cognitive impairment, did not incorporate a cognitive variable and used hearing impairment as a predictive variable (note the methodological concerns of this study were discussed above).35 Cognitive impairment was the most frequently used variable and is a known risk factor for delirium development.2 67 Prior research demonstrates individuals with mild cognitive impairment (MCI) are at a significantly higher risk of delirium development.75 76 All models used cut-off scores on cognitive tests that would indicate dementia, providing no evaluation of subtler cognitive decline such as MCI. Furthermore, Jones et al demonstrated a strong linear relationship between risk of delirium and all levels of cognitive function, even those considered unimpaired through formal testing.67 In this study, a general cognitive performance score was developed using a complex battery of neuropsychological tests. Unfortunately, the neuropsychological battery is too complex to be practical for the clinical setting. Fong et al found baseline executive functioning, complex attention and semantic networks to be associated with subsequent delirium development.77 The inclusion of MCI, or simple cognitive tests as employed by Fong et al, as a variable may increase the detection and prevalence of cognitive impairment as a variable, thus increasing its predictive power. Further exploration into isolated cognitive tests that are feasible to administer in a clinical setting as well as sensitive to the spectrum of cognitive impairment may enhance delirium prediction.

The clinical utility of a prediction model is dependent on both its efficacy at predicting those at risk and feasibility, hence both must be considered when building and validating a model. Clinical utility is compromised by efficacious models that are not feasible. Conversely, a feasible model that is not effective at identifying those at risk also lacks clinical utility. To this end, model derivation must focus on building an effective model. The next aspect that must be considered is the ability to enhance clinical care. Predicting individuals at high risk is clearly important, but to an experienced clinician, delirium may already be anticipated. Maximum value may be obtained by aiding in prediction of moderate risk patients, where the risk of delirium may be more ambiguous.

Strengths and weaknesses of this study

This systematic review benefitted from a prospectively developed protocol. A comprehensive literature search of multiple databases using broad search terms yielded 27 studies with 14 externally validated delirium prediction models. Our author team is interprofessional, providing the opportunity for different perspectives on model evaluation. Furthermore, this review synthesises evidence from both medical and surgical populations while providing statistically based recommendations for study and model design for future delirium prediction model studies.

A limitation of this systematic review is that articles focused on a younger population were not included. This could narrow the generalisability of the results to the broader population; however, delirium predominantly affects older adults. Furthermore, this review is limited by population focus: we did not include prediction models built in palliative care, long-term care facilities or the emergency department.

Strengths and weaknesses in relation to other studies

Past systematic reviews concluded that the identified delirium prediction models were largely heterogeneous in variable inclusion and were not sufficiently developed for incorporation into practice.78–80 Their recommendations included further testing of existing delirium prediction models followed by integration into practice, as well as further exploration of measurements that are clinically feasible. This review included eight models not previously identified in past systematic reviews of delirium prediction models. Furthermore, this review is the first to identify study and model design issues and to discuss the paucity of measurements sensitive to the spectrum of cognitive impairment.

Implications and future research

Two avenues may be pursued for future studies. The first avenue involves model aggregation; currently available delirium prediction models would be combined into a meta-model through stacked regression in a new cohort of participants. This method would update currently published models to a new population, furthering generalisability and bolstering broad external validation.81 Variable definitions could be harmonised in the meta-model with the intention of using variables that are readily available and feasible for routine practice. This method would further delirium prediction for those with dementia-level pre-existing cognitive impairment as well as examine, through model refitting, the individual contributions of functional impairment due to physical conditions, cognitive impairment or age. Nonetheless, a future meta-model would carry forward presently identified limitations such as exclusion of the spectrum of cognition. The second avenue should focus on the development and broad validation of delirium prediction models exploring the use of simple cognitive tests that would be inclusive of MCI and sensitive to the spectrum of cognition. Furthermore, future models should consider development of dynamic predictive models using advanced statistical methods such as Bayesian Networks, artificial intelligence and machine learning, as these methods have been shown to improve on models built using standard logistic regression.66 82
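The model aggregation idea can be sketched as follows. This is only one possible reading of stacked regression, under stated assumptions: the predicted risk from each existing model is converted to a logit and entered as a covariate in a logistic regression fitted in the new cohort, with the stacking weights constrained to be non-negative. The fitting routine (projected gradient descent), the function names and the eight-patient cohort are all hypothetical and chosen for brevity; they do not reproduce the method of any cited study.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stacked_regression(model_probs, outcomes, steps=2000, lr=0.1):
    """Fit intercept and non-negative weights on the logits of existing models.

    model_probs: one row per patient, one predicted risk per existing model.
    outcomes:    1 if the patient developed delirium, else 0.
    """
    n = len(outcomes)
    k = len(model_probs[0])
    features = [[logit(p) for p in row] for row in model_probs]
    intercept = 0.0
    weights = [0.0] * k
    for _ in range(steps):
        grad_intercept = 0.0
        grad_weights = [0.0] * k
        for x, y in zip(features, outcomes):
            z = intercept + sum(w * xi for w, xi in zip(weights, x))
            err = sigmoid(z) - y  # gradient of the log-loss w.r.t. z
            grad_intercept += err / n
            for j in range(k):
                grad_weights[j] += err * x[j] / n
        intercept -= lr * grad_intercept
        # Projection step: clip each weight at zero (assumed constraint).
        weights = [max(0.0, w - lr * g) for w, g in zip(weights, grad_weights)]
    return intercept, weights


def predict(intercept, weights, row):
    return sigmoid(intercept + sum(w * logit(p) for w, p in zip(weights, row)))


def log_loss(probs, outcomes):
    total = sum(y * math.log(p) + (1 - y) * math.log(1.0 - p)
                for p, y in zip(probs, outcomes))
    return -total / len(outcomes)


# Hypothetical new cohort: risks from two published models, observed outcomes.
cohort_probs = [[0.10, 0.20], [0.15, 0.10], [0.30, 0.25], [0.20, 0.40],
                [0.60, 0.65], [0.70, 0.80], [0.40, 0.60], [0.85, 0.70]]
outcomes = [0, 0, 0, 0, 0, 1, 1, 1]

intercept, weights = fit_stacked_regression(cohort_probs, outcomes)
stacked = [predict(intercept, weights, row) for row in cohort_probs]
print(intercept, weights, log_loss(stacked, outcomes))
```

A real application would use a far larger cohort, an established optimiser and separate data for evaluation; a weight of zero would indicate that a published model adds nothing once the others are included.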

We suggest the following broad principles for use in future studies: (1) delirium prediction models should be developed using only data available prior to the onset of delirium and should likely be focused on specific populations depending on whether the precipitating event has occurred or not; (2) studies should include structured, twice-daily assessment (including weekends) using validated tools and trained research staff to identify delirium; (3) where possible, models should include variables and assessments that are readily available in clinical practice and feasible to administer without extensive training or interpretation, provided that this does not exclude a more informative variable; (4) model development and validation should follow rigorous methods outlined by Steyerberg22 and Steyerberg and Vergouwe56, including strategies to counter low sample size and overly optimistic model performance, the use of the Akaike information criterion and Bayesian information criterion to assess model fit and consideration of broad validations to expand case-mix and generalisability; and (5) reporting should adhere to strict guidelines as outlined by the TRIPOD Statement for statistical performance reporting, including calibration and clinical utility statistics.22 50–52 56 59
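As a brief illustration of the information criteria named in principle (4), the sketch below uses the standard definitions AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where k is the number of estimated parameters, n the number of observations and ln L the maximised log-likelihood. The two candidate models and their log-likelihoods are hypothetical values chosen for illustration.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion; lower values indicate better fit."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion; penalises parameters more as n grows."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical candidates fitted to the same 300 patients.
n_obs = 300
simple_model = {"log_likelihood": -120.0, "n_params": 4}
complex_model = {"log_likelihood": -118.5, "n_params": 7}

aic_simple = aic(simple_model["log_likelihood"], simple_model["n_params"])
aic_complex = aic(complex_model["log_likelihood"], complex_model["n_params"])
bic_simple = bic(simple_model["log_likelihood"], simple_model["n_params"], n_obs)
bic_complex = bic(complex_model["log_likelihood"], complex_model["n_params"], n_obs)
print(aic_simple, aic_complex, bic_simple, bic_complex)
```

In this invented example the three extra parameters improve the log-likelihood only slightly, so both criteria favour the simpler model.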

Two classes of delirium prediction models may be required based on the acuity of the admission (elective or emergency). If precipitating factors are included in an elective admission delirium prediction model, where the patient is yet to incur the delirium-provoking event, an individual’s delirium risk may be overestimated. Conversely, in an emergency admission model, inclusion of only premorbid factors may underestimate delirium risk given the emergency clinical scenario.


Twenty-three delirium prediction models were identified. Fourteen of these were externally validated, and three were internally validated. Of the fourteen validated delirium prediction models, the overall predictive ability is moderate with an AUROC range from 0.52 to 0.94. Assessment of the outcome variable, delirium, is often non-systematic, and future studies would be improved with more standardised and frequent assessment. Overall, the variable inclusion and applied definitions in delirium prediction models are heterogeneous, making comparisons difficult. To improve delirium prediction models, future models should consider using standard variables and definitions to work towards a prediction tool that is generalisable to several populations within the remit of understanding the relationship with the precipitating event.


We would like to express our sincere gratitude for the assistance of Mary Hitchcock in Ebling Health Sciences Library in the design of the search strategy and Dave Dwyer, Lily Turner, and Casandra Stanfield for their assistance with proofreading the manuscript.




  • Contributors HL and SP with the mentorship of RDS formulated the aim, developed the study protocol, completed the search and extracted the data. HL and RDS synthesised the data. HL with the mentorship of RDS drafted the manuscript and designed the tables. RB designed the figures and assisted with statistical interpretation. LB provided expertise on content related to cognition and reviewed the manuscript. DHJD and CMC assisted with synthesis and interpretation of results and discussion in relation to their expertise in geriatrics, cognition and delirium. MC, MM, MTVC and PP assisted with synthesis of results and the discussion section, providing expertise in delirium in their respective settings.

  • Funding HL and RDS acknowledge funding support from the Department of Anesthesiology at University of Wisconsin-Madison. RDS acknowledges funding support from K23 AG055700. PP acknowledges funding support from R01 NHLBI(HL111111) and research grant from Hospira Inc in collaboration with National Institutes of Health.

  • Competing interests None declared.

  • Patient consent Not required.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement Complete search results including excluded studies and CHARMS Risk of Bias checklist decision tree available from corresponding author upon request.