
Original research
Published models that predict hospital readmission: a critical appraisal
  1. Lisa Grossman Liu1,
  2. James R Rogers1,
  3. Rollin Reeder2,3,
  4. Colin G Walsh2,3,4,
  5. Devan Kansagara5,
  6. David K Vawdrey1,6,
  7. Hojjat Salmasian7,8,9
  1. 1Department of Biomedical Informatics, Columbia University, New York, New York, USA
  2. 2Department of Biomedical Informatics, Vanderbilt University, Nashville, Tennessee, USA
  3. 3Department of Medicine, Vanderbilt University, Nashville, Tennessee, USA
  4. 4Department of Psychiatry, Vanderbilt University, Nashville, Tennessee, USA
  5. 5Department of Medicine, Oregon Health and Science University and VA Portland Health Care System, Portland, Oregon, USA
  6. 6Steele Institute for Health Innovation, Geisinger, Danville, Pennsylvania, USA
  7. 7Division of General Internal Medicine, Brigham and Women's Hospital, Boston, Massachusetts, USA
  8. 8Harvard Medical School, Boston, Massachusetts, USA
  9. 9Mass General Brigham, Somerville, Massachusetts, USA
  1. Correspondence to Dr Lisa Grossman Liu; lvg2104{at}


Introduction The number of readmission risk prediction models available has increased rapidly, and these models are used extensively for health decision-making. Unfortunately, readmission models can be subject to flaws in their development and validation, as well as limitations in their clinical usefulness.

Objective To critically appraise readmission models in the published literature using Delphi-based recommendations for their development and validation.

Methods We used the modified Delphi process to create Critical Appraisal of Models that Predict Readmission (CAMPR), which lists expert recommendations focused on development and validation of readmission models. Guided by CAMPR, two researchers independently appraised published readmission models in two recent systematic reviews and concurrently extracted data to generate reference lists of eligibility criteria and risk factors.

Results We found that published models (n=81) followed 6.8 recommendations (45%) on average. Many models had weaknesses in their development, including failure to internally validate (12%), failure to account for readmission at other institutions (93%), failure to account for missing data (68%), failure to discuss data preprocessing (67%) and failure to state the model’s eligibility criteria (33%).

Conclusions The high prevalence of weaknesses in model development identified in the published literature is concerning, as these weaknesses are known to compromise predictive validity. CAMPR may support researchers, clinicians and administrators to identify and prevent future weaknesses in model development.

  • health informatics
  • information technology
  • statistics & research methods

Data availability statement

Data are available on reasonable request. Please email the corresponding author.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial.


Strengths and limitations of this study

  • This study appraises published readmission models, of which dozens exist with limited regulatory oversight, to help ensure their quality and usefulness for health decision-making.

  • Recommendations are specific to readmission models and were developed using a modified Delphi process. Critical appraisal was undertaken independently by two researchers.

  • Recommendations are limited to model development and validation, and future revision will be necessary as the field advances and developers adopt more modern techniques.


In 2013, the Centers for Medicare and Medicaid Services (CMS) Hospital Readmission Reduction Program (HRRP) began to financially penalise US hospitals with excessive 30-day readmission rates, with the goal of improving patient care. Subsequently, research on readmission risk prediction models increased exponentially,1 2 with two distinct goals: (1) to identify high-risk patients for targeted interventions, and (2) to standardise institutions’ readmission rates for use as a performance indicator. Preventable hospital readmissions cost CMS $17 billion each year,3 and CMS penalties for subpar readmission rates totalled $566 million in 2018.4 Readmissions have received considerable attention due to this financial burden, their impact on patient care and their use as a performance indicator.5 6 Consequently, efforts to research7 8 and market9 10 readmission models have increased rapidly, and these models are used extensively for health decision-making.

However, uncritically accepting the results of any single model has risks. These models can be subject to flaws in their development and validation, as well as limitations in their clinical usefulness. Given the hundreds of readmission models now available, distinguishing the highest quality, most clinically useful models can be challenging for clinicians, researchers and healthcare administrators. In this study, we address this gap for readmission models in the published literature by critically appraising them. To conduct the critical appraisal, we developed Critical Appraisal of Models that Predict Readmission (CAMPR), which lists 15 Delphi-based expert recommendations for high-quality, clinically useful readmission models. CAMPR focuses on the unique considerations of readmission modelling, such as purpose, competing risks, outcome timing, risk factor definitions, data sources and thresholding. This manuscript discusses the expert recommendations and subsequent critical appraisal in detail, and provides reference lists of eligibility criteria and risk factors to consider when assessing readmission models.


The Delphi method is a well-established structured communication technique for systematically seeking expert opinion on a specific topic.11–13 Traditionally, the first round uses open-ended questions to generate ideas from participants, whereas subsequent rounds rely on more structured communication to achieve consensus. In this study, we conducted two rounds of online expert surveys, the first open-ended and the second semistructured, and a third round consisting of expert application to the published literature.11

Round 1: development of CAMPR (open-ended survey)

Survey content

A Delphi expert used an iterative process to develop the initial survey in collaboration with four physician experts in readmission models. The survey collected personal information on the respondents’ institution(s) and relevant expertise, as well as information on models at the respondents’ institution(s). Then, the survey assessed perceived barriers to model development and implementation, as well as strategies to overcome barriers and recommendations to improve models. The complete survey is available as online supplemental appendix A.

Data collection

To ensure rapidity and anonymity, provide convenience and recruit individuals from diverse backgrounds and geographical locations, we invited experts to participate electronically using Qualtrics Survey Software. Expert panels for Delphi studies traditionally have 12–20 members.14 Electronic participation enabled us to include more than 20 members, as desired due to the complex nature of the problem and the probable diversity of opinions. We distributed our survey via personalised, individual emails to all corresponding authors of readmission prediction studies from two recent systematic reviews (2011, 2016).1 2 Additionally, we publicly distributed it to members of the American Medical Informatics Association.

Eligibility criteria

We included both model developers and implementers in our expert panel to capture a broad range of perspectives on the readmission prediction literature. We required that participants speak English and self-report involvement in (1) the development of one or more readmission models, or (2) the implementation of one or more readmission models at one or more institutions.

Data analysis

Two researchers conducted thematic analysis in NVivo V.11 (QSR International). First, the researchers independently read each response and defined codes in a dictionary for the remaining analysis. Then, the researchers independently coded all responses using codes corresponding to the dictionary and summarised themes that emerged. Together, the researchers reviewed and named common themes that emerged, and resolved conflicts by discussion. To enhance confirmability, we shared summaries of coded data with three participants and asked for their confirmation or revisions to interpretation.

Round 2: further development of CAMPR (semistructured survey)

Preliminary version

Based on the thematic analysis in round 1, the study team identified 48 preliminary recommendations to operationalise two quality dimensions of readmission models: (1) development and (2) implementation. Each preliminary recommendation addressed one of four key thematic domains for development and five key thematic domains for implementation, identified via expert consensus (table 1).

Table 1

Quality dimensions of readmission risk prediction models

Survey content and data collection

For each preliminary recommendation, the second survey asked participants to score the usefulness and content validity. Free-text fields enabled participants to comment on each individual recommendation, as well as on CAMPR in its entirety. The study team reviewed the survey before electronic delivery using Qualtrics. We distributed the survey via personalised emails to all previous eligible respondents who had agreed to additional contact. The complete survey is available as online supplemental appendix B.

Data analysis

We conducted a quantitative descriptive analysis of usefulness and content validity using R-3.3.3. Preliminary recommendations with a usefulness and content validity below predetermined thresholds (<50% useful or valid) were excluded, unless the free-text commentary indicated that the usefulness and content validity would greatly improve with revision. The study team reviewed all free-text commentary and refined, reworded or combined recommendations accordingly.

Round 3: application of CAMPR

Modified preliminary version

After refinement and reduction in round 2, the modified preliminary version contained 34 recommendations. We identified 23 development-related recommendations for inclusion in CAMPR, which primarily reflect the four key development domains identified in round 1 (validation, features, timeframe, data access). The remaining 11 implementation-related recommendations, which primarily reflect the five key implementation domains (resources, vision, clinical relevance, workflow integration and maintenance), will be reported on separately.

Iterative validation

Two researchers applied CAMPR to all published readmission prediction models identified in all known systematic reviews (2011, 2016).1 2 First, the researchers independently applied CAMPR to one-third of studies, then revised it to improve clarity, resolve discrepancies in application and combine redundant recommendations, which reduced the total number to 15. Then, the researchers independently applied the finalised version to all studies (detailed in online supplemental appendix C) and assessed inter-rater reliability. Conflicts were resolved by discussion.

Data extraction

Manual data extraction occurred concurrently with the application of CAMPR, to better assess characteristics of included studies. Importantly, we extracted eligibility criteria and risk factors for each readmission model, to generate reference lists for future developers (available as tables 4 and 5 in the Results section). Examples of other types of extracted data include the readmission timeframe used (relevant to recommendation #6) and the validation technique used (relevant to recommendation #13). Two researchers developed the data extraction tool based on the initial review of one-third of studies. One researcher extracted data from each study, and another reviewed all extractions for completeness and accuracy.

Table 2

Barriers to developing and implementing readmission models

Table 3

Critical Appraisal of Models that Predict Readmission (CAMPR)

Data analysis

All analyses were performed in R V.3.6.3. We used Cohen’s kappa and percentage agreement to measure inter-rater reliability. We stratified literature into recent (2011–2015) and early (1985–2010), using the year of CMS HRRP (2011) as our cut-off. We conducted bivariate analyses to assess whether adherence to each recommendation differed between recent and early literature, using an unequal variances t-test. Furthermore, we used Spearman’s rank correlation to examine whether overall adherence to recommendations differed by publication year. When classifying risk factors, we used the same classification as the first systematic review,1 with the added category ‘institution-related’ as suggested by expert consensus.
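The reliability analyses above were run in R; purely as an illustration (not the authors' code), the two inter-rater measures reduce to a few lines of Python:

```python
# Illustrative sketch (not the authors' R code): percentage agreement and
# Cohen's kappa for two raters' binary appraisal judgements.
from collections import Counter

def percentage_agreement(r1, r2):
    """Fraction of items on which the two raters agree."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)

def cohens_kappa(r1, r2):
    """Chance-corrected agreement: (p_o - p_e) / (1 - p_e)."""
    n = len(r1)
    p_o = percentage_agreement(r1, r2)
    c1, c2 = Counter(r1), Counter(r2)
    # Expected agreement if the two raters' judgements were independent.
    p_e = sum(c1[k] * c2[k] for k in set(r1) | set(r2)) / n ** 2
    return (p_o - p_e) / (1 - p_e)
```

Kappa discounts the agreement expected by chance alone, which is why it is reported alongside raw percentage agreement.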


Development of CAMPR

Round 1

We successfully contacted 75 out of 81 corresponding authors who developed unique readmission models published from 1985 to 2015, of whom 14 (19%) completed our survey. An additional 49 respondents completed our survey after we publicly distributed it, of whom we included 40 who had experience implementing readmission models. The final 54 eligible experts (14 developers and 40 implementers, characterised in online supplemental appendix D) represented 20 unique models, including well-known models such as LACE and CMS-endorsed models. Of 14 developers, only 7 (50%) reported that any institution currently used their model in any capacity. Table 2 reports expert-identified barriers to developing, validating and implementing readmission models, as well as strategies to overcome barriers.

Table 4

Eligibility criteria for readmission prediction models*

Rounds 2 and 3

We had permission to reconnect with 22 previous respondents, of whom 5 (23%) completed our second survey.

Application of CAMPR

We included 81 published readmission models in our critical appraisal.15–95 We found that published models followed 6.8 out of 15 recommendations (45%) on average. Fifty-five out of 81 (68%) followed fewer than half the recommendations, and no study followed every recommendation, suggesting an opportunity for improvement. Table 3 presents the percentages of published readmission models following each recommendation, stratified by publication year. Models published recently (2011–2015, n=55, 68%) followed significantly more recommendations than models published earlier (1985–2010, n=26, 32%) (7.1 vs 6.1, p=0.03), and publication year weakly correlated with recommendations followed (ρ=0.27, p=0.02), suggesting slight improvement in model quality over time as the field developed. Model types included regression (77, 95%), random forest (3, 4%), neural network (3, 4%), decision tree (2, 2%), discriminant analysis (2, 2%), support vector machine (1, 1%) and unclear (1, 1%). We found moderate-to-high inter-rater reliability for applying CAMPR (Cohen’s kappa=0.76, agreement=88%). Here, we summarise each recommendation in CAMPR and present the critical appraisal results. Additional results are in online supplemental appendix D. The complete dataset is available on request. CAMPR is available as online supplemental appendix E.

Table 5

Risk factors included in readmission prediction models

Recommendation #1: are the model’s purpose and eligibility criteria explicitly stated?

About the recommendation

Readmission models traditionally serve one of two purposes, or intended applications: (1) to identify patient candidates for targeted interventions to prevent readmission, or (2) to risk-adjust readmission rates for hospital quality comparison.1 Developers should clearly state which of these purposes, or both, their model serves. Developers should also define the target population by specifying eligibility criteria for patient inclusion in model development. Specifying eligibility criteria is critical to ensure implementers understand when each model applies, as unjustified application is a major reason why predictions fail.96 97

Critical appraisal results

Eighteen out of 81 studies (22%) did not define their model’s purpose. Of the remaining models, 46 (57%) were for preventing readmission, 15 (19%) were for hospital quality comparison and 2 (2%) were for both. Table 4 provides an abbreviated reference list of eligibility criteria for published readmission models (the full reference list is available in online supplemental appendix D). Twenty-seven models (33%) did not specify their eligibility criteria.

Recommendation #2: does the model consider common patient-related and institution-related risk factors for readmission?

About the recommendation

Developers should show that they considered risk factors or features that were included in previous models. Notably, institution-related factors such as hospital name should not be used in models for hospital quality comparison, as they can mask differences in hospital quality.

Critical appraisal results

Table 5 provides an abbreviated reference list of known risk factors for readmission and their frequency of inclusion in published models (the full reference list is available as online supplemental appendix D). Based on expert consensus and the existing literature, we identified seven categories of factors.1 Categories included (1) demographics (included in 75 models or 93%), (2) disease related (80, 99%), (3) functional ability (21, 26%), (4) healthcare utilisation (66, 81%), (5) medication related (33, 41%), (6) social determinants of health (53, 65%), (7) institution related (16, 23%). Five studies (out of 15, 33%) mistakenly used institution-related risk factors in models for hospital quality comparison.

Recommendation #3: does the model consider competing risks to readmission, particularly mortality?

About the recommendation

Death is a competing risk to readmission and may substantially impact readmission prediction depending on the target population.63 67 68 A high mortality rate may reduce model discrimination because death and readmission share similar predictive features. Ignoring mortality may limit insight about risk factors, and unaccounted changes in mortality may cause model drift. Developers should indicate that they accounted for both in-hospital and post-discharge mortality, as well as other competing risks to readmission (eg, transfers).28

Critical appraisal results

Thirteen models (16%) did not account for mortality, 40 (49%) accounted for in-hospital mortality only, 5 (6%) accounted for post-discharge mortality only and 21 (26%) accounted for both.

Recommendation #4: does the model identify how providers may intervene to prevent readmission?

About the recommendation

The expert group recognised that building actionable models, which identify where providers can intervene on risk factors to prevent readmissions, is critical to clinical usefulness. An actionable model may (1) identify modifiable risk factors on the individual level,36 62 90 or (2) identify which individuals will benefit most from intervention, which may not coincide with readmission risk. Notably, non-modifiable risk factors like age can obscure modifiable ones like polypharmacy or quality of care60 98 99; therefore, managing collinearity100 is important. In the future, predicting benefit will become easier as options for intervention become better researched.101

Critical appraisal results

Four published models (5%) identified modifiable factors on the individual level. No model predicted which individuals would benefit most from intervention.

Recommendation #5: does the model consider recent changes in the patient’s condition?

About the recommendation

A model that does not account for recent changes in the patient’s condition may give an outdated prediction, limiting its clinical usefulness and eroding trust in its predictions. The expert group recommended that models that give predictions near hospital discharge (ie, most current models) should account for changes during hospitalisation, including treatment effects, hospital-acquired conditions and social support status.

Critical appraisal results

Thirty-nine models (48%) accounted for changes during hospitalisation.

Recommendation #6: is the model’s timeframe an appropriate trade-off between sensitivity and statistical power?

About the recommendation

Researchers initially selected the 30-day timeframe as the optimal trade-off between statistical power and likelihood of association with the index admission.28 As common data models and health information exchange support larger datasets for model development, shorter timeframes such as 7 days may enable greater sensitivity for readmissions associated with the index admission without loss in statistical power.102 Therefore, developers should consider assessing prediction accuracy using multiple timeframes, as relevant to the clinical context and dataset size, to determine the best trade-off between sensitivity and statistical power. Timeframes should begin at discharge (as standardised by CMS)103 to prevent immortal person–time bias.63
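The timing rule above reduces to a one-line check; this sketch (ours, with hypothetical variable names) anchors the window at discharge so that days spent in hospital cannot count against it:

```python
# Illustrative sketch: label a readmission only if it falls in the
# half-open window (discharge, discharge + window_days], anchored at
# discharge rather than admission to avoid immortal person-time bias.
from datetime import date

def readmitted_within(discharge, readmission, window_days=30):
    """True if a readmission occurs within window_days of discharge."""
    if readmission is None:
        return False
    delta = (readmission - discharge).days
    return 0 < delta <= window_days
```

Anchoring at admission instead would shrink the effective window for patients with long stays, biasing labels against them.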

Critical appraisal results

Sixty-three models (78%) used the standardised 30-day timeframe adopted by CMS, while 2 (2%) used 7 days, 3 (4%) 28 days, 5 (6%) 60 days and 9 (11%) 1 year. Twelve studies (15%) considered more than one timeframe, of which 7 (9%) modelled readmission risk using hazard rates. Nine studies (11%) inappropriately defined timeframes as beginning at admission, rather than discharge, and 14 (17%) did not specify when their timeframe began.

Recommendation #7: does the model exclude either planned or unavoidable readmissions?

About the recommendation

Planned readmission is defined as non-acute readmission for scheduled procedures. Planned readmissions should be excluded, as consistent with the standardised definition of all-cause readmission.17–19 66 Unavoidable readmission is defined more broadly as readmission not preventable through clinical intervention.28 As researchers develop standardised algorithms to more effectively identify unavoidable readmissions,47 61 104 using the broader definition may enable greater sensitivity and improve the relevance of predictions to the clinical setting. Therefore, developers should consider excluding unavoidable readmissions if it is useful, such as in multiple sclerosis, where the disease inevitably progresses and later readmissions become increasingly unavoidable. Notably, exclusion criteria can be highly complex and require third-party processing (eg, Vizient). Ideally, developers should publish their code. If not, the readmission outcome should be sufficiently defined to ensure transparency and reproducibility.105

Critical appraisal results

Thirty-nine models (48%) did not explicitly exclude planned readmissions. The remaining models either excluded planned readmissions (38, 47%) or excluded unavoidable readmissions more broadly (4, 5%).

Recommendation #8: is the model equipped to handle missing data and is missingness in the development datasets reported?

About the recommendation

Developers should explicitly state whether their model handles missingness and how, such as designating a ‘missing’ category for categorical variables, or multiple regression imputation for continuous variables. Dropping individuals with excess missingness is problematic because it decreases models’ generalisability to future individuals with excess missingness and falsely increases model performance in cases of structural missingness. Developers should also report on missingness in the datasets used for model development, so that implementers can determine potential generalisability to their real-world datasets.
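As a toy illustration of the two strategies named above, with invented values: an explicit 'missing' level for a categorical feature, and a simple mean fill-in standing in for the regression-based imputation the text describes:

```python
# Illustrative sketch only. The mean fill-in below is a simplified
# stand-in for regression-based imputation of continuous variables.
def impute_categorical(values, missing_label="missing"):
    """Replace None with an explicit 'missing' category."""
    return [v if v is not None else missing_label for v in values]

def impute_continuous_mean(values):
    """Replace None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [v if v is not None else mean for v in values]
```

Either strategy keeps patients with incomplete records in the dataset, unlike complete-case dropping, which the text flags as problematic.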

Critical appraisal results

Only 34 studies (42%) discussed how their model handled missingness. Of these, 20 (25%) used one or more inappropriate techniques, including (1) dropping individuals with excess missingness (17, 21%), and (2) improperly performed binning or imputation (3, 4%).

Recommendation #9: is preprocessing discussed and does the model avoid problematic preprocessing, particularly binning?

About the recommendation

Developers should explain their data preprocessing methods, because problematic methods may produce models with less-than-optimal predictive performance.96 One example of a problematic method is binning.106 Originally intended to improve interpretability, binning can cause information loss, and is no longer justifiable given users’ need for accurate predictions and modern interpretability techniques. In particular, manual or arbitrary binning, without clustering or splines, may decrease performance and introduce noise.107
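A toy example of the information loss described above; the cut-points below are arbitrary illustrations, not from any guideline:

```python
# Illustrative sketch: once age is manually binned, patients 18 years
# apart become indistinguishable to the model, discarding within-bin
# variation that a continuous term (or spline) would retain.
def bin_age(age):
    if age < 45:
        return "<45"
    if age < 65:
        return "45-64"
    return "65+"
```

A 46-year-old and a 64-year-old receive identical model input despite very different readmission risk profiles, which is the information loss the recommendation warns against.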

Critical appraisal results

Only 27 studies (33%) discussed one or more data preprocessing techniques, despite mostly using regression models, which can be highly sensitive to small changes. Commonly discussed techniques included binning, interaction terms and transformations to resolve skewness, non-linearity and outliers.

Recommendation #10: does the model make use of all available data sources to improve performance?

About the recommendation

Developers should make use of publicly available data sources where possible and appropriate to the model’s purpose, such as the Social Security Death Index to determine post-discharge mortality (see recommendation #3) or curated public datasets to externally validate (see recommendation #13). Other data sources such as health information exchanges can help assess readmission at multiple institutions, which is desirable to better estimate the true readmission rate. When considering data sources from multiple institutions, such as with health information exchanges, developers should account for hospital-level patterns and clustering of readmission risk, which may occur because quality of care and data collection practices vary between institutions.36 61 68

Critical appraisal results

In the literature, data sources included claims (19, 23%), administrative datasets (33, 41%), electronic health records (42, 52%), disease-specific registries (12, 15%), research datasets (11, 14%), death registries (9, 11%), health information exchanges or linkages (4, 5%), and surveys or patient-generated health data (3, 4%). Seventy-five studies (93%) assessed readmission at only one institution, likely underestimating the true readmission rate. Thirty-nine studies (48%) used a single data source (administrative datasets: 15, 19%; electronic health records: 17, 21%).

Recommendation #11: does the model use electronically available data rather than relying on manual data entry?

About the recommendation

Developers should incorporate risk factors that will be available electronically at the time of prediction and avoid manual data entry by providers or research assistants. Manual data entry may inhibit widespread implementation, by consuming human resources and preventing automated generation of predictions.

Data extraction results

Twenty-six models (32%) relied on manual data entry.

Recommendation #12: does the model rely on data available in sufficient quantity and quality for prediction?

About the recommendation

Developers should indicate whether data included in their model can be accessed in sufficient amounts and quality for development and implementation. ‘Sufficient’ is subjective and requires consideration of real-world missingness. Automated quality assurance, which identifies erroneous entries (eg, age>120 years) and incorrect data combinations (eg, former smoker YES, never smoker YES), may help to improve quality.
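The automated quality-assurance checks suggested above could look like the following sketch, which reuses the article's own examples (implausible age, contradictory smoking flags) over a hypothetical record layout:

```python
# Hypothetical sketch of automated data quality assurance. The record
# keys ("age", "former_smoker", "never_smoker") are illustrative.
def qa_errors(record):
    """Return a list of quality problems found in one patient record."""
    errors = []
    if record.get("age") is not None and record["age"] > 120:
        errors.append("implausible age")
    if record.get("former_smoker") and record.get("never_smoker"):
        errors.append("contradictory smoking flags")
    return errors
```

Running checks like these before training or scoring surfaces erroneous entries and impossible combinations rather than letting them silently degrade predictions.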

Critical appraisal results

Twenty-three studies (28%) identified problems with either data quantity (17 out of 23, 74%) or quality (8 out of 23, 26%).

Recommendation #13: is the model internally validated using cross-validation or a similarly rigorous method?

About the recommendation

The importance of using repeated k-fold cross-validation or a similarly rigorous method is well established. Split-sample validation is insufficient and may cause unstable and suboptimal predictive performance.108–110 If the model is intended for generalised use at more than one institution, developers or implementers should confirm external validity using one or more external, representative and independent datasets, from another institution or source. Internal validation alone is insufficient to ensure generalisability.108
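For illustration, the repeated k-fold splits the recommendation favours over a single random split can be generated as below (a stdlib-only sketch; in practice an established library implementation would be used):

```python
# Illustrative sketch: repeated k-fold cross-validation index generation.
# Each repeat reshuffles the data and partitions it into k held-out folds.
import random

def repeated_kfold(n, k=5, repeats=3, seed=0):
    """Yield (train_idx, test_idx) pairs for `repeats` shuffled k-folds."""
    rng = random.Random(seed)
    indices = list(range(n))
    for _ in range(repeats):
        rng.shuffle(indices)
        folds = [indices[i::k] for i in range(k)]
        for held_out in folds:
            test = set(held_out)
            train = [i for i in indices if i not in test]
            yield train, held_out
```

Because every observation is held out once per repeat, performance estimates average over many splits instead of hinging on one arbitrary partition, which is why split-sample validation is less stable.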

Critical appraisal results

Ten models (12%) were not validated at all, 64 (79%) were internally validated only and 7 (9%) were internally and externally validated. For internal validation, 46 (57%) used random split-sample, 11 (14%) used split-sample by time, 12 (15%) used bootstrapping, 3 (4%) used cross-validation and 1 (1%) used out-of-bag estimates.

Recommendation #14: is the model’s discrimination reported and compared with known models where appropriate?

About the recommendation

It is commonly accepted practice to prominently and clearly report discrimination using appropriate and well-known measures beyond just the concordance (c) statistic.111 Where possible, comparison with an established baseline is essential, because so many models already exist. Developers should compare performance using statistical tests with cross-validation or another method, and only compare models with similar eligibility criteria.
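As a point of reference, the concordance (c) statistic mentioned above has a simple pairwise definition, sketched here for a binary readmission outcome:

```python
# Illustrative sketch: the c statistic is the fraction of
# (readmitted, not-readmitted) patient pairs in which the readmitted
# patient received the higher predicted risk; ties count one-half.
def c_statistic(risks, outcomes):
    """Pairwise concordance between predicted risks and 0/1 outcomes."""
    pos = [r for r, y in zip(risks, outcomes) if y == 1]
    neg = [r for r, y in zip(risks, outcomes) if y == 0]
    concordant = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return concordant / (len(pos) * len(neg))
```

A value of 0.5 indicates no discrimination and 1.0 perfect discrimination; for a binary outcome this coincides with the area under the receiver operating characteristic curve.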

Critical appraisal results

Seven models (8%) did not report discrimination. Commonly reported measures included the c statistic (47, 58%), sensitivity or specificity (23, 28%), area under the receiver operating characteristic curve (19, 23%), negative or positive predictive value (18, 22%), integrated discrimination improvement (5, 6%) and net reclassification improvement (3, 4%).

Recommendation #15: is the model calibrated if needed and is calibration reported?

About the recommendation

Proper calibration is critical for sorting patients in descending order of readmission risk for making intervention decisions. It is commonly accepted practice to report calibration using calibration curves with no binning.112 113 Reporting the Hosmer-Lemeshow (HL) goodness-of-fit statistic is insufficient, as a non-significant HL statistic does not imply the model is well calibrated, and the HL statistic is often insufficient to detect quadratic overfitting effects common to support vector machines and tree-based models.112

Critical appraisal results

Thirty-seven models (44%) did not assess calibration. Commonly reported measures included the HL statistic (29, 36%) and observed-to-expected ratios (17, 21%).
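For illustration, the observed-to-expected (O:E) ratio reported by some of the studies above is simply observed events divided by the sum of predicted risks:

```python
# Illustrative sketch: calibration-in-the-large via the O:E ratio.
# A well-calibrated model yields a ratio near 1; values above 1 indicate
# underprediction and values below 1 overprediction. This summarises
# overall calibration only and complements, not replaces, a calibration
# curve.
def observed_to_expected(risks, outcomes):
    """Observed readmission count divided by total predicted risk."""
    return sum(outcomes) / sum(risks)
```

Unlike the HL statistic, this single ratio makes the direction of miscalibration immediately visible, though it can still mask poor calibration within risk strata.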


In this study, we critically appraised readmission models using 15 Delphi-based expert recommendations for development and validation. Interestingly, we found that many published readmission models did not follow the experts’ recommendations. This included failure to internally validate (12%), failure to account for readmission at other institutions (93%), failure to account for missing data (68%), failure to discuss data preprocessing (67%) and failure to state the model’s eligibility criteria (33%). The high prevalence of these weaknesses in model development identified in the published literature is concerning, because these weaknesses are known to compromise predictive validity. Identification of weaknesses in these domains should undermine confidence in a model’s predictions.

In our expert surveys, several lessons emerged, most notably about improving models’ relevance to clinical care and integrating predictions into clinicians’ workflows. In particular, experts expressed concern that models identify the highest-risk patients rather than the patients who might benefit most from intervention, which led to recommendation #4. Experts also noted that the published literature captured in existing systematic reviews, and therefore our critical appraisal, focused on development and internal validation, suggesting that studies of external validation and implementation are less common. Additional research on external validation and implementation could improve readmission models by making them applicable to a broader patient population.

In the future, CAMPR may be a convenient teaching aid for model implementers and users at healthcare institutions, such as clinicians and healthcare administrators, as well as for model developers in academic and commercial research. CAMPR does not explain the detailed logic and methods of developing and implementing predictive models, and those looking for comprehensive advice should consult other resources. Finally, CAMPR is not intended as a reporting standard for academic studies, and responses to CAMPR recommendations should not be used to derive an overall score. An overall score may disguise critical weaknesses that should diminish confidence in model predictions. Rather than generating an overall score, consider the potential impact of failing to follow each recommendation, and how that may interact with the use of that model in the given patient population.

We developed CAMPR using a modified Delphi process consisting of two online rounds, which we found faster and more practical than the traditional three-round process with in-person meetings. Beyond readmission modelling, other predictive modelling domains in healthcare (eg, sepsis risk, mortality risk) could benefit from similar guidance. Thinking beyond better modelling techniques is essential, or model predictions will remain of limited clinical use. This includes thinking about how to generate better datasets, about model drift and maintenance over time,114 and about how clinicians should act on predictions.


Limitations

The study used a modified Delphi process, which may lack rigour compared with the traditional Delphi process. We used an ‘opt-in’ process to recruit experts, and this self-selection bias may have led to missed recommendations or opinions. Fewer participants responded to the second round than expected, although the number was sufficient for the Delphi process. Future revision of recommendations will likely be necessary as the field advances and developers adopt more modern techniques. CAMPR is not intended as a reporting standard, and a more formal evaluation of construct validity and generalisability would be needed before it could be used as such.

Data availability statement

Data are available on reasonable request. Please email the corresponding author.

Ethics statements

Ethics approval

The Columbia University Medical Center Institutional Review Board approved the studies (Protocol AAAR0148). We used online surveys to collect opinions about predictive models from expert researchers. In accordance with 45CFR46.101, the Columbia University Medical Center Institutional Review Board determined this to be a Category 2 exemption and did not require written informed consent.


  • Twitter @LisaGrossmanLiu

  • Contributors HS conceptualised this work. LGL and RR conducted and analysed the expert surveys. HS, CGW, DK and DKV provided supervision throughout the survey process. LGL and JRR conducted the critical appraisal and data extraction, and LGL performed the associated analyses. LGL drafted the manuscript, and all authors contributed to refining all sections and critically editing the paper.

  • Funding This work was supported by the National Library of Medicine (F31LM054013, PI: LGL).

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.