Original research
A LASSO-derived clinical score to predict severe acute kidney injury in the cardiac surgery recovery unit: a large retrospective cohort study using the MIMIC database
  1. Tucheng Huang1,2,3,
  2. Wanbing He1,2,3,
  3. Yong Xie1,2,3,
  4. Wenyu Lv1,2,3,
  5. Yuewei Li4,
  6. Hongwei Li1,2,3,
  7. Jingjing Huang1,2,3,
  8. Jieping Huang1,2,3,
  9. Yangxin Chen1,2,3,
  10. Qi Guo1,2,3,
  11. Jingfeng Wang1,2,3
  1. 1Department of Cardiology, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, China
  2. 2Guangzhou Key Laboratory of Molecular Mechanism and Translation in Major Cardiovascular Disease, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, China
  3. 3Guangdong Provincial Key Laboratory of Arrhythmia and Electrophysiology, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, China
  4. 4Department of Respiratory Medicine, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, China
Correspondence to Jingfeng Wang; wjingf{at}mail.sysu.edu.cn; Qi Guo; guoq69{at}mail.sysu.edu.cn; Yangxin Chen; chenyx39{at}mail.sysu.edu.cn

Abstract

Objectives We aimed to develop an effective tool for predicting severe acute kidney injury (AKI) in patients admitted to the cardiac surgery recovery unit (CSRU).

Design A retrospective cohort study.

Setting Data were extracted from the Medical Information Mart for Intensive Care (MIMIC)-III database, consisting of critically ill participants between 2001 and 2012 in the USA.

Participants A total of 6271 patients admitted to the CSRU were enrolled from the MIMIC-III database.

Primary and secondary outcome Stages 2–3 AKI.

Results As identified by least absolute shrinkage and selection operator (LASSO) and logistic regression, risk factors for severe AKI included age, sex, weight, respiratory rate, systolic blood pressure, diastolic blood pressure, central venous pressure, urine output, partial pressure of oxygen, sedative use, furosemide use, atrial fibrillation, congestive heart failure and left heart catheterisation, all of which were used to establish a clinical score. The areas under the receiver operating characteristic curve of the model were 0.779 (95% CI: 0.766 to 0.793) for the primary cohort and 0.778 (95% CI: 0.757 to 0.799) for the validation cohort. The calibration curves showed good agreement between the predictions and observations. Decision curve analysis demonstrated that the model could achieve a net benefit.

Conclusion A clinical score built by using LASSO regression and logistic regression to screen multiple clinical risk factors was established to estimate the probability of severe AKI in CSRU patients. This may be an intuitive and practical tool for severe AKI prediction in the CSRU.

  • acute renal failure
  • cardiac surgery
  • adult intensive & critical care

Data availability statement

Data are available upon reasonable request. The data set analysed to generate the findings for this study is available from the corresponding author on reasonable request.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

STRENGTHS AND LIMITATIONS OF THIS STUDY

  • Least absolute shrinkage and selection operator regression and multivariable logistic regression were used to establish a clinical score model.

  • The performance of this novel clinical score model in both the primary cohort and validation cohort was evaluated using the area under the receiver operating characteristic curve, calibration curves and decision curve analysis.

  • This novel clinical score model might not be suitable for those with a renal failure history.

  • External validation of this novel clinical score model was lacking.

Introduction

Acute kidney injury (AKI), a common complication in patients admitted to the intensive care unit (ICU) worldwide,1 2 is associated with adverse short-term and long-term prognoses.3 It has been reported that more than half of patients in the cardiac surgery recovery unit (CSRU) suffer from AKI of some stage,4 which is associated with high mortality and rehospitalisation rates.5 The early and rapid diagnosis and treatment of AKI may help reduce mortality and rehospitalisation rates. Although several biomarkers have been used for the early diagnostic and prognostic prediction of AKI,6 7 the clinical utilisation of these biomarkers has been limited. By the time the levels of these biomarkers increase, renal injury has already occurred. Thus, identifying critically ill patients at high risk of AKI is an important part of the overall management of CSRU patients.

Graphical calculation devices, presented as a scale or score that incorporates possible risk factors to make clinical prognostic predictions, have become increasingly popular. They have been extensively used to predict the probability of death or recurrence events for patients with cancer.8 Recently, some researchers established a clinical prediction model for forecasting the occurrence of AKI in patients undergoing cardiac surgery.9 However, that small, single-centre study did not exclude patients with chronic kidney disease and thus probably overestimated the occurrence of AKI; additionally, only logistic regression was used for variable selection. A machine learning model was also established to predict cardiac surgery-associated AKI, although the sample was small and urine output was neglected.10 Another study used a convolutional neural network model to predict severe AKI in the ICU, but patients with a previous diagnosis of chronic kidney disease were not excluded.11

Least absolute shrinkage and selection operator (LASSO) regression is well suited to variable selection because it can efficiently address potential associations between covariates, such as collinearity.12 Accordingly, in this study, we performed LASSO regression to select variables and built a logistic regression model to identify independent risk factors for severe AKI in patients admitted to the CSRU. We aimed to determine the risk factors for severe AKI and develop a clinical score for evaluating the probability that patients undergoing critical cardiac care will develop severe AKI.

Methods

Data source and ethics approval

The data were extracted from the Medical Information Mart for Intensive Care (MIMIC)-III data set. As a large and publicly available database, MIMIC-III comprises the clinical information for 61 532 ICU stay cases between 2001 and 2012. The use of the MIMIC-III database was approved by the review boards of the Massachusetts Institute of Technology and Beth Israel Deaconess Medical Center.13 Because the information used in the study was from a publicly deidentified database, the informed consent requirement was waived.

Study population

Adult ICU stays longer than 1 day were included. When a patient had multiple ICU admissions, only the first medical record was selected for the study. The exclusion criteria were as follows: patients in units other than the CSRU (n=24 074, 77.8%); patients with no urine output records (n=105, 0.3%); patients with no creatinine data (n=439, 1.4%) and patients with existing renal failure (n=39, 0.1%) (figure 1). During the CSRU stay, all creatinine and urine output records were extracted, and AKI was defined according to the Kidney Disease: Improving Global Outcomes (KDIGO) guidelines.14 Baseline serum creatinine was defined as the lowest creatinine in the past 7 days. Both urine output and serum creatinine criteria were used to identify AKI. Information about renal replacement therapy was not considered in this study. Severe AKI was defined as stage 2 or stage 3 AKI under the KDIGO criteria. Patients in the CSRU were screened, and a total of 6271 patients were included. Chronologically, the first 70% of patients were allocated to the primary cohort, and the last 30% were allocated to the validation cohort. Subsequently, we established a clinical score model by using the primary cohort data and validated the model by using the validation cohort.
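The outcome definition and the time-ordered cohort split described above can be sketched as follows. This is a minimal illustration, not the authors' extraction code: it stages AKI from serum creatinine only (the study also applied the KDIGO urine output criteria), it ignores the 48-hour window attached to the 0.3 mg/dL rise, and the function names, the `admit_date` field and the toy records are all assumptions made for the example.

```python
from datetime import date


def kdigo_creatinine_stage(baseline_mg_dl, current_mg_dl):
    """Simplified KDIGO stage (0-3) from serum creatinine alone.

    Assumption: urine output criteria and the 48-hour window for the
    0.3 mg/dL rise are deliberately left out of this sketch.
    """
    ratio = current_mg_dl / baseline_mg_dl
    rise = round(current_mg_dl - baseline_mg_dl, 2)  # avoid float noise
    if ratio >= 3.0 or (current_mg_dl >= 4.0 and rise >= 0.3):
        return 3
    if ratio >= 2.0:
        return 2
    if ratio >= 1.5 or rise >= 0.3:
        return 1
    return 0


def is_severe_aki(stage):
    """Severe AKI in this study means KDIGO stage 2 or 3."""
    return stage >= 2


def chronological_split(stays, primary_percent=70):
    """Earliest admissions go to the primary cohort, the rest to validation."""
    ordered = sorted(stays, key=lambda s: s["admit_date"])
    cut = len(ordered) * primary_percent // 100
    return ordered[:cut], ordered[cut:]


# Toy records (hypothetical), one admission per year.
stays = [{"stay_id": i, "admit_date": date(2001 + i, 1, 1)} for i in range(10)]
primary, validation = chronological_split(stays)
```

Splitting by admission time rather than at random means the validation cohort contains only patients admitted after every patient in the primary cohort.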

Figure 1

Flow chart of enrolled subjects. A total of 6271 CSRU stay records were enrolled in this study. CSRU, cardiac surgery recovery unit; ICU, intensive care unit.

Variable extraction

The following variables were extracted.

Demographics: age (years), sex, height (cm), and weight (kg).

Vital signs: heart rate (/min), respiratory rate (/min), temperature (°C), saturation of peripheral oxygen (%), blood glucose level (mg/dL), systolic blood pressure (SBP, mm Hg), diastolic blood pressure (DBP, mm Hg), central venous pressure (CVP, mm Hg) and mean arterial pressure (mm Hg). The mean value of vital signs in the 24 hours after admission was included for analysis.

Laboratory tests: white blood cell count (×10⁹/L), haemoglobin (g/L), platelets (×10⁹/L), chloride (mmol/L), sodium (mmol/L), blood urea nitrogen (mg/dL), bicarbonate (mmol/L), pH, partial pressure of oxygen (pO2, mm Hg), partial pressure of carbon dioxide (pCO2, mm Hg), creatinine (mg/dL) and potassium (mmol/L). The values of laboratory tests in the first 24 hours after admission were used for the analysis. In addition, 24-hour urine output was extracted.

Procedures: administration of furosemide, use of sedative, ventilation, vasopressor, cardiopulmonary bypass, coronary artery bypass grafting and left heart catheterisation. The sedative drugs in this study included midazolam, fentanyl and propofol.

Comorbidities: coronary artery disease, congestive heart failure, atrial fibrillation, stroke, diabetes, renal disease, liver disease, chronic obstructive pulmonary disease and malignancy.

All variables were collected in the initial 24 hours after admission to predict severe AKI as early as possible. The frequency of missing values for each variable was less than 15%. The missing values were filled in by the random forest method using R software.

Statistical analysis

Continuous variables are denoted as the mean±SD or the median (IQR), whereas categorical variables are expressed as numbers (percentages). Continuous data were compared with Student’s t test or the rank-sum test, while categorical data were compared using the χ2 test.

In this study, LASSO was performed for variable selection. LASSO regression is a shrinkage estimation method used to address collinearity between covariates. When there are several collinear predictors, LASSO selects only one and ignores the others or zeroes out some regression coefficients. Cross-validation was used during LASSO regression, and the 1−SE criterion was used to select lambda. Namely, lambda was set to the largest value for which the cross-validated error was within one SE of the minimum. ORs with 95% CIs, statistics describing the strength of the association between disease and exposure, were calculated by logistic regression, thus estimating the association of independent risk factors with AKI. Finally, a clinical score model was established based on the above analysis, which was further validated with C-indices, accuracy, sensitivity, specificity, positive predictive value, negative predictive value, receiver operating characteristic (ROC) curves, the areas under the ROC curves (AUCs), calibration curves and decision curve analysis. We used 10-fold cross-validation to identify the optimal clinical score model. Briefly, the primary cohort was randomly divided into 10 roughly equal-sized groups. One group was taken as a test data set, and the remaining groups were used as a training data set. The model was fitted on the training data set and evaluated on the test data set. After repeating the process 10 times, the optimal model with the best performance was identified.
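The 1−SE criterion can be illustrated with a short sketch. It assumes the cross-validation step has already produced, for each candidate lambda, a mean error and its standard error; the function name and the numbers below are invented for illustration and are not taken from the study.

```python
def select_lambda_1se(lambdas, cv_mean, cv_se):
    """Return (lambda_min, lambda_1se) from cross-validation summaries.

    lambda_min minimises the mean cross-validated error.
    lambda_1se is the largest lambda whose mean error is no more than
    one standard error above that minimum, giving a sparser model.
    """
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[i_min] + cv_se[i_min]
    within_one_se = [lam for lam, err in zip(lambdas, cv_mean) if err <= threshold]
    return lambdas[i_min], max(within_one_se)


# Hypothetical cross-validation output for five candidate penalties.
lambdas = [0.001, 0.01, 0.05, 0.1, 0.5]
cv_mean = [0.302, 0.290, 0.295, 0.310, 0.400]
cv_se = [0.010, 0.010, 0.010, 0.010, 0.010]

lambda_min, lambda_1se = select_lambda_1se(lambdas, cv_mean, cv_se)
```

Choosing the larger penalty trades a small loss in cross-validated error for fewer non-zero coefficients, which suits a score meant for bedside use.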

SPSS software (V.23.0, IBM, NY, USA) and R software (V.3.6.3, R Foundation for Statistical Computing, Vienna, Austria) were used for statistical analysis. The packages used in this study included missForest, glmnet, rms, pROC, caret and rmda. A two-sided p<0.05 was considered statistically significant.

Patient and public involvement

Patients and/or the public were not directly involved in this study.

Results

Patients with severe AKI comprised 55.9% (2452/4388) and 54.2% (1020/1883) of the primary and validation cohorts, respectively. No significant difference in the severe AKI rate was observed between the two cohorts (p=0.213). Except for SBP (primary cohort, 113.3 mm Hg vs validation cohort, 132.4 mm Hg, p=0.040), no clinical characteristics showed a significant difference between the primary and validation cohorts (table 1).
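The comparison of severe AKI rates between cohorts can be checked from the counts given above. The sketch below runs an uncorrected Pearson χ2 test on the 2×2 table; whether the authors applied a continuity correction is not stated, so this is an assumption. It gives a p value of roughly 0.21, in line with the reported 0.213.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value). With 1 degree of freedom the upper
    tail probability equals erfc(sqrt(statistic / 2)).
    """
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Counts from the text: severe AKI vs no severe AKI in each cohort.
primary_severe, primary_total = 2452, 4388
validation_severe, validation_total = 1020, 1883

statistic, p_value = chi_square_2x2(
    primary_severe, primary_total - primary_severe,
    validation_severe, validation_total - validation_severe,
)
```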

Table 1

Baseline characteristics of the enrolled subjects in the primary and validation cohorts

In the primary cohort, patients with severe AKI were older, had higher weights and had higher blood glucose levels than those without severe AKI (p<0.001). SBP and DBP were significantly lower (112.7 mm Hg vs 114.0 mm Hg and 56.6 mm Hg vs 57.9 mm Hg, respectively), while CVP was significantly higher (11.2 mm Hg vs 9.8 mm Hg) in the severe AKI group (p<0.001). Urine output and pO2 were lower in the severe AKI group (p<0.01). Treatment also differed: severe AKI patients received sedatives, ventilation and furosemide significantly more often (p<0.001). The stroke prevalence rates were the same, but a higher prevalence of atrial fibrillation, congestive heart failure and left heart catheterisation was observed in severe AKI patients (p<0.05) (table 2).

Table 2

Baseline characteristics of the severe AKI and non-severe AKI groups in the primary cohort

To confirm the possible risk factors for severe AKI, we performed LASSO regression to select variables. A total of 18 variables were retained for further analysis according to the 1−SE criterion (figure 2). Then, we conducted logistic regression analysis based on the LASSO results. A total of 14 variables were shown to be associated with severe AKI (table 3).

Figure 2

LASSO coefficient profiles of variables and misclassification errors for different models. The upper panel presents the associations between the coefficients of variables and the log lambda value. Each line corresponds to one distinct variable. With increasing log lambda, the coefficient of the variable tended towards 0. The lower panel presents the selection of the applicable model. Vertical lines were drawn at the optimal values by adopting the minimum criteria (dashed line) and the SE of the minimum criteria (dotted line, the 1−SE criteria). In our study, the lambda value was chosen according to the 1−SE criteria. LASSO, least absolute shrinkage and selection operator.

Table 3

Variables in the LASSO regression and multivariate logistic regression models

Next, we included the above significant factors to build a clinical score based on the logistic regression model (figure 3). Each level of every variable was assigned a score. By adding the scores for all of the selected variables, the total score was obtained. By checking the number corresponding to the total scores, the probability of severe AKI can be estimated for a given patient.
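How a score built on a logistic regression model maps a patient's values to a probability can be sketched as below. The intercept, coefficients and variable names are hypothetical placeholders chosen for illustration; they are not the fitted values from table 3 or the point scale in figure 3.

```python
import math

# Hypothetical values for illustration only; NOT the study's estimates.
INTERCEPT = -2.0
COEFFICIENTS = {
    "age_per_10_years": 0.20,
    "cvp_mm_hg": 0.05,
    "furosemide_use": 0.40,  # 1 if given, 0 otherwise
}


def contributions(patient):
    """Per-variable contribution to the linear predictor (the 'points')."""
    return {name: beta * patient.get(name, 0.0) for name, beta in COEFFICIENTS.items()}


def predicted_probability(patient):
    """Sum the contributions and apply the logistic function."""
    linear_predictor = INTERCEPT + sum(contributions(patient).values())
    return 1.0 / (1.0 + math.exp(-linear_predictor))


low_risk = {"age_per_10_years": 5.0, "cvp_mm_hg": 8.0, "furosemide_use": 0}
high_risk = {"age_per_10_years": 8.0, "cvp_mm_hg": 14.0, "furosemide_use": 1}
p_low = predicted_probability(low_risk)
p_high = predicted_probability(high_risk)
```

A graphical score does the same arithmetic on paper: each variable's contribution is rescaled to points, and the total is read against a probability axis.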

Figure 3

Clinical score for the prediction of severe AKI in CSRU patients. All 14 selected variables, including age, sex, weight, respiratory rate, SBP, DBP, CVP, urine output, pO2, sedative usage, furosemide, atrial fibrillation, congestive heart failure and left heart catheterisation, were given corresponding points based on their values. The total points of these variables corresponded to the predicted probability of severe AKI in the CSRU. AKI, acute kidney injury; CSRU, cardiac surgery recovery unit; CVP, central venous pressure; DBP, diastolic blood pressure; SBP, systolic blood pressure.

The C-indices were 0.779 for the primary cohort and 0.778 for the validation cohort. The ROC curves demonstrated that the model had good discriminative ability in both the primary cohort (AUC: 0.779, 95% CI: 0.766 to 0.793) and the validation cohort (AUC: 0.778, 95% CI: 0.757 to 0.799) (table 4). Calibration plots showed that the apparent curves were adjacent to the ideal curves in both the primary and validation groups. Finally, decision curve analysis was performed to compare the clinical usability and benefits of the model. The decision curves showed acceptable net benefits across a range of high risks of severe AKI in the primary and validation cohorts (figure 4).
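Two of the metrics reported above can be made concrete with a small sketch: the AUC, computed as the share of case/non-case pairs that the model ranks correctly, and the net benefit plotted in a decision curve at a given threshold probability. The data are toy values and the function names are assumptions; the study itself used the pROC and rmda packages in R.

```python
def auc(outcomes, scores):
    """AUC as the fraction of (case, non-case) pairs ranked correctly.

    Ties count as half. For a binary outcome this equals the C-index.
    """
    cases = [s for y, s in zip(outcomes, scores) if y == 1]
    controls = [s for y, s in zip(outcomes, scores) if y == 0]
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def net_benefit(outcomes, probabilities, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold.

    Net benefit = TP/n - FP/n * threshold / (1 - threshold).
    """
    n = len(outcomes)
    true_pos = sum(1 for y, p in zip(outcomes, probabilities) if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in zip(outcomes, probabilities) if p >= threshold and y == 0)
    return true_pos / n - false_pos / n * threshold / (1.0 - threshold)


# Toy data (hypothetical): 1 = severe AKI, 0 = no severe AKI.
outcomes = [0, 0, 1, 1]
predicted = [0.10, 0.40, 0.35, 0.80]

toy_auc = auc(outcomes, predicted)
toy_net_benefit = net_benefit(outcomes, predicted, threshold=0.5)
```

A decision curve repeats the net benefit calculation over a range of thresholds and compares the model with the strategies of treating everyone and treating no one.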

Figure 4

Performance evaluation of the severe AKI prediction model. ROC curves in the primary cohort (A) and validation cohort (B). The AUCs of the model in the primary and validation cohorts were 0.779 and 0.778, respectively. Calibration curves in the primary cohort (C) and validation cohort (D). The observed values were close to the ideal values, indicating a satisfactory forecasting performance of the clinical score model. Decision curve analyses in the primary cohort (E) and validation cohort (F), showing the net benefit from the model. AKI, acute kidney injury; AUC, area under the receiver operating characteristic curve; ROC, receiver operating characteristic.

Table 4

Model performance in the primary and validation cohorts

We also evaluated the model performance after excluding the variable of urine output. Without urine output information, the model still showed acceptable discriminative ability in both the primary cohort (AUC: 0.713, 95% CI: 0.698 to 0.728) and the validation cohort (AUC: 0.718, 95% CI: 0.695 to 0.741) (online supplemental table 1). For patients who did not develop AKI in the initial 24 hours after admission, the model performed with an AUC of 0.680 (95% CI: 0.651 to 0.709) in the primary cohort and an AUC of 0.673 (95% CI: 0.630 to 0.715) in the validation cohort (online supplemental table 2).

Discussion

AKI is a complicated clinical syndrome characterised by reduced urine production and/or rapid increases in serum creatinine.15 AKI has been reported to be positively associated with short-term mortality in CSRU populations.5 16 Delayed diagnosis of AKI is an independent risk factor for nosocomial death.17 Therefore, the early identification of patients at risk for AKI might help to reduce short-term mortality, improve prognosis, and reduce the healthcare burden.

In this study, we extracted the clinical information of 6271 patients from the MIMIC-III database. We identified the following 14 possible risk factors for severe AKI by LASSO regression and logistic regression: age, sex, weight, respiratory rate, SBP, DBP, CVP, urine output, pO2, sedative use, furosemide, atrial fibrillation, congestive heart failure and left heart catheterisation. Subsequently, a clinical score model was constructed by quantifying the weight of the aforementioned variables. The clinical score model was well fitted, as evaluated by the AUC, calibration curves and decision curve analysis in both the primary and validation cohorts. The model could calculate a severe AKI probability immediately after the initial 24 hours and might help clinicians perform early intervention.

Several scoring systems and prognostic models have been built to predict AKI. Scoring systems such as the Cleveland Clinic Score18 and the Mehta Score19 only consider AKI patients requiring dialysis and thus might miss patients with subclinical AKI. Additionally, clinical prediction models have been used to forecast AKI in patients undergoing cardiac surgery9 or coronary angiography.20 These studies enrolled both mild and severe AKI patients. Our model was generated from the MIMIC-III database, with a larger sample size and more variables. This study predicted only severe AKI, which might be more attractive for clinical practice. Moreover, the primary cohort and validation cohort were assigned by admission time. According to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement, non-random assignment by time is a stronger design feature for evaluating model performance than random assignment.21

LASSO regression is a popular variable selection algorithm for multicollinear data or high-dimensional data.22 LASSO has been widely used for clinical prediction. For example, via LASSO, researchers have built a clinical model to predict the diagnosis and prognosis of colon cancer.23 A radiomics signature using LASSO has been developed to evaluate survival in patients with non-small-cell lung cancer.24 LASSO has been used to predict AKI in patients with haematologic tumours, patients undergoing cardiac surgery or patients hospitalised in the neurosurgical ICU.12 22 25 In the present study, based on clinical profiles, LASSO was performed to select relevant coefficients from a multitude of variables, simultaneously removing unrelated variables. Through dimensionality reduction using LASSO, 42 clinical variables were screened down to 18 candidates according to the 1−SE criterion, of which 14 remained as risk factors after logistic regression.

Among those 14 variables, older age and obesity were independent risk factors for AKI, as indicated by previous investigations.26 27 Additionally, hypotension has been reported to be associated with new-onset AKI in ICU patients with shock.28 High CVP, indicating fluid overload, is another factor affecting AKI.29 Consistent with previous studies, these risk factors were included in the clinical score model and given a weighted score. Reduced urine output is a clinical manifestation of AKI and is also an important factor underlying the poor prognosis of AKI. In this study, decreased urine output was one of the most important predictors of AKI in CSRU patients. Overall, the clinical score model contained 14 variables, more than half of which have been reported to be associated with AKI. In addition, ROC curves, calibration curves and decision curve analysis showed consistent results in both the primary and validation cohorts, showing that the clinical score model could be an effective and reliable tool for predicting the risk of severe AKI.

Several limitations of our study must be noted. First, this study was based on the MIMIC-III database, whose data were collected between 2001 and 2012. Some therapies might not meet the latest guidelines and some newer medicines might not be included. Because of the single-centre nature of the data, the performance of our model might vary when applied to other regions. The potential residual confounding by variables not recorded in this database could not be evaluated. Second, only patients without existing renal failure were included in this study. Thus, this novel score model might not be suitable for those with a renal failure history. Third, missing values were filled by the random forest method, which might lead to biased regression coefficient estimates.30 Therefore, further studies are needed to verify our model. Fourth, our model was designed to be used immediately after the initial 24 hours of admission, and it may not work for patients who suffer AKI within those initial 24 hours.

Conclusion

In conclusion, this study established and validated a novel clinical score by using LASSO regression and logistic regression to screen for multiple clinical risk factors to estimate the probability of severe AKI in CSRU patients. This clinical score model can be an intuitive and reliable predictive tool that might help in individualised clinical decision-making and risk management for severe AKI.

Ethics statements

Patient consent for publication

Ethics approval

This study involves human participants. The data were extracted from the Medical Information Mart for Intensive Care (MIMIC)-III data set. As a large and publicly available database, MIMIC-III comprises the clinical information for 61 532 ICU stay cases between 2001 and 2012. The use of the MIMIC-III database was approved by the review boards of the Massachusetts Institute of Technology and Beth Israel Deaconess Medical Center. Since the information used in the study was from a publicly deidentified database, informed consent was waived. No reference number or ID was available for this public database.

Acknowledgments

We would like to thank the participants, developers and investigators associated with the Medical Information Mart for Intensive Care (MIMIC)-III database.

Footnotes

  • TH and WH contributed equally.

  • Contributors TH: conceptualisation, data analysis, writing original draft, writing review and editing. WH: conceptualisation, writing original draft, writing review and editing. YX, WL and YL: writing original draft and data curation. HL, JJH and JPH: literature search and data interpretation. YC, QG and JW: conceptualisation, writing review and editing and data curation. QG accepts full responsibility for the finished work and the conduct of the study, had access to the data, and controlled the decision to publish. QG was confirmed as guarantor.

  • Funding This study was supported by grants from the National Natural Science Foundation of China (nos. 82070237, 81870170, 81770229, 81970200), Guangdong Basic and Applied Basic Research Foundation (no. 2020A1515110313), Bioland Laboratory (Guangzhou Regenerative Medicine and Health Guangdong Laboratory) (no. 2019GZR110406004), Guangzhou Science and Technology Bureau (nos. 201803040010, 201707010206, 202102010007) and Yat-sen Start-up Foundation (no. YXQH202014).

  • Competing interests None declared.

  • Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting or dissemination plans of this research.

  • Provenance and peer review Not commissioned; externally peer reviewed.
