
Original research
Applicability of predictive models for 30-day unplanned hospital readmission risk in paediatrics: a systematic review
  1. Ines Marina Niehaus1,
  2. Nina Kansy1,
  3. Stephanie Stock2,
  4. Jörg Dötsch3,
  5. Dirk Müller2
  1. 1Department of Business Administration and Health Care Management, University of Cologne, Cologne, Germany
  2. 2Institute for Health Economics and Clinical Epidemiology, University of Cologne, Cologne, Germany
  3. 3Department of Paediatrics and Adolescent Medicine, University Hospital Cologne, Cologne, Germany
  1. Correspondence to Ines Marina Niehaus; niehaus{at}wiso.uni-koeln.de

Abstract

Objectives To summarise multivariable predictive models for 30-day unplanned hospital readmissions (UHRs) in paediatrics, describe their performance and completeness in reporting, and determine their potential for application in practice.

Design Systematic review.

Data source CINAHL, Embase and PubMed up to 7 October 2021.

Eligibility criteria English or German language studies aiming to develop or validate a multivariable predictive model for 30-day paediatric UHRs related to all-cause, surgical conditions or general medical conditions were included.

Data extraction and synthesis Study characteristics, risk factors significant for predicting readmissions and information about performance measures (eg, c-statistic) were extracted. Reporting quality was addressed by the ‘Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis’ (TRIPOD) adherence form. The study quality was assessed by applying six domains of potential biases. Due to expected heterogeneity among the studies, the data were qualitatively synthesised.

Results Based on 28 studies, 37 predictive models were identified, which could potentially be used for determining individual 30-day UHR risk in paediatrics. The number of study participants ranged from 190 children to 1.4 million encounters. The two most common significant risk factors were comorbidity and (postoperative) length of stay. 23 models showed a c-statistic above 0.7 and are primarily applicable at discharge. The median TRIPOD adherence of the models was 59% (P25–P75, 55%–69%), ranging from a minimum of 33% to a maximum of 81%. Overall, the quality of many studies was moderate to low in all six domains.

Conclusion Predictive models may be useful in identifying paediatric patients at increased risk of readmission. To support the application of predictive models, more attention should be placed on completeness in reporting, particularly for those items that may be relevant for implementation in practice.

  • health services administration & management
  • health & safety
  • risk management
  • paediatrics


http://creativecommons.org/licenses/by-nc/4.0/

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.


Strengths and limitations of this study

  • Independent and standardised methodological approach for study selection, data extraction and risk of bias assessment.

  • Comprehensive presentation of predictive models that provide information about applicability, performance and reporting quality at a model level, differentiated by 30-day all-cause, surgical conditions and general medical condition-related paediatric unplanned hospital readmissions.

  • Due to study heterogeneity, the models were only narratively synthesised.

Introduction

Hospital readmissions (HRs) are becoming increasingly important as a quality indicator for paediatric inpatient care.1 2 HR is often defined as a subsequent, unplanned admission within a period of 30 days after the index hospitalisation.3 For paediatric populations, rates of all-cause 30-day unplanned hospital readmission (UHR) ranged from 3.4% to 18.7%.3–5 In addition, taking 27 US states into account, it has been estimated that paediatric HRs can cost up to $2 billion annually, with approximately 40% of these HRs being potentially preventable.6

Identifying the reasons for paediatric HRs is a major challenge, as the health of children is also affected by factors beyond inpatient care.7 Predictive models can be applied as a tool to identify patients whose risk of HR is higher than that of the average population and to implement preventive interventions that reduce the risk of HR.8 Especially in the context of the ongoing COVID-19 pandemic, in which children and adolescents are also being hospitalised with a variety of symptoms,9–11 the prevention of UHRs can be beneficial, as it would allow hospital resources to be used in a more targeted way.

This systematic review aimed to address two research gaps that have been identified:

  1. Predictive models with good performance are useful in practice when clinicians and other stakeholders have all the information necessary for their application in clinical practice and for critical assessment.12 However, previous systematic reviews have discussed shortcomings in the reporting quality of prediction models13–15 and of paediatric clinical prediction rules.16

  2. A previous systematic review has already identified 36 significant risk factors for UHRs in paediatric patients with different health conditions.3 The largest number of risk factors was identified for surgical procedure-related UHRs. Among others, comorbidity was one of the most common risk factors across the 44 included studies.3 The review3 extends the findings of an earlier systematic review that focused on 29 paediatric studies targeting predictors for asthma-related UHRs17.

Both reviews3 17 primarily addressed predictor finding studies,14 while to date, there is no published review of existing 30-day UHR predictive models in paediatrics.

The objective of this systematic review was to determine the potential application of multivariable predictive models for individualised risk prediction of 30-day UHR in the paediatric population by evaluating the models’ discriminative ability, completeness in reporting and the risk factors shown to be significant for prediction of 30-day UHR.

Method

The 2020 Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) statement was followed for the conduct and reporting of this systematic review.18 Screening of titles and abstracts, data extraction, quality assessment and analyses (eg, completeness in reporting) were performed by two independent reviewers, and disagreements were resolved in discussion with a third author. A protocol for this non-registered systematic review was prespecified and is available from the corresponding author. Based on expert recommendation, the analysis was subsequently focused on 30-day UHRs instead of 30-day HRs (ie, planned HRs and UHRs), deviating from the prespecified protocol.

Data source and search strategy

CINAHL, Embase and PubMed were used for an electronic database search to identify studies published up to 7 October 2021. The key search terms included the outcome variables used for the model (ie, readmission/rehospitalisation), elements of the study design (ie, prediction/c-statistic) and the population of interest (ie, paediatrics/children) (see online supplemental material for full search strategies—online supplemental tables A1–A3). The reference lists of the included studies and of comparable systematic reviews3 17 were examined for further potential studies.

Inclusion criteria

Studies addressing multivariable predictive models for children and adolescents (except newborns/preterm newborns, as the index admission is the birth hospitalisation) were included if they were published in English or German and available as full texts in peer-reviewed original journal articles. Studies aiming to develop a new model or to validate an existing model were included (1) if the model was potentially appropriate for the individual prediction of 30-day UHR from acute healthcare services after discharge or after the index procedure in paediatrics and (2) if the model provided at least one discrimination measure (eg, c-statistic). Discriminative ability is a key factor in evaluating predictive models19 and necessary information for drawing well-founded conclusions about a model's performance. In addition, (3) predictive model studies that developed a new model (ie, development design) or determined the incremental or added value of a predictor for an existing model (ie, incremental value design) had to be based on a regression modelling approach. This inclusion criterion enabled us to identify significant risk factors and to apply the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) adherence form, which was originally developed for regression models.20 This implies that predictive models using machine-learning (ML) techniques (eg, least absolute selection and shrinkage operator21 or random forest22) were excluded and coded as non-regression models. Studies that aimed to identify 30-day UHR predictors and did not provide a discrimination measure were classified as prognostic factor studies and thus excluded from the analysis (so as not to penalise them unfairly in the TRIPOD adherence assessment). Prognostic factor studies, for example, are not required to present a simplified scoring rule (cf. TRIPOD item 15b23). Due to the specific requirements of mental health conditions, studies were only included (4) if they addressed 30-day UHRs unrelated to mental health conditions.3

Data extraction

Studies were categorised by health condition in all tables, as in previous systematic reviews.3 24 Basic study characteristics were extracted according to the criteria in tables 1 and 2. To assess the applicability of the predictive models, significant risk factors (ie, odds ratio (OR) or hazard ratio >1 with a p value of <0.05) were assigned to established and revised variable categories3 in table 3. A model is applicable at admission if all of its variables are available for a patient at the time of index admission (eg, previous health service usage before index admission). A model is applicable at discharge if all of its variables are available for a patient at that point (eg, length of stay and operative time).
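To make this assignment rule concrete, the following minimal Python sketch (our illustration; the predictor names and availability sets are hypothetical and not taken from any included study) classifies a model as applicable at admission or at discharge according to when its variables become available.

```python
# Minimal sketch of the applicability rule described above (our illustration,
# not taken from any included study). A model is applicable at admission only
# if every predictor is already known at index admission; otherwise it is
# applicable at discharge, provided all remaining predictors are known by then.

# Hypothetical availability sets for illustrative predictor names
AVAILABLE_AT_ADMISSION = {"age", "sex", "comorbidity", "prior_admissions"}
AVAILABLE_AT_DISCHARGE = AVAILABLE_AT_ADMISSION | {"length_of_stay", "operative_time"}


def applicability(model_predictors: set) -> str:
    """Return the earliest time point at which all model predictors are available."""
    if model_predictors <= AVAILABLE_AT_ADMISSION:
        return "admission"
    if model_predictors <= AVAILABLE_AT_DISCHARGE:
        return "discharge"
    return "not determinable from the listed variables"


print(applicability({"age", "comorbidity"}))      # -> admission
print(applicability({"age", "length_of_stay"}))   # -> discharge
```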

Table 1

Summary of study characteristics for all-cause 30-day UHR predictive models

Table 2

Summary of study characteristics for surgical and general medical conditions-related 30-day UHR predictive models

Table 3

Significant risk factors for 30-day unplanned hospital readmission predictive models with a development or incremental value design

Reporting quality and performance

Predictive models can only be used in practice when clinicians and other stakeholders have access to all the information required for their application in clinical practice.12 The newly developed 'Critical Appraisal of Models that Predict Readmission (CAMPR)' contains 15 expert recommendations for the development of predictive models relating to HRs. However, CAMPR is not yet intended as a reporting standard and relates to aspects that are beyond the scope of this systematic review (eg, considering different time frames for UHRs).25 Given the importance of high-quality information about predictive models, we decided to assess completeness in reporting by using the TRIPOD adherence form and scoring rules.12 23 26 The TRIPOD adherence form consists of 22 main criteria based on the TRIPOD statement,20 resulting in 37 items that are applicable to varying degrees to development, validation and incremental value studies.23 We applied the TRIPOD adherence form at the level of the predictive model. Therefore, publications that report, for example, the development and validation of the same predictive model are assessed separately. In line with previous research, our analysis concentrates on items that could be reported in the main text or supplements.27
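As a simplified illustration of how such a model-level adherence score can be computed (a sketch under our own assumptions, not the published scoring rules in full detail), the following snippet divides the number of adequately reported items by the number of items applicable to the model type.

```python
# Minimal sketch (our illustration, not the published TRIPOD scoring script)
# of an adherence score at model level: the share of applicable TRIPOD items
# that a model report addresses. Item identifiers and values are hypothetical.

def tripod_adherence(items: dict) -> float:
    """items maps a TRIPOD item (eg '10a') to True/False if applicable to the
    model type, or to None if the item is not applicable and is skipped."""
    applicable = [reported for reported in items.values() if reported is not None]
    return 100 * sum(applicable) / len(applicable)


example_model = {"1": True, "2": False, "6a": True, "10a": False, "10e": None}
print(f"{tripod_adherence(example_model):.0f}% adherence")  # 2 of 4 applicable items -> 50%
```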

TRIPOD adherence at model level was merged with the performance results (ie, discrimination and calibration measures) and the applicability assignment in table 4. The discrimination of a predictive model is often evaluated by the c-statistic, or area under the receiver operating characteristic curve. The c-statistic typically takes a value between 0.5 and 1. A value of 0.5 indicates that the model is no better than a random prediction of the outcome, while values between 0.7 and 0.8 indicate that the model is appropriate. A value of 0.8 or greater indicates strong discrimination of a model.28
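To illustrate the calculation (a sketch with made-up outcome and risk values, using scikit-learn rather than any software mentioned in the included studies), the c-statistic is the area under the ROC curve comparing predicted risks with observed readmissions.

```python
# Minimal sketch (made-up data, not from any included study) showing how the
# c-statistic / area under the ROC curve of a readmission model is typically
# computed from observed outcomes and predicted risks.
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 0, 1, 0, 0, 1]                          # 1 = 30-day UHR occurred
y_prob = [0.10, 0.30, 0.80, 0.20, 0.60, 0.65, 0.15, 0.70]  # model-predicted risk

c_statistic = roc_auc_score(y_true, y_prob)
print(f"c-statistic = {c_statistic:.2f}")  # ~0.93 here; 0.5 = chance, 0.7-0.8 appropriate, >=0.8 strong
```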

Table 4

Performance, application and TRIPOD adherence of 30-day UHR predictive models in paediatrics (n=37)

Quality assessment

Following previous systematic reviews,3 24 29 the refined version of the quality in prognosis studies (QUIPS) tool with its prompting items30 was used to appraise the studies critically with regard to the included predictive models based on six domains. Each domain was rated with a ‘high’, ‘moderate’ or ‘low’ risk of bias.

The six domains are30 ‘study participation’, ‘study attrition’, ‘prognostic factor measurement’, ‘outcome measurement’, ‘study confounding’ and ‘statistical analysis and reporting’.

Data synthesis

Because the high heterogeneity among the studies precluded a quantitative evaluation in the form of a meta-analysis, the studies were qualitatively synthesised; that is, the results for performance, completeness in reporting and significant risk factors were presented in a narrative and simplified quantitative form.

Patient and public involvement

Due to the study design, we did not involve patients or the public.

Results

Search result

The electronic database search yielded 10 076 records. After duplicates had been removed, the titles and abstracts of 7694 records were screened. Based on the predefined inclusion criteria, 7586 records were excluded. After adding one additional recommended article,31 109 records were included in the full-text assessment. Among the 84 excluded records, 2 were predictive model studies for 30-day HRs (ie, UHRs and planned HRs) with discrimination metrics32 33; 12 studies analysed 30-day UHRs or 30-day HRs combined with another outcome (ie, emergency department return visits (n=5),34–38 mortality (n=3)39–41 and other complications (n=4)42–45); 3 were predictive model studies for 30-day UHRs or 30-day HRs with no discrimination metrics46–48; 5 were non-regression-based predictive model studies for 30-day UHRs or 30-day HRs in paediatrics21 49–52; and 59 were prognostic factor studies for 30-day UHRs or 30-day HRs. Based on the full-text assessments (n=25) and the hand search of reference lists (n=3),53–55 28 studies were included in the systematic review, 6 of which55–60 had already been presented in a previous systematic review3 with a different focus. The results of the review process regarding the database search are provided in online supplemental figure A1 in the online supplemental material (see online supplemental table A4 in the online supplemental material for a summary of study characteristics of selected excluded models).

Quality assessment

Overall, the quality of many studies was moderate to low for several domains. For instance, the study quality rating had to be downgraded owing to a lack of sufficient information (eg, in the domains 'study participation' and 'study attrition'), while all studies were rated as 'low' for the domain 'study confounding' (see online supplemental table A5 in the online supplemental material for the results of the risk of bias assessment).

Study characteristics

All studies were based on retrospective data, with 9 studies based on tertiary or paediatric hospital data,22 55 61–67 and 19 studies based on centralised databases31 53 54 56–60 68–78. Four of the 28 studies additionally included census data in the analysis.61 65 66 68 The period of data collection ranged from 1 year31 53 54 60 63 68 to 17 years69 70. The majority of studies included patients up to an age of <18 or ≤18 years. Only 5 studies considered patients up to 21 years of age59 64 71 or younger than 1 year74 76. The sample size was specified in different units across the individual studies (eg, encounters and admissions) and varied between 190 children74 and 1.4 million encounters69.

The 28 included studies yielded 37 predictive models for 30-day UHRs in paediatrics. 10 of the 28 studies developed or validated more than one predictive model for UHRs,22 58 59 65–70 75 some of which were excluded because they did not meet the inclusion criteria. The included models were grouped into three health conditions: (1) all-cause UHR (n=13),22 61 63–65 68 69 (2) surgical condition-related UHR (n=17)31 53 54 56–60 67 70 73–75 77 78 and (3) general medical condition-related UHR (n=7)55 62 66 71 72 76. The 30-day UHR rates varied from 1.5%53 to 41.2%71.

Among the 37 predictive models included, 32 (87%) used a development design22 31 53–61 63–67 70–78; 3 (8%) used an external validation design62 65 69; and 2 (5%) used an incremental value design66 68. All externally validated models were based on existing predictive models that had previously been used in the adult population65 69 or for different outcomes62. Furthermore, 5 of the 28 included studies did not state the development, external validation or assessment of the incremental value of the respective 30-day UHR predictive model as their primary aim.65 67–70

Of the predictive models with a development or incremental value design, 18 employed an apparent validation31 53–55 58–61 67 68 73–78 and 16 employed an internal validation22 56 57 63–66 70–72. The most commonly applied internal validation method was cross-validation (n=8),22 63 64 followed by split sample (n=5)56 65 70–72 and bootstrapping (n=3)57 66. To analyse the data, either a logistic regression22 31 53–55 57–61 63–68 70–78 or a Cox proportional hazards regression56 was used. Most models presented their results as ORs with 95% CIs. Results with a p value of <0.05 were considered statistically significant.3 A summary of the characteristics of all included studies is provided in tables 1 and 2.
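As a hedged, self-contained sketch of this regression approach (synthetic data and hypothetical predictors, not reproducing any included model), the following code fits a logistic regression for 30-day UHR and reports ORs with 95% CIs and p values, mirroring how significance was judged here.

```python
# Minimal sketch (synthetic data; not the model of any included study) of the
# regression approach described above: fit a logistic regression for 30-day
# UHR and express predictor effects as ORs with 95% CIs and p values.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "comorbidity": rng.integers(0, 2, n),      # hypothetical binary predictor
    "length_of_stay": rng.integers(1, 15, n),  # hypothetical predictor (days)
})
# Simulate the outcome so the example is self-contained
logit = 0.7 * df["comorbidity"] + 0.1 * df["length_of_stay"] - 2.5
df["readmitted_30d"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = sm.add_constant(df[["comorbidity", "length_of_stay"]])
fit = sm.Logit(df["readmitted_30d"], X).fit(disp=0)

summary = pd.concat(
    [np.exp(fit.params).rename("OR"), np.exp(fit.conf_int()), fit.pvalues.rename("p")],
    axis=1,
)
print(summary)  # predictors with OR > 1 and p < 0.05 would count as significant risk factors
```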

Applicability and significant risk factors in predictive models

Based on the 28 predictive models with a development or incremental value design, 25 significant risk factors associated with 30-day UHRs were identified (see table 3). The most common risk factors were comorbidity (n=18), (postoperative) length of stay (n=10), illness severity (n=9) and principal procedures (n=9). The significant risk factors were inconsistently defined across predictive models, allowing direct comparison only to a limited extent. ORs for comorbidity ranged from 1.0172 to 10.0858 across predictive models. A length of stay of ≥15 days (OR=2.39)61 and a postoperative length of stay of >4 days (hazard ratio=3.12)56 were each considered a major risk factor. 'Intensive care unit stay' (OR=3.302)67 and 'isolated primary anterior spinal fusion' (OR=7.65)54 were among the most pronounced risk factors for illness severity and principal procedures, respectively. The risk factor with the highest OR was 'any inpatient complication' (OR=180.44).53 For all-cause UHRs, UHRs related to surgical conditions and UHRs related to general medical conditions, 14, 19 and 12 significant risk factors were found, respectively.

Most predictive models are potentially applicable at discharge (n=33), while 4 predictive models can be used at index admission,22 63 66 76 based on the significant and examined variables (see online supplemental table A6 in the online supplemental material for an overview of variables and table 4 for an application description).

Completeness in reporting and discriminative ability at model level

Information about TRIPOD adherence and performance at model level is provided in table 4. The median TRIPOD adherence of the models was 59% (P25–P75, 55%–69%; average: 60%), ranging from 33%69 to 81%66. Developed predictive models had more favourable reporting quality than externally validated models (ie, 59% (P25–P75, 55%–69%; average: 61%) compared with 44% (P25–P75, 39%–50%; average: 44%), respectively). Two models with poor adherence in reporting were based on an external validation design, and their validation was not the primary aim of the respective studies.65 69

Including all 37 items, we found that the overall median adherence per TRIPOD item across models was 65% (P25–P75, 32%–92%; average: 57%), ranging from 0% to 100% (see online supplemental table A7 in the online supplemental material for a detailed description by model type). The overall adherence per TRIPOD item is illustrated in figure 1.
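For transparency about how these summary figures are obtained (a sketch with hypothetical scores, not the extracted data), the median and P25–P75 values are simple percentiles across the per-model or per-item adherence scores.

```python
# Minimal sketch (hypothetical adherence scores) of the summary statistics used
# above: the median and the 25th-75th percentiles of TRIPOD adherence.
import numpy as np

scores = np.array([33, 41, 55, 59, 59, 61, 69, 79, 81])  # hypothetical % per model
median = np.median(scores)
p25, p75 = np.percentile(scores, [25, 75])
print(f"median {median:.0f}% (P25-P75, {p25:.0f}%-{p75:.0f}%), mean {scores.mean():.0f}%")
```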

Figure 1

Overall adherence per TRIPOD item across all included predictive models (n=37). Notes: Percentages relate to the number of models for which an item was applicable (in this case, the respective item should have been reported). *Indication of derivation from the total number of models for which a TRIPOD item was applicable (N=# of models for which the TRIPOD item is applicable): 10a (N=34), 10b (N=34), 10c (N=4), 10e (N=2), 11 (N=5), 12 (N=5), 13c (N=5), 14a (N=34), 14b (N=32), 15a (N=34), 15b (N=34), 17 (N=1), 19a (N=5). TRIPOD, Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis

14% of the models reported the title (item 1) completely, while 19%62–66 68 of the models mentioned the predictive model type in this context. 3% of the models had a complete abstract (item 2). The detailed predictor definition (item 7a) was fulfilled for most models (95%), in contrast to the outcome definition (item 6a, reported in 70%). The handling of predictors in the analysis (item 10a) was incompletely reported in 82% of the models. In addition, the handling (item 9, reported in 35%) and reporting of missing values (part of item 13b, reported in 32%) were not addressed in many models. Just 9% of the models displayed complete reporting of the model-building procedure (item 10b), as the majority of the models (91%) did not address the testing of interaction terms22 31 53–61 64–68 70 72–75 77 78. The description (item 10d) and reporting of performance measures (item 16) were incomplete in 68% and 89% of the models, respectively. Just 24% of the models addressed the results of calibration measures (cf. table 4). No model presented the full predictive model (item 15a), eg, by providing an intercept. An explanation of how to use the prediction model (item 15b, eg, by a simplified scoring rule) was presented in 21% of the models. One model provided detailed information about a simplified scoring rule (item 15b) in the online supplemental material.66

The discriminative ability (c-statistic) of the models ranged from 0.2862 to 0.8753. 14 of the 37 predictive models had a c-statistic of <0.7. The linear correlation between c-statistic and TRIPOD score at model level was not statistically significant (r=−0.241, p=0.15). Models with good discriminative ability (c-statistic >0.7)31 53–60 65 67–75 77 78 are primarily applicable at discharge and have TRIPOD scores ranging from 41%31 to 69%57. The two models with the highest reporting quality (79% and 81%) are applicable for predicting 30-day UHRs of children with complex chronic conditions. The c-statistic values of these models were 0.6566 and 0.6766, respectively (see online supplemental figure A2 in the online supplemental material for an illustration of the models' performance and TRIPOD adherence).
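The correlation reported above corresponds to a standard Pearson test; the following minimal sketch (with made-up per-model values, not the extracted data) shows the computation that would yield such an r and p value.

```python
# Minimal sketch (made-up values, not the extracted data) of the reported check
# for a linear association between discriminative ability and reporting quality:
# a Pearson correlation between each model's c-statistic and its TRIPOD score.
from scipy.stats import pearsonr

c_statistics = [0.28, 0.62, 0.71, 0.75, 0.80, 0.87]   # hypothetical per-model values
tripod_scores = [0.44, 0.81, 0.55, 0.59, 0.50, 0.41]  # hypothetical adherence (fraction)

r, p = pearsonr(c_statistics, tripod_scores)
print(f"r = {r:.3f}, p = {p:.2f}")  # the review reported r = -0.241, p = 0.15 (not significant)
```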

Discussion

Based on 28 studies, this systematic review identifies 37 predictive models that could potentially be used for determining individual 30-day UHR risk in paediatrics. The four most common significant risk factors across the models were comorbidity, (postoperative) length of stay, illness severity and principal procedures. 23 validated predictive models have a c-statistic of >0.7. The median TRIPOD adherence of the included predictive models was 59% (P25–P75, 55%–69%), ranging from 33% to 81%, which is similar to that reported in other systematic reviews12 27.

Practical clinical and policy implications

In general, reporting quality and discriminative ability can provide crucial information about the strengths and weaknesses of a predictive model for implementation in practice (see online supplemental figure A2 in the online supplemental material for a combined illustration). However, the results from this systematic review revealed considerable differences in the c-statistics (0.2862–0.8753) and in the TRIPOD scores (33%69–81%66) at the model level. When considering the available information about reporting quality and discriminative ability in relation to each other, it should be noted that the linear correlation between c-statistic and TRIPOD score at model level was not statistically significant (r=−0.241, p=0.15). Therefore, an independent evaluation of both aspects for the selection of an appropriate predictive model is recommended.

Clinicians and decision makers should use predictive models with good discriminative ability (ie, a c-statistic above 0.7) and sufficient data availability. In particular, predictive models based on census data61 65 66 68 or manual data entry (eg, written discharge documentation22) may be more difficult to implement than models relying on centralised databases31 53 54 56–60 69–78. The TRIPOD score at the predictive model level (see table 4) can be used as a first indicator of whether the predictive model can be assessed and implemented with the given information.

Similar to a previous systematic review,3 comorbidity and (postoperative) length of stay were identified as consistently cited risk factors across the included studies. In addition, illness severity was a main risk factor in all three health condition groups. For surgical condition-related UHRs, the principal procedure has been shown to be a crucial risk factor. Risk factors should be applied in practice with caution because they are often inconsistently defined across studies. Therefore, knowledge of the study-related predictor definitions is required before application.

Limitations

This systematic review has certain limitations:

  1. The included studies needed to be published in English or German with full-text access.

  2. Summarising the results of the included studies quantitatively was not possible due to the heterogeneity of the predictive models (resulting from differences in sample sizes, the examined variables or variations in the periods of data collection).

  3. The sample size of the included studies was reported in different units (eg, encounters and discharges), impeding the comparisons of UHR rates.

  4. Our assignment of the predictive models that are potentially applicable at discharge assumes that the required variables are available at that time point. If clinicians and other stakeholders decide to use a predictive model, it should be checked beforehand whether complete data collection is possible at the desired time.

  5. In addition to the identified medical risk factors (eg, comorbidity) and several country-specific risk factors (eg, location of residence) that result in paediatric readmissions, health-policy initiatives may also affect the readmission rates in paediatric clinical practice79. However, due to a lack of data, these aspects could not be captured by this review.

Future research

This systematic review did not identify predictive models for individualised risk prediction of potentially preventable UHRs in paediatrics, emphasising past discussions to expand the research field further.3

Current external validation studies were conducted in the USA and examined the applicability of existing predictive models with other outcomes or population backgrounds to paediatric 30-day UHRs.62 65 69 Therefore, external validation studies are needed for models that were explicitly developed to predict 30-day UHRs in paediatrics. Because the number of predictive models for general medical condition-related UHRs was small (n=7),55 62 66 71 72 76 with 4 of the 7 models demonstrating a c-statistic below 0.762 66 76, there is a need for high-quality models in this area.

Non-regression-based techniques (eg, machine learning) are increasingly used to predict 30-day HRs in paediatrics, and most such models show good discriminative ability21 22 47 49–52 69 (see online supplemental table A4 in the online supplemental material). Future systematic reviews should summarise and critically assess existing non-regression-based HR predictive models in paediatrics, for instance, by applying the forthcoming TRIPOD-ML statement.80

Existing studies discuss the benefit of shorter time intervals for identifying preventable readmissions more accurately6 81; one study concluded that a 30-day UHR metric was more precise (c-statistic=0.799) for paediatric trauma patients than a 7-day UHR metric (c-statistic=0.737).70 To our knowledge, there is one predictive model for 365-day7, three for 90-day59 67 75 and one for 7-day70 UHRs in paediatrics with good discriminative ability (c-statistic >0.7). Future studies should address the evaluation of paediatric UHR predictive models with different time intervals.

Conclusion

This systematic review revealed an increase in the development of predictive models for 30-day UHRs in paediatrics in recent years. To support the implementation of these predictive models in the long term, it is essential to validate existing models in order to test their applicability in different settings. To increase their accessibility for use, more attention should be given to completeness in reporting, particularly for items that may be relevant for the implementation of paediatric 30-day UHR predictive models in practice (ie, those relating to outcome and predictor definitions, handling of missing values, full presentation of the predictive model and an explanation of its use).

Data availability statement

Data are available upon reasonable request. Additional information, including the protocol, is available from the corresponding author.

Ethics statements

Patient consent for publication

Ethics approval

This study does not involve human participants.

References

Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

Footnotes

  • Contributors IMN conceptualised and designed the systematic review, participated in the literature search, study selection, quality assessment, data extraction and data analyses, and drafted the initial manuscript. NK contributed to the literature search, study selection, quality assessment and data extraction, and critically reviewed the manuscript. SS contributed to the data analysis and critically reviewed the manuscript. JD contributed to the study selection, data extraction and data analysis, and critically reviewed the manuscript. DM conceptualised and designed the systematic review, participated in the study selection, quality assessment, data extraction and data analyses, and critically reviewed the manuscript. All authors approved the final manuscript for submission and agreed to be accountable for all aspects of the work. IMN is the guarantor of the study.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.