Predicting future self-harm or suicide in adolescents: a systematic review of risk assessment scales/tools
  1. Isobel Marion Harris,
  2. Sophie Beese,
  3. David Moore
  1. Institute of Applied Health Research, University of Birmingham, Birmingham, UK
  1. Correspondence to Isobel Marion Harris; i.m.harris{at}bham.ac.uk

Abstract

Objective This systematic review aimed to evaluate the ability of risk tools to predict future episodes of suicide/self-harm in adolescents.

Design Systematic review.

Data sources MEDLINE, EMBASE, CINAHL and PsycINFO were searched from inception to 3 March 2018.

Eligibility criteria for selecting studies Cohort studies, case–control studies and randomised controlled trials of adolescents aged 10–25 who had undergone risk assessment in a clinical setting following an episode of self-harm were included.

Data extraction and synthesis Two independent reviewers extracted data and assessed risk of bias. Data were grouped by tool and narrative synthesis undertaken, with studies appraised using a checklist combining the QUIPS (Quality In Prognosis Studies) and QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies) tools.

Results Of the 17 137 articles initially identified, 11 studies evaluating 10 separate tools were included. The studies varied in setting, population and outcome measure. The majority of the studies were rated as having an unclear risk of bias, and meta-analysis was not possible due to high variability between studies.

The ability of the tools to correctly identify those adolescents going on to make a self-harm/suicide attempt (sensitivity) ranged from 27% (95% CI 10.7% to 50.2%) to 95.8% (95% CI 78.9% to 99.9%). Other metrics, such as ORs and HRs, were reported for score increases of between 1 and 10 points on the various tools.

Conclusions This systematic review is the first to explore the use of assessment tools in adolescents. The predictive ability of these tools varies greatly. No single tool is suitable for predicting a higher risk of suicide or self-harm in adolescent populations.

PROSPERO registration number CRD42017058686

  • Suicide & self-harm
  • Child & adolescent psychiatry
  • PSYCHIATRY
  • Prediction
  • Risk assessment

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

Strengths and limitations of this study

  • This systematic review is the first to explore the use of tools to predict future self-harm/future suicide attempts in an adolescent population

  • A checklist combining the QUIPS and QUADAS-2 tools was used to evaluate the quality of the included studies

  • High levels of heterogeneity meant that meta-analysis was not possible

  • Results of the study support other reviews in adult populations and highlight the need for further risk prediction work in this area

Introduction

Self-harm and suicide are serious problems in children, adolescents and young people,1–3 with the highest rates seen in those aged 16–24 years.4 5 The term self-harm can be used to refer to acts of self-poisoning or self-injury (eg, cutting, scratching, breaking bones and burning) carried out intentionally regardless of motive or suicidal intent.6 Repetition of self-harm is common, with 15%–25% of adolescents treated at hospital for an episode of self-harm returning for treatment within 12 months.2 7

There is a strong association between self-harm and risk of future suicide,6 8 with approximately 50% of adolescents who die by suicide having previously self-harmed,9 and self-harm increasing the risk of death by suicide approximately tenfold.10 The 6-month period following an episode of self-harm has been identified as when this risk of death is highest.11 Despite the association, there are important differences between acts of self-harm and suicide attempts. The motivations behind suicide attempts include wanting to end one’s life, whereas motivations behind self-harm are more typically related to attenuating negative emotions and feelings.12 There are also key differences in presentation: suicide attempts tend towards high lethality, infrequent occurrence and use of a single method, whereas acts of self-harm tend to involve single or multiple methods of low lethality and to be repeated.13

Predicting suicide and repetition of self-harm in adolescents is a challenge as it is usually a secretive and hidden behaviour, and there is no single risk factor for adolescent self-harm.1 14–16 Additionally, in this population, suicide is a rare event with low prevalence (5.8 per 100 000 for the 15–19 years age group, and 9.5 per 100 000 for the 20–24 years age group in the UK in 2015),17 making risk prediction difficult.

The UK guidelines for the management of self-harm in those aged over 8 years recommend that all patients who present with self-harm should undergo comprehensive psychosocial assessment including assessing the risk of repetition of self-harm or suicide, as well as a full mental health and social needs assessment.18–20 Risk scales/tools (from here on referred to as tools) tend to be a key part of this assessment. However, there is currently only a small amount of evidence available regarding their use and effectiveness, and no guidance as to which to use, or which is best for particular populations or settings.18 21 The UK guidelines suggest that these risk tools should not be used alone to determine future risk or to make a decision on when to offer treatment.18–20 There is a large variation in the type and format of risk tools being used across the UK.22–24 A survey of 32 English hospitals found that over 20 different risk tools were in use, many of them locally developed, highlighting a lack of consistency of practice.21

The content is variable from tool to tool with some only assessing a few parameters and others assessing a more extensive range. Adult tools may not be appropriate for use with adolescents due to either inappropriate questions being asked, or important areas specific to the age group not being assessed.

Previous systematic reviews of risk assessment tools used to predict future self-harm or suicide following self-harm in an adult population have been conducted,3 25–27 and none were able to conclude that any one tool performed better than another or was more useful for predicting future self-harm or suicide attempts.

While adult populations have been considered, there is currently no systematic review examining risk tools in an adolescent population, a key age group affected by self-harm and suicide. A review is required specifically focusing on this age group to allow greater understanding of the use of risk assessments in this population.

The wide variety in the type of risk tools being used across hospitals and current lack of guidance surrounding their use in particular patient groups and settings warrants further investigation to improve and standardise patient care. This review contributes to the body of evidence for self-harm and suicide prediction and prevention to ensure that informed decisions can be made regarding prediction of future risk and future research.

Objectives

This systematic review aimed to evaluate the ability of risk assessment tools to predict future episodes of self-harm or suicide in adolescents and young adults presenting to clinical services with an episode of self-harm or attempted suicide. To achieve this aim, the review asked the following question:

How accurately do risk assessment tools/scales predict which adolescents and young adults will go on to self-harm in the future, make a future suicide attempt or die by suicide?

Methods

Search strategy

The following bibliographic databases were searched from inception to March 2018 with no language restrictions: MEDLINE, EMBASE, CINAHL, PsycINFO and Open Grey. Searches used index and free-text terms related to self-harm, suicide, adolescents and risk assessment. Terms were combined using the appropriate OR and AND operators (see MEDLINE search strategy in online supplementary appendix 1).
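The way terms are combined in such a search can be sketched as follows. This is a minimal illustration only: the term lists are hypothetical placeholders, not the review's actual strategy (which is given in online supplementary appendix 1), and the output is a generic Boolean string rather than the syntax of any particular database interface.

```python
def build_query(concepts):
    """OR the synonyms within each concept, then AND the concepts together."""
    blocks = []
    for terms in concepts:
        # Quote multi-word phrases so they are treated as a single term.
        quoted = [f'"{term}"' if " " in term else term for term in terms]
        blocks.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(blocks)


# Hypothetical concept groups: behaviour, population, and assessment.
concepts = [
    ["self-harm", "suicide"],
    ["adolescent", "young adult"],
    ["risk assessment"],
]

query = build_query(concepts)
print(query)
```

Grouping synonyms with OR widens each concept, while AND between groups restricts results to records that touch every concept.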

Selection/inclusion criteria

Study design

Prospective and retrospective cohort studies, case–control studies and randomised controlled trials testing assessment tools.

Patient group

Adolescents and young people aged 10–25 who have self-harmed or attempted suicide who have presented to clinical services or have been treated by a clinician following an episode of self-harm or attempted suicide. Studies also containing older populations were included if data for 10–25 years were presented separately. Studies on a wider age range were included if the majority of the sample (>50%) had self-harmed or attempted suicide.

WHO defines adolescence as being between 10 and 19 years,28 but in the literature the upper limit for adolescents can range from 18 to 25 years.1 To include all relevant data in this review, and reflecting both this variation and the fact that self-harm is rare before the age of 12, a broad age range of 10–25 years was used.29

Interventions

Any risk assessment.

Timing

No restriction was placed on length of follow-up so long as the outcomes occurred after the risk assessment was carried out.

Setting

Studies with the risk assessment carried out within a clinical setting or by a clinician were included. This could have been an inpatient or outpatient facility or as part of a home treatment programme.

Outcome

Self-harm and/or attempted suicide or completed suicide. It was acknowledged that these may be recorded individually or they may be grouped together (eg, repeat self-harm and attempted suicide may have been classed as the same event).

Study selection

Due to the large number of records, 10% of titles and abstracts were initially screened by two reviewers independently using predefined criteria based on the target population and outcome to identify potentially relevant articles. After discussion and consideration of reasons for any disagreements, the remaining titles and abstracts were screened by one reviewer.

The full text of the potentially relevant articles was obtained and assessed against the full inclusion criteria. Endnote V.X7 software (Clarivate Analytics) was used to record study selection decisions, and reasons for exclusion were noted.

Data extraction strategy

Data extracted included: study characteristics (duration, start and end date, country and setting), participant characteristics (number, average age, gender, ethnicity, socioeconomic details and comorbidities), any reported subgroups of participants, tool used, if tool is validated, data collection method, outcome(s) measured (eg, repeat self-harm), method of outcome assessment, total number followed up, lost to follow-up number and reasons, number of events for each outcome and data on measures of association between assessment tool and outcome (eg, relative risk (RR), sensitivity and specificity) along with attendant precision (95% CI, p values) or raw data to calculate these. Unadjusted and adjusted data along with factors adjusted for were recorded. Where data were not reported, the corresponding author of the article was contacted by email. A follow-up email was sent if no response was received.

Quality assessment strategy

Risk of bias assessment of included studies used relevant elements of QUIPS30 31 and QUADAS-2,32 checklists suitable for prognostic and diagnostic tests, respectively. QUIPS covers areas of possible bias in the prognostic factor studies, such as study participation, attrition and outcome measurement, and QUADAS-2 covers areas of possible bias in diagnostic test accuracy studies, such as patient selection, index test and reference standard used. Studies reporting prognostic model development were assessed on a further four criteria identified from examining the quality assessment conducted for a systematic review of prognostic models.33 These criteria include assessment of statistical methods used and methods of dealing with confounding factors.

Study selection, risk of bias assessment and data extraction were undertaken by two reviewers independently, and any disagreements resolved by discussion with referral to a third reviewer if required.

Methods of data analysis/synthesis

A narrative synthesis was carried out presenting information from all included studies to explore similarities, differences and findings within and between them. Where possible studies were grouped and analysed by tool.

Data synthesis

For each tool, outcome (self-harm; suicide) and outcome measure (OR, HR; sensitivity/specificity) grouping, contributing studies and data were assessed for clinical and methodological homogeneity. This informed a decision that meta-analysis was not appropriate, and therefore, data have been presented narratively.

Patient and public involvement

No patients or public were involved in this review.

Results

Search results

A total of 17 137 potentially relevant records were identified through the literature search. After removal of duplicate records and screening for relevance to the review, inclusion criteria were applied to 103 full-text articles. Eleven studies34–44 were included in the review and 92 excluded due to not meeting criteria for population (37 articles), study design (25), outcome (7), use of an assessment tool (15) and separately presenting data on an adolescent subgroup (8). Two of these last eight studies presented baseline, but not outcome, data for an adolescent subgroup so the authors were contacted to try to obtain these data. No response was received so these papers remain excluded (see figure 1).
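The counts reported in this paragraph can be checked with simple arithmetic. The sketch below uses only the figures stated above; the dictionary keys are shorthand labels chosen here for readability.

```python
# Figures as reported in the search results paragraph.
full_text_assessed = 103
excluded_by_reason = {
    "population": 37,
    "study design": 25,
    "outcome": 7,
    "no assessment tool": 15,
    "no separate adolescent data": 8,
}

total_excluded = sum(excluded_by_reason.values())
included = full_text_assessed - total_excluded

print(f"excluded: {total_excluded}, included: {included}")
```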

Figure 1

PRISMA flow diagram detailing the search process for included studies. PRISMA, Preferred Reporting Items for Systematic Reviews and Meta-Analyses.

Study characteristics

The 11 included studies comprised 8 prospective35 36 38 40–44 and 3 retrospective34 37 39 cohort designs and evaluated 10 tools (table 1); eight tools were specifically for self-harm/suicide assessment and two were for hopelessness and depression. Four studies attempted to develop a prediction model.36 37 41 44 No trials evaluating the impact of using a tool were identified.

Table 1

Methodological characteristics and outcomes of included studies

A total of 2554 participants were included, ranging from 10 to 24 years of age who were followed up for between 3 and 18 months. Three studies contained participants that had all presented with self-harm/suicide attempt,34 36 43 and the remainder had mixed populations where >50% presented with self-harm/suicide attempt.35 37–42 44 The proportion of each study population that presented with self-harm/suicide attempt is presented in table 1. As a result of the studies with mixed populations, 1818 of the total 2554 participants presented with self-harm/suicide.

The studies were carried out across a variety of settings in the UK36 42 and USA.34 35 37–41 43 44 Four took place in an emergency department,34 37–39 four in inpatient units,35 40 41 44 one each in an open treatment trial,43 a home setting36 and clinic sessions for a mixed inpatient/outpatient population.42

The 10 tools varied in length from three questions (Self-Assessed Expectation of Suicide Risk Scale)37 to 93 questions (Self-Injurious Thoughts and Behaviours Interview, SITBI)35; the latter being unusually long as all other tools evaluated had ≤30 questions. The content assessed by the tools greatly varied. For example, eight of the tools asked about suicidal ideation, however, only two asked about previous self-harm. For full details of the tool content and length, please see the data presented in online supplementary appendix 2.

Across the studies, a selection of metrics to report outcome data were used. These include OR (adjusted and unadjusted), RR, HR (adjusted and unadjusted), and predictive validity statistics such as sensitivity and specificity. Online supplementary figure 1 details the full range of outcome measures, subgroups reported and factors adjusted for (if relevant) for each tool across all the studies.

Quality assessment

In general, reporting of the studies could have been better. For example, completeness of follow-up was rarely reported, and neither were details regarding whether the method of identifying the outcome (self-harm/suicide attempt) was interpreted without knowledge of the risk assessment tool findings.

Only three of the studies34 36 43 comprised a population where all patients presented with an episode of self-harm or a suicide attempt. In the remaining studies,35 37–42 44 the proportion ranged from 51% to 79% but data were not separately reported for this group in these mixed population studies.

The four studies reporting prognostic models varied in quality.36 37 41 44 All four were at the model development stage, and only one was assessed as having a low risk of bias; under-reporting of details prevented a low rating for the others.

Table 2 details the quality assessment undertaken on all 11 included studies.

Table 2

Quality assessment of included studies

Findings

The findings from all included studies are presented by outcome measure (HR, OR and area under the curve (AUC) from receiver operating characteristic curves) in table 3 and predictive validity statistics (sensitivity, specificity, positive predictive value and negative predictive value) in table 4. Sensitivity can be defined as the proportion of positives that are correctly identified as positive, and specificity as the proportion of negatives that are correctly identified as negative. Positive predictive value (PPV) is the proportion of positive test results that are truly positive, and negative predictive value (NPV) is the proportion of negative test results that are truly negative. PPV and NPV, unlike sensitivity and specificity, depend on the prevalence of the outcome, so the prevalence must be considered carefully when applying PPV and NPV values to a different clinical population.45
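As a minimal sketch of these four definitions, the example below computes each statistic from a 2×2 cross-tabulation of tool result against observed outcome. The counts are hypothetical and chosen purely for illustration; they are not taken from any included study, and the class name `TwoByTwo` is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from cross-tabulating tool result against observed outcome."""

    tp: int  # tool positive, outcome occurred
    fp: int  # tool positive, outcome did not occur
    fn: int  # tool negative, outcome occurred
    tn: int  # tool negative, outcome did not occur

    @property
    def sensitivity(self) -> float:
        # Proportion of those with the outcome whom the tool flagged.
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # Proportion of those without the outcome whom the tool cleared.
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        # Proportion of tool-positive patients who had the outcome.
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        # Proportion of tool-negative patients who did not have the outcome.
        return self.tn / (self.tn + self.fn)


# Hypothetical cohort of 200 patients, 20 of whom repeat self-harm.
table = TwoByTwo(tp=18, fp=60, fn=2, tn=120)
for name in ("sensitivity", "specificity", "ppv", "npv"):
    print(f"{name}: {getattr(table, name):.1%}")
```

Sensitivity and specificity are each computed within one outcome group, whereas PPV and NPV are computed within one test-result group, which is why only the latter pair shifts when the proportion of patients with the outcome changes.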

Table 3

HR, OR, AUC results

Table 4

Predictive validity results

Predicting future self-harm

Three studies, assessing four tools, contributed data for predicting future self-harm.

The Self-Injury Implicit Association Test (SI-IAT)35 scores were not statistically significantly predictive of repeat self-harm at 3 months (unadjusted OR 3.10, 95% CI 0.39 to 9.94, p≥0.05).

The SITBI35 scores were found to be statistically significantly predictive of repeat self-harm at 3 months follow-up (adjusted OR 1.82, 95% CI 1.25 to 2.65, p=0.002). This study (evaluating both SI-IAT and SITBI) was assessed as having a high risk of bias, mostly due to poor reporting.

At 6 months follow-up, the accuracy of the Suicide Ideation Questionnaire (SIQ)36 to classify a patient as high or low risk for self-harm repetition was: sensitivity 27.3%, specificity 99.2%, PPV 85.7% and NPV 85.6%. This tool performed poorly at identifying high-risk patients, but performed well at identifying low-risk patients.

The predictive validity of the Self-Harm Questionnaire (SHQ)42 was reported at 3 months. This was: sensitivity 94.7%, specificity 34.6%, PPV 25.4% and NPV 96.6%. This tool performed well at identifying high-risk patients, but performed poorly at identifying low-risk patients.

Predicting future suicide attempt

Eight studies, reporting seven tools, contributed data for predicting future suicide attempt.

The predictive ability of the Ask Suicide Screening Questions (ASQ)34 was reported at 6 months as: sensitivity 95.8%, specificity 5.8%, PPV 16.8% and NPV 87.5%.

The SIQ-Junior (SIQ-JR) was evaluated in two studies, one reported RR40 and the other reported adjusted and unadjusted HRs.41 For every 1-point increase in SIQ-JR score, the RR of no future attempt was 0.93.40 No CIs or p value were stated. For every 10-point increase in the SIQ-JR score, the unadjusted HR of future suicide attempt was 1.30 (95% CI 1.14 to 1.48, p≤0.001).41 A subsequent multivariate regression model then reported an adjusted HR of 1.23 (95% CI 1.08 to 1.40, p=0.003).
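Ratios reported per 1-point and per 10-point increase are on different scales. The sketch below shows how a ratio could be rescaled, assuming the log ratio is linear in the score (the usual assumption when a continuous score enters a Cox or logistic model). This rescaling is an illustration added here, not an analysis performed in the review, and the function name `rescale_ratio` is hypothetical. It does not make the RR of no future attempt comparable with the HR of a future attempt, which are different quantities.

```python
import math


def rescale_ratio(ratio, from_points, to_points):
    """Rescale an HR/OR per `from_points` increase to one per `to_points`.

    Assumes the log ratio is linear in the score, so the ratio for a
    k-point increase equals the 1-point ratio raised to the power k.
    """
    return math.exp(math.log(ratio) * to_points / from_points)


# Unadjusted HR per 10-point increase in SIQ-JR score, as reported above.
hr_per_10 = 1.30
hr_per_1 = rescale_ratio(hr_per_10, from_points=10, to_points=1)

print(f"HR per 1-point increase: {hr_per_1:.3f}")
```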

The Columbia-Suicide Severity Rating Scale (C-SSRS) was evaluated by four studies,37–39 43 reporting adjusted and unadjusted ORs, adjusted HR and AUC. Three of these studies reported outcomes for the C-SSRS, and the two subscales (severity and intensity) that the C-SSRS comprises.

The unadjusted OR ranged from 1.09 (95% CI 1.01 to 1.17) to 3.85 (95% CI 1.07 to 13.86) for every 1-point increase in C-SSRS score, and the adjusted OR ranged from 1.15 (95% CI 1.03 to 1.29) to 1.51 (95% CI 1.24 to 1.84) for every 1-point increase in C-SSRS score.

For the SIQ, unadjusted and adjusted HRs were reported.44 In the univariate regression, a statistically significant HR of 1.01 (95% CI 1.00 to 1.02, p≤0.05) was reported for the dichotomised (high/low suicidal intent) SIQ score. After the multivariate analysis, the HR was unchanged at 1.01 (95% CI 1.00 to 1.02, p≥0.05) but was no longer statistically significant.

For the Beck Hopelessness Scale,41 an unadjusted HR of 1.51 (95% CI 1.22 to 1.87, p≤0.001) was reported for a 5-point increase in the scale.

For the Children’s Depression Rating Scale-Revised,41 an unadjusted HR of 1.29 (95% CI 1.10 to 1.52, p=0.002) for a 10-point increase in the scale was reported.

Discussion

Main findings

This systematic review identified 11 studies evaluating 10 tools for the prediction of future self-harm/suicide attempt in adolescent patients. Of these 11 studies, four attempted to develop a predictive model using the tools, while seven evaluated the tools as standalone instruments.

This review found there is a wide variation in the setting, population, outcome measure, length of follow-up and length and content of the tools in use. The variation was apparent in the predictive ability reported across the tools. The studies reporting predictive validity statistics varied greatly in results, for example, the SHQ had high sensitivity (94.7%) and low positive predictive value (25.4%),42 whereas the SIQ had low sensitivity (27.3%) and high positive predictive value (85.7%).36 It could be argued that a tool with higher sensitivity would be best suited for use in this population, to ensure that all those at risk are identified and offered treatment. However, this would cause issues with overtreatment and not be the best use of resources, as the high sensitivity comes with a low positive predictive value.
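The link between sensitivity, specificity and predictive values can be made concrete with Bayes' theorem. The sketch below takes the SHQ sensitivity and specificity reported in this review and computes PPV and NPV at three prevalence values. The prevalence values are assumptions chosen for illustration and are not reported in the source; at an assumed prevalence of 19%, the formulas give values close to the PPV and NPV reported for the SHQ.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a given sensitivity, specificity and prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Reported for the SHQ at 3 months.
SHQ_SENSITIVITY = 0.947
SHQ_SPECIFICITY = 0.346

# Hypothetical prevalence values of repeat self-harm (assumptions).
results = {}
for prevalence in (0.05, 0.19, 0.40):
    results[prevalence] = predictive_values(
        SHQ_SENSITIVITY, SHQ_SPECIFICITY, prevalence
    )
    ppv, npv = results[prevalence]
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

With sensitivity and specificity held fixed, PPV rises and NPV falls as prevalence increases, so a highly sensitive tool with low specificity will flag many patients who do not go on to repeat self-harm whenever the outcome is uncommon.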

This finding of variation in predictive ability implies these tools are not sufficiently accurate to be used as standalone predictors of future risk of self-harm/suicide in adolescent populations, mirroring the findings of other reviews in adult populations.3 25–27 This adolescent review and the other adult population reviews demonstrate there is insufficient evidence for use of such tools as the only component of a risk assessment, despite their widespread use for this purpose in clinical practice.21

Methodological limitations

Due to the high heterogeneity between included studies, specifically in terms of setting, population and length and content of tools used, no meta-analyses were conducted. Despite this, a wide range of data has been presented for the studies. Data were presented for each tool by outcome, and then compared by outcome measure reported where possible.

Also, use of the tools in differing populations and use of different outcome measures made comparing results impossible in studies reporting outcome measures such as HR and OR. For example, Horwitz et al 39 evaluated the severity subscale of the C-SSRS, as did Posner et al,43 however, Horwitz used the subscale on a continuous basis, whereas Posner dichotomised the scores, meaning they were not directly comparable. The reporting of results using different measures is a known difficulty commonly encountered in systematic reviews evaluating diagnostic/predictive methods.46 Very few studies reported all the metrics required for risk prediction.

Despite these limitations, the review has shown that multiple scales are in use for adolescent patients and that there is lack of consistency of practice. Additionally, the evidence from this review demonstrates that tools for adolescents have the same issues as tools for adults in terms of varying greatly in length and content.22–24 47–50

Implications of evidence

There are difficulties in drawing conclusions regarding the clinical significance of results and implications of the evidence from this review. First, the outcome data presented for each tool do not clearly translate into individual patient consequence or benefit.

Second, the very different settings in which the tools were evaluated (eg, inpatient units, home treatment settings and emergency departments) mean that results would not be transferable between these settings and the differing clinical populations within them.

Third, some of the tools used and models developed are not practical for routine clinical use, particularly in an emergency department setting. For example, the model developed by Chitsabesan et al 36 requires four separate tools to be administered to both parent and child, and the subsequent scores to be used within a formula. This would clearly require refinement as it is currently a complex and time-consuming process.

While the main aim of the review was to evaluate the ability of risk assessment tools to predict future episodes of suicide or self-harm, several other issues were identified that merit further investigation.

First, as seen in online supplemental appendix 2, the tools specific to self-harm/suicide vary greatly in content assessed. For example, seven out of these eight tools assess suicidal ideation, however, only two assess previous self-harm. As there is known to be a strong association between suicide and previous self-harm,6 8 9 it seems surprising that only a minority of tools assess this important risk factor. Further primary research evaluating tool content and identifying predictive risk factors would be beneficial. Currently, available evidence is limited by dataset size and the populations included, for example, validation of tools on community populations rather than clinical ones.

Related to this is how the answers to the questions asked by the tools are used in predictive analysis. For example, the SHQ evaluated by Ougrin and Boege42 consists of three initial screening questions, asked of everyone, that establish the presence of self-harm/suicidal ideation, and then a further 12 questions exploring self-harm for those who give a positive response to the initial three. The analysis conducted by Ougrin and Boege, however, only used the first three questions to classify patients into positive/negative, and the second section of 12 questions was not used. As a result, the basis of classification in this study was effectively history of self-harm/suicidal ideation, rather than the use of the tool’s questions about self-harm to stratify risk and predict which patients would go on to repeat self-harm. Further analysis of this tool with regard to the questions asked in each section would be beneficial to explore the predictive ability of these additional data.

Second, no distinction was found about whether the tools in use were for patients presenting with self-harm/suicide attempt for the first time, or those presenting with repeat events. As some tools asked about previous self-harm/suicide attempts, this was perhaps being taken into account, but research exploring whether these patient groups have the same risk factors and how previous events can modify them, would be useful to understand further how to predict and prevent this behaviour, and determine whether separate assessments would be appropriate.

Third, Czyz et al 37 explored self-rated suicide risk, as opposed to that rated through tools. They found that asking adolescents to self-rate their risk of suicide was more accurately predictive of future suicide attempts, and this was of particular relevance in busy emergency departments with time pressures. This was the only study exploring self-rated risks, however, and more primary studies are required to explore further the use of self-rating in predicting future risk. This could have important implications for changing practice with regard to how suicide risk is assessed.

Fourth, this review attempted to explore the use of prognostic models in self-harm/suicide prediction, however, due to poor reporting and models only being in the early stages of development, this was not really possible in any great depth. The four studies that reported model development were difficult to understand from a modelling point of view. Methods were not explicitly stated making it hard to comprehend how analyses had been conducted, confounding factors were either not identified or methods for addressing them were not stated, and lack of full reporting meant that carrying out a detailed quality appraisal was not possible. Future work should build on these early modelling studies to further develop them to the later stages of internal and external validation and impact studies, and it is imperative that future studies should be reported fully.

Conclusion

This review is the first to explore the use of tools to predict future self-harm/suicide attempts in an adolescent population. It has shown that the current limited amount of primary evidence means that, at present, no individual tool can be identified as performing better than another or as a sufficiently accurate predictor of self-harm/suicide risk. The use of tools in prediction models is at an early stage and further research is warranted to develop these models. Risk assessment tools should only be used as part of a wider comprehensive assessment, assessing more than just the risk of repetition. This reflects current standard practice in the UK, where National Institute for Health and Care Excellence guidelines recommend thorough psychosocial needs assessment focused on patient needs rather than risk.18–20

Footnotes

  • Contributors IH and DM designed the study. All authors oversaw its implementation. IH and SB coordinated and did all review activities, including searches, study selection, data extraction and quality assessment. IH and DM planned the analyses and IH conducted these. IH wrote the initial draft and DM and SB contributed writing to subsequent versions of the manuscript. All authors reviewed the study findings and read and approved the final version before submission.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests None declared.

  • Patient consent for publication Not required.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data availability statement All data relevant to the study are included in the article or uploaded as supplementary information.