Objectives The primary objective was to assess the utility of the number needed to treat (NNT) to inform decision-making in the context of paediatric oncology and to calculate the NNT in all superiority, parallel, paediatric haematological cancer, randomised controlled trials (RCTs), with a comparison to the threshold NNT as a measure of clinical significance.
Design Systematic review
Data sources MEDLINE, EMBASE and the Cochrane Childhood Cancer Group Specialized Register through CENTRAL from inception to August 2018.
Eligibility criteria for selecting studies Superiority, parallel RCTs of haematological malignancy treatments in paediatric patients that assessed an outcome related to survival, relapse or remission; reported a sample size calculation with a delta value to allow for calculation of the threshold NNT, and that included parameters required to calculate the NNT and associated CI.
Results A total of 43 RCTs were included, representing 45 randomised questions, of which none reported the NNT. Among acute lymphoblastic leukaemia (ALL) RCTs, 29.2% (7/24) of randomised questions were found to have a NNT corresponding to benefit, in comparison to acute myeloid leukaemia (AML) RCTs with 50% (3/6), and none in lymphoma RCTs (0/13). Only 28.6% (2/7) and 33.3% (1/3) had a NNT that was less than the threshold NNT for ALL and AML, respectively. Of these, 100% (2/2 ALL and 1/1 AML) were determined to be possibly clinically significant.
Conclusions We recommend that decision-makers in paediatric oncology use the NNT and associated confidence limits as a supportive tool to evaluate evidence from RCTs while paying careful attention to the inherent limitations of this measure.
- paediatric oncology
- numbers needed to treat
- clinical trials
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
Strengths and limitations of this study
The utility of the number needed to treat (NNT) was evaluated in all superiority, parallel group, paediatric haematological randomised controlled trials (RCTs) published from inception to August 2018, wherein relapse, remission or survival was assessed.
The visualisation, in the form of a forest plot, of the relationship between NNT, CIs and the threshold NNT of all included studies provides a clinically relevant example of communicating complex information.
A number of RCTs were excluded from this review due to reporting that precluded calculating the NNT.
The delta value in the sample size calculation was assumed to be the absolute difference that would provide a clinically significant effect size and was used as a proxy for the threshold NNT. Under this assumption, the effect sizes chosen may reflect feasibility more than clinical benefit, which limits generalisability, as this is not a universally recognised approach.
The proposed method implies that the threshold NNT is equivalent to the threshold absolute risk reduction (ARR) even though the NNT results in a transformation of scale and is expressed using a unit measured in patients. Therefore, a threshold ARR may not correspond to a minimal clinically important difference in terms of the NNT.
Cancer in children is exceedingly rare and consists of less than 1% of all cancers diagnosed in Canada, with haematological cancers accounting for approximately 40% of cases.1 Paediatric haematological cancer survival rates are currently upwards of 80%, largely as a result of treatment advances evaluated through randomised controlled trials (RCTs).2 Owing to the relative rarity of paediatric haematological cancers, multicentre international trials have been necessary to conduct adequately powered treatment investigations.1 3 However, even with coordinated resource-intensive efforts, it can take 5–7 years to complete a phase III RCT, and another 5 years to publish outcomes with meaningful follow-up.2 There is also an additional time lag before high-level evidence becomes the standard of care.2
Given the lengthy timeline from research to practice, evaluating evidence arising from RCTs published in the paediatric oncology literature is critical for informing subsequent RCTs and standard of care. In other treatment contexts, the number needed to treat (NNT) has proven to be of value in assisting clinicians to assess therapeutic interventions and act as a supportive tool in benefit–risk assessments as well as formulary decision-making.4–8 The NNT is an absolute effect measure coined almost 30 years ago, defined as the 'number of patients needed to be treated with one therapy versus another for one patient to encounter an additional outcome of interest within a defined period of time'.6 9 10 The NNT corresponds to the inverse of the absolute risk reduction (ARR), which is the absolute difference between the experimental and control estimates, for a specific time point. For example, an RCT comparing the effect of the medication strontium ranelate to a placebo on the incidence of vertebral fractures at 3 years in women with postmenopausal osteoporosis found that the event rate in the strontium ranelate group was 20.9% compared with 32.8% in the placebo group.11 The inverse of the absolute difference in event rates between the experimental and control groups corresponds to the NNT, such that in this study, 'nine patients would need to be treated for 3 years with strontium ranelate in order to prevent one patient from having a vertebral fracture (95 percent CI, 6 to 14)'.11 The evaluation of evidence requires, at a minimum, consideration of the absolute risk and relative benefits (and harms) related to a therapy in question, with the NNT being a supportive tool to do so.12 Despite the usefulness of the NNT and the Consolidated Standards of Reporting Trials (CONSORT) statement, which considers the NNT a helpful tool, recent research suggests that these measures are rarely reported in the literature.6 13–16
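The arithmetic behind the strontium ranelate example can be sketched in a few lines of Python. The variable names are illustrative, and rounding the NNT up to the next whole patient is an assumption made here because it reproduces the quoted figure of nine.

```python
import math

# Event rates at 3 years, as quoted in the text
control_rate = 0.328   # placebo group
treated_rate = 0.209   # strontium ranelate group

# Absolute risk reduction: absolute difference between the two event rates
arr = control_rate - treated_rate

# The NNT is the inverse of the ARR
nnt_exact = 1 / arr

# Assumption: round up to a whole number of patients
nnt = math.ceil(nnt_exact)

print(f"ARR = {arr:.3f}, NNT = {nnt_exact:.2f}, reported as {nnt}")
```

The unrounded value lies between 8 and 9, which is why the choice of rounding rule matters when NNTs from different reports are compared.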
At this time, the utility of the NNT to support evidence-based practice in paediatric oncology treatment trials remains unexamined, as does the degree to which the NNT has been reported in the paediatric oncology literature. We specifically aimed to assess the utility of the NNT with consideration of a threshold NNT, which is the point where the therapeutic benefit equals the therapeutic risk.17 The threshold NNT should correspond to the inverse of the ARR that an RCT is designed to detect and a clinically significant effect size that would lead to a clinical practice change. Therefore, a decision to administer a therapeutic intervention over the standard of care should occur when the NNT is less than the threshold NNT.17 The primary study objective was to assess the utility of the NNT in paediatric haematological cancer, by calculating the NNT in all superiority, parallel RCTs assessing treatment-related survival, relapse or remission, and comparing the NNT to the threshold NNT. A secondary study objective was to assess the proportion of published studies (specifically randomised questions) that reported the NNT.
This systematic review adhered to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses statement (online supplementary file).18 This review consisted of a subset of studies from a previous systematic review conducted by our research team, which was conducted from inception of the databases searched to July 2016. The search strategy used in that systematic review was re-run to capture studies published from July 2016 to August 2018. Methods describing the search strategy, eligibility criteria, study identification and data extraction for our previous systematic review have been detailed in the protocol (online supplementary file–appendix A).
Search strategy and study inclusion
A comprehensive literature review was performed using the databases MEDLINE (via Ovid), EMBASE (via Ovid) and Cochrane Childhood Cancer Group Specialized Register (via CENTRAL) from inception to August 2018 to identify all superiority, parallel group RCTs in paediatric patients diagnosed with a haematological cancer that assessed an outcome related to survival, relapse or remission and that reported either CIs or standard errors associated with both the experimental and control estimates, or the number of patients at risk on a Kaplan–Meier curve. The reference lists of studies included at the full-text review stage were hand-searched to identify any additional studies. The search was restricted to studies published in English and is therefore prone to language bias.
Study identification and data extraction
Two investigators (HH and KN) screened the titles and abstracts non-independently to identify studies that fulfilled the study inclusion criteria. Discrepancies were settled by discussion and consensus, with the principal investigator (AFH) available as an adjudicator. Studies that fulfilled the inclusion criterion at the title and abstract screening stage were selected for full-text review by one investigator (HH) to confirm study eligibility. A data extraction template was developed and piloted with 15 included studies to ensure all pertinent data were captured. One investigator (HH) then extracted all of the data, of which a random sample was selected and verified by the principal investigator (AFH) as a quality assurance measure.
The number needed to treat to benefit (NNTB), which corresponds to a positive NNT, or number needed to treat to harm (NNTH), which corresponds to a negative NNT, and associated 95% CI were calculated for each randomised question as per the validated methodology described by Altman & Andersen.19 A randomised question is defined as an intervention comparison assessing a primary outcome for which a sample size calculation is reported. The NNT was based on the primary outcome and time point as specified in the sample size calculation. In the event that the time point specified in the sample size calculation was not reported, the information was inferred if a Kaplan–Meier curve with the number of patients at risk was reported.19 If the aforementioned was not provided, the time point reported in the results was used, and thus, these trials were prone to selective reporting bias. All analyses were conducted based on randomised questions to account for the possibility that an RCT could have more than one parallel group.
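As an illustration of the calculation for time-to-event outcomes, the following is a minimal sketch that assumes the survival estimate and its standard error are available for each arm at the chosen time point. The function name and the input values are hypothetical and are not taken from any included trial; the difference in survival probabilities is treated as the ARR, its standard error is combined from the two arms, and the NNT limits are obtained by inverting the ARR limits.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% CI


def arr_with_ci(s_exp, se_exp, s_ctl, se_ctl):
    """ARR and 95% CI from two survival estimates and their standard errors."""
    arr = s_exp - s_ctl
    se_diff = math.sqrt(se_exp ** 2 + se_ctl ** 2)
    return arr, arr - Z_95 * se_diff, arr + Z_95 * se_diff


# Hypothetical inputs: 80% vs 70% survival with SEs of 3% and 4%
arr, lower, upper = arr_with_ci(0.80, 0.03, 0.70, 0.04)

# Inverting the ARR limits gives the NNT limits; the order reverses,
# and this is only valid as written when both ARR limits are positive
print(f"ARR {arr:.3f} (95% CI {lower:.3f} to {upper:.3f})")
print(f"NNT {1 / arr:.1f} (95% CI {1 / upper:.1f} to {1 / lower:.1f})")
```

When only the numbers at risk on a Kaplan–Meier curve are reported, the standard errors would first have to be approximated, which this sketch does not attempt.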
The ARR, NNT and delta values (ie, threshold ARR and NNT), as reported in the sample size calculation, were visualised on a forest plot, grouped by disease (acute lymphoblastic leukaemia (ALL), acute myeloid leukaemia (AML), lymphoma and mixed, which corresponds to the inclusion of multiple diseases), to allow for identification of NNTB (defined as the NNT and 95% CI that only included positive numbers), NNTH (defined as the NNT and 95% CI that only included negative numbers) and inconclusive NNT (defined as the NNT where the 95% CI included both a positive number and a negative number). Descriptive statistics were used to summarise the frequency and percentage of randomised questions reporting the NNT, as well as the NNTB, NNTH and inconclusive NNT by disease site.
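The three categories can be expressed as a small rule on the sign of the confidence limits. The sketch below works on the ARR scale, which is equivalent because the NNT limits take the sign of the ARR limits; the function name is illustrative, and it assumes that a positive ARR favours the experimental arm.

```python
def classify_nnt(arr_lower: float, arr_upper: float) -> str:
    """Classify a randomised question from the 95% CI of its ARR."""
    if arr_lower > 0 and arr_upper > 0:
        return "NNTB"          # whole interval indicates benefit
    if arr_lower < 0 and arr_upper < 0:
        return "NNTH"          # whole interval indicates harm
    return "inconclusive"      # interval spans both benefit and harm


# ARR limits quoted later in the text for two AML trials
print(classify_nnt(0.013, 0.107))    # interval entirely positive
print(classify_nnt(-0.198, 0.058))   # interval crosses zero
```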
In order to ascertain whether the NNTB was clinically significant, we calculated the frequency and percentage of randomised questions where the NNT < threshold NNT, NNT > threshold NNT or NNT = threshold NNT. The threshold NNT was considered to be the inverse of the ARR (ie, delta value), as specified in the sample size calculation, and was assumed to correspond to a clinically significant effect size that would lead to a change in clinical practice. The threshold NNT was compared with the treatment NNT and classified as definitely clinically significant, possibly clinically significant, of inconclusive clinical significance or definitely not clinically significant, as specified in figure 1. These categories, as well as the overall method, were informed by methods described by Man-Son-Hing et al 20 and Guyatt et al.21 RCTs where an ARR of zero occurred were excluded from the analysis because the inverse corresponds to an undefined NNT. SAS (Statistical Analysis Software) version 9.4 (SAS Institute, Cary, NC, USA) was used to perform all analyses.
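A minimal sketch of the point-estimate comparison follows. The function name is hypothetical, and comparing NNTs after rounding to the nearest whole patient is an assumption, since the rounding rule is not stated in the text; the four significance categories of figure 1 are not reproduced here.

```python
def compare_to_threshold(arr: float, delta: float) -> str:
    """Compare the observed NNT with the threshold NNT (the inverse of delta)."""
    if arr == 0:
        # An ARR of zero has no defined NNT; such RCTs were excluded
        raise ValueError("NNT is undefined when the ARR is zero")
    nnt = round(1 / arr)          # assumption: round to the nearest patient
    threshold = round(1 / delta)
    if nnt < threshold:
        return "less"
    if nnt > threshold:
        return "greater"
    return "equal"


# Observed ARR of 6% against a delta of 13%: NNT 17 vs threshold NNT 8
print(compare_to_threshold(0.06, 0.13))
```

The first argument is the observed ARR and the second is the delta value from the sample size calculation, both expressed as proportions.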
Patient and public involvement
Given this is a research methods systematic review, there was no patient or public involvement.
Our search identified 4151 unique studies from MEDLINE, EMBASE and the Cochrane Childhood Cancer Group Specialized Register accessed through CENTRAL. Following title and abstract screening, 432 studies were evaluated for eligibility based on full-text review. Of these studies, 387 studies were excluded and 43 studies (ie, RCTs), representing 45 randomised questions, were included in the systematic review (figure 2) (online supplementary file–appendix B). The randomised questions corresponded to RCTs investigating treatments for ALL (n=24; 53.3%), lymphoma (n=13; 28.9%), AML (n=6; 13.3%) and mixed diagnoses (n=2; 4.4%).
Number needed to treat
The frequency and proportion of the NNTB, inconclusive NNT and NNTH are summarised in table 1. Approximately 29.2% (7/24) of randomised questions in ALL RCTs were found to have a NNT corresponding to a NNTB, in comparison to AML with 50.0% (3/6). There were no randomised questions in lymphoma (n=13) trials with a NNTB.
Comparison of NNT and threshold NNT
A comparison of the NNT to the threshold NNT is summarised in table 1 and visualised in figure 3. For randomised questions corresponding to NNTB, the NNT was less than the threshold NNT in 28.6% (2/7) ALL and 33.3% (1/3) AML comparisons. However, of these, 100% (2/2 and 1/1) had a lower confidence limit that was greater than or equal to the threshold NNT for ALL and AML, respectively, and hence were possibly clinically significant. In contrast, 71.4% (5/7) and 66.7% (2/3) had a NNT greater than the threshold NNT; however, 80.0% (4/5) and 50.0% (1/2) of these had an upper confidence limit that was less than or equal to the threshold NNT for ALL and AML, respectively, and hence were possibly clinically significant.
Reporting of NNT
There were no randomised questions that reported the NNT to support the reporting of the primary outcome of the study.
In this systematic review, we demonstrated that variation in the NNT exists among RCTs assessing outcomes related to remission, relapse and survival in paediatric haematological cancers. A majority of randomised questions found to have a NNTB were not necessarily associated with a positive effect size when using the inverse of the delta value as specified in the sample size calculation as a proxy for the threshold NNT and a measure of what a clinically significant NNT should be. There were no randomised questions reporting the NNT, which highlights reporting deficits in the paediatric haematological cancer RCT literature.
Strengths and weaknesses
Our review provides a comprehensive analysis of the utility of the NNT through an evaluation of all superiority, parallel group paediatric haematological RCTs assessing relapse, remission and survival from inception to August 2018. We provide the NNT and ARR with its 95% CI along with the threshold NNT and ARR for these RCTs using a validated methodological approach, which will serve as a valuable tool for decision-makers, clinicians and researchers to assess treatment effects. A weakness of this study is the exclusion of a number of RCTs due to reporting that precluded calculating the NNT. However, as the exclusion is due to reporting deficits, this limitation is beyond our control and serves as an important finding that reporting quality is limited in the paediatric haematological cancer RCT literature. An additional weakness is that the delta value in the sample size calculation was assumed to be the absolute difference that would provide an effect size that would lead to a change in clinical practice (ie, minimal clinically important difference), if not explicitly indicated, and a proxy for the threshold ARR and NNT. This assumption, thus, would lead to the possibility of effect sizes being chosen that might be more reflective of study feasibility as opposed to clinical benefit. This approach may be limited in terms of generalisability given that this is not a universally recognised approach. Additionally, this assumption implies that the threshold NNT is equivalent to the threshold ARR even though the NNT results in a transformation of scale and is expressed using a unit measured in patients. Therefore, a threshold ARR may not correspond to a minimal clinically important difference in terms of NNT. However, as there were no studies that reported a threshold NNT, our approach represents a feasible method to apply in the absence of a reported threshold NNT. 
This method is nonetheless not validated, and further studies will need to be undertaken to assess whether researchers would equate the minimal clinically important difference in terms of the ARR to that in terms of the NNT.
Comparison with existing literature
Considerable published literature has evaluated the utility of the NNT. The overarching conclusion is that the NNT is a metric of value in clinical, health policy and formulary decision-making when interpreted correctly.4–8 However, the NNT and ARR are rarely reported or poorly reported in the literature despite being recommended as a helpful tool in the CONSORT statement and are often calculated using inappropriate methods.6 12–16 22–27 Our findings corroborate the existing literature because no studies reported the NNT in our review. Previous studies have not highlighted the utility of the NNT specifically in the paediatric oncology literature or evaluated the clinical significance of the NNT using the approach described in our study and thus, our study is a novel and important addition to the literature.
Study explanations and implications
Our study quantified the NNT as a means to better understand the utility of this tool to facilitate decision-making in paediatric oncology. The NNT allows for an intuitive understanding of the absolute effect size in terms of patients and can help considerably when comparing one treatment to another, after ensuring baseline characteristics, the outcome and time point for the patient population of interest are comparable.12 For instance, an RCT conducted by Creutzig et al 28 in paediatric AML patients assessing 5-year event-free survival found a 6.0% (95% CI, 1.3% to 10.7%) absolute increase associated with the experimental treatment (liposomal daunorubicin induction) compared with the control treatment (idarubicin induction). The associated NNT corresponded to 17 (95% CI, 75 to 9) or NNTB 17 (95% CI, NNTB 75 to NNTB 9), meaning that it is estimated that by administering the experimental treatment, one extra patient would survive event-free at 5 years for every 17 patients treated. Of note, this RCT was powered to detect an absolute increase in 5-year event-free survival of 13% (ie, delta value), which would correspond to a NNTB of 8 (ie, threshold NNT). Although the NNTB is 17, the lower confidence limit is 75 and the upper confidence limit is 9 (a range that does not include 8), which would lead one to believe that the effect size does not provide strong enough evidence to change clinical practice. In situations where the lower confidence limit of the NNTB is less than the threshold NNT, one can be more confident that the treatment confers a clinically improved outcome as compared with the control. On the other hand, if the NNTB is less than the threshold NNT and the lower confidence limit is greater than the threshold NNT, one should exercise greater caution in concluding that the effect size is clinically significant (refer to figure 1 for a visual).
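The reasoning in the AML event-free survival example can be checked numerically. The sketch below works on the ARR scale, where no rounding is needed: asking whether the threshold NNT lies inside the NNT interval is the same as asking whether the delta value lies inside the ARR interval. The helper name is hypothetical, and NNT limits recomputed from the rounded percentages quoted in the text may differ slightly from the published limits.

```python
def delta_within_ci(delta: float, arr_lower: float, arr_upper: float) -> bool:
    """True when the effect the trial was powered to detect lies inside the ARR CI."""
    return arr_lower <= delta <= arr_upper


# Figures quoted in the text, expressed as proportions
arr, arr_lower, arr_upper = 0.060, 0.013, 0.107
delta = 0.13  # effect size used in the sample size calculation

print("NNTB:", round(1 / arr))             # 17
print("Threshold NNT:", round(1 / delta))  # 8
print("Delta within 95% CI:", delta_within_ci(delta, arr_lower, arr_upper))
```

Because the delta value of 13% exceeds the upper ARR limit of 10.7%, the threshold NNT of 8 falls outside the interval, consistent with the interpretation given above.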
As demonstrated in our study, a forest plot is a convenient method to visualise the relationship between the NNT (and the associated 95% CI), evident in study results, compared with the NNT that the study was designed to detect as a proxy for the threshold NNT and that would be considered clinically significant.
The aforementioned approach is recommended in light of smaller sample sizes that are often attained in paediatric oncology RCTs and rare disease trials in general, as it allows for assessment of the precision of the treatment effect as well as clinical and statistical significance. This was demonstrated in our study where the majority of randomised questions found to have a NNTB had a NNT greater than the threshold NNT, of which the upper confidence limit was less than or equal to the threshold NNT. If these RCTs were designed with higher power, it is possible that definite clinical significance may have been obtained. On the other hand, these findings would not be considered significant based on statistical significance. Since statistical significance does not provide an indication of the size of the treatment effect, one would not be able to discern whether the findings could have possible clinical significance. An assessment of clinical significance, therefore, requires a summary measure be presented with a CI. By presenting a CI, an assessment can be made of both statistical and clinical significance, which can inform clinical decision-making. Interpreting results from RCTs based solely on statistical significance, without taking into consideration clinical significance, can result in misappraisal of evidence. Using the results of our study as an example, we demonstrated that all randomised questions, for which the NNTB was less than threshold NNT, had a lower confidence limit that was equal to, or greater than, the threshold NNT. Although these results were statistically significant, none had definite clinical significance and were only possibly clinically significant. These findings have clinical implications because clinicians often have to make decisions about administering treatments that are not standard of care, and rely on an accurate appraisal of evidence to inform these decisions. 
Inconclusive evidence, however, does not necessarily imply an ineffective intervention. Rather, inconclusive evidence (when the CI of the NNT crosses infinity as a result of the CI of the ARR crossing 0) implies that the level of clinical significance cannot be determined from the study results. The use of the NNT and the method we describe can be one more tool to support clinical decision-making within this context.
The scenarios where the NNT results in inconclusive evidence are a limitation in the utility of the NNT, as discussed by Altman.29 To illustrate, Lange et al 30 assessed 5-year disease-free survival in paediatric AML patients in first remission after intensive chemotherapy, and found a 7.0% (95% CI, −19.8% to 5.8%) absolute decrease associated with the experimental treatment (interleukin-2 infused on days 0–3 and 8–17) compared with the control treatment (no further therapy). The study was powered to detect a 10% difference in 5-year disease-free survival, which was assumed to be the minimal clinically important difference, and hence, corresponds to a threshold NNTB of 10. The resulting NNT of the RCT was −14 (95% CI, −5 to 17) or a NNTH 14 (95% CI, NNTH 5 to NNTB 17). At first glance, it appears as though the point estimate does not fall within the 95% CI, given the disjointed confidence limits. In other studies wherein the CI traverses both harm and benefit, the NNT is reported without the CI.31 In reality, the CI encompasses values from a NNTH of 5 to ∞ and NNTB of 17 to ∞. Plotting the NNT and CI on a forest plot (figure 3) demonstrates that a NNTH of 14 does fall within the interval range and, in fact, the interval is continuous. Altman, therefore, recommended presenting the CI of the NNT as follows to emphasise continuity (using results from Lange et al as an example): NNTH 14 (NNTH 5 to ∞ to NNTB 17).
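The presentation described above can be generated from the ARR and its confidence limits, as in the following sketch. The function names are hypothetical, positive values are assumed to favour the experimental treatment, and rounding to the nearest whole patient is an assumption chosen because it reproduces the figures quoted for this trial.

```python
def _label(arr_value: float) -> str:
    """Express one ARR value as NNTB (benefit) or NNTH (harm)."""
    kind = "NNTB" if arr_value > 0 else "NNTH"
    return f"{kind} {round(abs(1 / arr_value))}"


def format_nnt(arr: float, arr_lower: float, arr_upper: float) -> str:
    """Format the NNT and its CI, passing through infinity when the ARR CI crosses 0."""
    if 0 in (arr, arr_lower, arr_upper):
        raise ValueError("NNT is undefined for an ARR of zero")
    if arr_lower < 0 < arr_upper:
        interval = f"{_label(arr_lower)} to ∞ to {_label(arr_upper)}"
    else:
        interval = f"{_label(arr_lower)} to {_label(arr_upper)}"
    return f"{_label(arr)} ({interval})"


# 7.0% absolute decrease with 95% CI from -19.8% to 5.8%, as quoted in the text
print(format_nnt(-0.07, -0.198, 0.058))
```

Writing the interval this way makes it clear that the point estimate lies inside a continuous range running from harm, through no effect at infinity, to benefit.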
We strongly encourage plotting the ARR and the NNT on a forest plot simultaneously because the NNT is simply a method of re-expressing the ARR and supports the interpretation of the ARR. As the NNT is derived from the ARR, it should always be accompanied by the ARR.16 Additionally, the utility of the NNT is inherently reliant on three major areas: baseline risk, the outcome and the time point.12 In order for the NNT from an RCT demonstrating a NNTB to have utility, the patient population of interest should share a similar baseline risk because otherwise the desired treatment effect may be overestimated and thus the NNTB may be underestimated. Outcomes related to event-free survival often differ in what is considered an event and thus it is critical to ensure that the NNTB being applied to the population of interest is identical in terms of the outcome in question. Numerous studies have demonstrated how the NNT varies with time and thus, comparability in time points is critical to ensure accurate interpretation of the NNT for a population of interest.4 12 23 24 Lastly, criticisms of the statistical properties of the NNT have been highlighted by Hutton32 33 and Katz et al.34 We agree with Altman & Deeks’s32 response to these criticisms in that the NNT was designed for translation of research results and, therefore, arguments related to computation and its distribution properties are of less relevance. The NNT is simply a metric to re-express the ARR and, therefore, should be viewed as a measure to support the interpretation of the ARR.
We recommend that clinicians and decision-makers in paediatric oncology consider using the NNT as a supportive tool to evaluate evidence from RCTs while paying careful attention to the inherent limitation of this measure. Additionally, we recommend that researchers report the NNT and associated CI to support the interpretation and generalisability of the trial results. Given the inherent limitations of the NNT, we emphasise that the NNT should be considered a supportive tool to inform evidence-based decision-making and not a replacement. Online supplementary file appendix C provides a summary of how the NNT can be calculated and assessed to inform decision-making.19 20
Patient consent for publication Not required.
Contributors AFH, KG and HH conceived and designed the study. HH collected and analysed the data. AFH and HH wrote the first drafts of the manuscript, and all authors contributed to subsequent drafts. All authors had full access to all of the data in the review and take responsibility for the integrity of the data and the accuracy of the data analysis.
Funding Funding support was provided by the University of British Columbia School of Nursing to conduct this systematic review.
Disclaimer The funder played no role in study design, collection, analysis, interpretation of data, writing of the report or in the decision to submit the paper for publication. They accept no responsibility for the contents.
Competing interests None declared.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement Unpublished data will be made available upon request to the corresponding author.