Grading of Recommendations Assessment, Development and Evaluation (GRADE) methodology is used to assess and report certainty of evidence and strength of recommendations. This GRADE concept article is not GRADE guidance but introduces certainty of net benefit, defined as the certainty that the balance between desirable and undesirable health effects is favourable. Determining certainty of net benefit requires considering certainty of effect estimates, the expected importance of outcomes and variability in importance, and the interaction of these concepts. Certainty of net harm is the certainty that the net effect is unfavourable. Guideline panels using or testing this approach might limit strong recommendations to actions with a high certainty of net benefit or against actions with a moderate or high certainty of net harm. Recommendations may differ in direction or strength from that suggested by the certainty of net benefit or harm when influenced by cost, equity, acceptability or feasibility.

The Grading of Recommendations Assessment, Development and Evaluation (GRADE) Working Group has designed a transparent approach to rating certainty of evidence and grading strength of recommendations.

In the context of making recommendations, GRADE specifies that ratings reflect the certainty that the estimates of an effect are adequate to support a particular decision or recommendation.

Depending on the thresholds or ranges chosen, it is possible to have high certainty in the evidence for a set of outcomes related to a particular decision, yet uncertainty about whether the evidence is adequate to support that decision. This can occur when desirable and undesirable consequences are closely balanced: for example, a cancer treatment with high certainty of prolonging survival and high certainty of causing serious toxicity.

Recent GRADE Working Group guidance states that systematic review authors and guideline panellists should ideally specify the thresholds or ranges they use when rating the certainty of evidence.

Guideline panels using fully contextualised approaches have faced the challenge of balancing feasibility and simplicity against comprehensive, simultaneous consideration of all important outcomes. This GRADE concept article introduces an approach for guideline panels to rate their certainty in the balance of benefits and harms more directly and explicitly. GRADE concept articles are a new form of communication from the GRADE Working Group, presented to stimulate discussion; this article does not constitute GRADE guidance.

GRADE evidence-to-decision frameworks explicitly identify the following considerations in determining the direction and strength of recommendations:

Certainty of evidence (regarding effect estimates for health effects),

Relative importance of outcomes (also called values and preferences),

Balance of benefits and harms,

Resource use (cost),

Cost–benefit ratio (Are incremental health benefits worth the costs?),

Equity,

Acceptability,

Feasibility.

Health-related harms include pain or disability but also burdens that lower quality of life. For example, the burden of receiving an intervention that requires being immobile for long periods could be considered a health-related harm. In this article, when we use the phrase ‘balance of benefits and harms’ we mean the ‘balance of benefits versus harms and burdens’. Other burdens that are more societal in nature may be addressed through other criteria in the framework (cost, acceptability, feasibility), depending on the perspective taken, such as that of the healthcare system, the population or the individual. Here, we use the term ‘harms’ to refer to ‘health-related harms and burdens’.

Ideally, guideline panels consider all the factors listed above when determining the direction and strength of a recommendation. The process may proceed in progressive steps: first considering benefits and harms to generate certainty of net benefit; then costs, to generate certainty in a cost–benefit ratio; and then, if relevant, equity, acceptability and feasibility, to address certainty in a recommendation.

Certainty across the evidence-to-decision framework*.

Although it makes decisions more transparent, reporting a guideline panel’s certainty for each of these concepts may overwhelm guideline users seeking simple explanations of the rationale and certainty for recommendations. Among the concepts for which certainty can be expressed formally, the certainty in the balance of benefits and harms (net effect) may be most relevant to patients and clinicians, who are often the primary target users of guidelines. The additional criteria that may influence a recommendation (cost, cost–benefit ratio, equity, acceptability, feasibility) are more likely to vary across social groups and contexts, so population-based ratings of them may be of less interest to patients and clinicians working together to make individual healthcare decisions.

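Consistent with the recent clarification of ‘certainty of evidence’ as the certainty that a true effect lies within a specified range or on one side of a specified threshold, we define certainty of net benefit as the certainty that the balance between desirable and undesirable health effects is favourable, that is, that the net effect lies on the favourable side of a threshold of zero. Correspondingly, certainty of net harm is the certainty that the net effect is unfavourable.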

Expressing the certainty of net benefit for guideline users provides the most direct summary of our confidence that the estimates of effects are adequate to support a particular decision or recommendation. The US Preventive Services Task Force has used the term certainty of net benefit in a manner consistent with this conceptual framework.

Determining the certainty in the balance of benefits and harms involves generating a net effect estimate (a way of specifying the balance of benefits and harms) and then rating the certainty of that net effect in relation to a threshold of zero net effect.

A stepwise approach to determining the certainty of the net effect estimate.

Decision analysis provides a statistical method for generating the net effect estimate. Decision modelling has evolved over the years, and sophisticated models include multiple outcomes, the varying times at which each outcome can occur, the relative importance placed on each outcome (often using utilities or quality-adjusted life-years) and future decisions and their resulting outcomes. Guideline panels sometimes use decision analysis to evaluate a chain of possible consequences and decisions to inform their recommendations: the UK National Institute for Health and Care Excellence relies heavily on such models. Decision analysis often involves modelling cost-effectiveness or assessing net effect across a range of possible scenarios. Determining the certainty of evidence emerging from such models is itself a complex matter that a GRADE project group is currently addressing.

For many decisions for which guideline developers, clinicians or patients desire recommendations, however, one need not consider a chain of subsequent decisions. Many guideline recommendations are binary and are based on evidence limited to that decision. In such cases, one can perform a much simpler decision analysis without requiring the participation of a skilled modeller. Simple models can generate confidence intervals (CIs) for a net effect estimate (a composite of individual effect estimates) given the following assumptions (described further in the online supplementary material):

Effect estimates represent data conforming to normal distributions.

Effect estimates to be combined are independent and not correlated with each other.

Effect estimates to be combined can be multiplied by a conversion factor to use a consistent unit of measure.

Given that the second assumption (independence) is often unlikely to hold, the analyst can perform a sensitivity analysis of the net effect estimate to determine its robustness to changes in the individual effect estimates, in the assumed correlations between effect estimates and in the conversion factors. A sensitivity analysis defining the likelihood that the net effect estimate remains favourable across the range of assumptions determines the certainty of net benefit.

Here we describe the methods for generating the net effect estimate, following the stepwise approach outlined in the figure above.

We assume reviewers have already identified the important outcomes for their systematic review of the available evidence; methods for this outcome selection have been reported.

Including both a composite outcome and one or more components of that outcome is problematic. For example, it would be inappropriate to include all-cause mortality and cardiovascular mortality in the same model. One may choose to use only the composite outcome (eg, all-cause mortality) or to use only the component outcomes (eg, cardiovascular mortality, cancer mortality and mortality from causes other than cancer or cardiovascular disease).

If effect estimates are not available in absolute terms (or if effect estimates are being extrapolated to a population with different baseline risks than that used for the absolute effect estimates), then absolute effect estimates may be derived using a combination of relative effect estimates and baseline risk estimates.
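As a minimal sketch of this conversion, the Python below applies a relative risk (RR) and its 95% CI to a baseline risk to give an absolute effect per 1,000 people. It assumes the RR applies unchanged at the chosen baseline risk and treats the baseline risk as known without error; the function names and example inputs are hypothetical.

```python
# Minimal sketch (hypothetical helpers): absolute effect per 1,000 people
# from a relative risk and a baseline risk. Assumes the RR is constant across
# baseline risks and ignores uncertainty in the baseline risk itself.

def risk_difference(baseline_risk: float, relative_risk: float) -> float:
    """Change in risk implied by applying a relative risk to a baseline risk."""
    if not 0.0 <= baseline_risk <= 1.0:
        raise ValueError("baseline_risk must be a proportion between 0 and 1")
    if relative_risk < 0.0:
        raise ValueError("relative_risk cannot be negative")
    return baseline_risk * (relative_risk - 1.0)


def absolute_effect_per_1000(baseline_risk: float, rr: float,
                             rr_lower: float, rr_upper: float) -> tuple[float, float, float]:
    """Return (point, lower, upper) change in events per 1,000 people."""
    point, lower, upper = (1000.0 * risk_difference(baseline_risk, r)
                           for r in (rr, rr_lower, rr_upper))
    return point, lower, upper


if __name__ == "__main__":
    # Hypothetical inputs: baseline risk 10%, RR 0.80 (95% CI 0.70 to 0.90)
    point, lower, upper = absolute_effect_per_1000(0.10, 0.80, 0.70, 0.90)
    print(f"{point:.0f} per 1,000 (95% CI {lower:.0f} to {upper:.0f})")
```

With these hypothetical inputs the sketch gives about 20 fewer events per 1,000 (95% CI 30 fewer to 10 fewer).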

Quantitative estimates of relative importance for each outcome will serve as a conversion factor to express the net effect estimate in a consistent unit of measure. These estimates need to be meaningful as multipliers or represent a quantitative measure of importance relative to a reference standard. Guideline panels that use a qualitative 9-point rating of importance of outcomes cannot use those ordinal ratings directly as multipliers without first translating them into quantitative estimates.

A simple approach is to select one outcome as a reference outcome and define a relative importance adjustment (ie, a multiplier) for each other outcome as a modifier to apply to effect estimates. In making individual patient-specific decisions, one could enter the quantitative estimates of relative importance for the individual patient and derive an individualised estimate of net effect. With further development, this approach could inform shared decision-making for individual patients.

For groups of patients, one could consider quantitative estimates of relative importance as ranges. In making population-specific recommendations, one could use a range of relative importance estimates considered reasonable to capture most members of the population and check for robustness of estimates of net effect across the range of relative importance. One would then lower the rating of certainty of net benefit if the estimate of net effect crosses to net harm within the range of relative importance. The later discussion of sensitivity analysis for the net effect estimate (step 6) will address the concepts of ranges and certainties of relative importance.

Methods to determine quantitative estimates of relative importance from a patient perspective include discrete-choice experiments, among others.

If the outcomes to be combined include both continuous and dichotomous measures, assigning relative importance becomes more complicated and requires additional methods to reach a shared unit of measure (such as conversion to quality-adjusted life-year estimates). Utilities reported for decision analyses may be convertible to relative importance of outcomes. However, utilities are often reported on a scale from 0 (death or worst outcome) to 1 (optimal quality of life or best outcome), so using them directly as multipliers would not be meaningful: multiplying the effect on death by its utility of 0 would remove death from the net effect. Setting the relative importance of an outcome equal to 1 minus its utility could convert utilities into meaningful multipliers.
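A minimal Python sketch of the 1 minus utility conversion follows. It also rescales each multiplier against a chosen reference outcome, in line with the reference-outcome approach described earlier. The function names and the utilities in the example are hypothetical and for illustration only.

```python
# Minimal sketch (hypothetical helpers): turn utilities into relative-importance
# multipliers via 1 - utility, then express them relative to a reference outcome.

def importance_from_utility(utility: float) -> float:
    """Relative importance of an outcome state, taken as 1 minus its utility."""
    if not 0.0 <= utility <= 1.0:
        raise ValueError("utility must lie between 0 and 1")
    return 1.0 - utility


def multipliers(utilities: dict[str, float], reference: str) -> dict[str, float]:
    """Importance of each outcome relative to the reference outcome (reference = 1)."""
    ref_importance = importance_from_utility(utilities[reference])
    if ref_importance == 0.0:
        raise ValueError("reference outcome must have a utility below 1")
    return {name: importance_from_utility(u) / ref_importance
            for name, u in utilities.items()}


if __name__ == "__main__":
    # Hypothetical utilities, for illustration only
    example = {"death": 0.0, "disabling stroke": 0.4, "major bleeding": 0.8}
    print(multipliers(example, reference="death"))
```

Choosing death (utility 0) as the reference makes the rescaling a no-op; another reference outcome simply divides every multiplier by that outcome's importance.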

For each effect estimate, one can multiply the point estimate and CIs by the relative importance for the outcome, and then present the importance-adjusted effect estimate in positive or negative terms to correspond to benefits or harms in the direction of effect.

Adding together the importance-adjusted point estimates provides the point estimate for the net effect. Statistical formulas allow calculation of the 95% CI for the net effect (described in the online supplementary material).
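The two steps above (adjusting each estimate by relative importance, then summing) can be sketched in Python under the stated assumptions of normally distributed, independent estimates. This is not the supplement's formula, which is not reproduced here. The sketch uses the standard result that the variance of a sum of independent normal variables is the sum of their variances, with standard errors back-calculated from symmetric 95% CIs. Effects are entered on a benefit-positive scale (eg, events avoided per 1,000, with harms negative), so the sign carries the direction of effect. The class, function and numbers are hypothetical.

```python
import math
from dataclasses import dataclass

Z_95 = 1.96  # conventional two-sided 95% normal quantile


@dataclass
class Outcome:
    name: str
    effect: float      # benefit-scale effect, e.g. events avoided per 1,000 (harms negative)
    lower: float       # lower 95% CI bound on the same scale
    upper: float       # upper 95% CI bound on the same scale
    importance: float  # relative-importance multiplier (reference outcome = 1)

    @property
    def se(self) -> float:
        """Standard error back-calculated from a symmetric normal 95% CI."""
        return (self.upper - self.lower) / (2 * Z_95)


def net_effect(outcomes: list[Outcome]) -> tuple[float, float, float]:
    """Point estimate and 95% CI of the importance-adjusted net effect.

    Assumes normally distributed, mutually independent effect estimates.
    """
    point = sum(o.importance * o.effect for o in outcomes)
    variance = sum((o.importance * o.se) ** 2 for o in outcomes)
    half_width = Z_95 * math.sqrt(variance)
    return point, point - half_width, point + half_width


if __name__ == "__main__":
    # Hypothetical numbers for illustration only
    outcomes = [
        Outcome("mortality", effect=20, lower=10, upper=30, importance=1.0),
        Outcome("major bleeding", effect=-15, lower=-25, upper=-5, importance=0.5),
    ]
    point, lo, hi = net_effect(outcomes)
    print(f"net effect {point:.1f} (95% CI {lo:.1f} to {hi:.1f})")
```

In this hypothetical example the net effect is 12.5 importance-adjusted events avoided per 1,000 (95% CI roughly 1.3 to 23.7). Where a CI on the absolute scale is asymmetric, back-calculating a single standard error is only an approximation.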

Precision becomes meaningful only with contextual anchoring. Reporting results with a 3-cm range would be overly precise for planning travel by car and unacceptably imprecise for some types of surgery. To express the certainty in the balance of benefits and harms, we need to specify a threshold for net benefit, then express the certainty that the net effect lies on one side of this threshold.

Guideline panels may specify the threshold of net effect; we suggest using the ‘zero effect’ for simplicity. Guideline panels that formally evaluate cost-effectiveness already use a method to set a value threshold for the quantity of net benefit that is considered worth the cost to achieve it.

If the CI does not cross zero, then the precision of the net effect estimate is sufficient not to rate down the certainty of net benefit for imprecision. One must still consider other factors affecting certainty that are more difficult to quantify (risk of bias, inconsistency, indirectness and publication bias) and the plausible range of relative importance of outcomes before making a final determination of the certainty of net benefit.

If the CI includes zero effect and thus the range of net effect estimates includes both net benefit and net harm, the guideline panel will rate down the certainty of net benefit. The greater the extent of overlap of the CI with both benefit and harm, the lower the certainty in the net benefit.

Classification of precision of net effect estimate

| Pattern of net effect estimate | Classification | Precision of net effect estimate is consistent with … |
|---|---|---|
| Entire CI is beneficial | Net benefit | High certainty of net benefit |
| Point estimate is beneficial, lower bound of CI is harmful and point estimate has larger absolute value than lower bound of CI | Likely net benefit | Moderate certainty of net benefit |
| Point estimate is beneficial, lower bound of CI is harmful and point estimate has smaller absolute value than lower bound of CI | Possible net benefit | Low certainty of net benefit |
| Point estimate is close to zero, wide CI* | Possibly no net benefit or harm | Very low certainty of net benefit or harm |
| Point estimate is close to zero, narrow CI* | Net benefit or harm likely near zero | Moderate certainty of little net benefit or harm |
| Point estimate is harmful, upper bound of CI is beneficial and point estimate has smaller absolute value than upper bound of CI | Possible net harm | Low certainty of net harm |
| Point estimate is harmful, upper bound of CI is beneficial and point estimate has larger absolute value than upper bound of CI | Likely net harm | Moderate certainty of net harm |
| Entire CI is harmful | Net harm | High certainty of net harm |

*Differentiation of wide versus narrow CIs could be based on a threshold of minimally important differences.
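The classification table above can be expressed as a small Python function. The table does not define ‘close to zero’ or ‘narrow’; this sketch assumes both are judged against a minimally important difference (`mid`, a hypothetical parameter on the net-effect scale). The point estimate counts as close to zero when its absolute value is below `mid`, and the CI counts as narrow when it lies entirely within ±`mid`. Ties at the boundaries are assigned to the lower-certainty class, which is also an assumption.

```python
# Minimal sketch of the classification table (hypothetical function).
# Positive values = net benefit, negative values = net harm.

def classify_net_effect(point: float, lower: float, upper: float,
                        mid: float) -> tuple[str, str]:
    """Return (classification, certainty consistent with precision)."""
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    if mid <= 0:
        raise ValueError("mid must be positive")
    if lower > 0:
        return "Net benefit", "High certainty of net benefit"
    if upper < 0:
        return "Net harm", "High certainty of net harm"
    # From here on the CI includes zero.
    if abs(point) < mid:
        if -mid < lower and upper < mid:  # whole CI within +/- mid: "narrow"
            return ("Net benefit or harm likely near zero",
                    "Moderate certainty of little net benefit or harm")
        return ("Possibly no net benefit or harm",
                "Very low certainty of net benefit or harm")
    if point > 0:
        if point > abs(lower):
            return "Likely net benefit", "Moderate certainty of net benefit"
        return "Possible net benefit", "Low certainty of net benefit"
    if abs(point) > upper:
        return "Likely net harm", "Moderate certainty of net harm"
    return "Possible net harm", "Low certainty of net harm"
```

The order of the checks follows the table from top to bottom: an entirely beneficial or harmful CI is classified first, before the ‘close to zero’ rows are considered.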


The calculation of CIs for the net effect estimate assumes that the effect estimates being combined are not correlated with each other. If the effects being combined are positively correlated, the correct CIs would be wider (less precise); if they are inversely correlated, the correct CIs would be narrower (more precise). If such accuracy is needed, one could add correlation coefficients to the calculation (described in the online supplementary material).
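To illustrate the effect of correlation, the Python sketch below uses the standard variance of a sum of correlated variables, Var(sum) = Σᵢ Σⱼ ρᵢⱼ SEᵢ SEⱼ, where the correlations ρᵢⱼ refer to the signed, importance-adjusted effects. Note that if two outcomes are positively correlated but one enters as a harm (negative sign), their adjusted terms are inversely correlated, which narrows the CI. The function name and inputs are hypothetical, and this is not necessarily the formula used in the supplement.

```python
import math

Z_95 = 1.96  # conventional two-sided 95% normal quantile


def net_ci_correlated(points, ses, corr, z=Z_95):
    """95% CI for a sum of importance-adjusted effects that may be correlated.

    points: importance-adjusted effects on a benefit-positive scale
    ses:    their standard errors (already multiplied by importance)
    corr:   correlation matrix between the adjusted effects (1s on diagonal)
    """
    n = len(points)
    if len(ses) != n or len(corr) != n or any(len(row) != n for row in corr):
        raise ValueError("points, ses and corr must have matching dimensions")
    # Var(sum) = sum_i sum_j rho_ij * se_i * se_j
    variance = sum(corr[i][j] * ses[i] * ses[j]
                   for i in range(n) for j in range(n))
    if variance < 0:
        raise ValueError("correlation matrix is not positive semidefinite")
    net = sum(points)
    half_width = z * math.sqrt(variance)
    return net, net - half_width, net + half_width


if __name__ == "__main__":
    points, ses = [20.0, -7.5], [5.10, 2.55]  # hypothetical inputs
    for rho in (-0.5, 0.0, 0.5):
        net, lo, hi = net_ci_correlated(points, ses, [[1.0, rho], [rho, 1.0]])
        print(f"rho={rho:+.1f}: {net:.1f} ({lo:.1f} to {hi:.1f})")
```

With an identity correlation matrix this reduces to the independence calculation; positive correlations widen the interval and negative ones narrow it.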

One approach to select the outcomes critical to the likelihood of net benefit is to identify the outcomes that could change the classification of the precision of the net effect estimate. Such outcomes are either:

Outcomes for which removal of the outcome would change the classification of the precision of the net effect estimate.

Outcomes for which addition of plausible increases to the effect estimate (for effect estimates with lower certainty) would change the classification.
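The first of these criteria (removal) can be sketched as a leave-one-out check in Python. To stay short, the sketch assumes independent normal estimates and uses a simplified classifier that omits the two ‘close to zero’ rows of the classification table; it does not implement the second criterion (plausible increases to effect estimates). The function names and example values are hypothetical.

```python
import math

Z_95 = 1.96  # conventional two-sided 95% normal quantile


def net_ci(terms):
    """terms: (importance-adjusted effect, its SE) pairs; assumes independence."""
    point = sum(e for e, _ in terms)
    half = Z_95 * math.sqrt(sum(se ** 2 for _, se in terms))
    return point, point - half, point + half


def simple_class(point, lower, upper):
    """Classification table without the two 'close to zero' rows (simplification)."""
    if lower > 0:
        return "Net benefit"
    if upper < 0:
        return "Net harm"
    if point >= 0:
        return "Likely net benefit" if point > abs(lower) else "Possible net benefit"
    return "Likely net harm" if abs(point) > upper else "Possible net harm"


def outcomes_changing_class_on_removal(outcomes):
    """Names of outcomes whose removal changes the net-effect classification.

    outcomes: dict mapping outcome name -> (adjusted effect, adjusted SE)
    """
    baseline = simple_class(*net_ci(list(outcomes.values())))
    critical = []
    for name in outcomes:
        rest = [v for k, v in outcomes.items() if k != name]
        if rest and simple_class(*net_ci(rest)) != baseline:
            critical.append(name)
    return critical


if __name__ == "__main__":
    # Hypothetical importance-adjusted effects (benefit-positive) and SEs
    example = {"mortality": (20.0, 5.10), "major bleeding": (-12.0, 2.55),
               "nausea": (-1.0, 0.5)}
    print(outcomes_changing_class_on_removal(example))
```

In this hypothetical example, removing mortality or major bleeding changes the classification from ‘Likely net benefit’, while removing nausea does not.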

Determining the lowest certainty of evidence among critical outcomes requires addressing risk of bias, inconsistency, indirectness and publication bias for each critical outcome.

The certainty of net benefit is the lower of two ratings: the lowest certainty rating among critical outcomes, and the certainty rating consistent with the precision of the net effect estimate. This approach may work in most cases; raters still need, however, to consider the overall framework and determine whether limited certainty in single outcomes is sufficient to rate down the overall certainty of net benefit. This is especially so if the upper or lower bound of the CI for the net effect estimate lies close to zero. A 95% CI is used by convention rather than for a theoretical reason.
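The ‘lowest rating’ rule can be sketched in a few lines of Python by ordering the four GRADE certainty levels so that `min()` selects the lowest. The names are hypothetical, and the result is only a provisional starting point for the panel judgement described above.

```python
from enum import IntEnum


class Certainty(IntEnum):
    """GRADE certainty levels, ordered so that min() picks the lowest."""
    VERY_LOW = 1
    LOW = 2
    MODERATE = 3
    HIGH = 4


def provisional_certainty_of_net_benefit(critical_outcome_ratings, precision_rating):
    """Lowest of the critical-outcome ratings and the precision-based rating.

    Only a starting point: panels still judge whether limited certainty in a
    single outcome is enough to rate down the overall certainty.
    """
    return min([*critical_outcome_ratings, precision_rating])


if __name__ == "__main__":
    ratings = [Certainty.HIGH, Certainty.LOW]  # hypothetical critical outcomes
    print(provisional_certainty_of_net_benefit(ratings, Certainty.MODERATE).name)
```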

To enhance feasibility of the approach, efforts to fully consider the range of relative importance for outcomes may be limited to ratings that would otherwise be classified as high certainty of net benefit. In situations in which further assessment is needed to confirm robustness of certainty across the range of relative importance, one can repeat the analyses across a reasonable range of relative importance of outcomes.

The purpose of the sensitivity analysis is to determine if the certainty of net benefit remains high across the range of relative importance estimates. There remains insufficient conceptual development to provide explicit guidance on how to precisely define the range of relative importance for outcomes to use for the sensitivity analysis.
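One simple way to carry out such a sensitivity analysis, sketched here in Python as an assumption rather than a recommended method, is to evaluate every combination of evenly spaced relative-importance values within each outcome's plausible range and check whether the lower 95% bound of the net effect stays above zero. The sketch assumes independent normal estimates and only checks grid points; the ranges, step count and function names are hypothetical.

```python
import itertools
import math

Z_95 = 1.96  # conventional two-sided 95% normal quantile


def net_lower_bound(effects, ses, weights):
    """Lower 95% bound of the net effect, assuming independent normal estimates."""
    point = sum(w * e for w, e in zip(weights, effects))
    half = Z_95 * math.sqrt(sum((w * s) ** 2 for w, s in zip(weights, ses)))
    return point - half


def robust_net_benefit(effects, ses, importance_ranges, steps=5):
    """True if the lower 95% bound stays above zero across a grid of weights.

    effects, ses: unadjusted effects (benefit-positive scale) and their SEs
    importance_ranges: (low, high) plausible relative importance per outcome
    """
    if steps < 2:
        raise ValueError("steps must be at least 2")
    grids = [[lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
             for lo, hi in importance_ranges]
    return all(net_lower_bound(effects, ses, w) > 0
               for w in itertools.product(*grids))


if __name__ == "__main__":
    effects, ses = [20.0, -15.0], [5.10, 5.10]  # hypothetical mortality, bleeding
    print(robust_net_benefit(effects, ses, [(1.0, 1.0), (0.3, 0.5)]))
    print(robust_net_benefit(effects, ses, [(1.0, 1.0), (0.3, 0.7)]))
```

In this hypothetical example the net benefit holds when bleeding is weighted 0.3 to 0.5 relative to death, but not when the plausible range extends to 0.7, which would argue for rating down the certainty of net benefit.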

The GRADE Working Group has developed guidance on rating the certainty of relative importance of outcomes.

It may be necessary during the process of the sensitivity analysis of outcome importance to re-evaluate which outcomes are critical to the likelihood of net benefit.

The certainty of net benefit does not necessarily dictate the strength of recommendation. The evidence-to-decision framework also includes cost, cost–benefit ratio, equity, acceptability and feasibility as considerations that may modify the strength of recommendation. Panels may choose to focus exclusively on net health effects and not include other elements (eg, some panels choose not to consider costs and do not formally consider acceptability, feasibility and equity).

In situations in which there is a high certainty in effect estimates but uncertainty that the balance of benefits and harms is favourable across the range of patient values and preferences (a situation in which panels will make weak recommendations because fully informed patients are likely to make different decisions), a moderate or low certainty of net benefit provides a clear expression of the rationale for weak recommendations.

High certainty is not necessary in all cases to support a strong recommendation.

In this article, we introduce an approach for guideline developers to consider explicitly reporting the certainty of net benefit with recommendations, either in addition to or in place of reporting an overall quality of evidence associated with a recommendation. Either way, the approach requires consideration of certainty of evidence ratings for individual outcomes, typically presented in summary of findings tables.

This approach is applicable to decisions or recommendations with binary choices, such as treatment, prevention, diagnostic and screening interventions. It involves many judgements that guideline panels already make, explicitly or implicitly, when making recommendations. Reporting the judgements made when using this approach would allow readers to assess how the ratings were derived, and may reduce the spurious confidence that quantitative reporting can convey when qualitative factors are left out.

A key driver for this approach is greater congruence with the intent behind the concept of ‘adequate evidence to support a recommendation’ than what is currently conveyed by the ‘overall quality of evidence in estimates of effects’. Strengths of this approach include the transparent, logical, quantitative expressions for both scholarly and clinical readers and for both guideline developers and guideline users.

Throughout this discussion, we consider the context of guideline recommendations, which by nature relate to a population rather than to a specific individual. Concepts of certainty of net benefit may eventually be extended to ‘certainty of individual net benefit’ by including individually determined relative importance of outcomes, but these concepts have not yet been discussed or tested in relation to individual decision-making.

The primary limitation of this approach is its lack of testing to inform its feasibility and acceptability, and how readers will interpret these concepts. This report is shared, before such testing, to increase scholarly discussion. This GRADE concept article does not, therefore, constitute GRADE guidance.

The authors are pleased to acknowledge Lehana Thabane, PhD, for collaborating with the lead author to derive the statistical method; Paul E Alexander, MSc, MHSc; Cynthia Boyd, MD, MPH; Reem Mustafa, MD, and Irfan Dhalla MD, MSc, for feedback to improve the readability and conceptual clarity of the manuscript, and Jordan Prince for the construction of an online calculator.

At a GRADE Working Group (GWG) meeting, GG and MH presented a complex method to rate the certainty of evidence for an outcome when fully contextualised with respect to other outcomes related to a decision or recommendation. HJS had introduced the concept of rating certainty in a range of effects based on all GRADE domains. BSA introduced the concept of certainty of net benefit to clarify and simplify methodology to report and assess the balance of benefits and harms in the context of fully contextualising certainty of evidence across outcomes. The GWG formed a subcommittee for this conceptual development. BSA and Lehana Thabane developed the statistical model for a simple decision analysis and sensitivity analysis. BSA, IK, AI and Lehana Thabane provided examples to demonstrate the model (Appendix). The GWG had in-person meetings in three countries (with up to 100 people in attendance) in which the wider audience provided in-depth review, feedback and discussion to refine the concepts. BSA, PO, IK, AI, MTA, MHM, JJM, AQ, MH, HJS and GG met frequently to iteratively refine the concepts, meet the authorship requirements and approve the final version. BSA is the guarantor of the article.

The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

All authors are members of the GRADE Working Group and conduct scholarly activity or professional services related to the concepts in this article. BSA and PO are employed by EBSCO Information Services and IK is employed by Duodecim Medical Publications Ltd.

Not commissioned; externally peer reviewed.

No additional data available.

Not required.