Introduction

“If you cannot measure it, you cannot improve it.”

William Thomas, Lord Kelvin (1824-1907)

Why measuring perioperative outcomes is important

Reliable measurement and recording of outcomes after surgery should be integral to the delivery and development of high-quality surgical care. For consent to be truly informed and decision-making collaborative, surgical patients have a right to know the expected results of the procedure to which they are consenting. Providers of surgical services need to be able to evaluate their processes of care and resultant outcomes in order to benchmark their practice against other providers. Surgeons and the team they work with (including anesthesiologists) have a professional duty to show that their practice is safe and competent. Additionally, the financial and logistical planning of surgical service delivery relies on robust process and outcome data, e.g., how long patients are likely to stay in hospital, which patients may require postoperative critical care, and the frequency of expected postoperative complications.

Why outcome measurement is important in perioperative research

Outcome measurement in perioperative research is equally fundamental. The only means by which clinical trials can reliably discriminate between beneficial, ineffective, and harmful interventions is by employing outcomes that are relevant to patients, clinically important, and valid. Moreover, the perioperative evidence base requires not only relevant and valid outcome measurements within individual trials but also consistency of outcome measurements between different trials. Such consistency facilitates comparing and contrasting results between studies as well as combining the findings into high-quality systematic reviews—i.e., the “gold standard” of evidence-based medicine. Heterogeneity of outcome measurements limits the quantitative pooling of data from multiple trials within meta-analyses. At best, this diminishes confidence in the pooled estimates of an intervention’s effect.1 At worst, it may preclude any quantitative pooling of data within a systematic review and thereby substantially undermine the value of individual trials and the utility of the combined evidence base.

Inconsistency in outcome reporting

The problem of inconsistent outcome reporting and its consequences has long been recognized. A large systematic review of stroke outcome measures in 2000 reported that “there is little consistency in the measurement of outcome in acute stroke trials, and this may complicate interpretation of the results and reduce the likelihood of detecting worthwhile drug effects”.2 In the perioperative setting, a review of surgical outcome reporting in 2002 noted that “inconsistent complication reporting is common in hospitals and in the surgical literature”.3 Furthermore, a recent Cochrane systematic review of perioperative hemodynamic management noted that none of 31 included studies used the same set of postoperative morbidities.4

Lack of consensus regarding which outcomes to measure and how they should be defined has stimulated recent interest in standardizing endpoints and outcome measures for perioperative research.5,6 The potential benefits of researchers adopting standardized endpoints should be self-evident. Measuring outcomes in a variety of different ways makes it difficult or impossible to compare results between trials. On the other hand, if identical criteria are adopted for a given outcome and used consistently across trials, data from different trials can be readily compared, contrasted, and combined in meta-analyses to provide a more precise estimate of the direction and magnitude of the true effect. Individual patient data (IPD) meta-analysis may help overcome some of the variation in methodology or outcome reporting between trials and thus reduce the heterogeneity of the data; however, the IPD approach is considerably more time- and resource-intensive than using aggregate trial data for systematic reviews.7,8 Consistent reporting of outcomes also improves understanding of how differences in trial results relate to the clinical context, the patient population studied, and the intervention administered. It likewise minimizes uncertainty as to whether differences between studies reflect differences in outcome definitions or true clinical effects.
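As a minimal sketch of why identically defined outcomes can be pooled, the Python below applies fixed-effect inverse-variance weighting to log risk ratios, one standard meta-analytic method chosen here for illustration. It assumes every trial reports the same binary outcome under the same definition, with at least one event in each arm; the trial counts are hypothetical.

```python
import math

# Hypothetical trials: (events_treatment, n_treatment, events_control, n_control),
# all assumed to use one identical definition of the same binary outcome.
TRIALS = [(12, 100, 20, 100), (15, 120, 22, 118), (9, 80, 14, 82)]

def log_risk_ratio(events_t, n_t, events_c, n_c):
    """Return the log risk ratio and its approximate variance for one trial."""
    rr = (events_t / n_t) / (events_c / n_c)
    variance = 1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c
    return math.log(rr), variance

def fixed_effect_pool(trials):
    """Inverse-variance fixed-effect pooled log risk ratio and its standard error."""
    total_weight = 0.0
    weighted_sum = 0.0
    for trial in trials:
        estimate, variance = log_risk_ratio(*trial)
        weight = 1 / variance  # more precise trials count for more
        total_weight += weight
        weighted_sum += weight * estimate
    return weighted_sum / total_weight, math.sqrt(1 / total_weight)

pooled, se = fixed_effect_pool(TRIALS)
low, high = math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se)
print(f"pooled RR {math.exp(pooled):.2f} (95% CI {low:.2f} to {high:.2f})")
```

The pooled standard error is smaller than that of any single trial, which is the gain in precision described above; the arithmetic is only meaningful if the outcome was defined identically in every trial.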

Core outcome measures and standardized endpoint reporting

Various initiatives, including the development of core outcome sets and standardized definitions for specific outcome measures, have been introduced in an attempt to address the problems associated with inconsistent outcome measurement and reporting. The Core Outcome Measures in Effectiveness Trials (COMET) initiative9 is notable in this respect; it grew out of earlier efforts to develop core outcome sets in rheumatology and cancer medicine. Against this background, an important development has been the combination of the core outcome set approach with standardized definitions, in order to provide a consistent and comprehensive toolkit for investigators designing the clinical trials of the future.

In clinical trials, standardization of endpoints requires two essential elements: first, a defined core outcome set reported consistently across all trials; and second, consistent definitions (criteria) for individual outcome measures. These two elements may be combined to provide a standardized core outcome set together with a menu of endpoints, with criteria, for each outcome domain. Each outcome domain can in turn include a hierarchy of consistently defined measures, selected according to the level of detail and precision relevant to a particular trial, while all trials consistently report the core outcome set.
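One way to picture this arrangement is as a simple data structure. The Python sketch below is hypothetical throughout: the class names, the domain, the endpoints, and the numeric detail levels are illustrative placeholders, not the content of any published core outcome set.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Endpoint:
    name: str
    criteria: str  # the agreed, standardized definition (placeholder text here)
    level: int     # position in the domain hierarchy: 1 = least detailed

@dataclass
class OutcomeDomain:
    name: str
    endpoints: list = field(default_factory=list)

    def endpoints_up_to(self, level):
        """Endpoints in this domain down to the requested level of detail."""
        return [e for e in self.endpoints if e.level <= level]

def outcomes_for_trial(core_set, selections):
    """Core set (always reported) plus domain endpoints at each chosen level.

    selections is a list of (OutcomeDomain, level) pairs chosen by the trial.
    """
    chosen = list(core_set)
    for domain, level in selections:
        for endpoint in domain.endpoints_up_to(level):
            if endpoint not in chosen:
                chosen.append(endpoint)
    return chosen

# Hypothetical example content for illustration only.
CORE_SET = [
    Endpoint("Core outcome A", "agreed definition A", 1),
    Endpoint("Core outcome B", "agreed definition B", 1),
]
CARDIO = OutcomeDomain("Cardiovascular", [
    Endpoint("Any cardiovascular complication", "agreed definition C", 1),
    Endpoint("Myocardial infarction", "agreed definition D", 2),
    Endpoint("Myocardial injury", "agreed definition E", 3),
])

for endpoint in outcomes_for_trial(CORE_SET, [(CARDIO, 2)]):
    print(endpoint.name)
```

A trial needing only coarse cardiovascular data would select level 1 and a dedicated cardiac trial level 3, yet both would still report the same core set first.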

The aim of this narrative review is to:

  • summarize the challenges inherent in measuring and defining perioperative outcomes

  • describe current efforts to standardize endpoints in perioperative care

  • discuss the potential implications of standardized endpoints for perioperative research and clinical practice

The landscape of perioperative outcome measurement

Types of outcome measures

Outcome or process?

The quality of perioperative care may be measured in various ways. Some “outcomes” are more accurately described as “process” measures. In the Donabedian model, both process measures (i.e., what we do, or the actions involved in delivering health care) and outcome measures (i.e., the results or effects of those processes) may closely reflect the quality of care.10 Examples of process measures include length of hospital stay and completion of the WHO Surgical Safety Checklist, whereas 30-day mortality and postoperative myocardial infarction are outcome measures.

Clinician-described or patient-reported

The measure of interest may be described by clinicians or reported by patients. Patient-reported outcomes (e.g., rating of postoperative pain on a numerical rating scale) are considered “subjective” because they reflect patients’ perceptions, whereas clinician-described outcomes (e.g., the presence or absence of myocardial injury) are considered “objective” measures based on clinical evidence. Nevertheless, there is clearly a degree of subjectivity in assessing certain outcomes (e.g., the presence or absence of postoperative atelectasis). Moreover, patient-reported outcomes are, by definition, important to patients, whereas clinician-described outcomes may not be as important to patients.
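Subjectivity of this kind is often quantified as inter-observer agreement. As an illustration only, the Python sketch below computes Cohen's kappa, a widely used chance-corrected agreement statistic, for two clinicians independently recording the presence (1) or absence (0) of postoperative atelectasis; the ratings are hypothetical.

```python
def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be paired and non-empty")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n)
        for c in set(rater_a) | set(rater_b)
    )
    if expected == 1:
        return 1.0  # both raters used one identical category for every patient
    return (observed - expected) / (1 - expected)

# Hypothetical ratings for ten patients.
clinician_1 = [1, 1, 0, 0, 1, 0, 0, 1, 0, 0]
clinician_2 = [1, 0, 0, 0, 1, 0, 1, 1, 0, 0]
print(round(cohens_kappa(clinician_1, clinician_2), 2))  # 0.58
```

Here the clinicians agree on 8 of 10 patients, yet kappa is only about 0.58, because much of that raw agreement would be expected by chance alone.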

The concept of “patient-centred” outcomes—i.e., focusing explicitly on outcomes that matter to patients—has recently been proposed to quantify postoperative recovery.11 For example, an asymptomatic postoperative rise in troponin may not affect a patient’s recovery. For that reason, it would not constitute a patient-centred outcome despite being a predictor of future cardiac events and therefore an important outcome to clinicians. A recent review asserted that “we are now entering a new era in medicine where patient-centred outcomes will determine what constitutes medical success or failure, not only doctors’ perceptions of success.”12

Adverse events or recovery

Outcome measures may also focus on adverse events or postoperative recovery. Both approaches have strengths and weaknesses. Adverse events can generally be observed clinically and/or confirmed from diagnostic tests, whereas recovery can be assessed only from patients using patient-reported outcome measures (PROMs) or indirect measures of functional capacity. Patient-reported outcome measures may be assessed using either specific questionnaires quantifying recovery13-17 or more general questionnaires developed for evaluating health-related quality of life (HRQL).18 Functional capacity is generally assessed by a standardized measure of physical function (e.g., six-minute walk test).19

While PROMs are, by definition, patient-centred and may reveal important sequelae of surgery that measures of adverse events fail to capture, PROM instruments require specific psychometric evaluation and validation. New PROM tools require validation against existing tools, and their reliability (i.e., the extent to which a test provides the same output given the same input) may likewise need formal testing. The feasibility of a particular measure also requires evaluation. Without the necessary resources and expertise, the utility of PROMs in practice is clearly limited. The use of outcome measurement tools that have not been appropriately tested is common in perioperative research20 and provides a further justification for developing standardized endpoints.
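Test-retest reliability is commonly summarized with an intraclass correlation coefficient (ICC). The Python sketch below computes the one-way random-effects form, ICC(1,1), which is only one of several ICC variants. The choice of variant is an assumption, as are the scores, which imagine five patients completing a hypothetical recovery questionnaire twice while their condition is unchanged.

```python
def icc_oneway(ratings):
    """One-way random-effects ICC(1,1) for subjects each measured k times.

    ratings: list of per-subject tuples of repeated scores (same length k).
    """
    n = len(ratings)
    k = len(ratings[0])
    if n < 2 or k < 2 or any(len(r) != k for r in ratings):
        raise ValueError("need >= 2 subjects, each with the same k >= 2 scores")
    subject_means = [sum(r) / k for r in ratings]
    grand_mean = sum(subject_means) / n
    # Between-subject and within-subject mean squares from a one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for r, m in zip(ratings, subject_means) for x in r
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical test and retest scores for five patients.
scores = [(120, 122), (95, 97), (140, 138), (110, 111), (130, 133)]
print(round(icc_oneway(scores), 3))
```

Values near 1 indicate that repeated administrations rank patients consistently; an instrument whose retest scores scatter widely would give a much lower, possibly negative, ICC.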

Although measures of adverse events might therefore appear to be a simpler and more objective way of assessing outcomes than patient-reported or indirect markers of recovery, they have limitations. After most types of surgery, death in hospital or within 30 days occurs too infrequently to serve as an adequate description of perioperative outcomes. Consequently, perioperative morbidity, a more common outcome, has become the focus of perioperative outcome measurement. Perioperative morbidity has additional implications for the use of healthcare resources and is increasingly recognized as a predictor of long-term outcome.21,22 Recent characterization of the high-risk surgical population has also steered attention towards perioperative morbidity:23 although this population comprises only 10-15% of all surgical patients, it accounts for over 80% of postoperative complications and resource costs. Nonetheless, morbidity endpoints may also fall short in describing outcomes after surgery, particularly beyond the immediate perioperative period. Recent evidence suggests that long-term recovery may take months to years, considerably longer than the time frames examined in most perioperative research.24-26 Hence, there is growing interest in both short-term and longer-term recovery endpoints to quantify the overall success of surgery.27-29

Other controversies persist concerning measurement of adverse events. First, the definitions used for specific complications are frequently inconsistent. A review of surgical adverse events in 2001 found 41 different definitions and 13 grading scales for surgical-site infections among the 82 studies included in the review.30 The recent debate regarding definitions of perioperative myocardial injury is likewise ongoing.31,32 Even seemingly simple binary constructs such as mortality may be presented in a variety of ways.33,34 Such issues clearly suggest that standardized endpoints would greatly facilitate comparison of data between trials.

Second, grading the severity of adverse events is problematic. Although distinguishing between trivial and life-threatening complications may be informative, the various systems developed for quantifying severity,35-38 e.g., the Clavien-Dindo classification,35 all suffer from a degree of inherent subjectivity. Severity is variously based on the degree of physiological derangement, its duration, or the invasiveness of the treatment required; however, from the patient’s perspective, the long-term sequelae (i.e., the impact on quality of life and life expectancy) are arguably more important criteria for judging the severity of a complication.
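To make concrete how treatment-based grading works, the Python sketch below assigns a simplified Clavien-Dindo grade from the treatment a complication required. It is an illustrative simplification, not a validated implementation: it omits refinements of the published classification, such as the list of drugs permitted at grade I and the “d” suffix for disability at discharge, and the argument names are hypothetical.

```python
def clavien_dindo_grade(died=False, icu_organ_dysfunctions=0,
                        intervention_under_ga=False,
                        intervention_without_ga=False,
                        specific_drugs_or_transfusion=False):
    """Simplified Clavien-Dindo grade, driven by the most invasive treatment needed.

    Checks run from most to least severe, so the highest applicable grade wins.
    """
    if died:
        return "V"
    if icu_organ_dysfunctions >= 2:    # life-threatening, multi-organ, ICU care
        return "IVb"
    if icu_organ_dysfunctions == 1:    # life-threatening, single organ, ICU care
        return "IVa"
    if intervention_under_ga:          # surgical/endoscopic/radiological, general anaesthesia
        return "IIIb"
    if intervention_without_ga:        # intervention not under general anaesthesia
        return "IIIa"
    if specific_drugs_or_transfusion:  # e.g., antibiotics, blood transfusion
        return "II"
    return "I"                         # deviation from the normal course only

print(clavien_dindo_grade(specific_drugs_or_transfusion=True))  # II
```

Nothing in the inputs captures long-term sequelae: a complication treated with antibiotics alone is grade II whatever its eventual effect on the patient's quality of life, which is the limitation noted above.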

Composite outcomes

Morbidity outcomes may be reported singly or combined to give composite outcomes, such as major adverse cardiac events. Composite measures have two principal advantages. First, they can increase the statistical power of a study to detect differences between groups: combining several individually rare outcomes into one composite outcome increases the event rate and thus reduces the sample size required to detect a significant difference between groups. Second, amalgamating several separate clinically important outcomes (e.g., the composite of death, dependency, and poor neurological function commonly reported in stroke trials) provides a succinct quantitative overview of the overall benefit of an intervention.
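The power argument can be checked with the usual normal-approximation sample-size formula for comparing two proportions. In the Python sketch below, the event rates are hypothetical, chosen only to show the effect of a higher event rate: a single rare complication falling from 2% to 1.5%, and a composite falling from 12% to 9%, i.e., the same 25% relative reduction.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate patients per arm for a two-sided comparison of two proportions.

    n = (z_{1-alpha/2} + z_{power})^2 * [p1(1-p1) + p2(1-p2)] / (p1 - p2)^2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    return ceil((z_alpha + z_power) ** 2 * variance / (p_control - p_treatment) ** 2)

single = n_per_group(0.02, 0.015)    # one rare complication
composite = n_per_group(0.12, 0.09)  # composite with a higher event rate
print(single, composite)
```

With these assumed rates, the single-complication trial needs roughly 10,800 patients per arm against roughly 1,600 for the composite, which is the sample-size saving described above.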

However, composite outcomes obscure the detail of the individual components and may be misleading if the component outcomes are not broadly equivalent in both severity and incidence. If one complication within the composite outcome occurs much more frequently than the others, the overall rate of the composite outcome will be skewed towards the rate of that particular complication.39 Moreover, composite outcomes cannot be meaningfully interpreted in the context of the existing literature if their components vary between trials.40
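A short calculation shows how a frequent component can dominate a composite. The Python sketch below assumes, purely for simplicity, that component complications occur independently, and uses hypothetical rates for one common minor complication and two rare severe ones.

```python
from math import prod

def composite_rate(component_rates):
    """Probability of at least one component event, assuming independence."""
    return 1 - prod(1 - p for p in component_rates)

# Hypothetical component rates: [common minor, rare severe 1, rare severe 2].
baseline = composite_rate([0.15, 0.01, 0.005])
severe_halved = composite_rate([0.15, 0.005, 0.0025])  # both severe events halved
minor_reduced = composite_rate([0.10, 0.01, 0.005])    # only the minor event reduced
print(f"{baseline:.3f} {severe_halved:.3f} {minor_reduced:.3f}")  # 0.163 0.156 0.113
```

Halving both severe complications moves the composite by less than one percentage point, whereas a modest reduction in the minor complication moves it by about five, so the composite mainly tracks the frequent, less serious event.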

Morbidity scores

Postoperative morbidity scores have been developed to quantify and compare overall postoperative morbidity for different procedures. These include the Postoperative Morbidity Survey (POMS),41,42 Cardiac Postoperative Morbidity Score (C-POMS),43 and Comprehensive Complication Index.44,45 While a robust system for quantifying overall postoperative morbidity would appear advantageous, such systems also pose challenges. They may be cumbersome to administer and require specific training in their use, which limits their utility. They require thorough validation and share the weakness of other composite outcomes, i.e., they may mask important differences in rates of individual specific postoperative morbidities. The POMS and C-POMS, in particular, were validated as means of detecting morbidity that would prevent hospital discharge. As healthcare systems and processes change, such measures may require refinement.

Resource use measures

An alternative approach is to use measures of resource consumption (e.g., critical care and hospital length of stay and readmission rates after surgery). Such measures are undeniably important in economic analyses of perioperative care. They are not only readily accessible from hospital data but also patient-centred, since patients generally want to return home from hospital as soon as possible. Nevertheless, resource use is only a proxy for clinical outcome. Despite abundant evidence linking postoperative complications with increased length of stay and higher costs,46-50 resource use is also affected by several hidden factors (e.g., availability of community-based support, clinician behaviour, and hospital policy). These factors may limit the reliability of resource use as a marker of perioperative clinical outcome, both between institutions and within the same institution over time.

To avoid these potential confounders, trials may report “time to medical fitness for discharge” as a measure of overall clinical outcome.51,52 Although this endpoint tells us little about the clinical concerns that may have delayed fitness for discharge, it is arguably a good marker of the overall early success of surgery and perioperative care. Nevertheless, its utility as an outcome measure presupposes that different hospitals (or clinical teams) have similar criteria for considering a patient medically fit for discharge. Therefore, agreement on a consistent threshold and standardized definition for “medically fit for discharge” is a clear prerequisite for meaningful interpretation.

Patient-reported experience measures (PREMs)

Finally, overall patient experience or satisfaction with care may be an outcome of interest. The importance of patient satisfaction after surgery and/or anesthesia has been increasingly recognized, both in its own right and as a metric for quality of care.53-55 Furthermore, there is some evidence linking positive patient experience with clinical outcome.56-58 Concerns have arisen, however, over the use of non-validated tools to measure patient experience despite the availability of well-validated instruments.59,60 Without an agreed standard to guide investigators, the use of untested instruments to assess perioperative patient satisfaction is likely to continue.

In summary, regardless of the approach to perioperative outcome measurement, the use of complex outcome measures, such as PROMs or composite morbidity endpoints, greatly increases the scope for variability between trials. This makes the case for standardizing endpoints in perioperative care all the more compelling. Furthermore, perioperative outcome measures must be valid, reliable, and pragmatic (i.e., feasible) in their specific contexts and populations. Investigators need consensus-based guidance to help them select and precisely define outcome measures and to establish the time points at which those measures should be recorded. Making such decisions on an arbitrary basis predictably leads to inconsistency and random variation in outcome reporting.

Core outcomes and standardization initiatives

Thus far, efforts to improve the consistency of outcome measurement in perioperative research have taken two approaches. The first approach involves determining the most appropriate “outcome domains” to describe perioperative care (i.e., what to measure). Not all outcome domains will be relevant to all perioperative trials; however, establishing a “core” set of perioperative outcomes would allow reliable comparison and combination of data from trials that report those outcomes, thus enabling their inclusion in systematic reviews and allowing investigators to increase the value of their studies.

The second approach involves agreeing on definitions for specific endpoints (i.e., standardized criteria regarding how to measure) for each outcome domain, e.g., myocardial injury after surgery, to ensure all trials use standardized definitions for reporting. Standardization would improve consistency between trials, reduce the use of non-validated endpoints, and ensure the use of precise widely accepted definitions for specific endpoints that are often defined only vaguely, if at all, in the current literature. Standardized endpoints and core outcome measures may of course be combined, providing researchers with clear consensus-based guidance on which outcomes should be reported and how such outcomes should be defined.

Core outcome sets

The concept of core outcome sets initially grew from research on rheumatoid arthritis during the 1990s. The 1992 Outcome Measures in Rheumatology Clinical Trials (OMERACT) conference developed from increasing recognition that assessing the impact of interventions was impossible without a consensus on what outcomes should be measured and how they should be defined. An agreement on a core outcome set for rheumatology trials was reached at the conference.61 The OMERACT group has since developed and validated core outcome sets for several rheumatological conditions and has pioneered methodology for choosing measurement instruments via its OMERACT “filter”.62

Core outcome sets are increasingly being developed in a wide range of medical disciplines, from eczema to colorectal cancer.63,64 The Core Outcome Measures in Effectiveness Trials (COMET) initiative was established in 2010 to meet this growing need by “bringing together people interested in the development and application of agreed standardized sets of outcomes”.9 A related aim of the COMET initiative is to guide the methodology of core set development, which has itself become a subject of considerable research interest.65 While no specific guidelines have been published to date, a recent systematic review of methods used in developing core outcome sets identified three important principles, namely, involving all stakeholders (i.e., patients and carers as well as clinicians and researchers) at all stages, achieving widespread consensus (usually via some form of Delphi process), and methodological transparency.66

Core outcome measures for perioperative and anesthetic care (COMPAC)

An initiative to develop a core outcome set for trials in perioperative medicine and anesthesia is currently underway.6 Its aim is to develop a basic “core set” of outcomes for reporting in all perioperative trials without restricting the reporting of other, more specific outcomes in particular studies. The methodology for COMPAC is based on the COMET initiative’s recommendations. A group of stakeholders (comprising perioperative clinicians, patients, carers, and researchers) is first convened, and a “long list” of all relevant outcome measures is then compiled. First, a comprehensive literature search is performed to describe existing perioperative outcome measures, followed by stakeholder consultation exercises to identify any other potentially relevant outcomes. A Delphi process is then used, through which stakeholders select a shortlist of candidate outcome measures; finally, a consensus process is employed to reach agreement on the core outcome set.
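The Delphi step can be sketched as a scoring rule. The Python below assumes a rule often used in core outcome set Delphi studies: ratings on a 1-9 scale, with “consensus in” when at least 70% of a group rate an outcome 7-9 and fewer than 15% rate it 1-3, and the mirror image for “consensus out”. It additionally requires the same verdict in every stakeholder group. These thresholds, function names, and ratings are assumptions for illustration, not the COMPAC protocol.

```python
def classify_item(scores, agree=0.70, veto=0.15):
    """Classify one candidate outcome from a group's 1-9 importance ratings."""
    if not scores or any(not 1 <= s <= 9 for s in scores):
        raise ValueError("scores must be a non-empty list of ratings from 1 to 9")
    n = len(scores)
    critical = sum(s >= 7 for s in scores) / n     # share rating 7-9
    unimportant = sum(s <= 3 for s in scores) / n  # share rating 1-3
    if critical >= agree and unimportant < veto:
        return "consensus in"
    if unimportant >= agree and critical < veto:
        return "consensus out"
    return "no consensus"

def classify_across_groups(scores_by_group, **thresholds):
    """Accept a verdict only if every stakeholder group reaches the same one."""
    verdicts = {classify_item(s, **thresholds) for s in scores_by_group.values()}
    return verdicts.pop() if len(verdicts) == 1 else "no consensus"

# Hypothetical ratings of one candidate outcome by two stakeholder groups.
ratings = {
    "patients": [8, 9, 7, 7, 8, 9, 7, 6, 8, 9],
    "clinicians": [7, 8, 9, 7, 7, 8, 5, 9, 8, 7],
}
print(classify_across_groups(ratings))  # consensus in
```

Requiring agreement within each group, rather than across the pooled ratings, is one way to keep clinicians from outvoting patients and carers; whether a given study does this is a design choice.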

European Society of Anaesthesiology (ESA)/European Society of Intensive Care Medicine (ESICM) joint task force standards

The challenge of standardizing endpoints in perioperative research has recently been addressed by a joint task force of the ESA and the ESICM.5 This represents the first international collaboration to develop standardized endpoint definitions, with the stated aim of “providing a methodological standard for use in large pragmatic clinical studies designed to improve patient outcomes after surgery”.

The task force, comprising 12 perioperative experts with diverse backgrounds, conducted a literature review of perioperative outcome assessment. They then discussed the evidence base to reach agreement on a standardized definition for 22 pre-specified perioperative complications and four composite outcomes, along with a simple severity grading and recommendations for measuring HRQL after surgery. The standard definitions produced were deliberately straightforward and user-friendly, reflecting their intended use in large pragmatic trials where ease of data collection is an important consideration. Similarly, the task force incorporated existing definitions already widely used in audit and research, no doubt mindful that new definitions might further add to the confusion in selecting perioperative outcome endpoints. Severity grading was consistently defined throughout and based on the degree of anticipated harm and need for clinical intervention.

The task force discouraged measures of resource use, considering them “unreliable surrogate markers of clinical outcome because they are affected by hospital and healthcare policy as well as by clinician behaviour”. By contrast, they endorsed measuring recovery after surgery, proposing the quality of recovery (QoR)-15 instrument as a suitable standard,17 and emphasized the importance of measuring HRQL following surgery. However, none of the suggested HRQL endpoints was developed for postoperative use except the Post-operative Quality of Recovery Scale (PQRS).67 Since the PQRS has been validated only up to three months postoperatively, the task force also highlighted the need for new instruments to assess long-term HRQL after surgery.

While the ESA/ESICM endeavour represented the first international effort to standardize outcome definitions for perioperative research, the recommendations were not presented as a final solution. The task force acknowledged that the process had strengths and limitations and that “further work may improve the definitions provided and broaden the scope”. Nonetheless, this international collaboration not only highlights the current lack of consensus-based standards in perioperative outcome measurement but also provides a useful, patient-relevant, clinically important, and precisely defined candidate set of standardized endpoints for perioperative research.

BJA symposium “defining perioperative endpoints” (June 2015)

A symposium organized by Monash University and sponsored by the British Journal of Anaesthesia was held in June 2015 at the Collaborative Clinical Trials in Anaesthesia Conference at the Monash Centre in Prato, Italy. The symposium, chaired by Paul Myles (Australia) and Mike Grocott (UK), brought together over 50 internationally recognized experts in perioperative research to discuss the challenges of developing standardized endpoints for perioperative clinical trials. Achieving the overall aim, a broad, internationally recognized consensus on endpoint measurement, involves a multi-step collaborative development process incorporating themes from both the COMPAC initiative and the ESA/ESICM endpoint workstreams.

Discussions at the conference focused first on identifying major themes or outcome domains that require standardized endpoints and then on forming expert working groups for each domain identified. Each working group will involve four to eight experts from around the world whose task will be to review the existing literature and then propose candidate standardized endpoints for their respective outcome domains. Consensus on the standardized endpoints for each domain will then be sought across all participants, likely through a modified Delphi process. An overall working party will be established to coordinate the COMPAC and STandardized EndPoints in Perioperative care (STEPP) processes.

This endeavour to develop standardized endpoints will thus build on the recommendations from the ESA/ESICM initiative as well as inform the work of the COMPAC initiative towards a core perioperative outcome set. Uniquely among these initiatives, the latter stages of COMPAC/STEPP will include patients and carers in deciding which outcomes warrant inclusion in a core set for all perioperative trials. The results of COMPAC/STEPP will be presented at the 16th World Congress of Anaesthesiologists in Hong Kong in August 2016.

Implications of standardized endpoints

The development of core outcomes and standardized endpoints has considerable implications for perioperative research. Methodological standards in research (to minimize the risk of bias) are universally recognized, and explicit standards exist for judging methodological quality.68-70 Similarly, standards have existed for many years for reporting different types of trials.71-73 These frameworks exist to promote high-quality research and to ensure its accurate dissemination; however, to date, there is a lack of objective standards or guidelines for outcome measurement in perioperative research.

As mentioned previously, several medical specialties are considerably ahead of perioperative medicine in this regard.61,65,74 The examples of rheumatoid arthritis and pain medicine neatly illustrate the difficulties in measuring complex or multidimensional outcomes without standardized definitions. Perioperative medicine research has clearly reached a level of complexity that also warrants standardized outcome measurement. Now that mortality and length of stay are no longer considered adequate markers of surgical outcomes,75 more complex multidimensional outcome measures are increasingly used to quantify overall postoperative morbidity and recovery. Reporting such endpoints requires agreement on how they should be measured. Without explicit standards among the research community, definitions for outcome measurement are doomed to remain arbitrary.

The potential benefits of standardizing outcome measurement are particularly significant in perioperative medicine, owing to the paucity of large multinational randomized controlled trials. As long as smaller-scale trials remain the norm in perioperative research, systematic reviews incorporating such trials will continue to form the highest level of evidence to inform practice. Nevertheless, the strength of the conclusions drawn in a systematic review depends on the similarity of the studies included.76 Substantial heterogeneity in methodology, sample populations, and outcome measurement prevents firm conclusions from being drawn, which largely defeats the purpose of systematic reviews. Consensus on core outcome measures and standardized endpoints within perioperative research would thus overcome a significant obstacle to conducting useful systematic reviews and meta-analyses.
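Statistical heterogeneity between trials is commonly quantified with Cochran's Q and the I² statistic. The Python sketch below computes both from study estimates (e.g., log risk ratios) and their variances, which are hypothetical here. Note that I² only flags inconsistency; it cannot tell whether that inconsistency arises from differing outcome definitions or from genuine differences in effect.

```python
def heterogeneity(estimates, variances):
    """Cochran's Q and I^2 (as a proportion) for k study estimates."""
    if len(estimates) != len(variances) or len(estimates) < 2:
        raise ValueError("need at least two estimates with matching variances")
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0  # I^2 is truncated at zero
    return q, i2

# Hypothetical log risk ratios and variances.
consistent = heterogeneity([-0.20, -0.22, -0.18], [0.04, 0.05, 0.04])
divergent = heterogeneity([-0.60, 0.30, -0.10, 0.50], [0.01, 0.01, 0.01, 0.01])
print(f"I2 consistent = {consistent[1]:.0%}, divergent = {divergent[1]:.0%}")
```

In the divergent case most of the observed variation exceeds what chance would produce, which is the situation in which pooled conclusions become hard to defend.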

Conclusions

Measuring outcomes that are patient-relevant, clinically important, and valid is fundamental to the delivery of high-quality clinical care and to the innovation and development of such care through research. As surgical innovations become more complex and the burden of age and comorbidity in the surgical patient population continues to increase, understanding the benefits and harms of surgical interventions becomes ever more important. Nevertheless, we can understand only what we can adequately describe. Truly collaborative decision-making, delivery of safe and effective care, and ongoing quality improvement all depend critically on reliable and valid measurement of patient-relevant and clinically important data. Attempts to describe the full spectrum of outcomes following surgery necessarily entail moving beyond the traditional endpoints of mortality and resource use towards more complex measures of morbidity, patient-reported outcomes, and functional status.

Without standardization and consensus to guide the use of increasingly complex and nuanced endpoints, there is a real risk that perioperative research will become mired in inconsistent, heterogeneous outcome measures that cannot be meaningfully compared and contrasted between trials or combined within meta-analyses. This would limit the value of the research effort and deprive patients and clinicians of definitive answers. Collaboration in perioperative medicine, whether between institutions or across continents, has enormous potential to improve the value of research outputs. Standardizing endpoints for outcome measurement is fundamental to maximizing the quality of such collaboration and ensuring the impact of future perioperative research.