

Using serum urate as a validated surrogate end point for flares in patients with gout: protocol for a systematic review and meta-regression analysis
  1. Melanie B Morillon1,2,
  2. Lisa Stamp3,
  3. William Taylor4,
  4. Jaap Fransen5,
  5. Nicola Dalbeth6,
  6. Jasvinder A Singh7,
  7. Robin Christensen1,
  8. Marissa Lassere8
  1. 1Musculoskeletal Statistics Unit, Department of Rheumatology, The Parker Institute, Copenhagen University Hospitals, Bispebjerg and Frederiksberg, Frederiksberg, Denmark
  2. 2Department of Rheumatology, Odense University Hospital, Svendborg, Denmark
  3. 3Department of Medicine, University of Otago, Christchurch, New Zealand
  4. 4Department of Medicine, University of Otago, Wellington, New Zealand
  5. 5Department of Rheumatology, Radboud University Medical Centre, Nijmegen, The Netherlands
  6. 6Department of Medicine, University of Auckland, Auckland, New Zealand
  7. 7Department of Medicine, University of Alabama at Birmingham & Birmingham Veterans Affairs Medical Center, Birmingham, Alabama, USA
  8. 8Department of Rheumatology, St George Hospital, University of NSW, Sydney, New South Wales, Australia
  1. Correspondence to Dr Melanie Birger Morillon; melanie.birger.morillon{at}


Introduction Gout is the most common inflammatory arthritis in men over 40 years of age. Long-term urate-lowering therapy is considered a key strategy for effective gout management. In clinical trials of urate-lowering therapy, the primary outcome measure for efficacy is serum urate level, which effectively acts as a surrogate for patient-centred outcomes such as the frequency of gout attacks or pain. Yet it has not been clearly demonstrated that the relationship between serum urate and clinically relevant outcomes is strong enough for serum urate to be considered an adequate surrogate. Our objective is to investigate the strength of the relationship between changes in serum urate in randomised controlled trials and changes in clinically relevant outcomes according to the ‘Biomarker-Surrogacy Evaluation Schema version 3’ (BSES3), documenting the validity of selected instruments by applying the ‘OMERACT Filter 2.0’.

Methods and analysis A systematic review, reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, will identify all relevant studies. Standardised data elements will be extracted from each study by 2 independent reviewers, and disagreements will be resolved by discussion. The data will be analysed by meta-regression of the between-arm differences in the change in serum urate level (independent variable) from baseline to 3 months (or to 6 or 12 months if 3-month values are not available) against flare rate, tophus size and number, and pain at the final study visit (dependent variables).

Ethics and dissemination This study will not require specific ethics approval since it is based on analysis of published (aggregated) data. The intended audience will include healthcare researchers, policymakers and clinicians. Results of the study will be disseminated by peer-reviewed publications.

Trial registration number CRD42016026991.


Strengths and limitations of this study

  • Our study's strengths include clinical expertise in rheumatology.

  • The content experts in the group have extensive knowledge of the literature and experience with gout treatment.

  • The methodologists in the group are members of the Outcome Measures in Rheumatology Clinical Trials (OMERACT) Gout Working Group, and have experience with conducting and reporting randomised clinical trials, systematic reviews and meta-analyses.

  • An anticipated weakness is that the trials we identify may be limited in number and quality.


Clinicians making treatment decisions should refer to methodologically strong clinical trials examining the impact of therapy on clinically important outcomes (ie, outcomes that are important to patients). However, clinically important outcomes can be difficult to study, as the required trials need very large sample sizes or long-term patient follow-up. Thus researchers and drug developers look for alternatives. Substituting surrogate end points for the target event allows shorter and smaller trials to be conducted, which offers a solution to this dilemma, provided the surrogate end points are convincing.

There are obvious advantages to using biomarkers and surrogate end points, but concerns about their clinical applicability, and about the statistical methods needed to validate them, hinder their efficient application. A surrogate end point may be defined as an ‘objective’ laboratory measurement or a physical sign used as a substitute for a clinically meaningful end point that directly measures how a patient feels, functions or survives.1 This definition was recommended and further explored at a National Institutes of Health (NIH)-sponsored workshop in 1998, which agreed on definitions for biomarker, surrogate end point and clinical end point. The agreed definition states that ‘a biological marker (biomarker) is a characteristic that is objectively measured and evaluated as an indicator of normal biologic processes, pathogenic processes or pharmacologic responses to a therapeutic intervention’.2,3

In gout, monosodium urate crystals form when urate concentrations exceed the saturation threshold of ∼6.8 mg/dL (0.41 mmol/L) at 37°C. Reduction in serum urate (SU) to <6 mg/dL (0.36 mmol/L) is a key goal in the long-term management of gout. As such, SU measurement has become an integral part of gout management and a critical outcome measure in clinical studies of gout therapies. In the Outcome Measures in Rheumatology Clinical Trials (OMERACT) Delphi exercise, SU was identified as a mandatory outcome measure in chronic gout studies, with the highest median rating.4 SU as a biomarker makes inherent sense given the strong relationship between SU and the risk of gout. However, is SU a surrogate end point for relevant clinical outcomes such as gout attacks, tophus regression and radiological damage?


At OMERACT 8 (Malta, 2006) Lassere et al5 proposed a schema for the evaluation of biomarkers as surrogate end points. The schema was operationalised as a score obtained from four domains: target outcome, study design, statistical strength and penalties.5 This schema was based on the NIH definitions of biomarker, surrogate end point and clinical end point published in 2001.2 The distinction between a surrogate and a biomarker was determined by the strength of association between the biomarker and the clinical end point of interest. To be called a surrogate, it was proposed that a biomarker must achieve a rank (score) of at least three within the target outcome, study design and statistical strength domains, and that there must be no evidence from a randomised controlled trial (RCT) that the use of the biomarker caused patient harm.5
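One reading of the OMERACT 8 decision rule can be written as a small check. This is a sketch under stated assumptions, not the published schema: the class and function names are hypothetical, and the code assumes that "a rank of at least three" applies to each of the three listed domains separately, which the text implies but does not spell out.

```python
from dataclasses import dataclass


@dataclass
class DomainRanks:
    """Hypothetical container for OMERACT 8 (BSES1-style) ranks for one biomarker."""
    target_outcome: int        # rank in the target outcome domain
    study_design: int          # rank in the study design domain
    statistical_strength: int  # rank in the statistical strength domain
    rct_harm: bool             # True if an RCT showed that using the biomarker caused harm


def qualifies_as_surrogate(ranks: DomainRanks, minimum_rank: int = 3) -> bool:
    """Assumed reading: every listed domain reaches `minimum_rank`
    and there is no RCT evidence of patient harm."""
    domain_ranks = (ranks.target_outcome, ranks.study_design, ranks.statistical_strength)
    return all(rank >= minimum_rank for rank in domain_ranks) and not ranks.rct_harm


# Example: ranks of at least three in all three domains and no evidence of harm
print(qualifies_as_surrogate(DomainRanks(4, 3, 3, rct_harm=False)))  # True
```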

At OMERACT 9 (Kananaskis, 2008) the soluble biomarker group revised the requirements for the specific situation of a soluble biomarker being predictive of structural radiographic damage in ankylosing spondylitis, psoriatic arthritis and rheumatoid arthritis.6 There was an increased emphasis on the technical assay requirements of the biomarker but the strength of association domain, while discussed in the text, did not appear in the OMERACT 9 levels of evidence framework. There was no consensus on all aspects of the framework, and the criteria by which a soluble biomarker could be said to meet the levels of evidence framework were not defined.

At OMERACT 10 (Kota Kinabalu, 2010) evidence was presented that SU fulfilled the OMERACT 9 soluble biomarker requirements in terms of domain 4 (performance criteria), and that there was limited evidence from observational studies and one RCT that changes in SU were associated with changes in patient-centred outcomes for gout.7 However, the meeting did not endorse SU as a biomarker for clinically relevant outcomes for gout. Possible reasons for the lack of endorsement are that the strength of evidence was weak, the criteria for endorsement were unclear and the chosen patient-centred outcomes (particularly the number of flares) were not universally regarded as clinically meaningful.

In parallel to OMERACT, Lassere et al5 systematically reviewed the biomarker–surrogate literature and modified the levels-of-evidence schema built on the OMERACT 8 proposal. Over time the schema went through three iterations of the ‘Biomarker-Surrogacy Evaluation Schema’ (BSES): BSES1, which was the OMERACT 8 proposal;5 BSES2, which specified the statistical criteria more precisely;8 and BSES3, which replaced the penalties domain with a combined clinical and pharmacological generalisability domain. BSES3 contains four domains: study design, target outcome, statistical evaluation and generalisability. It also specifies the kind of statistical association required to show that the link between the biomarker and the clinical end point is strong enough for the biomarker to be considered a surrogate end point.9,10

In 2012 blood pressure was evaluated using the BSES3, and online material described its application and interpretation.10 The BSES3 framework represents the best currently available approach to validating a biomarker as a surrogate end point. We propose that this framework be endorsed by OMERACT as the framework for validation of biomarker–surrogates for rheumatology clinical trials. It represents the logical extension of work developed at OMERACT 8 and, in contrast to the OMERACT 9 framework, provides a clear pathway by which a putative biomarker, soluble or otherwise, can be evaluated. For example, Lassere et al10 used trial-level data and the BSES3 framework to convincingly show that diastolic and systolic blood pressures are valid surrogate end points for stroke risk reduction. In a recent meta-regression, the approach has also been used to evaluate progression-free survival (PFS) in metastatic renal cell carcinoma.11


We wish to use SU as an example of a soluble biomarker for the major clinical end point of acute gout attacks in the disease of gout. Minor clinical end points are the change in tophus size from baseline to final visit, the change in the number of tophi, and pain. Other patient-relevant end points included in the OMERACT core set of outcomes for clinical trials in patients with chronic gout will also be evaluated in exploratory analyses: health-related quality of life (HRQOL), patient global assessment of disease activity and physical disability (activity limitation).

The justification for choosing this biomarker and the clinical end point of flares as the major end point is described as follows:

  • First, SU is recommended as a treatment target by several guidelines for the management of gout.12–14 This strongly implies (although it is not stated explicitly) that changes in SU or achievement of a target level of SU will be strongly associated with clinically relevant outcomes.

  • Second, some regulatory bodies (eg, Food and Drug Administration and European Medicines Agency) have tended to assume that beneficial drug effects on SU will likely have beneficial effects on clinical outcomes in gout. National Institute for Health and Care Excellence (NICE) recommended that febuxostat be available for people who are intolerant of allopurinol or who have contraindications to allopurinol.15 In other words, although NICE did not see persuasive evidence for improved clinical outcomes with the use of febuxostat, it was sufficient that the drug effectively lowered SU to below 6 mg/dL.

  • Third, we have previously shown that SU fulfils the technical performance criteria for a valid soluble biomarker proposed at OMERACT 9.7 Flare (acute attack) of gout is a key clinical manifestation of gout. It constitutes the primary or only manifestation for several years until persistent, tophaceous disease develops. Because effective management strategies aim to prevent chronic tophaceous disease from occurring, it is justifiable to focus on attacks as the clinically relevant end point for the majority of people with gout. Although gout attacks can vary in severity (often modified by acute gout treatment), it is clear that every attack is associated with some level of symptoms and disability. Gout attacks therefore align with how a patient ‘feels or functions’ and can reasonably be identified as a clinically relevant end point.1

However, we recognise that other clinical outcomes are relevant and will evaluate these within the same framework. This proposal fits in the Filter 2.0 framework by making explicit and quantifying the link between Core Area domains of Pathophysiology Manifestations (biomarker) and domains of Life Impact (flare, pain, HRQOL, tophus). This framework links disease-centred variables of biological and pathological processes with patient-centred variables of how a patient feels, functions and survives as proposed at OMERACT 6.5


There are two objectives:

  1. To determine the strength of the relationship between SU and patient-relevant outcomes, including flares, tophi, HRQOL, pain and function using meta-regression of RCTs.

  2. To evaluate whether SU is a surrogate end point for clinically relevant outcomes in patients with gout as defined by the BSES3 framework.


We hypothesise that a reduction in SU will be associated with improvement in clinically relevant outcomes, including gout flares and tophus size/number.

Methods and analysis

Protocol and registration

The protocol for the systematic review and meta-regression analysis was prepared during the planning stage to document the review methods, to guard the project team against arbitrary decision-making during the review and to promote global collaboration.16 Our protocol was prepared according to the recommendations of the Preferred Reporting Items for Systematic Reviews and Meta-Analysis Protocols (PRISMA-P)16 and registered on PROSPERO (CRD42016026991); this protocol and subsequent manuscripts will conform to the PRISMA guidelines for reporting systematic reviews and meta-analyses.17

Eligibility criteria

The eligibility criteria for objective 1 are any RCT comparing an active drug (alone or in combination) with any control or placebo in patients with gout, with a minimum duration of 3 months. The eligibility criteria for objective 2 are any RCT, controlled clinical trial or open-label trial (OLT) comparing an (apparently) active drug (alone or in combination) with any control or placebo in patients with gout, with a minimum duration of 3 months, and longitudinal observational studies of gout with a minimum duration of 3 months.

For both objectives, patients must be at least 18 years of age and either meet the preliminary American College of Rheumatology (ACR) criteria for acute arthritis of primary gout18 or have a diagnosis of gout as described by the study authors.

Search and selection of trials

The following electronic databases will be searched: PubMed, EMBASE, the Cochrane Library including the Cochrane Central Register of Controlled Trials (CENTRAL) and Cochrane Database of Systematic Reviews (CDSR). The search will be limited to English language studies in humans, but not limited by year of publication. The reference lists of comprehensive reviews and identified clinical trials will also be searched manually.

Results of the various searches will be reviewed independently by two authors (LS and MBM). Titles and abstracts will be screened, and if further information is required to assess eligibility, the full text will be obtained. Reasons for excluding studies will be recorded so that a figure can be generated showing the flow of information through the different phases of the systematic review up to the meta-regression analysis. Disagreements will be resolved by an independent third reviewer (WT).

Data extraction

EndNote X7 software will be used to manage the records retrieved from searches of electronic databases. Results from hand searches will be tracked on a Microsoft Excel spreadsheet. A customised data extraction form will be created in Microsoft Excel to capture all the information available for each individual trial.

The biomarker is defined as the change in SU from baseline to 3 months or, where 3-month values are not available, the change to 6 months or 12 months (in that order of preference). This can be estimated if only baseline and change are reported.
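The time-point preference rule for the biomarker can be sketched as a small helper. It assumes, purely for illustration, that each arm's extracted data are a baseline mean and a mapping from month to mean SU (mg/dL); the function name and data layout are hypothetical and are not the protocol's extraction form.

```python
from typing import Mapping, Optional, Tuple

PREFERRED_MONTHS = (3, 6, 12)  # order of preference stated in the protocol


def su_change(baseline: float, follow_up: Mapping[int, float]) -> Optional[Tuple[int, float]]:
    """Return (month, SU change from baseline) at the first available preferred time point.

    `follow_up` maps month -> mean SU (mg/dL) for one arm (hypothetical layout).
    Returns None when none of the preferred time points is reported.
    """
    for month in PREFERRED_MONTHS:
        if month in follow_up:
            return month, follow_up[month] - baseline
    return None


# 3-month value missing, so the 6-month change is used
print(su_change(8.5, {6: 5.5, 12: 5.0}))  # (6, -3.0)
```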

The clinical end points (dependent variables) are defined as follows:

  • Major outcome: gout flares;

  • Minor outcomes: size of sentinel tophus (if size was not measured, we will use number or presence/absence in order of preference) and pain at final study visit.

Exploratory analyses: HRQOL (36-Item Short Form Health Survey, SF-36), patient global assessment of disease activity, and physical disability (activity limitation; eg, Health Assessment Questionnaire (HAQ)).

Effect sizes for continuous end points will be recorded as the standardised mean difference. If there is more than one active treatment arm, each active arm versus the control will be analysed as a separate study, that is, a substudy (see Meta-regression analysis section). All variable values will be based on the intention-to-treat population of each study whenever possible.
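The protocol does not name a specific standardised mean difference estimator. As an assumption, the sketch below uses Hedges' g (Cohen's d with the usual small-sample correction) and a common large-sample approximation of its variance; the function name is hypothetical.

```python
import math


def hedges_g(mean_active: float, sd_active: float, n_active: int,
             mean_control: float, sd_control: float, n_control: int) -> tuple:
    """Return (g, variance of g) for active minus control (assumed estimator)."""
    df = n_active + n_control - 2
    # Pooled within-arm standard deviation
    pooled_sd = math.sqrt(((n_active - 1) * sd_active**2 + (n_control - 1) * sd_control**2) / df)
    d = (mean_active - mean_control) / pooled_sd           # Cohen's d
    j = 1 - 3 / (4 * df - 1)                               # small-sample correction factor
    var_d = (n_active + n_control) / (n_active * n_control) + d**2 / (2 * (n_active + n_control))
    return j * d, j**2 * var_d


g, var_g = hedges_g(10.0, 2.0, 20, 8.0, 2.0, 20)
print(round(g, 4), round(var_g, 4))
```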

Risk of bias in individual studies and judging the quality of evidence

The RCTs will be assessed for methodological quality (ie, internal validity) using the Cochrane Risk of Bias tool.19 If at least one domain is rated as inadequate (high risk), the trial will be considered at high risk of bias. If all domains are judged as low risk, the trial will be considered at low risk of bias. Otherwise, the trial will be considered at unclear risk of bias. Data extraction and risk-of-bias assessment will be performed independently by two reviewers; disagreements will be resolved by a third reviewer. When interpreting the overall findings of the meta-analyses, the Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach will be used to rate the overall quality of the evidence based on risk of bias, publication bias, imprecision, inconsistency, indirectness and magnitude of effect; the GRADE ratings of very low-quality, low-quality, moderate-quality or high-quality evidence per outcome will reflect the extent to which we are confident that the effect estimates are correct.20
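The trial-level summary rule for risk of bias can be expressed directly. The rating labels ("low", "unclear", "high", with "high" standing for a domain rated inadequate) and the function name are assumptions made for this sketch.

```python
from typing import Iterable

VALID_RATINGS = {"low", "unclear", "high"}  # "high" = domain rated inadequate


def overall_risk_of_bias(domain_ratings: Iterable[str]) -> str:
    """Summarise per-domain Risk of Bias ratings into one trial-level judgement."""
    ratings = [rating.strip().lower() for rating in domain_ratings]
    if not ratings or any(rating not in VALID_RATINGS for rating in ratings):
        raise ValueError("expected one or more ratings from: low, unclear, high")
    if "high" in ratings:                          # any inadequate domain -> high risk
        return "high"
    if all(rating == "low" for rating in ratings): # every domain low -> low risk
        return "low"
    return "unclear"                               # otherwise unclear


print(overall_risk_of_bias(["low", "unclear", "low"]))  # unclear
```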

Meta-regression analysis

To combine the individual study results, we will perform meta-analyses using SAS software (PROC MIXED V.9.3; SAS Institute, Cary, North Carolina, USA), applying a restricted maximum likelihood (REML) method to estimate the between-study variance (ie, τ²) and the combined estimate of effect. We will assess the anticipated heterogeneity between trials with the standard (Cochran's) Q statistic and quantify it with the I² value, which is interpreted as the percentage of variability in treatment effect estimates that is due to between-study heterogeneity rather than chance. Even when a meta-regression is technically correct, relations with averages of patients' characteristics across trials can be misleading. Thus, following our systematic review, we will attempt to gain access to individual participant datasets to investigate patients' characteristics; this will to some extent move us away from looking at relations across trials towards inspecting relations within trials.21
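The analysis itself will be run in SAS PROC MIXED; the following is only an independent, standard-library Python sketch of the same quantities, to show what they are. It assumes per-trial effect estimates `y` (eg, log rate ratios) with known within-trial variances `v`, computes Cochran's Q and I² with fixed-effect weights, and estimates τ² by the usual REML fixed-point iteration. Function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def cochran_q_i2(y: Sequence[float], v: Sequence[float]) -> Tuple[float, float]:
    """Cochran's Q (inverse-variance, fixed-effect weights) and I² as a proportion."""
    w = [1.0 / vi for vi in v]
    mu_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - mu_fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, i2


def reml_random_effects(y: Sequence[float], v: Sequence[float],
                        tol: float = 1e-10, max_iter: int = 1000) -> Tuple[float, float, float]:
    """Return (tau², pooled estimate, SE of pooled estimate) via REML fixed-point iteration."""
    tau2 = 0.0
    for _ in range(max_iter):
        w = [1.0 / (vi + tau2) for vi in v]
        sw = sum(w)
        mu = sum(wi * yi for wi, yi in zip(w, y)) / sw
        numerator = sum(wi**2 * ((yi - mu) ** 2 - vi) for wi, yi, vi in zip(w, y, v))
        # REML update; truncated at zero because a variance cannot be negative
        new_tau2 = max(0.0, numerator / sum(wi**2 for wi in w) + 1.0 / sw)
        converged = abs(new_tau2 - tau2) < tol
        tau2 = new_tau2
        if converged:
            w = [1.0 / (vi + tau2) for vi in v]
            mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
            return tau2, mu, math.sqrt(1.0 / sum(w))
    raise RuntimeError("REML iteration did not converge")


print(cochran_q_i2([0.0, 3.0], [1.0, 1.0]))
print(reml_random_effects([0.0, 3.0], [1.0, 1.0]))
```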

The primary purpose of this project is to evaluate the surrogacy status of SU as a ‘predictor’ of gout flare rate reduction using meta-regression of RCTs. Randomisation is essential for the causal surrogacy relationship; therefore, only RCTs will be included in the main meta-regression analysis. Non-randomised study designs will be summarised separately by meta-regression to confirm the consistency of association between the biomarker and clinical end points in other contexts. Cohort studies will be summarised as a narrative review. The analyses of both randomised and non-randomised studies contribute to the evaluation of SU within the BSES3 framework.

Furthermore, in the meta-regression, the relationship between SU and clinically relevant outcomes can be examined using different outcome metrics. We will define these as primary and secondary analyses. In the primary analysis the dependent variable is the rate ratio (ie, the incidence density ratio) of gout flare events in the active versus control arm, where the incidence rate is the number of events over person-time (in this setting, person-months). The rate ratio allows trials of different duration to be included in the analysis. The independent variable is the between-arm difference in the within-arm change in SU (on-trial SU minus baseline SU). Therefore, in a trial of 3 months' duration, the flare rate ratio over 3 months is the dependent variable and the between-arm difference in SU change over 3 months is the independent variable.
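The primary-analysis variables for a single active-versus-control comparison can be computed as follows. The per-arm data layout is a hypothetical illustration, and the Poisson-based standard error of the log rate ratio is a common approximation assumed here rather than a method stated in the protocol; zero-event arms would need a continuity correction, which this sketch does not apply.

```python
import math
from dataclasses import dataclass


@dataclass
class ArmSummary:
    """Hypothetical per-arm summary extracted from one trial."""
    flares: int            # number of gout flare events
    person_months: float   # total follow-up time
    su_baseline: float     # mean SU at baseline (mg/dL)
    su_on_trial: float     # mean SU at the analysed time point (mg/dL)


def flare_rate(arm: ArmSummary) -> float:
    """Incidence rate: flare events per person-month."""
    return arm.flares / arm.person_months


def flare_rate_ratio(active: ArmSummary, control: ArmSummary) -> float:
    """Dependent variable: incidence density ratio, active vs control."""
    return flare_rate(active) / flare_rate(control)


def log_rate_ratio_se(active: ArmSummary, control: ArmSummary) -> float:
    """Approximate SE of log(rate ratio), assuming Poisson-distributed counts."""
    if active.flares == 0 or control.flares == 0:
        raise ValueError("zero events: a continuity correction would be needed")
    return math.sqrt(1 / active.flares + 1 / control.flares)


def su_change_difference(active: ArmSummary, control: ArmSummary) -> float:
    """Independent variable: between-arm difference in within-arm SU change."""
    return (active.su_on_trial - active.su_baseline) - (control.su_on_trial - control.su_baseline)


active = ArmSummary(flares=30, person_months=600, su_baseline=9.0, su_on_trial=5.0)
control = ArmSummary(flares=60, person_months=600, su_baseline=9.0, su_on_trial=8.5)
print(flare_rate_ratio(active, control), su_change_difference(active, control))
```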

In secondary analyses the dependent variable is the relative risk reduction (RRR) of gout flares within each trial. The RRR (also called the risk ratio reduction) is the flare risk in the control arm minus the flare risk in the active arm, divided by the flare risk in the control arm; it can also be calculated as 1 − relative risk (RR), where RR is the flare risk in the active arm divided by the flare risk in the control arm. The RRR is therefore the difference in flare risk between the two arms (control − active), expressed as a percentage of the risk in the control arm.

The independent variable is the within-trial, between-arm difference in the proportion of patients with SU <6 mg/dL at the end of the trial.
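The secondary-analysis variables follow directly from the definitions above. In this sketch the function names are hypothetical, and the inputs are assumed to be per-arm proportions (patients with at least one flare, or patients reaching SU <6 mg/dL).

```python
def relative_risk_reduction(risk_active: float, risk_control: float) -> float:
    """RRR = (control risk - active risk) / control risk = 1 - RR."""
    if risk_control <= 0:
        raise ValueError("control-arm risk must be positive")
    relative_risk = risk_active / risk_control
    return 1.0 - relative_risk


def proportion_difference(n_target_active: int, n_active: int,
                          n_target_control: int, n_control: int) -> float:
    """Between-arm difference (active - control) in the proportion reaching SU <6 mg/dL."""
    return n_target_active / n_active - n_target_control / n_control


print(relative_risk_reduction(0.3, 0.4))         # approximately 0.25
print(proportion_difference(60, 100, 5, 100))    # approximately 0.55
```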

In an RCT, the between-arm difference in SU change is likely to be causal, and change in SU is easily interpretable by clinicians as a surrogacy metric in gout. Relative risk reduction is more familiar to clinicians than the rate ratio but ignores trial duration. Although SU <6 mg/dL is the most common primary end point of RCTs of gout interventions, a between-arm difference in the proportion achieving an SU target may be more difficult to interpret than a difference in SU change. In addition to gout flares, SU will be evaluated as a surrogate end point for two other clinical outcomes, HRQoL and tophus size, as secondary clinical outcomes. If a trial does not report these outcomes, the authors will be contacted and the by-arm outcomes requested.

A quantitative evaluation of trial-level statistical surrogacy using the BSES3 framework10 includes determining the slope coefficient of the surrogacy relationship, the trial-level R² (coefficient of determination)22, the surrogate threshold effect (STE)23–25 and the STE proportion (STEP)8,10 of the relationship between the surrogate and the true clinical end point, using data from a meta-regression of RCTs.
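The slope and trial-level R² can be illustrated with an ordinary weighted least-squares fit of the clinical effect on the SU effect, weighting each trial (for example, by trial size). This is a simplified sketch: it is not the errors-in-variables regression the protocol specifies, and the function name is hypothetical.

```python
from typing import Sequence, Tuple


def weighted_trial_level_fit(x: Sequence[float], y: Sequence[float],
                             w: Sequence[float]) -> Tuple[float, float, float]:
    """Weighted least squares of y on x; returns (intercept, slope, weighted R²)."""
    sw = sum(w)
    x_mean = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_mean = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_mean) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_mean) * (yi - y_mean) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # R² = 1 - weighted residual SS / weighted total SS
    sse = sum(wi * (yi - intercept - slope * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    sst = sum(wi * (yi - y_mean) ** 2 for wi, yi in zip(w, y))
    return intercept, slope, 1.0 - sse / sst


# x: SU reduction (mg/dL); y: log flare rate ratio; w: trial sizes (all illustrative)
print(weighted_trial_level_fit([1, 2, 3, 4], [0.1, -0.3, -0.2, -0.6], [120, 300, 150, 80]))
```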

The STE is informative as it captures both the slope and the dispersion of the surrogate–true relationship in a single metric.25 The STE is the minimum SU difference needed to predict a benefit on the primary clinical end point, the gout flare rate ratio, in a new trial in which only SU is measured. The STE is determined from the between-arm (control vs active) differences in SU and flare rate as follows: (1) calculate the SU change and the gout flare rate for each arm in each trial; (2) calculate the between-arm difference in SU change and the gout flare rate ratio (active vs control); (3) regress the flare rate ratio on the SU difference using errors-in-variables regression weighted by trial size (specifying a reliability coefficient of 0.9) and, as a sensitivity analysis, a meta-regression weighted by trial size; (4) calculate the 95% prediction limits of the regression; and (5) find the SU value at which the 95% prediction limit intersects the horizontal line of no flare benefit (where the flare rate ratio on the y-axis equals 1.0). Similar analyses will be explored with the flare relative risk reduction and the proportion with SU <6 mg/dL at the end of the trial. In this analysis the quantity of interest is the between-arm difference in the proportion reaching the SU target <6 mg/dL at which the 95% prediction limit intersects the horizontal line of no flare benefit (ie, where the flare relative risk reduction on the y-axis equals zero). Subsequent analyses will evaluate HRQoL and tophus size as clinically relevant outcomes.
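Steps (4) and (5) can be sketched for the simplest case. Several assumptions are made that differ from the protocol: the fit is unweighted ordinary least squares rather than trial-size-weighted errors-in-variables regression; `x` is the SU reduction (control change minus active change, so positive values favour the active arm); `y` is the log flare rate ratio, so "no benefit" is log(1.0) = 0; and the caller supplies the t critical value for n − 2 degrees of freedom. The STE is taken as the SU reduction at which the upper 95% prediction limit reaches zero, found in closed form by squaring the limit equation. All names are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple


def _ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float, float, float, int]:
    """Unweighted least squares of y on x; returns (a, b, x_mean, Sxx, residual SD, n)."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_mean - b * x_mean
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return a, b, x_mean, sxx, math.sqrt(sse / (n - 2)), n


def upper_prediction_limit(x, y, x0: float, t_crit: float) -> float:
    """Upper prediction limit for the log rate ratio of a new trial with SU reduction x0."""
    a, b, x_mean, sxx, s, n = _ols(x, y)
    return a + b * x0 + t_crit * s * math.sqrt(1 + 1 / n + (x0 - x_mean) ** 2 / sxx)


def surrogate_threshold_effect(x, y, t_crit: float) -> Optional[float]:
    """SU reduction at which the upper prediction limit reaches 0 (rate ratio 1.0), or None."""
    a, b, x_mean, sxx, s, n = _ols(x, y)
    if b >= 0:
        return None  # larger SU reductions are not associated with fewer flares
    c2 = (t_crit * s) ** 2
    # (a + b*x0)^2 = c2 * (1 + 1/n + (x0 - x_mean)^2 / sxx), rearranged as qa*x0^2 + qb*x0 + qc = 0
    qa = b * b - c2 / sxx
    qb = 2 * a * b + 2 * x_mean * c2 / sxx
    qc = a * a - c2 * (1 + 1 / n) - c2 * x_mean ** 2 / sxx
    if qa <= 0:
        return None  # prediction band widens as fast as the line falls: no STE exists
    disc = qb * qb - 4 * qa * qc
    if disc < 0:
        return None
    roots = [(-qb + sign * math.sqrt(disc)) / (2 * qa) for sign in (1.0, -1.0)]
    # Keep the root where the fitted line is below 0, ie where the UPPER limit equals 0
    crossings = [r for r in roots if a + b * r < 0]
    return min(crossings) if crossings else None


# Illustrative (made-up) trial-level data; t critical value for 4 degrees of freedom
su_reduction = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
log_rr = [-0.05, -0.12, -0.20, -0.33, -0.38, -0.50]
print(surrogate_threshold_effect(su_reduction, log_rr, t_crit=2.776))
```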

Where more than two arms from a single trial are present, the between-arm comparisons are down-weighted following A'Hern et al26 because comparisons within the same trial are not independent. This requires that a single ‘control’ comparator be designated in each trial. In trials with a true placebo, the placebo is the control comparator. In trials without placebo, the control comparator is the intervention arm that best reflects usual care. For example, in a five-arm trial with a true placebo there are four comparisons, and each comparison is down-weighted using analytic weights.10 This allows all arms from each trial to be evaluated in the meta-regression while adjusting for multiple comparisons with the control.
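The splitting of a multi-arm trial into comparisons against one shared control can be sketched as below. The weighting shown (dividing the trial's weight equally among its comparisons) is a simple stand-in chosen for illustration; it is an assumption and may differ from the analytic weights of A'Hern et al. The names and arm labels are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Comparison:
    trial_id: str
    active_arm: str
    control_arm: str
    weight: float  # analytic weight used in the meta-regression


def split_multi_arm_trial(trial_id: str, arms: Sequence[str], control_arm: str,
                          trial_weight: float = 1.0) -> List[Comparison]:
    """One comparison per active arm against the shared control.

    Assumption: each comparison gets trial_weight / (number of comparisons), so the
    trial's total weight is not inflated by reusing its control arm.
    """
    if control_arm not in arms:
        raise ValueError(f"control arm {control_arm!r} not among {list(arms)}")
    active_arms = [arm for arm in arms if arm != control_arm]
    if not active_arms:
        raise ValueError("a trial needs at least one active arm")
    weight = trial_weight / len(active_arms)
    return [Comparison(trial_id, arm, control_arm, weight) for arm in active_arms]


# Five-arm trial with a true placebo -> four down-weighted comparisons
for c in split_multi_arm_trial("trial-1", ["placebo", "dose 1", "dose 2", "dose 3", "dose 4"], "placebo"):
    print(c)
```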

The primary and secondary analyses are prespecified as analyses combining all drug classes. In addition to the STE, slope, trial-level R² and regression diagnostics, we will also evaluate the impact of the following effect modifiers on the relationship between SU and gout flare rate: male sex, disease duration (<2 years, 2–10 years, >10 years) and presence of clinical tophi (yes, no). Furthermore, study design and other trial-related methodological issues will be explored, including the effect of differential cross-over, differential drop-out, whether trials included mandatory flare-prevention strategies such as colchicine or non-steroidal anti-inflammatory drugs (NSAIDs), GRADE ratings20 and risk of bias tool19 ratings.

The Scottish Intercollegiate Guidelines Network (SIGN) checklist27 will be used to evaluate the methodology of longitudinal observational studies of gout.

Once these statistical results are available, (1) SU reduction and (2) achievement of the SU target <6 mg/dL will be evaluated as surrogate end points for gout outcomes using the BSES3 criteria.


It is important to emphasise that the evaluation of SU as a surrogate end point is for the context of using SU as an end point in clinical trials (surrogate biomarker). This is quite different from using SU to help guide clinical decision-making, for example treating to a specific SU target, or to confirm that a treatment is working (monitoring biomarker). Although the meta-regression approach undertaken by the proposed study will help inform clinical decision-making, the evidence needed for treatment targets requires a different research design.

Complete application of the BSES3 framework ideally also uses individual patient level data from multiple clinical trials. Although this analysis is planned, it is contingent on agreement of relevant pharmaceutical companies to share their data and is therefore not a formal part of this protocol.

Observational studies will be included in the search strategy, but will be reported separately as a narrative review in light of the inherent risk of bias in non-randomised and uncontrolled observational study designs.




  • Contributors LS, WT, ND, JF, JAS, RC and ML participated in the conception and design of this protocol. RC, JF and ML provided statistical advice for the design and analysis. LS, WT, ND, JF, RC, MBM, JAS and ML drafted the protocol. LS, WT, ND, JF, MBM, JAS, RC and ML critically reviewed the manuscript for important intellectual content and approved the final version.

  • Funding Musculoskeletal Statistics Unit at the Parker Institute is supported by grants from the Oak Foundation.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.
