Reporting of outcomes in gastric cancer surgery trials: a systematic review

Background The development of clinical guidelines for the surgical management of gastric cancer should be based on robust evidence from well-designed trials. Being able to reliably compare and combine the outcomes of these trials is a key factor in this process. Objectives To examine variation in outcome reporting by surgical trials for gastric cancer and to identify outcomes for prioritisation in an international consensus study to develop a core outcome set in this field. Data sources Systematic literature searches (Evidence Based Medicine, MEDLINE, EMBASE, CINAHL, ClinicalTrials.gov and WHO ICTRP) and a review of study protocols of randomised controlled trials, published between 1996 and 2016. Intervention Therapeutic surgical interventions for gastric cancer. Outcomes were listed verbatim, categorised into groups (outcome themes) and examined for definitions and measurement instruments. Results Of 1919 abstracts screened, 32 trials (9073 participants) were identified. A total of 749 outcomes were reported of which 96 (13%) were accompanied by an attempted definition. No single outcome was reported by all trials. ‘Adverse events’ was the most frequently reported ‘outcome theme’ in which 240 unique terms were described. 12 trials (38%) classified complications according to severity, with 5 (16%) using a formal classification system (Clavien-Dindo or Accordion scale). Of 27 trials which described ‘short-term’ mortality, 15 (47%) used one of five different definitions. 6 out of the 32 trials (19%) described ‘patient-reported outcomes’. Conclusion Reporting of outcomes in gastric cancer surgery trials is inconsistent. A consensus approach to develop a minimum set of well-defined, standardised outcomes to be used by all future trials examining therapeutic surgical interventions for gastric cancer is needed. This should consider the views of all key stakeholders, including patients.


Objective
This review examines the degree of variation in the reporting of outcomes used in gastric cancer surgery trials.

Background
The development of clinical guidelines for the surgical management of gastric cancer should be based on robust evidence from well-designed trials. Being able to reliably compare and combine the outcomes of these trials is a key factor in this process.

Methods
Systematic literature searches and a review of study protocols were undertaken to identify randomized controlled trials (RCTs), published between 1996 and 2016, investigating therapeutic surgical interventions for gastric cancer. Outcomes were listed verbatim, categorized into groups (outcome themes) and examined for definitions and measurement instruments.

Results
Of 1919 abstracts screened, 32 trials (9,073 participants) were identified. A total of 749 outcomes were reported, of which 96 (13 per cent) were accompanied by an attempted definition. No single outcome was reported by all trials. 'Adverse events' was the most frequently reported 'outcome theme' in which 240 unique terms were described. 12 trials (38%) classified complications according to severity, with 5 (16%) using a formal classification system (Clavien-Dindo or Accordion scale). Of 27 trials which described 'short-term' mortality, 15 (47%) used one of 5 different definitions. Six out of the 32 trials (19%) described 'patient-reported outcomes'.

Conclusion
Reporting of outcomes in gastric cancer surgery trials is inconsistent. A consensus approach to develop a minimum set of well-defined, standardized outcomes to be used by all future trials examining therapeutic surgical interventions for gastric cancer is needed. This should consider the views of all key stakeholders, including patients.
This study is based on a reproducible and transparent methodology which has been subjected to critical appraisal during a peer-review process.
The GASTROS study, of which this review forms the first stage, is advised by a Study Advisory Group which includes patient representatives.
Including non-English and non-randomized studies in our search strategy may have yielded a greater number of outcomes.
Gastric cancer remains a leading cause of cancer-related death globally 1 . Long-term survival remains poor and has not improved significantly over the last four decades 2 . Whilst there has been a shift to multi-modal therapy over the last decade, surgery remains the primary method of curative treatment.
Many developments in surgical techniques aim to improve long-term survival, whilst minimizing post-operative complications. Understanding which of these approaches are optimal for patients should be based on robust evidence from well-designed trials. This process involves the synthesis of evidence in the form of systematic reviews which can only be reliably undertaken if trials report the same outcomes and measure them in the same manner.

Aims & Objectives
This review forms part of the first stage of a three-stage study, which intends to examine and address problems with inconsistent outcome reporting in gastric cancer surgery trials (GASTROS - GAstric Cancer Surgery TRials Reported Outcome Standardisation). The study aims to develop a 'core outcome set' (COS) - a minimum group of standardized and well-defined outcomes, relevant to key stakeholders and measured by all trials 3 - to standardize the reporting of outcomes in randomized controlled trials within this field. Our previously published study protocol contains an overview of all three stages 4 . This review specifically aims to examine the degree of variation in the reporting of outcomes described by gastric cancer surgery trials. Additionally, the outcomes reported by trials in this review will be used to generate a 'long-list' of potentially important outcomes which will be prioritized during a Delphi survey in stage two of the study.

Definitions
The GASTROS study, and more specifically this review, focuses on outcome reporting in 'therapeutic surgical trials'. A 'surgical trial' has been previously defined as one of the following 5 :
• Type 1 - A trial of medical interventions in surgical patients
• Type 2 - A trial which compares a surgical intervention to another surgical intervention
• Type 3 - A trial which compares a surgical intervention to a non-surgical intervention
The GASTROS study focuses on 'type 2' trials. In the context of gastric cancer, a 'therapeutic surgical intervention' is defined as a potentially curative procedure which aims to excise the gastric neoplasm resulting in partial or total organ loss.

Search strategy
The review's inclusion and exclusion criteria are summarised in table 1.

Identifying studies
Detailed search strategies were developed for each of the following electronic databases examined:
• Evidence Based Medicine Reviews via OVID
• CINAHL via EBSCO (January 1, 1996 to March 30, 2016).
In order to identify surgical interventions and outcome measures being used in current studies, we searched the following databases for protocols of ongoing trials, including completed trials not yet published:
• The US National Institutes of Health Trials Register (http://clinicaltrials.gov);
• The WHO International Clinical Trials Registry Platform (http://apps.who.int/trialsearch/default.aspx).
Non-English language studies were excluded from this review due to resource limitations. Trials published only as conference abstracts were excluded as they are often limited by 'word count' and hence the abstract would not represent a comprehensive list of outcomes measured in the respective study.

Assessment of eligibility
For quality assurance, two review authors (BA and AMG) independently screened the titles and abstracts retrieved from the electronic searches. This assessment was undertaken in groups of ten abstracts in reverse chronological order. Once there was complete agreement with two consecutive groups of ten abstracts, the remaining abstracts were split and each reviewer screened independently.
Full text copies of all study publications that appeared to meet the inclusion criteria were obtained. Full text copies were also obtained where there was insufficient information in the title or abstract to make a clear judgement.
BA and AMG independently assessed the full text copies for eligibility. This assessment was undertaken in groups of ten publications in reverse chronological order. Once there was complete agreement with two consecutive groups of ten publications, the remaining publications were split and each reviewer extracted data independently. Any disagreements were resolved through discussion.
Any unresolved disagreements were referred to the GASTROS study management team for a final decision.
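The calibration rule described above (joint screening in groups of ten until two consecutive groups show complete agreement) can be sketched in a few lines. This is only an illustrative sketch: the function name `calibration_point`, its parameters and the representation of each reviewer's decisions as a list of include/exclude booleans are assumptions made for this example and are not part of the review's methods.

```python
from typing import Sequence


def calibration_point(
    decisions_a: Sequence[bool],
    decisions_b: Sequence[bool],
    group_size: int = 10,
    required_runs: int = 2,
) -> int:
    """Return the record index from which the reviewers could screen separately.

    Both reviewers screen the same records in groups of `group_size`.
    Once `required_runs` consecutive groups show complete agreement,
    the remaining records can be split between them. If that point is
    never reached, the total number of records is returned.
    """
    if len(decisions_a) != len(decisions_b):
        raise ValueError("both reviewers must screen the same records")

    consecutive = 0
    for start in range(0, len(decisions_a), group_size):
        end = start + group_size
        group_a = list(decisions_a[start:end])
        group_b = list(decisions_b[start:end])
        if len(group_a) < group_size:
            break  # an incomplete final group does not count as a full group
        if group_a == group_b:
            consecutive += 1
            if consecutive == required_runs:
                return end
        else:
            consecutive = 0  # any disagreement restarts the count
    return len(decisions_a)
```

For example, if the reviewers disagree on one record in the first group of ten and agree fully on the next two groups, the remaining records would be split from the thirtieth record onwards.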

BA and AMG independently reviewed all eligible publications and extracted data into a Microsoft Excel (Version 2013, Microsoft, Washington, DC, USA) spreadsheet.

Publication versus Study
It is not uncommon that investigators publish results at different stages of their trial and with each publication present a new set of outcomes. The GASTROS study team decided to amalgamate the outcomes published in all publications associated with a single trial to more fairly reflect outcomes being reported by research groups.
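As a minimal sketch of this amalgamation step, the outcomes from each publication can be pooled under the trial they belong to. The function name, the trial identifiers and the example outcomes below are hypothetical and serve only to illustrate the idea; the review itself recorded these data in a spreadsheet.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def amalgamate_by_trial(
    publications: Iterable[Tuple[str, List[str]]],
) -> Dict[str, List[str]]:
    """Pool the outcomes of every publication belonging to the same trial.

    Each publication is given as (trial_id, outcomes). Outcomes keep the
    order in which they first appear, and an outcome repeated across
    publications of the same trial is listed once.
    """
    merged: Dict[str, List[str]] = defaultdict(list)
    for trial_id, outcomes in publications:
        bucket = merged[trial_id]  # creates the entry even if outcomes is empty
        for outcome in outcomes:
            if outcome not in bucket:
                bucket.append(outcome)
    return dict(merged)


# Hypothetical example: one trial reported in two publications.
example = amalgamate_by_trial([
    ("trial-A", ["operative time", "blood loss"]),
    ("trial-B", ["overall survival"]),
    ("trial-A", ["blood loss", "5-year survival"]),
])
```

Here `example["trial-A"]` holds three outcomes, so the trial is credited with everything its publications reported rather than with the contents of any single paper.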

Demographics
The following demographic data were recorded for each trial:

Outcomes
The following data were recorded for each outcome:
1. Outcome measured (and whether stated as primary or secondary outcome). Where a primary outcome was not explicitly stated, the outcome on which the sample size calculation was based was taken as the primary outcome.
2. Whether the outcome was defined or not. Outcomes were considered defined if text of their meaning or a citation was provided.
3. The definition of the outcome.
4. The method of outcome measurement (indicators and/or tools used, if relevant).
5. Time points and time-period at or during which the outcome was measured (for example quality of life at 3-months post-surgery).
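The extraction items listed above can be pictured as one record per outcome. The sketch below is an assumption about how such a record might be structured: the class `OutcomeRecord`, its field names and the helper `primary_outcomes` are hypothetical and do not describe the spreadsheet actually used. It encodes two rules from the list: an outcome counts as defined if explanatory text or a citation is given, and the outcome underpinning the sample size calculation stands in as primary when none is explicitly stated.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class OutcomeRecord:
    name: str                              # outcome as reported (verbatim)
    stated_role: Optional[str] = None      # "primary", "secondary" or None
    basis_of_sample_size: bool = False     # used for the sample size calculation
    definition: Optional[str] = None       # definition text, if any
    citation: Optional[str] = None         # citation given in place of a definition
    measurement: Optional[str] = None      # indicator or tool used
    time_points: Optional[str] = None      # e.g. "3 months post-surgery"

    @property
    def is_defined(self) -> bool:
        """Defined if explanatory text or a citation was provided."""
        return bool(self.definition or self.citation)


def primary_outcomes(records: List[OutcomeRecord]) -> List[OutcomeRecord]:
    """Return explicitly stated primary outcomes, else the sample-size outcome."""
    stated = [r for r in records if r.stated_role == "primary"]
    if stated:
        return stated
    return [r for r in records if r.basis_of_sample_size]
```

Keeping the definition and the citation as separate fields makes it possible to count later how many outcomes were defined by text and how many only by reference.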

Rationalizing & Grouping Outcomes
Outcomes were extracted verbatim from publications and minimal merging of terms was undertaken.
Outcomes were merged only where terms were clearly identical. For example, 'anastomotic leak', 'anastomotic leakage' and 'anastomotic leaks' would be merged into 'anastomotic leak'. Outcomes were organised into 'outcome themes' - broad categories encompassing similar types of outcomes.

Table 1. Inclusion and exclusion criteria for this review.

Types of Studies
Included:
• Type 2* surgical randomized controlled trials (RCTs) and protocols of surgical RCTs (all trial phases).
• Systematic reviews of type 2 surgical RCTs.
• English language studies.
Excluded:
• Type 1 or type 3* surgical RCTs and systematic reviews of type 1 or type 3 RCTs.
• Non-English language studies.

Population
Included:
• Patients aged 18 years and over.
Excluded:
• Patients below the age of 18.

Interventions
Included:
• Partial or total gastrectomy.
• Surgery with curative intent.
Excluded:
• Surgery with non-curative intent (i.e. in stage 4 cancer with prior expectation of an R1 or R2 resection) for the relief of symptoms such as gastric outlet obstruction or bleeding.

Conditions
Included:
• Invasive cancer of the stomach and gastroesophageal junction.

*Type 1 - a trial of medical interventions in surgical patients, type 2 - a trial which compares a surgical intervention to another surgical intervention, type 3 - a trial which compares a surgical intervention to a non-surgical intervention 5 .

Analysis of Outcomes According to Themes
Outcomes were organised into eight 'outcome themes', illustrated in figure 2 and described in table 3.
A comprehensive list of reported outcomes is presented in appendix 1.

Mortality
Death after surgery was generally described as 'short-term' and 'long-term' survival. Long-term survival was used as a primary outcome measure in 41% of trials (13 out of 32). The terms used to describe long-term mortality and the time-points at which they were measured were inconsistent (table 4). 'Short-term' mortality was reported by 84% of trials (27 out of 32).

Findings from Study Protocols
Most of the 24 ongoing or unpublished trials are recruiting in China (n=13), with twenty examining 'extent of lymphadenectomy' or minimally invasive approaches to surgery. A total of 220 uniquely termed outcomes are planned to be reported, thirty-five of which (16%) have an accompanying definition in the respective protocol. The commonest term used to report 'long-term survival' is 'overall survival' (OS) which will be measured by sixteen trials. Seven of these trials plan to measure OS after 5-years of follow-up, three at 3-years of follow-up, and six did not identify time points at which OS would be measured. At the time of our search, one trial protocol contained no information about which outcomes are to be measured.
QoL is due to be measured by ten trials (42 per cent) with five trials proposing to use one or a combination of four different measurement instruments (EORTC QLQ-C30 with QLQ-STO22, SF-36, GIQLI and Euro-Quality of Life-5D). Seven protocols described the timing of the quality of life measurements as follows:
• 'Preoperative, postoperative 3 weeks and postoperative 12 months'.
• 'Regularly for three years after surgery'.
• 'Preoperatively, five days postoperatively, three months, six months and one year postoperatively'.

Outcome themes, with the number (%) of published trials (n=32) and of trial protocols (n=23**) reporting each theme:
• Mortality: Outcomes related to short- and long-term survival/death rates and cause of death. Published trials 27 (84); protocols 19 (83).
• Technical Aspects of Surgery: Outcomes recorded directly in the operating theatre (e.g. operation time, blood loss). Published trials 31 (97); protocols 13 (57).
• Recovery from Surgery: Report of patient condition following surgery and the ability to return to preoperative or premorbid state. Published trials 16 (50); protocols 9 (39).
• Adverse Events: Forms of short- and long-term postoperative complications following surgery. Published trials 31 (97); protocols 18 (78).
* Certain patient-reported outcomes may fall under other 'themes', e.g. 'post-operative pain' may relate to 'recovery from surgery'.
** One trial protocol contained no information about planned outcomes to report, therefore 23 out of total 24 trials were included in this table.
• Date of randomisation until the day of death or the day of last follow-up (censored).
• Date of surgery to the date of death from any cause, censoring the follow-up time at the most recent date for living patients.
• Date of randomization to the date of death.
• Date of randomization to the date of death from any cause.
• OS included operative deaths.
To address these inconsistencies, we believe that a 'core outcome set' (COS) is required for gastric cancer surgery studies. Developing a minimum reporting standard will significantly contribute to getting the most out of randomized controlled trials which are expensive, labour intensive and logistically challenging to set up. A COS does not aim to restrict the outcomes that are reported, but merely to ensure that the most critical outcomes (as decided by key stakeholders) are clearly defined and measured uniformly.
One of the strengths of this study is that it is based on a reproducible and transparent methodology which has been subjected to critical appraisal from a study management team and peer-review process; a protocol of the GASTROS study which aims to develop this COS has been published previously 4 . Nonetheless, there are limitations. Including non-English and non-randomized studies in our search strategy may have identified further outcomes. However, when finalising our inclusion criteria, the two primary objectives of this review were considered - namely a) to describe the current landscape of outcome reporting in gastric cancer surgery RCTs and b) to take forward a 'long-list' of outcomes to be prioritised (by means of a Delphi survey) to form the basis of a COS for RCTs. Whilst we accept that such a COS would have benefits to non-RCTs and national audits, our primary focus was to improve the quality of RCTs and hence we excluded other study types. In addition, there will be an opportunity during the Delphi survey (stage 2 of the GASTROS study) for participants to add further outcomes (not already identified from this review) which key stakeholders deem important to be considered for prioritisation.
In summary, the reporting of outcomes in gastric cancer surgery trials is inconsistent and there is large variation with respect to definitions, measurement tools and timing of measurement. This means that research and audit data cannot be synthesized efficiently. We believe that a COS to define a minimum

Availability of Data and Materials
Upon the completion of the study, supporting data will be available upon request.

8, table 1
Information sources (item 7): Describe all information sources (e.g., databases with dates of coverage, contact with study authors to identify additional studies) in the search and date last searched. Reported on page: 7, 8.
Search (item 8): Present full electronic search strategy for at least one database, including any limits used, such that it could be repeated. Reported on page: 7.
Study selection (item 9): State the process for selecting studies (i.e., screening, eligibility, included in systematic review, and, if applicable, included in the meta-analysis). Reported on page: 7.
Data collection process (item 10): Describe method of data extraction from reports (e.g., piloted forms, independently, in duplicate) and any processes for obtaining and confirming data from investigators. Reported on page: 9.
Data items (item 11): List and define all variables for which data were sought (e.g., PICOS, funding sources) and any assumptions and simplifications made. Reported on page: 9.
Risk of bias in individual studies (item 12): Describe methods used for assessing risk of bias of individual studies (including specification of whether this was done at the study or outcome level), and how this information is to be used in any data synthesis. Reported on page: n/a.
Additional analyses (item 16): Describe methods of additional analyses (e.g., sensitivity or subgroup analyses, meta-regression), if done, indicating which were pre-specified. Reported on page: n/a.
Study selection (item 17): Give numbers of studies screened, assessed for eligibility, and included in the review, with reasons for exclusions at each stage, ideally with a flow diagram. Reported on page: Figure 1.
Study characteristics (item 18): For each study, present characteristics for which data were extracted (e.g., study size, PICOS, follow-up period) and provide the citations. Reported on page: n/a.

DISCUSSION
Summary of evidence (item 24): Summarize the main findings including the strength of evidence for each main outcome; consider their relevance to key groups (e.g., healthcare providers, users, and policy makers). Reported on page: 16.
Limitations (item 25): Discuss limitations at study and outcome level (e.g., risk of bias), and at review-level (e.g., incomplete retrieval of identified research, reporting bias).

Reprints
Reprints will not be available from the authors.

This study is funded by the National Institute for Health Research (NIHR) Doctoral Research Fellowship Grant (DRF-2015-08-023).

Disclaimer
This paper presents independent research funded by the National Institute for Health Research (NIHR). The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health.

Running Title
Reporting of Outcomes in Gastric Cancer Surgery Trials.

Background
The development of clinical guidelines for the surgical management of gastric cancer should be based on robust evidence from well-designed trials. Being able to reliably compare and combine the outcomes of these trials is a key factor in this process.

Objectives
To examine variation in outcome reporting by surgical trials for gastric cancer; to identify outcomes for prioritisation in an international consensus study to develop a core outcome set in this field.

Data Sources
Systematic literature searches (Evidence Based Medicine, MEDLINE, EMBASE, CINAHL, ClinicalTrials.gov and WHO ICTRP) and a review of study protocols of randomized controlled trials, published between 1996 and 2016.

Intervention
Therapeutic surgical interventions for gastric cancer. Outcomes were listed verbatim, categorized into groups (outcome themes) and examined for definitions and measurement instruments.

Results
Of 1919 abstracts screened, 32 trials (9,073 participants) were identified. A total of 749 outcomes were reported of which 96 (13 per cent) were accompanied by an attempted definition. No single outcome was reported by all trials. 'Adverse events' was the most frequently reported 'outcome theme' in which 240 unique terms were described. 12 trials (38%) classified complications according to severity, with 5 (16%) using a formal classification system (Clavien-Dindo or Accordion scale). Of 27 trials which described 'short-term' mortality, 15 (47%) used one of 5 different definitions. Six out of the 32 trials (19%) described 'patient-reported outcomes'.

Article Summary

Strengths and Limitations of This Study
• This systematic review is the first to describe the variation in outcome reporting within the field of surgical trials for gastric cancer.
• The study is based on a reproducible and transparent methodology which has been subjected to critical appraisal during a peer-review process. The study forms part of a larger project (The GASTROS Study) to develop a 'core outcome set' (COS) for use in surgical trials for gastric cancer and was reviewed and funded by the National Institute of Health Research (UK).
• Only English-language and randomized studies were included in the analysis. Expanding the search may have resulted in the identification of other relevant outcomes reported in this field.

Gastric cancer remains a leading cause of cancer-related death globally 1 . Long-term survival remains poor and has not improved significantly over the last four decades 2 . Whilst there has been a shift to multi-modal therapy over the last decade, surgery remains the primary method of curative treatment.
Many developments in surgical techniques aim to improve long-term survival, whilst minimizing post-operative complications. Understanding which of these approaches are optimal for patients should be based on robust evidence from well-designed trials. This process involves the synthesis of evidence in the form of systematic reviews which can only be reliably undertaken if trials report the same outcomes and measure them in the same manner.
This review forms part of the first stage of a three-stage study, which intends to examine and address problems with inconsistent outcome reporting in gastric cancer surgery trials (GASTROS - GAstric Cancer Surgery TRials Reported Outcome Standardisation). The study aims to develop a 'core outcome set' (COS) - a minimum group of standardized and well-defined outcomes, relevant to key stakeholders and measured by all trials 3 - to standardize the reporting of outcomes in randomized controlled trials within this field. Our previously published study protocol contains an overview of all three stages 4 .
Within our study protocol, we described the results from a 'rapid review' of gastric cancer surgery trials during a 24-month period which demonstrated significant variations in outcome reporting. We hypothesised that these variations were likely to represent a more widespread problem within this field. Inconsistencies in outcome reporting are prevalent within the medical literature and contribute significantly to 'research waste' 5 . Several reviews have demonstrated that trials within the same field often report different outcomes, define them poorly and use various outcome measurement instruments [6][7][8][9] . This results in data which cannot be reliably compared or combined leading to further confusion within the evidence base. As such, initiatives such as COMET (Core Outcome Measures in Effectiveness Trials) were formed to promote the development of COS to address these issues 3 . With respect to surgical trials for gastric cancer, a) no rigorous examination of outcome reporting has been previously undertaken and b) there is no COS for use in this field.

Aims & Objectives
This review aims to demonstrate whether further work to develop a COS to be used in surgical trials for gastric cancer is required. Specifically, the objectives are:
1. to examine the degree of variation in the reporting of outcomes described by gastric cancer surgery trials.
2. to generate a 'long-list' of potentially important outcomes which will be prioritized during a Delphi survey in stage two of the study.

Definitions
The GASTROS study, and more specifically this review, focuses on outcome reporting in 'therapeutic surgical trials'. A 'surgical trial' has been previously defined as one of the following 10 :
• Type 1 - A trial of medical interventions in surgical patients
• Type 2 - A trial which compares a surgical intervention to another surgical intervention
• Type 3 - A trial which compares a surgical intervention to a non-surgical intervention
The GASTROS study focuses on 'type 2' trials due to the significant research activity within this field (a detailed justification can be found in our study protocol) 4 . In the context of gastric cancer, a 'therapeutic surgical intervention' is defined as a potentially curative procedure which aims to excise the gastric neoplasm resulting in partial or total organ loss.

Identifying studies
Detailed search strategies were developed for each of the following electronic databases examined:
• Evidence Based Medicine Reviews via OVID
Non-English language studies were excluded from this review due to resource limitations. Trials published only as conference abstracts were excluded as they are often limited by 'word count' and hence the abstract would not represent a comprehensive list of outcomes measured in the respective study.

Assessment of eligibility
For quality assurance, two review authors (BA and AMG) independently screened the titles and abstracts retrieved from the electronic searches. This assessment was undertaken in groups of ten abstracts in reverse chronological order. Once there was complete agreement with two consecutive groups of ten abstracts, the remaining abstracts were split and each reviewer screened independently.
Full text copies of all study publications that appeared to meet the inclusion criteria were obtained.
Full text copies were also obtained where there was insufficient information in the title or abstract to make a clear judgement. Systematic reviews of RCTs were also retrieved to find studies which had previously not been identified.
BA and AMG independently assessed the full text copies for eligibility. This assessment was undertaken in groups of ten publications in reverse chronological order. Once there was complete agreement with two consecutive groups of ten publications, the remaining publications were split, and each reviewer extracted data independently. Any disagreements were resolved through discussion.
There were no unresolved disagreements that required referral to the GASTROS study management team for a final decision.

Data Extraction
BA and AMG independently reviewed all eligible publications and extracted data (described below) into a Microsoft Excel (Version 2013, Microsoft, Washington, DC, USA) spreadsheet.

Publication versus Study
It is not uncommon that investigators publish results at different stages of their trial and with each publication present a new set of outcomes. The GASTROS study team decided to amalgamate the outcomes published in all publications associated with a single trial to more fairly reflect outcomes being reported by research groups.

Trial Characteristics
The following data were recorded for each trial:
1. Author details
2. Title of publication
We defined an outcome as 'a unique endpoint which attempts to describe health-related changes that occur secondary to a therapeutic intervention' 4 . The following data were recorded for each outcome:
1. Outcome measured (and whether stated as primary or secondary outcome). Where a primary outcome was not explicitly stated, the outcome on which the sample size calculation was based was taken as the primary outcome.
2. Whether the outcome was defined or not. Outcomes were considered defined if text of their meaning or a citation was provided.
3. The definition of the outcome.
Outcomes were extracted verbatim from publications and minimal merging of terms was undertaken.
Outcomes were merged to accommodate for variant spellings of the same words. For example, 'anastomotic leak', 'anastomotic leakage' and 'anastomotic leaks' were merged into 'anastomotic leak'. The verbatim texts and merged terms were verified and authorized respectively by the study management group.
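The merging of variant spellings can be illustrated with a small lookup table. This is a sketch under stated assumptions, not the procedure used in the review, where merged terms were checked by the study management group: the mapping `VARIANTS` and the function `merge_term` are hypothetical, and the only entries shown are the 'anastomotic leak' variants given as the example in the text. Note that the sketch also lower-cases terms and collapses extra spaces, which is an additional simplification.

```python
from typing import Dict, Iterable, Set

# Hypothetical mapping from a variant spelling to the merged term.
# Each entry would need to be agreed by the reviewers; nothing is merged
# automatically beyond what is listed here.
VARIANTS: Dict[str, str] = {
    "anastomotic leakage": "anastomotic leak",
    "anastomotic leaks": "anastomotic leak",
}


def merge_term(verbatim: str) -> str:
    """Map a verbatim outcome term to its merged form.

    Case and repeated whitespace are normalised first; terms without an
    entry in VARIANTS are returned unchanged apart from that normalisation.
    """
    key = " ".join(verbatim.lower().split())
    return VARIANTS.get(key, key)


def unique_terms(verbatim_terms: Iterable[str]) -> Set[str]:
    """Return the set of distinct terms after merging variants."""
    return {merge_term(term) for term in verbatim_terms}
```

An explicit table keeps merging minimal: 'anastomotic leak', 'anastomotic leakage' and 'anastomotic leaks' collapse into one term, while a differently worded outcome is left as a separate entry.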
From the experience of other groups undertaking reviews of outcome reporting, the resulting lists of outcomes are generally extremely long and unwieldy 6 . Consequently, developing a method to organize these outcomes has been necessary. The subject of taxonomy in outcome reporting, including hierarchical structure and which terms/definitions to use, is an emerging area of great significance. We set out our definitions a priori, which can be found in our study protocol 11 . Many COS developers have organized their outcomes into broad categories with common 'themes'. Our study is one of only a handful addressing outcome reporting in surgical trials related to the gastrointestinal (GI) tract. At the time of data analysis, we opted to group outcomes under 'themes' (detailed in table 2) similar to those described by other surgical COS [6][7][8]12 . Doing so enables COS researchers to more readily understand trends in outcome reporting within the field of GI surgery. Whilst the themes used in our review enable the reader to understand the types of outcomes being reported, this system has not been developed through wider consensus and has not been subject to a validation process.
At the time of writing, a broader taxonomy for outcome classification had been proposed 13 . Whilst the authors have demonstrated that this system is comprehensive and applicable to trials irrespective of the field being studied, they have called for further validation of their work.

Analysis of Outcomes According to Themes
Outcomes were organised into eight 'outcome themes', illustrated in figure 2 and described in table 2.
A comprehensive list of reported outcomes is presented in appendix 2.

Mortality
Death after surgery was generally described as 'short-term' and 'long-term' survival. Long-term survival was used as a primary outcome measure in 41% of trials (13 out of 32). The terms used to describe long-term mortality and the time-points at which they were measured were inconsistent.
Adverse events were the commonest outcome theme to be reported and made up half of the ten most reported outcomes (table 5). 'Anastomotic leak' was the commonest adverse event to be reported and was described using 5 different definitions (the frequency with which each definition was used is presented in brackets):
• 'Clinical and radiological diagnosis' (2).

Patient Reported Outcomes
Patient-reported Outcomes (PROs) were reported in 19% of trials (6 out of 32) and included measures of quality of life (QoL) (n=3) and 'pain' (n=3). QoL was measured using validated tools for gastric cancer in two trials (EORTC QLQ-C30 with QLQ and Spitzer QoL Index) and a non-validated tool in one trial. Pain was measured using 3 different visual-analogue scales.

Multi-Centre Trials
Forty per cent (13 out of 32) of studies were multi-centre trials (

Findings from Study Protocols
Most of the 24 ongoing or unpublished trials are recruiting in China (n=13), with twenty examining 'extent of lymphadenectomy' or minimally invasive approaches to surgery. A total of 220 uniquely termed outcomes are planned to be reported, thirty-five of which (16%) have an accompanying definition in the respective protocol. The commonest term used to report 'long-term survival' is 'overall survival' (OS) which will be measured by sixteen trials. Seven of these trials plan to measure OS after 5-years of follow-up, three at 3-years of follow-up, and six did not identify time points at which OS would be measured. At the time of our search, one trial protocol contained no information about which outcomes are to be measured.
QoL is due to be measured by ten trials (42 per cent) with five trials proposing to use one or a combination of four different measurement instruments (EORTC QLQ-C30 with QLQ-STO22, SF-36, GIQLI and Euro-Quality of Life-5D). Seven protocols described the timing of the quality of life measurements as follows:
• 'Preoperative, postoperative 3 weeks and postoperative 12 months'.
• 'Regularly for three years after surgery'.
• 'Preoperatively, five days postoperatively, three months, six months and one year postoperatively'.

• Date of randomization to the date of death.
• Date of randomization to the date of death from any cause.
• OS included operative deaths.
• OS excluded post-operative deaths.

Furthermore, if the methodology of a particular trial is not sufficiently robust or the outcomes reported are not relevant to key stakeholders, the natural course will be for other researchers to examine the same interventions again, using a different approach. If these subsequent trials do not address the underlying methodological issues, they only contribute to a perpetual cycle which serves to weaken the evidence base. This is reflected within the field of gastric cancer surgery where thirteen trials have examined minimally invasive gastrectomy and a further 13 are actively recruiting to trials examining the same intervention.
This has certainly not been the case with gastric cancer surgery trials over the last two decades and whilst there seems to be a greater acceptance by trials currently in recruitment that QoL is important to measure (although this group still represents less than half of ongoing trials), there remains great variation in relation to 'how' and 'when' it is measured.
To address these inconsistencies, we believe that a 'core outcome set' (COS) is required for gastric cancer surgery studies. Developing a minimum reporting standard will contribute to maximising the benefits from randomized controlled trials, which are expensive, labour intensive and logistically challenging to set up. A COS does not aim to restrict the outcomes that are reported, but merely to ensure that the most critical outcomes (as decided by key stakeholders) are clearly defined and measured uniformly.
The challenges associated with inconsistent outcome reporting in trials are certainly not confined to the field of gastric cancer. The COMET (Core Outcome Measures in Effectiveness Trials) initiative database (http://www.comet-initiative.org/studies/search) contains details of over 400 completed, active or planned COS projects from across many different specialities 87 . Whilst experience within this relatively new research field has grown considerably over the last decade, there is still much work to be done to further develop the various methodological approaches which can be applied. The GASTROS study aims to add to this in several ways, including examining the role of 'internationalising' COS development by undertaking a multi-language Delphi survey as part of a consensus-seeking process.

Strengths and Limitations
In addition to being the first systematic review to examine this subject, this study is based on a reproducible and transparent methodology which has been subjected to critical appraisal from a study management team and peer-review process; a protocol of the GASTROS study which aims to develop this COS has been published previously 4 . Nonetheless, there are limitations. Including non-English and non-randomized studies in our search strategy may have identified other outcomes reported in this field. However, when finalising our inclusion criteria for this review, the two primary objectives of this review were considered, namely a) to describe the current landscape of outcome reporting in gastric cancer surgery RCTs and b) to take forward a 'long-list' of outcomes to be prioritised (by means of a Delphi survey) to form the basis of a COS for RCTs. Whilst we accept that such a COS would have benefits to non-RCTs and national audits, our primary focus was to improve the quality of RCTs and hence other study types were excluded. In addition, there will be an opportunity during the Delphi survey (stage 2 of the GASTROS study) for participants to add further outcomes (not already identified from this review) which key stakeholders deem important to be considered for prioritisation. A further limitation to this review was that it was not prospectively registered on a public database. However, as we describe above, the GASTROS study, including its scope and systematic review plan, has been peer-reviewed and published previously 4 .
In summary, the reporting of outcomes in gastric cancer surgery trials is inconsistent and there is large variation with respect to definitions, measurement tools and timing of measurement. This means that data cannot be synthesized efficiently. We believe that a COS to define a minimum set of standards to implement across all gastric surgical trials is warranted.

Section/topic | # | Checklist item | Reported on page #
Eligibility criteria | 6 | Specify study characteristics (e.g., PICOS, length of follow up) and report characteristics (e.g., years considered, language, publication status) used as criteria for eligibility, giving rationale. | 8, table 1
Information sources | 7 | Describe all information sources (e.g., databases with dates of coverage, contact with study authors to identify additional studies) in the search and date last searched. | 8,9
Search | 8 | Present full electronic search strategy for at least one database, including any limits used, such that it could be repeated. | App 1
Study selection | 9 | State the process for selecting studies (i.e., screening, eligibility, included in systematic review, and, if applicable, included in the meta analysis). | 9
Data collection process | 10 | Describe method of data extraction from reports (e.g., piloted forms, independently, in duplicate) and any processes for obtaining and confirming data from investigators. | 10
Data items | 11 | List and define all variables for which data were sought (e.g., PICOS, funding sources) and any assumptions and simplifications made. | 10
Risk of bias in individual studies | 12 | Describe methods used for assessing risk of bias of individual studies (including specification of whether this was done at the study or outcome level), and how this information is to be used in any data synthesis. | n/a
Risk of bias across studies | 15 | Specify any assessment of risk of bias that may affect the cumulative evidence (e.g., publication bias, selective reporting within studies). | n/a
Additional analyses | 16 | Describe methods of additional analyses (e.g., sensitivity or subgroup analyses, meta-regression), if done, indicating which were pre specified. | n/a
Study selection | 17 | Give numbers of studies screened, assessed for eligibility, and included in the review, with reasons for exclusions at each stage, ideally with a flow diagram. | Figure 1
Study characteristics | 18 | For each study, present characteristics for which data were extracted (e.g., study size, PICOS, follow-up period) and provide the citations. | Table 2
Risk of bias within studies | 19 | Present data on risk of bias of each study and, if available, any outcome level assessment (see item 12). | n/a
Results of individual studies | 20 | For all outcomes considered (benefits or harms), present, for each study: (a) simple summary data for each intervention group (b) effect estimates and confidence intervals, ideally with a forest plot. | n/a
DISCUSSION
Summary of evidence | 24 | Summarize the main findings including the strength of evidence for each main outcome; consider their relevance to key groups (e.g., healthcare providers, users, and policy makers). |

Reprints
Reprints will not be available from the authors.

This study is funded by the National Institute for Health Research (NIHR) Doctoral Research Fellowship Grant (DRF-2015-08-023).

Disclaimer
This paper presents independent research funded by the National Institute for Health Research (NIHR). The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health.

Background
The development of clinical guidelines for the surgical management of gastric cancer should be based on robust evidence from well-designed trials. Being able to reliably compare and combine the outcomes of these trials is a key factor in this process.

Objectives
To examine variation in outcome reporting by surgical trials for gastric cancer; to identify outcomes for prioritisation in an international consensus study to develop a core outcome set in this field.

Data Sources
Systematic literature searches (Evidence Based Medicine, MEDLINE, EMBASE, CINAHL, ClinicalTrials.gov and WHO ICTRP) and a review of study protocols of randomized controlled trials, published between 1996 and 2016.

Intervention
Therapeutic surgical interventions for gastric cancer. Outcomes were listed verbatim, categorized into groups (outcome themes) and examined for definitions and measurement instruments.

Results
Of 1919 abstracts screened, 32 trials (9,073 participants) were identified. A total of 749 outcomes were reported of which 96 (13 per cent) were accompanied by an attempted definition. No single outcome was reported by all trials. 'Adverse events' was the most frequently reported 'outcome theme' in which 240 unique terms were described. 12 trials (38%) classified complications according to severity, with 5 (16%) using a formal classification system (Clavien-Dindo or Accordion scale). Of 27 trials which described 'short-term' mortality, 15 (47%) used one of 5 different definitions. Six out of the 32 trials (19%) described 'patient-reported outcomes'.

Strengths and Limitations of This Study
• This systematic review is the first to describe the variation in outcome reporting within the field of surgical trials for gastric cancer.
• The study is based on a reproducible and transparent methodology which has been subjected to critical appraisal during a peer-review process.
• The study forms part of a larger project (The GASTROS Study) to develop a 'core outcome set' (COS) for use in surgical trials for gastric cancer and was reviewed and funded by the National Institute of Health Research (UK).
• Only English-language and randomized studies were included in the analysis; expanding the search may have resulted in the identification of other relevant outcomes reported in this field.

Within our study protocol, we described the results from a 'rapid review' of gastric cancer surgery trials during a 24-month period which demonstrated significant variations in outcome reporting. We hypothesised that these variations were likely to represent a more widespread problem within this field. Inconsistencies in outcome reporting are prevalent within the medical literature and contribute significantly to 'research waste' 5 . Several reviews have demonstrated that trials within the same field often report different outcomes, define them poorly and use various outcome measurement instruments [6][7][8][9] . This results in data which cannot be reliably compared or combined, leading to further confusion within the evidence base. As such, initiatives such as COMET (Core Outcome Measures in Effectiveness Trials) were formed to promote the development of COS to address these issues 3 .
With respect to surgical trials for gastric cancer, a) no rigorous examination of outcome reporting has been previously undertaken and b) there is no COS for use in this field.

Aims & Objectives
This review aims to demonstrate whether further work to develop a COS to be used in surgical trials for gastric cancer is required. Specifically, the objectives are: 1. to examine the degree of variation in the reporting of outcomes described by gastric cancer surgery trials.

Definitions
The GASTROS study, and more specifically this review, focuses on outcome reporting in 'therapeutic surgical trials'. A 'surgical trial' has been previously defined as one of the following 10 :
• Type 1: A trial of medical interventions in surgical patients
• Type 2: A trial which compares a surgical intervention to another surgical intervention
• Type 3: A trial which compares a surgical intervention to a non-surgical intervention
The GASTROS study focuses on 'type 2' trials due to the significant research activity within this field (a detailed justification can be found in our study protocol) 4 . In the context of gastric cancer, a 'therapeutic surgical intervention' is defined as a potentially curative procedure which aims to excise the gastric neoplasm resulting in partial or total organ loss.

Identifying studies
Detailed search strategies were developed for each of the following electronic databases examined:
• Evidence Based Medicine Reviews via OVID

Non-English language studies were excluded from this review due to resource limitations. Trials published only as conference abstracts were excluded as they are often limited by 'word count' and hence the abstract would not represent a comprehensive list of outcomes measured in the respective study.
Full text copies of all study publications that appeared to meet the inclusion criteria were obtained.
Full text copies were also obtained where there was insufficient information in the title or abstract to make a clear judgement. Systematic reviews of RCTs were also retrieved to find studies which had previously not been identified.
BA and AMG independently assessed the full text copies for eligibility. This assessment was undertaken in groups of ten publications in reverse chronological order. Once there was complete agreement with two consecutive groups of ten abstracts, the remaining publications were split, and each reviewer extracted data independently. Any disagreements were resolved through discussion.
There were no unresolved disagreements that required referral to the GASTROS study management team for a final decision.
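The calibration rule described above (duplicate assessment in groups of ten until two consecutive groups show complete agreement) can be sketched in Python. This is an illustration only: the function name, the representation of each reviewer's eligibility decision as a boolean, and the example data are assumptions and are not taken from the GASTROS study materials.

```python
from typing import Optional, Sequence, Tuple


def duplicate_screening_cutoff(
    decisions: Sequence[Tuple[bool, bool]],
    group_size: int = 10,
    required_groups: int = 2,
) -> Optional[int]:
    """Return how many publications were assessed in duplicate before the
    remainder could be split between reviewers, or None if the required run
    of fully concordant groups was never reached.

    Each element of `decisions` is (reviewer_1_includes, reviewer_2_includes).
    """
    consecutive = 0
    for start in range(0, len(decisions), group_size):
        group = decisions[start:start + group_size]
        if len(group) < group_size:
            break  # an incomplete final group cannot complete the calibration
        if all(first == second for first, second in group):
            consecutive += 1
        else:
            consecutive = 0  # any disagreement restarts the count
        if consecutive == required_groups:
            return start + group_size
    return None


# Hypothetical data: one disagreement in the first group of ten,
# followed by two fully concordant groups.
example = [(True, True)] * 9 + [(True, False)] + [(True, True)] * 20
print(duplicate_screening_cutoff(example))  # 30
```

In this sketch a disagreement resets the count, which is one reading of 'two consecutive groups'; disagreements themselves would still be resolved through discussion as described in the text.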

Data Extraction
BA and AMG independently reviewed all eligible publications and extracted data (described below) into a Microsoft Excel (Version 2013, Microsoft, Redmond, WA, USA) spreadsheet.

Publication versus Study
It is not uncommon that investigators publish results at different stages of their trial and with each publication present a new set of outcomes. The GASTROS study team decided to amalgamate the outcomes published in all publications associated with a single trial to more fairly reflect outcomes being reported by research groups.
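As an illustration of this amalgamation step, the following Python sketch pools the outcomes from every publication linked to the same trial. The trial labels, outcome terms and function name are hypothetical and serve only to show the idea; the review itself recorded data in a spreadsheet.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def amalgamate_by_trial(
    publications: Iterable[Tuple[str, List[str]]]
) -> Dict[str, List[str]]:
    """Pool outcomes across all publications belonging to the same trial.

    `publications` yields (trial_label, outcomes_reported_in_that_publication).
    Each trial maps to a sorted list of its unique outcomes.
    """
    pooled = defaultdict(set)
    for trial, outcomes in publications:
        pooled[trial].update(outcomes)
    return {trial: sorted(outcomes) for trial, outcomes in pooled.items()}


# Hypothetical example: 'Trial A' has an early and a late publication.
records = [
    ("Trial A", ["operative time", "blood loss"]),
    ("Trial A", ["overall survival", "blood loss"]),
    ("Trial B", ["anastomotic leak"]),
]
print(amalgamate_by_trial(records))
```

Using a set per trial means an outcome reported in several publications of the same trial is counted once for that trial.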

Title of publication
1. Outcome measured (and whether stated as primary or secondary outcome). Where a primary outcome was not explicitly stated, the outcome on which the sample size calculation was based was taken as the primary outcome.
2. Whether the outcome was defined or not. Outcomes were considered defined if text of their meaning or a citation was provided.
Outcomes were merged to accommodate for variant spellings of the same words. For example, 'anastomotic leak', 'anastomotic leakage' and 'anastomotic leaks' were merged into 'anastomotic leak'. The verbatim texts and merged terms were verified and authorized respectively by the study management group.
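The merging of variant spellings can be illustrated with a short Python sketch. The lookup table below contains only the example given in the text; the function names and the additional term 'Wound infection' are hypothetical, and in the review itself merged terms were authorized by the study management group rather than generated automatically.

```python
from collections import Counter
from typing import Dict, Iterable

# Assumed lookup of variant spellings to a single merged term,
# built from the example in the text.
MERGED_TERMS: Dict[str, str] = {
    "anastomotic leak": "anastomotic leak",
    "anastomotic leakage": "anastomotic leak",
    "anastomotic leaks": "anastomotic leak",
}


def merge_term(verbatim: str) -> str:
    """Normalise case and whitespace, then map known variants to one term.
    Terms without an entry in the lookup are returned in normalised form."""
    key = " ".join(verbatim.lower().split())
    return MERGED_TERMS.get(key, key)


def count_merged_terms(verbatim_terms: Iterable[str]) -> Counter:
    """Count how often each merged term occurs in a list of verbatim terms."""
    return Counter(merge_term(term) for term in verbatim_terms)


terms = ["Anastomotic leak", "anastomotic leakage", "Anastomotic  leaks", "Wound infection"]
print(count_merged_terms(terms))
```

Keeping the verbatim text alongside the merged term, as the authors did, allows each merge to be checked and reversed if needed.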
From the experience of other groups undertaking reviews of outcome reporting, the resulting lists of outcomes are generally extremely long and unwieldy 6 . Consequently, developing a method to organize these outcomes has been necessary. The subject of taxonomy in outcome reporting, including hierarchical structure and which terms/definitions to use, is an emerging area of great significance. We set out our definitions a priori, which can be found in our study protocol 11 . Many COS developers have organized their outcomes into broad categories with common 'themes'. Our study is one of only a handful addressing outcome reporting in surgical trials related to the gastrointestinal (GI) tract. At the time of data analysis, we opted to group outcomes under 'themes' (detailed in table 2) similar to those described by other surgical COS [6][7][8]12 . Doing so enables COS researchers to more readily understand trends in outcome reporting within the field of GI surgery. Whilst the themes used in our review enable the reader to understand the types of outcomes being reported, this system has not been developed through wider consensus and has not been subject to a validation process.
At the time of writing, a broader taxonomy for outcome classification had been proposed 13 . Whilst the authors have demonstrated that this system is comprehensive and applicable to trials irrespective of the field being studied, they have called for further validation of their work.

• Included: Systematic reviews of type 2 surgical RCTs.
• Included: English language studies.
• Excluded: Type 1 or type 3* surgical RCTs and systematic reviews of type 1 or type 3 RCTs.
• Excluded: Non-English language studies.

Population
• Included: Patients aged 18 years and over.
• Excluded: Patients below the age of 18.

Interventions
• Included: Partial or total gastrectomy.
• Included: Surgery with curative intent.
• Excluded: Surgery with non-curative intent (i.e. in stage 4 cancer with prior expectation of an R1 or R2 resection) for the relief of symptoms such as gastric outlet obstruction or bleeding.

Conditions
• Invasive cancer of the stomach and gastroesophageal junction.
Long-term survival.
