Original research
Comparing the reporting and conduct quality of exercise and pharmacological randomised controlled trials: a systematic review
  1. Scott C Adams (1, 2, 3)
  2. Julia McMillan (4)
  3. Kirsten Salline (5)
  4. Jessica Lavery (6)
  5. Chaya S Moskowitz (6)
  6. Konstantina Matsoukas (7)
  7. Maggie M Z Chen (3)
  8. Daniel Santa Mina (3, 8)
  9. Jessica M Scott (9, 10)
  10. Lee W Jones (9, 10)

  1. Department of Cardiology, Toronto General Research Institute, Toronto, Ontario, Canada
  2. Ted Rogers Cardiotoxicity Prevention Program, Peter Munk Cardiac Centre, Toronto, Ontario, Canada
  3. Kinesiology and Physical Education, University of Toronto, Toronto, Ontario, Canada
  4. Albert Einstein College of Medicine, Bronx, New York, USA
  5. Internal Medicine, NYU Langone Health, New York, New York, USA
  6. Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, New York, USA
  7. Library, Memorial Sloan Kettering Cancer Center, New York, New York, USA
  8. Anesthesia and Pain Management, Toronto General Hospital, Toronto, Ontario, Canada
  9. Medicine, Memorial Sloan Kettering Cancer Center, New York, New York, USA
  10. Medicine, Weill Cornell Medical College, New York, New York, USA

  Correspondence to Dr Lee W Jones; jonesl3{at}mskcc.org

Abstract

Objective Evaluate the quality of exercise randomised controlled trial (RCT) reporting and conduct in clinical populations (ie, adults with or at risk of chronic conditions) and compare with matched pharmacological RCTs.

Design Systematic review.

Data sources Embase (Elsevier), PubMed (NLM) and CINAHL (EBSCO).

Study selection RCTs of exercise in clinical populations with matching pharmacological RCTs published in leading clinical, medical and specialist journals with impact factors ≥15.

Review methods Overall RCT quality was evaluated by two independent reviewers using three research reporting guidelines (ie, Consolidated Standards of Reporting Trials (CONSORT) for pharmacological RCTs or CONSORT for non-pharmacological treatments for exercise RCTs; CONSORT-Harms; and Template for Intervention Description and Replication) and two risk of bias assessment (research conduct) tools (ie, Cochrane Risk of Bias, Jadad Scale). We compared research reporting and conduct quality within exercise RCTs with matched pharmacological RCTs, and examined factors associated with quality in exercise and pharmacological RCTs, separately.

Findings Forty-eight exercise RCTs (11 658 patients; median sample n=138) and 48 matched pharmacological RCTs were evaluated (18 501 patients; median sample n=160). RCTs were conducted primarily in cardiovascular medicine (43%) or oncology (31%). Overall quality score (composite of all research reporting and conduct quality scores; primary endpoint) for exercise RCTs was 58% (median score 46 of 80; IQR: 39–51) compared with 77% (53 of 68; IQR: 47–58) in the matched pharmacological RCTs (p≤0.001). Individual quality scores for trial reporting and conduct were lower in exercise RCTs compared with matched pharmacological RCTs (p≤0.03). Factors associated with higher overall quality scores for exercise RCTs were journal impact factor (≥25), sample size (≥152) and publication year (≥2013).

Conclusions and relevance Research reporting and conduct quality within exercise RCTs is inferior to matched pharmacological RCTs. Suboptimal RCT reporting and conduct impact the fidelity, interpretation, and reproducibility of exercise trials and, ultimately, implementation of exercise in clinical populations.

PROSPERO registration number CRD42018095033.

  • clinical trials
  • rehabilitation medicine
  • clinical pharmacology
  • statistics & research methods

Data availability statement

All data relevant to the study are included in the article or uploaded as supplemental information.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

Strengths and limitations of this study

  • A total of n=30 159 participants from 96 randomised controlled trials (RCTs) of exercise and pharmacological therapies published in high-impact journals were included.

  • We used a combination of five established and one investigator-developed inventories to comprehensively evaluate and compare the quality of research reporting and conduct of exercise and pharmacological RCTs.

  • Main limitations of the study include the restriction to journals with impact factors ≥15 and the lack of broadly applicable or unified guidelines to compare across exercise and pharmacological therapy RCTs.

Introduction

Reports from epidemiological studies and randomised controlled trials (RCTs) indicate that exercise therapy is safe and well tolerated, and associated with broad health benefits in adults.1 Accordingly, exercise is considered standard of care therapy for many clinical populations (ie, adults with or at risk of chronic conditions), with established guidelines from numerous international agencies.2–4

Clinical recommendation of exercise for a particular clinical indication is predicated on evidence from RCTs.5 Optimal reporting of RCTs evaluating pharmacological and non-pharmacological therapies is facilitated by multiple standardised guidelines (eg, Consolidated Standards of Reporting Trials (CONSORT),6 7 Template for Intervention Description and Replication (TIDieR)).8 Reports of RCTs are required to conform to at least one of these guidelines when submitting to scientific journals across all areas of medicine. Relatedly, risk of bias (ROB) tools (eg, Cochrane ROB,9 Jadad Scale10) evaluate RCT research conduct. Numerous reviews have evaluated reporting quality and conduct of medical (eg, surgical,11 medical device12 and pharmacological13 interventions) RCTs. Only a few previous systematic reviews have assessed the quality of exercise RCT reporting and conduct.14–18 However, these reviews were limited in scope (eg, did not use comprehensive guidelines like CONSORT and Cochrane ROB; included a small number of trials) and incompletely reported key aspects of study methods (eg, item rating criteria, reviewer training). To our knowledge, no exercise reviews have contextualised their findings via direct comparison with trials in other research disciplines.

Therefore, our primary objective was to comprehensively evaluate the overall quality of exercise RCT reporting and conduct in clinical populations. The primary outcome was overall quality score (ie, the combined quality scores from three research reporting and two research conduct inventories). We also compared the quality of research reporting and conduct from exercise RCTs with matched RCTs of pharmacological therapies (a well-established field of biomedical research with a long history of adopting RCT methods)19 using (1) the complete guidelines and (2) only key items from the guidelines (ie, those generally applicable to both intervention types) to provide context for our findings. Secondary objectives were to compare individual items within the research reporting and conduct inventories as well as to examine factors associated with overall quality score.

Methods

Search strategy

This review was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA)20 and AMSTAR 221 guidelines (online supplemental methods 1 and 2). Full study methods are provided within online supplemental methods 3–7 and online supplemental table 1. Briefly, a research informationist (KM) conducted two sequential literature searches for exercise (first search) and pharmacological (second search) RCTs within the Embase (Elsevier), PubMed (NLM) and CINAHL (EBSCO) databases (figure 1). The search for exercise RCTs was conducted using a combination of relevant keywords and controlled vocabulary: (1) exercise training intervention and (2) RCTs. The search was restricted to trials published between 1 January 2008 (the year the CONSORT extension for non-pharmacological treatments (CONSORT-NPT) was first published22) and the search date (8 March 2018). Metadata (ie, journal, cohort/population, sample size and number of study sites) were extracted for eligible exercise RCTs and used to define the matching criteria for pharmacological RCTs. The pharmacological RCT search was conducted on 20 November 2018. The search was similarly restricted by date (1 January 2008–20 November 2018) and used a combination of relevant search terms and matching criteria for: (1) pharmaceutical intervention, (2) RCTs, (3) journal, (4) cohort/population, and (5) number of study sites (single or multicentre). We purposefully restricted our search to medical journals with impact factors ≥15 because journals with higher impact factors are more likely to endorse and enforce reporting quality guidelines23–25 and publish both exercise and pharmacological RCTs, leading to a more balanced foundation for comparison between study types.

Figure 1

Preferred Reporting Items for Systematic Reviews and Meta-Analyses flow diagram. RCT, randomised controlled trial.

Study eligibility criteria

Exercise RCTs involving adults (≥18 years of age) with chronic conditions, written in English, and published in journals with impact factors ≥15 according to the 2016 Journal Citation Reports (Clarivate Analytics) between 1 January 2008 and the search dates (exercise: 8 March 2018; pharmacological: 20 November 2018) were eligible. Exercise therapy interventions were defined as those involving chronic (>3 weeks), repeated sessions of supervised (in person, with or without a distance-based component) aerobic training (ie, endurance activity, ≥15 min/session), resistance training (ie, multiple large muscle group exercises involving repeated voluntary muscle contractions against a resistance greater than those normally encountered in activities of daily living), or the combination, with the objective of improving health-related outcomes.26 27 Pharmacological interventions were defined as studies involving the administration of established or experimental pharmacological agents with the objective of improving health.

Study selection, matching, data extraction and additional sources

Trained study reviewers (JM and KS; see online supplemental method 3 for training description) independently screened and evaluated identified article titles and abstracts in the DistillerSR web platform (Evidence Partners, Ottawa, Canada; figure 1). Next, full manuscripts of potentially eligible articles were independently reviewed using DistillerSR. Excluded exercise records are listed in online supplemental table 1.28 Matching criteria for exercise and pharmacological therapy RCTs included: (1) publishing journal (±5 impact factor points according to the 2016 Journal Citation Reports (Clarivate Analytics, formerly ISI Web of Knowledge)), (2) study population (sharing similar disease characteristics), (3) study sample size (±30% difference in study sample size), and (4) number of study sites (single vs multisite). These specific matching criteria were selected to establish impartial comparison between exercise and pharmacological RCTs. The ‘publishing journal’ criterion was selected because studies published within the same journal should, in theory, be held to similar reporting standards. If no direct match could be identified within the same journal, we used an investigator-defined cut-off of ±5 impact factor points to find alternate matches because impact factor has been shown to be associated with RCT reporting and methodological quality.29 30 The ‘study population’ criterion was chosen to account for differences in the research methods and standards across specific clinical populations and specialties. If no direct population match could be identified, we considered closely related populations. For example, for trials among patients with cardiac diseases, cardiomyopathy or heart failure was considered surrogate. We selected the ‘study sample size’ and ‘number of study sites’ as criteria to control for differences in the methods (eg, human and physical resources, infrastructure) used to conduct smaller versus larger trials. 
To this end, an investigator-defined cut-point of a 30% difference in sample size was used to match RCTs of similar scale and logistical complexity. Exercise and pharmacological therapy RCTs had to be matched on a minimum of two of the four matching criteria to be eligible. The pharmacological therapy RCT with values closest to the target exercise RCT was used if more than one potential match was identified. Full data were extracted for all eligible RCTs from the primary article and all other publicly available supplemental data sources using DistillerSR and Reference Guides. Disagreements concerning eligibility, data extractions, and ROB assessments were resolved by consensus (JM and KS) and adjudicated by a third party (SCA) when consensus could not be obtained. The corresponding author for each article was contacted by investigators (SCA, JMS, LWJ) to request information on incomplete and missing items. After 4 weeks, non-responding authors were recontacted and provided an additional ~4 weeks to respond. Reporting totals were revised after the closure of data collection (ie, final author contact (1 September 2019)).
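The matching rule described above (four criteria, with at least two required) can be sketched as follows. This is an illustrative sketch only, not the authors' procedure: the `Trial` fields, the exact-string population comparison, and the choice to measure the ±30% sample size window relative to the exercise trial are assumptions made for the example, and the trial values are hypothetical. In the review itself, population similarity was judged by the investigators.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    journal: str
    impact_factor: float
    population: str   # simplified label; the review judged similarity by hand
    sample_size: int
    multisite: bool


def matched_criteria(exercise: Trial, drug: Trial) -> int:
    """Count how many of the four matching criteria a candidate pair meets."""
    # Criterion 1: same journal, or impact factor within +/-5 points.
    journal_ok = (
        exercise.journal == drug.journal
        or abs(exercise.impact_factor - drug.impact_factor) <= 5
    )
    # Criterion 2: same (or closely related) study population.
    population_ok = exercise.population == drug.population
    # Criterion 3: sample size within 30% of the exercise trial (assumed reference).
    size_ok = abs(drug.sample_size - exercise.sample_size) <= 0.30 * exercise.sample_size
    # Criterion 4: both single-site or both multisite.
    sites_ok = exercise.multisite == drug.multisite
    return sum([journal_ok, population_ok, size_ok, sites_ok])


def is_eligible_match(exercise: Trial, drug: Trial) -> bool:
    """A pair is eligible when at least two of the four criteria are met."""
    return matched_criteria(exercise, drug) >= 2


# Hypothetical pair: matches on journal band, population and sample size.
exercise_rct = Trial("Journal A", 20.0, "heart failure", 138, True)
drug_rct = Trial("Journal B", 23.5, "heart failure", 160, False)
print(matched_criteria(exercise_rct, drug_rct), is_eligible_match(exercise_rct, drug_rct))
```

When several candidates pass this check, the text specifies that the pharmacological RCT with values closest to the exercise RCT is used; that tie-breaking step is not modelled here.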

Evaluation measures

Each trial was evaluated on two sets of criteria: (1) quality of research reporting and (2) quality of research conduct using complete standardised inventories and/or key items from these inventories, as needed. Exercise RCTs were evaluated on a maximum of 88 potential items and pharmacological RCTs were evaluated on a maximum of 63 potential items. The quality of exercise research reporting was first assessed using CONSORT-NPT (52 items),6 CONSORT-Harms (10 items)31 and TIDieR (16 items).8 The quality of pharmacological research reporting was assessed using CONSORT (37 items7) and CONSORT-Harms (10 items). However, there are no TIDieR-equivalent guidelines available to assess pharmacological intervention reporting. Therefore, intervention reporting for pharmacological interventions was assessed using six key items from TIDieR (including intervention length, modality, location, frequency, dose and adherence). Exercise dose consisted of session intensity and duration (aerobic and resistance interventions) as well as the number of sets and repetitions (resistance interventions only). Exercise RCT reporting was also re-evaluated using just the 37 items from the CONSORT guidelines that are common to both intervention types.7 Notably, there were items within the CONSORT-based reporting quality guidelines (and TIDieR guidelines for exercise RCTs) that were not applicable (NA) based on the unique aspects of individual exercise and pharmacological RCTs. Items rated as NA were excluded from the calculation of primary and secondary outcomes for each study (see the Endpoints and Data analysis sections). All research reporting quality items were rated (with equal weighting and a maximum score of 1 point per item) as: 1=‘properly reported’; 0=‘unclear’ (incompletely reported) or ‘not reported’ (missing); or NA=‘not applicable’.

The quality of research conduct was assessed using the Cochrane ROB inventory (7 items9) and the Jadad Scale (3 items).10 Cochrane ROB items were rated (with equal weighting) as: 2=‘low risk of bias’; 1=‘unclear risk of bias’ or 0=‘high risk of bias’. The first two items in the Jadad Scale were scored as 2=‘low risk of bias’ or 0=‘high risk of bias’; and the third item was scored as 1=‘low risk of bias’ or 0=‘high risk of bias.’
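The conduct scoring rules above can be expressed as a short sketch. The function and constant names are hypothetical and the ratings in the example are invented; only the point values come from the text. Under these rules the maximum is 14 points for the seven Cochrane ROB items and 5 points for the three Jadad items.

```python
# Points per Cochrane ROB rating, as described in the text.
COCHRANE_POINTS = {"low": 2, "unclear": 1, "high": 0}

# Jadad items 1 and 2 are worth 2 points each, item 3 is worth 1 point.
JADAD_MAX_POINTS = (2, 2, 1)


def cochrane_score(ratings):
    """Sum the points for the seven Cochrane ROB item ratings."""
    if len(ratings) != 7:
        raise ValueError("Cochrane ROB expects exactly 7 item ratings")
    return sum(COCHRANE_POINTS[rating] for rating in ratings)


def jadad_score(low_risk_flags):
    """Sum the points for the three Jadad items (True means low risk of bias)."""
    if len(low_risk_flags) != 3:
        raise ValueError("Jadad Scale expects exactly 3 item ratings")
    return sum(points for points, ok in zip(JADAD_MAX_POINTS, low_risk_flags) if ok)


# Invented example: five low, one unclear and one high Cochrane rating,
# with Jadad items 1 and 3 judged low risk.
example_total = cochrane_score(["low"] * 5 + ["unclear", "high"]) + jadad_score([True, False, True])
print(example_total)
```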

Endpoints

The primary endpoint was overall quality score defined as the sum of numerical quality scores from all research reporting and conduct inventories relative to the total number of applicable items. Secondary endpoints were defined as the numerical quality scores for each research reporting guideline and conduct inventory relative to the total number of applicable items for the study.
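As a rough illustration of the primary endpoint, the sketch below combines reporting ratings (1, 0 or NA) with a conduct score and drops NA items from the denominator. The function names and the example ratings are hypothetical. The 19-point conduct maximum is derived from the rating scheme under Evaluation measures (seven Cochrane ROB items at up to 2 points, plus Jadad items worth 2, 2 and 1 points).

```python
def reporting_score(ratings):
    """Return (achieved, applicable) for reporting items rated 1, 0 or None (NA)."""
    applicable = [rating for rating in ratings if rating is not None]
    return sum(applicable), len(applicable)


def overall_quality(reporting_ratings, conduct_points, conduct_max=19):
    """Return (numerator, denominator, percentage) for the overall quality score."""
    achieved, applicable = reporting_score(reporting_ratings)
    numerator = achieved + conduct_points
    denominator = applicable + conduct_max
    return numerator, denominator, 100 * numerator / denominator


# Invented trial: 33 items properly reported, 28 unclear or missing, 2 NA,
# and 13 of 19 conduct points.
ratings = [1] * 33 + [0] * 28 + [None] * 2
print(overall_quality(ratings, conduct_points=13))
```

Excluding NA items from the denominator means two trials can be scored against different totals, which is why the scores are compared as percentages.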

Data analysis

Characteristics of RCTs were summarised using descriptive statistics. Quality scores were calculated and reported in numerical and percentage score formats. Percentage quality scores were calculated for the primary endpoint (overall quality score) and secondary endpoints (individual scores for the quality of reporting guidelines and quality of conduct inventories) as the achieved score relative to the total number of applicable items per RCT. All items from the two research conduct inventories were applicable for every study and scored with values of 0, 1 or 2, resulting in a maximum total quality score for research conduct-related items of 19 per study. The variation in the total number of applicable items per study was caused by different numbers of reporting quality guideline items being rated as ‘NA’, resulting in median numbers of eligible items (ie, denominators for percentage score calculations) of 80 for exercise RCTs and 68 for pharmacological RCTs. Generalised linear models (GLMs) were specified with a binomial family and logit link to compare the scores of exercise and pharmacological RCTs. For the quality of research conduct scales (Cochrane ROB, Jadad Scale), item ratings were analysed as low or unclear risk of bias versus high risk of bias. The model accounts for differences in the number of eligible items and the matching between the exercise and pharmacological RCTs. GLMs were also used to evaluate factors associated with overall quality scores for exercise and pharmacological therapy RCTs separately. Potential factors included journal impact factor (<25 vs ≥25), RCT sample size (<152 vs ≥152 participants), number of study sites (single vs multiple sites) and year of publication (<2013 vs ≥2013). Cut-offs for impact factor, sample size and year of publication were based on the medians. Exploratory one-way analyses of variance (ANOVAs) were used to assess whether reporting quality varied across studies matched on 50%, 75% and 100% of matching criteria.
For comparisons of the individual components of the composite scores, p values were adjusted for multiple comparisons within research reporting and conduct inventories using a Bonferroni correction. Data are presented as median (IQR) and OR (95% CIs). Inter-rater reliability was evaluated using intraclass correlation coefficient (ICC) calculated via one-way ANOVA.32 Analyses were performed using R V.4.0.2.33
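For readers unfamiliar with the reliability statistic, the following is a minimal sketch of an intraclass correlation coefficient derived from a one-way ANOVA. The text does not state which ICC form was used beyond "one-way ANOVA", so the single-measure one-way random-effects form, often written ICC(1,1), is assumed here; the function name and the scores are hypothetical.

```python
def icc_oneway(scores):
    """Single-measure one-way ICC for a list of per-trial rater scores.

    `scores` holds one list per trial, each with the same number of raters.
    """
    n = len(scores)        # number of rated trials
    k = len(scores[0])     # number of raters per trial
    all_values = [value for row in scores for value in row]
    grand_mean = sum(all_values) / len(all_values)
    row_means = [sum(row) / k for row in scores]

    # Between-trial and within-trial mean squares from the one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (value - m) ** 2 for row, m in zip(scores, row_means) for value in row
    ) / (n * (k - 1))

    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Invented overall quality scores from two raters for four trials.
print(icc_oneway([[46, 48], [53, 52], [39, 41], [58, 57]]))
```

Values near 1 indicate that most of the variation in scores comes from differences between trials rather than disagreement between raters.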

Patient and public involvement

Patients were not included in the design and conduct of this review. However, optimising patient safety and benefit is the fundamental purpose of this review. Specifically, the proximal objective of the review is to identify opportunities to improve the rigour and reproducibility of exercise research that, in turn, will facilitate the delivery of robust evidence-based exercise interventions across diverse clinical populations and settings.

Results

See online supplemental tables 2–12 for full study characteristics and results. A total of 2836 potential exercise records were identified with 866 duplicate records removed using EndNote citation management software (Clarivate Analytics). A total of 1970 records underwent title and abstract screening (figure 1). Of these, 264 records underwent full review with 48 exercise RCTs meeting eligibility criteria.34–81 The 48 primary searches for pharmacological therapy trials produced 2815 records. The median number of records returned per search was 15 (range: 0–853). Review of the primary search results produced 19 matched pharmacological RCTs; the remaining 29 pharmacological RCTs were identified via review of modified secondary searches.82–129 Overall, 13 pairs of exercise and pharmacological RCTs were matched on 100% of our four matching criteria, 18 pairs of RCTs were matched on 75%, and 17 pairs of RCTs were matched on 50%. On average, exercise and pharmacological therapy RCTs were matched on three of four criteria. Inter-rater agreement between the two reviewers across the exercise and pharmacological RCTs was: overall quality score: ICC=0.85 (95% CI: 0.78 to 0.89); quality of research reporting guidelines: ICC=0.83 (95% CI: 0.75 to 0.88); and quality of research conduct inventories: ICC=0.73 (95% CI: 0.62 to 0.81).

Missing information (author contact)

Each RCT had missing information. The median number of eligible reporting quality items for exercise RCTs was 61 (IQR: 59–62) and pharmacological RCTs was 49 (IQR: 48–50). The median percentage (numerical; numerical range) of missing or indeterminate reporting quality items in exercise RCTs was 46% (28 of 61 items; 13–49) compared with 27% (13 of 49 items; 5–26) in pharmacological RCTs. Sixteen (33%) and seven (15%) corresponding authors of the exercise and pharmacological RCTs responded with a median of 12.5 (IQR: 10.0–16.2) and 5.0 (IQR: 4.0–6.5) additional items.

RCT characteristics

RCT characteristics are summarised in table 1. Exercise therapy RCTs included a total of 11 658 participants (7411 (64%) were allocated to experimental arms; including studies with 1–3 intervention arms) compared with 18 501 participants (11 909 (64%) allocated to experimental arms) in the pharmacological therapy RCTs. The median sample size was 138 (IQR: 100–236) for exercise RCTs and 160 (IQR: 98–314) for pharmacological RCTs. Overall, 34 of 48 exercise RCTs (71%) and 31 of 48 pharmacological RCTs (65%) reported positive primary outcomes.

Table 1

Characteristics of exercise and pharmacological therapy RCTs

Primary and secondary endpoints

The median overall quality score for RCTs of exercise therapy was 58% (46 of 80; IQR: 49–65) compared with 77% (53 of 68; IQR: 71–84; p≤0.001) for pharmacological therapy RCTs (table 2). For secondary endpoints, median research reporting quality scores across all complete guidelines were significantly lower in exercise RCTs in comparison with pharmacological RCTs (table 2). The lowest scoring research reporting quality guideline was CONSORT-Harms for both exercise and pharmaceutical studies. In exercise RCTs, median CONSORT-Harms score was 32% (3 of 9; IQR: 11–51) compared with 67% (6 of 10; IQR: 40–73) in pharmacological RCTs (p≤0.001; table 2). Harms reporting was missing entirely from 19% (9 of 48) of exercise RCTs and 4% (2 of 48) of pharmacological RCTs. Exercise RCTs reported 57% (8 of 15; IQR: 7–10) of TIDieR items (table 2). Over 75% of exercise RCTs were missing details related to intervention personnel, progression and participant adherence (table 3).

Table 2

Quality of exercise and pharmacological therapy RCT reporting and conduct

Table 3

Individual TIDieR item reporting summary for exercise therapy RCTs

In exercise RCTs, median Cochrane ROB score was 71% (10 of 14; IQR: 64–79) compared with 93% (13 of 14; IQR: 86–93) in pharmacological RCTs (p≤0.001; table 2). A summary of Cochrane ROB assessments for individual exercise and pharmacological therapy RCTs is provided in table 4. Exploratory one-way ANOVAs did not indicate a difference in reporting quality outcomes between exercise and pharmacological RCTs matched on 50%, 75% or 100% of the matching criteria.

Table 4

Cochrane ROB ratings for individual exercise and pharmacological therapy RCTs

Comparison of key items

Thirty-seven of 52 CONSORT items, all 10 CONSORT-Harms items, and 6 of 16 TIDieR items were considered key items. Median reporting scores for the key items from CONSORT and TIDieR were not significantly different between exercise and pharmacological RCTs; whereas, reporting scores for CONSORT-Harms were significantly lower for exercise RCTs (table 2). Compared with pharmacological RCTs, exercise RCTs had lower reporting of key study methods (eg, blinding after group assignment (60% vs 98%), balanced discussion of harms vs benefits (39% vs 66%), intervention modality (39% vs 66%), intervention dose (50% vs 98%), and complete intervention descriptions (0% vs 67%)).

Factors associated with reporting quality

Journal impact factor ≥25 (OR: 1.36; 95% CI: 1.18 to 1.57), larger sample size ≥152 (OR: 1.29; 95% CI: 1.11 to 1.51), and more recent publication year ≥2013 (OR: 1.18; 95% CI: 1.03 to 1.34) were associated with higher overall quality scores in exercise RCTs (table 5). The only factor associated with greater overall quality scores in pharmacological RCTs was more recent publication year ≥2013 (OR: 1.35; 95% CI: 1.14 to 1.60; p<0.001).
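To help interpret the ORs above, the sketch below shows how an odds ratio and Wald 95% CI arise when a binomial model with a logit link has a single binary factor. It is a simplification: items are pooled within each group and the trial-level structure and matching used in the review's models are ignored. The function name and the counts are hypothetical and are not data from this study.

```python
import math


def odds_ratio_wald(met_a, unmet_a, met_b, unmet_b, z=1.96):
    """Odds ratio (group A vs group B) with a Wald confidence interval.

    Each argument is a pooled count of quality items that were met or unmet.
    """
    odds_ratio = (met_a / unmet_a) / (met_b / unmet_b)
    # Standard error of the log odds ratio from the four cell counts.
    se_log_or = math.sqrt(1 / met_a + 1 / unmet_a + 1 / met_b + 1 / unmet_b)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical pooled counts: 60 of 100 items met in one group, 50 of 100 in the other.
print(odds_ratio_wald(60, 40, 50, 50))
```

An OR above 1 with a CI that excludes 1 indicates higher odds of an item being met in the first group, which is how the associations in table 5 should be read.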

Table 5

Factors associated with overall quality score, stratified by study type

Discussion

We evaluated the quality of research reporting and conduct within exercise therapy RCTs in clinical populations, and then compared it with the quality of reporting and conduct in matched pharmacological therapy RCTs. Our findings demonstrate that the quality of exercise therapy RCT reporting and conduct is suboptimal according to all complete guidelines and inventories used in this study and is inferior to that of pharmacological therapy RCTs. However, the mean overall reporting quality for RCT methods and interventions, but not harms, was similar between intervention types when considering key items within the respective guidelines.

To our knowledge, five systematic reviews14–18 have evaluated the overall quality of research reporting and conduct within exercise RCTs in clinical populations. Our findings corroborate the findings of these systematic reviews demonstrating the overall quality of exercise RCT reporting and conduct is suboptimal. For instance, in 27 exercise RCTs involving 1467 patients with metabolic syndrome, Ostman et al17 reported a median overall quality of 60% (range: 33%–87%) using the Tool for the assEssment of Study qualiTy and reporting in EXercise130 guideline. Similarly, Borror et al14 evaluated 12 exercise RCTs (representing 135 patients) with type 2 diabetes using a combination of 16 items from CONSORT, Jadad, Physiotherapy Evidence Database guidelines131 and the Delphi list.132 The combined trial reporting and conduct quality score was 49% (range: 38%–58%). Nevertheless, prior reviews have several important limitations. For instance, these reviews14–18 did not use the complete versions of comprehensive and widely accepted guidelines (eg, CONSORT, Cochrane ROB) and, thus, did not rigorously evaluate the quality of all salient aspects of trial reporting and conduct. In addition, the number of exercise trials evaluated was small, comparisons of reporting with matched pharmacological trials were not performed, and no data extraction training or standardisation was described within these studies. Thus, our review that was conducted by well-trained independent reviewers using specialised reference guides to facilitate standardised data extraction according to five distinct but complementary established guidelines/tools to assess and compare a large number of exercise trials and matched pharmacological trials provides the most rigorous evaluation of exercise research quality to date.

Although overall quality scores were poor in RCTs of exercise therapy, these findings were generally driven by poor research reporting quality scores across select individual guidelines rather than suboptimal RCT conduct per se. Foremost among these, the finding that harms were the most poorly reported aspects of exercise RCTs is concerning. Previous reviews in patients with cancer,133 chronic fatigue,134 and multiple sclerosis135 have specifically focused on evaluating the reporting of adverse event frequency and descriptions; this information was completely missing within 23%–88% of included exercise trials.133–135 Our study extends these findings by demonstrating that harms-related monitoring and reporting were missing or incompletely reported in ≥75% of exercise RCTs; and, relatedly, >50% of articles failed to provide a balanced discussion of risks to benefits for the tested interventions. In contrast, a related assessment of 325 chemotherapy trials reported a mean CONSORT-Harms score of 63%,136 compared with mean harms scores of 36% (exercise RCTs) and 57% (pharmacological RCTs) in our study. Based on our findings, we cannot support or refute the prevailing dogma that exercise is a safe and tolerable intervention strategy in most areas of clinical medicine.1 However, it is not possible to fully evaluate the harms to benefit ratio of exercise without accurate monitoring and reporting of adverse events within exercise RCTs—a critical consideration in the clinical recommendation of any medical intervention.

Reporting of intervention methods is the most commonly assessed quality metric in exercise RCTs to date. Our findings support previous reviews of exercise interventions in patients with peripheral arterial disease,137 cancer,138 hypertension139 and recovering from stroke140 demonstrating that essential elements, including details on the exercise prescription regimen itself, are incompletely reported. For example, Hacke et al used TIDieR to assess intervention reporting quality in 24 exercise RCTs involving 1195 patients with hypertension and reported that 91% of exercise intervention studies were missing information about intervention supervisors and 52% were missing details of intervention adherence.139 Relatedly, Tew et al also used TIDieR and reported that 20%–26% of reports failed to describe several of the most fundamental exercise intervention elements (ie, exercise mode, intensity, tailoring and progression) in 58 exercise RCTs in patients with peripheral arterial disease.137 In our study, information on patient compliance to the planned exercise regimen as well as the expertise of the individuals implementing the intervention was missing or incomplete in >90% of trials; fundamental details pertaining to dose of prescribed exercise were also missing in 50% of trials. By comparison, pharmacological intervention compliance was similarly missing in ~80% of trials; however, prescribed pharmacotherapy dose was only missing in 2% of studies. Incomplete intervention description not only hinders study reproducibility and cross-study integration (for meta-analyses) but also precludes quantification of exercise and pharmacotherapy dose, a key metric for elucidation of dose/exposure–response relationships and translation into clinical practice.141

A major strength of this review is that, to our knowledge, it is the first to compare the quality of research reporting and conduct within exercise and pharmacological therapy RCTs. We used rigorous data extraction and evaluation processes to provide the first direct evidence that the quality of research reporting and conduct within exercise RCTs is inferior to similar pharmacological RCTs using the complete reporting guidelines (CONSORT and CONSORT-NPT). For context, the reporting quality of pharmacological RCTs in our review is comparable with previous reviews. For example, using CONSORT, Peron et al142 found that reporting quality of pharmacological RCTs in oncology ranged from 72% to 74%. A similar review conducted by Ritchie et al reported a CONSORT score of 72% in 57 pharmacological RCTs (33% of studies involved patients with metabolic and cardiorespiratory diseases).13 Our findings are consistent with these studies and suggest that comparable research reporting quality scores for exercise RCTs are, on average, 15%–20% lower. There were no differences observed in mean overall reporting quality when comparing exercise and pharmacological RCTs according to key items from the CONSORT guidelines; however, the reporting of several critical individual items was suboptimal within exercise RCTs (eg, complete intervention descriptions, intervention dose, blinding status). Our findings provide important direction to improve the completeness and rigour of exercise trial reporting.

Several factors may contribute to the lower quality scores for research reporting and conduct within exercise trials. For instance, CONSORT was developed primarily to support the reporting of pharmacological trials and may not adequately capture aspects unique to the conduct of non-pharmacological trials such as exercise.143 This issue should have been addressed, in theory, with publication of the CONSORT-NPT extension in 2008.6 22 Indeed, this extension was developed to facilitate complete reporting across the fundamental aspects of RCTs applicable to all non-pharmacological trials, including exercise. Reporting quality of traditional biomedical therapy RCTs (eg, surgical, pharmaceutical) has improved since the publication of the CONSORT guidelines and is superior in journals adopting these guidelines.144–146 We similarly found that exercise RCTs published more recently (>2013) had higher overall quality scores. These findings are encouraging and suggest that the awareness and use of established guidelines and inventories to support research reporting and conduct may be increasing, although there remains marked room for improvement. Further improvement in this context will require ongoing education of exercise investigators to conform with such guidelines, and journals and reviewers to hold authors accountable for their use. Stricter adherence to CONSORT-NPT, for example, would improve the reporting quality of most fundamental trial aspects; however, this tool may still be too generic to support the comprehensive reporting of features unique to exercise trials, especially intervention description. To this end, adoption of TIDieR, or the more recent exercise-specific Consensus on Exercise Reporting Template guidelines,147 is warranted to improve the reporting and reproducibility of exercise interventions within exercise RCTs.

Our study has several limitations. First, the restriction to journals with impact factors ≥15 may overestimate the quality of research reporting and conduct within the included exercise and pharmacological therapy RCTs. Relatedly, the exclusion of exercise RCTs published within sports science journals may underestimate the quality of exercise studies. Nevertheless, we felt it was necessary to selectively draw from this subset of journals, given that they are most likely to publish RCTs of both intervention types and to endorse and enforce reporting quality guidelines,23–25 to impartially compare and contextualise our findings. Second, the lack of broadly applicable or unified guidelines to compare across exercise and pharmacological therapy RCTs also merits consideration. Guidelines used to evaluate the quality of RCT reporting were either different between study types (ie, CONSORT-NPT6 vs CONSORT7), developed specifically for harms reporting in pharmacological trials,31 or investigator-derived given that there are formal standards for non-pharmacological (ie, TIDieR), but not pharmacological, intervention reporting. We controlled for differences in the numbers of evaluable and applicable items across the reporting quality guidelines and used four matching criteria to control the influence of differences in (1) journal editorial standards and policies, (2) population-specific research methods and standards, and (3) the methods, resources, and infrastructure required to conduct smaller versus larger trials. Future research could be strengthened by the establishment of standardised matching criteria to facilitate comparisons between branches of biomedical research. Third, we did not update the search following the extraction of the 96 included studies published from 2008 to 2018, which may have introduced bias related to search recency. However, the association between year of publication and reporting quality was evaluated and is discussed herein.
Finally, we acknowledge that using non-specific assessment tools (eg, using CONSORT-NPT to evaluate exercise trials or TIDieR to evaluate pharmacological interventions) potentially introduces measurement bias. To address this concern, we limited our evaluations and comparisons to reporting and conduct quality items that were applicable to the type of intervention, and selected 6 of TIDieR’s 16 items to facilitate comparisons of intervention reporting quality between exercise and pharmacological RCTs. Development of discipline-specific measurement tools, similar to the CONSORT extensions for acupuncture interventions148 and patient-reported outcomes,149 may be needed to improve reporting of exercise trials.

In summary, the overall quality of research reporting and conduct within exercise RCTs is suboptimal and inferior to pharmacological RCTs. Stricter adherence to established guidelines and inventories is warranted to facilitate the generation of high-quality evidence needed to optimise the safety, efficacy and implementation of exercise therapy in clinical populations.

Data availability statement

All data relevant to the study are included in the article or uploaded as supplemental information.

Acknowledgments

We would like to thank the corresponding authors for their time and effort in providing us with supplemental information during the author contact phase of the review.

References

Footnotes

  • Twitter @DrAdamsSC, @jessicalavs, @maggiecaat, @DR_SantaMina, @cardiac_fitness

  • JMS and LWJ contributed equally.

  • Contributors LWJ conceived the study idea. SCA and JMS coordinated the systematic review. SCA, LWJ and JMS wrote the first draft of the manuscript. KM designed the search strategy. KS and JM screened abstracts and full texts. JM, KS, MMZC and SCA acquired the data. JM, KS and SCA judged risk of bias in the studies. JL and CSM performed the data analyses. SCA, JM, KS, KM, JL, CSM, MMZC, DSM, JMS and LWJ interpreted the data analysis. SCA, JM, KS, KM, JL, CSM, MMZC, DSM, JMS and LWJ critically revised the manuscript. LWJ had full access to all study data and takes responsibility for the integrity of the data and the accuracy of the data analysis. The findings of this study have been presented clearly, honestly, and without fabrication, falsification, or inappropriate data manipulation. LWJ is the guarantor.

  • Funding LWJ and JMS are supported by research grants from the National Cancer Institute. LWJ and JMS are supported by AKTIV Against Cancer. JL, CSM, JMS and LWJ are supported by the Memorial Sloan Kettering Cancer Center Support Grant/Core Grant (P30 CA008748).

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.