Abstract
Objective To measure the frequency of adequate methods, inadequate methods and poor reporting in published randomised controlled trials (RCTs) and test potential factors associated with adequacy of methods and reporting.
Design Retrospective analysis of RCTs included in Cochrane reviews. Time series describes the proportion of RCTs using adequate methods, inadequate methods and poor reporting. A multinomial logit model tests potential factors associated with methods and reporting, including funding source, first author affiliation, clinical trial registration status, study novelty, team characteristics, technology and geography.
Data Risk of bias assessments for random sequence generation, allocation concealment, blinding of participants and personnel, blinding of outcome assessment, incomplete outcome data and selective reporting, for each RCT, were mapped to bibliometric and funding data.
Outcomes Risk of bias on six methodological dimensions and RCT-level overall assessment of adequate methods, inadequate methods or poor reporting.
Results This study analysed 20 571 RCTs. 5.7% of RCTs used adequate methods (N=1173). 59.3% used inadequate methods (N=12 190) and 35.0% were poorly reported (N=7208). The proportion of poorly reported RCTs decreased from 42.5% in 1990 to 30.2% in 2015. The proportion of RCTs using adequate methods increased from 2.6% in 1990 to 10.3% in 2015. The proportion of RCTs using inadequate methods increased from 54.9% in 1990 to 59.5% in 2015. Industry funding, top pharmaceutical company affiliation, trial registration, larger authorship teams, international teams and drug trials were associated with a greater likelihood of using adequate methods. National Institutes of Health funding and university prestige were not.
Conclusion Even though reporting has improved since 1990, the proportion of RCTs using inadequate methods is high (59.3%) and increasing, potentially slowing progress and contributing to the reproducibility crisis. Stronger incentives for the use of adequate methods are needed.
- statistics & research methods
- health informatics
- health policy
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
Strengths and limitations of this study
This work combines the strengths of expert human assessments with data science techniques to build a comprehensive database on biomedical research quality, including the full-text and systematic assessment of randomised controlled trial (RCT) methods with bibliometric and funding information in a sample of 20 571 RCTs.
The study analyses trends in methods and reporting over 25 years and identifies factors associated with biomedical research quality including funding source, first author affiliation, clinical trial registration status, study novelty, team characteristics, technology and geography.
PubMed identifier, full-text and/or funding information were not available for all RCTs. 30.5% of RCTs (unpublished or published in journals not indexed in PubMed) did not have a PubMed identifier. 43.2% of RCTs with PubMed identifier did not have a full text available from the Harvard Library. 23.6% of included RCTs were reported in articles disclosing National Institutes of Health or industry funding. Classification of sectors relies on primary reported affiliation.
Cochrane reviewers may have been able to obtain more information on more recent RCTs (from authors, registries or protocols rather than the primary report), suggesting some of the apparent improvement in reporting may reflect an improvement in access to study details.
This study does not identify causal mechanisms explaining biomedical research quality.
Introduction
The quality and reliability of biomedical research are of paramount importance to treatment decisions and patient outcomes. Flawed research conclusions can lead to poor treatment and harm patients. As much as 85% of the annual US$265 billion spent on biomedical research may be wasted due to inadequate methods.1–8
Previous scientific work aiming to evaluate the reliability of biomedical research has been limited by data and methodological issues. Data challenges included the time and resources necessary to assess methods and reporting, resulting in the use of small selected samples and/or limited information available for each scientific article evaluated in larger samples.9–28 As a result, the overall magnitude of waste due to inadequate methods and reporting in biomedical research remains unknown, as do the factors associated with the use of adequate vs inadequate research methods.
To address these questions, this study combines the full text of randomised controlled trials (RCTs) and systematic assessment of study methods with bibliometric and funding information in a large sample of RCTs included in ‘gold-standard’ systematic reviews. The study describes the evolution of adequate research methods and reporting over time. A multinomial logit model tests potential factors associated with methods and reporting, including funding source, first author affiliation, clinical trial registration status, study novelty, team characteristics, technology and geography.
Methods
This work combines the strengths of human expert assessments with data science techniques to build a comprehensive database on biomedical research quality, including full text, systematic assessment of study methods, bibliometric and funding information in a sample of 20 571 RCTs. Python V.3.6 and Stata V.15 were used to assemble the database and conduct the analysis.
Data
Cochrane reviews constitute a valuable data source to assess biomedical research quality as they follow strict methods and precise reporting guidelines defined in the Cochrane Handbook.29 30 This study does not involve new assessment of the methods and reporting of included RCTs, but relies entirely on the assessments available in the Cochrane reviews, which are systematically performed by two expert reviewers who compare their assessments and reach consensus on the final assessment.29 The research method dimensions evaluated in Cochrane reviews include random sequence generation, allocation concealment, blinding of participants and personnel, blinding of outcome assessment, incomplete outcome data and selective reporting (detailed in online supplementary table A1).31
Supplemental material
The database assembly had seven steps: (1) All included references were extracted from each review, including PubMed identifiers, (2) all risk of bias assessments on the six dimensions of the 2011 update of the Cochrane Risk of Bias Assessment Tool (see online supplementary table A1) were extracted from each review. Each assessment included three variables: bias type (eg, random sequence generation), judgment (eg, low risk) and support for judgment (eg, computer random number generator), (3) each RCT was matched with its main published reference as identified by Cochrane reviewers, (4) PubMed records corresponding to these publications, including bibliometric information and first author affiliation, were retrieved using the E-utilities public application programming interface (API), (5) affiliation information for other authors (not available from PubMed over the study period) was retrieved from SCOPUS, (6) full text for references with a PubMed identifier was retrieved from the Harvard Library, and (7) industry funding information was extracted from the full text.
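Step 4 of this pipeline can be sketched with the Python standard library alone. This is an illustrative assumption about how the E-utilities efetch endpoint might be queried and parsed, not the study's actual code; the function names and the specific fields extracted are hypothetical.

```python
import urllib.request
import xml.etree.ElementTree as ET

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def efetch_url(pmid):
    """Build the E-utilities efetch URL for one PubMed identifier."""
    return f"{EFETCH}?db=pubmed&id={pmid}&retmode=xml"

def fetch_pubmed_xml(pmid):
    """Retrieve the raw XML record for one PubMed identifier."""
    with urllib.request.urlopen(efetch_url(pmid)) as resp:
        return resp.read()

def parse_pubmed_record(xml_text):
    """Extract title, publication year and first-listed author
    affiliation from a PubMed efetch XML response."""
    root = ET.fromstring(xml_text)
    return {
        "title": root.findtext(".//ArticleTitle"),
        "year": root.findtext(".//PubDate/Year"),
        "affiliation": root.findtext(".//AffiliationInfo/Affiliation"),
    }
```

Separating URL construction, retrieval and parsing makes the parsing step testable offline against stored XML responses.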
Sector affiliation (university, government, hospital, non-profit, top pharmaceutical company or other firm) and geographical variables were derived from the first author affiliation address. Top 25 universities were identified using the 2007 Academic Ranking of World Universities in Clinical Medicine and Pharmacy (see online supplementary material appendix A). Firms were classified as top pharmaceutical companies or other firms using a listing of pharmaceutical companies with revenue greater than US$10 billion in any year since 2011 (see online supplementary material appendix B). Technologies were retrieved from the keywords and abstracts of the Cochrane reviews. Private funding information was retrieved from the full text of the main reference.
Sample
Figure 1 summarises the data flow. All RCTs assessed for risk of bias after 2011 (update of the Risk of Bias Assessment Tool) and through October 2017 were considered for inclusion (63 748 RCTs included in 4195 reviews). This list of Cochrane reviews is reported in online supplementary appendix C.
Criteria for study inclusion were: (1) the review included all six assessments, to allow comparison of the overall use of adequate methods, inadequate methods and poor reporting across reviews (1988 reviews dropped); (2) the article reporting the study was referenced in PubMed, to allow bibliometric data to enter the analysis (9201 RCTs dropped); (3) duplicates were removed; and (4) RCTs assessed multiple times with different outcomes (eg, high risk in one review, unclear risk in another) were dropped (404 RCTs dropped).
Applying these criteria, the analysis sample for the descriptive statistics and the time series of methods included 20 571 RCTs. A full-text PDF was available from the Harvard Library for 11 686 RCTs. This subsample was needed to retrieve private funding information from the full text of the paper and constitutes the analysis sample for those regressions including funding information.
Analysis
The outcomes were risk of bias on the six assessed methodological dimensions and RCT-level assessment of adequate methods, inadequate methods or poor reporting. The six methodological dimensions assessed included (1) random sequence generation, (2) allocation concealment, (3) blinding of participants and personnel, (4) blinding of outcome assessment, (5) incomplete outcome data and (6) selective reporting (detailed in online supplementary table A1). The category ‘other bias’ was not used in this study, as it includes concerns not necessarily about methods or reporting, such as conflicts of interest.
Following guidelines for assessing the quality of evidence31 and previous empirical work,7 the RCT-level assessment was ‘adequate methods’ if the study was at low risk of bias on all dimensions assessed. It was ‘inadequate methods’ if the study was at high risk of bias for one or more reasons. It was ‘poorly reported’ if the reviewers did not have enough information to assess whether the methods used were adequate or inadequate (if the study was at ‘unclear’ risk of bias).
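The RCT-level assignment rule above can be sketched in Python (the language the paper reports using for the analysis). The function name and the string coding of judgments are assumptions for illustration; the rule itself follows the definition in the text.

```python
def classify_rct(judgments):
    """Assign the RCT-level category from six risk-of-bias judgments.

    `judgments` is a list of six strings, each 'low', 'high' or 'unclear'.
    Rule from the text: 'inadequate methods' if at high risk on one or
    more dimensions; 'adequate methods' if at low risk on all dimensions;
    otherwise 'poorly reported' (at least one 'unclear', no 'high').
    """
    if any(j == "high" for j in judgments):
        return "inadequate methods"
    if all(j == "low" for j in judgments):
        return "adequate methods"
    return "poorly reported"
```

Note that a single 'high' judgment dominates: a trial with both 'high' and 'unclear' assessments is classified as using inadequate methods, not as poorly reported.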
Several reasons support the use of at least one high risk of bias assessment as the definition for inadequate methods. Some risk of bias domains might translate into more statistical bias than others, but empirical evidence on the relative importance of the risk of bias domains is limited, and the effect of several versus one high risk assessment on research outcomes is unknown.32 33 The empirical relationship between risk of bias assessments and research outcomes (including actual statistical bias) requires further research.
There is also a theoretical reason to use at least one high risk of bias assessment as the definition of method inadequacy. Cochrane risk of bias domains can be mapped to important conditions to make RCTs valuable. If not truly randomised or if differences between the treatment and control group are introduced post randomisation, an RCT does not produce an unbiased estimate of the treatment effect.34 These two conditions imply that one inadequacy in the randomisation process (non-random sequence generation or inadequate allocation concealment), or one difference introduced post randomisation between the treatment and control groups (through inadequate blinding of participants, personnel or outcome assessors) or after the trial (due to incomplete outcome data or selective reporting) should be the default threshold for assessing methods adequacy.
Two analyses were performed. The first reports the time series of the proportion of RCTs using adequate methods, inadequate methods and poor reporting, for each dimension and in aggregate. The second tests whether adequate methods, inadequate methods and poor reporting are associated with funding source (National Institutes of Health (NIH) grant or industry funding), sector affiliation of first author (top university, other university, government, hospital, non-profit, top pharmaceutical company and other Firm), other industry affiliation, clinical trial registration status, study novelty (first or subsequent study on a particular research question), team characteristics (number of authors and international collaboration), technology (drug, device, surgery, behavioural intervention or other intervention) and geography of first author (Canada, Europe, UK, USA or other country). A multinomial logit model using these variables predicts overall adequate methods, inadequate methods and poor reporting, as well as risk of bias along each dimension assessed.
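The functional form of the multinomial logit in the second analysis can be illustrated with a small softmax sketch. The choice of 'adequate methods' as the base outcome and the coefficient values are hypothetical assumptions for illustration only, not estimates from the paper.

```python
import math

def multinomial_logit_probs(x, betas):
    """Predicted probabilities under a multinomial logit model.

    x     : list of covariate values (eg, industry funding, team size).
    betas : dict mapping each non-base outcome to its coefficient list;
            the base outcome ('adequate methods') has coefficients
            fixed at zero, as in standard MNL identification.
    Returns a dict of probabilities over all three outcomes (softmax).
    """
    scores = {"adequate methods": 0.0}
    for outcome, b in betas.items():
        scores[outcome] = sum(bi * xi for bi, xi in zip(b, x))
    denom = sum(math.exp(s) for s in scores.values())
    return {k: math.exp(s) / denom for k, s in scores.items()}
```

With all coefficients at zero the three outcomes are equally likely; a positive coefficient on a covariate raises the relative probability of that outcome against the base, which is how the relative risk ratios reported below should be read.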
Patient involvement
Patients were not involved in the study design or conduct, or in the development of the research question or outcome measures. Because this is a meta-research study based on existing published research, there was no patient recruitment for data collection.
Results
Prevalence of adequate methods, inadequate methods and poor reporting
Table 1 presents descriptive statistics. Only 5.7% of RCTs used adequate methods on all six dimensions (n=1173). 59.3% used inadequate methods on at least one dimension (n=12 190) and 35.0% were poorly reported (n=7208).
Figure 2 shows the proportion of RCTs at low, high or unclear risk of bias for random sequence generation, allocation concealment, blinding of participants and personnel, blinding of outcome assessment, incomplete outcome data and selective reporting, for all RCTs assessed on all six dimensions (n=20 571). Thirty-eight per cent of trials used inadequate methods for blinding of participants and personnel. A total of 15%–20% of trials used inadequate methods for blinding of outcome assessment (20%), incomplete outcome data (19%) and selective reporting (15%). The proportion of trials using inadequate methods for random sequence generation and allocation concealment was lowest (respectively, 5% and 7%), but these two dimensions were frequently poorly reported (respectively, 47% and 58% of trials).
Methods and reporting over time
Figure 3 shows the overall proportion of RCTs using adequate methods, inadequate methods and poorly reported methods by year of publication. The proportion of poorly reported RCTs decreased, five percentage points per decade, from 42.5% in 1990 to 30.2% in 2015. The proportion of RCTs using adequate methods increased linearly, three percentage points per decade, from 2.6% in 1990 to 10.3% in 2015. The proportion of RCTs using inadequate methods increased from 54.9% in 1990 to 59.5% in 2015.
Reporting improved on all dimensions. The proportion of RCTs using adequate methods for random sequence generation, allocation concealment, blinding of outcome assessment, incomplete outcome data and selective reporting increased. However, the proportion of trials using inadequate methods for blinding of participants and personnel increased.
Figure 4 provides graphs similar to figure 3 for all RCTs assessed on at least one dimension (n=63 748). Similar patterns suggest that the evolution over time observed for the RCTs assessed on all dimensions (n=20 571) reflects the evolution over time in all RCTs assessed on at least one dimension.
Factors associated with methods and reporting
Figure 5 reports regression results from a multinomial logit model predicting overall quality. Online supplementary tables A2 and A3 report all regression results.
Public funding was not associated with the overall use of adequate methods. However, NIH-funded RCTs were less likely to use inadequate methods for random sequence generation (RR=0.29, p<0.001) and allocation concealment (RR=0.51, p<0.001). Industry-funded RCTs were slightly more likely to use adequate methods (RR=0.84, p<0.05), because of better blinding of participants and personnel (RR=0.87, p<0.05).
First author affiliation with a top pharmaceutical company was associated with increased use of adequate methods (RR=0.43, p<0.01). First author affiliation with a top university was not.
Registered trials (RR=0.42, p<0.001), larger authorship teams (RR=0.95, p<0.001), international teams (RR=0.51, p<0.01) and RCTs on drugs (RR=0.50, p<0.001) were less likely to use inadequate methods. RCTs on medical devices were more likely to use inadequate methods (RR=1.71, p<0.01).
Discussion
In 1951, the first review assessing the quality of clinical trials found that only 27 of 100 were well controlled.35 36 Since then, a steady stream of scholarly work has periodically voiced concerns about the quality of medical research.1–8 37 38 Recent medical reversals39 and the reproducibility crisis40 have sharpened focus on medical research quality. Newly available large-scale data and data science techniques provide powerful tools to measure the overall magnitude of the problem, investigate its determinants and provide an evidence base to inform the design and evaluation of future interventions. This study assessed whether methods and reporting improved over time and identified the characteristics of better and worse RCTs.
This study has six main results. First, in a large sample of RCTs assessed in systematic reviews, only 5.7% used adequate methods, 59.3% used inadequate methods, and 35.0% were poorly reported. Since the 1990s, reporting has improved. But in parallel with this improvement in reporting, the proportion of trials using both adequate and inadequate methods has increased.
The overall proportion of poorly reported trials decreased by about five percentage points per decade. This is good news but much remains to be done. At the current rate of improvement, it would take 50 years for 95% of RCTs to be adequately reported. These results are consistent with previous research finding improvements in reporting in several clinical areas such as physiotherapy10 and dentistry.26 The trends for each dimension assessed separately are also very similar to those found in another large sample of RCTs.25
This improvement in reporting happened over a period when the Consolidated Standards of Reporting Trials (CONSORT) statement, a minimum set of evidence-based reporting recommendations, and other initiatives, such as the Enhancing the QUAlity and Transparency Of health Research (EQUATOR) Network, were developed to improve reporting practices.41–45 Since the 1990s, the CONSORT statement has been endorsed by over 50% of the core clinical journals indexed in PubMed and may improve reporting of the RCTs they publish.46 Spurred by the CONSORT statement, the EQUATOR Network was launched in 2008 in the UK to improve the reliability of medical publications by promoting transparent and accurate reporting of health research.47 Since then, it has developed into a global initiative aiming to improve research reporting worldwide.36
In parallel with this improvement in reporting, the proportion of trials using both adequate and inadequate methods has increased. The linear increase in the proportion of RCTs using adequate methods is heartening. However, improvement in the use of adequate methods is even slower than improvement in reporting. At the current rate of improvement (three percentage points per decade), it would take more than a century for half the RCTs to use adequate methods. This finding is consistent with previous empirical results in small samples,23 but contrasts with research in larger samples that analysed each methodological dimension separately and concluded that methods improved over time.25
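The linear extrapolations behind the '50 years' figure above and the 'more than a century' figure here amount to a one-line projection, sketched below as a check (the function name is illustrative):

```python
def years_to_target(current_pct, target_pct, pp_per_decade):
    """Years until a proportion reaches a target, assuming a constant
    change in percentage points per decade (linear extrapolation)."""
    return abs(target_pct - current_pct) / pp_per_decade * 10

# Poorly reported: 30.2% in 2015, improving 5 pp/decade, target 5%
# (ie, 95% adequately reported) -> roughly 50 years.
# Adequate methods: 10.3% in 2015, improving 3 pp/decade, target 50%
# -> well over 100 years.
```

The extrapolation is of course a simplification: it assumes the observed linear trends continue unchanged.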
Second, NIH-funded RCTs were not more likely to use adequate methods. This is surprising given the rigorous grant application process, shown to select better scientific proposals,48 and the public stakes in the reliability of publicly funded research.49 Notably, the efforts of the NIH to address the reproducibility crisis began just at the end of the study period.50
Third, top pharmaceutical company affiliation was significantly associated with better methods. Affiliation with other companies was not. Heterogeneity across firms may explain inconsistency of previous research on the effect of industry funding or affiliation on research methods and outcomes.28 51
Fourth, university prestige was not associated with greater use of adequate methods. The current scientific reward system focuses on numbers of publications and citations rather than the assessment of research methods.52 The resulting incentives affect both scientists and institutions, for example through the allocation of grant funding.53 54 Thus, in a climate of hypercompetition,55 the use of adequate methods and reporting might yield little reward while exposing scientists to better informed scrutiny.
Fifth, team size and international collaboration were associated with greater use of adequate methods. Increasing the number of authors by one was associated with a small but highly significant improvement in methods and reporting. Many RCTs are published by large teams, so it is not surprising that the effect of one additional author was small. But this effect was also highly significant, consistent with previous research finding that larger teams and international teams produce more frequently cited research.56 57 Team characteristics associated with performance in other settings open avenues for future research.58 59
Finally, RCTs on drugs were more likely to use adequate methods than RCTs on other interventions, while RCTs on devices were more likely to use inadequate methods. In many countries, trials on drugs are more tightly regulated than trials on devices. In the USA, under the Federal Food, Drug and Cosmetic Act (1938), drugs and devices face different premarket review and postmarket compliance requirements. The finding is also consistent with specific barriers to the conduct of RCTs on medical devices, in particular for randomisation and blinding, and with the lack of scientific advice and regulations for medical device trials.60 RCTs on drugs used better methods and reporting than RCTs on other interventions, but much remains to be done. This finding is consistent with previous work showing that even RCTs used in the drug approval process frequently use inadequate methods and reporting.61
Future research should carefully evaluate the effect of method adequacy on research outcomes, and identify successful strategies and incentives to accelerate the diffusion of good reporting practices and the adoption of adequate methods. Given the size of the medical research industry and its effect on human lives, successful evidence based policies could have tremendous impact.
Limitations
PubMed identifier, full-text and/or funding information were not available for all RCTs. 30.5% of RCTs (unpublished or published in journals not indexed in PubMed) did not have a PubMed identifier. 43.2% of RCTs with PubMed identifier did not have a full text available from the Harvard Library. 23.6% of included RCTs were reported in articles disclosing NIH or industry funding. Classification of sectors relies on primary reported affiliation. This paper does not identify causal mechanisms explaining biomedical research quality.
Cochrane reviewers may have been able to obtain more information on more recent RCTs (from authors, registries or protocols rather than the primary report), suggesting that some of the apparent improvement in reporting may in fact be an improvement in access to study details.
Conclusion
Even though reporting has improved since 1990, the proportion of RCTs using inadequate methods is high (59.3%) and increasing, potentially slowing progress and contributing to the reproducibility crisis. Stronger incentives for the use of adequate methods are needed.
Acknowledgments
The author thanks David Cutler, Richard Freeman, Mack Lipkin, Tim Simcoe, Ariel Stern, Griffin Weber, Richard Zeckhauser and the participants at the NBER-IFS International Network on the Value of Medical Research meetings for helpful conversations and feedback, and Harvard Business School Research Computing Services for technical advice and support. The author also thanks the editors and the two referees, Paul Glasziou and Simon Gandevia, for their most helpful and generous feedback and clear guidance.
References
Footnotes
Contributors MC designed the study, performed the analysis, interpreted the results, wrote the manuscript and approved the final version to be published. MC accepts full responsibility for the work and the conduct of the study, had access to the data and controlled the decision to publish.
Funding MC gratefully acknowledges support by the National Institute on Aging of the National Institutes of Health under Award Number R24AG048059 to the National Bureau of Economic Research (NBER).
Disclaimer The content of this article is solely the responsibility of the author and does not necessarily represent the official views of the National Institutes of Health or the NBER.
Competing interests MC has completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf and declares: the author reports grants from the National Institute on Aging of the National Institutes of Health during the conduct of the study.
Patient consent for publication Not required.
Ethics approval This is a meta-research study.
Provenance and peer review Not commissioned; externally peer reviewed.
Data availability statement All data sources necessary to reproduce the analysis are described in the main text or the online supplementary material. No additional data available.