Abstract
Objective We assessed how well articles in major medical and psychiatric journals followed best reporting practices in presenting results of intervention studies.
Method Standardised data collection was used to review studies in high-impact and widely read medical (JAMA, Lancet and New England Journal of Medicine) and psychiatric (American Journal of Psychiatry, JAMA Psychiatry, Journal of Clinical Psychiatry and Lancet Psychiatry) journals, published between 1 September 2018 and 31 August 2019. Two team members independently reviewed each article.
Measures The primary outcome measure was proportion of papers reporting consensus elements required to understand and evaluate the results of the intervention. The secondary outcome measure was comparison of complete and accessible reporting in the major medical versus the major psychiatric journals.
Results One hundred twenty-seven articles were identified for inclusion. At least 90% of articles in both medical and psychiatric journals included sample size, statistical significance, randomisation method, elements of study flow, and age, sex, and illness severity by randomisation group. Elements reported less frequently in one or both journal types were confidence intervals (CIs) in the abstract (93%, 95% CI 84% to 97% medical; 58%, 95% CI 45% to 69% psychiatric) and, in the main text, sample size method (93%, 95% CI 84% to 97% medical; 69%, 95% CI 57% to 80% psychiatric), race and ethnicity by randomisation group (51%, 95% CI 40% to 63% medical; 73%, 95% CI 60% to 83% psychiatric), and adverse events (94%, 95% CI 86% to 98% medical; 80%, 95% CI 68% to 88% psychiatric). CIs were included less often in psychiatric than medical journals (p<0.004 abstract, p=0.04 main text, after multiple-testing correction).
Conclusions Recommendations include standard inclusion of a table specifying the outcome(s) designated as primary, and the sample size, effect size(s), CI(s) and p value(s) corresponding to the primary test(s) for efficacy.
- STATISTICS & RESEARCH METHODS
- PSYCHIATRY
- GENERAL MEDICINE (see Internal Medicine)
Data availability statement
Data are available upon reasonable request.
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
Strengths and limitations of this study
A standardised questionnaire assessed the inclusion of key elements in reporting original findings on clinical interventions in 1 year of papers from seven high impact journals, with a focus on clarity of communication.
The articles surveyed represented those likely to reach and influence large numbers of clinicians and clinical investigators in general medicine and psychiatry.
All articles were reviewed by two team members experienced in psychiatric research.
The results do not necessarily reflect the completeness of reporting in other journals.
Because the questionnaire was developed specifically for the study, results may not be directly comparable to results from prior studies.
Introduction
Medicine advances study by study and report by report. To ensure that progress, manuscripts must clearly communicate key elements of their study design and their findings to individual readers, both clinicians and researchers. The peer review process of medical journals helps ensure completeness and clarity. The introduction, expansion and revision of standardised reporting guidelines for intervention studies, such as the broadly accepted Consolidated Standards of Reporting Trials (CONSORT) guidelines for randomised trials,1–3 have improved the quality and comprehensiveness of scientific reporting, by allowing informed evaluation of study quality, including potential sources of bias, and requiring standardised and complete reporting practices. These elements allow physicians to better assess the potential benefits of interventions for their patient populations and provide investigators the details they need to perform new studies. However, the guidelines only provide these advantages if they are used. As readers, we have noticed some differences in the completeness and clarity of reporting from journal to journal and paper to paper.
Previous studies have examined reporting practices for randomised trials, many of which have focused on specific disciplines, locations or designs. Reviews synthesising results across studies of adoption of CONSORT guidelines have found improvement in some areas but still inconsistent implementation of certain recommended practices.4 5 More recently, a large study of methodological quality using automated methods to examine more than 175 000 trials published between 1966 and 2018 found improvement over time in risk of bias, lower risk in high impact journals and marked increases in trial registration and inclusion of a CONSORT statement since 2010, though both remained below 50%.6
Few studies have focused on psychiatry. A study comparing reporting practices for randomised trials before and after CONSORT found inconsistent (<80% of studies) inclusion of recommended elements of study reporting, including clearly defined outcomes and study flow, for articles published post-CONSORT 2002–2007.7 Similarly, a systematic review of adherence to reporting practices for articles on social and psychological intervention (SPI) trials published in 2010 found low (<60%) inclusion rates for elements of trial design, data analysis and study flow.8 More recently, a study of reporting quality of abstracts for psychiatric randomised controlled trials published 2012–2014 found improved but still inconsistent inclusion of recommended elements, including definition of the primary outcome and effect size and precision, following publication of CONSORT for abstracts.9 All studies focused on high-impact journals. Together these results demonstrate that consensus recommendations have an impact, and that practices are improving over time, but that basic information required for evaluating the benefit of interventions may still not be easily available for a substantial percentage of studies.
Our study provides an update on adherence to key practices in high impact general medical and psychiatric journals, with a focus on accessibility of information to the reader. To assess how well published articles reporting results of intervention studies follow best reporting practices and effectively communicate key elements of study design and results, our research team conducted a review of studies published in several major medical and psychiatric journals over a 12-month period using a standardised data collection tool. The inclusion of journals specialising in psychiatry follows our own subspecialty interests and expertise. In addition, it allowed representation of journals focusing on a single discipline and increased the representation of non-pharmacological therapeutic intervention studies, for which gold-standard double-blind randomised designs are often not employed. We expected that, due to both complexities sometimes associated with the design of non-pharmacological intervention studies and differences in reporting expectations for those studies, some key elements would be reported less often in psychiatric journals than in general medical journals.
Increasingly, preregistration requirements10 and publication of study protocols ensure that detailed information on design and results for studies is available to those with time and resources to find it, so our emphasis was not on whether elements of the study design were technically included in the body of the research article. Rather, our emphasis was on whether information on those elements of study design was easily identifiable and interpretable by readers (in the case of our group, readers with a background in research). This paper reports the results of our review and identifies potential sources of confusion for readers based on common points of disagreement within our research team. Finally, based on our findings, we make suggestions for practical modifications to facilitate future communication.
Methods
The senior author of our research team (BC) reviewed the titles of all research articles in three high-impact and widely read medical journals (JAMA, The Lancet and The New England Journal of Medicine), and four high-impact and widely read psychiatric journals (The American Journal of Psychiatry, JAMA Psychiatry, The Journal of Clinical Psychiatry and Lancet Psychiatry), published in the 1-year period between 1 September 2018 and 31 August 2019, to identify those articles reporting on the effectiveness of an intervention. Online prepublications were not reviewed. The journals chosen were peer reviewed, clinically rather than basic science oriented, with a history of publishing a substantial number of papers on treatment interventions, general in their focus within medicine or psychiatry, rather than focused on particular illnesses or conditions in their area, and widely read, with a combination of high impact factors and high number of paper downloads. Interventions included pharmacological treatments, surgical procedures and psychotherapy and psychosocial therapies. Studies reporting only on safety were excluded. Abstracts, and article text as necessary, were reviewed to verify appropriateness for inclusion. Articles were then randomly ordered, and each was assigned to two of five members of our research team: 3 psychiatrists (BC, PH, DO), 1 research coordinator (SB) and 1 biostatistician (CR), all with more than 10 years of experience in psychiatric research.
Each article was reviewed independently by its two assigned team members, and each reviewer completed a standardised checklist developed by the study team. Items on the checklist were based on published recommendations for scientific and statistical reporting,1–3 11 though our checklist was not intended to comprehensively cover or correspond to any all-inclusive set of recommendations. Rather, items included were those the authors considered most key to providing information about the quality of the evidence (eg, randomisation, sample size) and generalisability of clinical research findings (eg, clinical/demographic characteristics of sample, adverse events). A complete copy of the checklist is available as online supplemental file 1. Reviewers were instructed to spend approximately 30–60 min reviewing each article. That length of time was considered adequate to find the descriptions sought, if they were easily discoverable in the manuscript. Responses to the checklist were recorded using REDCap. Articles were reviewed from 9 August 2019 through 15 December 2020.
After completion of all article reviews, responses were tabulated and checked for agreement between reviewers. Cases of disagreement about key characteristics of the articles and interventions, including type of intervention, comparison group, use of randomisation, and designation as a preliminary or pilot study, were resolved by discussion between the first and senior authors (CR and BC). For other items, a statistic or attribute was considered present in the article if either reviewer reported it on the checklist. For items reported conditional on the presence of another item (eg, reporting of a multiple testing correction in the presence of multiple tests of efficacy designated as primary), agreement between reviewers that the conditional item was present was required. All article attributes recorded are categorical and were summarised using frequencies and percentages across all journals as well as for general medical and psychiatric journals separately. Ninety-five per cent CIs were calculated using Wilson’s method. Frequency of reporting was compared between general medical and psychiatric journals using Fisher’s exact test. A Hommel multiple-testing correction was applied post-hoc to account for the total number of statistical tests conducted. Articles with one missing reviewer response were included as instances of disagreement for calculations of agreement and included in frequencies based on the response of the other reviewer. Data analysis was conducted using SAS (V.9.4, SAS Institute) and R (V.4.1.2) software.
Patient and public involvement
Patients and the public were not involved in the design, conduct, reporting or dissemination plans of our research, as this was not appropriate or possible.
Results
One hundred twenty-seven articles were identified for inclusion. Characteristics of the intervention studies are presented in table 1. The most common type of intervention studied both for articles in general medical journals and specialty psychiatric journals was pharmacological treatment (‘drug’), followed by procedures, for articles in general medical journals, and psychotherapy or psychosocial therapy, for articles in psychiatric journals. Almost all studies (94%) were randomised, and few (5%) were designated as a preliminary or pilot study in the abstract.
Results on statistics reported in the abstract are presented in table 2. Almost all abstracts (97%, 95% CI 92% to 99%) reported sample size, and most reported effect sizes (92%, 95% CI 86% to 96%), p values (80%; 95% CI 72% to 86%), and CIs (76%, 95% CI 68% to 83%). CIs were provided in the abstracts of 93% (95% CI 84% to 97%) of medical journal articles but only 58% (95% CI 45% to 69%) of psychiatric journal articles. Reporting on multiple testing correction in the abstract was rare (4%, 95% CI 2% to 9%).
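Comparisons such as this one between journal types were made, per the Methods, with Fisher's exact test. The sketch below is a minimal two-sided implementation in Python using only the standard library. The function name `fisher_exact_two_sided` and the 2x2 counts in the example are hypothetical; the per-journal-type counts behind the percentages above are not reproduced in this text, and the sketch does not apply the Hommel correction the authors used.

```python
import math
from typing import Sequence


def fisher_exact_two_sided(table: Sequence[Sequence[int]]) -> float:
    """Two-sided Fisher's exact test p value for a 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    (a, b), (c, d) = table
    if min(a, b, c, d) < 0:
        raise ValueError("counts must be non-negative")

    row1, row2 = a + b, c + d
    col1 = a + c
    total = math.comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x given fixed margins.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / total

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance so ties are not lost to rounding error.
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


# Hypothetical counts: rows are journal types, columns are
# "item reported" and "item not reported".
example = [[8, 2], [1, 5]]
print(f"p = {fisher_exact_two_sided(example):.4f}")
```

An exact test is a natural choice for this kind of comparison because several cells are small when an item is reported by nearly all articles in one group.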
Results on reporting practices in the body of the articles are presented in table 3. Randomisation method and most elements of study flow (number enrolled, number randomised, number completed) were almost always reported (≥95%), with number screened reported slightly less often (92%, 95% CI 86% to 96%). Sample size was reported in all articles, but the method for determining sample size was reported less consistently, particularly for articles in psychiatric journals (93%, 95% CI 84% to 97% medical; 69%, 95% CI 57% to 80% psychiatric). Almost all articles that reported a method for determining sample size reported a power analysis (98%, 95% CI 92% to 99%). Some characteristics of the sample by intervention group were reported for all studies, with age, sex and illness severity reported for all or almost all studies (≥95%). Race and ethnicity were less consistently reported, with 61% (95% CI 53% to 69%) of all articles and 51% (95% CI 40% to 63%) of medical journal articles reporting either race or ethnicity. Adverse events were reported in 87% (95% CI 81% to 92%) of all articles and 80% (95% CI 68% to 88%) of psychiatric journal articles. Concomitant medications were reported in only about half the studies, in both general medical and psychiatric journals. Key statistics for efficacy results, including effect sizes, p values and CIs for outcomes designated as primary, were reported for most but not all studies. CIs for efficacy results were reported more consistently in medical journals (99%, 95% CI 92% to 100%) than psychiatric journals (81%, 95% CI 70% to 89%). Among 105 articles with a supplement reviewed by both raters, there were four instances when raters agreed demographic information was reported only in the supplement (4%, 95% CI 1% to 9%) and no instances when key statistics were reported only in the supplement.
Though agreement between reviewers was acceptable (>80%) for most items reported, agreement was less than 80% for 12 of the 38 items and less than 60% for four items: clear statement of one hypothesis in the abstract (58%), statement that primary tests were chosen before the study was performed (40%), missing data method specified (56%), and other sample characteristics reported (53%).
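The agreement figures above, and the rules in the Methods for combining two reviewers' responses, can be sketched as follows. This is an illustrative Python reading of those rules, not the authors' code: the function names and the five example rating pairs are hypothetical, and `None` is used here to stand for a missing reviewer response.

```python
from typing import List, Optional, Tuple

# One checklist item for one article: (reviewer 1, reviewer 2).
# True = item reported, False = not reported, None = response missing.
RatingPair = Tuple[Optional[bool], Optional[bool]]


def percent_agreement(pairs: List[RatingPair]) -> float:
    """Percentage of articles on which both reviewers gave the same answer.

    A missing response is counted as a disagreement, as in the Methods.
    """
    if not pairs:
        raise ValueError("at least one rating pair is required")
    agreed = sum(1 for a, b in pairs if a is not None and a == b)
    return 100.0 * agreed / len(pairs)


def present_if_either(pair: RatingPair) -> bool:
    """Rule for most items: present if either reviewer reported it.

    With one response missing, this reduces to the other reviewer's answer.
    """
    a, b = pair
    return a is True or b is True


def present_if_both(pair: RatingPair) -> bool:
    """Rule for conditional items: both reviewers must report it."""
    a, b = pair
    return a is True and b is True


# Hypothetical ratings for one item across five articles.
ratings: List[RatingPair] = [
    (True, True),
    (True, False),
    (False, False),
    (True, None),
    (None, False),
]
print(f"agreement: {percent_agreement(ratings):.0f}%")
print(f"reported (either rule): {sum(map(present_if_either, ratings))} of {len(ratings)}")
print(f"reported (both rule): {sum(map(present_if_both, ratings))} of {len(ratings)}")
```

Note that simple percentage agreement does not adjust for agreement expected by chance, so it tends to look high for items that nearly every article reports.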
Not all results from the survey are included. We intended to report on use of blinding and stratified randomisation, but application of precise definitions of these elements of study design based on the information reported in the articles was not feasible. For many papers, reviewers could not clearly determine and agree on the number of outcomes, comparisons and tests for efficacy reported. Therefore, results that depend on identifying or quantifying the number of specific outcomes, comparisons and tests are excluded. Results on statistical testing for differences in baseline characteristics and use of specific statistics to characterise efficacy are not included due to the need to interpret these results in the context of specific study designs, which varied highly among papers. Answers to open-ended questions were not tabulated except as needed to characterise the type of intervention.
Discussion
Many important elements of study design and results were reported consistently and clearly across studies. Of note, most elements of study flow were identified by our team members for more than 95% of articles, which likely reflects the expected and often journal-required inclusion of CONSORT flow diagrams for intervention studies.1 2 Most surprising to us was the low agreement between our team members for some items, including the number of tests designated as primary or key for demonstrating efficacy and the sample size used for those tests. In discussions among us, it was noted that papers were often inconsistent in reporting specifics on tests and subjects in different places in the text or in reporting what were prior versus subsequently determined hypotheses of interest. Although disagreement for some elements was probably due to differences in interpretation of the survey questions between team members, including differences in leniency in assigning credit for the reporting of some items, we believe disagreement about primary tests of efficacy reflects a true lack of clarity in many papers about which outcome or outcomes and corresponding statistical tests addressed the central research question of the study.
As expected, and consistent with findings of lower adherence to CONSORT recommendations in abstracts of psychiatric trials published in psychiatric specialty journals than in those published in medical journals,9 reporting of some items was less common in psychiatric journals than general medical journals. This observation reflects the need for increased attention to best reporting practices for psychiatric specialty journals and perhaps other high-profile specialty journals. Of note, CONSORT guidelines specific to SPIs were published in 2018.12 Most items reported less frequently in psychiatric journals were covered by the SPI checklist, including statement of a clearly defined primary outcome and inclusion of an associated effect size and precision in the abstract and inclusion of effect sizes and their precision in the body of the paper.
Limitations of the study include reviewer awareness of journal of publication and study authors at time of review and selection of articles from a limited number of journals. Awareness of journals and authorship at the time of review could have introduced personal bias due to expectations about journal quality, familiarity with other scholarship of the authors, or expectations that medical journals would adhere more strictly to reporting expectations than psychiatric journals. Because we selected journals with a high impact factor likely to reach a broad audience, our results are not generalisable to all medical and psychiatric journals. However, to the extent that reporting practices for the journals in our study differ substantially from reporting practices from the broader pool of medical and psychiatric journals, we expect that the included journals have stricter expectations. Finally, items in the checklist used to evaluate the articles do not correspond directly to items included in prior checklists or studies, which limits direct comparison of our results to results from other sources.
We recommend that the results sections of abstracts consistently designate those statistics corresponding to primary tests of efficacy. Simply using wording identifying a primary measure or outcome would suffice for this purpose. In addition, we recommend the standard inclusion of a table specifying the outcome or outcomes designated as primary, the sample size corresponding to the primary test or tests for efficacy reported in the article, and effect sizes, CIs, and p values corresponding to the primary test(s). The table should also designate whether study participants were randomised to the treatment tested and whether participants, treating clinicians, and any raters, or only some but not all of these participants and investigators, were blind to treatment assignment for assessment of any outcome designated as primary. Though there may be some instances in which this format does not correspond to the study design or statistical framework for the analysis, in most cases this format should be straightforward to apply, and some elements are already required by trial registries including clinicaltrials.gov. This suggestion on reporting of design and outcome elements is similar to inclusion of the CONSORT flow diagram, as required by many journals. Separation of this key information from the remainder of the text should simplify its identification for readers and reviewers, and particularly for clinicians who may have limited time to review and scrutinise research findings. Such a table, like the table for CONSORT design and reporting elements, would take up little additional space and greatly assist the reader in finding key information needed to understand a study and evaluate its results.
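To make the recommended table concrete, here is one possible way to represent its contents as a small data structure, written as a Python sketch. The class names, field names, layout and all values in the example are hypothetical choices for illustration; the recommendation itself does not prescribe a format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PrimaryOutcomeRow:
    """One primary test of efficacy, as recommended for the summary table."""
    outcome: str
    n_analysed: int          # sample size for this specific test
    effect_size: float
    effect_measure: str      # e.g. "mean difference", "hazard ratio"
    ci_low: float
    ci_high: float
    p_value: float


@dataclass
class PrimaryResultsTable:
    randomised: bool
    blinded_roles: List[str]  # any of "participants", "clinicians", "raters"
    rows: List[PrimaryOutcomeRow]

    def render(self) -> str:
        """Plain-text rendering: design lines first, then one row per test."""
        lines = [
            f"Randomised: {'yes' if self.randomised else 'no'}",
            "Blinded: " + (", ".join(self.blinded_roles) or "none"),
            "Primary outcome | n | Effect (measure) | 95% CI | p value",
        ]
        for r in self.rows:
            lines.append(
                f"{r.outcome} | {r.n_analysed} | "
                f"{r.effect_size:.2f} ({r.effect_measure}) | "
                f"{r.ci_low:.2f} to {r.ci_high:.2f} | {r.p_value:.3f}"
            )
        return "\n".join(lines)


# Entirely hypothetical study values, for layout only.
table = PrimaryResultsTable(
    randomised=True,
    blinded_roles=["participants", "raters"],
    rows=[
        PrimaryOutcomeRow(
            outcome="Symptom score change at week 8",
            n_analysed=120,
            effect_size=-2.5,
            effect_measure="mean difference",
            ci_low=-4.1,
            ci_high=-0.9,
            p_value=0.003,
        )
    ],
)
print(table.render())
```

Keeping one row per primary test makes the number of primary tests explicit, which was the element on which reviewers in this study most often disagreed.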
The consistency of reporting of most key statistics, including age and sex for each treatment group, is encouraging. However, information on race and ethnicity should be expected, along with age and sex, to encourage recruitment of representative samples in intervention studies and to inform readers on the generalisability of findings.13 14 CIs and methods for sample size determination should be reported with more consistency, particularly for psychiatric studies, where such reporting was too often lacking. Adverse events should be tracked and a clear statement on the occurrence of such events should be included for all studies, even if no events were observed.
Other additions that were often missing but would be helpful within the body of the research article include (1) an assessment by the authors of whether the assumptions underlying the sample size determination for the study were met, (2) direct statements about the application of multiple testing corrections for studies with more than one primary test of efficacy, (3) a statement on the treatment of missing data, even if these statements would not be required for those familiar with the statistical methods implemented, and (4) a listing of concomitant medications in each studied group. These additions, and the suggested table, would not require a great deal of space but could substantially improve clarity.
In essence, we are suggesting that authors and journals should prioritise making the most fundamental aspects of each study easily accessible to the majority of readers. Some simple formatting and reporting changes, as noted above, would accomplish that purpose.
Ethics statements
Patient consent for publication
Ethics approval
Not applicable.
References
Footnotes
Contributors All authors were involved in the design of the study, reviewed articles, recorded results on the checklist and have made significant contributions. CR (guarantor) contributed to the design of the survey, performed statistical analyses, and drafted and edited the manuscript. SMB contributed to the design, drafting and conduct of the survey, organised the results and edited the manuscript. DO contributed to the drafting of survey and interpretation of results and edited the manuscript. PQH contributed to the drafting of the survey and edited the manuscript. BMC conceived the research question, chose the articles surveyed, contributed to the design and drafting of survey, interpreted results, and edited the manuscript. All authors have approved the final version of the manuscript.
Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests Drs Ravichandran, Cohen and Öngür and Ms Babb report no competing interests. Dr Harris is a consultant Medical Director for Aetna Behavioral Health. Dr Öngür has grant support from NIH/NIMH: K24MH104449.
Provenance and peer review Not commissioned; externally peer reviewed.
Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.