Original research
Cross-sectional study of preprints and final journal publications from COVID-19 studies: discrepancies in results reporting and spin in interpretation
  1. Lisa Bero1,
  2. Rosa Lawrence2,
  3. Louis Leslie2,
  4. Kellia Chiu3,
  5. Sally McDonald3,
  6. Matthew J Page4,
  7. Quinn Grundy5,
  8. Lisa Parker6,
  9. Stephanie Boughton7,
  10. Jamie J Kirkham8,
  11. Robin Featherstone7
  1. 1 General Internal Medicine/Public Health/Center for Bioethics and Humanities, University of Colorado—Anschutz Medical Campus, Denver, Colorado, USA
  2. 2 Center for Bioethics and Humanities, University of Colorado - Anschutz Medical Center, Denver, Colorado, USA
  3. 3 Charles Perkins Centre and School of Pharmacy, Faculty of Medicine and Health, The University of Sydney, Sydney, New South Wales, Australia
  4. 4 School of Public Health and Preventive Medicine, Monash University, Melbourne, Victoria, Australia
  5. 5 Faculty of Nursing, University of Toronto, Toronto, Ontario, Canada
  6. 6 Charles Perkins Centre, The University of Sydney, Sydney, New South Wales, Australia
  7. 7 Editorial and Methods Department, Cochrane, London, UK
  8. 8 Biostatistics, Manchester University, Manchester, UK
  1. Correspondence to Dr Lisa Bero; LISA.BERO{at}CUANSCHUTZ.EDU

Abstract

Objective To compare results reporting and the presence of spin in COVID-19 study preprints with their finalised journal publications.

Design Cross-sectional study.

Setting International medical literature.

Participants Preprints and final journal publications of 67 interventional and observational studies of COVID-19 treatment or prevention from the Cochrane COVID-19 Study Register published between 1 March 2020 and 30 October 2020.

Main outcome measures Study characteristics and discrepancies in (1) results reporting (number of outcomes, outcome descriptor, measure, metric, assessment time point, data reported, reported statistical significance of result, type of statistical analysis, subgroup analyses (if any), whether outcome was identified as primary or secondary) and (2) spin (reporting practices that distort the interpretation of results so they are viewed more favourably).

Results Of 67 included studies, 23 (34%) had no discrepancies in results reporting between preprints and journal publications. Fifteen (22%) studies had at least one outcome that was included in the journal publication, but not the preprint; eight (12%) had at least one outcome that was reported in the preprint only. For outcomes that were reported in both preprints and journals, common discrepancies were differences in numerical values and statistical significance, additional statistical tests and subgroup analyses and longer follow-up times for outcome assessment in journal publications.

At least one instance of spin occurred in both preprints and journals in 23/67 (34%) studies, the preprint only in 5 (7%), and the journal publications only in 2 (3%). Spin was removed between the preprint and journal publication in 5/67 (7%) studies, but added in 1/67 (1%) study.

Conclusions The COVID-19 preprints and their subsequent journal publications were largely similar in reporting of study characteristics, outcomes and spin. All COVID-19 studies published as preprints and journal publications should be critically evaluated for discrepancies and spin.

  • ethics (see Medical Ethics)
  • public health
  • qualitative research

Data availability statement

Data are available in a public, open access repository. Data from this study are available in OSF project file (https://osf.io/5ru8w/).

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

Strengths and limitations of this study

  • We examine two critical threats to research integrity—components of outcome reporting and the presence of spin—in COVID-19 studies on treatment or prevention published as preprints and journal publications.

  • We selected studies from the Cochrane COVID-19 Register rather than conducting a literature search to optimise the identification of COVID-19 clinical research that is useful for systematic reviews.

  • We may have identified a different number of discrepancies if we compared later versions of the preprint, rather than the first version, with the journal publication.

  • Although COVID-19 research is clinically important, our sample may not be representative of other types of research published as preprints and then as journal publications.

  • We limited our sample to preprints that authors submitted to journals and that were subsequently published.

Introduction

Preprints have been advocated as a means for rapid sharing and updating of research findings, which could be particularly valuable during a pandemic.1 Preprints are non-peer-reviewed postings of research articles. Preprints have been a common form of publication in the natural sciences for decades, and more recently in the life sciences. In 2019, BMJ, Yale and Cold Spring Harbor Laboratory launched medRxiv, a preprint server dedicated to clinical and health sciences research.

In April 2020, medRxiv published between 50 and 100 COVID-19-related preprints daily.1 The accelerated pace of research related to COVID-19 has increased the potential impact and risk of using preprints. Widespread public dissemination of preprints may spread misinformation.2 A study comparing 34 preprints and 62 publications about therapies for COVID-19 found that publications had significantly more citations than the preprints (median of 22 vs 5.5 citations; p = 0.01), but there were no significant differences for attention and online engagement metrics.3

Most preprint servers conduct some type of screening prior to posting, commonly related to the scope of the article, plagiarism, and compliance with legal and ethical requirements,4 but preprints have not been peer reviewed and may not meet the methodological and reporting requirements of a journal. A review of the medRxiv preprint server 1 year after its launch found that 9967 of 11 164 (89%) submissions passed screening.5 It is not clear whether or how preprint servers might screen for quality of results reporting or spin.6 7 Spin refers to specific reporting practices that distort the interpretation of results so that results are viewed more favourably.

Preliminary studies suggest that reporting discrepancies may exist between preprints and subsequent publications. However, there has been no systematic assessment of results reporting or spin between preprints and their final journal publications. Carneiro et al counted reported items from a checklist meant to cover common points from multiple reporting guidelines and found reporting quality to be marginally higher in journal articles, both in a set of bioRxiv preprints matched to their journal publication (n=56 articles/group) and in an unmatched set (n=76 articles/group).8 An analysis of preprints from arXiv, a primarily physics/mathematics preprint server, and their journal publications using text comparison algorithms found little difference between preprints and published articles.9 However, an analysis of medRxiv and bioRxiv preprints related to COVID-19 pharmacological interventions found that only 24% (23/97) of preprints were published in a journal within 0–98 days (median: 42.0 days). Among these, almost half (11/23, 48%) had modifications in the title or results section, although the nature of these modifications is not described.10 An analysis of spin in preprints and journal publications for COVID-19 trials found a single difference between two matched pairs of preprints and their journal publications: the discussion of limitations in the abstract. Limitations were discussed in the abstract of one article, but not in its accompanying preprint.11 An analysis of 66 preprint–article pairs of COVID-19 studies found 38% had changes in study results, such as a numeric change in HR or a change in p value, and 29% had changes in abstract conclusions, most commonly from ‘positive without reporting uncertainty’ in the preprint to ‘positive with reporting of uncertainty’ in the article.12

The trustworthiness and validity of scientific publications, even after peer review, are weakened by a variety of problems.13 14 Selective and incomplete results reporting15 16 and spin17 18 are two critical threats, especially for clinical studies of treatment or prevention. These reporting practices could be particularly dangerous for users of COVID-19 research as they can inflate the efficacy of interventions and underestimate harms. Given the high prevalence, visibility, and potentially rapid implementation of COVID-19 research published as preprints, this study is the first to compare components of outcome reporting and the presence of spin in COVID-19 studies on treatment or prevention that are published both as preprints and journal publications.

Methods

The protocol for this study was registered in the Open Science Framework.19

Data source and search strategy

We sampled studies from the Cochrane COVID-19 Study Register (https://COVID-19.cochrane.org/), a freely available, continually updated, annotated reference collection of human primary studies on COVID-19, including interventional, observational, diagnostic, prognostic, epidemiological and qualitative designs. The register is ‘study based’, meaning references to the same study (eg, press releases, trial registry records, preprints, journal preproofs, journal final publications, retraction notices) are all linked to a single study identifier. References are screened for eligibility to determine if they are primary studies (eg, not opinion pieces or narrative reviews). Data sources for the Cochrane COVID-19 Study Register at the time of the search included ClinicalTrials.gov, the International Clinical Trials Registry Platform, PubMed, medRxiv and Embase.com. The Cochrane register prioritises medRxiv as a preprint source because an internal sensitivity analysis in May 2020 showed that 90% (166/185) of the preprints that were eligible for systematic reviews came from this source. The register also includes preprint records sourced from PubMed.

All studies in the register are classified by study design (interventional, observational, modelling, qualitative, other or unclear) and research aim (prevention, treatment and management, diagnostic/prognostic, epidemiology, health services research, mechanism, transmission, other). Studies may be classified as having multiple research aims. Four searches using the register’s search filters for study reference types (preprints and journal articles) and study characteristics (study type and study aim) were used to retrieve references with a study aim of (a) treatment and management or (b) prevention and classified as interventional or observational (see the Open Science Framework (OSF) project for the complete search strategies: https://osf.io/8qfby/). As the register is updated daily, we repeated the search: the register was first searched by RF on 13 October 2020, and the search was updated on 29 October 2020. The results were exported to Excel and duplicates were identified manually. The searches identified 297 references for 117 studies, with 67 (21 interventional, 46 observational) that met our inclusion and exclusion criteria for study selection (figure 1).

Figure 1

Flowchart of study inclusion.

Inclusion and exclusion criteria for study selection

We included studies of COVID-19 treatment or prevention identified in the search that had both a posted preprint and final journal publication.

We included studies with aims of diagnosis/prognosis, epidemiology, health services research, mechanism, transmission and other if they also had an aim coded as (a) treatment and management or (b) prevention. We excluded modelling studies, qualitative studies and studies that reported only descriptive data (eg, demographic characteristics). We screened all records for each included study to identify posted preprints and journal publications from each study. We excluded duplicates and records for protocols, trial registries, commentaries, letters to the editor, news articles and press releases. We excluded records that did not report results and non-English records.

We compared the preprint and journal publication for each included study. In the case of multiple preprints or journal publications reporting study results, we selected the first preprint version and the final journal publication that reported on similar study populations. This was to ensure that the preprint version evaluated in our study had not been altered in response to any comments, which could constitute a form of peer review, and that it was representative of the version most likely to be seen by clinicians, journalists and other research users as new research became available.

Data extraction

Ten investigators (LB, SLB, KC, QG, JJK, LL, RL, SMc, LP and MJP) working independently in pairs extracted data from the included studies. Discrepancies in data extraction were resolved by consensus. If agreement could not be reached, an investigator who was not part of the coding pair resolved the discrepancies. All extracted data from the included studies were stored in REDCap, a secure web-based application for the collection and management of data.20 We extracted data from both the medRxiv page and PDF for preprints and the online publication or PDF for journal articles, referring to the PDF if information differed. We extracted data on results reporting, presence of spin and study characteristics as described below.

Study characteristics

For each preprint, we recorded the earliest posting date; for each journal publication we extracted the submitted/received, reviewed, revised, accepted and published date(s), where available.

From each journal publication, we extracted: authors, title, funding source, author conflicts of interests, ethics approval, country of study and sample size. For the accompanying preprint, we determined if these study characteristics were also reported. If they were, and the content of the item differed between the preprint and publication, details of the discrepancy were recorded. In addition, we recorded discrepancies between the preprint and journal publication in demographic characteristics of study participants (eg, sex, race/ethnicity, diagnosis), discussion of limitations (regardless of whether there was a labelled limitations section or not), and tables and figures.

Primary outcomes

Our primary outcome measures were (1) discrepancies in results reporting between preprints and journal publications and (2) presence and type of spin in preprints and journal publications.

Results reporting

We collected data on discrepancies in (1) number of outcomes reported in preprints and journal publications and, for outcomes reported in both preprints and journal publications, (2) components of results reporting. For each journal publication and preprint, we recorded the number of outcomes reported, whether outcomes were reported only in the preprint or journal publication, and the outcome descriptor (eg, mortality, hospitalisation, transmission, immunogenicity, harms).
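As a minimal sketch of the first comparison (the number of outcomes reported), the outcome descriptors extracted from a preprint and its journal publication can be treated as two sets. The function name and the example descriptors below are hypothetical illustrations, not the coding tool used in this study, and the sketch assumes that descriptors have already been harmonised so that the same outcome carries the same label in both documents.

```python
def compare_outcome_sets(preprint_outcomes, journal_outcomes):
    """Split outcome descriptors into shared, preprint-only and journal-only groups."""
    preprint = set(preprint_outcomes)
    journal = set(journal_outcomes)
    return {
        "both": sorted(preprint & journal),           # reported in both documents
        "preprint_only": sorted(preprint - journal),  # dropped before journal publication
        "journal_only": sorted(journal - preprint),   # added in the journal publication
    }


# Hypothetical example: these descriptors are invented for illustration only.
example = compare_outcome_sets(
    ["mortality", "hospitalisation"],
    ["mortality", "hospitalisation", "harms"],
)
print(example)
```

Only the outcomes in the "both" group would then go on to the component-level comparison.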

For outcomes that were reported in both preprints and journal publications, we collected data on components of outcome reporting based on recommendations for clinical study results reporting.16 21 We recorded whether there were discrepancies between any components of outcome reporting between journal publications and preprints. We extracted the text relevant to each discrepancy:

  • Measure (eg, PCR test).

  • Metric (eg, mean change from baseline, proportion of people).

  • Time point at which the assessment was made (eg, 1 week after starting treatment).

  • Numerical values reported (eg, effect estimate and measure of precision).

  • Statistical significance of result (as reported).

  • Type of statistical analysis (eg, regression, χ2 test).

  • Subgroup analyses (if any).

  • Whether outcome was identified as primary or secondary.
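The component-level comparison listed above could be represented roughly as follows. This is an illustrative sketch only: the class, field names and example values are assumptions made for this example and do not reproduce the REDCap form used for extraction. It also compares free-text values for exact equality, whereas in the study discrepancies were judged by paired human coders.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class OutcomeReport:
    """One outcome as reported in a single document (hypothetical field names)."""
    measure: Optional[str] = None
    metric: Optional[str] = None
    time_point: Optional[str] = None
    numerical_values: Optional[str] = None
    statistical_significance: Optional[str] = None
    analysis_type: Optional[str] = None
    subgroup_analyses: Optional[str] = None
    outcome_level: Optional[str] = None  # "primary" or "secondary"


def discrepant_components(preprint, journal):
    """Return the names of the components whose values differ between the two reports."""
    return [
        f.name
        for f in fields(OutcomeReport)
        if getattr(preprint, f.name) != getattr(journal, f.name)
    ]


# Invented values for illustration: longer follow-up and updated estimate in the journal.
preprint_report = OutcomeReport(measure="PCR test", time_point="day 14",
                                numerical_values="OR 0.80 (0.60 to 1.07)")
journal_report = OutcomeReport(measure="PCR test", time_point="day 28",
                               numerical_values="OR 0.75 (0.58 to 0.97)")
differences = discrepant_components(preprint_report, journal_report)
print(differences)
```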

Spin

Studies have used a variety of methods to measure spin in randomised controlled trials and observational studies.17 Based on our previously developed typology of spin derived from a systematic review of spin studies,17 we developed and pretested a coding tool for spin that can be applied to both interventional and observational studies of treatment or prevention. In the context of research on treatment or prevention of COVID-19, the most meaningful consequences of spin are overinterpretation of efficacy and underestimation of harms. Therefore, our tool emphasises these manifestations of spin. We searched the abstracts and full text of each preprint and journal publication for three primary categories of spin, and accompanying subcategories:

  1. Inappropriate interpretation given study design.

    • Claiming causality in non-randomised studies.

    • Interpreting a lack of statistical significance as equivalence.

    • Interpreting a lack of statistical significance of harm measures as safety.

    • Claim of any significant difference despite lack of statistical test.

    • Other.

  2. Inappropriate extrapolations or recommendations.

    • Suggestion that the intervention or exposure is more clinically relevant or useful than is justified given the study design.

    • Recommendation made to population groups/contexts outside of those investigated.

    • (Observational) Expressing confidence in an intervention or exposure without suggesting the need for further confirmatory studies.

    • Other.

  3. Selectively focusing on positive results or more favourable data presentation.

    • Discussing only significant (non-primary) results to distract from non-significant primary results.

    • Omitting non-significant results from abstract/discussion/conclusion.

    • Claiming significant effects for non-significant results.

    • Acknowledging statistically non-significant results from the primary outcome but emphasising the beneficial effect of treatment.

    • Describing non-significant results as ‘trending towards significance’.

    • Mentioning adverse effects in the abstract/discussion/conclusion but minimising their potential effect or importance.

    • Misleading description of study design as one that is more robust.

    • Use of linguistic spin.

    • Other.

Analysis

We report the frequency and types of discrepancies in study characteristics and results reporting between preprints and journal publications. We report the proportion of preprints and journal publications with spin and the types of spin. We iteratively analysed the text descriptions of discrepancies identified; we grouped descriptions into common categories, while still accounting for all instances of discrepant reporting, even if they only occurred once, to demonstrate the range of the phenomenon.

To determine whether preprints that were posted after an article had likely received peer review influenced the number of discrepancies, we conducted a post hoc sensitivity analysis by removing seven studies where the preprint was posted up to 7 days before the revision, acceptance or publication dates of the journal publication.
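The exclusion rule for this sensitivity analysis can be sketched as a simple date filter. The code below is an illustration under stated assumptions: it reads "up to 7 days before" as an interval of 0 to 7 days inclusive, treats missing journal dates as uninformative, and uses invented study records and a hypothetical function name.

```python
from datetime import date, timedelta

WINDOW = timedelta(days=7)  # assumed inclusive window


def posted_close_to_review(preprint_posted, journal_dates):
    """True if the preprint was posted 0 to 7 days before any available journal date."""
    return any(
        timedelta(0) <= journal_date - preprint_posted <= WINDOW
        for journal_date in journal_dates
        if journal_date is not None  # revision or acceptance dates may be unavailable
    )


# Invented records: journal_dates holds revision, acceptance and publication dates.
studies = [
    {"id": "A", "posted": date(2020, 4, 1),
     "journal_dates": [date(2020, 4, 5), date(2020, 4, 20), date(2020, 5, 1)]},
    {"id": "B", "posted": date(2020, 4, 1),
     "journal_dates": [None, None, date(2020, 6, 1)]},
]
retained = [
    s["id"] for s in studies
    if not posted_close_to_review(s["posted"], s["journal_dates"])
]
print(retained)
```

In this invented example, study A is removed because its preprint was posted 4 days before the revision date, and study B is retained.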

The OSF project linked to our protocol (https://osf.io/5ru8w/) provides our protocol modifications, list of included preprints and journal publications, data dictionary, and dataset.

Patient and public involvement

No patient involvement.

Results

Study characteristics

Of the 67 included studies, 57 were studies of treatment and management, 9 of prevention and 1 of both. The preprints and journal publications were published between 1 March 2020 and 30 October 2020 with a mean time between preprint and journal publication of 65.4 days (range 0–271 days). The topics of the studies varied and included effects of clinical and public health interventions, associations of risk factors with COVID-19 symptoms, and ways to improve implementation of public health measures, such as social distancing. Almost a third of studies (21/67, 31%) were conducted in the USA, followed by Italy and Spain (n=6, 9% each), and China (n=5, 7%). The majority of studies reported public or non-profit funding sources (n=32, 49%) or that no funding was provided (n=24, 36%). Over half the studies also reported that the authors had no conflicts of interest (n=37, 53%).

Discrepancies in study characteristics

Table 1 shows discrepancies in study characteristics reported in preprints and journal publications. The table shows whether each study characteristic was reported or not; if a study characteristic was reported in both the preprint and journal publications, discrepancies in content are described. More preprints than journal publications reported funding source, author conflicts of interest and ethics approval; more journal publications than preprints reported participant demographics and study limitations. In all categories, most discrepancies occurred in the content of items that were reported, rather than in whether the item was present or not. For example, journal publications contained additional information on funding sources, conflicts of interest, demographic characteristics and limitations, as well as more tables and figures compared to preprints (table 1).

Table 1

Discrepancies in study characteristics (n=67 studies)

Results reporting

Of the 67 studies, 23 (34%) had no discrepancies in the number of outcomes reported between preprints and journal publications (table 2). Twenty-three studies had outcomes that were missing from either the preprint or the journal publication. Overall, 15 (22%) studies had at least one outcome that was included in the journal publication, but not the preprint; 8 (12%) had at least one outcome that was reported in the preprint only. The included studies had multiple outcomes. The majority of studies with missing reported outcomes (16/23, 70%) had one outcome missing from either the preprint or journal publication. However, two studies had five outcomes missing from the journal publication, but reported in the preprint only.22–25 As described in table 2, these omissions included important clinical or harm outcomes. For example, one preprint omitted toxicity outcomes that were reported in the journal publication.26 27

Table 2

Discrepancies in number of outcomes reported (n=67 studies)

Table 3 shows the types of discrepancies in components of results reporting. We report the number of studies that had at least one discrepancy and, because studies have multiple outcomes, the number of discrepancies across all outcomes in the 67 studies. The most frequent types of discrepancies between outcomes reported in both preprints and journal publications were in the numerical values reported, statistical tests performed, subgroup analyses conducted, statistical significance reported and timepoint at which the outcome was assessed (table 3). The types of discrepancies were variable, although journal publications more commonly included additional statistical analyses and subgroup analyses compared with preprints. Journal publications more frequently reported outcomes measured over a longer time period than preprints.

Table 3

Discrepancies in components of results reporting for outcomes reported in both preprints and journal publications (N=67 studies; 258 outcomes)

Spin

At least one instance of spin occurred in the preprint, journal publication, or both in 30 (45%) of the 67 studies. Spin occurred in both preprints and journal publications in 23/67 (34%) studies, the preprint only in 5 (7%) studies, and the journal publications only in 2 (3%) studies (table 4). Spin, in any category, was removed between the preprint and journal publication in 5/67 (7%) studies, but added between the preprint and journal publication in 1 (1%) study.
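The arithmetic behind these proportions can be checked with a short tally. The sketch below rebuilds per-study flags from the counts reported above. The count of 37 studies with no spin is derived here as 67 minus 30 and is not stated in the text, and the function and variable names are hypothetical.

```python
from collections import Counter


def spin_pattern(in_preprint, in_journal):
    """Classify a study by where at least one instance of spin was found."""
    if in_preprint and in_journal:
        return "both"
    if in_preprint:
        return "preprint_only"
    if in_journal:
        return "journal_only"
    return "neither"


# (spin in preprint, spin in journal) flags rebuilt from the reported counts.
flags = (
    [(True, True)] * 23      # spin in both documents
    + [(True, False)] * 5    # spin in the preprint only
    + [(False, True)] * 2    # spin in the journal publication only
    + [(False, False)] * 37  # derived: 67 - 30 studies with no spin
)

counts = Counter(spin_pattern(p, j) for p, j in flags)
total = len(flags)
any_spin = total - counts["neither"]
percentages = {pattern: round(100 * n / total) for pattern, n in counts.items()}

print(any_spin, round(100 * any_spin / total))
print(percentages)
```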

Table 4

Categories of spin in preprints and journal publications (n=67 studies)

Table 4 shows the categories of spin that occurred in preprints and their accompanying journal publications. Overall, 13 of 67 (19%) studies had changes in the type of spin present in the preprint versus the journal publication; 8 (12%) studies had at least one additional type of spin present in the preprint, and 2 (3%) studies had at least one additional type of spin present in the journal publication. Inappropriate extrapolation or recommendations was the most frequently occurring type of spin in both preprints and journal publications (11/67, 16% of studies). This type of spin and inappropriate interpretation given the study design occurred more frequently in preprints than in journal publications.

An example of inappropriate interpretation was found in both the preprint and journal publication for an open-label non-randomised trial: the study investigated the effect of hydroxychloroquine (and in combination with azithromycin) on SARS-CoV-2 viral load. They found a statistically significant viral load reduction at day 6; however, despite the small sample size and non-randomised study design, they concluded that their findings were ‘so significant’ and recommended that ‘patients with COVID-19 be treated with hydroxychloroquine and azithromycin to cure their infection and to limit the transmission of the virus to other people in order to curb the spread of COVID-19 in the world’.28 29 An example of inappropriate extrapolation or recommendations that occurred in both the preprint and journal publication is a study that recommended specific policy approaches that were not tested in the study: ‘The UK will shortly enter a new phase of the pandemic, in which extensive testing, contact tracing and isolation will be required to keep the spread of COVID-19. For this to succeed, adherence must be improved’.30 31 This observational study aimed to identify factors associated with individuals’ adherence to self-isolation and lockdown measures; the authors did not aim to investigate public adherence to testing recommendations or contact tracing, nor test their efficacy.

Sensitivity analysis

The mean time between preprint posting and journal article publication was 65.4 days (range 0–271) (online supplemental table S1). No preprints were posted after the revision, acceptance or publication dates for the accompanying journal publication. One preprint was posted on the same date as the publication date. Discrepancies in study characteristics, outcome reporting and spin changed minimally when the analyses were conducted after removing seven studies where the preprint was posted up to 7 days before the revision, acceptance or publication dates of the journal publication (online supplemental tables S2–S4).

Discussion

Principal findings

Discrepancies between results reporting in preprints and their accompanying journal publications were frequent, but most often consisted of differences in content rather than a complete lack of reporting. Although infrequent, some outcomes that were not reported would have provided information that is critical for clinical decision-making, such as clinical or harm outcomes that appeared only in the journal publication. The finding that outcomes reported in journal publications were measured over a longer time frame than outcomes reported in preprints indicates that the preprints were being used to publish preliminary or interim data. Preliminary or interim findings should be clearly labelled in preprints.

Although almost half of the preprints and journal publications contained spin, there was no clear difference in the types of spin. Spin is an enduring problem in the medical literature.17 Our findings suggest that the identification and prevention of spin during journal peer review and editorial processes needs further improvement.

More preprints reported funding source, author conflicts of interest and ethics approval than journal publications. These differences may be due to the screening requirements of medRxiv, the main source of preprints in our sample. When reported in both, journal publications included more detailed information on funding source, conflicts of interest of authors, and demographics of the population studied. Journal publications also included more tables and figures, and more extensive discussion of limitations. Some of these differences may be due to more comprehensive reporting requirements of journals. Other changes, such as more information on the study population or greater discussion of limitations, may be due to requests for additional information during peer review.

Since preprints are posted without peer review and most journal publications in our sample were likely to be peer reviewed because they were identified from PubMed, our study indirectly investigates the impact of peer review on research articles. Articles may not have been peer reviewed in similar ways. Authors may have made changes in their papers that were independent of peer review. We observed instances where peer review appeared to improve clarity (eg, more detail on measurements)32 33 or interpretation (eg, requirement to present risk differences rather than just n (%) per treatment group).34 35 Empirical evidence on the impact of peer review on manuscript quality is scarce. A study comparing submitted and published manuscripts found that the number of changes was relatively small and, similar to our study, primarily involved adding or clarifying information.13 Some of the changes requested by peer reviewers were classified as having a negative impact on reporting, such as the addition of post hoc subgroup analyses, statistical analyses that were not prespecified or optimistic conclusions that did not reflect the trial results. In our sample, additions of subgroup and statistical analyses were common between preprints and journal publications, although we did not determine their appropriateness.

A small proportion of medRxiv preprints, 14% at the end of the server’s first year, were published as journal publications.5 Therefore, our sample could be limited to studies that their authors deemed of high enough quality to be eligible for submission to a journal. Alternatively, our sample could be limited to articles that had not been rejected by a journal. It is possible that peer review was eliminating publications that were fundamentally unsound, while more quickly processing studies that were sound and useful. Under pandemic conditions, articles may undergo fewer revisions. For example, peer reviewers may not suggest changes they think are less important, or editors may accept articles when they would have normally requested minor or major revisions. Thus, in this situation, peer review may mainly be playing the role of determining whether a study should be published in a journal or not.

There were minimal changes in the frequency and types of discrepancies between preprints and journal publications when we conducted a sensitivity analysis limiting our sample to studies where the preprints were published before the revision or acceptance date of the journal publication. This suggests that our findings are robust even when the sample is limited to preprints that likely had not gone through the peer review process. Given this finding and the observed similarities between preprints and their subsequent journal publications, our results suggest that peer review during the accelerated pace of COVID-19 research publication may not have provided much added value. The urgency related to dissemination of COVID-19 research could have led journals to fast-track publication by abbreviating editorial or peer review processes, resulting in fewer differences between preprints and journal publications.

Comparison to other studies

Our results are consistent with other studies finding small changes in reporting between preprints and journal publications. A number of these studies have been limited by failing to assess the addition or deletion of outcomes and by the use of composite ‘scores’ that included items related to risk of bias and reporting. In contrast to our study, in a matched sample of preprints and journal publications, Carneiro et al found that journal publications were more likely than preprints to have a conflict of interest statement. In a textual analysis using five different algorithms, Klein et al found very little difference in text between preprints and articles in a large matched sample.9 We also noted preprints and journal publications that were almost identical, or had very minor differences such as corrections of typos. Other studies are limited by comparing unmatched samples of preprints and articles. In a comparison of 13 preprints and 16 articles on COVID-19 that were not reporting on the same studies, Kataoka et al found no significant differences in risk of bias or spin in titles and conclusions.11

The changes in numerical results that we found were similar to those reported by Oikonomidi et al, who compared 66 preprint–article pairs for COVID-19 studies and found that 25 (38%) had changes.12 Oikonomidi et al classified 16 of these changes as ‘important’ based on (1) an increase or decrease by ≥10% of the initial value in any effect estimate and/or (2) a change in the p value crossing the threshold of 0.05, for any study outcome. We did not classify changes based on magnitude or threshold p values because changes in numerical values may be related to other components of outcome reporting that we observed, such as changes to follow-up times or the use of different statistical tests. Furthermore, deviations from a p value of 0.05 do not necessarily indicate changes in scientific or clinical significance. We examined changes in multiple components of outcome reporting that are considered essential, not just the numerical value of the outcome.16 21 The diversity of studies included in our sample would make any categorisations of scientific or clinical significance difficult and subjective. For example, studies were observational and experimental and not all studies conducted statistical analysis. The topics of the studies included tests of clinical and public health interventions, associations of risk factors with COVID-19 symptoms and ways to improve implementation of public health measures, such as social distancing.
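
To make the two criteria attributed to Oikonomidi et al concrete, the following is a minimal Python sketch of how such a rule might be applied to a single outcome. It is an illustration only, not code from either study, and this classification was not applied in our analysis. The function name, parameter names, the handling of an initial estimate of zero (the relative-change criterion is skipped) and the treatment of a p value of exactly 0.05 as non-significant are all assumptions introduced here.

```python
from typing import Optional


def is_important_change(
    preprint_estimate: float,
    journal_estimate: float,
    preprint_p: Optional[float] = None,
    journal_p: Optional[float] = None,
    rel_threshold: float = 0.10,  # criterion 1: change of >=10% of the initial value
    alpha: float = 0.05,          # criterion 2: p value threshold
) -> bool:
    """Hypothetical helper: flag a change as 'important' if either criterion is met."""
    # Criterion 1: increase or decrease by >=10% of the initial (preprint) value.
    # Assumption: when the initial estimate is 0 the relative change is
    # undefined, so this criterion is skipped rather than guessed.
    if preprint_estimate != 0:
        rel_change = abs(journal_estimate - preprint_estimate) / abs(preprint_estimate)
        if rel_change >= rel_threshold:
            return True

    # Criterion 2: the p value moves from one side of the threshold to the other.
    # Assumption: p < alpha is treated as 'significant', so exactly 0.05 is not.
    if preprint_p is not None and journal_p is not None:
        if (preprint_p < alpha) != (journal_p < alpha):
            return True

    return False


# Illustrative values only, not data from any study.
print(is_important_change(1.50, 1.80))              # estimate changed by 20%
print(is_important_change(1.50, 1.52, 0.04, 0.06))  # p value crosses 0.05
print(is_important_change(1.50, 1.52, 0.01, 0.02))  # neither criterion met
```

A rule of this kind looks only at the numerical value and p value of an outcome. It cannot tell whether a change came from a different follow-up time or statistical test, which is one reason we did not classify changes this way.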

Strengths and limitations of this study

First, we selected studies from the Cochrane COVID-19 Study Register rather than conducting a literature search. However, as the Cochrane COVID-19 Study Register has been optimised to identify COVID-19 clinical research for systematic reviews, we believe the search was comprehensive for identifying COVID-19 studies related to treatment or prevention that are most likely to have an impact on clinical practice or health policy. As a study-based register, all records related to a study are identified, enabling us to obtain all preprint and journal publication versions for a single study. Second, we compared the first version of the preprint with the final journal publication. We may have identified a different number of discrepancies if we had compared later versions of the preprint with the journal publication. Third, although clinically important, our focus on COVID-19 research may not be representative of other types of research published as preprints and then as journal publications. This study should be replicated in a sample of non-COVID-related interventional and observational clinical studies. Future research could also include assessment of outcome reporting components and spin in preprints that have not been published in journals. Fourth, although we compared non-peer-reviewed preprints to their accompanying journal publications, we did not directly assess the effects of peer review. Finally, coders were not blinded to the source or authors of preprints and journal publications, as this was not feasible and there is no evidence that it would alter the decisions made.

Conclusions

The COVID-19 preprints and their subsequent journal publications were largely similar in reporting of study characteristics, outcomes and spin in interpretation. However, given the urgent need for valid and reliable research on COVID-19 treatment and prevention, even a few important discrepancies could impact decision-making. All COVID-19 studies, whether published as preprints or journal publications, should be critically evaluated for discrepancies in outcome reporting or spin, such as failure to report data on harms or overly optimistic conclusions.

Data availability statement

Data are available in a public, open access repository. Data from this study are available in an OSF project file (https://osf.io/5ru8w/).

Ethics statements

Ethics approval

This study analyses publicly available information and is exempt from ethics review.

Footnotes

  • Twitter @QuinnGrundy

  • Contributors LB conceived the project, drafted the protocol, acquired data, conducted analysis, interpreted data and drafted the paper. RL edited the protocol, acquired data, conducted analysis, interpreted data and revised the paper. LL edited the protocol, acquired data, conducted analysis, interpreted data and revised the paper. KC edited the protocol, acquired data, conducted analysis, interpreted data and revised the paper. SM edited the protocol, acquired data, conducted analysis, interpreted data and revised the paper. MJP edited the protocol, acquired data, conducted analysis, interpreted data and revised the paper. QG edited the protocol, acquired data, conducted analysis, interpreted data and revised the paper. LP edited the protocol, acquired data, conducted analysis, interpreted data and revised the paper. SB edited the protocol, acquired data, conducted analysis, interpreted data and revised the paper. JJK edited the protocol, acquired data, conducted analysis, interpreted data and revised the paper. RF edited the protocol, conducted the search, conducted analysis, interpreted data and revised the paper. All authors (LB, RL, LL, KC, SM, MJP, QG, LP, SB, JJK and RF) have approved the final manuscript. LB served as guarantor for all aspects of the work.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests RF is a Cochrane employee and part of the development team for the Cochrane COVID-19 Study Register. No other authors declare any other relationships or activities that could appear to have influenced the submitted work.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Author note Data access: LB had full access to all the data in the study and took responsibility for the integrity of the data and the accuracy of the data analysis.
