Strengths and limitations of this study
Analysis of real life peer review comments on submitted manuscripts in relation to sponsorship, direction of results and decision about acceptance.
Inclusion of manuscripts submitted to a general medical journal and specialty journals across different medical specialties.
Comprehensiveness of the classification checklist assessed in two training sessions. Reviewer comments for 20% of included manuscripts scored by two raters, with a good level of inter-rater agreement.
Focus on reviewer comments made during initial peer review of articles; additional comments may have been raised during reviews of revised versions.
Reviewer comments may not inherently provide an objective reflection of the quality of articles, as there is evidence in the literature on the defects of peer review.
At peer-reviewed medical journals, submitted articles are sent out for external peer review if they are considered to be potentially suitable for publication. During peer review, manuscripts are scrutinised by independent experts or peers in the same field, to assist editors in their decision-making and to help improve the quality of submitted articles.1 ,2 The peer review process has been investigated to a limited extent. Some studies addressed the effects of blinding and training of reviewers, the detection rate of deliberately introduced errors by reviewers, and the impact of peer review on the quality of published articles.3–8
Few studies have systematically analysed the content of reviewer comments on submitted articles. Bordage9 studied the reasons given by reviewers for rejection of manuscripts submitted for publication in conference proceedings on research in medical education. Inappropriate statistics and overinterpretation of results were commonly reported.9 Turcotte et al10 analysed reviewer comments for manuscripts submitted to the Canadian Journal of Anesthesia, and found that lack of originality, inadequate experimental design and inappropriate conclusions were the main determinants of an article's fate. Hopewell et al7 focused on reviewer comments on the reporting of methodological items in randomised trials submitted to open peer review journals. The type of changes requested by reviewers included addition or clarification of randomisation, blinding, and sample size, and toning down of conclusions to reflect the results.7
The content of reviewer comments may be related to the direction of results and sponsorship. Emerson et al11 compared reviewer reports for a fabricated manuscript reporting positive results and an otherwise identical manuscript reporting no effect. Reviewers detected more errors in the no-difference version and awarded higher scores to the methodology section of the positive manuscript, although the methods sections were identical.11 Emerson et al11 showed that the positive article was more often recommended for publication than the no-difference version, although this observation could not be confirmed by others.12 None of the previous studies on peer review compared reviewer comments according to whether reported trials were sponsored by pharmaceutical companies or non-profit organisations.
Analysis of reviewer comments provides more insight into the shortcomings of drug trials that are submitted to medical journals, both from the perspective of the design and conduct of trials and the reporting quality in articles. It would be interesting to determine whether the occurrence of specific shortcomings in manuscripts is affected by sponsorship and results being either positive or negative. In the current study, we performed a descriptive content analysis of peer review comments made on manuscripts on drug trials submitted to eight medical journals to investigate the relationship between the content of comments and sponsorship, direction of results, and decision about acceptance, using a previously reported cohort.13
Journal and manuscript selection
We included manuscripts submitted from January 2010 through April 2012 to one general medical journal (BMJ) and seven specialty journals (Annals of the Rheumatic Diseases, British Journal of Ophthalmology, Gut, Heart, Thorax (all from the BMJ Group), Diabetologia, and Journal of Hepatology). We selected randomised controlled trials, in which at least one study arm assessed the efficacy or safety of a drug and a statistical test was used to evaluate treatment effects. This cohort has been described in detail previously.13 This study was limited to manuscripts that were sent out for external peer review.
Data extraction from manuscripts
For each manuscript, the decision about acceptance and reviewer reports were extracted from submission systems or provided by journals. Manuscripts were either rejected after review or accepted for publication. We determined the number of reviewers that evaluated the first submitted version of each article. Manuscripts could be evaluated during multiple rounds of review before a final decision was made, but we focused on comments that were made during initial peer review of articles. Reviews of revised manuscripts were excluded. Information on sponsorship and the direction of results was previously extracted from manuscripts and classified according to predefined criteria.14 Reviewers were aware of the sponsorship of trials. In short, trials were classified as non-industry, industry-supported or industry-sponsored trials. For non-industry trials, no associations with pharmaceutical companies were reported. Studies reporting donation of study medication by a manufacturer, studies stating receipt of financial support from a pharmaceutical company, and studies with industry-affiliated authors were classified as industry-supported trials. For industry-sponsored trials, a pharmaceutical company was explicitly described as study sponsor, or the company funding the trial participated in the design, data collection, analysis and/or preparation of the manuscript. Trial results were scored as positive if results reported for the primary end point were statistically significant (p<0.05 or 95% CI for difference excluding 0 or 95% CI for ratio excluding 1) and supported the efficacy of the test drug, and negative if they did not. Results of non-inferiority trials were classified as positive if treatments were equivalent. Safety trials were classified as positive if the test drug was as safe as or safer than control.
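The scoring rule for the direction of results can be written as a short sketch. The function and parameter names below are hypothetical and are not taken from the study; the sketch covers only the significance criteria stated above for the primary end point, and does not cover the separate rules for non-inferiority and safety trials.

```python
from typing import Optional, Tuple


def is_significant(p_value: Optional[float] = None,
                   ci: Optional[Tuple[float, float]] = None,
                   ci_kind: str = "difference") -> bool:
    """Return True if any of the stated significance criteria is met.

    ci_kind is "difference" (null value 0) or "ratio" (null value 1).
    """
    if ci_kind not in ("difference", "ratio"):
        raise ValueError("ci_kind must be 'difference' or 'ratio'")
    if p_value is not None and p_value < 0.05:
        return True
    if ci is not None:
        null_value = 0.0 if ci_kind == "difference" else 1.0
        lower, upper = ci
        # Significant when the 95% CI excludes the null value.
        return not (lower <= null_value <= upper)
    return False


def classify_result(supports_test_drug: bool, **criteria) -> str:
    """Score a trial as 'positive' only if the primary end point is both
    statistically significant and in favour of the test drug."""
    if supports_test_drug and is_significant(**criteria):
        return "positive"
    return "negative"


# Hypothetical examples, not data from the cohort.
print(classify_result(True, p_value=0.03))
print(classify_result(True, ci=(0.8, 1.3), ci_kind="ratio"))
```

In this sketch a significant result that does not favour the test drug is scored as negative, which follows the "and supported the efficacy of the test drug" condition in the text.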
Classification of reviewer comments
A validated instrument for classification of reviewer comments does not exist. Included journals did not provide standardised forms to reviewers, but general guidance for peer review was available on each journal's website. Based on this guidance15–17 and previous research on peer review,9 ,10 ,18–20 a classification checklist for negative reviewer comments was composed. In two consecutive training sessions, reviewer comments for 10 randomly selected manuscripts were independently classified by two raters (MvL and HJO) in each session, to assess the consistency between raters and check the comprehensiveness of the checklist. After each training session, disparities in the interpretation of comments were discussed and the checklist was revised accordingly. The final version of the checklist (see online supplementary table S1) was then tested on reviewer comments for a random sample of 30 manuscripts that were independently classified by the two raters. Assuming an inter-rater agreement of at least 80% for each type of comment with the final checklist, 30 manuscripts were sufficient to estimate the agreement with a precision (SE) of at most 7%. If the inter-rater agreement during this test was considered sufficiently high, a single rater (MvL) could continue with the rating process. After the classification of reviewer comments for these 30 manuscripts, we calculated the percentage of agreement between raters for each type of comment in the checklist. κ statistics were not considered suitable as some types of comments were rarely scored and resulting κ values would be inaccurate.21 For these 30 manuscripts, classification discrepancies were resolved by consensus between raters if the agreement for a comment was <85%. For the other types of comments, the score assigned by the rater who subsequently classified all comments for the other manuscripts (MvL) was decisive.
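The stated precision can be checked with the usual standard error of a proportion, SE = sqrt(p(1 − p)/n). The sketch below assumes this binomial formula is what underlies the "at most 7%" figure, which the text does not state explicitly; the function names are illustrative only.

```python
import math


def percent_agreement(rater_a, rater_b):
    """Proportion of manuscripts on which two raters gave the same score
    (comment vs no comment) for one type of comment."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must score the same non-empty set")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def agreement_se(p, n):
    """Binomial standard error of an agreement proportion p estimated
    from n manuscripts."""
    return math.sqrt(p * (1 - p) / n)


# With 80% agreement and 30 manuscripts the SE is about 7%.
print(f"{agreement_se(0.80, 30):.3f}")  # prints 0.073
```

Because p(1 − p) decreases as p rises above 0.5, any agreement higher than 80% gives a smaller SE for the same 30 manuscripts, so roughly 7% is the upper bound under the stated assumption.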
Overall, reviewer comments for 50 manuscripts were scored by two raters in this study, which was equivalent to 20% (50 of 246) of the total number of included manuscripts. For each manuscript, a reviewer could have several remarks related to one type of comment. However, each type of comment was scored maximally once per reviewer.
Descriptive statistics were used to describe included manuscripts (data presented as frequencies and percentages). The relationship between each type of comment and sponsorship, direction of results and decision about acceptance was analysed using a generalised linear mixed model based on generalised estimating equations (GEE) with a binary distribution for the dependent variable and an identity link. In this model, the comment score of a reviewer (comment vs no comment) was used as the dependent variable. We included sponsor type, results or decision about acceptance as a fixed variable in the model, and, if possible (that is, if the model converged), journal, to control for the journal to which a manuscript was submitted. The unique identification number that manuscripts received from a journal was included as a cluster variable (random effect). Most manuscripts were reviewed by several reviewers. The model estimates the percentage of reviewers that will comment on a manuscript, depending on sponsor type, results or decision about acceptance (‘mean percentage of comments on a manuscript’). If the lower limit of the resulting CI was negative, it was truncated to 0. The number of different types of comments per manuscript was compared by sponsorship, results or decision about acceptance using univariate analysis of variance. We controlled for the number of reviewers per manuscript by including this as a covariate in the model. Two-sided p<0.05 was considered statistically significant. p Values were not adjusted for multiple comparisons. Statistical analyses were performed using SPSS software (V.20) and SAS for Windows (V.9.2, SAS Institute Inc).
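The data layout behind this model, and the quantity it estimates, can be illustrated with a small descriptive sketch. This is not the GEE fit reported in the study: it ignores clustering and the journal adjustment, and simply computes the raw percentage of reviewer reports containing a given comment per group. Under the assumption of an independence working correlation with the grouping factor as the only predictor, the identity-link point estimates would coincide with these raw percentages; the working correlation actually used is not stated in the text. All names and records below are hypothetical.

```python
from collections import defaultdict


def comment_percentage_by_group(records):
    """records: iterable of (manuscript_id, group, commented) tuples,
    one tuple per reviewer report. Returns {group: percentage of
    reviewer reports that contained the comment}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for _manuscript_id, group, commented in records:
        totals[group] += 1
        hits[group] += int(bool(commented))
    return {group: 100.0 * hits[group] / totals[group] for group in totals}


def truncate_lower_limit(lower, upper):
    """Truncate a negative lower confidence limit to 0, as described
    for the identity-link model."""
    return (max(0.0, lower), upper)


# Hypothetical reviewer-level records: manuscript id, sponsor type, comment score.
records = [
    ("m1", "non-industry", 1),
    ("m1", "non-industry", 0),
    ("m2", "industry-sponsored", 0),
    ("m2", "industry-sponsored", 0),
    ("m3", "non-industry", 1),
]
print(comment_percentage_by_group(records))
print(truncate_lower_limit(-1.2, 5.0))
```

The manuscript identifier is carried in each record because it is the cluster variable in the actual model; a full analysis would use it to account for correlation between reviewers of the same manuscript.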
To assure confidentiality of manuscripts and reviewer reports, confidentiality agreements were signed by the authors before gaining access to the data. As standard editorial and peer review processes were unchanged, authors and reviewers were not informed about this study. Research ethics committee (REC) approval was not required as this study involved no human participants.
From January 2010 through April 2012, 472 manuscripts on drug RCTs were submitted to eight journals, of which 250 articles (53.0%) were externally reviewed. For 246 manuscripts, reviewer comments for authors were available. Of these 246, 96 (39.0%) were accepted for publication (table 1). Eighty-nine (36.2%) were non-industry trials, while 78 (31.7%) were industry-supported and 79 (32.1%) were industry-sponsored trials. Most articles reported positive results (N=150, 61.0%). The number of reviewers for the first submitted version of an article ranged from 1 to 5. In total, 575 reviewer reports were evaluated.
Overall, the level of inter-rater agreement for the final version of the classification checklist was good. For all types of comments, the agreement between raters was close to or higher than 80%. For 20 of 26 items, there was >85% agreement (see online supplementary table S2).
Overall, the types of comments that were most frequently reported by reviewers included poor experimental design (range of point estimates in tables 2, 3 and 4: 50.5–69.7%), inadequately reported methods (50.8–60.5%), incomplete study outcome data (58.7–68.2%), inadequate discussion of the meaning of results (44.2–56.1%), poor writing (34.7–42.8%) and inaccurate tables or figures (35.1–44.1%). In table 2, the mean percentage of comments on a manuscript is compared by sponsor type. For several types of comments, there was a relation between sponsorship and the mean percentage of comments. The percentage of comments regarding a lack of novelty was significantly associated with sponsorship (p=0.038); industry-sponsored trials were more likely to receive this comment (8.9%) than industry-supported (2.5%) and non-industry trials (6.1%). The percentage of comments regarding poor experimental design was also associated with sponsorship (p=0.019); non-industry trials more often received this comment (69.7%) than industry-supported (58.8%) and industry-sponsored trials (52.9%). Furthermore, the percentage of comments about inappropriate statistical analysis methods was associated with sponsorship (p=0.006); non-industry trials were more likely to receive this comment (28.4%) than industry-supported (23.5%) and industry-sponsored trials (15.1%). The percentage of comments regarding the article title not being representative of the study was also associated with sponsorship (p=0.012); industry-supported trials more often received this comment (8.1%) than non-industry (5.0%) and industry-sponsored trials (1.5%).
In table 3, the mean percentage of comments on a manuscript is compared by the direction of trial results. For most types of comments, there was no significant difference according to whether manuscripts reported positive or negative results. However, the percentage of comments regarding inappropriate conclusions was higher for articles with negative trial results (29.3%) than for articles with positive results (18.9%, p=0.010).
Table 4 shows the mean percentage of comments on a manuscript according to the decision about acceptance. The percentage of comments about the research question not being clinically relevant was higher among rejected manuscripts (7.8%) than accepted manuscripts (1.6%, p=0.002). Rejected manuscripts were more likely to receive comments regarding a lack of novelty (8.3%) than accepted manuscripts (2.6%, p=0.008). In addition, the percentage of comments about poor experimental design was higher for rejected manuscripts (68.6%) than for those that were accepted (50.5%, p<0.001). Reviewers more often reported that a study was insufficiently related to the literature among manuscripts that were accepted (18.6%) compared to those that were rejected (10.5%, p=0.041).
In table 5, the number of different types of comments per manuscript is shown, which was adjusted for the number of reviewers per manuscript. Overall, reviewers reported a mean number of 7.8 different types of comments per manuscript (range, 1–15 types of comments). The number of types of comments per manuscript was not associated with the direction of results or the decision about acceptance of manuscripts. There was a significant relation between sponsorship and the number of different types of comments per manuscript (p=0.035); non-industry trials on average received more types of comments per manuscript (8.2) than industry-sponsored trials (7.2).
This is the first study in which real life peer review comments made on submitted manuscripts were compared according to sponsorship, direction of results and decision about acceptance. Previous studies have been limited to experiments with fictitious manuscripts.11 ,12 The most frequently reported comments by reviewers included poor experimental design, inadequately reported methods, incomplete study outcome data, inadequate discussion of the meaning of results, poor writing, and inaccurate tables or figures, which is in line with findings of previous studies.7 ,9 ,10 ,18 Reviewers rarely reported on ethics, trial registration, or conflicts of interest, as was expected from prior research.10 ,19 ,20
Submitted manuscripts on industry-sponsored trials more often received comments regarding a lack of novelty than industry-supported and non-industry trials. However, we found no significant difference according to sponsor type for comments on the clinical relevance of research questions. It has been argued in the literature that studies by pharmaceutical companies may be less innovative than non-industry studies. Drug companies may more often focus on late-stage drug development and producing variations of drugs already on the market, while academia may be more likely to perform creative, early-stage clinical research.22 ,23 Interestingly, industry-supported trials were least often criticised by reviewers for lack of novelty. This may suggest that collaboration between academia and the pharmaceutical industry could lead to more innovative clinical studies.
Non-industry trials were more likely to receive comments regarding poor experimental design and inappropriate statistical analysis methods than industry-supported and industry-sponsored trials. In addition, non-industry trials received significantly more different types of comments per manuscript than industry-sponsored trials. Prior research based on published articles showed that the methodological quality of trials funded by pharmaceutical companies was equal to or tended to be higher than that of non-industry trials.24–27 Previously, we studied the shortcomings of protocols of drug trials that were submitted for approval to RECs.28 Based on the comments raised during REC review, we found that non-industry trials more often had shortcomings regarding methodology and statistical analyses than industry-sponsored trials,28 which is in line with findings of the current study.
Manuscripts with negative results were more likely to receive comments regarding overinterpretation or inappropriate conclusions in relation to results than manuscripts with positive results. The number of types of comments per manuscript was not associated with the direction of results. Evidence of inconsistencies between results and the interpretation of findings has previously been shown for published articles, especially among those with negative results.29 ,30 Authors may shape the impression of results in articles, that is, to add ‘spin’ to reports. Spin includes the use of specific reporting strategies to highlight that the experimental treatment is effective, despite non-significant results for the primary outcome, or to distract readers from non-significant results. This distorts the interpretation of results and misleads readers.30 ,31
Rejected manuscripts more often received comments on the research question not being clinically relevant, lack of novelty and poor experimental design than accepted papers. However, the number of types of comments per manuscript was not associated with the decision about acceptance. Although we found significant differences between comments for articles that were eventually rejected or accepted, there are many reasons why papers can be rejected beyond what is in reviewer reports for the initial submitted version of manuscripts. Moreover, editorial processes and the weight given to reviewer comments when making publication decisions can vary considerably across journals. Papers that reviewers are positive about may be rejected, while others are published despite negative reviewer comments. As manuscript review by journals is a complicated and multistage process, it is difficult to determine the exact influence of reviewer comments in editorial decision-making.
This study is strengthened by the inclusion of manuscripts submitted to a general medical journal and specialty journals across different medical specialties. The studies by Bordage9 and Turcotte et al10 were limited to articles on research in medical education or anaesthesia, which reduced the generalisability of their findings. Hopewell et al included open peer review journals where reviewer comments are included alongside published articles. Reviewers may more often provide rather uncritical comments when reviewing for such journals, as they may fear reprisals for criticising other researchers’ work openly.7 ,32 In this study, we assessed the comprehensiveness of the classification checklist in two training sessions. Reviewer comments for 20% of the included manuscripts were scored by two raters and the level of inter-rater agreement was good. In previous studies, the classification of reviewer comments was completely conducted by a single author.9 ,10
This study has some limitations. We focused on peer review comments for the first submitted version of articles. Some journals may send revised versions to new reviewers or back to the same reviewers. By focusing on reviews of initial versions, new comments raised during reviews of revisions may have been missed. However, initial reviewer reports often contain the most extensive comments and provide adequate information to compare reviewer comments according to sponsorship, direction of results and decision about acceptance. We have not assessed whether shortcomings that were detected by reviewers were corrected in revised manuscripts. Hopewell et al7 found that most authors complied with requests by reviewers in their revised version, but this was beyond the scope of this study. In addition, we included a sample of manuscripts describing drug RCTs and our results may therefore not be generalisable to other study designs or RCTs with other interventions.
Although peer review is generally assumed to raise the quality of submitted papers and to provide a mechanism for rational and fair editorial decision-making,33 reviewer comments may not automatically provide an objective reflection of the quality of articles. While the evidence on the effectiveness of peer review is limited,2 ,6 ,33 there is considerable evidence on its defects.2 ,34 In studies where major errors were inserted into papers that were subsequently sent to reviewers, none of the reviewers spotted all of the errors.4 ,35 In addition, it has been suggested that peer review is a subjective and, therefore, inconsistent process.34 Agreement between reviewers in their recommendations for manuscripts may be low.36 Nevertheless, peer review is seen by researchers as important and essential for scientific communication and as the best alternative currently available.34 ,37
In conclusion, peer reviewers identified fewer shortcomings regarding design and statistical analyses in industry-related trials, but commented more often on a lack of novelty in industry-sponsored trials. Negative trial results did not significantly influence the nature of comments other than the appropriateness of the conclusions. Manuscript acceptance was primarily related to the research question and methodological robustness of the study. As some of the manuscripts’ shortcomings represent fundamental methodological weaknesses, better training on trial design and analysis may be appropriate, especially for non-industry trials. Other errors are simply omissions, including frequently reported shortcomings such as inadequate reporting of methods and incomplete reporting of study outcome data. These fixable errors can be avoided if authors pay more attention to reporting quality in manuscripts.
The authors thank BMJ, Annals of the Rheumatic Diseases, British Journal of Ophthalmology, Gut, Heart, Thorax, Diabetologia, and Journal of Hepatology for participating in this study. The authors thank Sara Schroter (BMJ) for her suggestions regarding the methodology of this study and for commenting on the content of this manuscript. The authors are grateful to Gerard Rongen (Radboud university medical center) for reviewing this manuscript.
This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.
- Data supplement 1 - Online tables