Strengths and limitations of this study
This study combines quantitative and qualitative methods to investigate how systematic reviewers deal with research malpractice and misconduct.
It uses a clear and reproducible sampling of systematic reviews from four major medical journals and the Cochrane Library.
The extracted data were confirmed by 70% of the authors of the systematic reviews analysed.
The systematic reviewers were not asked to confirm all their data, but only the information considered ambiguous.
There is currently no common definition of ‘research misconduct’. This may have led to an underestimation of its real prevalence.
Research misconduct can have devastating consequences for public health1 and patient care.2,3 While a common definition of research misconduct is still lacking, there is an urgent need to develop strategies to prevent it.4,5 Fifteen years ago, Smith6 proposed ‘a preliminary taxonomy of research misconduct’ describing 15 practices ranging from ‘minor’ to ‘major’ misconduct. Some of these practices, however, are very common and may not be regarded by all as ‘misconduct’. In this study, therefore, ‘malpractice’ denotes relatively common and minor misconduct, while the term ‘misconduct’ is reserved for data fabrication, falsification, plagiarism or any other intentional malpractice.
It has been shown that some of the 15 malpractices described by Smith threaten the conclusions of systematic reviews. Examples of such malpractices include: failure to publish completed research,7,8 duplicate publication,9 selective reporting of outcomes or adverse effects,10 and presenting biased results that favour the sponsors’11 or the authors’12 interests. The impact of other malpractices, such as gift or ghost authorship, is less clear.
Rigorous systematic review methodology includes specific procedures that can counter-balance some of these malpractices. Unpublished studies may, for example, be identified through exhaustive literature searches,13 and statistical tests or graphical displays such as funnel plots can quantify the risk of publication bias.14 Unreported outcomes may be unearthed by contacting the authors of original articles, and multiple publications based on the same cohort of patients can be identified and excluded from analyses.15 Authors of systematic reviews (hereafter: systematic reviewers or reviewers) can also use sensitivity analyses to quantify the impact of sponsors’ and authors’ personal interests on the conclusions of a review. Finally, it has been suggested that, as part of the process of systematic reviewing, the ethical approval of included studies or trials should be checked in order to identify unethical research.16,17 Systematic reviewers could hence act as whistle-blowers when reporting any suspected misconduct.18
The aim of this study is to examine whether systematic reviewers apply the aforementioned procedures, and whether they uncover and report on cases of misconduct.
The study first examines whether reviewers searched for unpublished studies or tested for publication bias, contacted authors to unearth unreported outcomes, searched for duplicate publications, analysed the impact of sponsors or possible conflicts of interest of study authors, checked the ethical approval of the studies and reported on misconduct. The secondary objective was to examine whether four major journals and the Cochrane Library reported consistently on the issue.
The reporting of this cross-sectional study follows the STROBE recommendation.19 The protocol is available from the authors.
We conducted a cross-sectional analysis of systematic reviews published in 2013 in four general medical journals (Annals of Internal Medicine (Ann Int Med), The BMJ (BMJ), JAMA and The Lancet (Lancet)). A random sample of new reviews was drawn from the Cochrane Database of Systematic Reviews (Cochrane Library) in 2013, as Cochrane reviews are considered the gold standard in terms of systematic reviewing.
Setting and selection of systematic reviews
Systematic reviews were identified through a PubMed search in August 2014, using the syntax ‘systematic review [Title] AND journal title [Journal], limit 01.01.2013 to 31.12.2013’. A computer-generated random sequence was used to select 25 reviews published in 2013 in the Cochrane Library (http://www.cochranelibrary.com/cochrane-database-of-systematic-reviews/2013-table-of-contents.html).
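The paper does not describe the random selection step in code; under the assumption that a seeded pseudo-random generator was used, it could be sketched as follows (the review IDs below are placeholders, not the actual 2013 Cochrane table of contents):

```python
import random

# Hypothetical pool of review IDs standing in for the 2013 Cochrane Library
# table of contents (the real list came from the Cochrane website).
review_pool = [f"CD{i:06d}" for i in range(1, 401)]

# A fixed seed makes the computer-generated random sequence reproducible.
rng = random.Random(2013)
sampled_reviews = rng.sample(review_pool, k=25)  # 25 distinct reviews
```

Sampling without replacement (as `random.sample` does) guarantees that no review is selected twice.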
Reviews were selected by one of the five authors (NE) on the basis of the review titles and abstracts. This was checked by another author (AC). To be eligible, reviews had to describe a literature search strategy and include at least one trial or study. Narrative reviews or meta-analyses without an exhaustive literature search were not considered.
From each systematic review, we extracted the following information: the first author's name and country of affiliation; the number of co-authors; the name of the journal; the title of the review; the number of databases searched; the number of studies and study designs included; the language limitations applied; whether or not a protocol was registered and freely accessible; and, finally, sources of funding and possible conflicts of interest of the reviewers.
Furthermore, we examined whether each of the selected reviews applied the following six procedures: (1) searching for unpublished trials; (2) contacting authors to identify unreported outcomes; (3) searching for duplicate publications (defined as a redundant republication of an already published study, with or without a cross-reference to the original article); (4) analysing the impact of the sponsors of the original studies on the conclusions of the review; (5) analysing the impact of possible conflicts of interest of the authors on the conclusions of the review; and (6) extracting information on ethical approval of included studies. We used the following rating system: 0=procedure not applied, 1=partially applied, 2=fully applied (table 1).
Finally, we collected information on whether the systematic reviewers suspected, and explicitly reported on, any misconduct in the included articles.
Data from the reviews were extracted by one author (NE), and copied into a specifically designed spreadsheet. Two of the co-authors checked the data (AC and DMP). We contacted all the corresponding authors of the reviews and asked them to confirm our interpretation of their methods of review. This included their method regarding the search for unpublished trials, their contacts with the authors, their search for duplicate publications and their identification of misconduct. When there was discrepancy between our interpretation and the reviewers’ answers, we used the latter. This was done by email, and a reminder was sent to those who had not replied within 2 weeks.
The capacity of systematic reviewers to identify misconduct is unknown. Our hypothesis was that 5% of systematic reviewers would identify misconduct. Therefore, we needed a minimum of 110 systematic reviews to allow us to detect a prevalence of 5%, if it existed, with a margin of error of 4% assuming an α-error of 0.05.
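This sample size reasoning follows the standard normal-approximation formula for estimating a proportion. The sketch below (an assumption about the authors' method, not a description of it) yields a figure close to, though not identical to, the reported minimum of 110; the small difference presumably reflects rounding choices in the original calculation:

```python
import math

def min_sample_size(p, margin, z=1.96):
    """Minimum n to estimate a proportion p with the given margin of
    error at ~95% confidence (normal approximation, z = 1.96)."""
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

# Expected prevalence 5%, margin of error 4%, two-sided alpha = 0.05.
n = min_sample_size(p=0.05, margin=0.04)
```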
Descriptive results are reported as numbers (proportions) and medians (IQR) as appropriate. To check whether systematic reviews differed from one journal to another, we performed all descriptive analyses separately by journal. χ2 or Kruskal–Wallis tests were applied to test the null hypothesis of homogeneous distribution of characteristics and outcomes. We compared reviews from reviewers who answered our inquiry with reviews from those who did not, and across journals. Since Cochrane reviews were expected to differ from those published in the journals, we performed separate analyses with and without Cochrane reviews. We did not expect missing data. Statistical significance was defined as an α-error of 0.05 or less in two-sided tests. Analyses were performed using Stata V.13.
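As an illustration of the homogeneity test, the Pearson χ2 statistic for a journal-by-outcome contingency table can be computed as below. The counts are made up for illustration; the paper's analyses were run in standard statistical software, which would also supply the degrees of freedom and p-value:

```python
def chi2_statistic(table):
    """Pearson chi-squared statistic for a contingency table
    (rows = journals, columns = outcome categories)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under the null hypothesis of homogeneity.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts: two journals x (procedure applied / not applied).
homogeneous = [[10, 10], [20, 20]]   # identical row proportions -> statistic 0
heterogeneous = [[30, 0], [0, 30]]   # opposite row proportions -> large statistic
```

A statistic of zero means the journals have identical outcome proportions; larger values indicate departure from homogeneity.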
Selection of reviews
We identified 136 references; 18 were excluded for various reasons, leaving 118 systematic reviews (Ann Int Med 39, A1–A39; BMJ 38, B1–B38; JAMA 12, J1–J12; Lancet 10, L1–L10; Cochrane Library 19, C1–C19) (figure 1, online supplementary appendix table 1A).
Characteristics of the reviews
The characteristics of the reviews are described in table 2, online supplementary appendix tables 1A and 2A. Approximately 75% of the first authors were affiliated to an English-speaking institution. The protocols of all the Cochrane reviews were registered and available. However, protocols were available for only 17 reviews from the journals.
Sources of funding were declared in 110 reviews. Among these 110, 24 declared that they had no funding at all. All the reviews declared presence or absence of conflicts of interest of the reviewers.
The median number of databases searched was four. Additional references were identified through previously published systematic reviews and by contacting experts and/or authors of the original studies. Forty-two (36%) reviews only considered the English literature, and 6 (5%) reviews searched Medline only. Four (3%) reviews searched for English articles in Medline only.A(35,39),J(5,9) The median number of articles included per review was 28. Half of the systematic reviews included a mix of various study designs, while 39% included only RCTs (table 2).
Contact with reviewers
Out of the 118 reviews, we were able to contact 111 corresponding authors. No valid email address was available for seven. Eighty reviewers (72%) responded to our inquiries.
Among the 80 reviewers who responded, 8 (10%) provided information that changed our data extraction regarding the endpoint ‘search for unpublished trials or test for publication bias’. One reviewer declared that, contrary to our assumption, unpublished trials had not been searched in their review,B28 and seven claimed that unpublished trials had been searched although this was not reported in the published reviews.A(21,25,30,31),B(2,17,27)
Eleven reviewers (14%) provided information that changed our data extraction regarding the endpoint ‘contact with authors of original studies’. One declared that authors had not been contacted,B35 and 10 claimed that authors had been contacted although this was not reported in the published review.A(12,25,31,34),B(20,25,34),J(9,12),L1
Twenty-six reviewers (32%) provided information that changed our data extraction regarding the endpoint ‘duplicate publication’. Terms used included ‘duplicate’, ‘companion article’, ‘multiple publications’, ‘articles with overlapping datasets’, or ‘trials with identical patient population’. Three reviewers declared that, contrary to our assumption, they had not identified duplicate publications,A(25,26),J3 and 23 claimed having searched for duplicates although this was not reported in the published review.A(16,17,21,24,27,30,31,33),B(9,10,14,20,25,27),C(6,10,11,14,16,18),J6,L(6,9)
Five reviewers (6%) told us about suspected cases of misconduct that were not reported in the published review.
Characteristics of the reviews did not differ depending on whether or not we were able to contact their authors (table 2).
The median number of procedures applied in each review was 2.5 (IQR, 1–3). Eleven reviews (9%) applied no procedures at all, while no review applied all six procedures.
Search for unpublished trials and test for publication bias
Fifty-six reviewers (47%) either searched for unpublished trials or applied a statistical test to identify publication bias. Twenty-three reviewers (19%) did both. Unpublished studies were sought in trial registries (eg, ClinicalTrials.gov or the FDA database), or by contacting experts and manufacturers. The number of unpublished studies included in these reviews was inconsistently reported, and contacting the reviewers did not help us clarify this issue. Seven reviews (6%) only discussed the risk of publication bias, and 32 reviews (27%) did not mention it at all (table 3).
Contact with authors to unearth unreported outcomes
Seventy-three reviewers (62%) had contacted the authors of the original studies. Fifty-eight reviewers (49%) had searched for unreported results from the original articles. The reviews rarely reported on the number of authors contacted and the response rate. We were not able to clarify this issue in our email exchange with the reviewers (table 3).
Duplicate publications
Duplicate publications were sought in 81 reviews (69%). Twenty-two reviewers confirmed that duplicates were not sought, and 15 did not answer our enquiry. The number of duplicates identified was rarely mentioned, and we failed to clarify this issue in our exchange with the reviewers. Ten reviews (8.5%) published the reference of at least one identified duplicate.A(1,10,18,38),C(1,4,6,12),J(2,9)
Impact of sponsors
Twenty-seven reviews (23%) reported on the sources of funding for the studies. Six reviewers (5%) analysed the impact of sponsors on the results of the review.B(4,15,26,32),J12,L8 One reviewer claimed that sponsor bias was unlikely,L8 while three were unable to identify any sponsor bias.B(4,15),J12 Finally, two reviews identified sponsor bias (see online supplementary appendix table 3A).B26,32
Conflicts of interest of authors
Five reviewers reported on conflicts of interest of the authors of the studies.A(8,14),B20,C(8,14) None of them used this information to perform subgroup analyses. One review mentioned conflicts of interest as a possible explanation for their (biased) findings.A8 In three reviews, the affiliations of the authors were summarised and potential conflicts of interest clearly identified.B20,C(8,14) Finally, the appendix table of one of the reviews showed that one study might have suffered from a ‘significant conflict of interest’.A4
Ethical approval of the studies
Three reviews looked at whether or not ethical approval had been sought (see online supplementary appendix table 3A).B(27,37),C10 Two reviews explicitly reported that all included studies had received ethical approval.B(27,37) The third review reported extensively on which studies had or had not provided any information on ethical approval or patient consent.C10
Outcomes did not differ according to whether or not we were able to contact the reviewers (table 3).
Suspicion of misconduct
Two reviewers suspected research misconduct in the articles included in their review and reported it accordingly.B31,J12 Contacting the other reviewers allowed us to uncover five additional cases of possible misconduct. Four agreed to be cited here,A26,B33,C16,L1 while one preferred to remain anonymous.
Data falsification was suspected in three reviews.A26,C16,J12 One review, looking at the association of hydroxyethyl starch administration with mortality or acute kidney injury of critically ill patients,J12 included seven articles co-authored by Joachim Boldt. However, a survey performed in 2010, and focusing on Boldt's research published between 1999 and 2010, had led to the retraction of 80 of his articles due to data fabrication and lack of ethical approval.20 The seven articles co-authored by Boldt were kept in the review as they had been published before 1999. Nonetheless, the reviewers performed sensitivity analyses excluding these seven articles, and showed a significant increase in the risk of mortality and acute kidney injury with hydroxyethyl starch solutions that was not apparent in Boldt's articles.
The second review examined different techniques of sperm selection for assisted reproduction. The reviewers suspected data manipulation in one study since its authors reported non-significant differences between the number of oocytes retrieved and embryos transferred while the p-value, when recalculated by the reviewers, was statistically significant.C16
The third review (on management strategies for asymptomatic carotid stenosis) reported that misconduct had been suspected based on the ‘differences in data between the published SAPPHIRE trial and the re-analysed data posted on the FDA website’.A26 Although this information was not provided in the published review, it was available in the full report online (http://www.ahrq.gov/research/findings/ta/carotidstenosis/carotidstenosis.pdf).
Intentional selective reporting of outcomes was suspected in two reviews.B31 In one review that examined early interventions to prevent psychosis, the reviewers identified, discussed and referenced three articles that did not report on all the outcomes.B31 In the second review, the corresponding reviewer (who preferred to remain anonymous) revealed that he ‘knew of two situations in which authors knew what the results showed if they used standard categories (of outcome) but did not publish them because there was no relationship, or not the one they had hoped to find.’
Plagiarism was identified in one review examining the epidemiology of Alzheimer's disease and other forms of dementia in China.L1 According to the reviewers, they had identified a ‘copy-paste-like duplicate publication of the same paper published two or more times, with the same results and sometimes even different authors’. This information was not reported in the published review. The corresponding reviewer explained that they ‘had a brief discussion…about what to do about those findings and whether to mention them in the paper…We did not think that we should be distracted from our main goal, so we felt that it was better to leave it to qualified bodies and specialised committees on research malpractice to address this problem separately.’
Finally, one reviewer told us that “there were some ‘suspected misconduct’ in original studies on which we performed sensitivity analysis (best case—worst case)”. The review mentioned that some studies were of poor quality, but it did not specifically mention suspicion of misconduct.B33
The median number of studies included in the reviews that detected misconduct was 56 (IQR, 11–97), and was 28 (IQR, 12–57) in the reviews that did not detect misconduct. The difference did not reach statistical significance.
The reviews published in the four medical journals differed in most characteristics examined in this paper. The reviews published in the Cochrane Library differed from all the other reviews (see online supplementary appendix table 2A). There were also differences in procedures applied across the journals (see online supplementary appendix table 4A). The only three reviews that had extracted data on ethical approval for the studies were published in the BMJ (2) and in the Cochrane Library (1). Finally, one review from the BMJ and one review from JAMA explicitly mentioned potential misconduct.
Statement of principal findings
This analysis confirms some issues and highlights new ones. The risk related to double counting of participants due to duplicate publications and the risk of selective reporting of outcomes are reasonably well recognised. More than half of the reviews applied procedures to reduce the impact of these malpractices. The problem of conflicts of interest remains underestimated, and ethical approval of the original studies is overlooked. Although systematic reviewers are in a privileged position to unearth misconduct such as copy-paste-like plagiarism, intentional selective data reporting and data fabrication or falsification, they do not systematically report them. Finally, editors have a role to play in improving and implementing rigorous procedures for the reporting of systematic reviews to counter-balance the impact of research malpractice.
Comparison with other similar analyses
Our study confirms that systematic reviewers are able to identify publications dealing with the same cohort of patients.15 However, 20% of reviews under consideration failed to report having searched for duplicates. Only ten of them provided the references to some of the identified duplicates. It remains unclear whether reviewers do not consider duplicate publication worth disclosing or whether they are unsure on how to address the issue. Finally, there is no widely accepted definition of the term ‘duplicate’, which, in turn, adds to the confusion. For example, a number of reviewers used the term ‘duplicate’ to describe identical references identified more than once through the search process.
Selective publication of studies, and selective reporting of outcomes, have been examined previously.21–26 This led the BMJ to call for ‘publishing yet unpublished completed trials and/or correcting or republishing misreported trials’.27 Other ways to address this issue include registration of study protocols,28 searching for unpublished trials8,29 and contacting authors to retrieve unreported outcomes.30,31 Our analyses show that 70% of systematic reviewers are aware of these malpractices, although 10% failed to report them explicitly. As described before,32 most reviewers did not report on the number of unpublished articles included in their analyses, the number of authors contacted and the response rate.
Despite the obvious risk of research conclusions favouring a sponsor,33 subgroup analyses on funding were rarely performed. Sponsor bias may overlap with other malpractices such as selective reporting,34,35 redundant publication9 or failure to publish completed research.35 It may also overlap with conflicts of interest of the authors of the original studies, an issue that remains largely overlooked in these systematic reviews. There is a general understanding that authors with conflicts of interest are likely to present conclusions in favour of their own interests.12,36 Although most journals now ask for a complete and overt declaration of conflicts of interest from all authors, this crucial information remains unclearly reported. This may explain why we found no reviews that performed subgroup analyses on this issue.
Ten years ago, Weingarten proposed that ethical approval of studies should be checked during the process of systematic reviewing.16 However, our study shows that only three reviews reported having done so. The need to report ethical approval in original studies has only recently been highlighted. A case of massive fraud in anaesthesiology,20 in which informed consent from patients and formal approval by an ethics committee were fabricated, illustrates how difficult a task this will be.
The most striking finding was that although seven systematic reviews suspected misconduct in original studies, five of them did not report it, one reported it without further comment and only one reported overtly on the suspicion. This illustrates that reviewers do not consider themselves entitled to make an allegation, although they are in a privileged position to identify misconduct. The fact that one reviewer preferred to remain anonymous further illustrates the reluctance to openly report on misconduct.
Strengths and weaknesses
We used a clear and reproducible sampling method that was not limited to any medical specialty. The data analysed had been confirmed by the reviewers. This allowed us to quantify the proportion of procedures that were implemented but not reported by the reviewers. Finally, to our knowledge, this is the first analysis of the procedures used by systematic reviewers to deal with research malpractice and misconduct. Qualitative answers of the reviewers were very informative.
We selected systematic reviews from four major medical journals and the Cochrane Library, which is considered the gold standard in terms of systematic reviewing. We can reasonably assume that the problems identified are at least as serious in other medical journals. Systematic reviews that were not identified as such in their titles were not included. However, including these reviews would not have changed our findings. Only one reminder was sent to the reviewers, since the response rate was reasonably high. Furthermore, the characteristics of the systematic reviews did not differ between reviews for which authors responded or not. We did not ask the reviewers to confirm all their data but focused on the information that we considered unclear. It is possible that some of the reviewers had indeed extracted information on sponsors’ and authors’ conflicts of interest, as well as ethical approval of the studies, but failed to report them. A number of procedures applied and instances of misconduct came to light through our personal contacts with the reviewers. Our results might therefore be underestimated. On the other hand, it is possible that some reviewers pretended having applied some procedures although they had not. This would have led to an overestimation of the number of procedures applied. The major weakness of this study lies in the lack of an accepted definition of ‘research misconduct’. It is possible that some reviewers might have hesitated to disclose suspected misconduct, leading to an underestimation of the prevalence of misconduct identified. Finally, we have used the ‘preliminary taxonomy of research misconduct’ proposed by Smith in 2000,6 and categorised all common minor misconduct as ‘malpractices’. Some may disagree with our classification.
Conclusions and research agenda
The PRISMA guideline has improved the reporting of systematic reviews.37 PRISMA-P aims to improve the robustness of the protocols of systematic reviews.38 Its 17-item checklist mentions the assessment of meta-bias(es), such as publication bias across studies and selective outcome reporting within studies. However, the checklist does not address authors’ conflicts of interest, sponsors, ethical approval of original studies, duplicate publications or the reporting of suspected misconduct. The MECIR project defines 118 criteria, classified as ‘mandatory’ or ‘highly desirable’, to ensure transparent reporting in a Cochrane review.39 Among these criteria, publication and outcome reporting bias, funding sources and conflicts of interest are highlighted as ‘mandatory’ to report on; ethical approval of the studies, however, is not. Most importantly, neither of the two recommendations explicitly describes what should be considered misconduct, what kind of misconduct must be reported, to whom, and how.
We have previously shown how systematic reviewers can test the impact of fraudulent data on systematic reviews,40 identify redundant research41 and identify references that should have been retracted.42 This paper suggests that systematic reviewers may have additional roles to play. They may want to apply specific procedures to protect their analyses from common malpractices in the original research, and they may want to identify and report on suspected misconduct. However, they do not seem ready to act as whistle-blowers. The need for explicit guidelines on what reviewers should do once misconduct has been suspected or identified has already been highlighted.18 These guidelines remain to be defined and implemented. The proper procedure would require the reviewer to ask the institution where the research was conducted to investigate the suspected misconduct, as the institution has the legal standing to do so. Whether alternative procedures could be applied should be discussed; they may include, for example, contacting the editor-in-chief of the journal where the suspect paper was originally published, or of the journal where the systematic review will eventually be published. Future research should explore the application of additional protective procedures, such as checking the adherence of each study to its protocol or the handling of outlier results, and quantify the impact of these measures on the conclusions of the reviews. Finally, the potential risks of false reporting of misconduct need to be studied.
The authors thank all the authors of the systematic reviews who kindly answered our inquiries, and Liz Wager for her thoughtful advice on the first draft of our manuscript.