Objectives To investigate whether overstatements in abstract conclusions influence primary care physicians’ evaluations when they read reports of randomised controlled trials (RCTs).
Design and setting This study was a parallel-group randomised controlled survey, conducted online while masking the study hypothesis.
Participants Volunteers were recruited from members of the Japan Primary Care Association in January 2017. We sent email invitations to 7040 primary care physicians. Among the 787 individuals who accessed the website, 622 were eligible and automatically randomised into ‘without overstatement’ (n=307) and ‘with overstatement’ (n=315) groups.
Interventions We selected five abstracts from published RCTs with at least one non-significant primary outcome and overstatement in the abstract conclusion. To construct a version without overstatement, we rewrote the conclusion sections. The methods and results sections were standardised to provide the necessary primary outcome information when it was missing in the original abstract. Participants were randomly assigned to read an abstract either with or without overstatements and asked to evaluate the benefit of the intervention.
Outcome measures The primary outcome was the participants’ evaluation of the benefit of the intervention discussed in the abstract, on a scale from 0 to 10. A secondary outcome was the validity of the conclusion.
Results There was no significant difference between the groups with respect to their evaluation of the benefit of the intervention (mean difference: 0.07, 95% CI −0.28 to 0.42, P=0.69). Participants in the ‘without’ group considered the study conclusion to be more valid than those in the ‘with’ group (mean difference: 0.97, 95% CI 0.59 to 1.36, P<0.001).
Conclusion The overstatements in abstract conclusions did not significantly influence the primary care physicians’ evaluations of the intervention effect when necessary information about the primary outcomes was distinctly reported.
Trial registration number UMIN000025317; Pre-results.
- randomised controlled trials
- general practice
- primary care physicians
- reporting bias
- clinical trial
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
Strengths and limitations of this study
This is the first and only randomised controlled trial that estimates the influence of overstatement in abstract conclusions.
We evaluated the influence of overstatement among primary care physicians, who are among the major users of evidence.
Although the number of participants was above our targeted sample size, a relatively low response rate limits the generalisability of our findings.
As we focused on the influence of overstatement in abstract conclusions when necessary information about primary outcomes was reported in the methods and results sections, the effect of various other forms of inadequate reporting in abstracts should be further evaluated.
Abstracts of reports of randomised controlled trials (RCTs) provide concise, educational and readily accessible information. They are particularly useful for primary care physicians because they deal with a wide range of patients and problems and need quick access to information regarding their practices. Sometimes abstracts are the only source of evidence they use.1
Abstract conclusions are the most crucial part of the whole abstract as they summarise the main results and provide interpretations.2 A previous survey showed that primary care physicians paid the most attention to the conclusion.3 The conclusion also guides primary care physicians who are not confident in their skills in evidence-based medicine (EBM)3 4 to interpret the results. Thus, a strong conclusion may alter the readers’ interpretation of the whole study.
Unfortunately, the conclusion is the most frequently distorted section in abstracts.5 Exaggerating the results of the trial, such as using spin5 or overstatement,6 is not uncommon. Examples of spin include omitting non-significant results of primary outcomes and focusing on significant secondary outcome or subgroup analysis.5 Previous studies also found that 58% of RCTs with non-significant results,5 and 70% of non-randomised studies7 had spin. Subsequent studies reported that spin, misleading information or overstatements were common in various subspecialties, such as rheumatology,8 psychiatry,9 wound care,10 surgery11 12 and oncology.13–15
This suggests that, as far as abstract conclusions are concerned, the quality of reporting is still poor despite the Consolidated Standards of Reporting Trials (CONSORT) guideline for abstracts.2
However, there has been limited evidence about the influence of such abstracts on the readers’ interpretations in the real world. Only one RCT16 investigated the extent of the impact of inappropriate reporting on readers’ interpretations of the results. Boutron et al16 randomised clinical researchers into two groups, and asked them to read an abstract with or without ‘spin’, which was defined by the authors as ‘reporting the beneficial effect of the intervention as greater than shown by the results’, to estimate how readers were influenced when they assessed the effectiveness of the intervention. The results showed that the participants who read abstracts with spin were more likely to think that the intervention was beneficial for the patients than those who read the abstracts without spin.
Although their trial demonstrated that spin in the abstract had a small impact (effect size=0.24), it left several questions unanswered. First, the level of influence of spin in the abstract conclusion on the participants’ interpretation remained unclear because the investigators added changes to all sections of the abstracts. In their study, they either erased or added all the results of secondary outcomes while changing the wording. In other words, they investigated the general influence of spin in an abstract by comparing it with its ‘paragon’ counterpart. Moreover, the target population was clinical researchers with publishing experience. Therefore, the influence of spin in the abstract conclusion on other types of evidence users remains unknown.
This study aims to determine the influence of the overstatements in abstract conclusions on general clinical practice by focusing on the primary care physicians who read reports of RCTs.
Setting and design
This online study was a double-blind RCT conducted from January to February 2017. The participants were masked to the study hypothesis, and the investigators (except RS, who constructed the random sequence) were masked to the allocation. We recruited volunteers from members of the Japan Primary Care Association (JPCA) by sending email invitations. The intervention was conducted on a website specifically designed for this study. Participants were randomised into two groups and asked to read and evaluate 1 of the 10 abstracts (five pairs of two corresponding abstracts: one with and another without overstatement) of an RCT report. The trial was prospectively registered with the University Hospital Medical Information Network Clinical Trial Registry (UMIN000025317) and is now at the ‘study completed’ stage. We had submitted the protocol, including a statistical analysis plan, to the JPCA before commencement but did not publish it to avoid the risk of participants reading it.
Participants and recruiting
The target population was recruited from the members of the JPCA. The JPCA was established in 2010 to promote the primary care specialty in Japan.17 It is the largest organisation for primary care physicians in the country, and has been promoting evidence-based practice among its members. Currently, over 10 000 doctors working in various types of medical institutions18 belong to the JPCA, and 5836 out of a total of 10 851 members are certified as specialists in primary care.
We sent email invitations to JPCA members who had more than 2 years of clinical experience with registered email addresses. (The details of the recruiting process will be reported in a separate paper.) We excluded clinicians with less than 2 years’ experience because our target population was primary care physicians, and doctors usually choose their specialty after 2 years of clinical training in Japan. Interested individuals could access the DOCTOR study website via the link in the email. We added a code at the end of the link to ensure that participants accessed the website via the given link. As an incentive, an Amazon gift card worth 3000 yen (US$26.6) was given to 20 drawing winners.
The inclusion criteria for participants were as follows: JPCA member, medical doctor currently in clinical practice, more than 2 years of clinical practice experience and access to up-to-date clinical research knowledge. We asked how respondents learnt about the recent clinical trials, and individuals who did not respond with any information source were excluded. Screening questions were on the leading page on the website. We excluded those who work at research laboratories or educational institutions.
Randomisation and allocation concealment
When participants moved to the assessment page, they were randomly assigned an abstract either with or without overstatements with a 1:1 ratio. The block randomisation (10 for each block) was automatically performed using a computer-generated random sequence (created by RS). The allocation concealment was maintained through the automatic random allocation process.
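The allocation scheme described above can be sketched as permuted block randomisation. This is only an illustration under stated assumptions: the function name, the arm labels and the seed are hypothetical, and the paper does not describe how the abstract pair was chosen within each block, so that step is not modelled here.

```python
import random


def permuted_block_sequence(n_blocks, block_size=10, arms=("without", "with"), seed=None):
    """Return an allocation sequence built from shuffled, balanced blocks.

    Hypothetical helper: each block holds an equal number of every arm
    (1:1 for two arms), and the order inside each block is random.
    """
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    per_arm = block_size // len(arms)
    sequence = []
    for _ in range(n_blocks):
        block = [arm for arm in arms for _ in range(per_arm)]
        rng.shuffle(block)  # random order within the block
        sequence.extend(block)
    return sequence


# Example: 3 blocks of 10 give 30 allocations, balanced after every block.
allocation = permuted_block_sequence(n_blocks=3, seed=2017)
```

Because every completed block is balanced, the two groups can differ by at most half a block (here 5 participants) at any point during recruitment.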
In the email invitations, participants were notified that this study aimed to investigate the impression of the abstracts and that they would be asked to score one randomly selected abstract numerically. (The English version of the invitation is included in the online supplementary appendix 1). Thus, they were masked to the study hypothesis. The researchers (KS, TA, YT and AS), excluding the website manager (RS), were blinded until the blind interpretations of the results were completed and signed off.19 RS did not join the result analysis.
Selecting abstracts with overstatements
We selected five abstracts20–24 (the text of the five abstracts is included in the online supplementary appendix 2) from the pre-existing database of published reports in psychiatry RCTs dated between 2011 and 2014, which was collected from our previous study.6 25 To avoid any bias arising from the participants’ subspecialty expertise (such as internal medicine or surgery), we chose reports from psychiatry.
The abstracts were selected based on the following criteria: (1) superiority RCT with two arms, (2) claiming effectiveness of an intervention in the abstract conclusion despite some or all primary outcomes not being significant, (3) targeting a common mental illness primary care physicians are likely to encounter in clinical settings and (4) having a journal impact equal to or higher than two.
An overstatement was defined as ‘inconsistency between the results of primary outcomes in full-text and those deduced from the abstract conclusion’.6 While spin is any technique embellishing the results across whole reports, an overstatement specifically refers to exaggerations in the abstract conclusion.
In the five sample abstracts selected, two only mentioned the superiority of the intervention to the control in the conclusions. In fact, one had non-significant results and the other had mixed results (significant and non-significant) in their primary outcomes. The remaining three had conclusions that emphasised the partial superiority of the intervention with respect to the control. They stated that the treatment was partially effective even though all the primary outcomes were non-significant. Together, they include different levels of overstatement from completely misleading to less informative (not mentioning non-significant primary outcome) conclusions. They were checked independently by two or more investigators (KS, AS and RS).
Constructing abstracts with and without overstatements
We constructed abstracts in line with the following prespecified guidelines. First, we rewrote the conclusion to remove the overstatement, following these rules: (1) When all primary outcomes were non-significant, we rewrote the conclusion as ‘Intervention A was not more effective than control B in terms of …’. (2) When one primary outcome (PO1) was significant but the other (PO2) was non-significant, we rewrote the conclusion as ‘Intervention A was more effective than control B in terms of PO1, but not more effective in PO2’ according to the order in the original abstract. We also removed the results of secondary outcomes and subgroup analysis from the conclusions. (See an example in box, and all the abstract conclusions are in table 1.)
An example of the abstracts (italics where extra text added, bold where changed in the ‘without overstatement’ group)
Intervention A for menopausal symptoms: a randomised controlled trial
This study aims to determine the efficacy of intervention A for alleviating vasomotor and other menopausal symptoms.
Late perimenopausal and postmenopausal sedentary women with frequent vasomotor symptoms (VMS) such as hot flush, sweating and poor circulation participated in a randomised controlled trial conducted in three sites: 106 women randomised to exercise and 142 women randomised to usual activity. VMS frequency and bother were recorded on daily diaries at baseline and on weeks 6 and 12. Intent-to-treat analyses compared between-group differences in changes in VMS frequency and bother, sleep symptoms (Insomnia Severity Index and Pittsburgh Sleep Quality Index), and mood (Patient Health Questionnaire-8 and Generalised Anxiety Disorder-7 Questionnaire). Primary outcomes were VMS frequency and bother mean frequency or bother of VMS at 6 and 12 weeks.
At the end of week 12, changes in VMS frequency in intervention A group (mean change −2.4 VMS/d, 95% CI −3.0 to −1.7) and VMS bother (mean change on a four-point scale −0.5, 95% CI −0.6 to −0.4) were not significantly different from those in control B group (−2.6 VMS/d, 95% CI −3.2 to −2.0, P=0.43, −0.5 points, 95% CI −0.6 to −0.4, P=0.75). The exercise group reported greater improvement in insomnia symptoms (P=0.03), subjective sleep quality (P=0.01) and depressive symptoms (P=0.04), but differences were small and not statistically significant when p values were adjusted for multiple comparisons. Results were similar when considering treatment-adherent women only.
These findings provide strong evidence that 12 weeks of intervention A do not alleviate VMS but may result in small improvements in sleep quality, insomnia and depression in midlife sedentary women.
Control B is the standard treatment for menopausal symptoms.
‘Without’ overstatement version conclusions
Intervention A was not more effective than control B in terms of frequent VMS such as hot flush, sweating in postmenopausal women.
Control B is the standard treatment for menopausal symptoms.
Second, we standardised the methods and results sections. We explicitly stated the primary outcomes and results (for example, OR, risk ratio, CI, P value), taken from the full text, if they were not stated in the original abstract. Therefore, all abstracts had the information necessary for participants to understand the results of the primary outcomes from the methods and results sections. This modification was necessary to keep the conclusion consistent with the other sections of the abstract. Without this step, the rewritten conclusion of an abstract without overstatement would have referred to primary outcomes that were not mentioned elsewhere in the same abstract. Additionally, this standardisation made it possible to estimate the influence of overstatement in the conclusion when the methods and results reported essential information.
Third, we changed the names of the intervention and control treatments to anonymous ‘intervention A’ and ‘control B’ to minimise bias. We added a few words for explanation when there was a medical term that seemed unfamiliar to primary care physicians (eg, VMS: hot flush, sweating and poor circulation). Finally, we translated the texts into Japanese. Except for the conclusion, abstracts ‘with’ or ‘without’ overstatement were identical.
We established two pairs of investigators, and each pair modified and translated half of the abstracts (‘with’ and ‘without’ overstatement). Then, the other pair checked whether they were following the guidelines. Another researcher (SK), who was not involved in this study, checked the translation. Any disagreement was resolved by discussion among investigators.
Our primary outcome was the numerical evaluation, which was scored by participants, of the effectiveness of the intervention discussed in the given abstract: ‘How beneficial do you think intervention A is for the patients, on a scale from 0 to 10, 0 being not at all beneficial and 10 being conceivably most beneficial?’ We also asked the following questions (scored 0 to 10 with 0 being not at all and 10 being very likely).
How valid is this conclusion in your opinion on a scale from 0 to 10?
How much do you want to read the full text of this study on a scale from 0 to 10?
When you answered the above questions, which part of the abstract did you refer to the most? (background/methods/ results/conclusion)
We referred to the effect size of 0.25 obtained in the previous study.16 They estimated the effect of spin by comparing the influence of the abstracts ‘with’ and ‘without’ spin on clinical researchers. Although our target population differed from the previous study, considering that the effect of 0.2 represented a small effect,26 we aimed for a sample size of 253 per group, and 506 in total to detect a between group effect size of 0.25 with a power of 90% and a two-sided alpha risk at 5%. Given that we had prepared five pairs of abstracts with or without overstatement, we intended to enrol 100 or more participants for each pair.
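The sample size arithmetic can be sketched with the textbook normal-approximation formula for comparing two means, n = 2(z_{1-α/2} + z_{1-β})² / d² per group. The formula and the function name are assumptions made for illustration; the paper does not state which software or formula the authors used.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, power, alpha=0.05):
    """Per-group n for a two-sided, two-sample comparison of means.

    Assumed textbook normal approximation, not necessarily the
    authors' method; effect_size is a standardised difference (Cohen's d).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


for power in (0.80, 0.90):
    print(power, n_per_group(0.25, power))
```

Under this approximation, an effect size of 0.25 with a two-sided alpha of 5% gives 337 per group at 90% power and 252 per group at 80% power. The 253 per group stated above lies close to the 80% figure, so the authors' inputs or software may have differed from this sketch; t-based calculations typically add a participant or two to the approximate value.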
For the main analysis, we used a linear mixed effects model with a fixed factor (for the intervention) and a random intercept for the abstract to account for the clustering effects of the abstracts (each abstract had two versions: with or without overstatements). The model accounted for the correlation within abstracts by using an unstructured covariance matrix. We excluded the following subjects from our analysis before proceeding to the study analyses and therefore without knowledge of any outcomes: (1) those who were erroneously allocated by the web system although they did not satisfy the eligibility criteria and (2) those who were eligible and were randomised but did not complete the questionnaire or spent less than 30 s on the questionnaire. TA and KS analysed the data using SPSS Statistics 24 without knowing the allocation. To evaluate the influence of possible associated factors3 27 on the interpretation, we conducted the following subgroup analyses, prespecified except for the last, which was post hoc: participants (1) working at clinics, (2) getting information only from pharmaceutical companies, (3) certified as primary care physicians and (4) with experience as a principal researcher.
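One way to write the model described above is as a random-intercept model. The notation is illustrative and not taken from the paper: y_{ij} is the 0 to 10 rating given by participant i who read abstract pair j (j = 1, …, 5), and x_{ij} equals 1 for the ‘with overstatement’ version and 0 otherwise.

```latex
y_{ij} = \beta_0 + \beta_1 x_{ij} + u_j + \varepsilon_{ij},
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2),
\quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

In this form, \beta_1 is the adjusted mean difference between the two groups, and the random intercept u_j absorbs the fact that some abstracts are rated higher than others regardless of the version shown.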
Blinded data interpretation
Blinded interpretation of study results was the approach recommended by Järvinen et al 19 to reduce interpretation bias. Following their suggestion, we interpreted the results blindly before breaking the randomisation code. Thus, we prepared two interpretations of the results based on two scenarios: (1) assuming group A was with overstatement and group B was without overstatement and (2) assuming group A was without overstatement and group B was with overstatement. After agreeing that there would be no further change, we broke the randomisation code and chose the correct interpretations.
This study was conducted in accordance with the Declaration of Helsinki. We obtained an online consent for participation from each participant.
We sent email invitations to 7040 JPCA members (figure 1). After sending one reminder, we reached the targeted sample size of 510. Among the 787 individuals who accessed the website, 622 were eligible and randomly assigned to without overstatement (n=307) and with overstatement (n=315) groups. A total of 281 doctors in the ‘without’ group and 286 in the ‘with’ group were included for the analysis. The number of participants allocated to each pair ‘with’ or ‘without’ overstatement was as follows: abstract pattern 1 (n=116), 2 (n=109), 3 (n=115), 4 (n=113) and 5 (n=114). Online supplementary appendix 3 provides further breakdown per abstract.
Fifty-five individuals were excluded because they either spent less than 30 s on the webpage (n=14) or did not complete the survey (n=41). Most participants read and rated the abstract within 4 min (median time: 162 s, IQR: 114–236 s).
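As a quick consistency check, the participant flow reported above can be tallied in a few lines. The variable names are hypothetical; the counts are the ones given in the text.

```python
# Counts as reported in the text above.
randomised = {"without": 307, "with": 315}
analysed = {"without": 281, "with": 286}
excluded = {"under_30_seconds": 14, "incomplete": 41}
per_abstract_pair = [116, 109, 115, 113, 114]

total_randomised = sum(randomised.values())
total_analysed = sum(analysed.values())
total_excluded = sum(excluded.values())

# Exclusions per group, derived from randomised minus analysed.
excluded_by_group = {arm: randomised[arm] - analysed[arm] for arm in randomised}

print(total_randomised, total_excluded, total_analysed)  # 622 55 567
```

The totals agree: 622 randomised minus 55 excluded leaves 567 analysed, which is also the sum of the five abstract-pair counts.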
Table 2 shows the participant characteristics; 76.5% were certified as primary care physicians. We classified their subspecialties according to their certifications. The most common background was internal medicine. More than 60% of the participants had attended a course on EBM. About 40% of the physicians said the first section they read was the conclusion; only 11% of them read the results section first. There was no substantial difference between the two groups.
There was no statistically significant difference between the groups with regard to the interpretation of the benefits of the intervention discussed in the given abstracts (mean difference: 0.07, 95% CI −0.28 to 0.42, P=0.69, effect size calculated by Cohen’s d: 0.031) (table 3).
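The reported effect size can be approximately recovered from the mean difference and its confidence interval. The sketch below treats the comparison as a simple two-sample difference with the analysed group sizes (281 and 286), which is an assumption: the authors used a mixed model, so their figures will not match exactly, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_cohens_d(mean_diff, ci_low, ci_high, n1, n2, level=0.95):
    """Back-calculate Cohen's d and a large-sample CI from a mean difference.

    Assumes a plain two-sample comparison with a common SD; this is an
    approximation, not the authors' mixed-model calculation.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    se_diff = (ci_high - ci_low) / (2 * z)         # SE implied by the CI width
    pooled_sd = se_diff / sqrt(1 / n1 + 1 / n2)    # SD implied by that SE
    d = mean_diff / pooled_sd
    half_width = z * sqrt(1 / n1 + 1 / n2)         # large-sample SE of d, ignoring the d**2 term
    return d, (d - half_width, d + half_width)


benefit_d, benefit_ci = approx_cohens_d(0.07, -0.28, 0.42, 281, 286)
validity_d, validity_ci = approx_cohens_d(0.97, 0.59, 1.36, 281, 286)
```

With the reported inputs this gives a d of about 0.03 for the benefit rating, with an interval of roughly −0.13 to 0.20, and a d of about 0.41 for the validity rating.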
Secondary outcomes and subgroup analyses
However, there was a significant difference between the groups in their perception of the validity of the conclusion (mean difference: 0.97, 95% CI 0.59 to 1.36, P<0.001) (figure 2). Those in the without overstatement group considered the abstract to be more valid than those in the with overstatement group (effect size calculated by Cohen’s d was 0.41). No significant difference was found when asked if they wanted to read the full text. In both groups, the majority of the doctors referred to the results section to make an assessment.
We conducted subgroup analyses, but no significant differences were found with regard to the interpretation of the benefits of the intervention based on the workplace (clinic, n=177, mean difference: 0.04, 95% CI −0.67 to 0.74, P=0.91), general source of information (only pharmaceutical company, n=43, mean difference: 0.06, 95% CI −1.36 to 1.48, P=0.93), being a certified primary care physician (n=434, mean difference: −0.01, 95% CI −0.41 to 0.39, P=0.96) or having no experience as a principal researcher (n=367, mean difference: −0.10, 95% CI −0.53 to 0.34, P=0.66).
We showed that primary care physicians were not influenced by overstatement in the conclusion section if the abstract contained necessary information on the primary outcomes. The 95% CI of the estimated effect (effect size by Cohen’s d: 0.031, 95% CI −0.13 to 0.20) rules out the existence of even a small effect. In the baseline questionnaire, 42% of participants answered that they read the conclusion section first when reading abstracts. However, more than 60% of them referred to the results section for their interpretation of the given abstract. They tended to judge the overstated conclusion as less valid than those without overstatement. These results suggest that primary care physicians who belonged to the JPCA with up-to-date knowledge of clinical trials were not misled by overstatements in abstract conclusions if the methods and results sections reported sufficient information. Our subgroup analysis showed that factors such as the workplace, types of information resources or experience of being a principal investigator would make little difference. These results suggest that the participants had good critical appraisal skills of research reports, which helped them to recognise the inconsistency between the result and the conclusion.
Our results differed in some respects from the previous study. Boutron and colleagues’ study16 showed that the interpretation of abstracts was affected by spin. The ‘abstracts with spin’ group considered the intervention more beneficial than the without spin group, and the ‘with spin’ group was more interested in reading the full text. This was contrary to our main findings. On the other hand, the abstracts with spin group interpreted the abstract as less methodologically rigorous than the without spin group. This was consistent with our results.
However, we must consider some differences in design between Boutron et al’s study and this study. First, the level of spin was much higher in their study than in this study. Boutron et al aimed to investigate the impact of spin in the abstract generally, so they removed all spin from the abstract and compared this ‘perfect’ abstract with the original one. On the other hand, in our study, the difference between ‘with’ and ‘without’ groups was limited to the conclusion section because our aim was to estimate the influence of overstatement in the conclusion section. Thus, we added the information on the primary outcomes in the methods and results sections of both groups. Second, the baseline characteristics of the participants differed. While all the participants in the study of Boutron et al were experienced clinical researchers, we chose primary care physicians as our target. Although the participants in this study had little experience in clinical research, they were regular users of medical literature (90% of participants had read more than one abstract in the previous month). Most participants were eager to learn EBM and had some knowledge of critical appraisal. In addition, 60% referred to the results section when making their interpretation. Therefore, their study and ours are more complementary than contradictory.
Limitations and strengths
One strength of this study is that it is the first and only RCT to estimate the influence of overstatement in abstract conclusions. Authors of scientific articles like to use promising, positive words28 29; nonetheless, we demonstrated that overstated conclusions did not affect the readers’ interpretations of the results if sufficient information was provided in other sections. Second, we evaluated the influence of overstatement in primary care physicians, who are among the major users of evidence. They encounter clinical queries in daily clinical practice and use evidence to make the best decisions for their patients. Therefore, it is important to clarify whether primary care physicians are susceptible to overstatement in abstract conclusions. The results showed that primary care physicians with up-to-date knowledge of trial/research information were not misled by an overstated conclusion.
There are some limitations. While the number of participants was above our targeted sample size, it may not have completely represented the JPCA members. The relatively low response rate of 11.1% (787/7040) limits the generalisability of our findings. Two things should be noted. First, we chose the JPCA as our recruiting pool because the members were considered representative of active users of scientific evidence in their primary care practice. The JPCA is the only organisation that certifies clinicians as primary care physicians, and it regularly conducts workshops on EBM. However, those who responded to our invitation were potentially avid readers of scientific reports, which is the reason they volunteered for this assessment, and, therefore, they may have better critical appraisal skills for abstracts than other JPCA members. Indeed, most participants answered that they read abstracts regularly. This suggests that they were not representative of all primary care physicians in Japan. Furthermore, the effect of overstatements in the abstracts that did not report the necessary information of primary outcomes or other various forms of inadequate reporting was not measured. In our study, we added essential information on primary outcomes in the methods and results sections as recommended by a CONSORT statement.2 More than 60% of the participants stated that they mainly refer to the results to evaluate the abstract. In contrast, only around 15% based their assessment on the conclusion. This means that adequate reporting of the results is necessary for interpretation of the abstract. Finally, we should not overgeneralise the association between the type or level of overstatement and its impact on interpretation. We chose five abstracts at different levels of overstatement as a sample, but the selection did not cover all levels of spin or all types of spin. Neither did we have sufficient sample size to explore such relationships.
The influence of biased reporting on clinical decisions should be further researched.
In conclusion, our findings suggested that sensible and well-read clinicians are capable of discerning the inconsistency between results and conclusion and of making a sound judgement on the validity of misleading conclusions when primary outcomes are appropriately reported in the methods and results sections. However, this does not mean that overstatements can be overlooked. The conclusion sections of abstracts should be written solely based on the primary outcome results. The impact of inappropriate writing style in clinical settings should be further researched.
We thank those who participated in this study, A Igaki for organising and sending invitation e-mails, and S Kishimoto, for double checking translation of abstracts. We would like to thank Editage (www.editage.jp) for English language editing.
Contributors All authors of the paper have contributed to the conception or design of the work, development of the intervention and the acquisition or interpretation of data. KS, TA, RS, YT and AMS were involved in drafting the work. MK and TAF revised it critically for important intellectual content. RS designed and developed the study website. TA and KS analysed the data. All authors gave the final approval of the manuscript before submission.
Funding This work was supported by Japan Primary Care Association (grant number 28-01-001) to KS.
Competing interests TAF has received lecture fees from Eli Lilly, Janssen, Meiji, Mitsubishi-Tanabe, MSD and Pfizer and consultancy fees from Takeda Science Foundation. He has received research support from Mochida and Mitsubishi-Tanabe.
Ethics approval The Ethics Committee of Kyoto University Graduate School of Medicine.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement No additional data available.