Objectives The objective of this study was to assess whether National Institute for Health Research (NIHR) Health Technology Assessment (HTA)-funded randomised controlled trials (RCTs) published in the HTA journal were described in sufficient detail to be replicated in practice.
Setting RCTs published in the HTA journal.
Participants 98 RCTs published in the HTA journal up to March 2011. Completeness of the intervention description was assessed independently by two researchers using a checklist that covered participants, intensity, schedule, materials and settings. Disagreements in scoring were discussed within the team, explored and resolved.
Primary and secondary outcome measures Proportion of trials rated as having a complete description of the intervention (primary outcome measure). The proportion of drug trials versus psychological and non-drug trials rated as having a complete description of the intervention (secondary outcome measures).
Results Components of the intervention description were missing in 68/98 (69.4%) reports. Baseline characteristics and descriptions of settings had the highest levels of completeness, with over 90% of reports complete. Reports were less complete on patient information, with 58.2% of reports having an adequate description. By intervention type, drug intervention descriptions were slightly more complete than non-drug intervention descriptions (33.3% vs 30.6% complete), although this difference was not statistically significant. Only 27.3% of RCTs with psychological interventions were deemed to be complete, although again this difference was not statistically significant.
Conclusions Ensuring the replicability of study interventions is an essential part of adding value in research. All those publishing clinical trial data need to report interventions transparently and completely so that study interventions can be replicated.
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 3.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/3.0/
Strengths and limitations of this study
An externally produced checklist was applied to all randomised controlled trials published in the National Institute for Health Research (NIHR) Health Technology Assessment journal series.
The sample size for a number of assessments is very small.
The checklist was only applied to the intervention arm of each trial; in future, it would be important to apply the checklist fully to both the control and intervention arms of trials.
A recent publication by Chalmers and Glasziou1 has suggested that as much as 85% of the US$100 billion spent on health research worldwide each year is potentially wasted due to four key problems of knowledge production and dissemination.
Several studies have specifically addressed one of these areas of waste, namely whether funded research is reported in an unbiased and usable form, by exploring the quality and usability of publications from funded health research. This is a key concern given the role that effective summaries of evidence play in facilitating knowledge transfer and enhancing the uptake of findings in clinical practice. While trial registration databases and scientific journals can be restrictive in terms of word allowance, various strategies have been proposed to improve the reporting of interventions in published trials, including an ‘intervention bank’ of manuals and fidelity tools linked to trial registration numbers.2
Studies have highlighted concerns about the descriptions of interventions in final reports and publications. In one study, for example, 80 consecutive studies were selected for assessment of completeness from the journal Evidence-Based Medicine. Two general practitioners independently assessed whether they could use the treatment with a patient if they saw them the next day.3 Of these 80 published reports, 41 (51%) had elements of the intervention missing, particularly descriptions of process and information on handouts or booklets. The proportion of trials for which adequate information could be made available increased to 90% through the checking of references, contacting authors and undertaking additional searches.3
Similarly, Schroter et al4 developed and piloted a checklist designed to assess the replicability of published treatment descriptions, and applied it to 51 trials published in the BMJ. The study team applied this checklist across a broad range of health topics; it included seven items and a global eighth item summarising completeness. The study reported that 57% (29/51) of the papers did not describe the intervention in sufficient detail to allow replication, the most poorly described aspects being the sequencing of the technique and physical/informational materials.4 A further study5 used the checklist developed by Schroter et al to assess the completeness of descriptions of non-pharmacological interventions and reported that only 39% were adequately described.
Rates of replicability of interventions vary considerably in the published literature, depending on the complexity of the treatment and the assessment criteria. For example, three studies assessed compliance with item 4 of the CONSORT statement (2001 version) in published research on weight loss,6 brain tumours7 and Hodgkin's lymphoma.8 Item 4 asks for precise details of the interventions intended for each group and how and when they were actually administered. These studies reported that over 90% of study findings were replicable. In contrast, one study that assessed whether there was sufficient information on what happens before, during and after treatment for back pain found that only 13% of the trials were replicable.9
The National Institute for Health Research (NIHR) Health Technology Assessment (HTA) Programme commissions and funds primary research and evidence synthesis on the effectiveness, costs and broader impact of healthcare treatments and tests for those who plan, provide or receive care in the NHS. It aims to enable all funded projects to complete and publish in the programme's own journal, HTA, which is freely available on the programme's website (http://www.hta.ac.uk). Reports published in the journal series are peer reviewed, are in the public domain and contain a full record of the study. Unlike typical peer-reviewed journals, there are no word or size limits for the full report and appendices are unlimited, so more detail can be included in the publication; an average report is approximately 50 000 words long. Given the importance of complete and replicable reporting and the opportunities the NIHR HTA journal presents, this study aimed to assess whether reports of single randomised controlled trials (RCTs) published in HTA described the intervention in sufficient detail to allow replication.
All RCTs published in the NIHR HTA journal from January 1999 until March 2011 were selected for inclusion in the study. Of the 109 reports published in this period, 11 were excluded because they reported more than one RCT within a single HTA journal report. Ninety-eight single-trial reports were therefore included in the study.
Piloting the checklist
Five NIHR HTA-funded RCTs were selected to represent a range of interventions (surgery, psychology, devices and pharmaceuticals) to pilot the checklist originally developed by Schroter et al.4 Three assessors applied the checklist independently to assess the level of agreement between their assessments. A κ score was produced for each individual trial (range 0.15–0.7), with an average of 0.225 across all five trials. Disagreement was due to differing interpretations of the checklist questions, so the initial checklist was modified to separate two of the questions into their individual components. In the published checklist, the recipient question asked ‘Is it clear who is receiving the intervention?’ and ‘Do you know all that you need to about the patients? (eg, which drugs they are taking, what they were told, etc)?’. We felt that this required several pieces of information for a single question and therefore separated question 2 into three components, as shown in table 1. Similarly, the materials question, ‘Are the physical or informational materials used adequately described?’, was separated into two components, shown in question 7 of table 1. In addition, the assessors discussed the type and level of information that needed to be present for a question to be answered as complete. The three assessors then applied the modified checklist to a further five NIHR HTA-funded RCTs, resulting in higher levels of agreement (κ scores for each report ranged from 0.3 to 0.7, with an average of 0.6 across all trials).
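The paper does not state which κ statistic was used. With three assessors rating every checklist item, Fleiss' κ is one standard choice, and the minimal Python sketch below assumes it: each checklist item of one trial is treated as a rated subject, and responses are yes, no or not applicable. The function name `fleiss_kappa`, the input layout and the example ratings are hypothetical illustrations, not the study's data or analysis code.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a fixed number of assessors.

    ratings: one inner list per checklist item, holding each assessor's
    response (e.g. "yes", "no", "na"). Hypothetical layout for illustration.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("each item needs the same number (>= 2) of assessors")
    categories = sorted({resp for item in ratings for resp in item})
    counts = [Counter(item) for item in ratings]

    # Observed agreement per item: share of assessor pairs that agree.
    per_item = [
        (sum(cnt[c] ** 2 for c in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for cnt in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the overall share of each response category.
    shares = [sum(cnt[c] for cnt in counts) / (n_items * n_raters) for c in categories]
    p_chance = sum(s ** 2 for s in shares)
    if p_chance == 1:
        raise ValueError("kappa is undefined when every response is identical")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical pilot ratings for one trial: four items, three assessors each.
pilot = [
    ["yes", "yes", "yes"],
    ["yes", "no", "yes"],
    ["no", "no", "na"],
    ["yes", "yes", "no"],
]
print(round(fleiss_kappa(pilot), 3))
```

Under this assumption, a per-trial κ as described in the pilot would come from calling the function once with that trial's item-by-assessor ratings; Cohen's κ would be the analogous choice if agreement had been assessed pairwise.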
The main study
The final modified checklist was applied to a wider sample of NIHR HTA-funded RCTs: all RCTs published in the NIHR HTA journal from January 1999 to March 2011 were selected for inclusion in the study. One checklist was completed for the intervention group of each published trial, and each item in the checklist was answered yes, no or not applicable. We did not apply the full checklist to the control group but, unlike the published checklist,4 we made a general assessment of the completeness of control group information in question 9 of the checklist. However, responses to this question were not based on a detailed assessment of all components of the control group, as they were for the intervention group. Question 8 summarises whether any aspects of the intervention are missing, based on the responses to the previous seven questions.
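The summary question can be derived mechanically from the earlier answers. The minimal Python sketch below assumes that a single "no" on any item makes the description incomplete and that "not applicable" answers are simply skipped; the paper does not spell out how not-applicable responses were handled, and the item labels are hypothetical, not the checklist's actual wording.

```python
VALID_RESPONSES = {"yes", "no", "na"}  # "na" = not applicable


def intervention_complete(answers):
    """Summary item (question 8): True if no checklist item is answered "no".

    answers maps hypothetical item labels to "yes", "no" or "na".
    Assumption: not-applicable items are skipped rather than counted as missing.
    """
    unexpected = set(answers.values()) - VALID_RESPONSES
    if unexpected:
        raise ValueError(f"unexpected responses: {sorted(unexpected)}")
    return all(response != "no" for response in answers.values())


report = {
    "recipient_inclusion_criteria": "yes",
    "recipient_baseline": "yes",
    "recipient_patient_information": "no",
    "provider": "yes",
    "setting": "yes",
    "materials_physical": "na",
}
print(intervention_complete(report))  # False: patient information is missing
```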
Each trial was assessed independently by two assessors. Fifteen per cent of the published reports (15/98) required discussion because of scoring disagreements, mainly concerning checklist item 7. All disagreements were discussed by the team and resolved by consensus.
Three assessors carried out the assessments, and each trial was allocated to two of them, who applied the criteria independently. None of the assessors had medical or clinical experience; however, they hold higher degrees in health and work full time in health research and in evaluating clinical research. All NIHR HTA reports were examined using a standardised process: initially scanning the executive summary, followed by the methods, then using key word searches to scan the whole document and appendices, and finally undertaking a detailed reading of the entire report if relevant information could not be found.
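The staged search order can be sketched as follows, assuming a report has already been split into named sections of plain text. The section keys, the `locate_detail` function and the case-insensitive substring matching are illustrative assumptions only; the assessors carried out this process by hand.

```python
SCAN_ORDER = ("executive_summary", "methods")  # hypothetical section keys


def locate_detail(sections, keywords):
    """Return the stage at which any keyword is first found.

    sections: dict of section name -> text (appendices included).
    Stages follow the order described: executive summary, methods,
    key word search of the whole document, then a full reading.
    """
    terms = [k.lower() for k in keywords]

    def hit(text):
        text = text.lower()
        return any(term in text for term in terms)

    for name in SCAN_ORDER:
        if hit(sections.get(name, "")):
            return name
    if hit(" ".join(sections.values())):
        return "keyword_search"
    return "full_read_required"


report = {
    "executive_summary": "A brief psychological intervention was compared with usual care.",
    "methods": "Sessions were delivered weekly by trained nurses.",
    "appendix_3": "The patient booklet is reproduced here.",
}
print(locate_detail(report, ["weekly"]))   # methods
print(locate_detail(report, ["booklet"]))  # keyword_search
print(locate_detail(report, ["dose"]))     # full_read_required
```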
The checklist for each trial was completed using a stand-alone electronic Microsoft Access database. The checklists completed by all three assessors were then merged and exported into Excel and IBM SPSS V.19, which was used for all descriptive and inferential analyses. The χ2 test was used for all comparisons (statistical significance at p<0.05); where any cell had an expected count of less than 5, Fisher's exact test was used instead.
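The rule for choosing between the χ2 test and Fisher's exact test can be sketched in plain Python for a 2×2 table (for example, complete vs incomplete descriptions in two intervention groups). This is an illustrative re-implementation under stated assumptions, not the SPSS procedure the authors used: it uses the uncorrected Pearson χ2 statistic and a two-sided Fisher p value that sums every table no more probable than the observed one. The function names and example counts are hypothetical.

```python
import math


def expected_counts(table):
    """Expected cell counts under independence for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    return [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]


def chi_square_p(table):
    """p value of the Pearson chi-square test (1 df, no continuity correction)."""
    exp = expected_counts(table)
    stat = sum(
        (table[i][j] - exp[i][j]) ** 2 / exp[i][j] for i in range(2) for j in range(2)
    )
    # With 1 df, P(chi2 > stat) equals erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2))


def fisher_exact_p(table):
    """Two-sided Fisher's exact p value from the hypergeometric distribution."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    total = math.comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / total

    p_observed = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    # Small tolerance so tables tied with the observed one are included.
    return sum(p for p in map(prob, support) if p <= p_observed * (1 + 1e-7))


def compare_groups(table):
    """Use Fisher's exact test if any expected count is below 5, else chi-square."""
    if min(min(row) for row in expected_counts(table)) < 5:
        return "fisher", fisher_exact_p(table)
    return "chi-square", chi_square_p(table)


# Hypothetical counts: [complete, incomplete] for two intervention groups.
print(compare_groups([[20, 30], [30, 20]]))  # chi-square: all expected counts are 25
print(compare_groups([[3, 1], [1, 3]]))      # fisher: all expected counts are 2
```

SPSS also reports a continuity-corrected χ2 for 2×2 tables; the sketch deliberately uses only the uncorrected statistic to keep the decision rule visible.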
The modified checklist was applied to 98 RCTs published in the HTA journal series from January 1999 until March 2011. The interventions in each published trial were classified into the following types: pharmaceutical, radiotherapy, surgery, diagnostic, education and training, service delivery, psychological, vaccines and biological, devices, physical therapy, exercise, complementary therapy, mixed or complex, and other. The intervention classification was provided by Schroter et al4 as part of the original checklist. Table 2 shows the number of trials in the journal series for each intervention type.
Applying the modified checklist to NIHR HTA-funded RCTs revealed that components of the intervention description were missing in 68 of the 98 reports (69.4%). Table 3 contains selected examples for six of the checklist items to illustrate complete and poorly described interventions.
Intervention descriptions were therefore complete in 30.6% (30/98) of reports. Some criteria had high levels of completeness, with baseline characteristics (94.9%) and descriptions of settings (91.8%) complete in over 90% of reports. Other criteria were notably less complete, particularly patient information, with only 58.2% of reports having an adequate description (table 4).
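Percentages such as these are simple proportions of reports. The sketch below tabulates a completion rate per criterion from per-report answers. It assumes, hypothetically, that "not applicable" answers are excluded from the denominator, which the paper does not state; the function name is also illustrative. As a check on the arithmetic, 30 complete and 68 incomplete reports give 30/98 = 30.6%.

```python
def completion_rates(reports):
    """Percentage of "yes" answers per criterion, rounded to one decimal place.

    reports: list of dicts mapping criterion -> "yes", "no" or "na".
    Assumption: "na" answers are dropped from that criterion's denominator.
    """
    tallies = {}
    for report in reports:
        for criterion, answer in report.items():
            if answer == "na":
                continue
            yes, total = tallies.get(criterion, (0, 0))
            tallies[criterion] = (yes + (answer == "yes"), total + 1)
    return {c: round(100 * yes / total, 1) for c, (yes, total) in tallies.items()}


# Overall completeness reported in the study: 30 complete, 68 incomplete.
reports = [{"overall": "yes"}] * 30 + [{"overall": "no"}] * 68
print(completion_rates(reports))  # {'overall': 30.6}
```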
Completion rates differed between the 14 intervention types. For example, descriptions of drug interventions were slightly more complete than those of non-drug interventions (33.3% vs 30.6% complete), but the χ2 test showed that this difference was not statistically significant (p=0.77). Moreover, this pattern did not hold for every criterion: for baseline characteristics (drugs 93.3%, non-drugs 94.4%) and provider information (drugs 73.3%, non-drugs 77.8%), completeness was higher in non-drug trials than in drug trials.
Descriptions of interventions were least complete for psychological interventions, with only 27.3% of RCTs in this area being complete. The χ2 test showed that this difference was not statistically significant when compared with drug interventions (p=1.00). Nevertheless, for some criteria psychological interventions had the highest levels of completeness of all intervention types, in particular baseline characteristics (100%) and provider information (90.9%) (table 4).
The modified checklist included a question about the completeness of the control group description. This was not a detailed evaluation of all components of the control group but a broad assessment of whether the description appeared to be complete. Given the interpretative nature of this question, control group information was not included in the main analysis. The data revealed that 51% of RCTs had complete descriptions of control groups.
Statement of principal findings
This study has revealed that 30.6% (30/98) of reports of single trials published in the HTA journal contain a full description of the intervention. The interventions described in the published RCTs performed well against certain criteria, such as baseline characteristics (95% adequately described), but less well against others, such as patient information (58% adequately described). Drug trials were slightly more complete than non-drug trials and psychological interventions, with 33.3% of reports having a complete intervention description, although these differences were not statistically significant.
Strengths and weaknesses
The strengths of this study are that externally generated and tested criteria were applied to evaluate the completeness of intervention descriptions in NIHR HTA Programme-funded RCTs. However, there were limitations. First, none of the assessors applying the criteria were medically trained; however, the assessors were not commenting on the suitability of an intervention for use in practice but were judging whether the aspects of the description required for use in practice were present. It is possible that someone with medical training would score the reports differently; in previous work, the assessors were medically trained. Second, the authors of the reports were not contacted to provide information beyond that in the publication. Previous studies have demonstrated that contacting research teams or searching further for intervention details does increase the completeness of intervention descriptions.3,5 However, it is questionable whether having to search outside the publication makes study findings easier to replicate.
A limitation of the checklist is the type of data collected. While all the criteria are dichotomous (yes/no answers), the judgement behind each categorisation involves differing degrees of interpretation, which could have resulted in overly harsh assessments of completeness for certain criteria. For example, the recipient criterion is clear (are inclusion/exclusion criteria present?), whereas the materials criterion requires the assessor to decide whether the description of the physical materials is adequate and is therefore open to interpretation. Indeed, the completion rates for materials were among the lowest across all studies, at 58% and 69% for informational and physical materials, respectively. Using this checklist also allowed us to suggest further refinements to its criteria, such as separating out the recipient and materials criteria.
A further limitation was that the checklist was not fully applied to the control groups of the published trials; doing so would have given a more complete picture of how well controls are described. Another limitation was that the number of reports assessed was very small for certain comparisons (eg, only 11/98 reports described psychological interventions), so some of the observed completeness rates may have occurred by chance.
Meanings of the study
It is tempting to compare these findings with other studies assessing the usability of intervention descriptions. In particular, Glasziou et al3 reported that elements of the intervention were missing in 41/80 (51%) of the published reports of single randomised trials and systematic reviews in popular journals, so that 39/80 (49%) were complete, compared with 30/98 (30.6%) of NIHR HTA-funded RCTs. Similarly, interventions in NIHR HTA reports appeared to be described less well than in the 51 trials published in the BMJ assessed by Schroter et al,4 where 43% (22/51) of the articles were considered to describe the intervention in sufficient detail to allow replication. While these comparisons are interesting, they do not support any meaningful conclusion about relative performance, because the nature of the outputs and the assessment criteria differ considerably between studies: Glasziou et al3 looked at journal articles, whereas we looked at the HTA journal series, which is aimed at a different audience, and the assessment instrument also differed. Schroter et al,4 for example, used eight indicators (seven main checklist items and a global eighth completeness item) in their checklist, compared with the 12 criteria used in this study.
However, this study does reflect findings from similar studies conducted elsewhere. For example, the criteria highlighted as particularly poorly described in Schroter et al's study were physical/informational materials, which reflects this study's finding that patient information and physical materials were also lacking in completeness. Similarly, the finding that NIHR HTA Programme-funded drug interventions were slightly better described than non-drug interventions reflects Glasziou et al,3 where over 60% of reports on drug treatments were initially deemed to be complete compared with just under 30% of non-drug treatments.
In addition to the more detailed guidance it provides to authors, the HTA journal requests that authors of RCTs include the headings set out in the revised CONSORT checklist and flowchart, and describes CONSORT in its guidance for authors. Item 5 of the revised (2010) CONSORT statement asks for ‘The interventions for each group with sufficient details to allow replication, including how and when they were actually administered’, and there are extensions of the CONSORT statement that address the additional complexity of reporting non-pharmacological interventions. These extensions are not currently a requirement for non-pharmacological studies, but as they are more widely requested, it is hoped that interventions will be reported more fully.
A number of studies have investigated the completeness of intervention descriptions in a single disease area by assessing the compliance of RCTs with the intervention item (item 4 in the 2001 version) of the CONSORT statement.6–8 While these studies reported that over 90% of study findings were replicable, this is likely to be an overestimate, as they did not assess whether there was enough information to allow replication. In contrast, the study of back pain trials, which assessed whether there was sufficient information on what happens before, during and after treatment, found that only 13% of the trials were replicable.9
Understanding the extent to which interventions in published studies are described sufficiently to inform clinical decision-making is a key concern of the ‘adding value in research’ agenda. As Chalmers and Glasziou1 have suggested, poorly described interventions contribute to one of the four main sources of research waste. The criteria identified by Schroter et al4 and developed in this study are helpful in highlighting the specific areas where intervention descriptions can be improved.
This study indicates several areas for further research. Further testing could assess the repeatability of the criteria: for example, the reports sampled in this study could be reassessed by someone with clinical experience to assess the level of agreement, or the papers selected in Glasziou's original study could be assessed by non-clinical teams. The checklist has only been applied to single-trial reports; its applicability to multitrial reports should be investigated.
The characterisation of the control group is a key area for future research. Research on trials to date has focused on describing the intervention in the treatment group, yet the detail of the control arm is equally important: the control arm is often described as ‘usual care’, which does not take account of variations between centres.10 A recent paper reported on the development of a tool for extracting data in systematic reviews that includes an element on intervention design.11 The tool has been applied to the intervention and control groups of systematic reviews; its applicability to primary research could be investigated and used to further strengthen the checklist that we have used.
Ensuring the replicability of study findings is an essential part of adding value in research. It is important for health research publishers to be transparent about the usability of study reports and the areas for improvement. This study applied a checklist that can indicate where descriptions of interventions can be improved to enhance replication in clinical practice. Serious consideration should be given to how it might be used to improve intervention reporting in future. The results of this study have been shared with the editorial board of the HTA journal to investigate how interventions can be better reported within the journal series.
The authors would like to acknowledge Professor James Raftery and the Metadata team for providing the database and the trial details used in the study, and Professor Paul Glasziou for his advice during the study.
Contributors The study was conceived and designed by LD, RM, SA and FH, and undertaken by LD, SA and FH; AY and DW supported the data analysis. All authors read and approved the final manuscript. LD is the guarantor.
Funding This study was supported by the NIHR Evaluation, Trials and Studies Coordinating Centre through its Research on Research Programme. The views and opinions expressed are those of the authors and do not necessarily reflect those of the Department of Health, or of NETSCC.
Competing interests All of the authors are employed by the University of Southampton to work at least part time for NETSCC. In particular, RM is employed as the Head of NETSCC and has worked for NETSCC (and its predecessor organisation) in senior roles on and off since 1996. He was an editor of the Health Technology Assessment journal (1997–2007) and a founder editor for other journals in the new NIHR Journals Library (2011–2012).
Ethics approval None.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement Data on the included trials are available on request from the corresponding author.