Objectives Meta-analysis based on individual patient data (IPD) from randomised trials is superior to using published summary data since it facilitates subgroup and multiple variable analyses. Guidelines and funders expect that researchers share IPD for bona fide analyses, but in practice, this is done variably. Here, we report the experience of obtaining IPD for two collaborative analysis studies.
Setting Two linked studies required IPD from published randomised trials. The leading researchers for eligible trials were approached and asked to share IPD including trial characteristics, patient demographics, baseline clinical data and outcome measures.
Participants Participants in eligible randomised controlled trials included patients with or at risk of cognitive decline/vascular events.
Primary and secondary outcome measures Numbers (%) of trials where the leading researcher responded favourably/negatively or did not respond. If negative, reasons behind the response were collected. If positive, methods used to share IPD were recorded.
Results Across the two studies, 391 completed trials were identified. Email addresses for researchers were found for 313 (80%) of the trials. One hundred and forty-eight (47%) researchers did not respond despite being sent multiple emails. Following contact, positive initial responses were received from 92 researchers, resulting in IPD being shared for 78 trials. Eighty-seven (28%) researchers declined to share data; justifications were recorded. The median time from first request to accessing data in one study was 241 (IQR 383.3) days. IPD sources included: direct from researcher, via academic trial funders repository and a website requiring remote analysis of commercial data. Where data were shared, a variety of methods were used to transfer data.
Conclusion Sharing of IPD from trials is desirable and a requirement of many funding bodies. However, accessing IPD faces multiple challenges including refusals to share, delays in access to data and having to perform analyses on a remote website.
Trial registration Not applicable.
- statistics & research methods
- vascular medicine
- delirium and cognitive disorders
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
Strengths and limitations of this study
Both studies identified a large number of eligible trials across two separate disease areas, resulting in a large number of researchers being contacted.
Trials included had academic and commercial sponsors, no restriction on country of origin and covered a wide timeframe (1985 to 2017).
Results are likely to reflect a common experience in data sharing in vascular and dementia research.
A limitation of this work is that only email or letter was used to contact authors due to available resources.
It was not possible to survey non-responders to data requests, so any potential reasons for this should be considered as speculation.
The purpose of clinical research is to improve treatment and outcomes for patients, but also to do so ethically and efficiently, with the minimum possible number of patients subjected to experimental treatments in trials. Sharing of individual patient data (IPD) can allow multiple questions to be answered using the same data, thus preventing the unnecessary exposure of further patients to experimental treatments. There is an increasing expectation that researchers should share trial data when requested. Funding bodies such as the US National Institutes of Health and UK Medical Research Council1 2 now require researchers to agree to share data before funding is granted. Further, the International Committee of Medical Journal Editors3 also now require researchers to indicate their intention to share data in order for the trial paper to be published, as promoted by journals such as the British Medical Journal and Public Library of Science (PLoS) journals.4 5 With respect to commercial trials, some sponsors openly share their trial data.6–8 However, even with these policies in place, investigators may still face challenges in accessing IPD.9 In a survey of academics who were corresponding authors for clinical trials, 88% said that they were in favour of data sharing, with 75% believing that it should be mandatory.10 However, only 18% of the researchers surveyed said their funder required them to share their data and of these, only 57% had actually shared data.10
The Stroke Trials Unit at the University of Nottingham has coordinated several studies that depend on pooled analyses of clinical or preclinical IPD: Optimising the Analysis of Acute Stroke Trials (OAST) collaboration,11–14 Dipyridamole In Stroke Collaboration,15 16 Community Occupational Therapy for Stroke,17 NXY-059 for experimental stroke18 and Progesterone for experimental stroke.19 None of these projects could obtain IPD for all included trials. Two further studies are ongoing: Optimising the analysis of cognition collaboration (OA-Cog) and the Optimising the analysis of vascular prevention trials collaboration (OA-Prevention). These studies aim to identify the most efficient statistical techniques for analysing either cognition or vascular prevention data, respectively. Their analyses will be performed using IPD from randomised controlled trials involving patients with, or at risk of developing, either cognitive impairment or dementia, or vascular events such as stroke or myocardial infarction.
We report our experience of identifying and contacting chief investigators, and obtaining and analysing trial IPD for the OA-Cog and OA-Prevention studies.
OA-Cog and OA-Prevention
Both studies finalised protocols and received research ethics committee waivers prior to commencing the data sharing phase. Strategies for identifying relevant trials and requesting IPD were similar and followed those used previously (OAST study).11–14 First, relevant trials were identified through searches of the Cochrane Database of Systematic Reviews and PubMed using specific keywords related to the particular study (eg, OA-Cog ‘Dementia’ and ‘Alzheimer’s disease’; OA-Prevention ‘Stroke prevention’). The resulting outputs from these searches were then evaluated for relevance by PS and LJW. Detailed search strategies are not essential for this paper and will be reported as part of the individual studies.
Trials were eligible for inclusion if the primary analysis was statistically significant with a p value<0.05, showing either benefit or harm, or were included in a systematic review showing statistically significant benefit or harm overall. Trials were excluded if they were not randomised, were neutral (p≥0.05) within a neutral review, did not collect outcome data required in the studies (for OA-Cog this was global cognition score and for OA-Prevention severity of vascular event data) or met at least two of the following conditions: <50 participants; published before 1995; full publication of the primary analysis was unavailable.
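The eligibility rules above can be sketched as a single predicate. This is an illustrative reading of the criteria, not the code used in the studies; the `Trial` record and all of its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    # All field names are hypothetical, chosen for illustration only.
    randomised: bool
    primary_p_value: float            # p value of the trial's primary analysis
    in_significant_review: bool       # included in a review with a significant overall effect
    has_required_outcome: bool        # eg, global cognition score or vascular event severity
    participants: int
    publication_year: int
    full_publication_available: bool


def is_eligible(trial: Trial) -> bool:
    """Apply the inclusion and exclusion criteria described in the text."""
    if not trial.randomised or not trial.has_required_outcome:
        return False
    # Neutral trials (p >= 0.05) are kept only if they sit in a significant review.
    if trial.primary_p_value >= 0.05 and not trial.in_significant_review:
        return False
    # Exclude when at least two of the three minor conditions are met.
    minor_conditions = sum([
        trial.participants < 50,
        trial.publication_year < 1995,
        not trial.full_publication_available,
    ])
    return minor_conditions < 2
```

Under this reading, a small trial published in 1990 is excluded, whereas a large trial from the same year with a full publication remains eligible because it meets only one of the three minor conditions.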
Once a trial was deemed eligible, the lead researcher or corresponding author was contacted by email, letter or via an online application form. If an email address was not provided in the trial publication or the email address was no longer active, attempts were then made to identify current contact information. Methods included internet searches for more recent publications by the same author, or for other authors from the primary publication, or by contacting the researcher’s institution as listed in the publication. If no response was received within 4 weeks, a reminder email was sent to the same email address. Reminders were sent up to four times with a minimum of 4 weeks between each reminder.
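The reminder schedule described above can be illustrated with a short sketch. It assumes each reminder was sent exactly 4 weeks after the previous contact, which is the earliest the stated minimum allows; the function name and the example date are hypothetical.

```python
from datetime import date, timedelta

MAX_REMINDERS = 4                     # reminders were sent up to four times
MIN_INTERVAL = timedelta(weeks=4)     # minimum gap between contacts


def earliest_reminder_dates(first_contact: date) -> list[date]:
    """Return the earliest dates on which each reminder could be sent."""
    return [first_contact + MIN_INTERVAL * n for n in range(1, MAX_REMINDERS + 1)]


# Hypothetical first contact on 2 January 2017.
schedule = earliest_reminder_dates(date(2017, 1, 2))
```

On this schedule, a researcher who never replies would receive the final reminder no earlier than 16 weeks after the first request.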
Researchers were requested to share data on: patient demographics (eg, age, sex), medical history (eg, history of hypertension), baseline clinical data (eg, blood pressure, stroke type, functional status), outcome measures (eg, global cognition scale score, quality of life, vascular events and event dates) and trial characteristics (eg, randomisation group, length of follow-up). Once received, the IPD were combined into one master database with each trial given a unique identifier.
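As a minimal sketch of how IPD from several trials might be combined into one master database, the example below renames each trial's columns to a common set and tags every row with a trial identifier. The standard column names, the per-trial column maps and the example rows are all hypothetical and do not reflect the studies' actual data structure.

```python
# Hypothetical standard variable names for the master database.
STANDARD_COLUMNS = ["trial_id", "patient_id", "age", "sex", "treatment_group"]


def harmonise(trial_id, rows, column_map):
    """Rename one trial's columns to the standard names and tag rows with trial_id."""
    harmonised = []
    for row in rows:
        record = {name: None for name in STANDARD_COLUMNS}  # missing variables stay None
        record["trial_id"] = trial_id
        for source_name, standard_name in column_map.items():
            if source_name in row:
                record[standard_name] = row[source_name]
        harmonised.append(record)
    return harmonised


def build_master(trials):
    """trials maps a unique trial identifier to a (rows, column_map) pair."""
    master = []
    for trial_id, (rows, column_map) in trials.items():
        master.extend(harmonise(trial_id, rows, column_map))
    return master


# Two made-up trials with different variable names; the second lacks allocation.
master = build_master({
    "T001": ([{"id": 1, "AGE": 71, "SEX": "F", "arm": "active"}],
             {"id": "patient_id", "AGE": "age", "SEX": "sex", "arm": "treatment_group"}),
    "T002": ([{"pid": 9, "alter": 65, "geschlecht": "M"}],
             {"pid": "patient_id", "alter": "age", "geschlecht": "sex"}),
})
```

Keeping an explicit column map per trial makes each renaming decision visible, which matters when a dataset arrives without a data dictionary and variable meanings have to be confirmed with the data holder.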
Data sharing mechanisms
Multiple mechanisms exist for sharing data. IPD may be published within the main paper (only relevant if the trial has a small sample size), published directly on the web with open access, submitted by the lead researcher directly as a data file either by email or via USB/CD (typically with encryption to protect confidentiality), accessed through an academic collaboration data repository or on a website hosting both the trial data and statistical packages so that analyses have to be performed on the remote server.
Outcomes and statistical analyses
The main outcome for this analysis was receipt of IPD for inclusion in the OA-Cog and OA-Prevention studies. Other outcomes included response (positive or negative) and non-response rates. We compared the following trial characteristics between those that did or did not make IPD available: year of publication, sample size and sponsor type. We also collected reasons for not sharing IPD, where given. For OA-Prevention only, we collected data regarding time from initial contact date to receipt of trial data. Data are tabulated and shown as number and per cent or median with 25th and 75th centiles.
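The two summary formats named above can be sketched with the Python standard library. The quartile method used here (`method="inclusive"`) is an assumption, as different statistical packages compute centiles slightly differently, and the example values are made up.

```python
import statistics


def n_percent(count, total):
    """Format a count as 'n (p%)' with the percentage rounded to a whole number."""
    return f"{count} ({round(100 * count / total)}%)"


def median_with_centiles(values):
    """Return (median, 25th centile, 75th centile) for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return median, q1, q3


# Made-up times to data receipt, in days.
example_days = [10, 20, 30, 40, 50]
summary = median_with_centiles(example_days)
positive = n_percent(92, 313)   # counts taken from the results reported in this paper
```

Reporting the 25th and 75th centiles rather than a single IQR width shows where the middle half of the values lies, which is informative when times to data receipt are skewed.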
Identifying eligible trials
Of 391 trials identified as being relevant to the OA-Cog or OA-Prevention studies, an active email address for the lead researcher or a corresponding author was provided in the primary publication for 194 (50%, figure 1) and was either not given or inactive for 197 trials. Of the 197 trials, different active email addresses were identified for 119 from secondary publications, other authors, the lead researcher’s institution or web searches. No email address was found for researchers of 78 trials. Invitation emails were sent to the lead researcher of the resulting 313 trials. Thirteen trials were eligible for both studies; the lead researcher was contacted using a joint email for two of these trials.
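The counts in this paragraph can be checked with a few lines of arithmetic. The function and key names below are hypothetical; the numbers are those reported above.

```python
def contact_flow(identified, email_in_publication, found_elsewhere):
    """Derive the remaining counts in the contact flow from the three reported inputs."""
    no_usable_email = identified - email_in_publication
    never_found = no_usable_email - found_elsewhere
    invited = email_in_publication + found_elsewhere
    return {
        "no_usable_email": no_usable_email,
        "never_found": never_found,
        "invited": invited,
        "invited_percent": round(100 * invited / identified),
    }


flow = contact_flow(identified=391, email_in_publication=194, found_elsewhere=119)
```

The derived values match the text: 197 trials without a usable address in the primary publication, 78 for which no address was ever found, and 313 invitations sent (80% of the 391 identified trials).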
Of 313 invitations sent, we received positive initial responses from 92 researchers (29%), from whom 78 trial datasets were obtained. In spite of further reminders, 14 researchers did not follow through on their initial willingness to share data. One hundred and forty-nine (48%) researchers did not respond despite evidence that the email address was still in use and multiple reminders being sent. Eighty-six (27%) researchers declined to share their data (table 1); reasons included ‘operational constraints’, ‘non-availability of data’ and non-availability of the researcher (death, retirement, moved into another scientific field). Of the 13 trials whose researchers were contacted for both studies, researchers for four responded positively and shared their data, researchers for four did not respond to either study and the remaining five agreed to share data with one study but not the other.
The median time from sending the first email to successfully receiving data in the OA-Prevention study was 242 (25th, 75th centile: 21.5, 405) days. This varied between the different methods for sharing data and data source (commercial website/lead researcher/academic repository; figures 2 and 3). Time from initial contact to receiving a copy of data was shorter if sent via email or USB (median 86 (25th, 75th centile: 20, 405) days), but was longer if shared via an online platform (median 336 (25th, 75th centile: 315, 357) days) (figure 2). If the data source was a commercial website, the median time from first request to data being shared was 336 (25th, 75th centile: 273, 336) days; for a lead researcher it was 169 (25th, 75th centile: 6.5, 495.75) days and for an academic repository 23 (25th, 75th centile: 20, 221.5) days (figure 3).
Characteristics of trials received versus not received
In a comparison between positive, negative and non-responders, there was evidence of a difference in year of study publication (p=0.027). Non-responses tended to be from newer studies (median (25th, 75th centile): 2005 (2000, 2008)) compared with positive (2003 (1999, 2007)) and negative responses (2002 (1999, 2005)). However, there was no evidence to suggest an association between year of study publication and whether IPD were eventually shared (data shared median year 2003 (1999, 2007) vs data not shared 2004 (2000, 2008), p=0.29). Median (25th, 75th centile) sample size for trials where data were shared (969 (349, 4444)) was larger than those not shared (344 (96, 1405), p=0.0002). However, there was no strong evidence that study size was associated with positive (973 (349, 4444)), negative (453.5 (157, 1101)) or no response (286 (68, 1439), p=0.061) from study authors. Trials sponsored by the pharmaceutical industry were less likely to be shared compared with those sponsored by an academic institution or another source (17% vs 46%, p<0.0001).
Quality of data shared
The quality of the data shared with the two studies was mixed. Both studies found that on more than one occasion data were sent missing key variables (eg, randomisation allocation) that had been requested in the original application. This meant that further contact with the lead researcher was required to confirm why these data were missing and whether they were available for sharing. This was time-consuming and also led to one trial being excluded as it could not be analysed. Occasionally, data were also sent with the variable headings and codes written in another language, without a data dictionary. This made it harder to identify the variables needed for the studies and increased the time it took to merge with the other IPD. It is also possible that this may have led to incorrect assumptions being made when converting the data into a standard format. Another approach used was to share all the trial’s data in its raw format. If shared with a well-written data dictionary, this was not a problem; however, this was not always the case. One message that both projects discerned from this process is that even though inclusion of a well-written and linked data dictionary is key in data sharing, this is variably done in practice.
Mechanisms of data sharing
The various mechanisms available for sharing data and the advantages and disadvantages associated with them are given in table 2. In some cases, data are shared directly with, and analysed by, the researcher. A number of pharmaceutical companies have committed to sharing their data via hosted servers: Clinical Study Data Request data repository,20 Yale University Open Data Access project.21 In contrast to other sharing methodologies, analyses must be performed on the host server using specified software (SAS or Stata).
IPD in main publication
IPD may be published as part of the primary publication although this is only feasible for small trials. For example, data from Graf’s trial22 were extracted from the publication and reformatted into an analysable dataset.
Published data on the web
Data from completed trials may be published online so that they can be accessed free of charge, with no application to the data holder required. Data from the IST trial have been published in this way along with a data dictionary, which describes the dataset, including variable names and codes.23 The trial dataset was downloaded directly from the supplementary material of the publication and was immediately ready to analyse.
Researcher provides data on request
Researchers may store data at their institution and oversee the data sharing process each time there is a request. Typically, this required an email to the lead researcher, including details about the study. If agreed, transfer of data was made under a data sharing agreement with signature by both the data holder and the user. The researcher then provided the data in an analysable format such as an Excel spreadsheet or SPSS data file. In most cases, a description of the dataset was provided along with the data. Researchers were usually willing and able to provide answers to follow-up queries.
Purchase of a CD of data
IPD may be on a CD/DVD that can be bought on-line. The data from the National Institute of Neurological Disorders and Stroke (NINDS) alteplase trial24 were shared by this method. Once the data holder had agreed to share their data, the encrypted CD was sent by recorded post.
Academic collaboration data repository
A number of academic collaborations exist to facilitate data sharing of randomised controlled trials in a particular disease area. The collaboration is approached via email and an on-line application form; the form requests details of the study’s hypothesis, analysis methods, which trials are required, and contact details of those wishing to have access to the data. The application is reviewed by the collaboration’s steering committee and, following approval, the data are provided in an analysable format. Examples include the NINDS25 and NHLBI26 repositories and the Alzheimer’s Disease Collaborative Study (ADCS).27 Some repositories include IPD without randomisation details so that on-treatment analyses may not be possible, for example, VISTA.28
A number of pharmaceutical companies share their data via externally hosted servers: Clinical Study Data Request data repository (www.clinicalstudydatarequest.com),20 Yale University Open Data Access project (https://yoda.yale.edu).21 Examples include the PRoFESS and JASAP trials. Access is requested via an on-line application form, which requires information on the study’s overview, rationale, aims and analysis methods, plans for dissemination of results and names of those wishing to access data. Once approved, a three-way contract between the commercial site, the company that owns the data and the institution wishing to obtain the data is required. The process of negotiating the contract took several months, after which access to the remote analysis platform was granted. Some time was needed to set up access to the system and for the data user to become familiar with it. Statistical analysis code (written using SAS V.9.3 or V.9.4, or Stata V.15) was uploaded and run on the remote server. Download of results (typically point estimates, CIs and significance values) had to be approved by the data-owning company (presumably to prevent a direct export of the IPD). Approval could take a few hours to a few days, after which the results could be exported. During the studies, one website (https://yoda.yale.edu) changed the host statistical software from SAS to Stata, requiring the statistical analysis code to be rewritten. In the case of another website (www.clinicalstudydatarequest.com), two of the pharmaceutical companies that had agreed to share data with one of the studies decided to withdraw all data from the repository, causing issues with data access and requiring revision of the original contracts.
The ongoing OA-Cog and OA-Prevention studies require data from completed trials to be shared. Despite difficulties in accessing data, there were researchers who were enthusiastic and forthcoming with support for our studies and with sharing their data for 78 trials. There are now several different mechanisms for data sharing available to researchers, a number of which were utilised to share data with our studies. Unfortunately, there was a high non-response rate with 149 (48%) lead researchers not responding to an email invitation despite multiple attempts to make contact. A large number of researchers also declined to share the data from their completed trial with the studies.
The difficulties we experienced have been encountered by others conducting research projects that require IPD from completed trials to be shared. For instance, Fleetcroft et al9 reported that of 30 eligible trials, 6 did not have available contact information for the lead researcher. Of the remaining 24, 18 (75.0%) did not respond and 5 (20.8%) declined to share their data. This could indicate that data sharing is not a high priority for many researchers. It is now expected that researchers should make available a copy of their published trial data when requested. Journals, such as the BMJ,4 now require researchers to make their trial data available for sharing in order for the primary results to be published. In addition, some also require that authors include a data availability statement within the final publication (PLoS5). Additionally, funding bodies such as the US National Institutes of Health and UK Medical Research Council1 2 also require researchers to agree to share their data within a given timeframe of completion of the trial before funding is granted. Such policies will help with future trials, but do not incentivise researchers of older trials to share their data.
Non-response from lead researchers was a major barrier to accessing IPD from trials. It may help that journals now require a contact email address with every publication. However, this does not solve the problem, as these studies have discovered. While the general consensus is that IPD from trials should be made available for sharing, in reality requests for data are often missed or ignored. It is important that researchers are able to access older data. If the lead researcher moves on, it should be the responsibility of the institution to continue support for potential data sharing. Ideally, it would be mandatory for the data custodian’s contact details to be made available in the primary publication and possibly in the trial’s registry information. Lack of resources for preparing data for sharing (table 1) is a common reason not to share IPD, as also seen in a survey of trialists.10 Some methods for sharing IPD require few resources for the data owner once the initial effort of preparing the data for sharing is done, for example, when utilising commercial websites and academic trial repositories. Storing the data within a repository can also take the burden of approving a request away from the lead researcher. However, this can be costly and may not be appropriate for all trialists. Resources should be set aside to enable proper archiving and sharing of data to take place once a trial is completed. To help trialists, it would be beneficial for a universal set of standards for data preparation, storage and sharing to be developed and promoted by organisations that mandate data sharing. Another concern many researchers have is that data from their study may not be used appropriately. It is the responsibility of the requester to provide full details regarding why the trial data are required; how they will be used and stored; how they will be published and whether the trialists will be co-authors; and how and when they will be destroyed.
A formal data sharing agreement/contract should be used for data security purposes. Patients may also have concerns about how their data will be used in the future. It is vital that consent from patients includes information about whether their data may be shared with other researchers and in what context. The MRC have published guidance for trialists on data sharing and informed patient consent.20
Strengths and limitations
A strength of this research is that both studies identified a large number of eligible trials across two separate disease areas, resulting in a large number of researchers being contacted. In addition, the requested trials had both academic and commercial sponsors, no restriction on country of origin and covered a wide timeframe (1985 to 2017); therefore, our results are likely to reflect a common experience in data sharing in vascular and dementia research. Limitations include using only email or letter to contact authors due to available resources. Telephone or face-to-face contact methods were not utilised and might have resulted in a different outcome for some trials. In addition, it was not possible to survey non-responders to data requests, so any potential reasons for non-response should be considered as speculation. This is unfortunate, as it means that the true barriers to data sharing are not yet fully known.
In summary, there are multiple hurdles that still need to be overcome in order to facilitate more collaboration and sharing between researchers. Lack of response from researchers and unwillingness to share data make pooling studies such as these difficult to deliver in practice. Based on our experiences, data sharing still appears to depend on the enthusiasm of the lead researcher, despite it being widely agreed that IPD should be more easily available. This troubling trend means not only that the results of trials cannot be checked by an independent researcher but also that further patients can be subjected to experimental treatments when the data are already available. Patients have taken a risk in volunteering their time to participate in research and as researchers we owe it to them to gain as much knowledge as possible from the data they have provided. If the patient has consented to further use of their data, then the researcher has a responsibility to ensure this is done. However, whether expectations by funders and journals for investigators and authors to share data will improve this situation remains to be seen.
PS and LJW are joint first authors.
Contributors All authors are responsible for the design of the two studies. PS and LJW conducted the analyses and drafted the manuscript. PMB and AAM commented on the analyses and drafts of this paper and have seen and approved the final version.
Funding LW was funded in part by UK MRC ENOS (G0501797) and NIHR TARDIS (10/104/24). PS was funded in part by UK MRC ENOS (G0501797) and the Alzheimer’s Society and Stroke Association PODCAST (TSA 2008/09).
Competing interests PMB is Stroke Association Professor of Stroke Medicine and is an NIHR Senior Investigator. PMB was Chief Investigator for the ENOS, TARDIS and PODCAST trials, which were shared with the two studies.
Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.
Patient consent for publication Not required.
Provenance and peer review Not commissioned; externally peer reviewed.
Data availability statement Data sharing not applicable as no datasets generated and/or analysed for this study. No additional data available.