Abstract
Objective To determine data sharing and number of publications coming from research databases that have been given a favourable opinion by UK National Health Service (NHS) Research Ethics Committees (RECs).
Design Cohort study.
Inclusion criteria & setting All research databases listed on the UK Health Research Authority’s Assessment Review Portal (HARP) that had received a favourable ethics opinion as of January 2018.
Main outcome measures Publications and data access requests are either listed on HARP or notified through subsequent email correspondence.
Results Out of 354 eligible databases, 34% had granted access requests and 40% had produced at least one peer-reviewed paper or conference abstract/talk. We could not establish contact with 9% of databases, and 19% reported no access requests or publications. Only 9% of databases were up to date with all annual reports. Email responses from database owners showed a range of attitudes towards data sharing.
Conclusion Less than half of research databases that have received a favourable opinion from NHS research ethics committees share their data and produce publications. There is also considerable variability in the operation of research databases and understanding of the purpose of research databases. This work was hampered by incomplete records due mainly to researchers not submitting annual reports.
- qualitative research
- ethics (see medical ethics)
- audit
- information management
- statistics & research methods
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
Strengths and limitations of this study
By using the UK Health Research Authority’s (HRA) Assessment Review Portal (HARP) database, we were able to identify all research databases using UK National Health Service data that were registered with the HRA as of January 2018.
We were able to identify both publications and access requests for the majority of databases.
We identified numerous incomplete records in HARP.
Research teams were not consistent in their definition of a research database, and it is likely that many relevant databases may not be registered with the Health Research Authority.
Introduction
As data analysis processes continue to evolve, research databases represent increasingly important resources within healthcare research, yet there is evidence that they are currently underused.1 In the UK, a research database is defined as:
…a structured collection of individual-level personal information, which is stored for potential research purposes beyond the life of a specific research project with defined endpoints. Research purposes in this context refers to analysis of data to answer research questions in multiple projects.2
The Health Research Authority (HRA) is the administrative body that convenes and organises research ethics committees (RECs) authorised to review studies involving human participants that take place within the National Health Service (NHS), as well as studies that fall under certain legislation.3 Although most of the HRA’s functions apply to research undertaken in England, its role of coordinating policy and managing the Integrated Research Application System (IRAS) gives it close links to the other devolved UK nations (Scotland, Wales and Northern Ireland) including access to records for audit and service improvement purposes. Through IRAS, the HRA flags research database applications and provides a specific question set for researchers wishing to have their arrangements for collection, storage and use of data reviewed (including arrangements for the release of non-identifiable data for analysis by external researchers). This requirement is outlined in the UK-wide Governance Arrangements for Research Ethics Committees (GAfREC) policy whereby the:
…collection of personal information from past or present users of health or social care services, or use of previously collected information from which individual users of these services could be identified, either directly from that information or from its combination with other information in, or likely to come into, the possession of someone to whom the information is made available always requires an ethics review, however the review of more generalised database projects by ethics committees:
… may have benefits by facilitating programmes of research using information on human subjects without a need for specific project-based applications. Applicants may seek generic ethical approval extending to specific projects undertaken using the data, subject to conditions agreed with the REC.4
Consequently, in the UK, research databases differ from other types of research projects in that they are normally intended to be used multiple times, over a longer period of time, and perhaps by different research teams wishing to test a variety of hypotheses. When reviewing research databases, RECs consider the access arrangements being made available to researchers wishing to interrogate the database, including arrangements for subsequent publication of research results. Indeed, this means there is an implicit assumption that research databases will be used to generate many more publications than a normal research project.
In order to test this assumption, and to benchmark UK performance with other national studies,1 the HRA invited us to audit UK research database applications made through the IRAS. This request formed part of the wider ‘transparency agenda’ being pursued as a statutory duty by the HRA, but further encouraged by organisations such as the AllTrials campaign5 and the REWARD Alliance.6 A previous audit by the HRA showed that only one-third of regular projects reviewed by RECs publish their results,7 raising a subsequent concern that research database projects may also be underperforming in terms of publishing outputs.
Methods
The initial inclusion criterion for this audit was projects flagged as research databases on the HRA Assessment Review Portal (HARP) as of 1 January 2018. The number of eligible databases was then reduced using the following criteria:
Favourable ethics opinion.
Not a duplicate record or renewal request.
Not a Welsh application.
A Microsoft Access Database was created with an entry for each research database, and information contained on HARP, along with any uploaded annual or final reports, was used to populate the database fields listed in the online supplemental table 1. Following the creation of the Access Database, primary contacts for all research databases were emailed (using the text in the online supplemental table 2) and asked to disclose the number of access requests and publications. Responses to this initial email were used to complete or update fields in the Access Database. Second emails were sent 5 weeks later to those who had not responded to the first email. A third and final email was sent a further 6 weeks later (11 weeks after the initial email) to those who had not responded to the first two emails. Emails were loaded into NVivo8 and a content analysis was conducted by two investigators who subsequently discussed and agreed on consensus categories.
Where conflicting information on a research database was noted from annual reports and subsequent email responses, the information from the email response was considered more up to date. The annual report template form was modified in 2011 adding a number of new fields, although some researchers continued using the older version of the form after this date. Reports on the old form did not contain all the information required for this audit leading to missing categories for some research database records.
Results
A total of 453 research databases were initially identified, but then reduced to 354 eligible databases after excluding 4 duplicates, Welsh databases (because only titles and reference numbers were included in HARP with no point of contact), and 90 HARP entries that were renewals of previous applications. These latter entries were difficult to initially identify as the titles and chief investigators were often not identical to the original studies. Many of these duplicates/renewals were only identified following email contact with researchers who complained that they had received two emails for the same database. Once identified, all duplicate applications were combined, and renewal applications were combined with their parent (initial) application, but the start date of the initial application was retained. The final list of 354 unique research databases had initial application dates ranging from May 2002 (when the first electronic records were compiled) to December 2017. The combination of data obtained from HARP and information obtained from annual and final reports was sufficient to fully populate the Access Database fields in 60 (17%) cases. Even following the three email contacts, complete records were only obtained for 223 (63%) of the research databases. Forty-four (12%) invalid email addresses were identified following the first email to primary contacts, and when the second contact was subsequently used, only 11 further responses were received. This left 33 (9%) databases that we were unable to contact. A few responses were received from individuals no longer involved with the research databases who provided updated contact details due to personnel changes.
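The exclusion and contact figures above can be reconciled with simple arithmetic. The following is a minimal sketch in Python that uses only the counts reported in this section; the variable names are illustrative, and the number of Welsh databases is not stated in the text but is implied by the other figures.

```python
# Counts as reported in the Results text.
initially_identified = 453
eligible = 354
duplicates = 4
renewals = 90

# The text does not give the number of Welsh databases excluded;
# it is implied by the difference between the other figures.
excluded_total = initially_identified - eligible
welsh_implied = excluded_total - duplicates - renewals

# Contact attempts: invalid addresses, minus those reached via a second contact.
invalid_first_contact = 44
reached_via_second_contact = 11
uncontactable = invalid_first_contact - reached_via_second_contact


def pct(n, total=eligible):
    """Percentage of the eligible databases, rounded to a whole number."""
    return round(100 * n / total)


print("excluded in total:", excluded_total)
print("implied Welsh databases:", welsh_implied)
print("uncontactable:", uncontactable, f"({pct(uncontactable)}%)")
print("complete from HARP alone:", f"{pct(60)}%")
print("complete after email contact:", f"{pct(223)}%")
```

Under these figures the reported percentages (17%, 63%, 12% and 9%) all agree with a denominator of 354 eligible databases.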
Annual and final reports
The HRA stipulates that approved research databases submit annual reports for the duration that the database is collecting data and a final report if the database is closed.4 Figure 1 shows a summary of annual reports that had been uploaded to HARP prior to contacting researchers by email. Fifty-four (15%) research databases were less than a year old (meaning no annual report was yet due), and 108 (31%) had at least one annual report on file. This left 192 (54%) research databases with no annual report on file despite these being due (39% with none filed plus 15% noted as ‘received not filed’). Thirteen research databases dated prior to 2012 had no information on whether any reports had been received or requested. HARP did contain evidence (in the form of reminder letters held on file) that annual reports had on occasion been asked for, but such chasing emails/letters had not been sent or recorded in a systematic manner. Similarly, there were 54 research databases where an annual report was noted as ‘received not filed’. Here it seemed that a letter acknowledging receipt of an annual report was filed on HARP but no electronic report was present; such reports may have been reviewed by the REC in hard copy but not subsequently scanned and added to HARP. Of the 108 research databases with annual reports, only 32 (9% of all research databases in this study) were up to date with all reports.
Figure 1 Annual reports contained on HARP.
Most research databases did not have completion dates and thus were open ended. Final reports were present for 16 (5%) research databases and, following email contact, a further 4 (1%) research databases stated that they had closed. It is impossible to determine how many of the 33 research databases without valid contact details were now closed and thus due a final report.
Amendments
Amendments are different from annual reports as they can be submitted at any time and normally notify a change of methodology or a significant event. One hundred and ten (31%) of the research databases had at least one amendment recorded on HARP. Changes to database paperwork (such as version numbers, additional posters or advertising materials, changes of job title and so on) were the most common reason for an amendment, followed by modifications of inclusion criteria, additional data linkages or the inclusion of new participant groups. Other less frequent amendments included changes in personnel, changes in process (different data capture methods or procedures), changes to location of the database and addition of new sites. No research databases reported any serious data breach.
Data access requests
The number of data access requests was known for 245 (69%) of the research databases. Of these, 123 (35% of total) reported no access requests, so all the access requests came from only 122 (34% of total). Although the mean number of requests from these was 7.9, this was skewed by two outliers with 237 and 142 requests, respectively. Of the 1948 access requests in total, 1818 (93% of access requests) were granted. There were 52 requests noted as ‘pending consideration’ and 2 ‘withdrawn’. As over 90% of access requests were granted overall, we considered the ‘pending consideration’ requests as granted and the ‘withdrawn’ as not granted. Data summarising access requests and requests granted are presented in figure 2.
Figure 2 Data access requests received and granted grouped by frequency. Note, the ‘access requests granted’ columns are sometimes higher than the ‘access requests received’ columns because databases receiving multiple access requests did not always grant all of them.
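The treatment of ‘pending consideration’ requests affects the grant rate only slightly. The text does not say whether the 1818 granted requests already include the 52 pending ones, so the sketch below (illustrative variable names, counts taken from the paragraph above) simply computes the rate both ways as a sensitivity check rather than as a restatement of the authors’ calculation.

```python
# Counts as reported in the text.
total_requests = 1948
granted_reported = 1818
pending = 52
withdrawn = 2


def rate(n, total=total_requests):
    """Percentage of all access requests, to one decimal place."""
    return round(100 * n / total, 1)


# Reading 1: the 1818 figure already counts pending requests as granted.
rate_as_reported = rate(granted_reported)

# Reading 2 (assumption): pending requests are additional to the 1818.
rate_if_pending_added = rate(granted_reported + pending)

print("grant rate as reported:", rate_as_reported)
print("grant rate if pending are added:", rate_if_pending_added)
```

Under either reading the rate stays above 90%, so the conclusion that most requests were granted does not depend on how the pending requests were handled.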
Publications resulting from research databases
The publication status was determined for 230 (65%) of the research databases. ‘Publication’ was defined to include presentations, conference abstracts and articles submitted for publication in professional journals. Eighty-eight (25%) reported no publication, with 142 (40%) declaring a total of 1868 publications. This gave a mean number of publications for all research databases with known publication status of 8.1, but this average is skewed by one major outlier with 315 publications, and a further two with over 80 publications. Thirty-one (9% of total) research databases had only one publication. Distribution of the number of publications coming from the research databases is shown in figure 3.
Figure 3 Distribution of publications coming from the research databases.
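The effect of the skew on the mean can be illustrated from the reported counts alone. The sketch below is not the authors’ analysis: the variable names are illustrative, the per-database counts are not available, and a placeholder value of 2 stands in for every database with two or more publications. The median is still recoverable because it depends only on how many databases reported zero or exactly one publication.

```python
from statistics import median

# Counts as reported in the text.
known_status = 230
no_publication = 88
exactly_one = 31
total_publications = 1868
largest_outlier = 315

with_publication = known_status - no_publication

# Means under three different denominators.
mean_all_known = total_publications / known_status
mean_publishing_only = total_publications / with_publication
mean_without_outlier = (total_publications - largest_outlier) / (known_status - 1)

# Placeholder: the true counts for databases with two or more publications
# are unknown; the value 2 only marks them as "greater than one".
two_or_more = known_status - no_publication - exactly_one
counts = [0] * no_publication + [1] * exactly_one + [2] * two_or_more
median_known = median(counts)

print("mean, all with known status:", round(mean_all_known, 1))
print("mean, publishing databases only:", round(mean_publishing_only, 1))
print("mean, largest outlier removed:", round(mean_without_outlier, 1))
print("median, all with known status:", median_known)
```

Because 119 of the 230 databases with known status reported zero or one publication, the median is 1, which describes the typical database far better than the mean of 8.1.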
Age of research databases
Previous research looking at publication rates of projects reviewed by HRA RECs indicated that most projects take at least 4 years before a resulting publication in a peer-reviewed journal is produced.7 It might also be expected that the older a research database is the more likely it will be for other researchers to know about it and thus make a data access request. Here, the number of research databases approved per year is shown in figure 4, although it should be noted that some databases may have been in operation prior to the HRA application date. The mean age of all the research databases was 4.7 years, whereas the mean age of research databases with at least one publication was 5.8 years. Interestingly this compares with the mean age of a research database with at least one access request being 6.5 years. Figure 5 shows the total number of publications and access requests granted by the age of the database.
Figure 4 Number and year of research database applications. HARP, Health Research Authority’s Assessment Review Portal.
Figure 5 Cumulative numbers of publications (grey line) and access requests (black line) granted for all databases by age of databases (eg, at 5 years there are 641 access requests and 828 publications for all databases 5 years old and younger).
Relationship between response to the audit, data access, publication and age
MedCalc9 was used to calculate ORs. There was a strong negative relationship between registration prior to 2012 and response to the audit (OR=0.52; p=0.005; 95% CI 0.32–0.82). There was no significant relationship between age and publication status (OR=1.27; p=0.27; 95% CI 0.82–1.97). As previous evidence suggests publication becomes more likely after 4 years,7 10 we looked to see if a similar pattern emerged here by splitting the data into research databases younger and older than 4 years but did not find any significant relationship (OR=1.28; p=0.26; 95% CI 0.83–1.99). However, research databases with at least one data access request granted were significantly more likely to report at least one publication (OR=13.77; p<0.0001; 95% CI 7.75–24.45). Out of the 354 research databases, 18 made some mention of patient or participant involvement (PPI) in their annual reports. This was strongly associated with having at least one publication or data request or both (OR=18.7; p<0.005; 95% CI 2.46–142.12).
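For readers who wish to reproduce this kind of calculation, an odds ratio and its confidence interval can be obtained from a 2×2 table with the standard log (Woolf) method. The text states only that MedCalc was used, so the choice of method here is an assumption, and the cell counts in the example are hypothetical because the underlying 2×2 tables are not reported. The second part of the sketch checks that each reported OR lies close to the geometric mean of its interval bounds, which is what a log-scale interval implies.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def odds_ratio(a, b, c, d, z=Z_95):
    """Return (OR, lower, upper, p) for the 2x2 table [[a, b], [c, d]].

    Log (Woolf) method: SE of ln(OR) = sqrt(1/a + 1/b + 1/c + 1/d).
    All four cell counts must be positive.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    or_value = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(or_value)
    lower = math.exp(log_or - z * se)
    upper = math.exp(log_or + z * se)
    # Two-sided p value from the normal approximation: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(log_or / se) / math.sqrt(2))
    return or_value, lower, upper, p_value


def log_symmetric(or_value, lower, upper, rel_tol=0.03):
    """True if the OR is close to the geometric mean of its interval bounds."""
    return math.isclose(math.sqrt(lower * upper), or_value, rel_tol=rel_tol)


# Hypothetical counts, NOT taken from the audit data.
example = odds_ratio(20, 10, 10, 20)
print("OR = %.2f, 95%% CI %.2f to %.2f, p = %.4f" % example)

# ORs and interval bounds as reported in the text.
reported = [
    (0.52, 0.32, 0.82),
    (1.27, 0.82, 1.97),
    (1.28, 0.83, 1.99),
    (13.77, 7.75, 24.45),
    (18.7, 2.46, 142.12),
]
all_consistent = all(log_symmetric(*row) for row in reported)
print("reported intervals are log-symmetric:", all_consistent)
```

The very wide interval for the PPI association (2.46 to 142.12) reflects the small number of databases (18) that mentioned involvement, since small cell counts inflate the standard error of ln(OR).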
Observations from correspondence with investigators
A total of 95 replies were received in response to our first email, 56 from the second, and a further 77 from the third. Responses often included comments explaining or further clarifying answers to the three questions asked. A representative sample of responses is summarised in the online supplemental information. Following coding, responses were grouped into two main categories: ‘Database access and sharing’ and ‘Database management’ as outlined in the online supplemental table 3.
Discussion
HARP is the authoritative database of all studies reviewed and given a favourable opinion by UK RECs. However, one important finding from this study was that the data contained in HARP were incomplete and in some cases inaccurate. The main reason for this was the failure by researchers to send in required reports. There was also evidence that reports had been received in hard copy, perhaps viewed by the REC, but then not subsequently scanned and filed on HARP. Although it must be acknowledged that the HRA can only populate HARP with the information it is given by researchers, this study seems to provide evidence to support the argument that more could be done by the HRA to ensure that their records are complete and accurate. Information about data access requests received, granted, and publications relating to the database, could only be obtained for 60 (17%) of research databases based purely on the information in HARP, rising to 226 (64%) following email contact with the research teams. Concerningly, we did not have valid contact details for 33 (almost 10%) of the research databases, and although we gave up after three attempts, the HRA may need to follow these up further with the research sponsor. It was interesting, although perhaps to be expected, that the older databases were statistically less likely to reply to emails. Combining the 226 where we were able to obtain the necessary data items, with the 33 that could not be contacted, we were still left with 95 (27%) databases where even following email contact, not all the data we required was gathered.
As the concept of a research database includes storing and making data available for longer periods of time, it was not surprising that only a small number had provided final reports (indicating that the database was closed or closing). The email responses that we received indicated a number of reasons for closing databases including lack of funding, failure to gather the intended information, or changes in policies/legislation/clinical practice making the research database no longer relevant. However, although there are legal restrictions on the storage of identifiable patient information (through legislation such as the European General Data Protection Regulation), concerns regarding reproducibility and the importance of ‘Open data’ 11 mean that archiving of anonymised datasets either by sponsors or perhaps through other national or international arrangements is increasingly becoming expected. Further guidance from the HRA on what to do with ‘closed’ research databases could be useful.
Despite research databases existing to store and share data, 67 (19% of total) reported that they had neither a publication nor allowed data access to other researchers. Of the remaining, 116 (33%) had granted access requests and (a mostly overlapping) 142 (40%) had produced publications (the discrepancy from 100% is due to having no information for 30%, and a smaller number with only partial information). The mean numbers of data access requests (7.9) and publications (8.1) per database (where these figures were known) could be viewed as indicating that the 30% or so of research databases that share data or publish are doing very well; however, these averages are distorted by a small number of very successful databases such as the I-DSD (International Disorders of Sex Development) Research Database with 237 granted access requests and 14 publications.12 Similarly, the Searchable Online Database for MRC UK Brain Banks Network reported 142 granted access requests and 315 publications. Another large research database, the South London and Maudsley NHS Foundation Trust Biomedical Research Centre Case Register (SLaM BRC)13 had 104 access requests granted, and although they named only a few publications, they did advise that an online search would undoubtedly find more. This suggests that for the larger databases the number of publications recorded here could be an underestimate. Interestingly, the features of these more successful research databases included long-running support from large institutions and research councils, coupled with charity and institutional funding. They also seemed to show evidence of collaborative working with many contributing sites and participant involvement initiatives.
Calculating ORs did not demonstrate a link between age and data access or publications, but an increase in publications compared with access requests for databases aged between 4 and 8 years (figure 5) supports observations from other studies7 10 14 that it takes researchers about 4 or so years to obtain and analyse results, and then produce their first publication. However, there were fewer research database applications in the 2013–2015 period (figure 4), perhaps distorting our results. ORs did, however, demonstrate a strong association between the granting of at least one access request and producing at least one publication (OR=13.77; p<0.0001; 95% CI 7.75–24.45). Interestingly, the average age of a database with at least one publication was 5.8 years, whereas the average for at least one granted access request was 6.5 years, indicating that many publications came from the database owners themselves. This may reflect the time taken to set up the database in the first place, after which the first publication makes other researchers aware of the database and prompts them to request access.
The fact that only 34% of research databases reported granting access requests, and 40% reported publications, is concerning ethically. This is especially so from the perspective of research participants who have initially given consent for their data to be included in a research database with the belief that their data would be shared widely and thus be of use to multiple projects. Although the email responses from researchers did provide some valid reasons for not sharing data or publishing papers (for instance the research database being designed as part of a feasibility study, as a prospective participant registry or concerns around the possibility of reidentifying participants if the data were combined with other information held by third party researchers), more could be done to encourage researchers to at least acknowledge the database in their other work or publications,15 and thus remain accountable to the participants who contributed.
Analysis of the email responses also indicated a certain level of confusion over what constitutes a research database. In one case, the researchers admitted that they had flagged their work as a research database in error, in another case, an application was not renewed when the research team realised that an ongoing favourable ethics opinion was not required for their specific type of study, and in other cases applications that had previously been flagged as another type of study were subsequently reflagged as research databases or vice versa. One database reported that they only chose to register as a research database to enable them to share information with a funder, and others admitted that they found it easier to apply as a research database rather than as a specific project so that they could share their data with collaborators and also use it for many different projects that had not yet been designed. Here the implication was that by calling their work a research database, it would allow them more flexibility to use their own data.
Along with incorrect flagging, other reasons given for not sharing or publishing data included a lack of resources in terms of staffing or the funding required to promote the database as a resource. Here it was interesting to note that some of the research databases with the most access requests granted did charge to cover costs, and advertised these costs along with their access arrangements via their websites.12
A number of studies justified the lack of access or publications by the amount of time required to gather enough data to make analysis worthwhile. Though this might be expected for databases within the first few years since application, some much older studies also used this excuse. This echoes evidence from elsewhere regarding no standard definition of what a reasonable time to prepare for data sharing might be,16–18 although it may also be a consequence of some extremely long-running cohort studies being included in our sample.
One promising finding from this study was the high percentage of data access requests that were granted (93%). Here it was interesting to note that some databases reported screening requests or working with people wanting to make potential requests to ensure that the requests were suitable. Others reported lengthy application processes or publicising very specific approval criteria to try and reduce the number of rejected requests.
Study limitations
The major limitation of this study was the incomplete records on the HARP database along with the absence of annual reports. Furthermore, a pragmatic decision was made to limit the questions sent to researchers in our subsequent emails rather than send a more extensive survey or questionnaire. This resulted in often ambiguous replies from researchers making it difficult to complete all the fields in our dataset. An improvement to our methodology would therefore have involved sending a formal questionnaire or data entry form, perhaps similar to the templates produced by the HRA for final and annual reports.
We also only looked at studies that had been prospectively labelled as databases. It would be interesting to determine how many other types of studies subsequently decided to establish databases as part of their open access/data sharing arrangements. This would not be a trivial task as it would involve writing to all chief investigators registered on HARP (many tens of thousands), but would potentially identify further relevant databases.
We also accepted a wide definition of the term ‘publication’ to include peer-reviewed publications, conference abstracts, posters and presentations. This was a potentially contentious decision as although peer-reviewed research papers are the ‘gold standard’ of scientific publication, there are a variety of other dissemination methods that are appropriate depending on the situation.19 For instance, the recent emphasis on ‘Patient and Public Involvement’ (PPI) has tried to encourage researchers to produce bulletins and research summaries that are lay friendly and accessible.20 Although this should not be the only way research is disseminated, it is entirely valid for the purpose of maintaining accountability with research participants. It would therefore be a valuable future piece of work to determine what ‘appropriate’ or ‘sufficient’ publication/dissemination may look like for a research database. Interestingly, the databases in our study that produced newsletters and bulletins as part of their PPI work were more likely to report publications and share their data with other researchers.
Acknowledgments
We thank Dr Janet Messer for support during this research and feedback on our initial report to the HRA.
Footnotes
Contributors SEK devised the project, arranged for funding and supervised ST. ST conducted the research and drafted the manuscript. MB supervised the qualitative elements of the research and coded the email responses in addition to ST. All authors contributed to the final manuscript.
Funding This work was funded by the Health Research Authority (#15182) in the form of tuition fees and a small stipend for ST to complete an MRes under the supervision of SEK.
Competing interests SEK is chair of the Hampshire A HRA research ethics committee, the MOD research ethics committee, and a member of the HRA’s Confidentiality Advisory Group (CAG). He is also an academic and the ethics advisor at the University of Portsmouth. ST is a lay member of the Hampshire B HRA research ethics committee. MB has no competing interests.
Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.
Patient consent for publication Not required.
Ethics approval As an audit of existing records (secondary data) held by a regulator this work did not require an ethics review. Both SEK and ST have regular access to HARP records as part of their roles on HRA research ethics committees.
Provenance and peer review Not commissioned; externally peer reviewed.
Data availability statement Data may be obtained from a third party and are not publicly available. This study analysed data held in confidence by the UK Health Research Authority (HRA). Permission to access the original data can be requested by contacting the HRA.