Resources for data management and sharing

Where should data be stored?

At present there is no major repository for clinical data, but Dryad has declared its willingness to accept medical datasets. You can start the deposition process while submitting to BMJ Open. Supplementary or underlying data related to your paper are accepted. Dryad will provide a DOI for your dataset to aid citation and give a permanent link to the data. (Note that Dryad hosts data under a CC0 licence; check that this is suitable for the data you are depositing.)

The DataCite organisation has a growing list of repositories for research data.

Why share data?

Faster progress in improving health, better value for money and higher quality science were the three key benefits stressed by the UK's Medical Research Council, the Wellcome Trust, the US National Institutes of Health and others, who articulated their commitment to sharing data in a joint statement in 2011, and have developed policies and tools to assist their researchers to do so.

The UK Data Archive's comprehensive 'Managing And Sharing Data' document states how sharing data can encourage enquiry and debate, promote innovation and collaboration, maximise transparency and accountability, improve research methods, reduce the cost of unnecessary research duplication, increase the impact of research and credit to the researcher, and provide education and training resources.

As well as these 'public good' arguments, some researchers argue that there is also a citation advantage to be had from sharing data.

BioSharing includes a catalogue of data sharing policies and standards (reporting requirements, terminologies and exchange formats).

How to share data

First, make sure people know that your data exist and are available. This is one reason why all BMJ Open articles include a data sharing statement: it helps publicise the existence of datasets.

For data to be exploited to their full potential, they must be not just accessible but also intelligible and searchable. This is where standards for data preservation are required.

Standards cover what should be included in the dataset, 'ontologies' or controlled vocabularies for annotating datasets, and exchange formats that facilitate sharing. A video explaining what an ontology is can be viewed here.
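
As a rough illustration of what annotating against a controlled vocabulary means in practice, the Python sketch below checks whether the values recorded in a dataset column all come from an agreed list of terms. The vocabulary, the function name and the example values are hypothetical and are not taken from any published standard; a real project would use the terms defined by the standard it has adopted.

```python
from typing import Iterable, List, Set

# Hypothetical controlled vocabulary for a single column.
# A real dataset would take its terms from a published standard.
ALLOWED_SEX_CODES: Set[str] = {"female", "male", "unknown"}


def find_uncontrolled_terms(values: Iterable[str], vocabulary: Set[str]) -> List[str]:
    """Return the distinct values not found in the vocabulary, in first-seen order.

    Values are trimmed and lower-cased before comparison, so "Male " matches "male".
    """
    seen: Set[str] = set()
    unknown: List[str] = []
    for value in values:
        term = value.strip().lower()
        if term not in vocabulary and term not in seen:
            seen.add(term)
            unknown.append(term)
    return unknown


# Example column with inconsistent coding ("M" is not an agreed term).
column = ["Male", "female ", "M", "m", "unknown"]
problems = find_uncontrolled_terms(column, ALLOWED_SEX_CODES)
print(problems)
```

Any terms reported by a check of this kind would need to be recoded before the dataset is deposited, so that other researchers can search and combine it reliably.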

Researchers in other fields of science have been sharing data for years now, but standards for preserving and sharing medical data are still emerging.

Pragmatic and technical guidance on how to go about preparing your data suitably is available from various sources. A few are listed below.

UK Data Archive

Managing and Sharing Data (2011) is 'designed to help researchers and data managers...produce highest quality research data with the greatest potential for long-term use'.

Digital Curation Centre

The DCC provides advice on how to store, manage and protect digital data. Their site includes tools and applications, MRC data plan FAQs, information on data management plans, a list of funders' policies, legal information and a developing series of 'how-to' guides.

BioSharing

The site includes a growing catalogue of standards to help ensure that 'experiments are reported with enough information to be comprehensible and (in principle) reproducible, compared or integrated'.

Wellcome Trust

Provides 'Guidance for researchers: developing a data management and sharing plan'.

UK Medical Research Council

Provides tools and resources, including their Data and tissues toolkit and their Cohort dataset directory, plus a short glossary of common data-sharing terms.

National Cancer Research Institute

The NCRI Informatics Initiative 'supports the development of data standards and promotes a culture of data-sharing to facilitate storage and dissemination of research data'.

US National Institutes of Health

Resources include examples of data sharing plans alongside more general policy documents.

In 2010 the BMJ published a paper on preparing raw clinical data for publication, which specifically addresses the issue of de-identifying datasets.

How is data cited?

There is no standard for citing data or data sets yet, but consensus is building around the use of persistent identifiers such as the DOI (digital object identifier) already familiar to journal publishing, along with more conventional bibliographic information (authors/creators of the data set, year of 'publication', title).
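
To make the elements above concrete, the Python sketch below assembles a citation string from the creators, year, title and DOI of a dataset. The layout is only one plausible arrangement, not an agreed standard, and the class name, the example creators and the DOI `10.1234/example` are placeholders invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetCitation:
    """The bibliographic elements mentioned above, plus a persistent identifier."""

    creators: List[str]
    year: int
    title: str
    doi: str  # bare DOI without the resolver prefix; placeholder value used below

    def format(self) -> str:
        # One possible layout: Creators (Year). Title. DOI as a resolvable link.
        names = ", ".join(self.creators)
        return f"{names} ({self.year}). {self.title}. https://doi.org/{self.doi}"


# Hypothetical dataset; none of these details refer to a real deposit.
example = DatasetCitation(
    creators=["Smith J", "Jones A"],
    year=2012,
    title="Example trial dataset",
    doi="10.1234/example",
)
print(example.format())
```

Writing the DOI as a resolvable link means readers can follow the citation straight to the deposited data.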
