Resources for data management and sharing
Where should data be stored?
At present there is no major repository for clinical data, but Dryad has declared its willingness to accept medical datasets. You can start the deposition process while submitting to BMJ Open.
Supplementary or underlying data related to your paper are accepted. Dryad will provide you with a DOI for your dataset to aid citation and provide a permanent link to the data. (Note: Dryad hosts data under a CC0 licence, so check that this is suitable for the data you are depositing.)
The DataCite organisation has a growing list of repositories for research data.
Why share data?
The UK's Medical Research Council, the Wellcome Trust, the US National Institutes of Health and others stressed three key benefits of sharing data: faster progress in improving health, better value for money and higher quality science. They articulated their commitment to sharing data in a joint statement in 2011, and have developed policies and tools to help their researchers do so.
The UK Data Archive's comprehensive 'Managing and Sharing Data' document describes how sharing data can encourage enquiry and debate, promote innovation and collaboration, maximise transparency and accountability, improve research methods, reduce the cost of unnecessary research duplication, increase the impact of research and credit to the researcher, and provide education and training resources.
As well as these 'public good' arguments, some researchers argue that there is also a citation advantage to be had from sharing data.
BioSharing includes a catalogue of data sharing policies and standards (reporting requirements, terminologies and exchange formats).
How to share data
First, make sure people know that your dataset exists and is available. This is one reason why all BMJ Open articles include a data sharing statement: to help publicise the existence of datasets.
For data to reach its maximum potential, it must be not just accessible but also intelligible and searchable. This is where standards for data preservation are required.
Standards cover what should be included in the dataset, 'ontologies' or controlled vocabularies for annotating datasets, and exchange formats for facilitating sharing. A video explaining what an ontology is can be viewed here.
Researchers in other fields of science have been sharing data for years now, but standards for preserving and sharing medical
data are still emerging.
Pragmatic and technical guidance on how to prepare your data is available from various sources. A few are listed below.
| UK Data Archive | Managing and Sharing Data (2011) is 'designed to help researchers and data managers...produce highest quality research data with the greatest potential for long-term use'. |
| Digital Curation Centre | The DCC provides advice on how to store, manage and protect digital data. Their site includes tools and applications, MRC data plan FAQs, information on data management plans, a list of funders' policies, legal information and a developing series of 'how-to' guides. |
| | The site includes a growing catalogue of standards to help ensure that 'experiments are reported with enough information to be comprehensible and (in principle) reproducible, compared or integrated'. |
| Wellcome Trust | Provides Guidance for researchers: developing a data management and sharing plan. |
| UK Medical Research Council | Provides tools and resources, including their Data and tissues toolkit and their Cohort dataset directory, plus a short glossary of common data-sharing terms. |
| National Cancer Research Institute | The NCRI Informatics Initiative 'supports the development of data standards and promotes a culture of data-sharing to facilitate storage and dissemination of research data'. |
| US National Institutes of Health | Resources include examples of data sharing plans alongside more general policy documents. |
In 2010 the BMJ published this paper on preparing raw clinical data for publication, addressing specifically the issue of de-identifying datasets.
How is data cited?
There is no standard for citing data or data sets yet, but consensus is building around the use of persistent identifiers
such as the DOI (digital object identifier) already familiar to journal publishing, along with more conventional bibliographic
information (authors/creators of the data set, year of 'publication', title).
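As a rough illustration of how these elements might be combined, the sketch below assembles a citation string from the creators, year, title, repository name and DOI. Since no standard exists yet, the ordering and punctuation are assumptions made for this example only. The names `DatasetRecord` and `format_data_citation` are hypothetical, and the DOI and author names are placeholders rather than a real dataset.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetRecord:
    """Hypothetical container for the bibliographic elements of a dataset."""
    creators: List[str]
    year: int
    title: str
    repository: str
    doi: str  # bare DOI; the value used below is a placeholder, not a real record


def format_data_citation(record: DatasetRecord) -> str:
    """Join the elements into one citation string ending in a DOI link.

    The layout (creators, year, title, repository, DOI) is an assumed
    arrangement for illustration, not an agreed citation standard.
    """
    creators = ", ".join(record.creators)
    return (
        f"{creators} ({record.year}). {record.title}. "
        f"{record.repository}. https://doi.org/{record.doi}"
    )


# Placeholder values for illustration only.
example = DatasetRecord(
    creators=["Smith J", "Jones A"],
    year=2012,
    title="Data from: An illustrative clinical study",
    repository="Dryad Digital Repository",
    doi="10.1234/example.abc12",
)
print(format_data_citation(example))
```

Writing the DOI as a link through the doi.org resolver means the citation doubles as a persistent pointer to the data, whichever textual style a journal eventually prefers.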