Abstract
Objective This project aimed to develop and propose a standardised reporting guideline for kidney disease research and clinical data reporting, in order to improve kidney disease data quality and integrity, and combat challenges associated with the management of ‘Big Data’.
Methods A list of recommendations was proposed for the reporting guideline based on the systematic review and consolidation of previously published data collection and reporting standards, including PhenX measures and Minimal Information about a Proteomics Experiment (MIAPE). Thereafter, these recommendations were reviewed by domain-specialists using an online survey, developed in Research Electronic Data Capture (REDCap). Following interpretation and consolidation of the survey results, the recommendations were mapped to existing ontologies using Zooma, the Ontology Lookup Service and the BioPortal search engine. Additionally, an associated eXtensible Markup Language schema was created for the REDCap implementation to increase user friendliness and adoption.
Results The online survey was completed by 53 respondents; the majority of respondents were dual clinician-researchers (57%), based in Australia (35%), Africa (33%) and North America (22%). Data elements within the reporting standard were identified as participant-level, study-level and experiment-level information, further subdivided into essential or optional information.
Conclusion The reporting guideline is readily employable for kidney disease research projects, and also adaptable for clinical utility. The adoption of the reporting guideline in kidney disease research can increase data quality and the value for long-term preservation, ensuring researchers gain the maximum benefit from their collected and generated data.
- data standardisation
- data reporting
- FAIR
- H3ABioNet
- kidney disease
- reporting guideline
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
Strengths and limitations of this study
The reporting guideline references and is mapped to standardised collection measures and biomedical ontologies found in the Phenotypes and eXposures Toolkit, and in the Ontology Lookup Service and BioPortal, respectively.
The reporting guideline was reviewed by global domain-specialists.
Limited survey feedback was received from Asian and South American countries.
An eXtensible Markup Language (XML) schema was developed to promote user-friendliness and maintain the relationship between the reporting guideline’s sub-sections.
The XML schema does not inherently incorporate mapped ontologies; therefore, strategies are being investigated to automate the process.
Introduction
‘Big Data’ and bioinformatics have become crucial components of modern biomedical research and healthcare.1 In biomedical research, ‘Big Data’, commonly characterised by volume and variety, refers to data sets which are too large or complex to be analysed using traditional methods, often requiring the use of computational analyses to derive biological meaning.2 Biomedical ‘Big Data’ has many potential fields of application, including personalised medicine, predictive modelling and clinical decision support, and disease and safety surveillance.3 In the field of nephrology, the use of ‘Big Data’ has led to the improved understanding of the pathological processes and the underlying aetiologies associated with kidney disease,4 as well as the increased identification of genetic kidney diseases5 6 and novel diagnostic methods for these diseases.7
Challenges associated with the data lifecycle, including the collection, management, storage and analysis of data, hamper the use and potential benefit of ‘Big Data’. These factors are compounded by additional biomedical research challenges, such as the inability to recruit sufficient sample sizes, as well as the lack of research capacity, funding and infrastructure, especially in low-income regions.8 9 Additionally, the use of ill-defined ontologies, data dictionaries and data management plans contributes to data incompatibility and prevents researchers from reaping the maximum benefit from their collected and generated data.10
Standardising clinical and research data collection, reporting, management or storage can combat these challenges, supporting the effective integration of ‘Big Data’ and bioinformatics in biomedical research,11 12 enhancing data compatibility, interoperability, reproducibility and reuse,12 and facilitating data sharing and collaboration.11 The use of biomedical reporting standards and ontologies facilitates data standardisation by promoting the use of or adherence to common terminology and/or reporting criteria.10 13 To this end, several initiatives have been driving data standardisation efforts in biomedical research. The consensus measures for Phenotypes and eXposures (PhenX) Toolkit (www.phenxtoolkit.org) has proposed phenotype collection tools for harmonised data collection, although some tools are limited in terms of applicability to low-resource settings.14 Similar aims are being driven by FAIRsharing (www.fairsharing.org),15 a dynamic standards database which aims to promote FAIR (findable, accessible, interoperable and reusable) principles,16 and the Global Alliance for Genomics and Health (www.ga4gh.org), a policy-framing and technical standards-setting organisation. Multiple kidney-associated ontologies which define known kidney diseases and assist routine data studies and case identification have previously been developed, including the chronic kidney disease ontology,17 as well as the renal subsections in the gene ontology, the human phenotype ontology18 and the Systematized Nomenclature of Medicine-Clinical Terms.19 However, no reporting guideline has previously been constructed for kidney disease clinical and research data reporting.
The Human Heredity and Health in Africa’s (H3Africa) Bioinformatics Network’s (H3ABioNet, www.h3abionet.org)20 21 Data & Standards work package aims to develop domain-specific data reporting standards and data dictionaries, applicable to the H3Africa consortium, in order to specifically address the data management concerns in low-resource and low-income regions, affected by global health concerns, but lacking capacity to address these concerns. By consolidating and harmonising several published ontologies, collection standards and reporting standards, and consulting domain-specialists through an open online survey, the project drew from the experience of previous standardisation initiatives and aimed to develop a multipurpose reporting guideline which focused on the reporting of both clinical and research data within the kidney disease field, entitled, ‘The Minimum Information Required Guideline for Kidney Disease: Research and Clinical Data Reporting (Version 1.0)’.
Methods
Patient involvement
No patients or public were included in the methodology of the study. The survey employed in the study was strictly distributed to domain-specialists, hereafter defined.
Development of draft
Following the review of previously published literature and standards, several recommendations for the ‘Minimum Information Required Guideline: Kidney Disease Research and Clinical Data Reporting’ standards were proposed. The standards included were separated into two streams: the standards relevant to the collection of clinical data and those relevant to the reporting of research data. The standards relevant to clinical data collection included the H3Africa Standard Case Report Form (www.h3abionet.org/data-standards/datastds), the chronic kidney disease ontology (CKDO) and various collection measures hosted on PhenX. The standards relevant to research data reporting included various experimental reporting guidelines hosted on FAIRsharing, such as MIAPE, MIDE, MIRAGE, MINSEQE, MIAME and more, from which common study-specific and experiment-specific elements were derived. Based on these recommendations, a reporting standard was drafted, which divided the proposed recommendations into three subsections: participant-level (patient), study-level and experiment-level information. The developed draft aimed to be both comprehensive and adaptable for both acquired and inherited kidney diseases, containing and querying elements specific to one or both types. Thereafter, recommendations (henceforth referred to as elements) were manually defined using ontologies found through the BioPortal search engine,22 the Ontology Lookup Service at the European Bioinformatics Institute23 and the Zooma annotation tool.
Online survey
To remove any existing reporting inconsistencies, domain-specialists, consisting of kidney disease researchers and clinicians, were consulted to review the proposed elements using an online survey. Domain-specialists were defined as clinicians and researchers who had been involved in kidney disease research for at least a year, as part of an existing collaborative kidney disease research group or network (including the H3Africa Kidney Disease Research Network, the Australian KidGen Collaborative and Renal Genetics Flagships, Kidney Research UK and The Renal Network); they were contacted via email. Domain-specialists were asked to evaluate, harmonise and consolidate the proposed elements, as well as identify which elements represented essential (E) or optional (O) information, and propose additional elements. Elements were classified as either E or O based on the percentage of E votes received (E%). This percentage was calculated by dividing the number of E votes by the total number of votes made for a given element. Elements with an E% lower than 50% were classified as O, elements with an E% higher than 70% were classified as E, and elements with an E% of 50–70% were classified with discretion, based on correlations with the previously developed reporting guidelines and standard collection measures. Additional suggestions made by respondents, which were not included in the draft, were similarly classified.
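The classification rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the procedure used by the authors: the function name `classify_element`, the label `"discretionary"` and the vote counts in the usage note are hypothetical, and the 50% and 70% boundaries are assumed to fall inside the discretionary band, following the wording ‘50–70’.

```python
from typing import Literal

Classification = Literal["E", "O", "discretionary"]


def classify_element(essential_votes: int, total_votes: int) -> Classification:
    """Classify one element from its essential (E) vote share.

    Hypothetical helper: integer arithmetic is used so that the 50% and 70%
    boundaries are compared exactly, without floating-point rounding.
    """
    if total_votes <= 0:
        raise ValueError("an element needs at least one vote")
    if not 0 <= essential_votes <= total_votes:
        raise ValueError("essential_votes must lie between 0 and total_votes")

    # E% < 50%  <=>  2 * essential_votes < total_votes
    if 2 * essential_votes < total_votes:
        return "O"
    # E% > 70%  <=>  10 * essential_votes > 7 * total_votes
    if 10 * essential_votes > 7 * total_votes:
        return "E"
    # 50% <= E% <= 70%: left to discretion (assumed inclusive boundaries)
    return "discretionary"
```

For example, an element that received 45 E votes out of 53 (about 85%) would be classified as essential, whereas one with 30 E votes out of 50 (60%) would be flagged for discretionary review against existing guidelines and collection measures.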
Because the survey was constructed to be open, no limitations were set with regard to the number of participants, and respondents were encouraged to distribute the survey within their own networks. The online survey was developed, and study data were collected and managed using Research Electronic Data Capture (REDCap),24 hosted at The Centre for Proteomic and Genomic Research. The online survey consisted of 4 sections and 77 fields (online supplementary file 1). REDCap was employed for security and maintenance purposes.
Development of eXtensible Markup Language
To supplement usability and user-friendliness, an associated eXtensible Markup Language (XML) schema was designed to carry all the data and metadata within the reporting guideline and allow data exchange between dissimilar systems. The XML schema defines the datatype, atomic units and validation rules for each element, to ensure reporting correctness. Additionally, because REDCap is user-friendly and available to research institutions worldwide, the XML schema was designed for implementation in REDCap.
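To illustrate what per-element rules of this kind do, the following Python sketch checks a record against a small rule table. It is not the REDCap XML schema itself: the class `ElementRule`, the element names `study_id` and `systolic_bp`, their essential/optional status and the numeric range are all hypothetical and chosen only for demonstration.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class ElementRule:
    """Hypothetical rule for one element: datatype, unit and allowed range."""
    datatype: type
    essential: bool
    unit: Optional[str] = None
    minimum: Optional[float] = None
    maximum: Optional[float] = None


# Illustrative rule table; names and limits are assumptions, not the guideline.
RULES: Dict[str, ElementRule] = {
    "study_id": ElementRule(str, essential=True),
    "systolic_bp": ElementRule(int, essential=False, unit="mmHg",
                               minimum=0, maximum=400),
}


def validate_record(record: Dict[str, Any],
                    rules: Dict[str, ElementRule] = RULES) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors: List[str] = []
    for name, rule in rules.items():
        value = record.get(name)
        if value is None:
            if rule.essential:
                errors.append(f"{name}: essential element is missing")
            continue
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, rule.datatype):
            errors.append(f"{name}: expected {rule.datatype.__name__}")
            continue
        if rule.minimum is not None and value < rule.minimum:
            errors.append(f"{name}: below minimum {rule.minimum} {rule.unit}")
        if rule.maximum is not None and value > rule.maximum:
            errors.append(f"{name}: above maximum {rule.maximum} {rule.unit}")
    for name in record:
        if name not in rules:
            errors.append(f"{name}: not defined in the rule table")
    return errors
```

Calling `validate_record({"study_id": "S1", "systolic_bp": 120})` returns an empty list, while omitting `study_id` reports the missing essential element. Keeping the datatype, unit and range together per element mirrors the idea of a schema that travels with the data, so that dissimilar systems can apply the same checks.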
Results
The online survey was completed by 53 international domain specialists. Of these respondents, 29% were working as clinicians, 14% were working as researchers and 57% were working as dual clinician-researchers. The majority of respondents had between 10 and 20 years’ experience in the field (41%), while 37% of respondents had more than 20 years’ experience in the field, 13% had more than 5 years’ experience in the field and 9% had less than 5 years’ experience in the field. The majority of respondents were based in Australia (35%), followed by Africa (33%), North America (22%), Europe (9%) and Asia (2%). The raw survey results can be found in online supplementary file 2. Figures 1 and 2 illustrate the survey response to the proposed elements. Furthermore, respondents also proposed additional elements which shaped the final structure of the reporting guideline, including, but not limited to, congenital conditions, histopathology, language and physical activity.
Figure 1 Survey response to proposed participant-level information.
Figure 2 Survey response to proposed study- and experiment-level information.
The Minimum Information Required Guideline: Kidney Disease Research and Clinical Data Reporting is summarised in table 1.
The Minimum Information Required Guideline: Kidney Disease Research and Clinical Data Reporting (version 1.0)
The information reported using the standard can be separated into three fields: participant-level, study-level and experiment-level information. The standard further divides elements into essential and optional information. Optional elements refer to information which is not necessary for the interoperation of studies within the same field but useful for integrating studies from varying disease fields. Participant-level information contains 13 subsections of varying essential and optional elements, including demographics, lifestyle factors, anthropometrics, blood pressure, adverse drug reactions, Urine-Related Test Index, kidney disease history, sample-specific information, kidney disease-related information, prescribed medication, non-prescribed medication and therapy. Study-level information includes various elements which describe the details of a given study, including essential elements such as study ID, research institute and study design, and optional elements such as study duration, study start date and PubMed unique identifier. Finally, experiment-level information includes elements which describe the experiments within a given study, including essential elements such as biospecimen type, instrumentation employed, sample management protocol, quality control protocol and experimental aim, and optional elements such as output location, which describes where the data will be saved.
The complete reporting guideline can be obtained from both the H3ABioNet website (www.h3abionet.org/data-standards/datastds) and FAIRsharing (https://fairsharing.org/bsg-s001385/), and specifies each element’s data type, collection format and/or accepted values, and related ontologies and standards. Herein, the Ontology ID column contains the most appropriate ontology term to which the element is mapped, while the Concordant Ontologies and Concordant Standards columns describe ontologies and standards which include similar data elements. These lists are not meant to be comprehensive or exhaustive, but to illustrate the utilisation of, and overlap with, existing resources. A comprehensive guide explaining how to employ the reporting guideline locally, along with the associated REDCap XML schema, can be found in online supplementary file 3. In addition, online supplementary file 4 contains an example entry of the reporting guideline, and online supplementary file 5 contains an illustration of the REDCap XML schema.
Discussion
The Minimum Information Required Guideline: Kidney Disease Research and Clinical Data Reporting is a freely accessible, harmonised reporting guideline which can be employed or adapted for kidney disease research and healthcare and categorises information as essential or optional, as well as participant, study and experiment specific. Standardising how this information is captured, deposited, shared and published in a comparable and consistent manner is crucial for researchers to better understand a given study and subsequently interpret the data generated and conclusions made. The primary intent of the reporting guideline is to encourage harmonised data collection when launching new projects within the kidney disease research field. Ultimately, this will enhance the overall research community’s capacity for conducting high-quality, interoperable and reusable research, adding long-term value to the collected clinical data and generated research data and encouraging more collaborative efforts worldwide. Similarly, the reporting guideline can also be employed retrospectively for data abstraction from existing or ongoing studies when reporting to a larger database, enabling the previously mentioned efforts.
Although certain elements within the standard can be incorporated into a case report form, the reporting guideline contains elements that need to be completed specifically by healthcare or research professionals; therefore, the reporting standard is designed for use by research clinicians and healthcare workers, researchers, data managers and bioinformaticians involved in kidney disease research. The reporting guideline was not developed to replace the case report form but rather to provide a set of data reporting rules for researchers to adhere to. Defining the information as essential or optional permits the reporting guideline to be adaptable for both acquired and inherited kidney disease research; therefore, elements such as congenital conditions and histopathology are defined as optional. The reporting standard goes beyond listing ‘minimum required’ data elements and aims to provide a comprehensive data dictionary, with standardised response options, which can be adapted for broad use. Therefore, employing the reporting standard allows comprehensive characterisation of research studies being conducted in the kidney disease research field, as well as the experiments and participants within these studies, supporting integrative analysis and improved biological interpretation.
The reporting standard is also accompanied by an associated REDCap XML schema. This was done to enable user friendliness and broad adoption of the standard as a data capturing and governance tool, allowing accurate and seamless duplication and reuse.25 XML has been used extensively for describing data in many applications for storage or transport.25 The language, by its design, allows for extensibility and self-description. Its openly documented standards, wide adoption and support in many applications and existing tools make it a good first choice for describing scientific data that is exchanged between healthcare systems.25 It has previously been used in health reporting for such purposes.26 27 Currently, ontologies cannot be intrinsically linked to the guideline elements within the REDCap XML. In the future, we aim to provide base XML schemas which are adaptable for broad implementation on various data capturing platforms. This will allow us to link the guideline elements to the mapped ontologies. Ultimately, the ontologies serve to promote FAIR reporting by adding an underlying layer of metadata and understanding to the overall dataset.
Broad adoption of the developed reporting standard has the potential to significantly reduce data and reporting inconsistency and redundancy across systems, promoting collaboration and/or interoperability between projects.28 29 Promoting such large-scale use could allow for improved data mapping in clinical registries, improving data quality and interoperability.30 As previously exhibited in oncology research, broad adoption of a reporting standard can maximise the value and impact of research studies as well as the associated research data.31 This is because research redundancy is reduced, and interpretable research outputs and comprehensive datasets are produced.31 A given standard may be more widely adopted if advocated by databases, funding bodies and scientific journals, specifically those geared towards kidney disease research.
The Minimum Information Required Guideline: Kidney Disease Research and Clinical Data Reporting aims to promote FAIR reporting and will therefore be added to the FAIRsharing database, as this allows for continuous record maintenance and improvement, providing a point of contact for the standard, as well as related support material (https://doi.org/10.25504/FAIRsharing.fCAD2Z).15 Bearing in mind the diverse target group the reporting standard aims to accommodate, various methods of implementation will be investigated to provide comprehensive solutions for collaborative efforts. Additional elements will be investigated for incorporation into the standard, including environmental factors, dyslipidaemia and diet.
To promote the adoption of the reporting guideline, we hope to employ the reporting guideline within our own consortia studies, and advocate use on an international platform. Ultimately, the reporting guideline has the potential to support both the H3Africa community as well as the kidney disease research community at large with current and future research.
Acknowledgments
The authors would like to acknowledge the kidney disease researchers, clinicians and hospital departments who participated and contributed to the consolidation and finalisation of the reporting guideline. Patients were not involved in the reported project.
References
Footnotes
Twitter @ZahraMungloo
Contributors The current project was designed by JK and supervised by NM. LZ, MC, CVW and AM were the primary developers of the reporting guidelines and drafted the manuscript. KJ, SB, ZMD and CP drafted and corrected the manuscript. MT and MM were the primary developers of the XML files. All authors read and approved the final manuscript.
Funding H3ABioNet is supported by the National Institutes of Health Common Fund under grant number U41HG006941. The content of this publication is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. The Australian Genomics Health Alliance is supported by the National Health and Medical Research Council under grant number APP1113531.
Competing interests None declared.
Patient consent for publication Not required.
Provenance and peer review Not commissioned; externally peer reviewed.
Data availability statement All data relevant to the study are included in the article or uploaded as supplementary information.