Protocol
Design and methodological characteristics of studies using observational routinely collected health data for investigating the link between cancer and neurodegenerative diseases: protocol for a meta-research study
  1. Ferrán Catalá-López1,2,3,
  2. Jane A Driver4,5,6,
  3. Matthew J Page7,
  4. Brian Hutton3,8,
  5. Manuel Ridao9,
  6. Clara Berrozpe-Villabona10,
  7. Adolfo Alonso-Arroyo11,12,
  8. Cristina A Fraga-Medín13,
  9. Enrique Bernal-Delgado9,
  10. Alfonso Valencia14,
  11. Rafael Tabarés-Seisdedos2
  1. Department of Health Planning and Economics, National School of Public Health, Institute of Health Carlos III, Madrid, Spain
  2. Department of Medicine, University of Valencia/INCLIVA Health Research Institute and Centro de Investigación en Red de Salud Mental (CIBERSAM), Valencia, Spain
  3. Knowledge Synthesis Group, Clinical Epidemiology Program, Ottawa Hospital Research Institute, Ottawa, Ontario, Canada
  4. Geriatric Research Education and Clinical Center, Veterans Affairs Boston Healthcare System, Boston, Massachusetts, USA
  5. Division of Aging, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, USA
  6. Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts, USA
  7. School of Public Health and Preventive Medicine, Monash University, Melbourne, Victoria, Australia
  8. School of Epidemiology and Public Health, University of Ottawa, Ottawa, Ontario, Canada
  9. Instituto Aragonés de Ciencias de la Salud, Red de Investigación en Servicios de Salud en Enfermedades Crónicas (REDISSEC), Zaragoza, Spain
  10. Miguel Servet University Hospital, Zaragoza, Spain
  11. Department of History of Science and Documentation, University of Valencia, Valencia, Spain
  12. Unidad de Información e Investigación Social y Sanitaria, University of Valencia, Spanish National Research Council, Valencia, Spain
  13. National Library of Health Sciences, Institute of Health Carlos III, Madrid, Spain
  14. Life Sciences Department, Barcelona Supercomputing Center, Barcelona, Spain
Correspondence to Dr Ferrán Catalá-López; ferran_catala{at}outlook.com

Abstract

Introduction Health services generate large amounts of routine health data (eg, administrative databases, disease registries and electronic health records), which have important secondary uses for research. Increases in the availability and the ability to access and analyse large amounts of data represent a major opportunity for conducting studies on the possible relationships between complex diseases. The objective of this study will be to evaluate the design, methods and reporting of studies conducted using observational routinely collected health data for investigating the link between cancer and neurodegenerative diseases.

Methods and analysis This is the protocol for a meta-research study. We registered the study protocol within the Open Science Framework: https://osf.io/h2qjg. We will evaluate observational studies (eg, cohort and case–control) conducted using routinely collected health data for investigating the associations between cancer and neurodegenerative diseases (such as Alzheimer’s disease, amyotrophic lateral sclerosis/motor neuron disease, Huntington’s disease, multiple sclerosis and Parkinson’s disease). The following electronic databases will be searched (from their inception onwards): MEDLINE, Embase and Web of Science Core Collection. Screening and selection of articles will be conducted by at least two researchers. Potential discrepancies will be resolved via discussion. Design, methods and reporting characteristics in each article will be extracted using a standardised data extraction form. Information on general, methodological and transparency items will be reported. We will summarise our findings with tables and graphs (eg, bar charts, forest plots).

Ethics and dissemination Due to the nature of the proposed study, no ethical approval will be required. We plan to publish the full study in an open access peer-reviewed journal and disseminate the findings at scientific conferences and via social media. All data will be deposited in a cross-disciplinary public repository.

  • epidemiology
  • neurology
  • psychiatry

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

Strengths and limitations of this study

  • This meta-research study will provide an overview of the methodological and reporting quality of studies conducted using observational routinely collected health data for investigating the link between cancer and neurodegenerative diseases.

  • This protocol increases the transparency and completeness of the methods and definitions that will be used in our planned meta-research study and applied to studies conducted using routinely collected health data in this area.

  • We do not plan to include grey literature, only research articles published in peer-reviewed journals.

  • Restricting the language of publication to English might exclude additional studies published in other languages.

Introduction

Cancer and neurological disorders, particularly age-related neurodegenerative diseases, are recognised as major causes of death and disease burden worldwide.1–3 Multiple epidemiological studies4–14 and reviews15–21 have examined the epidemiological associations between cancer and neurodegenerative diseases. A growing body of evidence suggests that neurodegenerative diseases may occur less frequently in cancer survivors, and vice versa.18 22–24 For example, some studies have found that cancer survivors have a decreased risk of Alzheimer’s disease and that people with Alzheimer’s disease have lower rates of cancer incidence.4–7 Other studies have suggested an inverse relation between Parkinson’s disease and most cancers.8–10 A link between cancer and neurodegeneration (the so-called ‘inverse comorbidity’) is plausible as they share several genes and biological pathways.22–26 Non-biological factors (such as behaviours, diagnostic patterns or medications) might also account for some of these possible connections. It is also probable that spurious associations and inaccurate estimates might arise due to chance, bias and/or confounding in epidemiological studies available in the medical literature.27–29

Health services generate large amounts of routine health data (so-called ‘real world data’, such as administrative databases, disease registries and electronic health records), which have important secondary uses for research and evaluation. Increases in the availability of routine health data, and the ability to store, process, link, access and analyse large amounts of data represent a major opportunity for conducting studies on the possible relationships between complex (serious) diseases and other health events with abundant collected data.30–34 Using such high-scale data sources often involves challenges for research design, conduct and reporting of studies;35 36 for example, the description of databases’ characteristics, record linkage methodology and any validation of the codes or algorithms used to select the study population.

Unfortunately, poorly conducted or reported studies may be associated with increased potential for biased measures of association, limiting their usefulness. Several methodological research studies37–40 have previously underscored that the reporting of epidemiological studies is inconsistent. For example, Hemkens et al 40 investigated the quality of reporting in studies conducted using routinely collected health data on any clinical or epidemiological topic. A search of PubMed in June 2013 served to include a random sample of 124 articles published in 2012. The majority of studies (73.4%) focused on epidemiology. The reporting quality varied, with only 60.5% reporting the characteristics of data sources, 74.2% providing details of selection criteria of participants, 31.5% using the study design in the title or abstract (eg, ‘cohort’, ‘case–control’, ‘routinely collected data’ or ‘registry data’), 29.3% reporting methods of data linkage and 2.4% indicating data availability/sharing.

To the best of our knowledge, no systematic reviews of all relevant studies have specifically examined the methodological or reporting quality of research evaluating the epidemiological associations between complex diseases using routinely collected data. We present herein the protocol for a case meta-research (also known as ‘research of research’ or ‘meta-science’)41 study of studies conducted using observational routinely collected data, which can help to better understand research concerning the cancer and neurodegeneration ‘inverse comorbidity’ model.22–24

The objective of this study will be to evaluate the design, methods and reporting of epidemiological studies conducted using observational routinely collected health data for investigating the link between cancer and neurodegenerative diseases.

Methods and analysis

This is the protocol for a meta-research study. Our study protocol is part of a knowledge synthesis research programme on the epidemiological evidence for the associations between cancer and central nervous system disorders, which includes an ambitious ongoing umbrella review (a systematic collection and assessment of multiple systematic reviews and meta-analyses).19

This study protocol has been registered within the Open Science Framework (https://osf.io/h2qjg). Although the protocol is for a meta-research study, and not a systematic review of health interventions, our protocol is reported in accordance with the reporting guidance from the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Protocols (PRISMA-P) statement42 43 with not applicable indicated for items not pertaining to meta-research studies (see online supplemental appendix 1). Methods and definitions have been chosen in consultation with methodological work,40 44–52 including guidance on preparing Cochrane Methodology Reviews.52

Eligibility criteria

Detailed eligibility criteria have been developed according to the following: participants, study design, types of data, types of exposures and outcomes of interest, setting and language of publication.

Participants

We will include studies examining the human population (regardless of age and sex).

Study design

Eligible studies will include observational epidemiological studies, including prospective cohort, retrospective cohort (also known as historical cohort) and case–control studies. We will include case–control studies regardless of whether the authors reported a clear time frame of when the events occurred. However, when extracting data from these studies, we will record whether the time frame was clear. Randomised controlled trials will be unavailable for our research question. Cross-sectional studies will be excluded because they cannot be used to infer causality, as the temporal link between cancer and neurodegenerative diseases cannot be established. We will also exclude reviews, meta-analyses, case series, case reports, in vitro studies and animal studies.

Types of data

Eligible studies can use any type of routinely collected health data. Routinely collected health data are defined as data collected for purposes other than research or without specific a priori research questions developed before collection.51 53 These would include a range of resources for research (eg, patient registries, disease registries), health planning (eg, administrative data), clinical management (eg, primary care databases, pharmacy data), documentation of clinical care (eg, electronic health records repositories) or epidemiological surveillance (eg, cancer registries, and other public health reporting data).51 53

Types of exposures and outcomes

Eligible studies must investigate the associations between cancer and neurodegenerative diseases. Neurodegenerative diseases54 will include: Alzheimer’s disease (International Classification of Diseases (ICD)-9: 331.0, 290.1; ICD-10: F00, G30), amyotrophic lateral sclerosis/motor neuron disease (ICD-9: 335.20; ICD-10: G12.2), Huntington’s disease (ICD-9: 294.1, 333.4; ICD-10: F02.2, G10), multiple sclerosis (ICD-9: 340–340.9; ICD-10: G35-G35.9) and Parkinson’s disease (ICD-9: 332–332.9; ICD-10: G20-G21.0, G21.2-G22.0). All malignant neoplasms (ICD-9: 140–209; ICD-10: C00-C97) and any site-specific cancer will be considered. We will include: (1) studies in which neurodegenerative disease was the exposure of interest and cancer incidence (eg, new case or hospitalisation) was the outcome and (2) studies in which cancer was the exposure of interest and incidence of a neurodegenerative disease (eg, new case or hospitalisation) was the outcome. Prognostic studies examining neurodegenerative diseases and mortality among patients with cancer, or cancer and mortality among patients with neurodegenerative diseases, will be excluded. Studies not presenting quantitative data on the associations between cancer and neurodegenerative diseases (eg, relative risks (RR) with 95% CIs, numbers of cases/population, observed and expected cases) or sufficient data for an association to be calculated will be excluded.
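As a minimal sketch of how the ICD-10 code lists above could be applied when checking whether a reported exposure or outcome code falls within scope, the following Python snippet maps a code to a disease group. The table `ICD10_RANGES`, the helper names and the convention that a bare three-character category (eg, G30) covers subcodes .0 to .9 are illustrative assumptions rather than part of the registered protocol. ICD-9 codes and codes with more than one decimal digit are not handled.

```python
# Illustrative sketch only: ICD-10 entries transcribed from the eligibility criteria.
# Assumption: a bare three-character category (eg, "G30") covers subcodes .0 to .9.
ICD10_RANGES = {
    "Alzheimer's disease": ["F00", "G30"],
    "Amyotrophic lateral sclerosis/motor neuron disease": ["G12.2"],
    "Huntington's disease": ["F02.2", "G10"],
    "Multiple sclerosis": ["G35-G35.9"],
    "Parkinson's disease": ["G20-G21.0", "G21.2-G22.0"],
    "Malignant neoplasm": ["C00-C97"],
}


def _key(code, default_decimal):
    """Turn a code such as 'G21.2' into a sortable tuple ('G', 21, 2)."""
    code = code.strip().upper()
    whole, _, decimal = code[1:].partition(".")
    return code[0], int(whole), int(decimal) if decimal else default_decimal


def _bounds(entry):
    """Return inclusive (low, high) keys for a single code or a 'low-high' range."""
    low, _, high = entry.partition("-")
    high = high or low
    return _key(low, 0), _key(high, 9)


def classify_icd10(code):
    """Return the disease group for an ICD-10 code, or None if it is out of scope."""
    observed = _key(code, 0)
    for group, entries in ICD10_RANGES.items():
        for entry in entries:
            low, high = _bounds(entry)
            if low <= observed <= high:
                return group
    return None
```

For instance, `classify_icd10("G21.1")` returns `None` because that code sits in the gap between the two Parkinson's disease ranges listed above.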

Setting

There will be no restriction by study setting.

Language of publication

Publications of studies will be limited to peer-reviewed journal articles written in English with an abstract available.

Information sources and search strategy

To provide a reliable summary of the literature, we will search the following electronic databases (from their inception onwards): MEDLINE through PubMed (National Library of Medicine, Bethesda, Maryland, USA), Embase through the Elsevier platform (Elsevier B.V., Amsterdam, The Netherlands) and the Web of Science Core Collection (Clarivate Analytics, Philadelphia, Pennsylvania, USA). The initial literature searches in MEDLINE, Embase and the Web of Science will start on 15 November 2022.

Our main literature search will be peer-reviewed by two senior health information specialists using the Peer Review of Electronic Search Strategies (PRESS) checklist.55 The search strategy will include a broad range of terms and keywords related to ‘neurodegenerative diseases’, ‘cancer’, ‘epidemiological studies’ and ‘routine data/electronic health records/administrative data’. The search will integrate a filter for electronic health records provided by the National Library of Medicine.56 A draft search strategy for MEDLINE is provided in online supplemental appendix 2.

To ensure literature saturation, the reference lists of studies selected for inclusion will be scanned for additional studies. We will also scan the reference lists of related systematic reviews and meta-analyses identified through the search. In addition, citation searches (eg, Science Citation Index Expanded via the Web of Science) will be carried out for studies selected for inclusion.

Screening

All articles identified from the literature searches will be screened by two researchers independently using the software Rayyan (Rayyan Systems, Cambridge, Massachusetts, USA).57 First, titles and abstracts of articles returned from initial searches will be screened based on the eligibility criteria outlined earlier. Second, full texts will be examined in detail and screened for eligibility. A form for screening full-text articles will be designed in Microsoft Excel (Microsoft, Seattle, Washington, USA) and pilot tested on a random sample of 10 articles. Third, references of all considered articles will be hand-searched to identify any relevant report missed in the search strategy. Any discrepancies here and throughout will be resolved through consultation with a third researcher, if necessary. A flow chart showing details of studies included and excluded at each stage of the selection process will be provided.
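The two mechanical steps described above (drawing a random pilot sample of 10 articles and flagging disagreements between the two independent screeners) can be sketched in Python as follows. This is a hedged illustration, not the workflow implemented in Rayyan or Excel: the function names, the fixed seed and the dictionary layout (article ID mapped to a decision string) are all hypothetical.

```python
import random


def pilot_sample(article_ids, k=10, seed=2022):
    """Draw a reproducible random sample of articles for pilot testing a form.

    The seed value is an arbitrary, hypothetical choice made for reproducibility.
    """
    ids = list(article_ids)
    rng = random.Random(seed)
    return rng.sample(ids, min(k, len(ids)))


def find_discrepancies(reviewer_a, reviewer_b):
    """Return the sorted IDs on which the two reviewers' decisions differ.

    An article screened by only one reviewer is also reported as a discrepancy.
    """
    all_ids = set(reviewer_a) | set(reviewer_b)
    return sorted(i for i in all_ids if reviewer_a.get(i) != reviewer_b.get(i))
```

The IDs returned by `find_discrepancies` would be the ones referred to a third researcher if the two screeners cannot agree.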

Sample size

We will not perform any sample size calculations since our meta-research study will include all the available studies that would meet the eligibility criteria.

Data extraction

Data for each of the included studies will be abstracted by two researchers, independently, and potential conflicts will be resolved through discussion. We will use predesigned forms that will be piloted initially on a small number of included articles. The data extracted from each article will be comprehensive in scope as we are addressing multiple characteristics of included studies. Full articles and supplementary materials with data and analyses will be examined for general and methodological characteristics, statements of publicly available full protocols and data sets, conflicts of interest and funding disclosures. We will review the final versions of the articles available online. All data will be extracted into Microsoft Excel spreadsheets.

The standardised data extraction form will include the following information of interest:

General characteristics, including study objective(s) and rationale

  • First author.

  • Year of publication.

  • Name of journal, and journal impact factor (eg, according to the latest Journal Citation Report at the time of data extraction).

  • Study design (eg, cohort or case–control).

  • Country.

  • Setting (eg, single-country or multi-country).

  • Time frame within which the study took place.

  • Study objective(s).

  • Main rationale for using routinely collected data (eg, increase study power, validation of findings in a second data source, other, not clearly stated).

  • Number of participants.

  • Characteristics of participants (eg, proportion of women, mean or median age).

  • Selection criteria of participants.

  • Details on exposures and outcomes (eg, new cases or hospitalisations).

Methodological characteristics

  • Characteristics of the analysed data sets.

  • Type of data (eg, administrative data, electronic health records/electronic medical records, registry, other).

  • Number of data sources (eg, single data source, two data sources, three data sources, four data sources, five or more data sources).

  • Details on methods of study population selection (eg, codes or algorithms used to identify subjects/participants).

  • Details on any validation (eg, of the codes or algorithms) used to select the study population (if applicable).

  • Type of data linkage across databases (eg, person-level, institutional-level, other, none).

  • Methods of record linkage of databases (eg, deterministic, probabilistic, machine learning, other, none).

  • Methods of linkage quality evaluation (if applicable).

  • Use of any flow diagram or other graphical display to demonstrate the data linkage process (if applicable).

  • Variables used for analyses listed and described in sufficient detail.

  • A complete list of codes and algorithms used to classify exposures, and outcomes.

  • A complete list of codes and algorithms used to classify potential confounders (eg, treatments administered, including chemotherapy).

  • Details on the data cleaning methods used in the study.

  • Statistical methods (eg, linear regression, logistic regression, Poisson regression, Cox regression, other).

  • Confounder control techniques (eg, crude/unadjusted analysis, multivariable analysis, propensity scores, matching, instrumental variables, other).

Main results and limitations

  • Unadjusted RR estimates, and if applicable, confounder-maximally adjusted RR estimates with the precision (eg, 95% CIs) from the included studies.

  • Discussion of the implications of using data that were not created or collected to answer the specific research question.

  • Discussion of potential biases (eg, misclassification, unmeasured confounding, missing data or changing eligibility over time).
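Where an included study reports only counts (eg, numbers of cases/population) rather than an RR, an unadjusted estimate can in principle be derived during data extraction. The Python sketch below uses the standard log-scale Wald interval for a risk ratio. It is an illustrative assumption about how such a calculation might be done, not a method specified in this protocol, and it does not handle zero cells or person-time denominators.

```python
import math


def risk_ratio(cases_exposed, n_exposed, cases_unexposed, n_unexposed, z=1.96):
    """Unadjusted risk ratio with a Wald-type CI computed on the log scale.

    Returns (rr, lower, upper). z=1.96 corresponds to an approximate 95% CI.
    """
    risk_exposed = cases_exposed / n_exposed
    risk_unexposed = cases_unexposed / n_unexposed
    rr = risk_exposed / risk_unexposed
    # Standard error of log(RR) for cumulative-incidence (count) data.
    se_log_rr = math.sqrt(
        1 / cases_exposed - 1 / n_exposed + 1 / cases_unexposed - 1 / n_unexposed
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper
```

With hypothetical counts of 10 cases among 100 exposed and 20 cases among 100 unexposed participants, the function returns an RR of 0.5 with a CI that spans 1.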

Transparency and openness

  • Citation of a reporting guideline, such as the REporting of studies Conducted using Observational Routinely-collected Data (RECORD) statement51 (no citation, citation without reporting checklist, citation with reporting checklist).

  • Open access article or availability of free access in PubMed Central (PMC) based on assignment of a specific ID (PMCID) (yes, no).

  • Protocol/registration mentioned (no protocol, indicated that protocol was available on request, full protocol publicly available, full protocol publicly available and preregistered).

  • Mention of raw data availability (no data sharing, indicated that raw data were available on request, full access to raw data for reanalysis).

  • Mention of access to programming code used to perform analyses (no access, indicated that code was available on request, full access for reanalysis).

  • Type of data repository used, if appropriate (eg, Open Science Framework, Mendeley, Zenodo, Dryad, journal repository or other).

  • Funding (no statement, no funding, public, private, other, combination of public/private/other).

  • Conflicts of interests (no statement, statement no conflicts exist, statement conflicts exist).

Adherence to reporting standards

We will assess reporting quality and completeness of included studies against the RECORD statement (https://www.record-statement.org/).51 RECORD represents the current best practice reporting standard for studies using observational routinely collected health data. The RECORD statement consists of a checklist of items (see online supplemental appendix 3) that supplement or modify the STROBE (STrengthening the Reporting of OBservational studies in Epidemiology) statement (https://www.strobe-statement.org/),58 59 which focused on the reporting of observational studies. The aim will be to assess whether included studies conformed to reporting recommendations included in the RECORD statement.

We will operationalise all items of the checklist into dedicated questions that can be answered with ‘yes’, ‘no’ or ‘partly’, indicating adequate (‘yes’) or inadequate (‘no’) reporting. We will use the ‘partly’ answer when not all aspects are adequately reported, for example, when several eligibility criteria existed, but some of them are described, and others are not. This approach is consistent with previous methodological studies.40 In addition, we will accept references to other publications as adequate descriptions.
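The ‘yes’/‘partly’/‘no’ scheme described above lends itself to a simple per-item tally across included studies. The following Python sketch shows one way to compute counts and percentages for a single checklist item; the function name, the rounding to one decimal place and the input layout (one answer per study) are assumptions made for illustration.

```python
from collections import Counter

ALLOWED = ("yes", "partly", "no")


def summarise_item(answers):
    """Return {answer: (count, percentage)} for one checklist item.

    `answers` holds one entry per included study; anything outside
    'yes'/'partly'/'no' is rejected so that coding errors surface early.
    """
    invalid = [a for a in answers if a not in ALLOWED]
    if invalid:
        raise ValueError(f"unexpected answers: {invalid}")
    counts = Counter(answers)
    total = len(answers)
    return {
        level: (counts[level], round(100 * counts[level] / total, 1) if total else 0.0)
        for level in ALLOWED
    }
```

Repeating the call for each checklist item would give the item-by-item adherence table.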

Methodological quality (or risk of bias) assessment

The methodological quality (or risk of bias) of included studies will be evaluated using the Newcastle-Ottawa Scale (NOS) for observational studies.60 Using the NOS tool, each study is judged on eight items, categorised into three groups: the selection of the study groups (eg, representativeness), the comparability of the groups (eg, matching in the design and/or confounders adjusted for in the analysis) and the ascertainment of either the exposure or outcome of interest for case–control or cohort studies, respectively. Stars are awarded for each item, and studies of the highest methodological quality (or lowest risk of bias) are awarded up to nine stars. We will consider studies with 0–3, 4–6 and 7–9 stars to represent low, moderate and high quality, respectively. The methodological quality (or risk of bias) for each study will be independently assessed by two investigators. Discrepant scores will be resolved by discussion.
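The star thresholds stated above (0–3 low, 4–6 moderate, 7–9 high) translate directly into a small classification rule. A minimal Python sketch, with a hypothetical function name, is:

```python
def nos_quality(stars):
    """Map a Newcastle-Ottawa Scale star total (0-9) to a quality category."""
    if not isinstance(stars, int) or not 0 <= stars <= 9:
        raise ValueError("NOS star total must be an integer from 0 to 9")
    if stars <= 3:
        return "low"
    if stars <= 6:
        return "moderate"
    return "high"
```

Rejecting totals outside 0 to 9 is a design choice intended to catch data entry errors before the categories are tabulated.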

Summarising the evidence

We will summarise design, methods and reporting characteristics of the included studies with tables and graphical tools (eg, bar charts, forest plots). This will be done by constructing a clear descriptive summary on the included studies based on a common analytical framework on the study populations, study design, details of exposures and outcomes, key information about the methods, data sources, estimation procedures or accessibility of materials and raw data.

Data will be summarised as frequency for categorical items or median and IQR for continuous items. We will not perform a meta-analysis of pooled estimates since it is out of the scope of the planned meta-research study. Heterogeneity of included studies will be discussed narratively.
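The descriptive summaries described above (frequencies for categorical items, median and IQR for continuous items) can be sketched as follows. The planned analyses will be run in Stata; this Python version is only an illustration, and the function names and the ‘inclusive’ quartile definition are assumptions (statistical packages differ in how they interpolate quartiles, so values may not match Stata output exactly).

```python
from collections import Counter
from statistics import median, quantiles


def summarise_categorical(values):
    """Return {category: (count, percentage)}, most frequent category first."""
    counts = Counter(values)
    total = len(values)
    return {k: (n, round(100 * n / total, 1)) for k, n in counts.most_common()}


def summarise_continuous(values):
    """Return the median with the first and third quartiles (IQR bounds).

    Requires at least two values; quartiles use the 'inclusive' method.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {"median": median(values), "q1": q1, "q3": q3}
```

For example, `summarise_categorical` could be applied to the extracted study design of each article and `summarise_continuous` to the number of participants.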

Additional analyses/subgroups

If sufficient studies report results separately, we plan to summarise design, methods and reporting characteristics of the included studies by types of exposures (eg, Alzheimer’s disease, amyotrophic lateral sclerosis/motor neuron disease, Huntington’s disease, multiple sclerosis and Parkinson’s disease), outcomes (eg, total cancer vs site-specific cancer), sex (male only vs female only) and study design (cohort vs case–control).

Software considerations

All analyses will be performed using Stata V.17 or higher (StataCorp LP, College Station, Texas, USA).

Patient and public involvement

The draft protocol was revised after receiving feedback from the whole research team (including methodologists, scientists and healthcare professionals). Patients or the public were not involved in the setting of the research question, nor in developing plans for design/writing of our protocol. Patients or the public will not be asked to advise on the interpretation or writing up of findings.

Ethics and dissemination

Due to the nature of this study, no ethics approval is required as no human subjects will be involved. We plan to publish the full meta-research study in an open access peer-reviewed journal and disseminate the findings at scientific conferences and via social media (Twitter and author-affiliated websites).

Discussion

Using routinely collected data for research may represent a powerful approach to evaluate the epidemiological associations between complex diseases. However, such applications come with novel challenges and may create novel problems. Some biases are inherent to observational designs, but potential issues such as misclassification, unmeasured confounding or missing data are of particular importance when using routinely collected data. To the best of our knowledge, the planned meta-research study will be the first attempt to investigate the methodology and reporting of all studies conducted using observational routinely collected health data for investigating the link between cancer and neurodegenerative diseases. Guidance from the Cochrane Methodology Review Group52 and from the Synthesis Without Meta-analysis reporting guideline61 will be followed throughout the research process. The proposed meta-research study will be reported in accordance with the reporting guidance provided in the PRISMA 2020 statement (http://www.prisma-statement.org/).62 63 Any amendments made to our protocol when conducting the analyses will be outlined and reported in the final manuscript. All data underlying the findings reported in the final manuscript will be deposited in a cross-disciplinary public repository, such as the Open Science Framework (https://osf.io/).

There are several strengths and limitations of our planned methods. We will comprehensively evaluate the methodological and reporting quality of studies conducted using observational routinely collected health data for investigating the link between cancer and neurodegenerative diseases. We anticipate that we will identify knowledge gaps to be filled by new research, considering that some methodological and reporting characteristics in studies using routinely collected health data will be poorly covered in the medical literature. A key challenge is that, based on knowledge from previous studies,39 40 we anticipate identifying studies using different study designs, populations, outcomes and analyses with a variable quality of reporting.

Finally, we anticipate the study could be relevant to a variety of audiences (eg, research authors, health professionals, funders, journal editors). Moreover, the proposed meta-research study might offer insight into future research agendas for new studies conducted using routinely collected health data for investigating the epidemiological associations between cancer, neurodegeneration or other medical conditions and risk factors. In our opinion, a better understanding of the links between complex diseases might lead to new or improved forms of prevention and treatment.

Footnotes

  • Twitter @mjpages, @bh_epistat, @crisfragamedin, @alfons_valencia, @chromosome8

  • Contributors All authors contributed to conceptualising and designing the study. FC-L drafted the manuscript. JAD, MJP, BH, MR, CB-V, AA-A, CAF-M, EB-D, AV and RT-S commented for important intellectual content and made revisions. All authors read and approved the final version of the manuscript. FC-L accepts full responsibility for the finished manuscript and controlled the decision to publish.

  • Funding FC-L and RT-S are supported by the Institute of Health Carlos III/CIBERSAM. MJP is supported by an Australian Research Council Discovery Early Career Researcher Award (DE200101618). MR and EB-D are partially funded by the Spanish Health Services Research on Chronic Patients Network (REDISSEC)/Institute of Health Carlos III.

  • Competing interests None declared.

  • Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.