Original research
Measuring concordance of data sources used for infectious disease research in the USA: a retrospective data analysis
  1. Maimuna S Majumder (1, 2)
  2. Marika Cusick (3)
  3. Sherri Rose (3)

  1. Computational Health Informatics, Boston Children's Hospital, Boston, Massachusetts, USA
  2. Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA
  3. Department of Health Policy, Stanford University, Stanford, California, USA

Correspondence to Dr Maimuna S Majumder; maimuna.majumder{at}childrens.harvard.edu

Abstract

Objectives As highlighted by the COVID-19 pandemic, researchers are eager to make use of a wide variety of data sources, both government-sponsored and alternative, to characterise the epidemiology of infectious diseases. The objective of this study is to investigate the strengths and limitations of sources currently being used for research.

Design Retrospective descriptive analysis.

Primary and secondary outcome measures Yearly national-level and state-level disease-specific case counts and disease clusters for three diseases (measles, mumps and varicella) during a 5-year study period (2013–2017) across four data sources: Optum (health insurance billing claims data), HealthMap (online news surveillance data), Morbidity and Mortality Weekly Report (official government reports) and the National Notifiable Diseases Surveillance System (government case surveillance data).

Results Our study demonstrated drastic differences in reported infectious disease incidence across data sources. When compared with the other three sources of interest, Optum data showed substantially higher, implausible standardised case counts for all three diseases. Although there was some concordance in identified state-level case counts and disease clusters, all four sources identified variations in state-level reporting.

Conclusions Researchers should consider data source limitations when attempting to characterise the epidemiology of infectious diseases. Some data sources, such as billing claims data, may be unsuitable for epidemiological research within the infectious disease context.

  • Health policy
  • Epidemiology
  • Public health
  • Health informatics

Data availability statement

Data may be obtained from a third party and are not publicly available. All data relevant to the study are included in the article or uploaded as online supplemental information. All data in the present study are available online with the exception of the Optum Clinformatics Data Mart.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

Footnotes

  • MSM and MC are joint first authors.

  • MSM and MC contributed equally.

  • Contributors MSM and SR conceptualised the research question and methodology. MC curated the data and conducted the analysis. All authors contributed to the final draft in writing, reviewing, and editing. SR, as guarantor, accepts full responsibility for the work, had access to the data, and controlled the decision to publish.

  • Funding This project is supported in part through the NIH Director’s New Innovator Award DP2-MD012722. MC is supported by the T32HS026128 grant from the Agency for Healthcare Research and Quality.

  • Map disclaimer The inclusion of any map (including the depiction of any boundaries therein), or of any geographic or locational reference, does not imply the expression of any opinion whatsoever on the part of BMJ concerning the legal status of any country, territory, jurisdiction or area or of its authorities. Any such expression remains solely that of the relevant source and is not endorsed by BMJ. Maps are provided without any warranty of any kind, either express or implied.

  • Competing interests None declared.

  • Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.