
Original research
Approaches for combining primary care electronic health record data from multiple sources: a systematic review of observational studies
  1. Daniel Dedman1,2,
  2. Melissa Cabecinha3,
  3. Rachael Williams1,
  4. Stephen J W Evans4,
  5. Krishnan Bhaskaran2,
  6. Ian J Douglas2
  1. Clinical Practice Research Datalink, Medicines and Healthcare Products Regulatory Agency, London, UK
  2. Department of Non-communicable Disease Epidemiology, London School of Hygiene and Tropical Medicine, London, UK
  3. Research Department of Primary Care and Population Health, University College London, London, UK
  4. Department of Medical Statistics, London School of Hygiene and Tropical Medicine, London, UK
  Correspondence to Daniel Dedman; Daniel.Dedman{at}


Objective To identify observational studies which used data from more than one primary care electronic health record (EHR) database, and summarise key characteristics including: objective and rationale for using multiple data sources; methods used to manage, analyse and (where applicable) combine data; and approaches used to assess and report heterogeneity between data sources.

Design A systematic review of published studies.

Data sources PubMed and Embase were searched using a list of named primary care EHR databases; as a supplement, the reference lists of studies retained after initial screening were hand searched.

Study selection Observational studies published between January 2000 and May 2018 that included at least two different primary care EHR databases were selected.

Results 6054 studies were identified from database and hand searches, and 109 were included in the final review, the majority of them published between 2014 and 2018. Included studies used 38 different primary care EHR data sources. Forty-seven studies (44%) were descriptive or methodological. Of 62 analytical studies (categories were not mutually exclusive), 22 (36%) presented separate results from each database, with no attempt to combine them; 29 (48%) combined individual patient data in a one-stage meta-analysis; and 21 (34%) combined estimates from each database using two-stage meta-analysis. Discussion and exploration of heterogeneity were inconsistent across studies.
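Why a one-stage analysis needs to account for the database each patient came from can be illustrated with a minimal sketch. It assumes two hypothetical databases that differ in baseline risk and exposure prevalence, with no exposure effect inside either one; all counts and names are invented for illustration. Collapsing everything into a single table ignores the source of each patient, whereas the Mantel-Haenszel odds ratio treats each database as a stratum. Mantel-Haenszel is used here only as one simple stratified estimator; one-stage analyses may instead use regression models that include database as a stratum, fixed effect or random effect.

```python
from dataclasses import dataclass


@dataclass
class Table2x2:
    """Exposure-by-outcome counts from one database (hypothetical)."""
    a: int  # exposed, with outcome
    b: int  # exposed, without outcome
    c: int  # unexposed, with outcome
    d: int  # unexposed, without outcome

    def odds_ratio(self) -> float:
        """Odds ratio within this database only."""
        return (self.a * self.d) / (self.b * self.c)


def crude_odds_ratio(tables: list[Table2x2]) -> float:
    """Collapse all databases into one table, ignoring the source of each patient."""
    total = Table2x2(
        a=sum(t.a for t in tables),
        b=sum(t.b for t in tables),
        c=sum(t.c for t in tables),
        d=sum(t.d for t in tables),
    )
    return total.odds_ratio()


def mantel_haenszel_odds_ratio(tables: list[Table2x2]) -> float:
    """Mantel-Haenszel odds ratio, treating each database as a separate stratum."""
    numerator = sum(t.a * t.d / (t.a + t.b + t.c + t.d) for t in tables)
    denominator = sum(t.b * t.c / (t.a + t.b + t.c + t.d) for t in tables)
    return numerator / denominator


# Invented counts: database A has higher baseline risk and more exposed patients.
databases = {
    "database A": Table2x2(a=40, b=60, c=20, d=30),
    "database B": Table2x2(a=5, b=45, c=20, d=180),
}
for name, table in databases.items():
    print(f"{name}: OR {table.odds_ratio():.2f}")  # 1.00 in each database
tables = list(databases.values())
print(f"crude pooled OR: {crude_odds_ratio(tables):.2f}")  # 2.25
print(f"stratified MH OR: {mantel_haenszel_odds_ratio(tables):.2f}")  # 1.00
```

The crude estimate of 2.25 arises only because exposure and outcome are both more common in database A; printing the per-database results next to the pooled one makes that visible.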

Conclusions Comparing patterns and trends in different populations, or in different primary care EHR databases covering the same populations, is an important and common objective of multi-database studies. When results from several databases are combined using meta-analysis, providing separate results from each database aids interpretation. We found that these were often missing, particularly in studies using one-stage approaches, which also often lacked details of any statistical adjustment for heterogeneity and/or clustering. For two-stage meta-analysis, a clear rationale should be provided for the choice of fixed-effect and/or random-effects or other models.
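The fixed-effect versus random-effects choice in a two-stage meta-analysis can be sketched as follows. This assumes that stage one has already produced a log hazard ratio and its standard error in each database; the three hazard ratios are invented, and inverse-variance fixed-effect pooling with the DerSimonian-Laird random-effects estimator are shown as common textbook choices, not as the methods of any particular included study. Function and variable names are hypothetical.

```python
import math


def pooled_estimates(estimates: list[float], std_errors: list[float]) -> dict[str, float]:
    """Fixed-effect and DerSimonian-Laird random-effects pooling of log-scale estimates."""
    k = len(estimates)
    if k < 2 or k != len(std_errors):
        raise ValueError("need at least two databases, each with one standard error")
    # Fixed effect: inverse-variance weights, assuming one common true effect.
    w = [1.0 / se**2 for se in std_errors]
    sum_w = sum(w)
    fixed_est = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    # Cochran's Q and I^2 describe heterogeneity between databases.
    q = sum(wi * (yi - fixed_est) ** 2 for wi, yi in zip(w, estimates))
    i_squared = max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0
    # DerSimonian-Laird moment estimator of the between-database variance tau^2.
    c = sum_w - sum(wi**2 for wi in w) / sum_w
    tau_squared = max(0.0, (q - (k - 1)) / c)
    # Random effects: widen each database's variance by tau^2 before weighting.
    w_re = [1.0 / (se**2 + tau_squared) for se in std_errors]
    random_est = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    return {
        "fixed": fixed_est,
        "fixed_se": math.sqrt(1.0 / sum_w),
        "random": random_est,
        "random_se": math.sqrt(1.0 / sum(w_re)),
        "q": q,
        "i_squared": i_squared,
        "tau_squared": tau_squared,
    }


def hr_with_ci(log_hr: float, se: float) -> str:
    """Format a hazard ratio with a Wald 95% confidence interval."""
    lower, upper = math.exp(log_hr - 1.96 * se), math.exp(log_hr + 1.96 * se)
    return f"{math.exp(log_hr):.2f} ({lower:.2f} to {upper:.2f})"


# Hypothetical per-database hazard ratios and standard errors of log(HR).
hazard_ratios = [1.20, 1.55, 0.95]
std_errors = [0.08, 0.10, 0.12]
log_hrs = [math.log(hr) for hr in hazard_ratios]

# Report each database separately alongside, not instead of, the pooled results.
for i, (y, se) in enumerate(zip(log_hrs, std_errors), start=1):
    print(f"database {i}: HR {hr_with_ci(y, se)}")

result = pooled_estimates(log_hrs, std_errors)
print(f"fixed effect:   HR {hr_with_ci(result['fixed'], result['fixed_se'])}")
print(f"random effects: HR {hr_with_ci(result['random'], result['random_se'])}")
print(f"Q = {result['q']:.2f}, I^2 = {100 * result['i_squared']:.0f}%, "
      f"tau^2 = {result['tau_squared']:.3f}")
```

When Q does not exceed its degrees of freedom (k - 1), the DerSimonian-Laird estimate of tau^2 is zero and the two models coincide; otherwise the random-effects interval is wider, which is why the choice between them, and the reason for it, is worth stating.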

  • epidemiology
  • statistics & research methods
  • public health

This is an open access article distributed in accordance with the Creative Commons Attribution 4.0 Unported (CC BY 4.0) license, which permits others to copy, redistribute, remix, transform and build upon this work for any purpose, provided the original work is properly cited, a link to the licence is given, and indication of whether changes were made. See:


Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.


  • Contributors I confirm that all authors made substantial contributions to the study (detailed below), agree with and give final approval of the content of the current version, and agree to be accountable for all aspects of the work and for resolving questions related to any part of the work if they arise. DD: study concept and design, including search strategy; data extraction proformas and study database design; screening of titles and abstracts, and data extraction; analysis; interpretation of results; and drafting of manuscript. MC: screening of titles and abstracts, and data extraction; interpretation of results; critical review of manuscript and approval of final version. RW, KB and ID: study concept and design, including search strategy; interpretation of results; critical review of manuscript and approval of final version. SE: interpretation of results, critical review of manuscript and approval of final version.

  • Funding This research was conducted as part of a postgraduate doctoral degree funded by the Clinical Practice Research Datalink. KB holds a Sir Henry Dale Fellowship funded by Wellcome and the Royal Society (grant number 107731/Z/15/Z).

  • Competing interests None declared.

  • Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

  • Patient consent for publication Not required.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data availability statement Data are available upon reasonable request. The study database and data-extraction proformas are available on request from DD.