Protocol for a scoping review of post-trial extensions of randomised controlled trials using individually linked administrative and registry data
  1. Tiffany Fitzpatrick1,
  2. Laure Perrier2,
  3. Andrea C Tricco3,4,
  4. Sharon E Straus3,
  5. Peter Jüni5,6,7,
  6. Merrick Zwarenstein8,9,
  7. Lisa M Lix10,11,
  8. Mark Smith11,
  9. Laura C Rosella4,9,12,
  10. David A Henry4,7,9
  1. 1Ontario Strategy for Patient-Oriented Research (SPOR) SUPPORT Unit, Toronto, Ontario, Canada
  2. 2Gerstein Science Information Centre, University of Toronto Libraries, Toronto, Ontario, Canada
  3. 3Li Ka Shing Knowledge Institute, St. Michael's Hospital, Toronto, Ontario, Canada
  4. 4Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
  5. 5Applied Health Research Centre (AHRC), Li Ka Shing Knowledge Institute, St. Michael's Hospital, Toronto, Ontario, Canada
  6. 6Department of Medicine, University of Toronto, Toronto, Ontario, Canada
  7. 7Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, Ontario, Canada
  8. 8Department of Family Medicine, Schulich School of Medicine, University of Western Ontario, London, Ontario, Canada
  9. 9Institute for Clinical Evaluative Sciences, Toronto, Ontario, Canada
  10. 10Department of Community Health Sciences, University of Manitoba, Winnipeg, Manitoba, Canada
  11. 11Manitoba Centre for Health Policy, University of Manitoba, Winnipeg, Manitoba, Canada
  12. 12Public Health Ontario, Toronto, Ontario, Canada
  1. Correspondence to David Henry; David.Henry{at}


Introduction Well-conducted randomised controlled trials (RCTs) provide the least biased estimates of intervention effects. However, RCTs are costly and time-consuming to perform and long-term follow-up of participants may be hampered by lost contacts and financial constraints. Advances in computing and population-based registries have created new possibilities for increasing the value of RCTs by post-trial extension using linkage to routinely collected administrative/registry data in order to determine long-term interventional effects. There have been recent important examples, including 20+ years follow-up studies of trials of pravastatin and mammography. Despite the potential value of post-trial extension, there has been no systematic study of this literature. This scoping review aims to characterise published post-trial extension studies, assess their value, and identify any potential challenges associated with this approach.

Methods and analysis This review will use the recommended methods for scoping reviews. We will search MEDLINE, EMBASE and the Cochrane Central Register of Controlled Trials. A draft search strategy is included in this protocol. Review of titles and abstracts, full texts of potentially eligible studies and data/information extraction will be conducted independently by pairs of investigators. Eligible studies will be RCTs that investigated healthcare interventions that were extended by individual linkage to administrative/registry/electronic medical records data after the completion of the planned follow-up period. Information concerning the original trial, characteristics of the extension study, any clinical, policy or ethical implications and methodological or practical challenges will be collected using standardised forms.

Ethics and dissemination As this study uses secondary data, and does not include person-level data, ethics approval is not required. We aim to disseminate these findings through journals and conferences targeting triallists and researchers involved in health data linkage. We aim to produce guidance for investigators on the conduct of post-trial extensions using routinely collected data.


This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See:


Strengths and limitations of this study

  • Provides the first review of post-trial extension studies, specifically those using secondary data sources, to identify long-term participant outcomes.

  • Aims to produce an authoritative summary of progress in this research area, propose a common language/terminology for the field and identify research gaps and challenges with post-trial linkage.

  • Provides guidance to those planning post-trial extension work, including optimal key terms for maximising research dissemination, and advice on logistical, methodological and ethical considerations.

  • The present lack of indexing terms for these studies increases the possibility that some studies may be missed by our search criteria.

  • The heterogeneity of content areas covered by this methodology may provide challenges in synthesising the results into succinct conclusions or recommendations.


Randomised controlled trials (RCTs) are widely considered the gold standard for generating clinical evidence. High-quality RCTs require considerable human and financial resources and, in some cases, may require commitments spanning many years. Despite these investments, follow-up of patients typically ceases after the trial is completed as planned and the results have been released to inform clinical practice or regulatory decisions. At this point it can be said that a trial becomes ‘dormant’, as the data are no longer used for new research.

However, dormant trials offer considerable potential for new research, including reanalysis to confirm primary trial outcomes, analysis of new clinical outcomes and follow-up of participants to assess the potential long-term benefits and harms of interventions.1 According to a scoping review by Ebrahim et al,2 reanalysis appears to be a fairly uncommon event. Their review identified only 37 examples of trial reanalyses.2 However, they did not look at additional uses of study data, such as post-trial extensions to assess long-term outcomes. Given growing demand for transparency in research, open data and data sharing, and tight fiscal budgets for research, the reanalysis, validation and reactivation of dormant trials is an attractive strategy to maximise the value gained from large investments in RCTs.

Arguably, the Diabetes Control and Complications Trial (DCCT) study group is a trailblazer in the reuse of trial data and post-trial follow-up.3 Since the completion of their original trial in the early 1990s, the group has been involved in over 200 analyses using data from the original DCCT or its follow-up study, the Epidemiology of Diabetes Interventions and Complications (EDIC).4 Likewise, in recent years there have been several high-profile examples of trial extension studies, where authors follow participants for decades after trial closure using linkage to secondary data sources to assess potential long-term effects. For example, the 25-year follow-up of the Canadian National Breast Cancer Screening Study (CNBCSS) was published in 2014. Long-term follow-up was achieved by linkage to population-based cancer registry and vital statistics data.5 This study found that annual mammography in women aged 40–59 did not reduce mortality from breast cancer when compared with usual care.5 Another notable example is the 20-year follow-up of the West of Scotland Coronary Prevention Study (WOSCOPS), which showed a lifetime benefit of improved survival and reduced cardiovascular disease following only 5 years of statin therapy.6

These are only two prominent recent examples of trials that have been extended beyond the original planned term using secondary linked data sources, both of which provided important clinical findings. It is not clear, however, how many trials have been extended in this way. On the surface, this appears to be a fairly uncommon practice, particularly in relation to the number of high-impact RCTs that are completed each year. While some trials plan to study outcomes using secondary data sources as part of the original protocol, and some are designed as ‘registry trials’, few seem to have been extended as an independent exercise after the original study was completed.

There are several reasons why an investigator would be interested in conducting a post-trial extension study. Most obvious would be the desire to estimate the potential long-term benefits or adverse effects of an intervention.7 Specifically, it is possible to map the accrual or loss of benefits over time with greater certainty. For instance, the extension of the WOSCOPS revealed a ‘legacy’ effect of a relatively short period of statin treatment, still visible after 20 years.6 Post-trial extensions could also provide longer observations regarding the natural history of a disease among a well-characterised control group, the persistence of adherence to therapy, and the patterns and effects of co-interventions introduced after trials end.

While highlighting the potential benefits of post-trial extension, we acknowledge that it is uncertain whether the efforts and potential challenges will always be worthwhile, particularly in terms of novel findings that have important clinical or policy relevance. But the feasibility and low cost make these exercises attractive. Advances in the availability of linkable population data sets and analytical capacity make this option more feasible than in the past, so we believe it will become an increasingly common activity. The relatively low cost of post-trial linkage studies is illustrated by WOSCOPS. The original trial cost ∼£20 million to complete, whereas the follow-up study was conducted for only £15 000 ($19 000; €17 500).8

It will be important for researchers to consider potential methodological, logistical and ethical challenges associated with post-trial extension studies. For instance, post-trial extensions are essentially observational, with management decisions influenced by practice guidelines rather than trial protocols. This introduces a potential for unmeasured time-varying confounding with threats to internal validity. This will be especially important in the case of interventions that are found to be effective and are rapidly adopted by control group members after trial termination. This could attenuate any long-term effects identified by the follow-up study, especially if cross-overs cannot be identified in the available secondary data sources and cannot be accounted for in the analysis. Logistical challenges will include unavailability of data, storage in paper records and lack of linkable fields in the research data, for instance when records have been de-identified and the linkage key is no longer available. Finally, it is unclear how research ethics boards will respond to requests to approve linkage of individual patient data when this was not specified in the original clinical trial consent form.
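The attenuation caused by post-trial cross-over can be illustrated with a deliberately simplified calculation. All symbols below are assumptions introduced for illustration and are not defined in this protocol: r_0 and r_1 are the post-trial event risks without and with the intervention, c is the fraction of the control arm that adopts the intervention after the trial, and d is the fraction of the intervention arm that stops it. The sketch assumes that anyone who switches immediately takes on the risk of the other group, and it ignores legacy effects and time-varying confounding.

```latex
% Illustrative only: r_0, r_1, c and d are hypothetical quantities.
\begin{align*}
\Delta_{\text{true}} &= r_1 - r_0 \\
\Delta_{\text{obs}}  &= \bigl[(1-d)\,r_1 + d\,r_0\bigr] - \bigl[(1-c)\,r_0 + c\,r_1\bigr] \\
                     &= (1 - c - d)\,(r_1 - r_0) \;=\; (1 - c - d)\,\Delta_{\text{true}}
\end{align*}
```

Under these assumptions, if 40% of control participants adopt the intervention and 10% of intervention participants stop it (c = 0.4, d = 0.1), an intention-to-treat comparison over the post-trial period would recover only half of the true risk difference.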

In light of these considerations the principal objective of this scoping review is to quantify and characterise published post-trial extension studies, assess their value, and identify potential research gaps, logistical and ethical challenges associated with this approach. On completion of this review, we hope to report on several issues:

  1. The number of published post-trial extension studies that have been completed using health administrative or registry data;

  2. The types of outcomes assessed in these studies and how well they were detected (ie, the type(s) of data and if there was any validation process);

  3. The main challenges associated with conducting post-trial extensions (eg, achieving data linkage, obtaining ethics approval, obtaining agreement from the original investigators, analytical challenges);

  4. The extent to which the original published trial findings were altered with extended follow-up;

  5. The costs and time involved in performing post-trial extensions; and

  6. The likely clinical and policy implications of new information generated through post-trial extensions.


We will undertake a scoping review to examine the literature covering post-trial extensions of RCTs using linked secondary data sources, such as health administrative or population registry data and electronic medical records, to ascertain long-term clinical outcomes of participants. The scoping review methodology was selected to map the literature in this emerging area.6,9–10 Importantly, a scoping review will enable a broad examination of the nature, extent and range of the research activity and help identify gaps in the current literature. We will follow the standard methodological guidelines for conducting scoping studies, specifically those set out in the Joanna Briggs Institute handbook, and will report according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA).9–11

Our protocol draws on the framework of Arksey and O'Malley,10 which sets out the following six-stage approach to conducting a scoping review: (1) developing the research question; (2) identifying all relevant studies; (3) selection of studies; (4) data extraction; (5) summary and reporting of results; and (6) consultation.

Study selection

Search strategy

The draft search strategy (for MEDLINE) was developed by an information specialist (LP), who was instrumental in maximising its sensitivity and specificity and ensuring its feasibility. It will be assessed by a second information scientist according to the Peer Review of Electronic Search Strategies (PRESS) checklist12 and will be refined as dictated by early results. The search strategy includes MeSH and text words related to RCTs, post-trial and long-term follow-up studies, and data linkage (including specific database types). Notably, since this is an emerging area of interest with little specialisation, there are currently no dedicated indexing terms for this study type. Further, there is little standardisation in how this study type is described. As such, the search strategy relies on a variety of non-standard search terms; for example, ‘X-year follow-up’.
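To show what a non-standard term such as ‘X-year follow-up’ involves in practice, the following is a minimal sketch of a text pattern that flags such phrases in titles or abstracts. It is not the MEDLINE strategy in the appendix: the regular expression, the list of number words, the function name and the example titles are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical pattern (not taken from the protocol's search strategy): matches
# phrases such as "20-year follow-up", "25 year followup" or "ten-year follow up".
NUMBER_WORDS = r"(?:one|two|three|four|five|six|seven|eight|nine|ten|fifteen|twenty)"
FOLLOW_UP = re.compile(
    rf"\b(?:\d{{1,2}}|{NUMBER_WORDS})[\s-]*years?[\s-]*follow[\s-]?up\b",
    re.IGNORECASE,
)

def mentions_x_year_follow_up(text: str) -> bool:
    """Return True if the text contains an 'X-year follow-up' style phrase."""
    return FOLLOW_UP.search(text) is not None

# Made-up titles used only to exercise the pattern.
examples = [
    "Long-term effects of a lipid-lowering drug: 20-year follow-up of a randomised trial",
    "Twenty five year follow-up of a screening trial",
    "A 6-week pilot study of dietary advice",
]
flags = [mentions_x_year_follow_up(t) for t in examples]
```

A pattern like this favours sensitivity over specificity, which is consistent with the reliance on manual screening described in the study selection stage.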

The search strategy will be replicated in the EMBASE and the Cochrane Central Register of Controlled Trials databases. The search will be limited to English language articles; no other restrictions will be placed on the search strategy.

We will search for additional articles by using the Related Articles feature in PubMed for articles included in the scoping review and those deemed highly relevant by the core research team. This strategy was selected because previous work has shown that the Related Articles feature in PubMed can identify relevant studies with a relatively low screening burden of new records per review.13 We decided to do this additional search as our initial scan identified several relevant studies that did not explicitly refer to data linkage in their abstract and would consequently be missed by our search strategy. Finally, we will leverage the personal libraries and content knowledge of our clinical and epidemiological expert authors to identify any additional studies for inclusion.

Our MEDLINE search strategy can be found in online supplementary appendix A. On completion, the searches from each of the above databases will be documented and the references imported into a reference management software program (eg, Reference Manager, V.12), where duplicates will be removed and all references will be stored and shared.
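Duplicate removal is normally handled by the reference manager itself, but the underlying logic can be sketched as follows. This is an illustrative assumption about how records from several databases might be matched (by DOI where present, otherwise by normalised title and year); the field names and example records are hypothetical and the sketch does not describe Reference Manager's own algorithm.

```python
import re

def normalise_title(title: str) -> str:
    """Lower-case a title and collapse punctuation and whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def deduplicate(records: list) -> list:
    """Keep the first record seen for each DOI or (normalised title, year) pair."""
    seen_dois, seen_titles = set(), set()
    unique = []
    for rec in records:
        doi = (rec.get("doi") or "").strip().lower()
        title_key = (normalise_title(rec.get("title", "")), rec.get("year"))
        if (doi and doi in seen_dois) or title_key in seen_titles:
            continue  # already retrieved from another database
        if doi:
            seen_dois.add(doi)
        seen_titles.add(title_key)
        unique.append(rec)
    return unique

# Hypothetical records as they might be exported from three databases.
records = [
    {"source": "MEDLINE", "doi": "10.1000/example.1", "title": "Long-term follow-up of Trial A", "year": 2014},
    {"source": "EMBASE", "doi": "10.1000/EXAMPLE.1", "title": "Long term follow-up of trial A.", "year": 2014},
    {"source": "CENTRAL", "doi": None, "title": "Long-term follow-up of Trial A.", "year": 2014},
    {"source": "MEDLINE", "doi": "10.1000/example.2", "title": "Extended follow-up of Trial B", "year": 2016},
]
unique_records = deduplicate(records)
```

Matching on title and year as well as DOI catches records that lack a DOI in one database, at the cost of occasionally merging distinct reports with identical titles, so a manual check of removed records remains advisable.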

Study selection

Prior to screening, we will conduct a calibration exercise with a sample of 50 retrieved citations to assess the reliability of our level 1 screening of titles and abstracts. We will aim for agreement of at least 95% before beginning title and abstract screening. Subsequently, the abstracts and titles of all retrieved references will be independently reviewed by two authors to identify potentially eligible studies for inclusion. Disagreements will be put forth for full-text review. The full text of all eligible citations will also be examined in detail by two independent reviewers. In cases of disagreement, consensus will be reached through discussion or be resolved by a third reviewer.
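The calibration check described above amounts to computing the proportion of citations on which two reviewers agree. The sketch below shows one way this could be calculated for a sample of 50 citations. The reviewer decisions are invented for illustration, and Cohen's kappa is included only as an optional chance-corrected complement; the protocol itself specifies percentage agreement only.

```python
from collections import Counter

def percent_agreement(a: list, b: list) -> float:
    """Proportion of items on which two reviewers made the same decision."""
    if len(a) != len(b) or not a:
        raise ValueError("both reviewers must rate the same non-empty set of citations")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a: list, b: list) -> float:
    """Chance-corrected agreement (an addition, not required by the protocol)."""
    n = len(a)
    p_observed = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    p_expected = sum(count_a[k] * count_b[k] for k in set(a) | set(b)) / (n * n)
    return 1.0 if p_expected == 1 else (p_observed - p_expected) / (1 - p_expected)

# Invented screening decisions for 50 citations; the reviewers disagree on two.
reviewer_1 = ["include"] * 10 + ["exclude"] * 40
reviewer_2 = ["include"] * 9 + ["exclude"] * 40 + ["include"]

agreement = percent_agreement(reviewer_1, reviewer_2)  # 48 of 50 decisions match
kappa = cohens_kappa(reviewer_1, reviewer_2)
ready_to_screen = agreement >= 0.95  # threshold stated in the protocol
```

With 50 citations, the 95% threshold tolerates at most two disagreements, so a third disagreement would require recalibration before full screening begins.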

Studies will be eligible for inclusion if they satisfy all of the following criteria:

  1. Population: any definable (human) patient population, including both children and adult populations. All countries and subpopulations will be included;

  2. Intervention: any health-related intervention applied at the individual level, such as pharmaceutical interventions, lifestyle modification, screening practices, etc;

  3. Outcomes (primary or secondary): any health-related event (such as onset of disease, specific complications, a health system outcome or death) that is considered likely to be detectable in administrative/registry data;

  4. Design: limited to clinical trials that involve either individual or cluster randomisation of individuals to an intervention or control group. We will include randomised cross-over designs, adaptive designs and factorial designs. We will not include quasi-experimental designs, self-controlled studies or those using interrupted time series analysis outside of an RCT;

  5. Extension methods: follow-up of trial participants using secondary data sources (ie, data not collected in the course of the original trial, data identified from existing sources such as health administrative data, electronic medical records or vital statistics) where the extension was not originally planned (eg, where linkage to vital statistics to identify deaths occurring outside of the trial period was not explicitly mentioned in the original trial publication);

  6. Timeframe: any;

  7. Publication: any scientific reports of trial extensions, including abstracts.

We will exclude studies with the following characteristics:

  1. Extension of trials occurring within the year following the completion of the original trial as, arguably, trial extension was most likely within the authors' original intentions (even where not explicitly indicated); for example, additional 6 months of follow-up of participants immediately following their completion of the original trial;

  2. Non-English language studies.
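The inclusion and exclusion criteria above can be expressed as a simple checklist applied to each candidate report. The sketch below is one hypothetical encoding, not the review team's screening form: the class, the field names and the interpretation of the one-year rule (extended follow-up ending 12 months or less after trial completion) are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class CandidateStudy:
    """Hypothetical screening record; field names are illustrative only."""
    randomised: bool                      # individual or cluster randomisation
    individual_level_intervention: bool   # health-related intervention applied to individuals
    linked_secondary_data: bool           # administrative, registry, EMR or vital statistics data
    extension_planned_originally: bool    # linkage described in the original trial publication
    months_after_trial_completion: float  # end of extended follow-up, counted from trial completion
    english_language: bool

def screen(study: CandidateStudy) -> tuple:
    """Return (eligible, reasons_for_exclusion) under the assumed reading of the criteria."""
    reasons = []
    if not study.randomised:
        reasons.append("not a randomised trial")
    if not study.individual_level_intervention:
        reasons.append("intervention not applied at the individual level")
    if not study.linked_secondary_data:
        reasons.append("follow-up did not use linked secondary data")
    if study.extension_planned_originally:
        reasons.append("extension was planned in the original trial")
    if study.months_after_trial_completion <= 12:
        reasons.append("extension within the year following trial completion")
    if not study.english_language:
        reasons.append("non-English language report")
    return (not reasons, reasons)

# Two invented examples: a 15-year registry extension and a 6-month continuation.
long_extension = CandidateStudy(True, True, True, False, 180, True)
short_extension = CandidateStudy(True, True, True, False, 6, True)
long_result = screen(long_extension)
short_result = screen(short_extension)
```

Recording the reasons alongside the decision makes it straightforward to populate the exclusion counts in a PRISMA-style flow diagram.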

Data extraction and management

Two review authors will extract data from included studies using Covidence, systematic review software developed in partnership with the Cochrane Collaboration. Disagreements will be resolved by discussion between the two review authors until a consensus is reached or will be resolved by a third author. Data and other information from studies will be extracted in a predefined form following the framework outlined in table 1. The extraction form will be pilot-tested on a random sample of five selected studies and refined accordingly.

Table 1

Data extraction framework

We recognise that several of the outcomes of interest will not be determined by examination of numeric results (see table 1) and will require a degree of interpretation and judgement. We will perform a textual analysis to identify themes that correspond to these outcomes in a sample of the extended trials. We will map key words to these themes in the pilot sample of extended trials. We will use electronic searching to find the key words in all of the studies, but recognise that this will identify only potentially relevant sections of text and interpretation will still be needed. The number of studies will not be large, so we believe this will be manageable.
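The electronic key word search described above could be implemented along the following lines. This is a minimal sketch under stated assumptions: the themes, the key words mapped to them and the sample text are hypothetical placeholders, since the actual mapping will be derived from the pilot sample. The function returns only candidate passages, which would still need to be read and interpreted by a reviewer.

```python
import re

# Hypothetical theme-to-key-word mapping; the real mapping comes from the pilot sample.
THEME_KEYWORDS = {
    "linkage challenges": ["linkage", "unlinked", "identifier"],
    "ethics and consent": ["consent", "ethics"],
    "costs": ["cost", "funding"],
}

def find_theme_passages(text: str, theme_keywords: dict, window: int = 60) -> dict:
    """Return, for each theme, text snippets surrounding any matching key word."""
    hits = {}
    for theme, words in theme_keywords.items():
        # Match each key word at a word boundary, allowing suffixes such as "costs".
        pattern = re.compile(
            r"\b(?:" + "|".join(map(re.escape, words)) + r")\w*", re.IGNORECASE
        )
        for match in pattern.finditer(text):
            start = max(0, match.start() - window)
            end = min(len(text), match.end() + window)
            hits.setdefault(theme, []).append(text[start:end])
    return hits

# Invented excerpt standing in for the full text of an extension study.
sample_text = (
    "Record linkage was incomplete for 3% of participants. "
    "The original consent form did not mention registry follow-up."
)
passages = find_theme_passages(sample_text, THEME_KEYWORDS)
```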

We will attempt to contact authors for additional details when not otherwise reported in the published manuscript. Similarly, we will search the published literature databases listed above and the grey literature to identify other potential sources of relevant information related to the included post-trial linkage studies; for example, conference proceedings where qualitative information, such as any practical challenges associated with the linkage, or the costs associated with the study, may be reported.

Data analysis and synthesis

Given the nature of this review, we will report only descriptive information to provide a narrative synthesis of the findings from the included studies in this review. Specifically, our methodology will include quantitative (eg, proportions) analyses to describe the types of trials for which extension studies have been conducted and summarise any practice implications or methodological challenges. For example, we will summarise whether the original findings of the trial were altered by additional follow-up and if the post-trial authors were able to report on new clinical outcomes identified through the linked administrative data sources.
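As an illustration of the descriptive analysis, the sketch below computes simple proportions from extracted study characteristics. The variable names and the extracted values are invented for the example and do not represent results of this review.

```python
from collections import Counter

def proportions(values: list) -> dict:
    """Return the proportion of studies falling in each category."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}

# Invented extraction results for four hypothetical extension studies.
findings_altered = ["no", "no", "yes", "no"]
summary = proportions(findings_altered)
```

The same function could be applied to any categorical field in the extraction form, such as intervention type or the kind of secondary data source used.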


To the best of our knowledge, this will be the first review of post-trial extension studies, specifically those using secondary population-based sources of outcome data, to identify the long-term outcomes in trial participants. The findings of this scoping review will help describe the growing field of post-trial extension studies using linked administrative and registry data and inform recommendations to support the further development of this field.




  • Twitter Follow Laura Rosella @LauraCRosella

  • Contributors TF and DAH conceptualised the study and drafted the protocol. LP developed the detailed search strategy and critically revised the manuscript. ACT and SES provided expert scoping review methodological guidance. PJ, MZ, LML, MS and LCR offered conceptual, methodological and content feedback. All authors have read and approved the final manuscript.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement If any unpublished summary data are created from this scoping review, additional information will be made available to those interested on request to the corresponding author.
