Objectives With the increasing interest in personalised medicine, the use of subgroup analyses is likely to increase. Subgroup analyses are challenging and often misused, possibly leading to false interpretations of the effect. It remains unclear to what extent key organisations warn about such pitfalls and translate current methodological research on detecting these effects into research guidelines. The aim of this scoping review is to determine and evaluate the current guidance used by organisations for exploring, confirming and interpreting subgroup effects.
Design Scoping review.
Eligibility criteria We identified four types of key stakeholder organisations: industry, health technology assessment organisations (HTA), academic/non-profit research organisations and regulatory bodies. After literature search and expert consultation, we identified international and national organisations of each type. For each organisation that was identified, we searched for official research guidance documents and contacted the organisation for additional guidance.
Results Twenty-seven (45%) of the 60 organisations that we included had relevant research guidance documents. We observed large differences between organisation types: 18% (n=2) of the industry organisations, 64% (n=9) of the HTA organisations, 38% (n=8) of academic/non-profit research organisations and 57% (n=8) of regulatory bodies provided guidance documents. The majority of the documents (n=33, 63%) mentioned one or more challenges in subgroup analyses, such as false positive findings or ecological bias with variations across the organisation types. Statistical recommendations were less common (n=19, 37%) and often limited to a formal test of interaction.
Conclusions Almost half of the organisations included in this scoping review provided guidance on subgroup effect research in their guidelines. However, there were large differences between organisations in the amount and level of detail of their guidance. Effort is required to translate and integrate research findings on subgroup analysis to practical guidelines for decision making and to reduce the differences between organisations and organisation types.
- scoping review
- personalised medicine
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
Strengths and limitations of this study
It is the first overview of research guidance on exploring, confirming and interpreting subgroup effects by key organisations.
All organisations included in the review were contacted in an attempt to receive the most up-to-date guidance documents.
Although organisations were included from different continents, the search was limited to English guidelines or translated documents.
Due to the wide variety of organisations, a systematic review of all organisations and documents was not feasible.
Individuals may respond to medical interventions in different ways due to genetic, environmental, demographic, disease, behavioural, or comorbidity variations that strengthen or weaken the outcome of a treatment. Patients with similar variations can be grouped in so-called subgroups of patients that respond better or worse to medical interventions. In order to tailor medical treatment to an individual, personalised medicine aims to provide patients with the most relevant and optimal treatment, possibly resulting in better patient outcomes and/or lower medical costs.1 2 Interest in subgroup effects is likely to grow as more information becomes available from biomarkers, companion diagnostics, high-resolution imaging, function tests, and genetic information. However, investigating subgroup effects can be complex and challenging as there is a natural tension between the urge to detect clinically relevant subgroup effects and avoiding incorrect claims.3 4
Given both the complexity and relevance of investigating subgroup effects, many methodological and statistical papers have been published on this topic, including new methods to adequately detect subgroups.5–7 Although reaching a consensus regarding the optimal method can be challenging,8 several published reviews tried to provide guidance by focusing on the current practice of subgroup analyses, giving an overview of (competing) statistical approaches or providing direction on best practices.3 9 10 However, it remains unclear to what extent key organisations (stakeholders with respect to the issue of subgroup analyses) translate recommendations from methodological research into practical guidance. These key organisations may be interested in subgroup effects as they either have a regulatory function, are part of an academic organisation, produce medicines and/or other treatments or generate evidence on the (cost-)effectiveness of treatments used in decision making required to determine whether treatments may be reimbursed or introduced on the market. We assume that these organisations provide guidance on subgroup research for their members, to assist them in this kind of analysis or research.
To identify and evaluate the nature of the guidance on exploring, confirming and interpreting subgroup effects, we performed a scoping review. A scoping review is a useful instrument for generating an overview of a broad topic, highlighting the nature of what is being discussed, the overlap in information and what is not discussed.11 The aim of this scoping review is to discover how key organisations use findings and recommendations from methodological research in their research guidance on exploring, confirming and interpreting subgroup effects. We focus specifically on their definition of subgroup effects, the level of detail in the guidance, the inclusion of factors affecting the credibility of subgroup effects and the promotion or discouragement of specific statistical approaches.
We investigated how subgroup effect research has been included in research guidelines written by a wide range of key organisations worldwide. Since numerous organisations exist worldwide, it was not feasible to analyse all of them. Therefore, we restricted the inclusion to the major types of stakeholders in this field (see criteria below) and adopted a scoping review design.2 11 We followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) checklist in reporting our results.12
Organisations eligibility criteria
We identified four main types of organisations that represent major stakeholders with a clear interest in subgroup effects of medical treatment: (1) industry (developing the medical treatment), (2) health technology assessment (HTA) organisations (evaluating the cost-effectiveness), (3) academic/non-profit research organisations (evaluating effectiveness) and (4) regulatory bodies (health market authorisation and reimbursement). Our goal was to capture all organisations that are involved throughout the lifecycle of a medical treatment from concept to implementation.
We selected a limited number of organisations focusing on the following regions: global (worldwide organisations), North America (USA and Canada), Europe (UK, Germany and The Netherlands) and Asia-Pacific (Japan and Australia). Other regions were not considered because of their limited number of HTA organisations and regulatory bodies, because of language restrictions and to limit the scope of this review.13
Patient and public involvement
Patients and/or public were not involved.
Initially, we adopted a stakeholder list of Makady et al describing 23 key organisations.14 We then gradually expanded that list of potential organisations by contacting methodology experts in the field of subgroup analysis to identify specific organisations within these organisation types and completed the list by means of a literature search. The methodological experts were identified via the Cochrane collaboration methods groups, key opinion papers on this topic and HTA organisations. The key opinion papers are listed in online supplementary S1. Next, we searched for similar organisations across regions (eg, Food and Drug Administration (FDA) in USA vs European Medicines Agency (EMA) in Europe). We excluded organisations if: (1) they worked in specific medical areas (eg, food safety, surgery, cardiology, oncology or infectious diseases); (2) their mission was not related to the purpose of this review; (3) they only collected guidelines from other organisations, rather than producing their own or (4) they only had non-English documents. For each organisation, we used the following sources to determine whether they had official documents in which they provide research guidance: (1) the search engine of their official website, (2) Google, (3) PubMed and (4) citations. Documents were considered eligible if they contained methodological research guidance and included ‘subgroup’, or terms analogous to the definition of subgroup research: investigating the treatment effect according to baseline characteristics, covariate treatment interaction, heterogeneity, effect modifying factors, splitting the population into subgroups or explicitly mentioning that particular characteristics should be analysed or presented separately (eg, gender). The search strategy is included in online supplementary S2.
Additionally, all organisations were contacted by email or contact form, asking whether they have any research guidance on subgroup effects or an updated version of the ones we already found. If there was no response after one reminder email, we discontinued the search for that specific organisation.
A guidance document was considered to be official when it clearly named the organisation, displayed the logo prominently or included statements about publication rights. We excluded books, talks or lectures written by individual author(s) even when they worked or conducted projects for the organisation. When an organisation published multiple versions of the same document, we only included the most recent version.
Out of the included documents, we extracted data on the following topics: general information about the organisation, purpose and context of the guidance documents, definitions and concepts of subgroup analysis challenges and recommendations for and against specific methods. Data extraction was performed in two stages: first, all documents were scanned to detect whether they mention the predefined topics. Second, we analysed the documents in more detail. Any information on subgroup research was extracted. Subsequently, the 10 credibility criteria described by Sun et al to investigate the credibility of authors’ claims of subgroup effects were used to determine if and how subgroup analysis credibility criteria were reported.5 These criteria were used because they were up to date, widely accepted, well defined, and the multiple criteria enabled us to compare the documents. Although these criteria were developed to assess the credibility of subgroup effects in randomised controlled trials, we used these to determine whether these topics were mentioned in any document, including documents aimed at non-randomised trials or systematic reviews. The criteria were based on previous published subgroup credibility criteria that were focused on subgroup analyses in general and are thus also applicable to other types of data.15 16 The data extraction method, including the definitions of general and detailed information, are described in online supplementary S3.
Using descriptive statistics, similarities and differences between the types of organisations and regions were analysed. The frequency of similar recommendations was analysed and compared. We analysed the frequency at which the credibility criteria for subgroup analysis were mentioned and calculated the average number of criteria mentioned in the documents for the four organisation types.5
From the list of organisations of Makady et al and initially received suggestions, 40 organisations were included. Thirty-four organisations were added after expert consultation and inclusion of counterpart organisations. In total, after removing organisations outside the scope of our search strategy, we found 60 organisations operating globally or in one of the prespecified regions or countries that matched our inclusion criteria (figures 1 and 2).
In the end, we included 11 industry organisations, 14 HTA organisations, 21 academic/non-profit research organisations and 14 regulatory bodies divided over global organisations (23%), Europe (32%), North America (27%) and Asia-Pacific (18%). Out of the 60 organisations, 27 (45%) had relevant research guidance documents and 26 (43%) responded to our inquiry. Five organisations responded but stated that they had no documents and six organisations of whom we received no response had documents online available. The availability of documents differed among organisation types, with 18%, 64%, 38% and 57% for industry, HTA, academic/non-profit and regulatory organisations, respectively.
Looking at regional differences, Europe and North America had relevant research guidance documents in about half of the organisations we included (53% and 50%, respectively), while Asia-Pacific had research guidance in 27% of the organisations. In total, we retrieved 52 documents: two documents from industry organisations,17 18 nineteen documents from HTA organisations,19–37 twelve documents from academic organisations38–49 and nineteen documents from regulatory organisations.50–68 Eleven further documents were excluded because they were not in line with our research objective, including presentations, advertisement guidelines or political documents. The list of the included organisations and documents can be found in online supplementary S4.
The general characteristics of the included documents are described in table 1. The mean and median year of publication of the documents were 2011 and 2014 (range: 1988–2018), respectively. HTA and regulatory organisations had the most documents (n=19 each) and industry organisations the fewest (n=2). A separate section on subgroup research was present in 0%, 42%, 17% and 42% of the industry, HTA, academic and regulatory documents, respectively. A definition of subgroups and a definition of heterogeneity both appeared more often when a separate subgroup section was available. Regulatory bodies included the most clinical trial guidance of the organisation types (n=19, 37%) but had no guidance on non-randomised studies. In total, out of the 52 documents, five were aimed at non-randomised studies (10%), three by HTA and two by academic organisations. HTA organisations were the only type of organisation that had documents for cost-effectiveness studies (n=10, 19%). Both documents from industry organisations were aimed at clinical trials.
Types of subgroup effects
The term subgroup effects can be related to a range of different concepts. Therefore, organisations typically distinguish between different types of subgroup effects, such as subgroup effects from exploratory analysis, subgroup effects from confirmatory analysis, qualitative effects and quantitative effects.10 69 These concepts are elucidated in box 1.
Summary of types of subgroup effects and challenges in subgroup analyses
Types of subgroup effects:
Confirmatory: subgroup effects from confirmatory analyses are subgroup effects that have been detected in previous trials and require confirmation of the effect in another population that is sufficiently powered to detect the subgroup, increasing credibility of the subgroup effect.
Exploratory: subgroup effects from exploratory analyses are subgroup effects that are detected during post hoc analysis in a trial or review, without prior knowledge of the effect or predefining the direction of the subgroup effect.
Qualitative subgroup effects: treatment effects that are beneficial to one subgroup but harmful in another.
Quantitative subgroup effects: difference in size of treatment effect between subgroups but not the direction. For example, a treatment can be beneficial across different age ranges but to a larger extent for the elderly (as compared to adolescents).
Challenges in subgroup analyses:
Type I error (false positive): statistical tests for detecting subgroup effects typically involve a null hypothesis that assumes absence of treatment–covariate interaction. The significance level is the probability of rejecting the null hypothesis when it is true. For instance, when the significance level is set to 0.05 (5%), then, if there is indeed no treatment-covariate interaction, there is a 5% chance of incorrectly rejecting the null hypothesis. This probability to make a type I error can be decreased by lowering the significance level, which also decreases the power to detect a genuine treatment-covariate interaction.
Type II error (false negative): a type II error occurs when the null hypothesis is false (ie, treatment–covariate interaction is present) but is not rejected. The probability to make a type II error is related to the power and can be reduced by increasing the significance level or the sample size.
Multiplicity: when multiple (related) hypothesis tests are performed in a single dataset, the probability of at least one type I error increases. This is a common problem when assessing effect modification by more than 1 covariate. If the study is not properly powered to detect treatment–covariate interaction, it can also increase the probability of making at least one type II error (failure to reject a false null hypothesis: a false negative). Multiplicity problems can be resolved by statistical multiplicity correction (eg, based on family-wise error rates) or reducing the number of (exploratory) subgroup analyses (which need to be prespecified).
Power: the power is the probability that an effect is found assuming there is an effect. When a study is not powered for subgroup analyses, splitting the study population into different subgroups could result in a lack of power. This means that the risk of a type II error increases (false negative finding).
Ecological bias: ecological bias may occur in systematic reviews when the presence of treatment–covariate interaction is assessed across studies (rather than within studies). For instance, meta-regression can be used to assess whether published treatment effect estimates are associated with a particular population characteristic (eg, the proportion of males in each trial). Although the presence of such trial-level association is commonly treated as evidence of treatment–covariate interaction, this is often misleading. For instance, even if gender does not have any effect on treatment efficacy, a trial-level association between treatment effect and gender may still appear when trials that included relatively more males adopted a higher treatment dosage for the active comparator.
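The multiplicity problem summarised in box 1 can be illustrated numerically. The following minimal Python sketch is not taken from any of the reviewed documents; the function names are hypothetical, and it assumes that the subgroup tests are statistically independent, which is rarely exactly true in practice. It shows how the probability of at least one false positive grows with the number of tests, and how a Bonferroni correction (one example of a family-wise error rate method) keeps that probability below the nominal level.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one type I error across n_tests independent
    tests, each performed at significance level alpha, when every null
    hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance level under a Bonferroni correction."""
    return alpha / n_tests


if __name__ == "__main__":
    nominal_alpha = 0.05  # illustrative significance level
    for n_tests in (1, 5, 10, 20):
        unadjusted = familywise_error_rate(nominal_alpha, n_tests)
        per_test = bonferroni_alpha(nominal_alpha, n_tests)
        adjusted = familywise_error_rate(per_test, n_tests)
        print(
            f"{n_tests:2d} tests: unadjusted FWER = {unadjusted:.3f}, "
            f"Bonferroni per-test alpha = {per_test:.4f}, "
            f"adjusted FWER = {adjusted:.3f}"
        )
```

Under these assumptions, ten independent subgroup tests at a significance level of 0.05 give roughly a 40% chance of at least one false positive finding. The stricter per-test threshold also lowers the power of each individual test, which is the trade-off described under type I and type II errors.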
Subgroup effects from exploratory analyses were mentioned most often by the organisations, in 23 (44%) of all documents: 0%, 58%, 25% and 47% of the documents from industry, HTA, academic and regulatory organisations, respectively. Guidance for exploratory subgroup analysis mostly advised comparing treatment effects by gender or age.
Subgroup effects from confirmatory subgroup analyses were described in 13 documents (25%): 0%, 42%, 17% and 16% of the documents from industry, HTA, academic and regulatory organisations, respectively. One document suggested reporting the 95% prediction interval in confirmatory subgroup analysis next to the regular 95% CI in order to reflect the variation of the treatment/subgroup effect in a different setting.
Qualitative subgroup effects and quantitative subgroup effects were rarely mentioned. Only HTA organisations mentioned both concepts, in three different documents (16% of all HTA guidelines, 6% of all documents).
Regarding the actual clinical relevance of the subgroup effect, no document in our review highlighted the importance of reporting absolute differences between subgroups. That is, a significant test result (to detect a subgroup) may not lead to a change in treatment decisions, whereas in the absence of a significant result a clinically relevant subgroup may still exist.
Challenges in subgroup analysis
Performing adequate subgroup analysis can be difficult due to several statistical and methodological challenges. Informing researchers and other stakeholders about the challenges in subgroup analysis can be a way to improve the outcome of the analyses. The most common challenges are: type I errors, multiplicity problems, lack of power and ecological bias (explained in box 1).
Overall, these challenges were seldom mentioned or discussed. The risk of a false positive (type I error) and the multiple testing problem (multiplicity) were most frequently mentioned, by 12 (23%) and 13 (25%) documents, respectively. Some documents suggested that limited testing or multiplicity adjustment is required. Next to the multiplicity and false positive issues, five documents (10%) mentioned the lack of power to detect real subgroup effects due to small sample sizes, which increases the risk of false negatives. Ecological bias was mentioned in three documents (6%). Lack of power and ecological bias were only mentioned by HTA and academic organisations.
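The loss of power in subgroup analyses can be made concrete with a small calculation. The sketch below is an illustration under stated assumptions, not a method from the reviewed documents: it assumes a continuous outcome with equal variance, 1:1 allocation, two equally sized subgroups and a normal approximation (two-sided z-test). Under those assumptions the SE of the treatment-by-subgroup interaction is twice the SE of the overall treatment effect. All names and numbers are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def power_two_sided_z(effect: float, se: float, alpha: float = 0.05) -> float:
    """Power of a two-sided z-test to detect a true effect of the given
    size when its estimate has standard error se."""
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    shift = effect / se
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


# Hypothetical trial sized to detect the overall effect with 80% power.
se_overall = 1.0
effect = (_STD_NORMAL.inv_cdf(0.975) + _STD_NORMAL.inv_cdf(0.80)) * se_overall

# Assumption: two equal subgroups, so the interaction SE is 2 x se_overall.
se_interaction = 2.0 * se_overall

power_overall = power_two_sided_z(effect, se_overall)
power_interaction = power_two_sided_z(effect, se_interaction)

if __name__ == "__main__":
    print(f"Power for the overall effect:      {power_overall:.2f}")
    print(f"Power for an equal-sized interaction: {power_interaction:.2f}")
```

In this set-up a trial with 80% power for the overall effect has only about 29% power to detect an interaction of the same magnitude, which illustrates why a non-significant subgroup test in an underpowered study is weak evidence for the absence of a subgroup effect.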
In 1992, years before the publication of most documents included in our review, Oxman and Guyatt suggested seven criteria to assess subgroup effects.16 Sun et al updated the list, resulting in 10 criteria for randomised trials to evaluate the credibility of subgroup analyses.5 These include: (1) hypothesis specified a priori, (2) subgroup variable a baseline characteristic, (3) subgroup stratification factor at randomisation, (4) independent interactions, (5) consistent with previous studies, (6) consistent across outcomes, (7) small number of tests, (8) significant p value of interaction test, (9) indirect evidence to support apparent effect and (10) direction of subgroup effect prespecified. We assessed how the documents reflect the credibility criteria. In total, 34 documents (65%) mentioned one or more credibility criteria. On average, HTA documents included 4.1 criteria, while the averages for industry, academic and regulatory organisation documents were lower with 1.5, 2.3 and 1.8 criteria, respectively (as shown in figure 3).
Of the 10 criteria, ‘Hypothesis specified a priori’ and ‘Subgroup variable a baseline characteristic’ were addressed most frequently, in 33 (63%) and 29 (56%) of the documents, respectively. ‘Direction of subgroup effect prespecified’ was mentioned the least (six times, 12%).
Statistical recommendations were often lacking and, when present, not up to date with current standards. The recommendations were limited to a formal test of interaction using a threshold of p < 0.05 (16 documents, 31%) and/or stratified analyses (nine documents, 17%). One document described the option to use Bayesian models in a subgroup analysis.30 In total, 37% of the documents provided statistical recommendations. There were no recommendations against certain statistical tests or methods.
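For readers unfamiliar with a formal test of interaction, one common and simple form compares the treatment effect estimates of two subgroups directly. The sketch below is a generic illustration, not the procedure prescribed by any reviewed document: it assumes two non-overlapping (independent) subgroups, approximately normally distributed estimates on an additive scale (eg, log odds ratios), and uses hypothetical input values and function names.

```python
import math


def interaction_test(est_a: float, se_a: float, est_b: float, se_b: float):
    """Two-sided z-test for the difference between two independent
    subgroup effect estimates.

    Returns the difference, its standard error, the z statistic and
    the two-sided p value.
    """
    diff = est_a - est_b
    se_diff = math.sqrt(se_a**2 + se_b**2)  # independence assumed
    z = diff / se_diff
    # Two-sided p value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return diff, se_diff, z, p_value


if __name__ == "__main__":
    # Hypothetical log odds ratios (with SEs) in two subgroups.
    diff, se_diff, z, p_value = interaction_test(-0.5, 0.2, -0.1, 0.2)
    print(f"difference = {diff:.2f}, SE = {se_diff:.3f}, "
          f"z = {z:.2f}, p = {p_value:.3f}")
```

In this hypothetical example the treatment effect is individually significant in the first subgroup (z = -2.5) but not in the second (z = -0.5), yet the interaction test gives p of about 0.16. Comparing subgroup-specific p values is therefore not a substitute for testing the difference between subgroups.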
There are several checklists/guidelines for reporting trials or reviews created to strengthen the reporting of medical studies. These checklists are often combined with a statement that includes best practices for research publication, including optimal methods for subgroup analysis reporting.
In our search, we found that 13 documents (25%) recommended the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) checklist for systematic reviews, 12 documents (23%) recommended the Consolidated Standards of Reporting Trials (CONSORT) checklist for randomised trials, 10 documents (19%) recommended the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) checklist for observational studies and the Consolidated Health Economic Evaluation Reporting Standards (CHEERS) statement for economic evaluations was mentioned in two documents (4%). Documents that described a checklist often described more than one; 29 documents (56%) did not recommend any checklist or guidance for publication.
Individual patient data
As mentioned above, detecting patient subgroups can be difficult due to the limited number of patients included in individual studies. To overcome this problem, researchers may perform a review to identify and analyse the individual patient data from multiple trials. This approach is known as an individual patient data meta-analysis (IPDMA).70 Of the 52 documents, five (9.6%) described IPDMA as a method to detect subgroups, and only one document provided methodological guidance on IPDMA.
This scoping review provides insight into the adoption of methodological subgroup effects research in guidelines, both by international and national key organisations. Investigating subgroup effects can be challenging and has resulted in the publication of several non-reproducible subgroup effects.71–73 With the growing discussion on adequate subgroup analysis,4 10 proper guidance could reduce the number of non-reproducible findings and improve individual patient care. However, only 45% of the 60 organisations included in our review had some guidance documents on subgroup analysis research, with large differences between organisation types. Overall, HTA organisations provided the most documents with detailed and up-to-date methodological information, while industry organisations provided hardly any guidance documents, possibly because these organisations use research guidance from other organisations (eg, regulatory organisations). Organisations that included subgroup analysis in their guidelines often mentioned subgroup research only as a tool to evaluate heterogeneity between (sub)groups.
Our assumption was that most organisations would include some guidance on how to perform credible subgroup analysis, both for exploratory and confirmatory reasons. However, most documents lacked detail, especially regarding statistical and methodological recommendations. Given that research in personalised medicine is growing, we hope that key organisations will update and refine their guidance and at least include the credibility criteria of Sun et al.5
Multiple sources for research guidance can be useful. However, they can also increase confusion if the suggestions for subgroup research differ between documents and/or key organisations and do not include the latest recommendations. More consistency in recommendations would help to reduce reproducibility issues and would likely improve reporting standards. Perhaps we should strive for a more coherent set of guidance that distinguishes between the different main purposes for performing subgroup analyses.
In our search for official guidance documents, some organisations had papers that stated that the findings and conclusions in these documents were those of the authors and that the findings and conclusions did not necessarily represent the views of the key organisation. Still, we included those documents in our review because they were located on the official website and often were presented with an official logo of the organisation/company.
Although most documents were written in the last 10 years, a delay can exist between methodological publications and the implementation of this research into guidelines. However, methodological and statistical difficulties in subgroup analyses have been reported over the last three decades.16 73 74
Strengths and limitations
This is the first scoping review targeted at guidance on exploring, confirming and interpreting subgroup effects by key organisations. The scoping review is a useful instrument that can be used to generate an overview of available guidance documents by different organisations in different regions of the world. As recommended by Levac et al, we kept the research question broad and simultaneously considered the different concepts of interest.75 To balance between feasibility and comprehensiveness, we arranged regular meetings with methodological experts to discuss and justify decisions and, if needed, adapt the search strategy.
Some potential limitations should also be discussed. First, our review was targeted at research guidance documents. To search for these documents, we primarily used the official websites of the organisations concerned. This might have limited the documents in our review to those that were publicly available on their websites. To overcome this, we tried to contact all organisations in an attempt to receive the most up-to-date documents. We limited the number of reminder emails to one to ensure the practicality of the scoping review. Second, out of the 26 organisations that responded to our inquiry, 21 (81%) had documents. This might indicate that organisations without documents were less likely to respond to our email, or that some organisations had documents that were not openly available and were not willing to respond. This could lead to bias, as we marked these latter organisations as ‘no documents’, which would result in an underestimation of the availability of guidance.
Third, although we included organisations from different continents, we limited our search to English guidelines or translated documents. This might have had a negative impact on articles from Asia-Pacific, possibly explaining why we found fewer organisations and guidelines from Asia-Pacific, compared with North America and Europe. Fourth, data extraction and primary analysis were performed by one reviewer of our research team. It might have improved the accuracy of the results if it was done by multiple researchers. However, this was not feasible for our review. To ensure the quality of the review, we discussed all discrepancies and difficulties with the research team during all steps of the reviewing process.
In conclusion, fewer than half of the organisations included in our scoping review provided some kind of guidance on exploring, confirming and interpreting subgroup effects in their guidelines. Furthermore, information was often minimal, and methodological and statistical guidance on current best practices was missing. To further improve and speed up personalised medicine, more effort is required to uniformly disseminate and integrate research findings on subgroup analysis into practical guidelines for decision making.
This work was in part supported by “The University of Sydney – Utrecht University Partnership Collaboration Awards”.
Contributors MMR, JH, TD and JBR have contributed to the conception and design of the study. All authors have contributed to the final design and focus of the scoping review. Reviewing has been performed by LHL and SRWW. Scoring and data analysis has been performed by SRWW. SRWW drafted the manuscript. All authors have made contributions to the drafting and revising of the article. All authors have read, reviewed and approved the final version of the manuscript before submission.
Funding This study was funded by a TOP grant by the Netherlands Organisation for Health Research and Development (ZonMW) Number: 91215058.
Competing interests None declared.
Patient consent for publication Not required.
Provenance and peer review Not commissioned; externally peer reviewed.
Data availability statement Search strategies and data extraction documents are available on request to the corresponding author.