Objectives To investigate whether progression-free survival (PFS) can be considered a surrogate endpoint for overall survival (OS) in advanced non-small-cell lung cancer (NSCLC).
Design Meta-analysis of individual patient data from randomised trials.
Setting Five randomised controlled trials comparing docetaxel-based chemotherapy with vinorelbine-based chemotherapy for the first-line treatment of NSCLC.
Participants 2331 patients with advanced NSCLC.
Primary and secondary outcome measures Surrogacy of PFS for OS was assessed through the association between these endpoints and between the treatment effects on these endpoints. The surrogate threshold effect was the minimum treatment effect on PFS required to predict a non-zero treatment effect on OS.
Results The median follow-up of patients still alive was 23.4 months. Median OS was 10 months and median PFS was 5.5 months. The treatment effects on PFS and OS were correlated, whether using centres (R²=0.62, 95% CI 0.52 to 0.72) or prognostic strata (R²=0.72, 95% CI 0.60 to 0.84) as units of analysis. The surrogate threshold effect was a PFS hazard ratio (HR) of 0.49 using centres or 0.53 using prognostic strata.
Conclusions These analyses provide only modest support for considering PFS as an acceptable surrogate for OS in patients with advanced NSCLC. Only treatments that have a major impact on PFS (risk reduction of at least 50%) would be expected to also have a significant effect on OS. Whether these results also apply to targeted therapies is an open question that requires independent evaluation.
This is an open-access article distributed under the terms of the Creative Commons Attribution Non-commercial License, which permits use, distribution, and reproduction in any medium, provided the original work is properly cited, the use is non commercial and is otherwise in compliance with the license. See: http://creativecommons.org/licenses/by-nc/2.0/ and http://creativecommons.org/licenses/by-nc/2.0/legalcode.
To investigate whether progression-free survival (PFS) can be considered a surrogate endpoint for overall survival (OS) in advanced non-small-cell lung cancer (NSCLC).
Our analyses provide only modest support for considering PFS as an acceptable surrogate for OS in patients with advanced NSCLC.
Only treatments that have a major impact on PFS (risk reduction of at least 50%) would be expected to also have a significant effect on OS.
Strengths and limitations of this study
Analyses based on individual patient data.
Widely accepted statistical methodology for surrogate endpoint validation.
Data available on a limited number of trials.
Results may not apply to targeted therapies.
Introduction
A surrogate endpoint is a measure that can substitute for a ‘final’ or ‘true’ clinical endpoint to predict patient outcomes earlier or more conveniently than with the true endpoint. The conditions required for an endpoint to be considered a valid surrogate have been intensely studied in the recent statistical literature, and whether a surrogate can ever be ‘validated’ is still a matter of debate today.1–8 Two independent conditions have proven useful to explore potential surrogate endpoints in clinical settings: one stipulates that the surrogate endpoint should predict the clinical endpoint, and the other that the effect of a treatment on the surrogate endpoint should predict the effect of that treatment on the true endpoint.9
Overall survival (OS) remains one of the most important clinical outcomes for assessing the efficacy of cancer treatments in randomised clinical trials. However, in most cases, deaths occur only after prolonged follow-up, and with the increasing number of active cancer treatments, the effect of a first-line agent on OS may be confounded by subsequent therapies. Progression-free survival (PFS), measured from randomisation until objective tumour progression or death, can be assessed earlier than OS, but whether it can be considered a valid surrogate for OS depends on the malignancy and the treatment under investigation. For example, OS differences can be reliably predicted from PFS differences in advanced colorectal cancer treated with fluoropyrimidines, but not in advanced breast cancer treated with anthracyclines or taxanes.10,11
We investigated whether PFS is an acceptable surrogate for OS in patients with advanced non-small-cell lung cancer (NSCLC) using individual data from 2334 patients enrolled in five randomised controlled trials comparing docetaxel-based chemotherapy with vinorelbine-based chemotherapy as the first-line treatment for advanced NSCLC.
Patients and methods
We analysed data from seven randomised controlled trials12–18 included in a published meta-analysis of OS comparing docetaxel-based chemotherapy with vinca-alkaloid-based chemotherapy as the first-line treatment for NSCLC.19 Eligible trials included at least one treatment arm with docetaxel, either alone or in combination with a platinum agent (cisplatin or carboplatin) or gemcitabine, and at least one vinca-alkaloid-based treatment arm. Two of the seven trials included in the meta-analysis of OS could not be included in our analysis of surrogacy because the definition of PFS could not be ascertained reliably in spite of in-depth review of the case report forms.13,14
Table 1 provides details on the remaining five trials. The experimental arm consisted of docetaxel plus platinum (cisplatin or carboplatin) in two trials, docetaxel plus gemcitabine in two trials and docetaxel alone in one trial. The control arm consisted of vinorelbine plus cisplatin in four trials and vinorelbine alone in one trial. Standard chemotherapy doses and schedules were used in the experimental and control arms.
Table 2 shows the distribution of the baseline patient characteristics analysed in the five trials. The WJTOG trial only included patients ≥70 years and performance status ≥1.18 The Taxobel 303 trial only included stage IV patients.15 Approximately three-quarters of the patients were male and one-third of the tumours were squamous cell carcinomas.
A first meta-analysis, based on summary data extracted from the papers describing the results of the seven trials, suggested that docetaxel-based regimens were slightly superior to vinca-alkaloid-based regimens in terms of OS for first-line therapy of advanced NSCLC.19 A subsequent meta-analysis confirmed these results using individual patient data from a total of 322 centres participating in the same set of seven trials.20 The following data were requested for the subsequent meta-analysis: patient identifier, centre identifier, randomisation date, treatment assigned by randomisation, age, gender, body mass index (BMI), performance status, stage, overall tumour response to the first assigned treatment, date of response, date of progression with the first allocated treatment, date of death or last visit, survival status and cause of death if applicable.
Time to event analyses
PFS was defined as the time from random assignment to disease progression (as assessed in each individual trial) or death from any cause. OS was defined as the time from random assignment to death from any cause. The distributions of PFS and OS were estimated using the Kaplan-Meier method. Treatment groups were compared using a Cox regression model. The median follow-up time was estimated using the Kaplan-Meier method with censoring for death.
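The estimation steps described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `kaplan_meier` and `km_median` and the toy data are hypothetical. It shows the Kaplan-Meier product-limit estimator and a median follow-up obtained by reversing the event indicator, so that deaths are treated as censored observations.

```python
def kaplan_meier(times, events):
    """Product-limit estimate: list of (time, survival) at each event time.

    times  -- follow-up time of each patient
    events -- 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients sharing the same time point.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_leaving
    return curve


def km_median(times, events):
    """Smallest time at which estimated survival is <= 0.5 (None if never)."""
    for t, s in kaplan_meier(times, events):
        if s <= 0.5:
            return t
    return None


# Hypothetical toy data (months), NOT taken from the trials.
os_months = [2, 4, 6, 8, 10, 12, 15, 20, 24, 30]
died = [1, 1, 1, 0, 1, 1, 0, 1, 0, 0]

median_os = km_median(os_months, died)
# Median follow-up "with censoring for death": flip the indicator so that
# being alive at last contact is the event and death is censoring.
median_follow_up = km_median(os_months, [1 - d for d in died])

print("median OS:", median_os)
print("median follow-up:", median_follow_up)
```

In practice a survival analysis package would be used for this, together with a Cox model for the treatment comparison; the sketch only makes the two definitions concrete.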
A two-level modelling approach was adopted to estimate the association between PFS and OS and between the treatment effects on these endpoints. Treatment effects were estimated as logarithms of the HR (log HR). The log HR has intuitive appeal as a measure of treatment effect: it is equal to zero in the absence of a treatment effect, and is approximately equal to the risk reduction for small treatment effects (hence a log HR of −0.10 corresponds to a risk reduction of about 10%, and a log HR of 0.10 corresponds to a risk increase of about 10%). The log HRs were estimated within units of analysis consisting of 135 centres or 64 strata. Centres were either individual centres if they had more than three patients per treatment arm or groups of small centres with an average size at least equal to the average size of the big centres of the same trials (Tax 326: average size=15, Taxobel 303: average size=17, Hellenic Oncology Research Group: average size=25, West Japan Thoracic Oncology Group 9904: average size=14 and French: average size = 20). Strata were defined within each trial by the cross-classification of the following prognostic factors: age (<60 vs ≥60 years), gender (male vs female), performance status (Eastern Cooperative Oncology Group 0 or 1 vs 2 or 3), BMI (<18.5 vs ≥18.5 kg/m²), histology (squamous vs non-squamous) and stage (IIIb vs IV vs unknown). Prognostic strata were formed as follows: a Cox model for OS was fitted within each trial with treatment, each of the prognostic factors listed above and the treatment-prognostic factor interactions. The prognostic factors were ordered by increasing level of significance for the treatment-prognostic factor interaction. The first prognostic factor selected in this way was the factor most predictive of the effect of treatment on OS and was used to split the patients of a trial into two (or three) strata. Each of these strata was then split by the second prognostic factor and so on. 
The splitting was stopped when it produced strata with fewer than three patients per treatment arm.
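The recursive construction of prognostic strata can be sketched as follows. This is one possible reading of the procedure, not the authors' implementation: it assumes the prognostic factors have already been ordered by the interaction test, and it assumes that a split is abandoned as soon as any resulting stratum would contain fewer than three patients in either arm. The function name `split_into_strata`, the dictionary keys and the toy patients are all hypothetical.

```python
ARMS = ("experimental", "control")


def _enough_per_arm(group, min_per_arm):
    """True if every treatment arm has at least min_per_arm patients."""
    return all(sum(p["arm"] == a for p in group) >= min_per_arm for a in ARMS)


def split_into_strata(patients, ordered_factors, min_per_arm=3):
    """Split patients by each factor in turn; stop when strata get too small.

    ordered_factors is assumed to be pre-sorted, most predictive factor first.
    """
    if not ordered_factors:
        return [patients]
    factor, remaining = ordered_factors[0], ordered_factors[1:]
    groups = {}
    for p in patients:
        groups.setdefault(p[factor], []).append(p)
    # Assumption: keep the current stratum whole if the split is degenerate
    # or would create a stratum that is too small in either arm.
    too_small = not all(_enough_per_arm(g, min_per_arm) for g in groups.values())
    if len(groups) < 2 or too_small:
        return [patients]
    strata = []
    for g in groups.values():
        strata.extend(split_into_strata(g, remaining, min_per_arm))
    return strata


# Hypothetical toy trial: (gender, stage, patients per arm).
patients = [
    {"arm": arm, "gender": gender, "stage": stage}
    for gender, stage, n_per_arm in [
        ("male", "IIIb", 4),
        ("male", "IV", 5),
        ("female", "IIIb", 2),
        ("female", "IV", 3),
    ]
    for arm in ARMS
    for _ in range(n_per_arm)
]

strata = split_into_strata(patients, ["gender", "stage"])
print([len(s) for s in strata])
```

In this toy example the male patients are split further by stage, whereas the female patients stay in one stratum because the stage IIIb subgroup would have only two patients per arm.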
The association between PFS and OS was quantified through a bivariate copula model fitted on individual patient data. Kendall's τ was used to quantify the correlation between the endpoints.21 A linear regression model was fitted on the estimated treatment effects on PFS and on OS (log HRs for PFS and OS). Coefficients of determination (equal to squared correlation coefficients) were estimated using weighted linear regression.21 Coefficients of determination (R²) quantify the proportion of variance explained by the regression. The surrogate threshold effect was defined as the minimum treatment effect on PFS required to predict a non-zero treatment effect on OS in a future trial.22
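The trial-level part of this approach can be illustrated with a minimal Python sketch of a weighted linear regression of log HR (OS) on log HR (PFS) and its coefficient of determination. It is a simplified illustration under stated assumptions, not the authors' model: the weights are assumed to be the sizes of the units of analysis, the estimation error in the log HRs is ignored, and the numbers are invented. The copula model, Kendall's τ and the surrogate threshold effect (which requires prediction limits around the regression line) are not reproduced here.

```python
def weighted_fit(x, y, w):
    """Weighted least-squares line y = intercept + slope * x, plus weighted R^2."""
    sw = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / sw
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    syy = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y))
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 equals the squared weighted correlation coefficient.
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared


# Hypothetical log HRs estimated within six units (centres or strata).
log_hr_pfs = [-0.40, -0.15, 0.05, 0.20, -0.30, 0.10]
log_hr_os = [-0.35, -0.20, 0.02, 0.05, -0.15, 0.12]
unit_size = [30, 18, 25, 14, 40, 20]  # assumed weights

intercept, slope, r_squared = weighted_fit(log_hr_pfs, log_hr_os, unit_size)
print(f"log HR(OS) = {intercept:.3f} + {slope:.3f} x log HR(PFS), R^2 = {r_squared:.2f}")
```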
Results
Treatment effects on PFS and OS
A total of 2331 patients were included in the analysis, since PFS was missing for three patients. The median follow-up of patients still alive was 23.4 months. For the entire cohort, the median OS was 10 months and the median PFS was 5.5 months (figure 1), with little difference between the curves until about 12 months.
The HRs were 0.97 for PFS (95% CI 0.89 to 1.05, p=0.44) and 0.92 for OS (95% CI 0.84 to 1.01, p=0.089) (figure 2). There was significant heterogeneity between the five trials in terms of PFS (p=0.01) but not in terms of OS (p=0.72).
Correlation between PFS and OS
PFS showed some correlation with OS (τ=0.59; 95% CI 0.58 to 0.61).
Correlation between treatment effects
The coefficient of determination between treatment effects estimated within centres was R²=0.62 (95% CI 0.52 to 0.72). The linear regression equation was log HR (OS)=−0.048+0.76×log HR (PFS) (figure 3). Using centres as the unit of analysis, the surrogate threshold effect was a PFS HR of 0.49, indicating that a risk reduction of 51% in terms of PFS would predict a non-zero effect on OS.
The coefficient of determination between treatment effects estimated within strata was R²=0.72 (95% CI 0.60 to 0.84). The linear regression equation was log HR (OS)=−0.071+0.87×log HR (PFS) (figure 4). Using strata as the unit of analysis, the surrogate threshold effect was a PFS HR of 0.53, indicating that a risk reduction of 47% in terms of PFS would predict a non-zero effect on OS.
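As a worked illustration of the two regression lines, the short Python sketch below converts a PFS HR into the OS HR predicted by each line, reading the predictor in both equations as log HR (PFS). The coefficients and threshold HRs are those reported above; the function name `predicted_os_hr` is hypothetical. Note that this gives only the point prediction: the surrogate threshold effect is defined from the prediction limits around the line, which cannot be recomputed from the published coefficients alone.

```python
import math

# Regression lines on the log HR scale, as reported in the text.
MODELS = {
    "centres": {"intercept": -0.048, "slope": 0.76, "ste_hr": 0.49},
    "strata": {"intercept": -0.071, "slope": 0.87, "ste_hr": 0.53},
}


def predicted_os_hr(pfs_hr, intercept, slope):
    """Point prediction of the OS hazard ratio for a given PFS hazard ratio."""
    return math.exp(intercept + slope * math.log(pfs_hr))


for unit, m in MODELS.items():
    os_hr = predicted_os_hr(m["ste_hr"], m["intercept"], m["slope"])
    reduction = 100 * (1 - m["ste_hr"])  # risk reduction implied by the PFS HR
    print(
        f"{unit}: PFS HR {m['ste_hr']:.2f} (risk reduction {reduction:.0f}%) "
        f"-> predicted OS HR {os_hr:.2f}"
    )
```

With these inputs the predicted OS HR at the threshold is about 0.55 using centres and about 0.54 using strata.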
Discussion
Our analyses suggest that PFS is not a statistically acceptable surrogate endpoint for OS in patients with metastatic NSCLC treated in first line with docetaxel-based or vinorelbine-based chemotherapies. Indeed, although about two-thirds of the variance of the treatment effects on OS is explained by the treatment effects on PFS, the surrogate threshold effect is a PFS HR of 0.49 to 0.53, depending on whether centres or strata are used as the unit of analysis, which implies that only a major benefit of some new drug on PFS (hazard reduction of about one-half or greater) would be expected to also produce a non-zero benefit on OS. The results are quite similar, regardless of whether treatment effects are estimated in centres participating in the trials or in strata defined by the characteristics of the patients most predictive of survival benefits, despite the fact that the latter analysis could have overestimated the association between the treatment effects through deliberate confounding by the prognostic factors used to define the strata.
In this set of trials, docetaxel-based regimens showed a trend towards better results than vinorelbine-based regimens, but the difference was not significant for either OS (HR=0.92, p=0.089) or PFS (HR=0.97, p=0.44). If anything, the difference was more pronounced for OS than for PFS, an unusual finding with advanced solid tumours, for which a benefit on PFS is generally diluted to yield a smaller benefit on OS.10,11 Other meta-analyses did not support this finding. A meta-analysis comparing gemcitabine–platinum with other platinum-containing regimens found about the same benefit of gemcitabine–platinum on PFS (HR=0.88, information available on 14 of 17 trials) as on OS (HR=0.91 for the same 14 trials).23 A meta-analysis of trials comparing longer with shorter durations of chemotherapy found a much more pronounced benefit of longer chemotherapy duration on PFS (HR=0.75 on 9 of 13 trials) than on OS (HR=0.93 for the same 9 trials).24
The difference between the median PFS and the median OS was only 4.5 months in this and other meta-analyses,23 and therefore the gain in time from using PFS instead of OS in future trials of chemotherapy for advanced NSCLC would not be as large as in other tumour types.10,11 The short survival time after progression implies that differences in OS are likely to be observed for truly effective new treatments.25 All in all, these findings suggest that even if PFS could be proposed as a plausible surrogate for OS from a statistical point of view, it would not be a very attractive one to evaluate the worth of conventional chemotherapies for advanced NSCLC. The exclusion of two trials with unreliable PFS from our meta-analysis casts further doubt on the usefulness of this endpoint in advanced NSCLC, at least as measured a decade ago. Such exclusions also cast some doubt on meta-analyses that rely solely on published papers rather than on carefully reviewed individual patient data.26
Our analyses have several limitations. We used a pragmatic approach, using PFS as measured by the investigators in each trial, ignoring any possible differences in the measurement techniques or schedules. While such differences may have an impact on PFS duration, they are unlikely to have much impact on the PFS HR. It is also unlikely that the results would have been much different, had a blinded central review of PFS been available in all trials.27 The fact that the treatment doses and schedules differed from trial to trial does not raise any particular concern. Indeed, such differences could have obscured (rather than enhanced) the relationship between treatment effects on PFS and on OS; hence, the observed relationship is probably an underestimate of what would have been observed in a more homogeneous setting. More importantly, the randomised comparisons in this set of trials were between two standard combinations of cytotoxic drugs. Although these analyses provide only modest support for considering PFS as an acceptable surrogate for OS in patients with advanced NSCLC, treatments that have a major impact on PFS would still be expected to also have a significant effect on OS. The relationship between PFS and OS and between the treatment effects on PFS and OS might not be the same for different cytostatic agents or for targeted agents, and given the obvious advantages of using PFS as the primary endpoint in randomised trials, these issues deserve further investigation through meta-analyses of contemporary randomised trials.28
Contributors SL was responsible for the conception and design, collection of data, data extraction, interpretation of results, revision and the final approval of the manuscript. PS was responsible for data analysis and for drafting the article. NB was responsible for data extraction and monitoring. FF was responsible for the collection of data, revision and the final approval of the manuscript. VG was responsible for the collection of data, revision and the final approval of the manuscript. JLP was responsible for the collection of data, revision and the final approval of the manuscript. JYD was responsible for the conception, revision and the final approval of the manuscript. SK was responsible for the collection of data, revision and the final approval of the manuscript. JPP was responsible for the conception and design, revision and the final approval of the manuscript. EQ was responsible for data analysis and for drafting the article. MB was responsible for the conception and design, interpretation of results, for drafting and revising the article and for the final approval of the manuscript.
Funding This work was supported by an unrestricted research grant from sanofi-aventis, Paris, France and by the Ligue Nationale Contre le Cancer, France.
Competing interests None.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement No additional data are available.