Objectives Men diagnosed with non-metastatic prostate cancer require standardised and robust long-term prognostic information to help them decide on management. Most currently-used tools use short-term and surrogate outcomes. We explored the evidence base in the literature on available pre-treatment prognostic models built around long-term survival and assessed the accuracy, generalisability and clinical availability of these models.
Design Systematic literature review, pre-specified and registered on PROSPERO (CRD42018086394).
Data sources MEDLINE, Embase and The Cochrane Library were searched from January 2000 through February 2018, using previously-tested search terms.
Eligibility criteria Inclusion required a multivariable prognostic model for non-metastatic prostate cancer, using long-term survival data (defined as ≥5 years), which was not treatment-specific and was usable at the point of diagnosis.
Data extraction and synthesis Title, abstract and full-text screening were sequentially performed by three reviewers. Data extraction was performed for items in the CHecklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies checklist. Individual studies were assessed using the new Prediction model Risk Of Bias ASsessment Tool.
Results Database searches yielded 6581 studies after deduplication. Twelve studies were included in the final review. Nine were model development studies using data from over 231 888 men. However, only six of the nine studies included any conservatively managed cases and only three of the nine included treatment as a predictor variable. Every included study had at least one parameter for which there was high risk of bias, with failure to report accuracy, and inadequate reporting of missing data common failings. Three external validation studies were included, reporting two available models: The University of California San Francisco (UCSF) Cancer of the Prostate Risk Assessment score and the Cambridge Prognostic Groups. Neither included treatment effect, and both had potential flaws in design, but represent the most robust and usable prognostic models currently available.
Conclusion Few long-term prognostic models exist to inform decision-making at diagnosis of non-metastatic prostate cancer. Improved models are required to inform management and avoid undertreatment and overtreatment of non-metastatic prostate cancer.
- prostate cancer
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
Strengths and limitations of this study
Comprehensive and focused search strategy on prognostic models built around survival outcomes rather than shorter-term surrogates.
Contemporary clinical review of model accuracy and usability, to inform clinical practice.
Thorough bias assessments of individual studies, utilising the newly developed PROBAST tool.
Unable to assess bias across studies, publication bias and selective reporting.
Prostate cancer (PCa) is the most common male cancer and its incidence is increasing, with 1.3 million estimated new cases globally in 2018.1 2 The vast majority of new presentations in the UK (>80%) are with localised or locally advanced disease, representing a significant healthcare and economic burden.3 Treatment decisions in this growing group of men are notoriously complex, with the risk of progression and psychological impact of a cancer diagnosis balanced against significant potential morbidity associated with treatment. These latter problems can be very significant, with rates of erectile dysfunction as high as 79% and 66% 3 years after prostatectomy and radiotherapy, respectively, and incontinence rates of 20% and 3%, respectively.4 As a result, the uptake of conservative management is increasing with rising confidence in using active surveillance.5 6 The predominant decision dilemma is therefore at the point of diagnosis, when men are considering treatment options and indeed whether treatment is needed, particularly given that many men may have indolent disease and are more likely to die of causes unrelated to prostate cancer. This point is highlighted in the work of Lloyd et al, who showed that the lifetime incidence of being diagnosed with the disease is 13.4% but the risk of dying from the disease is only 4.3%.7
Most national guidelines currently risk-stratify men according to modified versions of the three-stratum D’Amico classification system, first proposed in 1998.8 This used biochemical recurrence as the primary outcome from a cohort of men all managed by radical treatment. However, biochemical recurrence is known to be a poor surrogate for survival and many men will no longer undergo radical treatment.5 6 9 The value of this system is therefore questionable, especially given that its use has moved from predicting radical therapy outcomes to counselling men at diagnosis about whether to have surveillance or treatment. Alternative risk models have been proposed to delineate smaller groups using standard variables (prostate-specific antigen (PSA), Gleason score, T-stage) or which integrate additional parameters, such as biopsy characteristics.10–12 However, many are built around single-centre data, using PSA-screened and heavily radically-treated populations, making them less applicable to the fundamental decision dilemma of whether treatment is needed in the first place.
Prognostic models, according to the American Joint Committee on Cancer (AJCC), should use survival itself as an endpoint, which is less equivocal and more robust.13 Using survival is especially important in prostate cancer given the slow natural history of the disease. In other tumour-types, high quality prognostic models using long-term survival are already integrated into routine practice and endorsed by the AJCC.14 However, no prognostic model for PCa has yet been endorsed, nor to our knowledge, is any such model widely-used in routine clinical practice. Models integrating the impact of radical treatment compared with conservative management would be particularly powerful.
Our objective in this study was therefore to perform a rigorous systematic search of the literature to identify available prognostic models built specifically around long-term patient survival, available for use at the point of a new diagnosis of primary non-metastatic PCa. Our aims were to establish (i) what models were available (ii) their accuracy in terms of discrimination and calibration and (iii) their generalisability, external validation and clinical utility.
The study protocol followed the recommendations of Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.15 The review aim, search strategy and study inclusion and exclusion criteria were framed using the CHecklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies (CHARMS).16 The search strategy was informed by previous similar studies, including publications which tested and recommended search terms for risk-prediction models.17 18 The full systematic review protocol was pre-specified and registered through PROSPERO, reference CRD42018086394 (https://www.crd.york.ac.uk/prospero/) and is available in the online supplementary files.
In summary, this was a review of studies reporting multivariable long-term survival models for use at the point of diagnosis for men newly diagnosed with non-metastatic PCa. Long-term survival was defined as ≥5 years following diagnosis. We focused on publications subsequent to January 2000 to increase relevance to modern practice. For inclusion, studies needed to include men undergoing more than one treatment type and models had to be multivariable. Models for cancer-specific or all-cause survival outcomes were potentially eligible, and any model type was allowable. Both model development and model validation studies were eligible. Single-parameter or single-treatment studies were excluded. Comprehensive study inclusion and exclusion criteria are shown in box 2.
Eligibility criteria for study inclusion and exclusion
Study inclusion criteria
All of the following inclusion criteria must be met:
Studies reporting models based on men with non-metastatic prostate cancer.
Studies evaluating ‘long-term’ (≥5 years) cancer-specific or overall survival outcomes.
Studies reporting models in screened or non-screened populations.
Studies including men undergoing more than one treatment option.
Models available for use at the point of diagnosis – that is, pre-treatment.
The model includes more than one parameter, that is, multivariable.
Study exclusion criteria
Any of the following is a reason to exclude a study:
Any article that is not an original study (eg, reviews, commentary, editorials, corrigendums, letters).
Conference proceeding or abstract from poster/oral communication only.
Study where data cannot be derived to contribute to a primary or secondary outcome of this systematic review.
Studies pertaining only to men with advanced/metastatic disease.
Studies pertaining exclusively to men after an active treatment option, for example, after radical prostatectomy.
Studies of single biomarkers or single parameters only.
Studies including men exclusively undergoing a single treatment type.
Studies were identified by searching MEDLINE, Embase and Cochrane Library from 1 January 2000 to 28 February 2018. Detailed search strategies for each database are available in the online supplementary data. Highly relevant but excluded articles were recorded, collated and the references analysed for additional studies.
Search results were exported into Covidence software, an online screening platform endorsed by Cochrane (covidence.org). Title and abstract screening and full-text screening were sequentially performed by a team of three reviewers. Prior to screening, a pilot screening process was conducted for calibration of screening between reviewers. Reviewers were not blinded to study authors, institution, publication journal or year of publication.
The full list of data items extracted from each included study is recorded in the protocol. These were informed by the CHARMS checklist16 and included: (1) Study design, (2) Characteristics of study participants, (3) Outcomes, (4) Candidate predictors, (5) Sample size, (6) Missing data, (7) Statistical methods, (8) Model performance and evaluation and (9) Usability. Model performance, assessed by discrimination, was the principal summary measure.
To assess the validity of eligible studies, individual studies were assessed for bias using the new Prediction model Risk Of Bias ASsessment Tool (PROBAST).19 The PROBAST tool assesses both risk of bias and applicability of both model development studies and validation studies.20
Patient and public involvement
There was no formalised patient and public involvement in the design or conduct of this review. Preliminary results were presented to a departmental patient and public involvement group, and their comments used to inform the write-up of the review.
The search of Cochrane, MEDLINE and Embase yielded 6581 studies after deduplication. Sixteen additional studies were identified by reviewing the references of excluded but relevant studies. A total of 12 studies were eligible for inclusion in the final review.21–32 Two of these had not been summarised in previous reviews.26 27 The PRISMA flow diagram is shown in figure 1, including the reasons for exclusion at full-text screening. Only one reason for exclusion was assigned to each study, even where multiple reasons may have been present. Nine of the final 12 included studies were model development studies and three were model validation studies. Two of these external validations were of models already included as model development studies.30 31 One study related to an external validation of the Cancer of the Prostate Risk Assessment (CAPRA) score against mortality.32 The original CAPRA model development study, however, did not meet the eligibility criteria as it was developed against recurrence rather than long-term survival.11
Characteristics of the included studies’ participants and settings are summarised in table 1. The model development studies used data from over 231 888 men. However, two studies used analytical cohorts from the same registry (Surveillance, Epidemiology and End Results (SEER)).23 29 Two studies used data from single US centres, two used data from groups of four hospitals, and five were from regional or national multicentre registries. Eligibility criteria of patients into studies varied significantly, with some models using very specific or selected sub-cohorts only. For example, Nguyen et al included only consenting men undergoing radiotherapy (RT) or radical prostatectomy (RP), with at least one intermediate or high-risk feature but ≤T3b disease.22 The treatment cohorts included, and whether treatment effect was a parameter in the final model, were highly variable. Only six of the nine models included any men who had been managed conservatively (including ‘watchful waiting’ or ‘conservative management’); none of the models described or defined a specific ‘active surveillance’ cohort.21 23 25 27–29 Only three of the nine models included treatment as a predictor variable, none of which were externally validated.24 25 28 Median follow-up was relatively short within all model development cohorts, with the longest reported follow-up being 7.6 years.26
Results of individual studies
The final results of individual studies are summarised in table 2. The primary outcome was cancer-specific mortality only in one study,27 and was overall survival only in another study.25 The remainder reported measures of both. The study designs, model types and performance metrics varied markedly. We therefore focused on describing the studies, results, applicability and availability. Significant heterogeneity between studies in the question being asked and in the statistical methods of validation meant that attempts to meta-analyse data would not be appropriate. Modelling techniques varied, with the majority of studies using a proportional hazards model (Cox or Fine and Gray), although none reported assessing whether the proportional hazards assumption was valid. Included studies did not report any flexible parametric approaches to deal with continuous variables, and the majority used group categorisations of these variables. Reporting of model accuracy was inconsistent. Considering area under the curve and c-indices synonymously, six of the nine studies reported some measure of discrimination, with values ranging from 0.63 to 0.90 for PCa survival outcomes (although this higher figure was derived within a small elderly sub-cohort)23 28 and 0.58 to 0.73 for overall outcomes.23 25 Only four of the nine studies reported assessing calibration in some capacity.23 25–27 Relative performance within particular sub-populations was not generally reported.
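As an illustration of what the discrimination figures above measure, the following is a minimal sketch of Harrell's concordance index for right-censored data. It is not taken from any of the included studies; the function name, the toy data and the simplified handling of tied follow-up times are assumptions made for illustration only.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's c-index for right-censored survival data (illustrative sketch).

    times       -- follow-up time for each man
    events      -- 1 if death was observed, 0 if censored
    risk_scores -- model output, where a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the man with the shorter follow-up.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        # A pair is comparable only if the shorter follow-up ended in an event.
        # Pairs with identical times are skipped (a simplification of this sketch).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # earlier death was given the higher risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: 7 of the 8 comparable pairs are ranked correctly.
c = harrell_c_index(
    times=[2, 4, 6, 8, 10],
    events=[1, 1, 0, 1, 0],
    risk_scores=[0.9, 0.7, 0.8, 0.3, 0.1],
)
print(c)  # 0.875
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which gives a scale for reading the reported range of 0.63 to 0.90.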
Seven of the nine model development studies reported internal validation. Two reported using bootstrapping, and one used a separate 40% random sample of the original dataset for internal validation.21 26 27 An additional three external validation papers were included (table 3). Of note, each of these included the author of the original model within their author list, suggesting these are not completely independent validations. These three studies each used large numbers of subjects over comparable timeframes to their model development study. Discrimination of the Cambridge Prognostic Groups, SEER Cancer Survival Calculator and CAPRA scores was comparable for prostate cancer specific mortality (PCSM) at 0.81, 0.81 and 0.80, respectively, over 5 years (table 4).30–32 Discrimination was poorer for overall mortality at 0.71 in the latter study. These external validation papers performed quite well on individual bias assessment (table 5). In the CAPRA validation paper, estimates were reported to be ‘adjusted’ for age and treatment type, such that it is unclear whether the reported accuracy reflects that of the usable model.32
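For readers unfamiliar with bootstrap internal validation, the sketch below shows one common form, the optimism-corrected estimate of discrimination. It is not the procedure of any included study: it assumes a binary outcome (for example, death within a fixed horizon) instead of censored survival data, and the helper names `fit_best_column` and `column_auc` are hypothetical stand-ins for a real model-fitting step.

```python
import random


def auc(scores, outcomes):
    """Probability that a random case (1) scores higher than a random non-case (0)."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need both outcome classes")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def optimism_corrected(rows, outcomes, fit, performance, n_boot=200, seed=0):
    """Apparent performance minus the mean bootstrap optimism."""
    rng = random.Random(seed)
    n = len(rows)
    apparent = performance(fit(rows, outcomes), rows, outcomes)
    optimism = []
    while len(optimism) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]   # resample with replacement
        b_rows = [rows[i] for i in idx]
        b_y = [outcomes[i] for i in idx]
        if len(set(b_y)) < 2:
            continue                                  # resample lacks one outcome class
        model = fit(b_rows, b_y)                      # refit on the bootstrap sample
        optimism.append(
            performance(model, b_rows, b_y)           # performance where it was fitted
            - performance(model, rows, outcomes)      # performance on the original data
        )
    return apparent - sum(optimism) / len(optimism)


# Hypothetical "model": select the single predictor column with the best AUC.
def fit_best_column(rows, outcomes):
    return max(range(len(rows[0])), key=lambda k: auc([r[k] for r in rows], outcomes))


def column_auc(column, rows, outcomes):
    return auc([r[column] for r in rows], outcomes)


rows = [[0.9, 3], [0.8, 1], [0.4, 2], [0.6, 5], [0.2, 4], [0.1, 6]]
outcomes = [1, 1, 1, 0, 0, 0]
print(optimism_corrected(rows, outcomes, fit_best_column, column_auc))
```

The selection step inside `fit_best_column` is what makes apparent performance optimistic; a split-sample approach, as used by one study, instead holds back part of the data and so trades sample size for simplicity.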
Risk of bias
Risk of bias within studies is summarised in both table 5 and figure 2. Frequent concerns were observed with respect to participant selection and inclusion, particularly with respect to reporting or allowing for missing data. The outcome of death was well-defined and unambiguous in the majority of studies. Every included study had at least one parameter for which there was high concern of bias – leading to a high overall judgement of bias in the PROBAST tool. Concerns about applicability to the review question were present in more than half of the studies.
All of the included studies reported models that were nomograms, look-up tables or grouping stratifications. Seven of the nine studies were clinically usable through the publication itself. One was also available on a dedicated website.31 The SEER Cancer Survival Calculator was never launched online, and the publication itself does not provide sufficient detail to use the model.23 The model by Margel et al was available but not usable as it included year of entry as a predictor variable, making the model usable only on retrospective series rather than in future individual cases. Indeed, this model was developed to answer the question of whether pathological information adds to a prognostic model, rather than being intended for use at diagnosis.21
Treatment decisions at the point of diagnosis of non-metastatic PCa should be informed by the likely prognosis of the disease. Despite finding a number of published prognostic models, there remains a lack of well-validated, unbiased, generalisable models for use at diagnosis. In particular, there was a lack of external validation and dearth of models that compare outcomes between conservative management and radical treatment.
A number of previous reviews have assessed ‘prediction’ or ‘risk’ models in their broadest sense.33–35 Shariat et al have previously published a thorough catalogue of available predictive models in PCa – predicting everything from detecting PCa in the initial biopsy setting to survival. Other reviews have focused exclusively on outcomes following radical prostatectomy or radiotherapy.36 37 Lughezzani et al in 2010 for example summarised models predicting Gleason score upgrading, pathological stage, life expectancy, perioperative mortality, postoperative biochemical recurrence or functional outcomes in addition to PCSM after prostatectomy.36 The paucity of models using mortality as an outcome was particularly noted and the study concluded that no tools were capable of quantifying the benefit of RP relative to other treatment modalities.36 In 2009, a separate review by Shariat et al summarised available models into eight groups. One of these groups was of models predicting survival. This group included only four models, all of which related to advanced or metastatic disease requiring hormonal therapy, or androgen-insensitive disease.33 Green et al explored more historical literature from 1966 until 2012.35 They included four studies which looked at life expectancy in men with localised PCa, two of which were included in our study.25 28 The other two were models from Albertsen et al in 1996 which was prior to our study dates, and from Walz et al in 2007 which focused exclusively on non-cancer mortality and therefore did not meet our eligibility criteria.38 39 A recent review into decision-making tools again took an overview of tools available for use at different points in the patient pathway.40 Here, the only mentioned models that predicted survival were all post-prostatectomy models.40 In 2015 Kent and Vickers reported on ‘gross deficiencies’ in current tools for prediction of non-PCa death and concluded that they were unable to identify a suitable life expectancy tool.41 These previous reviews all suggest there is a lack of focus on and availability of good quality survival models.
Interpretation of findings
Although a number of previous reviews have been published, ours represents the most systematic and contemporary work focused towards the decision dilemma that patients and clinicians face, as it includes only models that are available at the pretreatment stage and that are not treatment-specific. This review demonstrates that only a small number of models for this setting have been published using long-term survival as an endpoint, and only three have been externally validated. However, the number of events reported in these three validation studies was well in excess of the 100 suggested as the minimum number needed for adequate validation.42 Within the external validations, model discrimination of up to 0.81 was reported for disease-related mortality.30 31 The included studies highlight the potential for using large datasets to develop prognostic models. These have the advantage of providing data from ‘real-world’ settings, outside of the clinical trial context or data exclusively from specialised centres. Our included studies commonly used elements of good study design that would be in keeping with the AJCC acceptance criteria for risk models.13 For example, criteria that were met in all studies were that the prognostic time-zero was well-defined, and that model developments have been published in well-regarded peer-reviewed journals. However, other criteria were lacking, such as reporting measures of discrimination, assessing calibration and thorough reporting of missing data in validation work.
The most robust tools we found are the three that have been externally validated – namely the Cambridge Prognostic Groups, the SEER Cancer Survival Calculator and the UCSF CAPRA score. Of note, the Cambridge Prognostic Group stratification criteria have not been summarised in prior reviews, as both their development and validation have been published in the last few years.27 31 However, the SEER calculator has never been released for public use. Each of the other two models has significant shortcomings, disregarding treatment effect, focusing only on disease-specific mortality and ignoring comorbidity, such that their value at the point of diagnosis is diminished. Importantly, each of the three external validation studies also had potential flaws in its design, using predominantly complete-case analyses and non-independent authors.30–32
Many of the included studies used historical cohorts. This is a necessity when using long-term survival as an outcome, but raises issues of generalisability to men diagnosed in the contemporary setting. The uptake of prebiopsy multiparametric MRI and targeted biopsies may, for instance, impact on the type of PCa detected and affect generalisability of previous models to current practice.43 Another issue was of small cohorts and low numbers of events in some cases, often as a result of relatively short follow-up. The value of 5-year outcomes themselves is questionable given that randomised controlled trial data on survival in non-metastatic PCa would suggest cancer survival over this timeframe is very high.44 An important feature of any prognostic model should be its usability and applicability to a man diagnosed today. However, with unquantified missing data this applicability becomes less clear. Dealing with PCSM in isolation may also be problematic, not least because of the importance of competing risks of death in a disease that affects older men.45 In this review we see that all but two included models were derived from North American data, such that generalisability to European or UK men could be questioned, with differing approaches to PSA-testing and different healthcare contexts. This is particularly relevant with regards to screening, whereby historical American cohorts are likely to have been detected through PSA screening, such that issues of lead time bias and detection of cancers that would not be considered clinically significant may affect model generalisability to modern practice.
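The point about competing risks can be made concrete with a short sketch of the non-parametric cumulative incidence function (the Aalen-Johansen estimator), which estimates the probability of prostate cancer death while accounting for men who die of other causes first. This is an illustrative sketch only; the cause coding (0 = censored, 1 = PCa death, 2 = other-cause death) and the toy data are assumptions, not data from any included study.

```python
def cumulative_incidence(times, causes, cause_of_interest):
    """Cumulative incidence of one cause of death in the presence of competing causes.

    times  -- follow-up time for each man
    causes -- 0 if censored, otherwise an integer code for the cause of death
    Returns a list of (time, cumulative incidence) at each time with any death.
    """
    data = sorted(zip(times, causes))
    at_risk = len(data)
    event_free = 1.0   # probability of being alive just before the current time
    cif = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths_interest = deaths_any = leaving = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            cause = data[i][1]
            leaving += 1
            if cause != 0:
                deaths_any += 1
            if cause == cause_of_interest:
                deaths_interest += 1
            i += 1
        if deaths_any:
            # Only men still alive can die of the cause of interest at time t.
            cif += event_free * deaths_interest / at_risk
            event_free *= 1 - deaths_any / at_risk
            curve.append((t, cif))
        at_risk -= leaving
    return curve


# Hypothetical cohort of four men, none censored: two PCa deaths, two other deaths.
pca = cumulative_incidence([1, 2, 3, 4], [1, 2, 1, 2], cause_of_interest=1)
other = cumulative_incidence([1, 2, 3, 4], [1, 2, 1, 2], cause_of_interest=2)
print(pca[-1][1], other[-1][1])
```

Treating other-cause deaths as censored observations in a Kaplan-Meier analysis would instead overstate the probability of prostate cancer death, which is one reason models reporting PCSM in isolation can mislead in older men.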
Access for clinicians and patients to know about and use models should be easy with modern web-based software. However, we found models were often only available in paper nomograms published with the article. Rather than online resources increasing the availability of models, there is the suggestion of the opposite occurring, with sites such as www.nomogram.org and www.clinicriskcalculators.org which were previously cited in reviews no longer being available online.36 Indeed, the SEER Cancer Survival Calculator, one of the most promising models we explored, which was also externally validated, was never made available online or for clinical use.23 30 The UCSF CAPRA score (www.urology.ucsf.edu) and Cambridge Prognostic Groups (www.cambridgeprognosticgroup.com) on the other hand are freely available online. Online publication does not in itself difficulty in accessing models, or a reluctance to fully share coefficients may partly explain the lack of external validations by researchers outside of the original models’ authorship group. Another hurdle to acceptance will be the ‘face validity’ of a model. For example, in any PCa prognostic model clinicians would likely expect grade, PSA and stage to be incorporated as a minimum set of variables; each of which has been shown to be independently prognostic.46–48 However, two of the models based on the largest dataset failed to include PSA which is inadequately recorded in the SEER database.23 30 A number of the other models used 3-strata grade classifications, rather than the full Gleason grade system, or the contemporary grade group system, which would again seem inadequate for a modern PCa model.49
A key step in the development of any prognostic model, following validation, should be a clinical impact study to quantify whether the model’s use improves decision-making and patient outcomes within a comparative design.50 Reviewing clinical impact studies was beyond the scope of this formalised review, but no such studies were found on simple literature review. These studies not only assess whether use of the model is an improvement on standard care, but also enable the study of factors that may affect implementation into care, such as the acceptability and ease of use to clinicians or patients, which can be difficult to assess in a review such as this.50 Impact studies can also help to bridge the gap between clinical validity and clinical utility, as utility of a model is not proportional to its prognostic capabilities. This has recently been explored further in a review on the UCSF CAPRA score, which confirmed its prognostic capacities, but was unable to demonstrate clinical usefulness – particularly when deliberating between different treatment strategies.51
Strengths and limitations at review level
While this review has particular strengths in its broad coverage of the literature and search strategy, we recognise potential limitations. Although we have assessed bias within individual studies, we recognise that risk of bias will also exist across studies, driven particularly by publication bias and selective reporting within studies, which we were unable to assess. Other limitations to our review may relate to our timeframe of inclusion of studies only from 2000 onwards. Models developed prior to this time may have undergone more thorough testing or validation and clinical impact assessment. However, our rationale for focusing on this contemporary time period was to investigate models appropriate to modern management, with significant changes having taken place in patient management and diagnostic practice since that time.43 49
We recognise that exciting developments are also underway to propose genomic or biomarker-based prognostic indicators.52 Many of these are currently reported as single-parameter studies, rather than being incorporated into existing models. As such, these would not meet the eligibility criteria for this review. As others have suggested, any incremental value of these models should be assessed against ‘a gold-standard multivariable clinical prognostic model’.53
Very few long-term prognostic models exist to inform the predominant decision dilemma of whether to undergo treatment or not after first diagnosis of non-metastatic prostate cancer. Current models are limited by inadequate external validation and fall short of many of the expectations of an unbiased, high-quality prognostic model.13 The most robust available tools are the Cambridge Prognostic Groups and the UCSF CAPRA score. However, both have significant shortcomings and are limited in their applicability at diagnosis by failing to include treatment effect and disregarding non-cancer mortality. Work should focus on developing prognostic models built on long-term survival outcomes which maximally utilise available clinico-pathological information and contextualise PCa within a patient’s context of competing risks. High-quality models including treatment effect are overdue, and crucial if both undertreatment and overtreatment of prostate cancer are to be minimised.
Contributors DT conceived and designed the project under the supervision of VJG and PP. Screening and data extraction were performed by DT, SR and BB. Analyses were performed by DT. DT, SR and VJG wrote the manuscript. Each author reviewed and edited the final version.
Funding This work was supported through two research scholarships received through The Urology Foundation (DT & SR). The funders played no role in study design, collection, analysis, interpretation of data, writing of the report or in the decision to submit for publication. They do not endorse or accept any responsibility for the content.
Competing interests VJG was involved in developing the CPG model, but derives no monetary or other remuneration from the model, which is freely available and independently peer-reviewed.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement No additional data are available.
Patient consent for publication Not required.