Original research
Discrimination capability of pretest probability of stable coronary artery disease: a systematic review and meta-analysis suggesting how to improve validation procedures
  1. Pierpaolo Mincarone¹,
  2. Antonella Bodini²,
  3. Maria Rosaria Tumolo¹,
  4. Federico Vozzi³,
  5. Silvia Rocchiccioli³,
  6. Gualtiero Pelosi³,
  7. Chiara Caselli³,
  8. Saverio Sabina⁴,
  9. Carlo Giacomo Leo⁴
  1. Institute for Research on Population and Social Policies, National Research Council, Brindisi, Italy
  2. Institute for Applied Mathematics and Information Technologies "Enrico Magenes", National Research Council, Milan, Italy
  3. Institute of Clinical Physiology, National Research Council, Pisa, Italy
  4. Institute of Clinical Physiology, National Research Council, Lecce, Italy
Correspondence to Dr Carlo Giacomo Leo; leo{at}ifc.cnr.it

Abstract

Objective Externally validated pretest probability models for risk stratification of subjects with chest pain and suspected stable coronary artery disease (CAD), determined through invasive coronary angiography or coronary CT angiography, are analysed to characterise the best validation procedures in terms of discriminatory ability, predictive variables and method completeness.

Design Systematic review and meta-analysis.

Data sources Global Health (Ovid), Healthstar (Ovid) and MEDLINE (Ovid) searched on 22 April 2020.

Eligibility criteria We included studies validating pretest models for the first-line assessment of patients with chest pain and suspected stable CAD. Reasons for exclusion: acute coronary syndrome, unstable chest pain, a history of myocardial infarction or previous revascularisation; models referring to diagnostic procedures different from the usual practices of the first-line assessment; univariable models; lack of quantitative discrimination capability.

Methods Eligibility screening and review were performed independently by all the authors. Disagreements were resolved by consensus among all the authors. The quality assessment of studies conforms to the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2). A random effects meta-analysis of area under the receiver operating characteristic curve (AUC) values for each validated model was performed.

Results 27 studies were included for a total of 15 models. Besides age, sex and symptom typicality, other risk factors are smoking, hypertension, diabetes mellitus and dyslipidaemia. Only one model considers genetic profile. AUC values range from 0.51 to 0.81. Significant heterogeneity (p<0.003) was found in all but two cases (p>0.12). I2 values exceeded 90% for most analyses and meta-regression results were not significant, which undermined meaningful interpretation of the pooled values. A detailed discussion of individual results was therefore carried out.

Conclusions We recommend a clearer statement of endpoints, their consistent measurement both in the derivation and validation phases, more comprehensive validation analyses and the enhancement of threshold validations to assess the effects of pretest models on clinical management.

PROSPERO registration number CRD42019139388.

  • coronary heart disease
  • cardiovascular imaging
  • computed tomography
  • public health

Data availability statement

All data relevant to the study are included in the article or uploaded as supplemental information.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

Strengths and limitations of this study

  • This is the first meta-analysis summarising the most up-to-date data on the discrimination capability of pretest probability models of stable coronary artery disease.

  • The systematic review pays careful attention to the whole validation procedures.

  • The majority of included studies were considered to be of high methodological quality.

  • We considered pretest models developed in cohorts of patients referred for an anatomical test.

  • The meta-analyses have a low reliability due to the small number of included studies and the very high heterogeneity.

Introduction

Cardiovascular disease was the leading cause of mortality and morbidity worldwide in 2019, with 523 million prevalent cases and 18.6 million deaths.1 Among these, coronary artery disease (CAD) was reported in 197 million subjects and caused 9.14 million deaths. Stable CAD is typically caused by the build-up of plaques that limit blood flow and is characterised by reversible myocardial demand/supply mismatch usually inducible by exercise, emotion or other stress, and commonly associated with transient chest pain (stable angina pectoris).2 3

Stable CAD diagnosis is supported by non-invasive functional and/or anatomical testing,2 3 and invasive coronary angiography (ICA).2 To limit the risk of inappropriate examinations and their consequences for the safety of patients and healthcare professionals and for the economic sustainability of healthcare systems,4–7 eligibility for diagnostic testing is established through models that stratify subjects by risk based on a pretest probability (PTP) of CAD. Since the introduction of the Diamond-Forrester model (DFM)8 and the Duke Clinical Score (DCS),9 several alternative PTP models have been developed in cohorts of patients referred for ICA or coronary CT angiography (CCTA). Indeed, due to its very high sensitivity and negative predictive value, CCTA can substantially contribute to ruling out CAD.10 The DFM and its more recent updates have been recommended in guidelines for stable symptomatic subjects.3 11 Recent debates within scientific societies have raised the question of the tendency of such models to overestimate risk. The UK National Institute for Health and Care Excellence (NICE) no longer uses a probabilistic risk-stratification approach and has adopted a simpler identification of anginal chest pain to decide on further testing.12 The European Society of Cardiology (ESC) updated its guideline, which now determines PTPs from the stratified prevalence of CAD in a contemporary cohort instead of resorting to a prediction model as in the past. These new estimated risks are noticeably lower than the previous ones, so the disease prevalence may be underestimated in different populations.13 US experts are debating whether to adopt the NICE diagnostic approach or to keep using PTP.14 15 To address the flaws of widely recognised PTP models, NICE and ESC clearly underline the need for more information on the various risk factors acting as modifiers of the PTP, especially in the low probability range,11 and for the development and validation of new scores addressing outstanding uncertainties in the estimation of the PTP of CAD.12

This review provides several new contributions to the current debate on how to improve the PTP models developed for anatomically defined outcomes. It mainly focuses on external validation,16 carries out a meta-analysis to identify the best results and characterises the best procedures in terms of discriminatory ability, significant predictive variables and method completeness. By highlighting some key issues that could be further improved in the development and validation phases, this work aims to stimulate more rigorous procedures for the comparison of different pretest models.

Methods

This systematic review conforms to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses statement.17,18

Study inclusion and exclusion criteria

We identified studies that validated pretest models intended for the first-line assessment of patients with chest pain and suspected stable CAD. The disease was considered as a binary outcome determined through either ICA or CCTA. Reasons for exclusion were: (1) acute coronary syndrome, unstable chest pain, a history of myocardial infarction or previous revascularisation; (2) models that included a diagnostic procedure that does not reflect the usual practices of the first-line assessment3 11; (3) models based on a single predictive variable; and (4) lack of clearly stated discrimination capability. Unlike previous works,19 external validation was primarily considered. We also included internal validation, but only in the form of k-fold cross-validation, a technique that serves the same purposes as external validation. Moreover, papers on machine learning (ML)-based PTP models were excluded because they are covered by a recent review focusing on CAD diagnosis by ML with aims close to ours.20

Only full papers were retained because other publications, for example, letters to editors, conference proceedings, etc, are usually not assessed for study quality. Only articles published in English and Italian were considered.

Searches

The databases Global Health (Ovid), Healthstar (Ovid) and MEDLINE (Ovid) were systematically searched (CGL, PM) on 22 April 2020 using several keywords including: angina pectoris, chest pain, coronary artery disease, coronary heart disease, coronary stenosis, stratification score, likelihood function, predictive model, pre-test probability, coronary angiography, cardiac catheterisation and computed tomography angiography. The same full electronic search strategy was applied to the three databases (no filter was used), and is reported in online supplemental file 1c. Citation searches were also performed on reference lists of definitively included studies.

Study selection

Eligibility screening was performed independently by all the authors. Preliminary screening was performed using Abstrackr21 based on title and abstract with each paper assessed by two randomly assigned reviewers among the authors. Selected papers were assessed based on full text. Disagreements were resolved by consensus among all the authors.

Data extraction strategy

A data collection form was developed by three authors (AB, CGL, PM) and filled in by reviewers independently. Each selected paper was assigned for data extraction to the statistician (AB) and two randomly selected reviewers. Correspondence with the authors of the included studies was initiated if necessary. The reviewers worked independently and in plenary session meetings. Disagreements were resolved by consensus among all the authors. AB, CGL and PM reviewed the final form for internal consistency.

Study quality assessment

The quality assessment of included studies conforms to QUADAS-2 and was performed by four reviewers (AB, CGL, PM, MRT).22 Due to the previously described features (1–4), we considered that the eligible works did not raise applicability concerns.

Data synthesis and statistical analysis

The discriminative performance of prediction models can be summarised using several methods and indices; the area under the receiver operating characteristic (ROC) curve (AUC), or c-statistic, is the best known and most suitable.23 It was therefore chosen as the main index for the purposes of this review. Sensitivity and specificity also describe the discrimination capability of the model for a given cut-off and thus provide an indication of clinical usefulness. However, because this pair of indices is bivariate, it is not suitable for direct comparisons, and we therefore resorted to the associated AUC.
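The contrast between a single-number AUC and a cut-off-dependent sensitivity/specificity pair can be sketched in a few lines of Python. This is only an illustration: the function names and the toy data are hypothetical and are not taken from any of the reviewed studies.

```python
def auc_c_statistic(scores, labels):
    """Probability that a diseased subject scores higher than a
    non-diseased one (ties count one half). labels: 1 = CAD, 0 = no CAD."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sens_spec(scores, labels, cutoff):
    """Sensitivity and specificity when 'score >= cutoff' is called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical pretest probabilities and outcomes, for illustration only.
ptp = [0.05, 0.10, 0.20, 0.30, 0.40, 0.60, 0.70, 0.90]
cad = [0, 0, 0, 1, 0, 1, 1, 1]

print(auc_c_statistic(ptp, cad))   # one number, independent of any cut-off
print(sens_spec(ptp, cad, 0.15))   # a pair that changes with the cut-off
print(sens_spec(ptp, cad, 0.50))
```

The AUC stays fixed while the sensitivity/specificity pair moves with the cut-off, which is why the pair is harder to compare directly across studies.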

For the purposes of generalisation of a PTP model to populations that differ from the development population study, the computation of performance indices is not sufficient because a lower performance is usually expected.16 24 Therefore, we also noted whether more extended validation procedures were performed in order to properly apply a model to new populations.

A random-effects meta-analysis of AUC values from validations of each identified model was performed using R Statistical Software (R Project for Statistical Computing, RRID:SCR_001905)25 by meta26 and auctestr27 packages. Meta-regression was planned to explore the possible sources of unexplained heterogeneity by considering the following factors: (1) sample size, (2) prevalence and (3) anatomical test for outcome assessment.
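As a rough illustration of what a random-effects pooling of AUC values involves, the following Python sketch uses the DerSimonian-Laird estimator of the between-study variance. The choice of estimator is an assumption made here for simplicity; the text does not state which estimator was used in the R analysis, and the function name is hypothetical.

```python
import math


def random_effects_pool(estimates, std_errors):
    """DerSimonian-Laird random-effects pooling (assumed estimator).

    Returns the pooled estimate, its SE, Cochran's Q, I2 (in %) and tau2.
    """
    k = len(estimates)
    w = [1.0 / se ** 2 for se in std_errors]          # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)                     # between-study variance
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    return {"pooled": pooled, "se": se_pooled, "Q": q, "I2": i2, "tau2": tau2}


# Two hypothetical validations with discordant AUCs, for illustration only.
result = random_effects_pool([0.60, 0.80], [0.02, 0.02])
print(result)
```

With precise but discordant studies, I2 approaches 100% and the pooled SE is driven almost entirely by the between-study variance, which is the situation described in the Results.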

Patient and public involvement

Patients and the public were not involved in this review.

Results

Study selection

A total of 5711 studies were identified (three through reference lists of included studies) and 2685 different abstracts were screened. Out of the 71 relevant full-texts assessed for eligibility, 27 were finally included (figure 1).

Figure 1

Flow diagram of the study selection process.

Study characteristics

Table 1 summarises the selected studies in terms of model name, geographical location and population recruitment criteria. Because the same model is sometimes referred to by different names across papers, table 1 indicates both the original name and the one adopted here.

Table 1

Characteristics of the studies on PTP for CAD

Studies are mainly conducted in North America28–37 or Europe.38–46

The updated DFM (uDFM),28 38–40 42–50 and the CAD Consortium Clinical model (CADC-Clin)28 31 34 39–42 46 50 51 are the most assessed models.

The quality of included studies is generally high due to the specific review question and the eligibility criteria adopted. Nevertheless, a risk of bias arises from a few specific issues. A few validation studies29 33 37 43 51 do not declare that they enrolled only consecutive or random samples of patients. With respect to the index test, only one work adopted an optimal discriminating threshold in addition to prespecified ones.37 Application of CCTA as a reference test yields a risk of bias in many studies30 31 36 43 45 47 48 51 52 that do not report measures against misclassification of the test results. Finally, in four works,31 35 38 51 patients did not receive the same reference test for the diagnosis of stable CAD. A graphical summary of the risk of bias is reported in online supplemental file 3.

Predictive variables

As shown in table 2, the identified models can be classified into two broad classes: basic models, including the DFM (based on age, sex and chest pain) and its updates, and clinical models, including the DCS and the models that extend the DFM by adding a few, mainly traditional,53 risk factors. Within this quite classic framework, the Corus CAD model is distinguished by relating CAD in patients without diabetes to the expression levels of a set of genes. All the models were derived by logistic regression, with three exceptions: the DFM, derived by a conditional probability analysis in the late 1970s; Corus CAD, obtained through Ridge regression; and the CONFIRM score, developed to predict adverse clinical events by fitting a Cox proportional hazards model and subsequently validated for diagnosis of CAD.

Table 2

PTP models’ variables

Cross-validation51 and split sample30 33 have been used in a few cases only.

Predictors were classified into four macro-areas: demography, medical history, clinical presentation/physical examination and biochemistry. The demographic macro-area is present in all models with the variables age and sex, while race is only included in the Expanded clinical model and PROMISE Minimal Risk model. The most used variables in the medical history macro-area are diabetes mellitus and hypertension. The clinical presentation/physical examination macro-area is present in all but the Corus CAD models. Only the Corus CAD and PROMISE Minimal Risk models do not include chest pain. The most used variable in the biochemistry macro-area is dyslipidaemia. The other risk factors are model specific: gene expression (Corus CAD), oestrogen status (Morise score), high-density lipoprotein cholesterol (PROMISE Minimal Risk model) and the high-sensitivity cardiac troponin (uDFM-cTn).

Discrimination capability

All the papers presented ROC curves and/or AUC values. In Adamson et al,47 only fixed thresholds were analysed and the c-statistics associated with sensitivity and specificity were reported. Table 3 reports the AUC values and their 95% CIs, while the summary of the meta-analyses conducted for the models with more than one validation is shown in figure 2, where models with a single validation are also included for the sake of completeness. To make the meta-analyses as complete as possible, missing SEs of estimated AUC values were computed with the ‘se_auc’ command of the auctestr package, and the corresponding (Gaussian) 95% CIs are reported in table 3. This computation requires only the study sample size and the prevalence, and its accuracy improves as the study size increases. For a small sample size, the computed SE is generally larger than the exact one, so the CIs are more conservative. Only two papers did not meet the conditions for inclusion in the meta-analyses.29 30
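One standard way to approximate the SE of an AUC from the sample size and the prevalence alone is the Hanley-McNeil formula, sketched below in Python. Whether this matches the package's internal computation exactly is an assumption; the function names are hypothetical and the numbers are illustrative only.

```python
import math


def hanley_mcneil_se(auc, n_total, prevalence):
    """Approximate SE of an AUC (Hanley-McNeil formula, assumed here)."""
    n_pos = round(n_total * prevalence)   # subjects with CAD
    n_neg = n_total - n_pos               # subjects without CAD
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    var = (auc * (1.0 - auc)
           + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    return math.sqrt(var)


def gaussian_ci(auc, se, z=1.96):
    """Gaussian 95% CI (z = 1.96) around an AUC estimate."""
    return auc - z * se, auc + z * se


# Hypothetical study: AUC 0.75, 1000 subjects, 30% prevalence.
se = hanley_mcneil_se(0.75, 1000, 0.30)
print(se, gaussian_ci(0.75, se))
```

The SE shrinks as the sample size grows, so the resulting CIs are widest for the smallest validation cohorts.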

Table 3

AUC values of PTP models

Figure 2

Summary of the meta-analyses. Models that were validated by one study only are denoted by area under receiver operating characteristic curve (AUC)* and a grey colour in the graphic. CAD, coronary artery disease; CADC-Basic, CAD Consortium Basic model; CADC-Clin, CAD Consortium Clinical model; CASS, Coronary Artery Surgery Study; DCS, Duke Clinical Score; DFM, Diamond-Forrester model; HRA, High Risk Anatomy; uDFM, updated DFM.

AUC values range from 0.51 (reference 47; almost failing) to approximately 0.81 (reference 51; almost excellent). The statistical heterogeneity of the AUC values among the studies validating each PTP model was assessed by using the Cochran Q test and the I2 statistic.54 In all but two cases (CONFIRM score and Morise score), statistically significant heterogeneity was found, as expected (p<0.003). On the one hand, the lack of heterogeneity in those two cases is unreliable, due to the low number (≤5) of included studies and the low power of the Cochran Q test. On the other hand, where heterogeneity is significant, I2 exceeds 90% for most analyses and in some cases even 95%, which undermines meaningful interpretation of the pooled values (reference 55 and references therein). Therefore, the discussion of the pooled values below is complemented by a detailed discussion of the individual results.

From the meta-analyses, uDFM-cTn and CONFIRM show the best performances (AUC=0.757 and pooled AUC=0.7554, respectively). In slightly more detail, the extension of uDFM with the use of high-sensitivity cardiac troponin I (uDFM-cTn) has been validated in only one population where it showed a significantly higher AUC than uDFM alone (0.757 vs 0.738, p=0.025) and better calibration (Hosmer-Lemeshow (HL) p=0.0001 vs HL p=0.1123).38 The substantially steady results of the CONFIRM score on several data sets are also confirmed on a validation data set consisting of subjects at the low extreme of traditional cardiovascular risk factor burden.56

DFM, its DFM/Coronary Artery Surgery Study (CASS) version, uDFM and Morise score show the lowest pooled AUC values (<0.70). In more detail, DFM/CASS has the lowest pooled AUC value (0.61) due to the two threshold-based validations reported in Adamson et al.47 When these values are excluded from the meta-analysis, the pooled AUC value becomes closer to 0.70 (0.6861, 95% CI: 0.6312 to 0.7409) and heterogeneity decreases to a non-significant level (I2=41.9%, p=0.19). With regard to the DFM and its DFM/CASS version, overestimation is usually reported, especially in women.45 However, the DFM’s inferior result is also due to the fact that it was usually not carefully validated but only used as a reference model32 44 45 or as a basis to establish the performances of the Corus CAD model.33 35 37 The only in-depth validation is presented in reference 43. The Morise score and the Corus CAD are the only two models explicitly considering a female-specific factor (the oestrogen status and a sex-specific score, respectively): when directly compared on the same validating population, the Corus CAD had a significantly higher AUC than the Morise score (0.79 vs 0.65, p<0.001).35

The uDFM and the CADC-Clin are the two most validated models with completely different performances (pooled AUC values: 0.6866 vs 0.7406). The uDFM updated and extended the traditional DFM to a contemporary cohort that included subjects 70 years and older. The CAD Consortium Basic model (CADC-Basic) can be considered as a further update on a different contemporary population (see table 2). The most complete validation of the uDFM, considering calibration-in-the-large, recalibration and eventually re-estimation, has been performed by the developers themselves43 who obtained a valid overall effect of predictors. The other validating procedures limit themselves to AUC computation and to a rough assessment of under/overestimation, mainly by the HL goodness-of-fit test and related calibration plots (calibration-in-the-large is applied in one study42).
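Calibration-in-the-large can be illustrated with a small Python sketch: the model's linear predictor is kept fixed (slope 1) and only an intercept correction is estimated on the validation data. This is a minimal illustration under stated assumptions, not the procedure of any specific reviewed study; the function name and the toy data are hypothetical.

```python
import math


def calibration_in_the_large(outcomes, predicted, iters=50):
    """Intercept 'a' solving sum(y_i - sigmoid(a + logit(p_i))) = 0.

    outcomes: 0/1 disease status; predicted: pretest probabilities,
    which must lie strictly between 0 and 1. Solved by Newton's method.
    A negative value means the model overestimates risk on average.
    """
    lp = [math.log(p / (1.0 - p)) for p in predicted]   # linear predictor
    a = 0.0
    for _ in range(iters):
        mu = [1.0 / (1.0 + math.exp(-(a + l))) for l in lp]
        grad = sum(y - m for y, m in zip(outcomes, mu))
        hess = sum(m * (1.0 - m) for m in mu)
        step = grad / hess
        a += step
        if abs(step) < 1e-10:
            break
    return a


# Hypothetical cohort: the model predicts 50% for everyone, 20% have CAD.
y = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
p = [0.5] * 10
print(calibration_in_the_large(y, p))   # negative: risk is overestimated
```

Recalibration would additionally estimate a slope on the linear predictor, and re-estimation would refit the individual coefficients; both go beyond this sketch.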

The CADC-Clin model shows good performances on validating populations by reaching estimated AUC values even >0.80, and this high performance level is generally confirmed in other validations by taking into account estimation uncertainty (95% CIs including 0.80).28 34 40 Moreover, its performances significantly improve with respect to the related CADC-Basic.28 31 34 51 The pooled AUC value (0.7406) is only slightly lower than the highest ones. It could even have been the best one if three highly performing validations51 had presented all the data (ie, SE) for their inclusion in the meta-analysis. The generalisability of the CADC-Clin model to external populations was analysed by deep validation procedures.31 34 41 46 Results on miscalibration analysis could be considered quite consistent across papers. This finding indicates smaller than expected effects of the diagnostic characteristics, chest pain typicality in particular.31 34 41 Model calibration can be worse in women compared with men, a situation that also arises from the validation of other models (eg, DFM43). Despite different pooled AUC values, direct comparisons of either uDFM or CADC-Clin with the CONFIRM history-based score do not lead to a clear evaluation of the advantages of one over the other in terms of AUC,40 42 while the CONFIRM score proves to be better than the DFM.52 Figures 3 and 4 show the forest plot of the meta-analyses for uDFM and CADC-Clin model, the two most validated models. The heterogeneity for the uDFM model is not significantly reduced by removing the two threshold validations in Adamson et al47 (I2=95% vs I2=97.4%). For the uDFM and CADC-Clin models, a meta-regression analysis was also conducted which did not lead to any significant result.
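A meta-regression of AUC values on a single study-level factor (for example, prevalence) can be sketched as a weighted least-squares fit. In this simplified Python illustration the between-study variance tau2 is treated as known, which is an assumption; the function name and the toy data are hypothetical.

```python
import math


def meta_regression(estimates, std_errors, moderator, tau2=0.0):
    """Weighted least squares of estimates on one moderator.

    Weights are 1 / (SE^2 + tau2); tau2 is supplied by the caller
    (assumption). Returns intercept, slope, SE of slope and a
    two-sided Gaussian p value for the slope.
    """
    w = [1.0 / (se ** 2 + tau2) for se in std_errors]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, moderator)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, moderator))
    sxy = sum(wi * (xi - xbar) * (yi - ybar)
              for wi, xi, yi in zip(w, moderator, estimates))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    se_slope = math.sqrt(1.0 / sxx)
    z = slope / se_slope
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return {"intercept": intercept, "slope": slope,
            "se_slope": se_slope, "p": p_value}


# Hypothetical AUCs, SEs and prevalences of three validations.
fit = meta_regression([0.64, 0.68, 0.72], [0.03, 0.03, 0.03],
                      [0.2, 0.4, 0.6], tau2=0.002)
print(fit)
```

With only a handful of validations per model, the SE of the slope is large, which is consistent with meta-regression not reaching significance.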

Figure 3

Forest plot of the meta-analysis for the updated Diamond-Forrester model. *PROMISE trial; **SCOT-HEART trial. AUC, area under receiver operating characteristic curve.

Figure 4

Forest plot of the meta-analysis for the CAD Consortium Clinical model. AUC, area under receiver operating characteristic curve; CAD, coronary artery disease.

The traditional DCS generally overestimates prevalence and shows a lack of fit by the HL test. Moreover, miscalibration results from a reduced effect of sex and chest pain typicality and an increased effect of diabetes and dyslipidaemia.51

The Corus CAD model stands out from the other models because it defines an age-specific and sex-specific gene expression score. Validation is performed by AUC comparisons, HL test and additivity to DFM and other models. The validation procedures show significant AUC improvement when the score is added to other models (eg, 0.81 vs 0.65 when added to Morise score, with non-overlapping CIs35; 0.721 vs 0.663 when added to DFM, p=0.003 (reference 33); not shown in the table). Testing the Corus CAD model on different data sets from an extension of the original validation population provides results very similar to the original ones.29

Finally, the Minimal Risk model upsets the usual point of view because it aims to directly identify patients with chest pain and normal coronary arteries. Unfortunately, the only other external validation published up to the date of our search57 cannot be considered here because it was based on a former version of Fordyce et al30 that included some computational errors.58

Discussion

External validation is an indispensable tool for investigating the generalisability of a PTP model to populations that differ from the development population study. This process can use different approaches, from the computation of indices to more complex procedures that aim at understanding how the original model should adapt to the new population. The papers included in this review mainly relied on AUC. The advantage of this index lies in being suitable both for individual evaluations and for rigorous comparisons. However, the AUC is a summary: only the whole ROC curve will allow evaluation of the clinical usefulness of a test by showing the true positive and false positive fractions that will be obtained for any eventually chosen cut-off.

Most of the papers included in this review did not provide a careful assessment of the discriminative performance of the validated model with respect to a well-defined threshold, but limited themselves to computing sensitivity and specificity at the thresholds suggested by either European or American guidelines. Studies on the CAD Consortium models and the Corus CAD model are exceptions. As far as the CAD Consortium models are concerned, clinical usefulness is assessed at cut-offs that vary from 5% to 20%. A cut-off of 14.75 (15 in subsequent works) was identified for the Corus CAD model in the main work,33 a value that corresponds to a disease likelihood of 20% on a validation data set (positivity for index ≤15). Notably, Corus CAD recently lost Medicare coverage in the USA.59 The very low AUC values obtained by Adamson et al47 at the cut-off of 15%, in their comparison of major guidelines for the assessment of stable chest pain including risk-based strategies, are representative of a general protective approach that leads clinicians to prefer a very high sensitivity, which of course implies low specificity.60 61
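The link between a single cut-off and a low AUC can be made explicit. Assuming that the c-statistic reported for a dichotomised rule is the area under the two-segment ROC curve through the points (0, 0), (1 − Sp, Se) and (1, 1), where Se and Sp denote sensitivity and specificity, the area is the sum of a triangle and a trapezoid:

```latex
\mathrm{AUC}_{\text{cut-off}}
  = \tfrac{1}{2}\,(1-\mathrm{Sp})\,\mathrm{Se}
  + \tfrac{1}{2}\,\mathrm{Sp}\,(1+\mathrm{Se})
  = \frac{\mathrm{Se}+\mathrm{Sp}}{2}
```

Under this assumption, a rule with sensitivity close to 1 and specificity close to 0 necessarily yields a value close to 0.5, whatever the discrimination of the underlying continuous model.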

Although all the models are obtained by regression techniques, which allow the effect of each predictor on the outcome of interest to be interpreted, very few papers31 34 41 43 carry out a complete validation procedure. More often, a model is rejected after a poor preliminary performance on the new population in some test, and a different model is developed without any further in-depth analysis of the reason for the failure. Regardless of the quality of the newly developed model, neglecting in-depth validation procedures means losing the information captured by the initial study and hinders a deep understanding of how the effect size of relevant risk factors can change in a different geographical area or setting.24 For instance, in-depth validation procedures such as miscalibration analysis make it possible to question the effect of chest pain typicality in different data sets.31 34 41 This finding is consistent with what was recently noted by Di Carli and Gupta62: angina remains a common presenting symptom in a high proportion of patients with cardiac conditions who do not show obstructive lesions in their coronary angiograms.

The diagnostic question is central in the determination of which diagnostic pathway and test is the most appropriate62 63 and also affects statistical analysis. A carefully defined outcome should be required to provide a reliable basis for the evaluation of the effect of any predictive variable.64 When referring to validation specifically, the application of a statistical model to predict an outcome different from the originally intended one raises some concerns and, eventually, should be explicitly noted. In data-driven models, the outcome definition in the population study also influences predictor selection. Thus, a small AUC value in the validation set does not necessarily indicate a lower performance of the original model on the new population. Instead, it suggests that the model may not be appropriate for the context.57

Despite meta-regression not being able to statistically assess the portion of heterogeneity explained by differences in sample size, prevalence and choice of the anatomical reference test, differences between studies in terms of the way the outcomes are defined and measured contribute to the methodological heterogeneity we narratively highlighted in this review.65 66

The main strengths of this review were the large number and high quality of included studies, the attention paid to validation procedures rather than to AUC values alone, and the careful consideration of the different aspects that yield heterogeneity rather than of statistical heterogeneity alone.

The study had limitations. Most studies mainly refer to Western populations with a minority of studies referring to Asian subjects (Japan, South Korea and China).48–50 52 56 67 Another limitation was that most of the studies did not investigate the use of any threshold. Pooled AUC values from meta-analyses can provide only an approximate summary of the discrimination capacity of most of the models, due to the low number of validating studies. This also affects the analysis of heterogeneity due to the low power of the test, and the feasibility of meta-regression.68 Although the focus of our meta-analysis was not a measure of an intervention effect, the meta-analysis was limited in the consideration of other possible sources of heterogeneity, mainly clinical like mean age or proportion of women. However, a multivariable analysis considering all the study-related variables together would have been unreliable, due to the low number of validations for most of the models.

Finally, in this review, we only considered pretest models developed in cohorts of patients referred for ICA or CCTA. Our choice was determined by the main guidelines and by traditional, well-established models. However, the need for models that are able to predict functionally significant CAD has been underlined,69 for prognostic purposes as well. Nevertheless, how these alternative models could be used in a risk-stratification approach to guide further patient–clinician decision-making has not been assessed yet.

Conclusions

Several agencies and scientific organisations emphasise the need for more knowledge on how the prediction of the disease can be modified according to the risk factors present in any specific study population or, possibly, in any particular patient. This would indeed improve the precision of the estimated clinical likelihood of CAD. However, the increasing availability of large data sets and the greatly improved computational power seem to have directed a large part of recent research towards model development rather than model validation.16 First of all, our review makes an important selection among the many developed models by mainly considering those externally validated. Then, it provides insights into the effects of traditional and emerging risk factors, biomarkers and comorbidities on the PTP of obstructive CAD. Finally, our findings lead to the following recommendations. To achieve a more robust exploitation of PTP models in decision-making processes, significant endpoints should be more clearly stated and consistently measured in both the derivation and validation phases. In addition, more comprehensive validation analyses should be adopted to understand model weaknesses and variations. Lastly, increased efforts are still needed on threshold validation and on analysing the effect of PTP on clinical management.

Ethics statements

Ethics approval

Our work of systematic review and meta-analyses of published literature does not require a research ethics approval.

Acknowledgments

The authors express their great appreciation to Dr Philip D Adamson for the valuable discussion on some methodological aspects. The authors would also like to thank Dr Tommaso Leo for the clarifications provided on some clinical aspects and Roberto Guarino (National Research Council of Italy, Institute of Clinical Physiology, Lecce) for his technical support (informatics tools for document management and title and abstract screening). Finally, the authors express their appreciation for the insightful comments of the two reviewers, Professor William Wijns and Dr Marc Dewey, that allowed the improvement of the manuscript.

References

Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

Footnotes

  • PM and AB contributed equally.

  • Contributors AB, CGL, PM, GP and SS provided substantial contribution to the conception of the work. CGL and PM performed the literature search and retrieved selected publications. All the authors (PM, AB, MRT, FV, SR, GP, CC, SS and CGL) contributed to the extraction and interpretation of data. AB carried out the meta-analysis and the meta-regression analysis. AB, CGL, PM and MRT assessed the quality of included studies. All the authors contributed to drafting the work. AB, CGL and PM revised it critically. All the authors approved the version to be published and are accountable for all aspects of the work. CGL is responsible for the overall content as the guarantor.

  • Funding Part of this work was supported by the European Union Horizon 2020 research and innovation programme under grant agreement no 689068—Project 'Simulation Modelling of coronary ARTery disease:a tool for clinical decision support (SMARTool)'.

  • Disclaimer This publication reflects only the authors' view and the commission, which has no role in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript, is not responsible for any use that may be made of the information it contains. The funding source (European Commission) had no role in the study. All the authors are independent from funders, had full access to all of the data in the study, and can take responsibility for the integrity of the data and the accuracy of the data analysis.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.