Objectives Predicting the presence or absence of coronary artery disease (CAD) is clinically important. The pretest probability (PTP) and CAD consortium clinical (CAD2) risk scores used in the guidelines are not sufficiently accurate to serve as the only guidance for applying invasive testing or discharging a patient. Artificial intelligence that does not require additional non-invasive testing is not yet used in this context: previous results of such a model are promising but available in high-risk populations only. Validation in low-risk patients, which is clinically most relevant, is lacking.
Design Retrospective cohort study.
Setting Secondary outpatient clinic care in one Dutch academic hospital.
Participants We included 696 patients referred from primary care for further testing regarding the presence or absence of CAD. The results were compared with PTP and CAD2 using receiver operating characteristic (ROC) curves (area under the curve (AUC)). CAD was defined by a coronary stenosis >50% in at least one coronary vessel on invasive coronary angiography or CT angiography, or by a coronary event within 6 months.
Outcome measures This is the first cohort validating the memetic pattern-based algorithm (MPA) model, which was developed in two high-risk populations, in a low-risk to intermediate-risk cohort, with the aim of improving risk stratification for non-invasive diagnosis of the presence or absence of CAD.
Results Of the population, 49% were male and the average age was 65.6±12.6 years. CAD was present in 16.2%. The AUCs of the MPA model, the PTP and the CAD2 were 0.87, 0.80, and 0.82, respectively. Applying the MPA model resulted in possible discharge of 67.7% of the patients with an acceptable CAD rate of 4.2%.
Conclusions In this low-risk to intermediate-risk population, the MPA model provides a good risk stratification of presence or absence of CAD with a better ROC compared with traditional risk scores. The results are promising but need prospective confirmation.
- Coronary heart disease
- Information technology
Data availability statement
Data are available upon reasonable request. All data relevant to the study are included in the article and all raw deidentified participant data are available on demand, approved by the ethical committee, with Professor Hans-Peter Brunner-La Rocca.
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
STRENGTHS AND LIMITATIONS OF THIS STUDY
First validation study of an artificial intelligence (AI) model in a low-risk to intermediate-risk cohort without the use of any additional non-invasive or invasive tests.
The memetic pattern-based algorithm model uses many easily available variables like blood results, ECG parameters and history of the patient in an outpatient clinic setting.
This validation study, comparing an AI model with the current risk scores, is performed in real-world data of patients in an outpatient setting.
A strength is the validation in a real-world data set of outpatient clinic patients, where risk stratification is most necessary.
A limitation is that invasive coronary angiography was not performed in all patients; it was performed only when clinically indicated.
Coronary artery disease (CAD) is multifactorial, that is, influenced by multiple risk factors, environmental factors and genetic predisposition. Clinical presentation of CAD is diverse, making the diagnosis difficult. Predicting the likelihood of having CAD is important for clinical decision-making and to define the need for further testing.1 2 Incorporation of elements in addition to cardiovascular risk factors, like metabolic syndrome,3 plasma C reactive protein,4 coronary artery calcium, carotid intima-media thickness and ankle-brachial index, was partially successful in improving prediction of CAD.5 However, the true added value remains largely unclear.6 Moreover, some of them require additional diagnostic tools such as CT scan or high-resolution echography, which are costly, expose patients to radiation or are investigator dependent. These shortcomings resulted in a negative statement from the US Preventive Services Task Force in using these additional factors as screening tools for diagnosis of CAD.7
The guidelines in the USA recommend using the Diamond and Forrester (DF) score, while the European Society of Cardiology (ESC) guidelines use a pretest probability (PTP) score that is an extension of the DF score8 and was updated again in 2019 on the basis of a pooled analysis.9 Canadian and National Institute for Health and Care Excellence (NICE) guidelines recommend using the Duke clinical score, which estimates the probability of obstructive CAD defined as a coronary artery stenosis of >50% on coronary CT angiography (CCTA) or invasive coronary angiography (ICA).10 11 These scores, however, may overestimate the probability of CAD, especially in women, and their accuracy is limited.8 12 An update from the American College of Cardiology/American Heart Association recommends using additional risk-enhancing factors to guide decisions, but apart from a calcium score, these are not further specified in the present guidelines.13 The CAD consortium clinical (CAD2) model incorporates these risk factors into the risk score, with a slightly better result compared with the PTP.12 14 However, this model should be used with caution in high-risk populations.12 15 A recent analysis provided promising results for the use of multiple biomarkers in predicting the presence of CAD.16 However, the excellent accuracy of the model could not be confirmed in an independent cohort.17 In contrast, memetic pattern-based algorithm (MPA)-based artificial intelligence (AI) combines multiple methods rather than relying on a single statistical method. Based on general clinical parameters and routine laboratory results, it accurately predicted CAD in two independent high-risk cohorts and a simulated low-risk to intermediate-risk cohort.18 19 Validation in a real-life low-risk to intermediate-risk population, where a sensitive, non-invasive screening tool to exclude the presence of CAD is most important, is, however, lacking.
Therefore, this study aims to apply and validate the BASEL/LURIC MPA18 19 model in a low-risk to intermediate-risk population referred by general practitioners (GPs) for cardiology evaluation of CAD.
The cardiovascular clinic (CVC) cohort included 4344 patients, aged >18 years old, who were referred by GPs to the cardiology outpatient clinic at Maastricht University Medical Centre between April 2006 and November 2011.
The clinical charts of 903 patients missed important information concerning history, physical examination, diagnostic investigations or outcome. Of the remaining 3441 patients, 853 patients could not be classified regarding the presence or absence of CAD (see below); and in additional 789 patients, no blood sampling was performed. Of the remaining 1799 patients, 700 patients were randomly selected, based on the power calculation. In four patients, blood tests were not accurate (haemolysis), leaving 696 patients for this analysis.
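The patient flow described above can be checked with a few lines of arithmetic. The sketch below uses only the counts reported in the text; the random seed and the use of `random.sample` are illustrative assumptions, since the actual selection procedure is not described.

```python
import random

# Counts reported in the text for the CVC cohort.
referred = 4344
incomplete_charts = 903      # charts missing important information
unclassifiable = 853         # CAD status could not be determined
no_blood_sample = 789        # no blood sampling performed
haemolysed = 4               # inaccurate blood tests after selection

with_complete_charts = referred - incomplete_charts
eligible = with_complete_charts - unclassifiable - no_blood_sample

# Assumption: simple random sampling without replacement; the seed is hypothetical.
rng = random.Random(2024)
selected_ids = rng.sample(range(eligible), 700)

analysed = len(selected_ids) - haemolysed
print(with_complete_charts, eligible, analysed)
```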
Patient and public involvement
Patients were not involved.
Sample size calculation
Power calculation was performed prior to analysis. Area under the curve (AUC) estimation was done with the method of DeLong et al.20 For AUC=0.87 and prevalence=16%, to reach a 95% CI of ±0.05 (from 0.82 to 0.92), the required number of observations is 531.
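As a rough plausibility check of this figure, the expected CI half-width can be approximated in closed form. The sketch below uses the Hanley and McNeil standard error for the AUC, which is a substitute for the DeLong method named in the text (DeLong needs the individual scores, which are not available before data collection). The function names are illustrative. Under these assumptions, 531 observations give a half-width close to the ±0.05 target.

```python
import math

def hanley_mcneil_se(auc, n_pos, n_neg):
    """Approximate standard error of an AUC (Hanley and McNeil formula)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    variance = (
        auc * (1 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    return math.sqrt(variance)

def auc_ci_half_width(auc, n_total, prevalence, z=1.96):
    """Half-width of the normal-approximation 95% CI for the AUC."""
    n_pos = round(n_total * prevalence)   # expected patients with CAD
    n_neg = n_total - n_pos               # expected patients without CAD
    return z * hanley_mcneil_se(auc, n_pos, n_neg)

half_width = auc_ci_half_width(auc=0.87, n_total=531, prevalence=0.16)
print(f"approximate 95% CI half-width: {half_width:.3f}")
```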
Sensitivity and specificity test: for true positive ratio (TPR)=80%, false positive ratio (FPR)=20% and prevalence=16%, to reach a 95% CI of ±8% for TPR and ±4% for FPR, the required number of observations is 601.
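The same kind of check can be made for the sensitivity and specificity calculation. The sketch below assumes a simple normal-approximation (Wald) interval for a proportion; the text does not state which interval was used, so this is an illustration rather than a reproduction of the original calculation.

```python
import math

def wald_half_width(proportion, n, z=1.96):
    """Half-width of the normal-approximation 95% CI for a proportion."""
    return z * math.sqrt(proportion * (1 - proportion) / n)

n_total = 601
prevalence = 0.16
n_pos = round(n_total * prevalence)   # expected patients with CAD
n_neg = n_total - n_pos               # expected patients without CAD

tpr_half_width = wald_half_width(0.80, n_pos)   # TPR is estimated in CAD patients
fpr_half_width = wald_half_width(0.20, n_neg)   # FPR is estimated in non-CAD patients

print(f"TPR: +/-{tpr_half_width:.3f}, FPR: +/-{fpr_half_width:.3f}")
```

With these inputs the TPR interval is about ±8% and the FPR interval stays within ±4%, in line with the stated targets.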
Because of the retrospective nature of the study, it was uncertain how many samples might provide inaccurate results. To be on the safe side, it was decided to analyse blood samples from 700 patients.
Patient evaluation: diagnosis of CAD
The patients of the CVC cohort underwent the same evaluation as performed in the BASEL and LURIC cohorts,18 19 comprising clinical evaluation, including patient history and physical examination, 12-lead ECG and laboratory tests. Most patients underwent a stress-ECG and echocardiography; whether these examinations were performed was decided by the cardiologist who screened the referral letters from the GPs. Based on the clinical assessment of the cardiologist, further examinations were performed. These included CCTA, nuclear imaging and ICA; that is, not all patients underwent ICA. Because of the lower risk of CAD, performing invasive testing in all patients would have been ethically unacceptable. When an ICA was performed, standard Judkins technique was used. The presence of >50% stenosis in at least one coronary vessel by visual interpretation was classified as significant CAD. In patients who did not undergo ICA, assessment of CCTA was used (at least one >50% stenosis) unless artefacts made interpretation impossible. In patients who did not undergo CCTA or where it was not interpretable, diagnosis of CAD was made if patients had a coronary event related to CAD within 6 months after the initial evaluation. Exclusion of CAD required either the absence of a significant stenosis in any vessel on ICA or CCTA or a follow-up without any cardiac event for at least 3 years after initial testing if no ICA or CCTA was performed or results were not interpretable. Patients who could not be classified based on these criteria (ie, cardiac event between 6 months and 3 years or insufficient follow-up) were excluded from the analysis.
Application of the MPA
The model was originally developed on the basis of the BASEL Study18 and then optimised and further validated using the LURIC Study.18 19 The model is a multilayer non-linear complex classifier derived from evolutionary learning optimisation process by applying, combining and finding optimal parameterisation of multiple methods from the field of pattern recognition and machine learning as described before19 and summarised below. More details regarding the development of the MPA model are added in the online supplemental files.
The following variables were included. Clinical: age, sex, weight, height, presence and type of chest pain, diabetes, nicotine use, pathological Q-waves (at ECG), systolic and diastolic blood pressure and relevant medication like statin use; laboratory results: mean corpuscular haemoglobin concentration (MCHC), white blood cells, urea, uric acid, troponin, glucose, total cholesterol, low-density lipoprotein cholesterol, high-density lipoprotein cholesterol, alanine aminotransferase (ALAT), alkaline phosphatase, amylase, total protein, albumin and bilirubin. Three of these input variables (pathological Q-waves, white blood cells and MCHC) were not available in the data collected in the CVC cohort and the missing values were replaced with a constant value within the normal range. Missing values in body mass index (n=205), systolic and diastolic blood pressure (n=73) variables were replaced by the median value within corresponding age group and sex. Missing values regarding smoking (n=140) were replaced with 0 (non-smoker).
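The three imputation rules described above can be sketched as follows. This is an illustration only: the field names, the 10-year age bands and the constant fill values are hypothetical, as the text specifies neither the age groups nor the constants used.

```python
from collections import defaultdict
from statistics import median

# Hypothetical placeholders for the three inputs unavailable in the CVC cohort.
CONSTANT_FILL = {"pathological_q_waves": 0, "white_blood_cells": 7.0, "mchc": 33.0}
# Variables imputed by the median within age group and sex.
MEDIAN_FIELDS = ("bmi", "systolic_bp", "diastolic_bp")

def age_group(age):
    """Assumed 10-year age bands (e.g. 60 covers ages 60 to 69)."""
    return (age // 10) * 10

def impute_missing(patients):
    """Return copies of the patient records with missing values filled in."""
    observed = defaultdict(lambda: defaultdict(list))
    for p in patients:
        key = (age_group(p["age"]), p["sex"])
        for field in MEDIAN_FIELDS:
            if p.get(field) is not None:
                observed[key][field].append(p[field])

    imputed = []
    for p in patients:
        q = dict(p)
        key = (age_group(q["age"]), q["sex"])
        for field in MEDIAN_FIELDS:
            if q.get(field) is None and observed[key][field]:
                q[field] = median(observed[key][field])
        if q.get("smoker") is None:
            q["smoker"] = 0                     # missing smoking status: non-smoker
        q.update(CONSTANT_FILL)                 # unavailable inputs: constant value
        imputed.append(q)
    return imputed

example = [
    {"age": 61, "sex": "F", "bmi": 24.0, "systolic_bp": 130, "diastolic_bp": 80, "smoker": None},
    {"age": 65, "sex": "F", "bmi": 28.0, "systolic_bp": 150, "diastolic_bp": 90, "smoker": 1},
    {"age": 68, "sex": "F", "bmi": None, "systolic_bp": None, "diastolic_bp": 85, "smoker": 0},
]
imputed_example = impute_missing(example)
```

Replacing a missing smoking status with "non-smoker" biases the input towards lower risk, which is worth keeping in mind when reading the results.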
As described in detail previously,18 19 the MPA model is a highly automated data-driven process, deriving the optimal solution for the particular classification problem on the basis of available data. The risk score discriminates either the presence or absence of CAD. The following statistics describe the quality of validation of the model:
Discriminating quality of the risk score (receiver operating characteristic (ROC) curve, area under the ROC curve).
Quality of the model diagnostic decisions (ie, sensitivity, specificity, positive predictive value, negative predictive value (NPV)) in the three lowest classes.
Discriminating and diagnostic quality of risk classes (risk at class, relative size of class).
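The statistics listed above can be computed as in the following minimal sketch. It uses the rank-based (Mann-Whitney) definition of the AUC and the standard confusion-matrix definitions; the scores and counts in the example are made up for illustration and are not study data.

```python
def auc_mann_whitney(scores_with_cad, scores_without_cad):
    """AUC as the probability that a CAD patient scores higher than a
    non-CAD patient, counting ties as one half."""
    wins = 0.0
    for s_pos in scores_with_cad:
        for s_neg in scores_without_cad:
            if s_pos > s_neg:
                wins += 1.0
            elif s_pos == s_neg:
                wins += 0.5
    return wins / (len(scores_with_cad) * len(scores_without_cad))

def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV and NPV from a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Hypothetical risk scores and counts, for illustration only.
example_auc = auc_mann_whitney([0.9, 0.8, 0.4], [0.7, 0.3, 0.2, 0.1])
example_metrics = diagnostic_metrics(tp=8, fp=2, tn=18, fn=2)
```

When a low-risk class is treated as "test negative", its NPV is simply one minus the CAD prevalence observed in that class.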
The model quality was compared with results of clinically used methods including the updated PTP and the CAD2 score. ROC curves are used for comparing the discriminating power of the scores. For diagnostic decision comparison, the thresholds defined by the guidelines were applied to the PTP and CAD2 score.21 22 The risk classes were defined as follows:
PTP <5% (no further testing)=class 1
PTP from 5% to 15% (may consider non-invasive testing)=class 2
PTP from 15% to 50% (CCTA for exclusion of CAD)=class 3
PTP from 50% to 85% (non-invasive ischaemia testing)=class 4
PTP from 85% to 100% (direct ICA)=class 5
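The class definitions above amount to a simple threshold lookup, sketched below. How values exactly on a boundary (5%, 15%, 50%, 85%) are assigned is not stated in the text; this sketch assumes each boundary belongs to the higher class.

```python
# Recommended action per risk class, as listed in the text.
CLASS_ACTIONS = {
    1: "no further testing",
    2: "may consider non-invasive testing",
    3: "CCTA for exclusion of CAD",
    4: "non-invasive ischaemia testing",
    5: "direct ICA",
}

def ptp_risk_class(ptp_percent):
    """Map a pretest probability (in per cent) to risk class 1 to 5.

    Assumption: boundaries are lower-inclusive, e.g. exactly 5% is class 2.
    """
    if not 0 <= ptp_percent <= 100:
        raise ValueError("PTP must be between 0 and 100 per cent")
    if ptp_percent < 5:
        return 1
    if ptp_percent < 15:
        return 2
    if ptp_percent < 50:
        return 3
    if ptp_percent < 85:
        return 4
    return 5

for ptp in (3, 12, 40, 60, 90):
    risk_class = ptp_risk_class(ptp)
    print(f"PTP {ptp}% -> class {risk_class}: {CLASS_ACTIONS[risk_class]}")
```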
For even better prediction of the CAD risk, we calibrated the MPA model thresholds on the basis of the new ESC guidelines21 and a priori information from the LURIC paper19 without using any information from this study. The full details of risk class thresholds applied to the MPA model, PTP and CAD2 scores are shown in table 1.
Data are shown as number and percentages, mean (±SD) or median (IQR), as appropriate. Group comparisons were done using χ², Student’s t-test or Mann-Whitney U test as appropriate. A p value of <0.05 was considered to be statistically significant. Statistical analysis was done with the use of SPSS V.25.0.
Baseline characteristics of the CVC population are shown in table 2. The average age was 65.6±12.6 years and approximately half were women. Thirty-seven per cent of the patients were referred to the outpatient clinic because of chest pain, 20% palpitations, 11% dyspnoea, 4% fatigue, 1% oedema. Sixty-nine per cent of the patients had primary or secondary complaints of chest pain, one-third complained about dyspnoea. Twenty-eight per cent of the patients complained about palpitations, 12% of the patients complained about vertigo and 3% suffered from oedema. Cardiac risk factors were relatively common.
The prevalence of CAD was 16.2% (113 out of 696 patients). In 116 patients (17%), ICA was performed, of whom 32 patients (28%) had no CAD. In class 1 based on the MPA model, 10 patients underwent an ICA, none of whom had CAD. In 217 patients (31%), CCTA was performed, of whom 172 patients (79%) had no CAD. In class 5 as defined by the ESC guideline threshold, 15 patients underwent a CCTA, of whom 12 (80%) had CAD.
Results of the distribution of patients in the risk classes of the three models and the effective rate of CAD are shown in table 3. The MPA model performed significantly better in terms of AUC than the PTP and the CAD2 scores (figure 1; p<0.0005); the prevalence of CAD increased from 4.2% in class 1 to 76.3% in class 5. There was a high NPV (95.8%), as shown in table 4. Importantly, class 1 represents almost 70% of the patients.
In the very low-risk group (class 1), 4.2% of patients had CAD using the MPA model, whereas no patient had CAD applying the PTP or the CAD2 score; however, the proportion of patients in this group was more than twice as large using MPA or CAD2 compared with PTP. In classes 2–3 (low-medium risk), compared with PTP and CAD2, the MPA group was up to five and four times smaller, with a slightly higher CAD presence. In class 4 (high risk), the percentage of the total population for MPA was comparable with the PTP and half of CAD2. Prevalence of CAD was similar using PTP and CAD2 and more than 20% higher in the PTP score. The CAD2 score assigned 1.7% of the population to class 5 and the PTP assigned no patients, whereas the MPA model assigned 8.5% of the population to this class.
Comparing tables 1 and 3, there are differences between actual disease prevalence and the a priori risk estimate in the higher classes. The MPA model and CAD2 score overestimate the risk in class 4. The PTP score is in line with the expected prevalence of CAD, but provides 0% of patients in the very high class and only 7% in the very low risk. The PTP score and CAD2 score have high percentages of intermediate risk (classes 2, 3 and 4), respectively, 93% and 82.3%, compared with 23.9% in the MPA model.
The present study shows that the AI-based MPA model is able to predict the presence of CAD with good accuracy in a truly low-risk to intermediate-risk population where accurate screening is most relevant. By combining easily available clinical parameters and generally available blood tests, it achieved an accuracy comparable with the currently best non-invasive tests, but likely lower costs, no potential risk and applicability in primary care. This may make the model attractive for clinical implementation. Table 5 summarises the advantages of the MPA model.
The model was developed in a high-risk population (BASEL)18 and further extended, optimised and validated in another high-risk population (LURIC).19 The latter study also included a simulated low-risk population, showing promising results. However, such simulation is only useful for hypothesis generation. Validation in an independent, real low-risk to intermediate-risk population was required and nicely shown by the results of this study.
A strength of this study is that it included a population that is relevant in daily practice, as non-invasive testing is most useful in patients with low to intermediate risk. The prevalence of CAD of the CVC cohort is in line with recently published pooled data of three large-scale studies,9 21 23 24 and much lower than the cohorts in which the model was developed (66% and 68%, respectively).18 19 This may be clinically relevant as the cohorts were completely independent, confirming the accuracy of the MPA model, which may make it suitable for screening in such a low-risk to intermediate-risk population.
The MPA model uses only easily available clinical and laboratory variables. Together with the accurate results with a comparable AUC in all three independent cohorts, it may generate a risk prediction model that is attractive to use in clinical practice, comparing favourably with the traditional risk scores, as a much larger percentage of the population can be reassured without further testing. It may be argued whether pooling of the first three classes is adequate. Safety of such an approach needs to be prospectively tested to investigate the clinical consequences of using MPA in screening processes of CAD detection. Still, given the good prognosis of mild stable CAD, it is very likely that such an approach is safe. Only a reduced number of patients would be advised to undergo additional non-invasive testing, resulting in fewer false positive findings. This may prevent the overuse of invasive testing and the risk of causing harm to patients due to complications.
There may be an economic advantage in using the MPA model, which provides high accuracy at lower costs. In addition, it can be expected to reduce the anticipated shortage in healthcare resources due to the ageing patient population with increasing risk of CAD. Patients with an identified need for further diagnostic testing, either non-invasive or invasive, would have a shorter waiting time before undergoing a test because a substantial number of patients would not need such testing. Therefore, using the MPA model in daily practice has the potential to target diagnostics, and as a consequence therapies, much better.25
The ESC guidelines from 2013 recommended not performing any additional testing in patients with PTP of <15%.24 This value was pragmatically based on high risk of false positive results and, as a consequence, performing no test may result in fewer incorrect findings.24 In clinical practice, however, much lower thresholds are usually applied which is in line with the overall CAD prevalence of 16% in our population referred by GPs for further evaluation by cardiologists. It is questionable if a 15% risk of having CAD is really acceptable in clinical practice. Unfortunately, proper evidence is lacking concerning best cut-off values for excluding the diagnosis of CAD. The 2019 ESC guidelines on chronic coronary syndromes changed the cut-off value to <5% for exclusion for further testing,21 which is more in line with the NICE and US guidelines, using <10% and <5%.11 22
A cut-off value of <5% would basically exclude almost all patients if the 2013 PTP risk score of the ESC was used24 as the 2013 PTP overestimated the prevalence of CAD significantly,21 which is also the case in our CVC cohort. The updated PTP score provides more realistic estimates, at least in the lower range.8 9 12 22 Compared with the PTP score, the CAD2 appears to improve the prediction of low-risk patients with <15% probability of CAD.12 Overestimating the risk in the low-risk population can explain the high cut-off value of the low-risk group in the 2013 ESC guideline, which is corrected in the new guideline, whereas the US guidelines were already stricter in this regard.21 22 24
The ESC guidelines advise non-invasive testing in a PTP between 5% and 85%,21 of which CCTA is recommended in low-risk to intermediate-risk patients because of its high NPV and non-invasive perfusion imaging for ischaemia for the higher intermediate-risk group.21 In those with a probability of >85%, direct ICA is recommended, but such a probability cannot be achieved by PTP. Thresholds for ICA after non-invasive testing are not provided.21 The CAD2 score, which was developed in a low-risk population, clearly performs worse in high-risk populations.15 26 The US and NICE guidelines advise direct ICA at a risk of >70% and >61%, respectively.11 22 In addition, the NICE guidelines overestimate the presence of CAD by 18.4% compared with the 2013 ESC guidelines,27 which would result in even more patients being directly referred for diagnostic ICA. In clinical practice, patients with relatively high risk often get invasive testing despite normal non-invasive testing, because the presence of obstructive CAD cannot be ruled out, especially if an exclusion cut-off value <5% is applied.21 Therefore, the best cut-off value for direct ICA needs to be determined and should probably be lower than 85%, and more in line with the US guidelines, that is, 70%.22
The ESC guidelines lack clear recommendations about assessing the clinical risk of CAD. The PTP is based on sex, age and the nature of symptoms only. To provide a more accurate estimation of risk, the guidelines recommend additional information such as a resting ECG, cardiovascular risk factors, cardiac dysfunction and/or the calcium score. However, they do not provide concrete recommendations in this regard and no estimation of risk based on the combination of these factors is provided to determine the overall clinical likelihood of obstructive CAD.21 The lack of a clear and easy-to-use risk classification may result in heterogeneous interpretation of these recommendations and unnecessary overuse of (non-invasive) diagnostic testing, in contrast to the MPA model. Although not specifically mentioned in the current guidelines, the CAD2 score incorporates these risk factors.12 15 21 With additional adjustment according to the ESC guideline thresholds,21 the MPA model could be further improved to exclude CAD in a large part of the patients and identify approximately 9% in whom direct ICA would be reasonable. This would make non-invasive testing necessary in less than one-quarter of patients, which is considerably less than with the PTP and CAD2, where over 90% and 80%, respectively, need additional non-invasive testing. Obviously, the accuracy and clinical use of such an ESC-adjusted MPA model need to be proven in additional cohorts, and prospective testing is required.
Most AI and deep learning models are not compared against each other and most have not been validated in more than one independent cohort. Previously published models do not perform better when only readily available data are used, as in the MPA model, without inclusion of non-invasive or invasive tests.28 When CT coronary angiography or calcium score is included in other models, the AUC is comparable with our results.28 29 A recent paper compared six AI algorithms, applying them in an openly available data set and predicting CAD with similar to even slightly better accuracy compared with our study, which is promising. However, these models used testing such as fluoroscopy, thallium heart scan or even angiography, which is an important limitation for implementation in clinical practice compared with our model.30 This strengthens the importance of the results of our study for improving risk stratification with the use of the MPA model.
Limitations are summarised in table 5. A limitation of our study is the absence of ICA in a substantial part of the population. However, we applied clearly defined rules for the presence or absence of CAD with extended follow-up, allowing patients to be separated into the two groups with a high likelihood of accurate diagnosis. Therefore, the chance of relevant bias regarding the diagnosis of CAD is low and clinically not meaningful.
Another limitation is the definition of CAD, using coronary artery stenosis of >50% as a threshold. On the one hand, coronary wall irregularities may also be relevant in the longer term. Next, diagnoses like coronary microvascular dysfunction or myocardial ischaemia with non-obstructive CAD were not considered during the follow-up as such events are not related to CAD with significant stenosis in major coronary arteries. On the other hand, it has been shown that not all stenoses of >50% are haemodynamically relevant.31 Still, the extended follow-up of the CVC population also takes clinical events into consideration.
Our results also highlight that the estimated risk depends on the average risk of a population. Thus, the MPA resulted in a higher prevalence of CAD in high-risk populations that were referred for ICA,18 19 most of whom had previously undergone non-invasive testing. Whereas the prevalence in groups 1 and 2 was the same, it was lower in the other three groups. This may be of clinical consequence and indicates the need for investigating novel tests in the most appropriate population and for prospective validation. Nevertheless, the identical overall test accuracy (AUC) independent of the population risk is very reassuring and in contrast to other models using biomarkers17 or the CAD2 model, which is more reliable in low-risk cohorts than in high-risk populations, where its use needs caution.12 15 The MPA model has not been validated in a prospective trial. Although the results of three studies using the MPA model showed uniform results without knowing the presence of CAD in advance in different risk populations, a prospective study is mandatory for the MPA model to become a part of routine practice for evaluation of patients with suspected CAD. Still, non-invasive tests have generally not been prospectively investigated regarding clinical impact but only to predict the presence or absence of CAD as done in this study. An exception is the evaluation of CCTA that provided evidence on its use as a diagnostic and prognostic tool in CAD.32
In this low-risk to intermediate-risk cohort referred for cardiac evaluation, the MPA model is a useful tool in the evaluation for CAD, superior to the generally used risk scores. The fact that no imaging is needed makes it easily applicable in the outpatient setting, even in primary care. It has an excellent NPV to safely exclude the presence of CAD in up to two-thirds of the population, precluding the need for further testing. Despite these optimistic results, prospective evaluation of its implementation is mandatory to prove the impact of the MPA model on clinical decision-making.
Patient consent for publication
This study involves human participants and complies with the Declaration of Helsinki. It was approved by the Maastricht UMC+ Ethics Committee (ethics ID/approval number METC 08-04-011). Participants gave informed consent to participate in the study before taking part.
Contributors CE made substantial contributions to the conception of the study and drafted the manuscript and interpretation of data. AT analysed the data and model. SS-vW, PR, VV, SB, SJRM and MF contributed to interpretation of data. CO, PR, MJZ and H-PB-LR made substantial contributions to the conception of the study and interpretation of data. H-PB-LR supervised the process. All authors have revised the manuscript critically for intellectual content and approved the final manuscript. H-PB-LR is responsible for the overall content as guarantor.
Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests PR is part owner, board member and head of Exploris, which is a privately owned Swiss research company focusing on the development of novel diagnostic solutions. CO is head of the business development team of Exploris. AT is head of the modelling and development team of Exploris. VV is a data analyst in Exploris. MF is head for regulatory affairs of Exploris. MJZ is an advisory board member of Exploris.
Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.
Provenance and peer review Not commissioned; externally peer reviewed.
Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.