Protocol
Artificial intelligence-based mining of electronic health record data to accelerate the digital transformation of the national cardiovascular ecosystem: design protocol of the CardioMining study
  1. Athanasios Samaras1,
  2. Alexandra Bekiaridou1,2,
  3. Andreas S Papazoglou1,
  4. Dimitrios V Moysidis1,
  5. Grigorios Tsoumakas3,
  6. Panagiotis Bamidis4,
  7. Grigorios Tsigkas5,
  8. George Lazaros6,
  9. George Kassimis1,7,
  10. Nikolaos Fragakis7,
  11. Vassilios Vassilikos8,
  12. Ioannis Zarifis9,
  13. Dimitrios N Tziakas10,
  14. Konstantinos Tsioufis6,
  15. Periklis Davlouros5,
  16. George Giannakoulas1
  17. CardioMining Study Group
    1. 1st Department of Cardiology, University General Hospital of Thessaloniki AHEPA, Thessaloniki, Greece
    2. Institute of Bioelectronic Medicine, Feinstein Institutes for Medical Research, New York, New York, USA
    3. School of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece
    4. Medical Physics and Digital Innovation Laboratory, School of Medicine, Aristotle University of Thessaloniki, Thessaloniki, Greece
    5. Department of Cardiology, University Hospital of Patras, Rio Patras, Greece
    6. 1st Cardiology Department, "Hippokration" General Hospital, University of Athens Medical School, Athens, Greece
    7. 2nd Cardiology Department, Hippokrateion General Hospital, Aristotle University of Thessaloniki, Thessaloniki, Greece
    8. 3rd Cardiology Department, Hippokrateion General Hospital, Aristotle University of Thessaloniki, Thessaloniki, Greece
    9. Department of Cardiology, "George Papanikolaou" General Hospital, Thessaloniki, Greece
    10. Department of Cardiology, Democritus University of Thrace, University Hospital of Alexandroupolis, Alexandroupolis, Greece
    1. Correspondence to Dr George Giannakoulas; ggiannakoulas{at}auth.gr

    Abstract

    Introduction Mining of electronic health record (EHR) data is increasingly being implemented all over the world but mainly focuses on structured data. The capabilities of artificial intelligence (AI) could reverse the underusage of unstructured EHR data and enhance the quality of medical research and clinical care. This study aims to develop an AI-based model to transform unstructured EHR data into an organised, interpretable dataset and form a national dataset of cardiac patients.

    Methods and analysis CardioMining is a retrospective, multicentre study based on large, longitudinal data obtained from unstructured EHRs of the largest tertiary hospitals in Greece. Demographics, hospital administrative data, medical history, medications, laboratory examinations, imaging reports, therapeutic interventions, in-hospital management and postdischarge instructions will be collected, coupled with structured prognostic data from the National Institute of Health. The target number of included patients is 100 000. Natural language processing techniques will facilitate data mining from the unstructured EHRs. The accuracy of the automated model will be compared with the manual data extraction by study investigators. Machine learning tools will provide data analytics. CardioMining aims to cultivate the digital transformation of the national cardiovascular system and fill the gap in medical recording and big data analysis using validated AI techniques.

    Ethics and dissemination This study will be conducted in keeping with the International Conference on Harmonisation Good Clinical Practice guidelines, the Declaration of Helsinki, the Data Protection Code of the European Data Protection Authority and the European General Data Protection Regulation. The Research Ethics Committee of the Aristotle University of Thessaloniki and Scientific and Ethics Council of the AHEPA University Hospital have approved this study. Study findings will be disseminated through peer-reviewed medical journals and international conferences. International collaborations with other cardiovascular registries will be attempted.

    Trial registration number NCT05176769.

    • CARDIOLOGY
    • Heart failure
    • Ischaemic heart disease
    • Health informatics
    • Risk management

    This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

    STRENGTHS AND LIMITATIONS OF THIS STUDY

    • Discharge letters and prognostic data for 100 000 cardiac patients will be collected.

    • Natural language processing techniques will facilitate automated clinical data mining from unstructured electronic health records.

    • The accuracy of the automated model will be compared with the manual data extraction by study investigators.

    • Machine learning tools will provide data analytics.

    • Generalising natural language processing models across languages still remains challenging.

    Introduction

    The combination of medicine with computer science and artificial intelligence (AI) techniques, including machine learning (ML) and natural language processing (NLP), is promising and is expected to change medicine rapidly in the coming years.1–3 NLP aims to interpret human language and quantify aspects of medical practice that were previously amenable only to laborious and costly work.4 ML focuses on the interpretation of data and has been used in different settings in medicine, including the automated interpretation of ECGs, image classification and risk stratification.3 5 Mainly at a research level, these methods are already offering novel clinical practice approaches and could have an impact on a plethora of cardiac diseases, including heart failure and coronary artery disease.6 Nevertheless, real-world clinical implementation remains a challenge, and this lack of impact on everyday clinical practice stands in stark contrast to the enormous progress in research.

    With the growing number of patients and their concentration in large tertiary centres, it becomes attractive to collect large amounts of clinical data systematically. Such registries are essential for exploring the characteristics of different comorbidities and understanding real-world cardiac patients. However, with the unprecedented amount of data, manual collection and traditional processing methods become a challenge, as they are time-consuming and costly for healthcare systems.7 Other significant difficulties are the unstructured free-form text of the electronic health records (EHRs) and the need to deidentify and secure the vast amount of patient data. AI methods could fill this void in medical records, making it possible to analyse large amounts of information efficiently.8

    The use of AI automated processes constitutes a novelty in big data configuration, offering quick, reliable and fully deidentified data extraction for further processing.9 10 The results of its efficient use can be readily extended to different healthcare systems, amplifying the knowledge produced, improving diagnostic and therapeutic accuracy and transforming current clinical care practice.11 Transferring AI methods from the laboratory to everyday clinical practice is a difficult task that necessitates a high level of specialisation, financial resources and cross-disciplinary collaboration among academia, industry and clinical institutions.12

    This study aims to contribute to the development of a clinically useful and feasible AI model for accurate automated extraction and processing of large volumes of raw and unstructured clinical data from EHRs. The information acquired from automated procedures will form the largest national database of cardiac patients, derived from unstructured data. Ultimately, this study aspires to encourage the digital transformation of the national cardiovascular ecosystem, by improving clinical documentation towards an automated, rapid recording and utilisation of clinical data.

    Methods and analysis

    CardioMining study design

    CardioMining is the first nationwide study involving AI for the automated data extraction from EHRs of patients discharged from Greece’s largest Cardiology Departments of tertiary hospitals. This ongoing, retrospective, multicentre, observational cohort study aims to use novel NLP and ML techniques to efficiently extract and process large volumes of unstructured clinical data from electronic clinical narratives, forming the most extensive national dataset.

    The target cohort size is 100 000 consecutively enrolled adult patients hospitalised for any reason across all participating study sites. The study has been active since January 2022 and is expected to be completed by January 2025. Inclusion and exclusion criteria are presented in box 1.

    Box 1

    Inclusion and exclusion criteria

    Inclusion criteria

    • Patients discharged from cardiology departments of tertiary hospitals in Greece.

    • Patients whose medical records are electronically stored in each hospital’s electronic information systems.

    Exclusion criteria

    • Patients who died during hospitalisation, and thus no discharge letter was issued.

    The resulting database will be a springboard for clinically valuable conclusions, such as outcome prediction, risk stratification and clinical decision support systems. Our protocol has been developed according to the Standard Protocol Items: Recommendations for Interventional Trials-Artificial Intelligence extension (online supplemental appendix).13

    Details of data

    Electronically registered medical records of patients discharged from Cardiology wards of tertiary hospitals in Greece, from the day that each hospital developed EHRs until 2022, will be retrospectively collected from hospital discharge letters. Each discharge letter includes demographics, discharge diagnoses, medications, diagnostic examinations, therapeutic interventions, in-hospital management and postdischarge instructions in the Greek language and in unstructured form (box 2). Baseline clinical data are neither coded nor in structured form, apart from the laboratory exams, which are structured in a predefined format but will nevertheless also require the development of automated algorithms to be integrated in the final dataset. Patients who died during hospitalisation will be excluded from the study, since no electronically registered discharge letter is issued for these patients in Greek hospitals. Hence, information for these patients is not available, apart from a handwritten administrative document that displays the cause of death as an International Classification of Diseases 10th Revision (ICD-10) code.

    Box 2

    Extracted data for the discharged patients

    1. Patient characteristics

    • Demographics.

    • History—comorbidities.

    • Clinical presentation.

    2. Discharge diagnosis

    • Coronary artery disease.

    • Acute coronary syndrome

      • ST-elevation myocardial infarction.

      • Non-ST-elevation myocardial infarction.

      • Unstable angina.

    • Heart failure.

    • Sinus tachycardia.

    • Sinus bradycardia.

    • Supraventricular arrhythmia.

    • Ventricular arrhythmia.

    • Bradyarrhythmias.

    • Cardiac arrest.

    • Cardiogenic shock.

    • Device implantation or malfunction.

    • Endocarditis.

    • Myocarditis.

    • Pericarditis.

    • Pericardial effusion.

    • Congenital heart disease.

    • Presyncope/syncope.

    • Valvular heart disease.

    • Cardiomyopathy.

    • Acute pulmonary oedema.

    • Pulmonary embolism.

    • Arterial hypertension.

    • Amyloidosis.

    3. Electrocardiography on admission and on discharge.

    4. Chest X-ray.

    5. Echocardiographic reports (quantitative and qualitative data).

    6. Catheterisation lab reports.

    7. Laboratory examination (serial measurements).

    8. In-hospital clinical course and medical management.

    9. Discharge medication.

    Data deidentification

    All information that could potentially be used to identify a person, such as names, postal codes, places of residence and occupation, will be deleted from these electronic files before data extraction. Thus, the data cannot be assigned to a specific subject, as no additional information or identifiers will be collected for the subjects. After the files are deidentified, each patient’s clinical note will be linked with a specific key (‘identifier’). The electronic file that links each ‘identifier’ to the patient’s clinical note will be stored in a secure hospital electronic location. Data will be centrally stored in a structured electronic database and will be accessible only by study staff. Strict subject confidentiality will be maintained through subject identification codes.
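    The linkage scheme described above can be sketched as follows. This is a minimal illustration only, assuming each note arrives as a dictionary with hypothetical field names (`name`, `postal_code`, `residence`, `occupation`, `text`); the study's actual file formats and tooling are not specified in the protocol. Removing structured fields does not remove identifiers embedded in the free text itself, which would need separate handling.

```python
import secrets

# Hypothetical field names for direct identifiers; the protocol lists names,
# postal codes, places of residence and occupation as examples.
IDENTIFYING_FIELDS = ("name", "postal_code", "residence", "occupation")


def deidentify(records):
    """Split records into deidentified notes and a separate linkage table.

    Returns (notes, key_table):
      notes     -- list of dicts without identifying fields, each carrying a
                   random study identifier
      key_table -- dict mapping study identifier -> original record index,
                   to be stored separately in a secure location
    """
    notes = []
    key_table = {}
    for index, record in enumerate(records):
        study_id = secrets.token_hex(8)  # random, not derived from patient data
        while study_id in key_table:     # guard against an unlikely collision
            study_id = secrets.token_hex(8)
        cleaned = {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}
        cleaned["study_id"] = study_id
        notes.append(cleaned)
        key_table[study_id] = index
    return notes, key_table
```

    Keeping `key_table` apart from `notes` mirrors the separation between the secure hospital location and the central research database.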

    Data and safety monitoring

    At multiple time points, a data and safety monitoring board consisting of study investigators and an independent statistician will review accumulating data for quality and safety and report back to the study’s steering committee. A simple data use agreement and verification that the researcher has undergone human subjects training will be required to limit access to the clinical database to authorised medical researchers.

    AI-model development for automated extraction of data from EHRs

    A sample of the fully deidentified files will undergo manual extraction (figure 1). It will serve as a dataset for training and evaluating NLP techniques to extract cardiology entities from the records automatically. Of these records, 70% will be used for training, 15% for validation and 15% for testing the developed models. As a baseline, we will use a dictionary-based method containing various forms of the entities we aim to extract. We will employ two main approaches for automating the data extraction process, one operating at the level of the whole record and one operating at the finer-grained level of each entity mention. For the first one, we will investigate the state-of-the-art neural architecture of transformers, as well as more classical linear models and support vector machines based on the bag-of-words representation of the records, treating the entities as labels in a multilabel classification task. For the second one, we will use the baseline to automatically tag particular words and phrases corresponding to entity mentions or, alternatively, employ manual annotation. Then we will use a transformer architecture to perform sequence tagging, that is, outputting each recognised entity together with the particular tokens in the record that correspond to it. A variation of this second approach is first to use a Named Entity Recognition model for generic cardiology entities, followed by an entity normalisation model that links the entity mentions to the particular entity.14 In all cases, we will exploit information concerning the structure of the record into meaningful sections (figure 2).
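    The 70%/15%/15% partition mentioned above could be produced along these lines. This is only a sketch: the function name, the fixed seed and the choice of a simple random shuffle are assumptions, and the protocol does not state whether the split is random, stratified or grouped by patient or hospital.

```python
import random


def split_records(record_ids, train=0.70, validation=0.15, seed=42):
    """Randomly partition record identifiers into train/validation/test sets.

    The test share is whatever remains after the train and validation shares.
    """
    ids = list(record_ids)
    random.Random(seed).shuffle(ids)      # reproducible shuffle
    n_train = round(len(ids) * train)
    n_val = round(len(ids) * validation)
    return {
        "train": ids[:n_train],
        "validation": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],
    }
```

    If several discharge letters belong to the same patient, splitting by patient rather than by letter would avoid the same person appearing in both training and test data.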

    Figure 1

    Data extraction method. Data from deidentified electronic health records will undergo both manual and automated extraction. The results of the automated extraction using NLP techniques will be validated on the manually organised dataset using accuracy metrics of the NLP model. The obtained knowledge from the manual dataset will help with the development of an accurate trained AI-model for automated data extraction. EHR, electronic health record; ML, machine learning; NLP, natural language processing.

    Figure 2

    Utilisation of text mining techniques to extract knowledge from unstructured clinical notes. Keyword recognition will provide the baseline information for treating the entities as labels in a multilabel classification task. Text mining techniques will allow the development of a structured database for further processing through machine learning models. PAF, paroxysmal atrial fibrillation; T2DM, type 2 diabetes mellitus; aVF, augmented vector foot; LVEF, left ventricular ejection fraction; TAPSE, tricuspid annular plane systolic excursion; RCA, right coronary artery; TTS, transdermal therapeutic system.
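    A dictionary-based keyword baseline of the kind referred to in figure 2 can be sketched as below. The entity labels and surface forms are illustrative English placeholders chosen for this example (the study's records are in Greek and its actual dictionary is not given), and matching is a plain case-insensitive whole-word search.

```python
import re

# Illustrative dictionary: entity label -> known surface forms (assumed, not
# taken from the study).
ENTITY_DICTIONARY = {
    "atrial_fibrillation": ["atrial fibrillation", "AF", "PAF"],
    "type_2_diabetes": ["type 2 diabetes mellitus", "T2DM"],
    "heart_failure": ["heart failure", "HF"],
}


def compile_patterns(dictionary):
    """Build one case-insensitive whole-word regex per entity label."""
    patterns = {}
    for label, forms in dictionary.items():
        # Longest forms first so that full phrases win over abbreviations.
        ordered = sorted(forms, key=len, reverse=True)
        alternatives = "|".join(re.escape(form) for form in ordered)
        patterns[label] = re.compile(
            r"(?<!\w)(?:" + alternatives + r")(?!\w)", re.IGNORECASE
        )
    return patterns


def tag_record(text, patterns):
    """Return (labels, mentions) for one record.

    labels   -- set of entity labels found (record-level multilabel output)
    mentions -- list of (label, start, end, matched_text), ordered by position
    """
    labels, mentions = set(), []
    for label, pattern in patterns.items():
        for match in pattern.finditer(text):
            labels.add(label)
            mentions.append((label, match.start(), match.end(), match.group(0)))
    return labels, sorted(mentions, key=lambda m: m[1])
```

    The `labels` set corresponds to the record-level multilabel view, while `mentions` gives the token spans that a sequence-tagging model could be trained on. This sketch does not handle negation (a phrase such as "no heart failure" would still be tagged), which is one reason such a method serves only as a baseline.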

    Assuming our final model achieves high accuracy on the test set, we will apply it to the complete set of records that have not been manually processed, in order to extract all Cardiology entities from them automatically. This structured information from the discharge letters will be integrated with laboratory measurements and imaging data using a multimodal deep learning method to improve risk stratification, prognosis estimation and treatment recommendations for patients with different cardiac diseases. Following the development of the digital research infrastructure, a pilot prospective study will be performed in the participating cardiology departments to demonstrate the feasibility and accuracy of the AI algorithm for automated extraction and processing of unstructured clinical data from EHRs. A successful validation of this tool will enable its rapid implementation in clinical practice.

    Study endpoints

    The primary and secondary endpoints of this study are summarised in box 3.

    Box 3

    Primary and secondary endpoints

    Primary endpoint

    • The accuracy of an artificial intelligence-based model to automatically extract clinical data from patients’ medical records for further processing and analysis compared with traditional human intervention-based data extraction methods.

    Secondary endpoints

    • All-cause mortality.

    • Thromboembolic events.

    • Number and cause of rehospitalisations.

    • Development of major cardiovascular diseases (eg, heart failure, coronary artery disease, diabetes mellitus).

    • Postdischarge modifications in medical therapy.

    • Prescription and use of guideline-recommended drugs in various cardiovascular diseases.

    The primary endpoint is the measurement of the ‘test error’, that is, the model’s accuracy in automatically extracting clinical data from patients’ medical records for further processing and analysis compared with traditional human intervention-based data extraction methods. The obtained baseline clinical data will be merged with the study’s secondary endpoints, which will be explored on the resulting dataset over the follow-up period of each patient. The secondary endpoints include all-cause mortality, thromboembolic events, number and cause of rehospitalisations, development of new-onset cardiovascular diseases and postdischarge modifications in medication. Secondary endpoints will be either provided in a structured form or extracted from electronic healthcare systems using the aforementioned text mining capabilities of the deployed digital research infrastructure. These endpoints will be integrated in ML models to enable postprocessing data analytics, such as risk stratification for each clinical condition, phenotyping and patient clustering. Hence, the purpose of the secondary endpoints is not to test the accuracy of the AI model but rather to provide clinical implications in the digital research infrastructure.
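    One common way to quantify agreement between automated and manual extraction is micro-averaged precision, recall and F1 over the extracted entity labels. The sketch below assumes each record is represented as a set of labels; the protocol does not specify which accuracy metrics will be reported, so the choice of metric and the function name are assumptions.

```python
def micro_prf(manual, automated):
    """Micro-averaged precision, recall and F1 across records.

    manual, automated -- lists of label sets, aligned record by record;
    the manual extraction is treated as the reference standard.
    """
    if len(manual) != len(automated):
        raise ValueError("both inputs must cover the same records")
    tp = fp = fn = 0
    for gold, predicted in zip(manual, automated):
        tp += len(gold & predicted)   # labels found by both
        fp += len(predicted - gold)   # labels only the model produced
        fn += len(gold - predicted)   # labels the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1
```

    Under this definition, the ‘test error’ would be computed only on the held-out test records that the model never saw during training or validation.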

    In most healthcare systems, a large volume of clinical data is stored in electronic hospital systems that function as data repositories with no functionalities in terms of data analysis. These stored data cannot be automatically reshaped into a structured format for analytical purposes and, therefore, require laborious and time-consuming manual extraction by humans. Thus, clinical data are underused and neglected, which results in a lack of epidemiological data and research opportunities and a loss of valuable clinical information. Optimal utilisation of the increasing volume of clinical data from unstructured clinical notes is a major unmet need in healthcare systems. Hence, the conceptual architecture of our study protocol (figure 3) includes the development of an AI model that enables automated data extraction from unstructured EHRs, which will contribute to the rapid development of a structured cardiovascular database to facilitate further data processing and provide useful data analytics (prevalence of diseases, risk stratification, early diagnosis, clinical decision support systems, minimisation of human error).

    Figure 3

    The roadmap of the CardioMining study towards digital transformation of the national cardiovascular ecosystem. The digital transformation of a healthcare system at a national level is a great challenge but also a complex and difficult task. The low digital maturity of the health sector in Greece coupled with the ongoing rapid technological changes worldwide demand urgent action through the implementation of a paradigm shift. The CardioMining study will retrospectively collect unstructured data derived from electronic health records of cardiology departments. Data extraction will be performed both manually by humans and automatically using natural language processing algorithms. The validated artificial intelligence models will contribute to the development of a structured registry of cardiac patients to provide data analytics through machine learning models. CVD, cardiovascular disease.

    Follow-up

    The follow-up period in this retrospective study will be between the initial hospital discharge and the end of the follow-up, either the date of outcome occurrence or the current date (in outcome-free patients). Updated information for this study concerning secondary endpoints will be obtained from central databases and prescription registers, managed by the Hellenic Ministry of Health services.
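    The follow-up definition above amounts to a simple censoring rule, sketched here. The function and argument names are hypothetical, and the data cut-off date is passed in explicitly rather than read from the system clock so that results are reproducible.

```python
from datetime import date


def follow_up_days(discharge_date, outcome_date=None, cutoff_date=None):
    """Return (days_of_follow_up, event_observed) for one patient.

    Follow-up ends at the outcome date if an outcome occurred, otherwise at
    the cut-off date (the 'current date' for outcome-free patients).
    """
    if outcome_date is not None:
        end, event = outcome_date, True
    elif cutoff_date is not None:
        end, event = cutoff_date, False
    else:
        raise ValueError("either outcome_date or cutoff_date is required")
    if end < discharge_date:
        raise ValueError("follow-up cannot end before discharge")
    return (end - discharge_date).days, event
```

    For example, `follow_up_days(date(2020, 1, 1), cutoff_date=date(2021, 1, 1))` describes an outcome-free patient followed from discharge to the cut-off date.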

    Statistical analysis

    Given the size of the participating clinics and the years during which EHRs have been recorded in electronic form, it is estimated that the sample of patient records will be about 100 000. Continuous variables will be tested for normality with the Kolmogorov-Smirnov test and presented as mean ± SD or median, with comparisons between groups made using the Wilcoxon rank-sum test. Categorical variables will be expressed as frequencies (%), with comparisons made using Pearson’s χ² test. Outcome analysis will be performed using ML algorithms, such as regression, decision trees, random forests, support vector machines and extreme gradient boosting. Statistical analysis will be performed using SPSS V.27 (SPSS), Stata V.15.1 (StataCorp) and R packages.
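    For the categorical comparisons, Pearson's χ² statistic for a 2×2 table can be computed directly, as in the sketch below. It applies no continuity correction and relies on the closed-form tail probability for one degree of freedom; in practice the statistical packages named above would be used, and the function name here is an assumption.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table

                outcome+  outcome-
        group1     a         b
        group2     c         d

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value
```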

    Patient and public involvement

    There has been no patient involvement in the design, or conduct, or reporting, or dissemination plans of our research.

    Ethics and dissemination

    All participating sites will obtain approval from appropriate independent ethics committees or institutional review boards prior to the initiation of the study. The Research Ethics Committee of the Aristotle University of Thessaloniki and Scientific and Ethics Council of the AHEPA University Hospital have approved this study. This study will be conducted in keeping with the International Conference on Harmonisation Good Clinical Practice guidelines, the Declaration of Helsinki, the Data Protection Code of the European Data Protection Authority and the European General Data Protection Regulation, or any regulation that may replace it. To maintain patient confidentiality, no demographic and personal identification data will be collected (eg, first name, date of birth). Data deidentification will be performed by the lead researcher. Only deidentified data will be obtained, which will be completely disconnected from the personal data of each patient. Study findings will be published in peer-reviewed medical journals and presented at international conferences. Collaborations with study groups sharing the same research focus will be attempted.

    Acknowledgments

    We would like to acknowledge the following investigators: Data collection: Petrina Miltiadous, Eleni Aintinidou, Anastasia Chatzisavvidou, Giannis Papamichail, Giannis Giannou, Konstantinos Konstantinidis, Andromachi Keskinidou, Konstantinos Rizos, Christos Petridis, Isidoros Karamanidis, Nikolaos Mantzou, Christina Moutafi, Zeta Kousourna, Theocharis Spiritinoudis, Mavridou Christina, Ioannis Botis, Eirini Lazari, Evaggelia Mermigka, Christina-Angeliki Papadaki, Ntalaoutis Andreas, Androniki Papadopoulou, Ioannis Leventis, Stylianos Zervakis, Anthi Pleuritaki, Eirini Savva, Spiridon Kassotakis, Vaso Papacosta, Themistoklis Pateromichelakis, Panagiota Serafeim, Eleni Sidiropoulou, Despoina Ntiloudi, Nikolaos Papakonstantinou. Data analysis: Evangelos Logaras, Dimitris Dimitriadis, Parmenion Charistos.

    Footnotes

    • Twitter @ampekiaridou

    • Collaborators CardioMining Study Group AHEPA University Hospital of Thessaloniki (Greece): Athanasios Feidakis, Anastasios Kartas, Vasiliki Patsiou, Eirinaios Tsiartas, Antonios Orfanidis, Triantafyllia Grantza, Chrysanthi Ioanna Lampropoulou, Dimitrios Kostakakis, Olga Kazarli, Maria Ioannou, Maria Eirini Kiriakideli, Melina Kyriakou, Dimitra Kontopyrgou, Martha Zergioti, Eleftherios Gemousakakis, Amalia Baroutidou, Alexios Vagianos, Alexandros Liatsos, Konstantinos Barmpagiannos, George Tyrikos, George Konstantinou, Anthi Vasilopoulou, Marina Spaho, Eleni Manthou, Panagiotis Zymaris, Eleni Baliafa, Maria Baloka, Iasonas Dermitzakis, Vasiliki Anagnostopoulou, Chrysi Solovou, Anna Maria Louka, Aliki Iliadou, Ioanna Filimidou, Aspasia Kyriafini, Odysseas Kamzolas, Ioannis Vouloagkas, Despoina Nteli, Nikolaos Outountzidis, Athanasia Vathi, Anastasia Foka, Michael Botis, Anastasia Christodoulou, George Vogiatzis, Eleni Vrana, Maria Nteli, Stefanos Antoniadis, Foteini Charisi, Mairifylli Vamvaka, Dimitrios Triantis, Efi Delilampou, Vaggelis Axarloglou, Georgios Charistos, George Anagnostou, Sofia Christodoulou, Anastasios Papanastasiou, Eleni Tziona, Nikolaos Batis, Katerina Gakidi, Artemis Iosifidou, Andreanna Moura, Christos Alexandropoulos, Theoni Exintaveloni, Asterios Karakoutas, Damianos Porfyropoulos, Michail Bountas, Athanasios Pachoumis, Eleftherios Markidis, Maria Sitmalidou, Athanasia Pappa, Konstantinos C Theodoropoulos, George Rampidis, Apostolos Tzikas, Stylianos Paraskevaidis, Georgios Efthimiadis, Antonios Ziakas. University Hospital of Patras (Greece): Theofilatos Athinagoras, Christoforos Travlos, Nikolaos Vythoulkas-Biotis, Kassiani Maria Nastouli, Nikolaos Kartas, Angeliki Vakka, Theoni Theodoropoulou, Maria Bozika, Virginia Anagnostopoulou, Georgios Tsioulos. Hippokration General Hospital of Athens (Greece): Emilia Lazarou, Panagiotis Tsioufis, Ioannis Kachrimanidis, Nick Argyriou. University Hospital of Heraklion (Greece): Emmanouil Kampanieris, Alexandros Patrianakos, George Kochiadakis. Alexandra General Hospital of Athens (Greece): Ioannis Kanakakis. University Hospital of Alexandroupolis (Greece): Marios Vasileios Koutroulos, Georgios K Chalikias. Hippokrateion University Hospital of Thessaloniki (Greece): Sophia Alexiou, Athena Nasoufidou, Panagiotis Stachteas, Constantinos Bakogiannis. University Hospital of Larissa (Greece): Tsantikos Christos, Grigorios Giamouzis, John Skoularigis. Evangelismos General Hospital of Athens (Greece): Ioannis Alexanian. Papageorgiou General Hospital of Thessaloniki (Greece): Dimitrios Farmakis, Ioannis Styliadis. Hellenic Red Cross Hospital of Athens (Greece): George Fotos, Nikolaos Bourboulis. Tzaneio General Hospital of Piraeus (Greece): Despoina Ntiloudi, Evangelos Pisimisis. Hygeia Hospitals Group (Greece) & European Heart Agency, European Society of Cardiology (Belgium): Panos E Vardas. Aristotle University of Thessaloniki (Greece): Antonis Billis, Ilias Kyparissidis. University of Western Macedonia (Greece): Dimitrios Tsalikakis. Beuth University of Applied Sciences (Berlin, Germany): Jens-Michael Papaioannou, Alexander Löser.

    • Contributors Substantial contributions to the conception or design of the work (AS, GG); or the acquisition, analysis or interpretation of data for the work (AS, AB, ASP, DVM, GTso, PB, GTsi, GL, GK, NF, VV, IZ, DNT, KT, PD, GG, CardioMining Study Group); and drafting the work (AS, AB) or revising it critically for important intellectual content (ASP, DVM, GTso, PB, GTsi, GL, GK, NF, VV, IZ, DNT, KT, PD, GG, CardioMining Study Group); and final approval of the version to be published (AS, AB, ASP, DVM, GTso, PB, GTsi, GL, GK, NF, VV, IZ, DNT, KT, PD, GG, CardioMining Study Group); and agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved (AS, AB, ASP, DVM, GTso, PB, GTsi, GL, GK, NF, VV, IZ, DNT, KT, PD, GG, CardioMining Study Group).

    • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

    • Competing interests None declared.

    • Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

    • Provenance and peer review Not commissioned; externally peer reviewed.

    • © Author(s) (or their employer(s)) 2023. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ.

    • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.