SALMANTICOR study. Rationale and design of a population-based study to identify structural heart disease abnormalities: a spatial and machine learning analysis
  1. Jose Ignacio Melero-Alegria1,
  2. Manuel Cascon1,
  3. Alfonso Romero2,
  4. Pedro Pablo Vara1,
  5. Manuel Barreiro-Perez1,
  6. Victor Vicente-Palacios1,
  7. Fernando Perez-Escanilla3,
  8. Jesus Hernandez-Hernandez1,
  9. Beatriz Garde1,
  10. Sara Cascon4,
  11. Ana Martin-Garcia1,
  12. Elena Diaz-Pelaez1,
  13. Jose Maria de Dios5,
  14. Aitor Uribarri1,
  15. Javier Jimenez-Candil1,
  16. Ignacio Cruz-Gonzalez1,
  17. Baltasara Blazquez6,
  18. Jose Manuel Hernandez6,
  19. Clara Sanchez-Pablo1,
  20. Inmaculada Santolino7,
  21. Maria Concepcion Ledesma8,
  22. Paz Muriel2,
  23. P Ignacio Dorado-Diaz1,
  24. Pedro L Sanchez1
  1. 1 Department of Cardiology, Hospital Universitario de Salamanca, Instituto de Investigación Biomédica de Salamanca (IBSAL), Facultad de Medicina, Universidad de Salamanca, and CIBERCV, Salamanca, Spain
  2. 2 Miguel Armijo Primary Care Centre, Salamanca, Spain
  3. 3 San Juan Primary Care Centre, Salamanca, Spain
  4. 4 Robleda Primary Care Center, Salamanca, Spain
  5. 5 Salamanca Primary Care Centre Management, Salamanca, Spain
  6. 6 Miranda del Castañar Primary Care Centre, Salamanca, Spain
  7. 7 Santa Marta Primary Care Centre, Salamanca, Spain
  8. 8 Peñaranda de Bracamonte Primary Care Centre, Salamanca, Spain
  1. Correspondence to Dr Pedro L Sanchez; pedrolsanchez{at}secardiologia.es

Abstract

Introduction This study aims to obtain data on the prevalence and incidence of structural heart disease in a population setting, and to analyse and present those data using spatial and machine learning methods that, although well known in geography and statistics, have yet to be widely adopted in healthcare research and in securing the political commitment needed to obtain resources and support effective public health programme implementation.

Methods and analysis We will perform a cross-sectional survey of randomly selected residents of Salamanca (Spain). 2400 individuals stratified by age and sex and by place of residence (rural and urban) will be studied. The variables to analyse will be obtained from the clinical history, different surveys including social status, Mediterranean diet, functional capacity, ECG, echocardiogram, VASERA and biochemical as well as genetic analysis.

Ethics and dissemination The study has been approved by the ethical committee of the healthcare community. All study participants will sign an informed consent for participation in the study. The results of this study will allow the understanding of the relationship between the different influencing factors and their relative importance weights in the development of structural heart disease. For the first time, a detailed cardiovascular map showing the spatial distribution and a predictive machine learning system of different structural heart diseases and associated risk factors will be created and will be used as a regional policy to establish effective public health programmes to fight heart disease. At least 10 publications in the first-quartile scientific journals are planned.

Trial registration number NCT03429452.

  • structural heart disease
  • population
  • rural
  • urban
  • spatial analysis
  • machine learning

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

Strengths and limitations of this study

  • To obtain data on the prevalence and incidence of structural heart disease in the setting of a population-based study enrolling a total of 2400 individuals, stratified by age, sex and by place of residence (rural and urban), in a Spanish community.

  • To create a population-based established control group providing availability of normative reference values quantification for echocardiographic, ECG, VASERA, biochemical and genetic parameters.

  • To show the spatial distribution of the different patterns of structural heart disease through the spectrum of age and sex and between urban and rural residences.

  • To develop a predictive model of structural heart disease using cardiovascular heterogeneous data (including images and machine learning techniques).

  • To establish the study as the global observatory on cardiovascular health research and development of the regional healthcare government to support effective public health programme implementation.

Introduction 

Each year heart diseases cause almost 4 million deaths in Europe and the USA, that is, one out of four deaths.1 2 Although the number of deaths from heart disease has decreased, the burden of heart disease is increasing. In 2015, more than 85 million people in Europe were living with cardiovascular disease.2 The increases in the prevalence of classical cardiovascular risk factors, dietary factors, physical activity and probably other social factors make the largest contribution to the risk of heart disease. Overall, cardiovascular disease healthcare costs in the European Union and the USA have increased rapidly over the last 10 years; currently surpassing €200 billion a year.2 3

In this sense, public health delivery planning requires reliable information about contemporary population-level disease prevalence and incidence. Furthermore, community healthcare systems should obtain and provide their own data before implementing any health programme, as these regional systems are highly influenced by geographical diversity, the availability of resources and infrastructure, and the characteristics of healthcare systems and patterns of reimbursement.4 This is well illustrated by the care of myocardial infarction, where the exchange of accurate and timely information on programme effects between the healthcare community, decision-makers and the public has been essential.5–8

Policies need to consider both standardised rates, which describe disease prevalence and incidence independently of changes in population, and absolute numbers of patients affected, which describe the impact of the disease on the population, political commitment, resources and services of interest.4 9 Limited data exist on estimation of heart disease prevalence in a population setting. Previous studies have frequently been based on selected cohorts, which may not represent the general population.10–13 Other studies have restricted case identification to those made in general practice consultations or hospital admissions.14–16 However, it is only by considering presentations across the whole spectrum of structural heart disease that the full burden of the disease can be captured and an accurate distinction can be made between the incident and prevalent cases. Thus, contemporary population-based studies of heart disease prevalence and incidence are needed to inform resource planning and research prioritisation but current evidence is scarce.

Spatial analysis is a powerful tool for investigating population behaviour and relations, and consequently for determining future action plans or policies. Spatial methods are varied, ranging from descriptive spatial analysis to complex interpolation algorithms. Gaussian process (GP) procedures, such as cokriging, have distinct advantages over conventional spatial prediction techniques.17 They allow researchers to include measured spatial variability in the geostatistical estimation process, and they smooth predicted values based on the proportion of total sample variability accounted for by random noise. Furthermore, GP helps mitigate the effect of variable sample density caused by hot spots (some zones are usually oversampled). Hence, geostatistical techniques are suitable methods to apply to population studies.

Furthermore, the volume of quantitative and imaging data generated by population studies will be a key driver of future research and of how we provide care. In this sense, machine learning (ML), by training algorithms to recognise cardiac damage more accurately, avoid diagnostic errors and improve the early identification of disease, offers new approaches to leveraging the increasing volume of data available for analysis.18–21 Thus, we are convinced that ML can play a key role in population-based epidemiological studies that try to recognise patients’ disease vulnerability earlier.

The objectives of this study are: to obtain data on the prevalence and incidence of structural heart disease in a population setting; to show the spatial distribution of the different patterns of structural heart disease through the spectrum of age and sex and between urban and rural; to develop a predictive model of structural heart disease using cardiovascular heterogeneous data (including images and ML techniques); to generate new hypotheses which might contribute to healthcare research and to political commitment to obtain resources and support effective public health programme implementation.

In this article, we describe the design, data and imaging acquisition, analysis methods and quality assurance metrics for the SALMANTICOR study.

Methods

Study design and participants

The SALMANTICOR study is a cross-sectional descriptive population-based study of the prevalence of structural heart disease and its risk factors that will enrol a total of 2400 individuals, stratified by age, sex and place of residence (rural and urban), in a Spanish community: Salamanca. Structural heart disease refers to any of the following heart abnormalities: congenital heart disease, cardiomyopathies, valvular heart disease, ischaemic heart disease, pericardial diseases and rhythm or conduction disorders.

The province of Salamanca is located in western Spain, bordered to the west by Portugal. It has an area of 12 349 km² and had a population of 342 857 people in 2014: 167 459 (49%) male and 175 398 (51%) female citizens. It is divided into 362 municipalities; more than half are villages with fewer than 300 people. In fact, 227 878 (67%) people live in the 10 municipalities of more than 5000 individuals, which will be considered urban areas for future analysis, and 114 581 (33%) people live in the remaining municipalities, which will consequently be considered rural areas.

Spain’s, and consequently Salamanca’s, healthcare system is public and guarantees universal coverage. In total, 98.7% of the population is covered by the Spanish public healthcare system. In Salamanca, a total of 35 primary health centres throughout the province provide healthcare services to the overall population: 18 to the urban-considered municipalities and 17 to the rural-considered municipalities (figure 1).

Figure 1

Province of Salamanca map and distribution of the total of 35 primary health centres: 18 in urban-considered municipalities (blue) and 17 in rural-considered municipalities (red). Municipalities of more than 5000 individuals are considered as urban areas in the SALMANTICOR study.

Individuals aged ≥18 years included in the lists of all primary healthcare facilities of the province of Salamanca represented the reference population of 295 975 subjects: mean age 52.9±19.8 years; 52.4% females; 61.3% residing in urban areas. A sample size of 2400 subjects is calculated based on an expected prevalence of structural heart disease of 6% with a CI of 95% and a 1% precision. In order to obtain the necessary sample size, 35% more requests for participation will be made, to allow for location errors in the healthcare database and refusals to participate in the study. Thus, 3564 people will be randomly selected from the primary care lists.
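
The sample size reasoning above can be illustrated with the textbook normal-approximation formula for estimating a proportion. This is only a sketch under stated assumptions: the function name and the optional finite population correction are illustrative and are not taken from the protocol. The formula yields a somewhat smaller figure than the 2400 stated by the authors, whose exact calculation and safety margin are not described in the text.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(p, precision, confidence=0.95, population=None):
    """Sample size needed to estimate a proportion p within +/- precision.

    Hypothetical helper using the normal approximation; not the authors' code.
    """
    # Two-sided critical value of the standard normal distribution
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = z ** 2 * p * (1 - p) / precision ** 2
    if population is not None:
        # Finite population correction (an assumption; the text does not mention it)
        n = n / (1 + (n - 1) / population)
    return math.ceil(n)


# Values quoted in the text: 6% expected prevalence, 95% CI, 1% precision,
# reference population of 295 975 adults.
n_infinite = sample_size_for_proportion(0.06, 0.01)
n_finite = sample_size_for_proportion(0.06, 0.01, population=295_975)
print(n_infinite, n_finite)
```

Both results fall a little below 2200, so the planned 2400 participants would comfortably exceed what this simple formula requires.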

Cohort participants will undergo a baseline examination visit at these primary healthcare centres between 2015 and 2018. Surviving participants are expected to return for 5-year and 10-year follow-up visits. Institutional review committee approval was obtained and all participants will provide informed consent. The SALMANTICOR study is designed to provide echocardiographic parameters characterising cardiac structure and function in all individuals. SALMANTICOR participants will undergo surveillance for cardiovascular events, including heart failure, incident coronary heart disease and all-cause mortality.

Medical investigation process

Medical history, survey completion and examinations will be carried out at the subject’s primary care referral centre and will be analysed and interpreted centrally at the University Hospital of Salamanca. A complete medical history, physical examination and a check of survey completion will be performed by a cardiologist in a separate office, where examinations and blood sample extraction will be performed. Echocardiographic measures will be performed first. Participants’ blood pressure and VASERA measures will be taken within 30 min of starting the echocardiographic examination and after the subject has rested for 10 min. ECG will be performed after VASERA, and the visit will finish with the blood sample extraction.

Questionnaires

After obtaining written informed consent, trained interviewers will use a structured questionnaire to collect baseline data in face-to-face interviews at the time of physical examination. Self-reported diseases will be verified by individuals’ primary care doctors according to recognised international standards. The questionnaire will collect information on demographics and cardiovascular risk factors, cardiovascular and non-cardiovascular medical history, physical examination, medication, socioeconomic status, dietary habits as well as lifestyle and physical activity (table 1).

Table 1

Questionnaires

Echocardiographic assessment

A standardised echocardiography ultrasound examination, including M-mode, two dimensional (2D), spectral, colour flow and tissue Doppler, will be performed by a certified technical professional using a Philips CX-50 scanner with a standard 2.5–3.5 MHz phased-array probe. Image acquisition will be performed using a preprogrammed acquisition protocol (table 2), following the American and European Society of Echocardiography recommendations.22–24 All studies will be acquired and stored digitally on a local picture archiving and communication system and transferred from field primary care centres to a secure server at the Salamanca University Hospital on the same day via a dedicated virtual private network connection. Development of the imaging and analysis protocol, the field centre echocardiography manual of operations and the reading centre manual of operations, together with training of the field centre sonographer, took place from July 2015 to October 2015. SALMANTICOR visits began in November 2015 and continued until May 2018.

Table 2

Echocardiographic imaging protocol required views

For patients in sinus rhythm, >3 full cardiac cycles will be recorded for each view, with recording beginning once the view is optimised. For subjects in atrial fibrillation, >5 s acquisitions per view will be recorded. Sonographers are instructed to continuously optimise both imaging depth and sector width to maintain a frame rate of 50–80 frames per second. Sonographers are also instructed to adjust 2D gain and compression, when necessary, to optimally demonstrate left ventricle endocardial borders. The colour Doppler Nyquist limit will be set at 64 cm/s. Colour Doppler gain will be set just below the level at which random background noise will be seen. Sonographers will optimally align spectral Doppler parallel to the direction of the blood flow of interest. Sonographers will optimise the baseline shift and velocity range so that the spectral envelope will occupy approximately three-fourths of the display. All spectral Doppler acquisitions will be performed with a sweep speed between 75 and 100 cm/s, and a sample volume length of 3 mm for pulsed-wave Doppler. The tissue Doppler sample volume will be placed at the level of an annulus (mitral and tricuspid) and the baseline shift and velocity range will be optimised. All tissue Doppler acquisitions will be performed with similar acquisitions of spectral Doppler with a filter setting of 100 Hz.

Echocardiograms will be obtained at the subject’s primary care referral centre and sonographers will not perform any measurements on the images obtained because all measurements will be analysed and interpreted centrally at the University Hospital of Salamanca. All SALMANTICOR echocardiograms will be read by a certified cardiologist and over-read by a board-certified cardiologist with expertise in echocardiography variables assessment (table 3). Over-reads of echocardiograms will be performed to confirm the accuracy of key quantitative measurements and to identify clinically important findings. Inter and intrareader reproducibility was assessed before initiating the trial. For inter-reader reproducibility, intraclass correlation values ranged from 0.85 to 0.99 with left atrial volume and left ventricular end-diastolic volumes having the highest intraclass correlation values (0.97–0.99). Intraclass correlation values were slightly better from intrareader assessments for all measures.

Table 3

Echocardiographic parameters

Vascular function assessment

Cardio-Ankle Vascular Index (CAVI), brachial-ankle pulse wave velocity (baPWV) and Ankle-Brachial Index (ABI) will be estimated using the VaSera VS-1500 device (Fukuda Denshi) as described by our group.25 The baPWV will be calculated, as well as CAVI, which provides a more accurate estimation of the degree of atherosclerosis. CAVI reflects arterial stiffness from the aorta to the ankle, derived from pulse wave velocity by an oscillometric method; it is used as a good measure of vascular stiffness and does not depend on blood pressure.26 CAVI values will be automatically calculated by substituting the stiffness parameters in the following equation, where ρ is the blood density, Ps and Pd are systolic and diastolic blood pressure in mm Hg, respectively, ΔP is Ps − Pd, PWV is the pulse wave velocity measured between the aortic valve and the ankle, and a and b are scaling constants.

$$\mathrm{CAVI} = a\left[\frac{2\rho}{\Delta P} \times \ln\!\left(\frac{P_s}{P_d}\right) \times \mathrm{PWV}^2\right] + b$$

The average coefficient of variation of CAVI is <5%, which is small enough for clinical use and confirms that CAVI has favourable reproducibility.27 28 CAVI and ABI will be measured in the resting position. baPWV is estimated using the following equation, where height is the participant’s height in cm and tba is the time interval between the arm and ankle waves.

$$\mathrm{baPWV} = \frac{0.5934 \times \mathrm{height\ (cm)} + 14.4014}{t_{ba}}$$

For the study, the lowest ABI and the highest CAVI and baPWV obtained will be considered. CAVI is classified as normal (CAVI <8), borderline (8≤CAVI<9) and abnormal (CAVI ≥9). Abnormal CAVI represents subclinical atherosclerosis, and baPWV ≥17.5 m/s is considered abnormal.29 30 ABI ≤0.9 is considered abnormal.
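
The selection and classification rules just described can be sketched in a few lines of Python. The data layout (one dictionary of readings per body side) and the function names are assumptions made for illustration; the cut-offs are the ones quoted in the text, and the example readings are made up.

```python
def classify_cavi(cavi):
    """Classify a CAVI value with the cut-offs quoted in the text."""
    if cavi < 8:
        return "normal"
    if cavi < 9:
        return "borderline"
    return "abnormal"


def summarise_vascular(sides):
    """Summarise vascular readings; `sides` is a list of dicts (hypothetical layout)."""
    # The text keeps the highest CAVI and baPWV and the lowest ABI obtained.
    cavi = max(s["cavi"] for s in sides)
    bapwv = max(s["bapwv"] for s in sides)
    abi = min(s["abi"] for s in sides)
    return {
        "cavi": cavi,
        "cavi_class": classify_cavi(cavi),
        "bapwv": bapwv,
        "bapwv_abnormal": bapwv >= 17.5,  # same units as the cut-off in the text
        "abi": abi,
        "abi_abnormal": abi <= 0.9,
    }


example = summarise_vascular([
    {"cavi": 7.6, "bapwv": 13.2, "abi": 1.05},  # right side (made-up values)
    {"cavi": 8.4, "bapwv": 17.9, "abi": 0.88},  # left side (made-up values)
])
print(example)
```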

ECG examination

ECG examination will be performed using a General Electric MAC 3500 ECG System (Niskayuna, New York, USA), which automatically measures wave voltage and duration. All ECGs will be recorded by the same nurse, trained in carefully standardised procedures for ECG acquisition. Standard 12-lead ECGs will be obtained from all patients at a paper speed of 25 mm/s, an amplitude of 10 mm/1 mV and a filter range of 0.04–40 Hz. ECG tracings will be interpreted, as in the echocardiographic protocol, by an independent cardiologist and over-read by a board-certified cardiologist with expertise in ECG at the University Hospital of Salamanca. ECG measurements and interpretations will follow standard methods31 32 (table 4).

Table 4

12-lead ECG parameters

Laboratory test

Venous blood sampling will be performed at the end of the examination after participants have fasted and abstained from smoking, consumption of alcohol and caffeinated beverages for 12 hours, following the protocol used in our hospital for other multidisciplinary projects.25 A total of 20 mL of venous blood will be drawn for research testing. Blood will be drawn as follows: ethylenediaminetetraacetic acid (EDTA) 10 mL and serum 10 mL. Aliquots of plasma (3×2 mL), serum (4×2 mL) and white cell pellet (3×2 mL) will be stored in freezers (−80°C) until the analysis. All biomaterial (serum, plasma and white blood cells) will be stored in the Instituto de Investigación Biomédica de Salamanca biobank. Referral for biobanking is carried out through a specific electronic database. Biochemical tests include N-terminal pro-brain natriuretic peptide (NT-proBNP), troponin, haemoglobin, blood cell count, thrombocytes, ferritin and iron, transferrin and iron saturation, potassium, sodium and creatinine, glycated haemoglobin, plasma glucose, aspartate aminotransferase, alanine aminotransferase, total cholesterol, triglycerides, high-density lipoprotein (HDL) and low-density lipoprotein (LDL), uric acid, high-sensitive C reactive protein, thyroid-stimulating hormone. Further, biomarkers indicative of different pathophysiological mechanisms relevant to heart disease will be analysed. A white cell pellet will be used for genotyping.

Results and outcomes

After the clinical history is taken and the echocardiogram and ECG are interpreted, a clinical report will be sent to the patient and to the primary care doctor. Individuals needing further evaluation will be referred to the cardiology department through a standardised preferential referral protocol.

Individuals will be contacted at 5-year intervals to ascertain their clinical status and to repeat the baseline evaluations described. Clinical outcomes will include major adverse cardiac events (MACE), commencing dialysis and first hospitalisation.

Statistical analysis

Causal and multivariate inference

Data input will be stored in a database designed for the project. Normal distribution of variables will be verified using the Kolmogorov-Smirnov test. Quantitative variables will be displayed as mean±SD if normally distributed or as the median (IQR) if asymmetrically distributed and qualitative variables will be expressed as frequencies. Analysis of the difference of means between variables of two categories will be carried out using a Student’s t-test or a Mann-Whitney U test, as appropriate, while qualitative variables will be analysed using a χ² test. To analyse the relationship between qualitative variables of more than two categories and quantitative variables, an analysis of variance and the least significant difference test will be used in the post hoc tests. The relationship of quantitative variables to each other will be tested using Pearson’s or Spearman’s correlation as appropriate. Analysis of covariance will be performed to adjust the variables that can affect the results as confounders. A multivariate analysis of variance will be performed in cases with more than one dependent variable to identify whether changes in the independent variables have significant effects on the dependent variables. The association between the variables studied will be performed by multiple linear regression. Data will be analysed using the SPSS V.23.0 statistical package (SPSS). A p<0.05 will be considered as statistically significant.
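
As an illustration of the two-group comparison logic described above, the following sketch chooses between a Student’s t-test and a Mann-Whitney U test after a Kolmogorov-Smirnov normality check. The protocol specifies SPSS; Python with SciPy is swapped in here purely for illustration, and the function name, variable names and simulated data are hypothetical.

```python
import numpy as np
from scipy import stats


def compare_two_groups(a, b, alpha=0.05):
    """Pick a parametric or non-parametric two-sample test (illustrative helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)

    def looks_normal(x):
        # KS test against a normal with the sample's own mean and SD.
        # Estimating the parameters from the data makes this check conservative.
        result = stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1)))
        return result.pvalue > alpha

    if looks_normal(a) and looks_normal(b):
        name, result = "Student t-test", stats.ttest_ind(a, b)
    else:
        name, result = "Mann-Whitney U", stats.mannwhitneyu(a, b, alternative="two-sided")
    return name, float(result.pvalue)


# Simulated systolic blood pressure for two residence groups (made-up data)
rng = np.random.default_rng(0)
urban = rng.normal(130, 15, size=200)
rural = rng.normal(134, 15, size=200)
test_name, p_value = compare_two_groups(urban, rural)
print(test_name, p_value)
```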

Spatial analysis

Additionally, this research aims to gain a spatial understanding of structural heart disease abnormalities in the province of Salamanca. This demanding task will be carried out by applying statistical procedures such as multiple factor analysis (MFA) and cokriging.

MFA is an extension of principal component analysis (PCA) tailored to handle distinct variables (quantitative, categorical or frequency) and different data tables collected on the same observations.33 MFA is put into practice depending on the data tables and the variables types: in the case of quantitative variables a PCA is applied; multiple correspondence analysis (MCA) is applied in case of categorical variables34 and correspondence analysis (CA) for frequency variables.35 Cokriging is a multivariate geostatistical procedure used for interpolation purposes.36 This method is a generalisation of a multivariate linear-weighted regression model, where weights depend on distance, direction and orientation of the neighbouring data to the unsampled location.
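
To make the MFA idea concrete, here is a minimal NumPy sketch of a common formulation for quantitative tables only: each table is standardised, divided by its first singular value so that no single table dominates, and the concatenated tables are then analysed with a global PCA. The study itself plans to use the R package FactoMineR; this Python version, its function name, the simulated tables and the municipality labels are all assumptions for illustration, and categorical or frequency tables are not handled.

```python
import numpy as np


def mfa_quantitative(tables, n_components=2):
    """MFA sketch for quantitative tables measured on the same participants."""
    weighted = []
    for table in tables:
        X = np.asarray(table, dtype=float)
        Z = (X - X.mean(axis=0)) / X.std(axis=0)  # standardise each column
        first_sv = np.linalg.svd(Z, compute_uv=False)[0]
        weighted.append(Z / first_sv)  # balance the influence of each table
    G = np.hstack(weighted)
    # Global PCA of the concatenated, weighted tables via SVD
    U, s, _ = np.linalg.svd(G, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = (s ** 2 / np.sum(s ** 2))[:n_components]
    return scores, explained


rng = np.random.default_rng(0)
echo = rng.normal(size=(50, 4))  # simulated echocardiographic variables
labs = rng.normal(size=(50, 6))  # simulated laboratory variables
scores, explained = mfa_quantitative([echo, labs])

# Average participant scores per (hypothetical) municipality, the kind of
# area-level value that could then be passed to a spatial interpolation step.
municipality = rng.integers(0, 5, size=50)
mean_scores = {int(m): scores[municipality == m].mean(axis=0)
               for m in np.unique(municipality)}
print(scores.shape, explained)
```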

In the SALMANTICOR study, we will further combine MFA and cokriging. In our case, we have two levels of observation: participants and municipalities. Municipalities contain participants; therefore, to extend our investigation to a spatial analysis, we need to assign the resulting MFA projections to their corresponding municipality areas and then apply a cokriging analysis to the unsampled municipalities (figure 2) (online supplementary data). This combination will provide a spatial understanding of the Salamanca population and will cover the whole analysis. However, if we want to focus on a specific questionnaire, we could skip the MFA, look at the results obtained from the MCA, PCA or CA and then apply a cokriging analysis. In addition, a particular item from a questionnaire could be analysed in the same way. To summarise, we have a versatile methodology that allows us to study both specific aspects and the wider picture of our study.

Figure 2

The left panel represents the spatial analysis pipeline that SALMANTICOR will use for map plotting purposes. We will combine multiple factor analysis and cokriging. We will survey and analyse participants by municipality and questionnaire. Initially, principal component analysis (PCA) is applied to quantitative variables, multiple correspondence analysis (MCA) to categorical variables and correspondence analysis (CA) to frequency variables. We will then assemble the normalised data in a single table that is analysed via PCA, and describe the spatial behaviour of our samples with crossvariograms (crossvariog). We will then fit a linear model of coregionalisation (LMC) and finally interpolate the results over the municipalities of the province of Salamanca using cokriging. The maps in the right panel are examples of how we will represent the municipal distribution (Salamanca is divided into 362 municipalities) of structural heart disease and dyslipidaemia prevalence.

The R packages FactoMineR and gstat will be used to apply MFA and cokriging, respectively.37 38 Additional code will be shared in a public GitHub repository.

Machine learning

The SALMANTICOR study will also be analysed following the ML pipeline represented in figure 3. Our first step will consist of developing scalable methods for ML optimisation, with the aim of producing a first approach to the predictive structural heart disease model. Our ML model will start by ingesting raw data and leveraging data processing techniques to wrangle, process and engineer meaningful features and attributes from these data (feature engineering). The derived features are attributes or properties shared by all the independent units on which analysis or prediction is to be done. In our case, clinical variables and variables quantified from imaging data will be chosen. Features will be combined with scalable ML algorithms, including deep learning and automatic feature extraction, in order to develop the model (fit model). The model’s basic behaviour and functionalities will be tested to develop a robust and reliable model (training model). We will train, validate and improve the ML model in a trial-and-error process until model performance is satisfactory (validation). The SALMANTICOR study sample will be randomly divided into a training dataset (70% of the sample) and a validation dataset (30% of the sample), following previously published ML models.39 We will use the training dataset to fit our ML model and the validation dataset to evaluate our results. This process will be repeated multiple times to obtain a robust fit and guard against overfitting. We will build our predictor models using: random forest, gradient boosting, logistic regression, K-nearest neighbours, support vector machine, linear discriminant analysis and naive Bayesian network models (online supplementary data).
Our ML pipeline set-up will compare the performance of each algorithm on the dataset using a set of carefully selected evaluation criteria (ie, classification accuracy, logarithmic loss, confusion matrix, area under curve, F1 score, mean absolute error, mean squared error) and the categorisation of the specific cardiac problem.
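
The repeated 70/30 split and the comparison of algorithms described above could look roughly like the following scikit-learn sketch. It runs on synthetic data, not study data. Several choices are assumptions made for illustration rather than details from the protocol: the stratified split, the feature scaling, the use of `GaussianNB` as a stand-in for the naive Bayesian model, the number of repeats, and area under the curve as the single reported metric.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the study data: 12 features, roughly 10% positive cases
X, y = make_classification(n_samples=600, n_features=12, n_informative=5,
                           weights=[0.9, 0.1], random_state=0)

models = {
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "k-nearest neighbours": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "support vector machine": make_pipeline(StandardScaler(), SVC(probability=True, random_state=0)),
    "linear discriminant analysis": LinearDiscriminantAnalysis(),
    "naive Bayes": GaussianNB(),
}


def evaluate(models, X, y, n_repeats=3):
    """Mean validation AUC per model over repeated random 70/30 splits."""
    aucs = {name: [] for name in models}
    for seed in range(n_repeats):
        X_train, X_val, y_train, y_val = train_test_split(
            X, y, test_size=0.30, stratify=y, random_state=seed)
        for name, model in models.items():
            model.fit(X_train, y_train)  # refitting discards the previous fit
            predicted = model.predict_proba(X_val)[:, 1]
            aucs[name].append(roc_auc_score(y_val, predicted))
    return {name: float(np.mean(values)) for name, values in aucs.items()}


results = evaluate(models, X, y)
for name, auc in sorted(results.items(), key=lambda item: -item[1]):
    print(f"{name}: mean AUC {auc:.3f}")
```

Other criteria listed in the text, such as logarithmic loss or the F1 score, could be added inside the same loop with the corresponding functions from `sklearn.metrics`.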

Figure 3

Machine learning (ML) pipeline for the SALMANTICOR study. The learning algorithm will take heterogeneous data that will be preprocessed to create input data for the ML algorithm. Furthermore, raw images will also be used in the ML algorithm using neural network modelling. The output of the ML algorithm will also need to be processed and improved until a satisfactory model is developed.

For the realisation of these ML models, we will use free software (Python) and free, open-source ML libraries such as scikit-learn.40

Quality control

Different processes will be carried out to guarantee study data quality and thus maximise the validity and reliability of measurements of the results. To this effect, field work operation manuals have been prepared. These documents specify the adequate procedure for performing each test. All of these actions will confirm adequate performance of each procedure. Monthly meetings will be held with the principal investigator of the study to analyse the entire process, and an annual report on study progress will be prepared.

Ethical review board and dissemination plan

Participants will be required to sign an informed consent form prior to participation in the study, in accordance with the Declaration of Helsinki and WHO standards for observational studies. Participants will be informed of the objectives of the project and of the risks and benefits of the examinations made. None of the examinations poses life-threatening risks for the type of participants to be included in the study. The study includes the obtaining of biological samples (including genetics analysis); the study participants will, therefore, be informed in detail. The confidentiality of the recruited participants will be ensured at all times in accordance with the provisions of current legislation on personal data protection (15/1999 of December 13), and the conditions contemplated by Act 14/2007 on biomedical research.

We will use a variety of methods to ensure that our work achieves maximum visibility. Publication of our study protocol is an important first step in this direction. In this paper, we have sought to offer a comprehensive overview of the relevant literature, while underlining the current research gaps that necessitated the design and implementation of the SALMANTICOR study. Similarly, the study results, given their applicability and implications for the general population, will be disseminated in research meetings and in at least 10 articles published in scientific journals. Finally, population-based control groups are difficult to obtain, specifically in case–control cardiovascular studies where structural heart disease has to be ruled out. The SALMANTICOR study will provide normative reference values for echocardiographic, ECG, biochemical, genetic, VASERA and other parameters. Thus, international cooperation through data sharing and participation in Horizon 2020 programmes with the SALMANTICOR population is contemplated.

Patient and public involvement

Patients’ representatives will have an increasingly present voice in the SALMANTICOR study. There is currently only one patient organisation for heart disease in the province of Salamanca, ‘El paciente experto’. This organisation has provided counselling in the design of the study, will jointly interpret the results of the study with the investigators of SALMANTICOR, will help to disseminate them to society and will be involved when establishing new policies for health improvement and education empowerment with the administration to halt the epidemic of cardiovascular disease.

A clinical report will be sent to all participants and their primary care doctors immediately after the clinical history has been taken and the echocardiogram and ECG interpreted. Finally, the overall and most important observations from the SALMANTICOR study will be sent by letter to all participants and to all doctors, both primary care and specialists, of the province of Salamanca through the Medical College of Salamanca and our health administration.

Data statement

Our data will be accessible at the Institute of Research of the University Hospital of Salamanca. Furthermore, our dataset will be published in a public repository. Additional code for our spatial analysis will be shared in a public GitHub repository.

Discussion

A major strength of the SALMANTICOR study is the selection of a representative population-based cohort across primary care, with a likely significant number of structural heart disease cases in each age, sex and place of residence category to allow overall and subpopulation analyses. This population-based approach increases the generalisability of the findings compared with surveys that addressed cardiovascular risk factors but never included an echocardiographic assessment.11 14 41–44 Moreover, in view of the similarity of trends in cardiovascular disease and population ageing between Spain and other developed countries,45 our findings are likely to be broadly applicable to them.

Echocardiography in the SALMANTICOR study is designed to address three specific aims. The first one is to characterise the abnormalities of cardiac structure and function in a community-based sample and to assess how these abnormalities vary by place of residence (rural or urban), age and sex. The study uses standard and novel echocardiographic techniques to characterise five specific domains of cardiac structure. These data will be used to define the population distribution of these measurements and to determine their relationship with the cardiovascular risk factors, including hypertension, diabetes mellitus, coronary disease, renal insufficiency and prognostically relevant biomarkers such as NT-proBNP and high-sensitivity troponin.

The second aim is to investigate ventricular–arterial coupling in addition to the association of cardiac structure and function with arterial stiffness assessed by CAVI, baPWV and ABI.
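
For orientation, the indices named in this aim are conventionally defined as shown below. These are standard textbook definitions and are an assumption here, not formulas taken from the SALMANTICOR protocol, which does not state the exact formulation used; the symbols are notation introduced for this sketch, and CAVI is omitted because its computation is device specific.

```latex
\mathrm{ABI} = \frac{P_{\mathrm{ankle}}}{P_{\mathrm{brachial}}},
\qquad
\mathrm{baPWV} = \frac{L_{ba}}{\Delta t_{ba}},
\qquad
\text{ventricular--arterial coupling} = \frac{E_a}{E_{es}},
\quad
E_a = \frac{P_{es}}{SV},
\quad
E_{es} = \frac{P_{es}}{V_{es} - V_0}
```

Here $P_{\mathrm{ankle}}$ and $P_{\mathrm{brachial}}$ are systolic pressures at the ankle and arm, $L_{ba}$ is the estimated arterial path length between the brachial and ankle recording sites, $\Delta t_{ba}$ is the pulse transit time between them, $P_{es}$ is end-systolic pressure, $SV$ is stroke volume, $V_{es}$ is end-systolic volume and $V_0$ is the volume-axis intercept of the end-systolic pressure–volume relation.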

The third aim is to prospectively examine the extent to which these non-invasive measures are associated with the incidence of adverse cardiovascular outcomes and to determine the degree to which these associations also vary by age, sex and place of residence (rural or urban). By accomplishing these objectives, this study is developing an echocardiographic imaging database that will facilitate future investigations, both to compare these echocardiographic measures with studies previously performed in other countries,12 13 and to serve as a well-established control group. Furthermore, our study will provide normative reference values for ECG, biochemical, genetic, VASERA and other parameters.

Adequate public health and service delivery planning requires reliable information about contemporary population-level disease incidence. Salud Castilla y León (SACYL) is the regional healthcare government authority of Castilla y León, providing universal access to health services for 2.5 million people. SACYL is closely integrated with other public services and policies as part of a holistic approach to improving population health. In this sense, our study data will be used to understand the cardiovascular health needs of our community, to improve people’s health and well-being and to establish how these can be developed. SALMANTICOR will be established as the global observatory on cardiovascular health research and development of SACYL, since we will include real-time data about the burden of cardiovascular disease, people’s social circumstances and living conditions, lifestyles and diet, economic factors, access to healthcare and other services, as well as genes, age and sex. Beyond providing an overall picture of our population’s health, the data will be disaggregated to identify inequalities, for example by gender, sex and urban or rural place of residence. This will support the prioritisation of interventions depending on the needs of different groups and will require effective actions for the prediction and prevention of cardiovascular disease, from macropolicies down to individuals and families, empowering people to take control of their health. In this sense, two new medical technology research lines have been identified by the SALMANTICOR investigators: exploring the use of spatial methods and exploring modern computational methods developed in the field of ML.

The use of spatial methods in healthcare research enables disease distribution patterns to be identified and has become popular in the field of public health.46–48 Cancer and other disease mortality atlases have shown that many risk factors of a territorial nature influence geographical patterns, making it possible to select disease indicators and so reveal their geographical structure.49 50 However, the number of spatial analyses published in major epidemiology journals is still very low.51 One of the reasons is that the application of spatial methods requires specific training, which has led to their replacement by less optimal methods in healthcare research. Therefore, it is important to promote spatial methods, especially those which are simple to interpret in the field of population-based studies and which could potentially be used in combination with other computational methods to facilitate interpretation, prediction and healthcare policies. In cardiology, spatial analysis has been applied mainly to optimisation problems and prevalence prediction. As an example of optimisation, travel-time isochrone analysis has been applied to different facilities in order to identify exposed areas and act accordingly.52 Nevertheless, prevalence prediction is the most common geostatistical technique in healthcare, and cardiology is no exception.53 54
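
As a simple illustration of how a spatial method can reveal geographical structure in a disease indicator, the sketch below computes global Moran's I, a widely used measure of spatial autocorrelation. It is an illustrative choice, not a method stated in the SALMANTICOR protocol; the function names, the line-shaped toy neighbourhood layout and the area-level values are all hypothetical.

```python
from typing import List, Sequence


def morans_i(values: Sequence[float], weights: Sequence[Sequence[float]]) -> float:
    """Global Moran's I for area-level values and an n x n spatial weights matrix."""
    n = len(values)
    if n == 0 or len(weights) != n or any(len(row) != n for row in weights):
        raise ValueError("weights must be an n x n matrix matching values")
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    variance_sum = sum(d * d for d in dev)
    if total_weight == 0 or variance_sum == 0:
        raise ValueError("Moran's I is undefined for zero weights or constant values")
    # Weighted cross-products of deviations between every pair of areas.
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / total_weight) * (cross / variance_sum)


def chain_weights(n: int) -> List[List[float]]:
    """Hypothetical layout: n areas in a line, each a neighbour of the adjacent ones."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    w = chain_weights(4)
    # Hypothetical indicator values per area (not study data).
    print(morans_i([1, 2, 3, 4], w))  # smooth gradient: positive autocorrelation
    print(morans_i([1, 0, 1, 0], w))  # alternating pattern: negative autocorrelation
```

Values near +1 indicate that neighbouring areas have similar indicator levels, values near -1 indicate a checkerboard-like pattern, and values near 0 suggest no spatial structure. In a real analysis the weights would come from the actual adjacency or distance between areas.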

The incorporation of ML in medicine holds promise for substantially improved healthcare delivery.18–21 ML provides methods, techniques and tools that can help solve diagnostic and prognostic problems in a variety of cardiac medical domains.55–63 Furthermore, ML offers new approaches to leveraging the growing volume of heterogeneous data, including imaging data, available for analyses. To date, ML has been used in two broad and highly interconnected areas: automation of tasks that might otherwise be performed by a human and generation of clinically important knowledge. Moreover, it is argued that the successful implementation of ML methods can help the integration of computer-based systems in the healthcare environment, providing opportunities to improve the efficiency of medical care and to be used as a regional policy to establish effective public health programmes. In this sense, the SALMANTICOR study represents an excellent opportunity to explore ML algorithms for estimating and ranking the impact of environmental and classical risk factors in the development of structural heart disease in a population-based setting.
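
One generic way to rank the impact of risk factors with an ML model is permutation importance: each feature is shuffled in turn and the resulting drop in model performance is recorded. The sketch below is a minimal illustration under stated assumptions; the source does not specify which algorithm will be used, and the toy model, feature layout and data here are entirely hypothetical.

```python
import random
from typing import Callable, List

Row = List[float]


def accuracy(predict: Callable[[Row], int], rows: List[Row], labels: List[int]) -> float:
    """Fraction of rows for which the model prediction matches the label."""
    hits = sum(1 for row, label in zip(rows, labels) if predict(row) == label)
    return hits / len(labels)


def permutation_importance(
    predict: Callable[[Row], int],
    rows: List[Row],
    labels: List[int],
    n_repeats: int = 20,
    seed: int = 0,
) -> List[float]:
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for col in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in rows]
            rng.shuffle(shuffled)
            # Rebuild the rows with only this one column permuted.
            permuted = [row[:col] + [v] + row[col + 1:] for row, v in zip(rows, shuffled)]
            drops.append(baseline - accuracy(predict, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical toy data: feature 0 determines the outcome, feature 1 is unrelated.
ROWS: List[Row] = [[float(i), float((i * 7) % 20)] for i in range(20)]
LABELS: List[int] = [1 if row[0] >= 10 else 0 for row in ROWS]


def toy_model(row: Row) -> int:
    """Stand-in for a fitted classifier; it uses only feature 0."""
    return 1 if row[0] >= 10 else 0


if __name__ == "__main__":
    scores = permutation_importance(toy_model, ROWS, LABELS)
    ranking = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
    print(scores, ranking)
```

Because the stand-in model ignores feature 1, shuffling it cannot change any prediction and its importance is exactly zero, whereas shuffling feature 0 degrades accuracy. With a real fitted model the same procedure would be applied to held-out data.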

Acknowledgments

We thank all primary care physicians and personnel helping with the development of the study. We thank Philips Ibérica and Obra Social ‘La Caixa’ for their support. We especially thank the participants in the study and apologise for any inconvenience we may have caused. We acknowledge the involvement of the Salamanca patient organisation ‘El paciente experto’ in providing counselling to SALMANTICOR and in further promoting the dissemination of the results to society and to the regional government.


Footnotes

  • Patient consent for publication Not required.

  • Contributors JIM-A: data acquisition, surveys completion, physical, electrocardiographic and VASERA examinations, design of the work, drafting the work and revising it critically, final approval of the version to be published. MC: data acquisition, surveys completion, conception and design of the work, drafting the work and revising it critically, final approval of the version to be published. AR: conception and design of the work, interpretation of data, drafting the work or revising it critically, primary care coordination, final approval of the version to be published. PPV: echocardiographic data acquisition, interpretation of data, final approval of the version to be published. MB-P: conception and design of the echocardiographic protocol, analysis and interpretation of echocardiographic data, drafting the work and revising it critically for important intellectual content, final approval of the version to be published. VV-P: conception and design of the spatial and machine learning analysis, analysis and interpretation of data, drafting the work and revising it critically for important intellectual content, final approval of the version to be published. FP-E: conception and design of the work, interpretation of data, primary care coordination, final approval of the version to be published. JH-H: conception and design of the electrocardiographic protocol, analysis and interpretation of ECG data, drafting the work and revising it critically for important intellectual content, final approval of the version to be published. BG: conception and design of the lifestyle, Mediterranean and exercise surveys, analysis and interpretation of data, final approval of the version to be published. SC: conception and design of the work, coordinator of 5 out of 35 primary care centres, acquisition of data, final approval of the version to be published. AM-G: analysis and interpretation of echocardiographic data, final approval of the version to be published.
ED-P: analysis and interpretation of echocardiographic data, final approval of the version to be published. JMdD: conception and design of the work, coordinator of 5 out of 35 primary care centres, acquisition of data, final approval of the version to be published. AU: conception and design of the work (surveys), analysis and interpretation of data, final approval of the version to be published. JJ-C: conception and design of the work, analysis and interpretation of ECG data, final approval of the version to be published. IC-G: conception and design of the work (surveys), analysis and interpretation of data, final approval of the version to be published. BB: conception and design of the work, coordinator of 5 out of 35 primary care centres, acquisition of data, final approval of the version to be published. JMH: conception and design of the work, coordinator of 5 out of 35 primary care centres, acquisition of data, final approval of the version to be published. CS-P: data acquisition, surveys completion, physical, electrocardiographic and VASERA examinations, final approval of the version to be published. IS: conception and design of the work, coordinator of 5 out of 35 primary care centres, acquisition of data, final approval of the version to be published. MCL: conception and design of the work, coordinator of 5 out of 35 primary care centres, acquisition of data, final approval of the version to be published. PM: conception and design of the work, coordinator of 5 out of 35 primary care centres, acquisition of data, final approval of the version to be published. PID-D: conception and design of the spatial and machine learning analysis, analysis and interpretation of data, drafting the work and revising it critically for important intellectual content, final approval of the version to be published. 
PLS: conception and design of the study, interpretation of data, drafting the work, agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

  • Funding This study was supported by national (PI14/00695, Institute of Health Carlos III, Spanish Ministry of Economy and Competitiveness) and community (GRS1030/A/14, SACYL, Junta Castilla y León) competitive grants and by the Spanish Cardiovascular Network (RIC and CIBERCV) from the Institute of Health Carlos III, Spanish Ministry of Economy and Competitiveness, Obra Social ‘la Caixa’ and Philips Ibérica Healthcare division.

  • Competing interests None declared.

  • Ethics approval The study was approved by the clinical research ethics committee (CEIC) of the Salamanca Health Area on 29 September 2014.

  • Provenance and peer review Not commissioned; externally peer reviewed.
