Catalonia Suicide Risk Code Epidemiology (CSRC-Epi) study: protocol for a population-representative nested case–control study of suicide attempts in Catalonia, Spain
  1. Philippe Mortier1,2,
  2. Gemma Vilagut1,2,
  3. Beatriz Puértolas Gracia1,2,
  4. Ana De Inés Trujillo1,3,
  5. Itxaso Alayo Bueno1,2,
  6. Laura Ballester Coma1,2,4,
  7. María Jesús Blasco Cubedo1,2,5,
  8. Narcís Cardoner6,7,8,9,
  9. Cristina Colls10,
  10. Matilde Elices8,11,
  11. Anna Garcia-Altes2,10,12,
  12. Manel Gené Badia13,
  13. Javier Gómez Sánchez1,
  14. Mario Martín Sánchez14,
  15. Rosa Morros15,16,17,
  16. Bibiana Prat Pubill18,
  17. Ping Qin19,
  18. Lars Mehlum19,
  19. Ronald C Kessler20,
  20. Diego Palao6,7,8,9,
  21. Víctor Pérez Sola7,8,11,21,
  22. Jordi Alonso1,2,5
  23. On behalf of the CODIRISC Epidemiology Study Group
  1. 1Health Services Research Group, IMIM (Hospital del Mar Medical Research Institute), Barcelona, Spain
  2. 2CIBER Epidemiología y Salud Pública (CIBERESP), Madrid, Spain
  3. 3Department of Social Psychology, Autonomous University of Barcelona (UAB), Cerdanyola del Vallès, Barcelona, Spain
  4. 4Department of Psychology, University of Girona (UdG), Girona, Spain
  5. 5Department of Health & Experimental Sciences, Pompeu Fabra University (UPF), Barcelona, Spain
  6. 6Depression and Anxiety Program, Department of Mental Health, Parc Taulí Sabadell, Hospital Universitari, Sabadell, Spain
  7. 7Department of Psychiatry and Legal Medicine, Universitat Autònoma de Barcelona (UAB), Cerdanyola Del Vallès, Barcelona, Spain
  8. 8Centro de Investigación en Red de Salud Mental, CIBERSAM, Madrid, Spain
  9. 9Institut d'Investigació i Innovació Parc Taulí (I3PT), Sabadell, Barcelona, Spain
  10. 10Agència de Qualitat i Avaluació Sanitàries de Catalunya - Health Evaluation and Quality Agency of Catalonia (AQuAS), Catalan Health Department, Barcelona, Spain
  11. 11Neurosciences Research Programme, IMIM (Hospital del Mar Medical Research Institute), Barcelona, Spain
  12. 12Institut d’Investigació Biomèdica (IIB Sant Pau), Barcelona, Spain
  13. 13Legal Medicine Unit, Faculty of Medicine, University of Barcelona, Barcelona, Spain
  14. 14Preventive Medicine and Public Health Training Unit PSMar-UPF-ASPB, Parc de Salut Mar, Agència de Salut Pública de Barcelona, Pompeu Fabra University, Barcelona, Spain
  15. 15Fundació Institut Universitari per a la recerca a l'Atenció Primària de Salut Jordi Gol i Gurina (IDIAPJGol), Barcelona, Spain
  16. 16Departament de Farmacologia, de Terapèutica i de Toxicologia, Universitat Autònoma de Barcelona, Barcelona, Spain
  17. 17Institut Català de la Salut (ICS), Metropolitana Nord, Barcelona, Spain
  18. 18Master Plan on Mental Health and Addictions, Ministry of Health, Catalan Government, Barcelona, Spain
  19. 19National Centre for Suicide Research and Prevention, Institute of Clinical Medicine, University of Oslo, Oslo, Norway
  20. 20Department of Health Care Policy, Harvard Medical School, Boston, MA, USA
  21. 21Institut de Neuropsiquiatria i Addiccions, Hospital del Mar, Barcelona, Spain
  1. Correspondence to Dr Philippe Mortier; pmortier{at}


Introduction Suicide attempts represent an important public health burden. Centralised electronic health record (EHR) systems have high potential to provide suicide attempt surveillance, to inform public health action aimed at reducing risk for suicide attempt in the population, and to provide data-driven clinical decision support for suicide risk assessment across healthcare settings. To exploit this potential, we designed the Catalonia Suicide Risk Code Epidemiology (CSRC-Epi) study. Using centralised EHR data from the entire public healthcare system of Catalonia, Spain, the CSRC-Epi study aims to estimate reliable suicide attempt incidence rates, identify suicide attempt risk factors and develop validated suicide attempt risk prediction tools.

Methods and analysis The CSRC-Epi study is a registry-based study, specifically a two-stage exposure-enriched nested case–control study of suicide attempts during the period 2014–2019 in Catalonia, Spain. The primary study outcome consists of first and repeat attempts during the observation period. Cases will come from a case register linked to a suicide attempt surveillance programme, which offers in-depth psychiatric evaluations to all Catalan residents who present to clinical care with any suspected risk for suicide. Predictor variables will come from centralised EHR systems representing all relevant healthcare settings. The study’s sampling frame will be constructed using population-representative administrative lists of Catalan residents. Inverse probability weights will restore representativeness of the original population. Analysis will include the calculation of age-standardised and sex-standardised suicide attempt incidence rates. Logistic regression will identify suicide attempt risk factors on the individual level (ie, relative risk) and the population level (ie, population attributable risk proportions). Machine learning techniques will be used to develop suicide attempt risk prediction tools.

Ethics and dissemination This protocol is approved by the Parc de Salut Mar Clinical Research Ethics Committee (2017/7431/I). Dissemination will include peer-reviewed scientific publications, scientific reports for hospital and government authorities, and updated clinical guidelines.

Trial registration number NCT04235127.

  • suicide & self-harm
  • epidemiology
  • mental health
  • psychiatry
  • public health
  • statistics & research methods

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:

Strengths and limitations of this study

  • Suicide attempt cases (estimated n~6000) in the Catalonia Suicide Risk Code case registry are identified through in-depth psychiatric evaluation, which allows careful differentiation between suicidal and non-suicidal self-injurious behaviour.

  • A wide range of predictor variables will be included, taken from centralised electronic health record data representing five clinical settings, that is, emergency care, primary care, outpatient mental healthcare, and general and psychiatric hospitalisations.

  • A two-stage exposure-enriched nested case–control study combined with the use of inverse probability weights will enable efficient and population-representative estimations.

  • An unknown proportion of suicide attempt cases do not contact healthcare services, and are therefore not included in this study.

  • Limited information on history of suicide attempt before the study observation period will be available.


Suicide attempts constitute a major public health issue worldwide, despite the fact that prevention strategies have been shown to be effective in reducing attempt rates.1 Population-based surveys estimate the lifetime prevalence of suicide attempts among adults at 2.7% (range 0.5%–5.0%),2 while a recent meta-analysis among children and adolescents found a pooled estimate of 6.0% (range 0.5%–34.1%) for suicide attempts in early life.3 Suicide attempts are related to subsequent suicide,4 which has a worldwide mortality rate estimated at 11.6 per 100 000 person-years, representing an annual loss of 34.6 million years of life.5 Apart from death by suicide, suicide attempts are also markers for subsequent persistent physical and mental health issues, repeat suicide attempt, psychiatric hospitalisations, impaired academic performance, unemployment, partner abuse victimisation and perpetration, having children removed by social services, loneliness, relationship difficulties, impaired social functioning and low life satisfaction.6–11

Despite the considerable burden that suicide attempts represent in our society, there is a lack of reliable surveillance data on suicide attempts that could inform public health action.12 This is in contrast with actual suicide rates, which are increasingly monitored in many countries worldwide.13 WHO advocates the use of centralised electronic health record (EHR) systems to develop national suicide attempt surveillance.14 However, the disease classification systems currently used in EHR systems (eg, the International Classification of Diseases15) do not make it possible to distinguish between suicidal and non-suicidal intent of self-injurious behaviour.16 In addition, because self-injurious and suicidal intent is often difficult to ascertain, suicidal outcomes are often misclassified.17 Offering an in-depth psychiatric evaluation to each individual who presents to clinical care with suspected suicide risk, followed by a standardised registration of this clinical evaluation in a centralised EHR register, may therefore substantially improve the accuracy of public health surveillance of suicide attempts.

Apart from surveillance, centralised EHR systems have high potential to be used in epidemiological studies on suicidal behaviour in the population.18–20 Indeed, it has been estimated that up to 92% of individuals who eventually die by suicide have some type of healthcare contact in the year prior to death,21 with rates ranging from 54% to 80% for primary care contacts,21 22 31% to 66% for mental healthcare contacts,21 22 24% to 60% for emergency department visits21 23–25 and 21% for psychiatric hospitalisations.21 Clinical data collected through centralised EHR systems could therefore provide new insight into the population distribution of suicide attempt risk factors, outline the different healthcare trajectories preceding an attempt and provide estimates of potential reductions in suicide attempt cases when designing prevention interventions. The need for nested case–control studies using EHR data to investigate suicidal behaviour has been recently highlighted.26 Developments in statistical methods, including the use of inverse probability weighting in case–control studies27 as well as two-stage (exposure-enriched) case–control designs,28 now make it possible to study rare events such as suicidal behaviour in an efficient and population-representative way.

From a clinical viewpoint, the use of centralised EHR data has high potential to provide data-driven clinical decision support when evaluating risk for future suicide attempts.29 Such support is highly needed, as suicide attempts constitute a complex behavioural outcome, determined by highly multifactorial population processes, interdependencies and multilevel causality.30 Clinicians are prone to heuristic-based decision making,31 that is, a rapid problem-solving approach, including subconscious cognitive shortcuts, linking a limited set of risk factors directly to suicide attempt potential. This leads to failure to detect real suicide potential, poor patient experience and ineffective clinical decision making. In recent years, a number of studies started implementing advanced analytical techniques on EHR data—including machine learning techniques—to model the complex additive and interactive effects between large numbers of predictor variables, and to improve the classification accuracy of data-driven suicide attempt risk prediction tools.32 When routinely implemented at the healthcare system level, such data-driven decision support tools could guide the adequate allocation of clinical resources, such as in-depth suicide risk assessments and tailored treatment interventions in multistage screening approaches.33 34

Here, we present the protocol for the Catalonia Suicide Risk Code Epidemiology (CSRC-Epi) study, a large epidemiological study of suicide attempts occurring during the period 2014–2019 in Catalonia, Spain. The primary study outcome is suicide attempt among Catalan residents during this period. Secondary analyses will focus on actual suicide among those with previous suicide attempts. The CSRC-Epi study combines a nested case–control sampling design with the use of inverse probability weighting to enable the efficient analysis of a large amount of centralised EHR data. A unique aspect of the study is that data for the suicide attempt cases come from a specially designed suicide attempt surveillance protocol that stipulates that every Catalan resident presenting to clinical care with any suspected risk for suicide receives an in-depth psychiatric evaluation. These evaluations will allow us to differentiate carefully between suicidal and non-suicidal self-injurious behaviour in the study outcomes.

Study objectives

The CSRC-Epi study main objectives are:

  • To provide reliable incidence measures for suicide attempts occurring in Catalonia during the period 2014–2019.

  • To identify risk factor constellations for suicide attempts, both on the individual level (ie, the extent to which risk factors increase the risk for subsequent suicide attempt in an individual) and on the population level (ie, the proportion of the total cases of suicide attempt that are potentially attributable to risk factors).

  • To develop suicide attempt risk prediction tools using machine learning techniques, and test their predictive accuracy.

Methods and analysis

Data sources

All data for this study will be obtained from the Health Evaluation and Quality Agency of Catalonia (AQuAS35), a public entity attached to the Catalan Health Department. Since 2017, AQuAS has managed the Public Data Analysis for Health Research and Innovation Programme (PADRIS35) to provide researchers with access to large amounts of centralised EHR data, and to foster innovative health research.

Administrative data

A first source of data for the CSRC-Epi study will consist of six population-representative administrative lists of Catalan residents, one for each year in the 2014–2019 period. These lists constitute annual censuses of individuals with access to public healthcare, which, by law, includes every Catalan resident. These lists include sociodemographic variables (ie, sex, age, nationality, a range of socioeconomic indicators and healthcare catchment region) as well as associated small area geocode data (ie, data available on the healthcare catchment region level),33 the dates of immigration and emigration in/out of Catalonia, the date of death, as well as a range of healthcare summary variables that are constructed to monitor healthcare needs in the Catalan population (ie, 12-month depression, 12-month complex mental disease and the number of 12-month healthcare contacts for each healthcare setting). These lists will be used to construct a sampling frame when conducting the nested case–control sampling, as further explained below.

CSRC case register

In 2014, the Catalan Health Department and the Catalan Health Service structurally implemented the CSRC surveillance programme36 in the Catalan Public Healthcare system. The CSRC surveillance programme is a specifically designed suicide attempt surveillance protocol that stipulates that every Catalan resident presenting with any suspected risk for suicide in any public healthcare setting receives a face-to-face in-depth psychiatric evaluation at the nearest emergency department. This assessment includes differentiating suicide attempts from non-suicidal self-injurious behaviour, or from adverse mental health states without self-injurious intention. Each individual deemed at high risk for (repeat) suicide attempt is subsequently eligible for two brief follow-up interventions (ie, a mental healthcare visit within 10 days (within 72 hours when aged 17 or less), and a phone call after 30 days) to increase access to adequate mental healthcare use. Clinical data of all individuals who received a specialised assessment are registered centrally in the CSRC case register.

The CSRC case register thus includes all individuals with a suicide attempt during the observation period 2014–2019, including the exact date of the event. For each event included in the CSRC case register, a range of predictor variables for future suicidal behaviour are assessed: the Suicidal Scale of the Mini-International Neuropsychiatric Interview (MINI) 5.0.0,37 38 the type and lethality of the suicide attempt that warranted evaluation, presence and type of mental disorder, hopelessness, impulsivity, aggressiveness, altered state of consciousness, use of or dependence on alcohol, use of or dependence on illicit drugs, serious somatic disease (including chronic diseases, chronic pain and disabilities), living status, presence of family or social support, social problems, stressful life events, access to lethal means and family history of suicide. These variables will be used as predictor variables in analyses predicting repetition of suicide attempt and suicide after a previous attempt.

EHR data

A third source of data will consist of centralised EHR registers, one for each of five clinical healthcare settings, that is, emergency care, primary care, outpatient mental healthcare and general and psychiatric hospitalisations. These registers include a wide range of relevant predictor variables for suicidal behaviour, that is, history of self-injurious behaviours; all types of somatic conditions; neurodevelopmental, mental, behavioural, personality and substance use disorders; all types of medical procedures performed; and detailed information on the number and type of healthcare contacts. Diagnoses and procedures in the EHR data are coded using the International Classification of Diseases-9th revision-Clinical Modification (ICD-9-CM) and ICD-10-CM disease classification system. The year of inception of the different registers is 2012 for emergency care and primary care and 2008 for the other registers.

Pharmaceutical register

A fourth source of data will consist of a register containing all prescription drugs that have been delivered by officially recognised pharmacies, including the date of delivery. Note that this excludes prescribed medication that was not collected at the pharmacy. This register will provide an additional range of predictor variables for suicidal behaviour, that is, all prescriptions for psychopharmacological products, as well as prescriptions for a wide range of medication used to treat relevant somatic conditions or known to have psychotropic effects.

Mortality register

Suicide cases among those with a suicide attempt during the study observation period will be identified using data from the mortality register, managed by the Catalan Department of Forensic Medicine, which provides detailed data on causes of death using the ICD system (9th or 10th revision). State-of-the-art forensic techniques, including psychological autopsy by a multidisciplinary team, are used to determine death by suicide in the mortality register.39 Nevertheless, forensic examination of suicidal intention is difficult, and misclassification may occur.

Study design

Figure 1 shows an overview of the CSRC-Epi study design. The CSRC-Epi study is a register-based study, that is, a study that uses the exposure and outcome data from registries,40 which in turn, are representative for the target population. The target population consists of the dynamic cohort of all Catalan residents during the 6-year period, 2014–2019. As explained in detail below, we will conduct a two-stage nested case–control study within this dynamic cohort,41 42 which, in combination with the use of inverse probability weights, will allow us to construct a dataset representative for the original cohort of Catalan residents, and analyse the data accordingly.

Figure 1

Overview of the CSRC-Epi study design. CSRC-Epi, Catalonia Suicide Risk Code Epidemiology; EHR, electronic health record; M, million; W, weight.

The annual total population of Catalonia is between 7.5 and 7.6 million, with annual rates for immigration, emigration, birth and death being ~2.5%, ~1.9%, ~0.9% and ~0.8%, respectively.39 Based on these figures, we expect a maximum of ~9.1 million individuals with Catalan residency status at one or more points in time over the 2014–2019 period. However, we expect suicide attempts in Catalonia to be extremely rare before the age of 10, in line with findings that self-injurious behaviour generally occurs from adolescence onwards.43 We will, therefore, exclude cases and controls who have not reached age 10 by the end of the 2014–2019 period (ie, ~10.9%), lowering the total expected target population to ~8.1 million.
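The last step of this estimate can be checked with a few lines of arithmetic. The sketch below uses only the approximate figures quoted in the paragraph above; the variable names are illustrative.

```python
# Approximate figures quoted in the text.
max_residents_millions = 9.1  # residents at any point during 2014-2019
share_under_10 = 0.109        # share not reaching age 10 by the end of 2019

# Expected target population after excluding those who never reach age 10.
target_population_millions = max_residents_millions * (1 - share_under_10)
print(f"{target_population_millions:.1f} million")
```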

Case selection

Suicide attempt cases between 2014 and 2019 will be identified using the CSRC case register. Based on preliminary data exploration, we expect to include ~6000 cases of clinically confirmed suicide attempt by the end of 2019, of which ~8% (~480) will be repeat attempters. This substantially exceeds the number of suicide attempt cases included in previous register-based studies (median=1562, IQR=1562–3250).32

One of the main objectives of the CSRC programme is to enable reliable surveillance of suicide attempt in the population, and to tackle the under-registration and misclassification that occur when using regular EHR systems.16 Nevertheless, failure to adhere to the CSRC programme protocol may result in an unknown number of suicide attempt cases that remain undetected. Therefore, ICD disease classification codes in the five centralised EHR registers will be inspected to identify potentially missed cases of suicide attempt. For that purpose, a wide range of ICD codes related to suicide attempt (see table 1) was identified through an extensive MEDLINE search, including a recent overview article with recommendations on the use of ICD codes for the surveillance of self-injurious behaviour.16 ICD codes cannot determine suicidal intent, but they do make it possible to identify subjects with intentional self-injurious behaviour and to differentiate them from subjects with self-injurious behaviour of undetermined intent (ie, intentional or accidental self-injurious behaviour).16 Outcome definition algorithms (ie, predefined sets of ICD codes) have shown promise in increasing the accuracy of suicide attempt case detection using ICD codes.17 Therefore, we intend to validate the range of ICD codes we identified against gold standard identification methods (ie, manual review as well as text mining of clinical notes) to increase the accurate detection of potentially missed cases for our study.

Table 1

ICD codes to identify potentially missed cases of suicide attempt

Control selection

In a first stage, we will select a 20% stratified random sample of the 2014–2019 dynamic cohort members, using the six population-representative administrative lists of Catalan residents described above. Constructing this preliminary 20% subsample is necessary for two reasons: (1) we need to reduce the amount of data AQuAS will need to handle when conducting the age-matched and sex-matched incidence density sampling in the second stage; and (2) based on publicly available data, we estimate the probability of selecting controls with 12-month healthcare contacts for mental disorders to be relatively low (ie, ranging from ~17% for primary care visits to ~0.1% for general hospitalisations). Therefore, in order to enrich the data for relevant exposure information, controls in the 20% subsample will be oversampled for number and specific types of healthcare use, using the past year healthcare summary variables available in the administrative lists. This will result in a higher number of controls with (mental) healthcare diagnoses eligible in the second stage.

In a second stage, we will create 67 risk sets, one for each month in the 6-year observation period (ie, June 2014 to December 2019). Within each risk set, 30 age-matched and sex-matched controls will be randomly selected without replacement for each case (ie, case or potentially missed case), an approach known as incidence density sampling or risk set sampling.41 Eligible controls will include future cases, and controls will be allowed to be selected multiple times across risk sets. To allow for the joint and separate analyses of suicide attempt and potential suicide attempt (see the Data analytical plan section), controls are selected for the first suicide attempt within individuals (eligible controls including previous potential cases, but not previous suicide attempt cases) and, if applicable, also for the first potential suicide attempt within individuals (eligible controls being restricted to those without a previous suicide attempt or potential suicide attempt). After the final selection of controls, inverse probability weights will be constructed to restore population-representativeness of the original dynamic population cohort. Weights will be equal to 1 for cases and potentially missed cases (ie, all are selected in both sampling stages); for controls, weights will reflect the selection probabilities at stage 1 (including the oversampling according to healthcare summary variables), as well as at stage 2 (including the age–sex matching and the total time at risk of each control during the observation period).42 44
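The second-stage sampling can be illustrated with a small sketch. It is not the AQuAS implementation: the `Person` record, the matching on exact birth year, the toy cohort and the simple weight formula (the inverse of the product of the two stage-specific selection probabilities) are all assumptions made for illustration. The protocol's actual weights additionally account for oversampling strata and time at risk.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


def month_index(year: int, month: int, start=(2014, 6)) -> int:
    """0-based index of a calendar month, counted from June 2014."""
    return (year - start[0]) * 12 + (month - start[1])


# One risk set per month from June 2014 to December 2019.
N_RISK_SETS = month_index(2019, 12) + 1


@dataclass(frozen=True)
class Person:  # hypothetical record layout
    pid: int
    sex: str
    birth_year: int
    attempt_month: Optional[int] = None  # risk-set index of first attempt


def sample_controls(case: Person, cohort: List[Person], n_controls: int,
                    rng: random.Random) -> List[Person]:
    """Draw matched controls for one case from its risk set, without replacement."""
    eligible = [
        p for p in cohort
        if p.pid != case.pid
        and p.sex == case.sex
        and p.birth_year == case.birth_year
        # Still at risk in the case's month: future cases remain eligible,
        # people with an earlier (or same-month) attempt do not.
        and (p.attempt_month is None or p.attempt_month > case.attempt_month)
    ]
    return rng.sample(eligible, min(n_controls, len(eligible)))


def control_weight(p_stage1: float, p_stage2: float) -> float:
    """Assumed simple form: inverse of the overall selection probability."""
    return 1.0 / (p_stage1 * p_stage2)


# Toy cohort (hypothetical data): 200 women and 50 men born in 1990.
cohort = [Person(i, "F", 1990) for i in range(200)]
cohort += [Person(200 + i, "M", 1990) for i in range(50)]
cohort[0] = Person(0, "F", 1990, attempt_month=month_index(2015, 3))  # index case
cohort[1] = Person(1, "F", 1990, attempt_month=month_index(2014, 9))  # earlier case
cohort[2] = Person(2, "F", 1990, attempt_month=month_index(2018, 1))  # future case

controls = sample_controls(cohort[0], cohort, n_controls=30, rng=random.Random(0))
print(N_RISK_SETS, len(controls))
```

Because sampling is repeated independently for every case, the same person can be drawn as a control in several risk sets, which is the behaviour described above.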

A specific objective of the CSRC-Epi project is the construction of suicide attempt risk prediction tools by healthcare setting. For this purpose, a separate series of controls will be selected at the second sampling stage, this time matching by age and sex as well as by type and timing of last healthcare contact. For example, for a suicide attempt assessed at the emergency department at time y and a last healthcare visit at primary care at time x, 30 age-matched and sex-matched controls will be selected among those individuals who have not made a suicide attempt up to time y, restricting to those controls with primary healthcare visits around time x.

Data analytical plan

The primary outcome of this study is suicide attempt during the period 2014–2019. We will focus both on first suicide attempt during the observation period and on repetition of attempts, defined as suicide attempts among those with a previous suicide attempt during the observation period. As explained above, we will also identify potentially missed cases of suicide attempt, using the ICD codes identified in the literature (see table 1). Cases and potentially missed cases will be considered both as joint and as separate outcomes in the analyses. In addition, we will conduct separate analyses focusing on clinical severity of suicide attempts (eg, lethality and method of attempt). A secondary study outcome consists of suicide among those with a suicide attempt during 2014–2019. Cases of suicide will be identified through the mortality register, managed by the Catalan Department of Forensic Medicine (see Mortality register).

Suicide attempt occurrence

Population-representative annual incidence rates will be estimated by dividing the annual number of cases by the total annual sum of person-years at risk, multiplied by 100 000, using the weighted nested case–control dataset. We will calculate both crude and age-standardised and sex-standardised incidence rates, stratified by relevant sociodemographic variables (eg, socioeconomic status, healthcare region, etc). As incidence rates do not convey how cases are distributed over time, we will also estimate and visualise incidence proportions (or cumulative incidence) over time using the Kaplan-Meier estimator to estimate one minus the survival function.
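A minimal sketch of the rate calculation follows. It assumes direct standardisation (the protocol does not name the standardisation method) and uses invented strata, weights and counts purely for illustration; the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def weighted_person_years(records: List[Tuple[float, float]]) -> float:
    """Sum weight * years at risk over (weight, years_at_risk) records."""
    return sum(w * t for w, t in records)


def rate_per_100k(cases: float, person_years: float) -> float:
    """Incidence rate per 100 000 person-years."""
    return cases / person_years * 100_000


def directly_standardised_rate(cases: Dict[str, float],
                               person_years: Dict[str, float],
                               standard_share: Dict[str, float]) -> float:
    """Weighted average of stratum rates, using a standard population's shares."""
    total = sum(standard_share.values())
    return sum(
        standard_share[s] / total * rate_per_100k(cases[s], person_years[s])
        for s in standard_share
    )


# Invented example: two strata, each person contributing 1 year at risk.
person_years = {
    "women": weighted_person_years([(500.0, 1.0)] * 100),   # 50 000 person-years
    "men": weighted_person_years([(1500.0, 1.0)] * 100),    # 150 000 person-years
}
cases = {"women": 30, "men": 15}          # cases carry weight 1
standard_share = {"women": 0.5, "men": 0.5}

crude = rate_per_100k(sum(cases.values()), sum(person_years.values()))
standardised = directly_standardised_rate(cases, person_years, standard_share)
print(round(crude, 1), round(standardised, 1))
```

In this toy example the crude and standardised rates differ because the sample's stratum composition differs from that of the standard population.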

Suicide attempt risk factor associations

To estimate the individual-level associations between predictor variables and outcome variable, (conditional) bivariable and multivariable binomial logistic regression will be applied. For a first suicide attempt during the observation period as the outcome variable, the weighted nested case–control dataset will be used. In order for the OR to be a valid estimation of the HR (and hence, relative risk), multiple entries for those controls selected multiple times and for those cases also selected as controls will be included in the analyses.45 Time-varying predictor variables (eg, the exact dates of diagnoses or medical prescriptions) will be recoded by categorising time-to-event into discrete time intervals. Relevant time-to-event cut-offs for these intervals will be identified by examining the changes in odds across time-to-event for a large number of short time intervals. Given the large number of predictor variables under study, the Least Absolute Shrinkage and Selection Operator46 method will be employed to select a subset of predictor variables that predict the outcome best while maintaining a good model fit. For suicide and for repetition of suicide attempt as the outcome variable, the analysis will be very similar, but will now reduce to a cohort analysis of those individuals with a first suicide attempt during 2014–2019 (all of whom are selected), and regression models will also include the clinical variables from the CSRC case register obtained through in-depth psychiatric evaluation.
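The recoding of time-varying predictors into discrete intervals can be sketched as follows. The cut-offs and labels are hypothetical placeholders: the protocol states that the relevant cut-offs will be derived empirically from the data.

```python
from bisect import bisect_right
from typing import Optional

# Hypothetical cut-offs (days between a diagnosis and the index date).
CUTOFFS_DAYS = [30, 90, 365]
LABELS = ["0-29 days", "30-89 days", "90-364 days", "365+ days"]


def recency_category(days_before_index: Optional[int]) -> str:
    """Map time since a diagnosis to a discrete interval label."""
    if days_before_index is None:
        return "never diagnosed"
    # bisect_right counts how many cut-offs are <= the observed value.
    return LABELS[bisect_right(CUTOFFS_DAYS, days_before_index)]


print([recency_category(d) for d in (5, 30, 200, 365, None)])
```

Each resulting category can then enter the regression model as a dummy variable, with "never diagnosed" as a natural reference level.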

Population-level effect sizes will be estimated by calculating bivariable and multivariable population-attributable risk proportions (PARP47), based on summary measures of individual predicted probabilities obtained from the logistic regression models described above, comparing original models versus models in which regression coefficients of the predictor variable(s) under study are set to zero. PARP thus provide an estimate of the proportion of cases that could potentially be prevented if certain risk factor(s) in the population were to be eliminated, assuming a causal relationship between risk factor and outcome variable. PARP estimates are of high value for policy makers involved in suicide prevention, as they provide insight into the population-level impact of risk factor(s) with regard to suicide risk. Specifically, PARP take into account that high-prevalence risk factors carrying low individual risk may be equally or even more important to consider than low-prevalence risk factors carrying high risk for the affected individuals. Taking into account such knowledge is important, as it is the combination of both individual-level and population-level interventions that has been shown to be successful in reducing adverse outcomes with complex multicausal aetiologies.30
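The PARP calculation described above can be sketched in a few lines. The sketch assumes a fitted logistic model given by an intercept and a coefficient vector, and summarises predicted probabilities by their weighted sum; the coefficients, data and weights are invented for illustration.

```python
import math
from typing import List, Sequence, Set


def predicted_probability(intercept: float, coefs: Sequence[float],
                          x: Sequence[float]) -> float:
    """Predicted probability from a logistic regression model."""
    linear = intercept + sum(b * v for b, v in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-linear))


def parp(intercept: float, coefs: Sequence[float], rows: List[Sequence[float]],
         weights: Sequence[float], zeroed: Set[int]) -> float:
    """Proportion of expected cases attributable to the zeroed predictors."""
    # Counterfactual model: coefficients of the predictors under study set to zero.
    coefs_cf = [0.0 if j in zeroed else b for j, b in enumerate(coefs)]
    expected = sum(w * predicted_probability(intercept, coefs, x)
                   for x, w in zip(rows, weights))
    expected_cf = sum(w * predicted_probability(intercept, coefs_cf, x)
                      for x, w in zip(rows, weights))
    return (expected - expected_cf) / expected


# Invented example: one binary risk factor, inverse probability weights per row.
rows = [[1.0], [1.0], [0.0], [0.0]]
weights = [1.0, 50.0, 1.0, 150.0]
value = parp(intercept=-6.0, coefs=[1.5], rows=rows, weights=weights, zeroed={0})
print(round(value, 3))
```

The result depends on both the size of the coefficient and the weighted prevalence of the risk factor, which is why a common, weak risk factor can yield a larger PARP than a rare, strong one.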

Suicide attempt risk predictions

Risk prediction tools for (repetition of) suicide attempt will be constructed using machine learning techniques. A first series of algorithms will focus on estimating suicide attempt risk for different prediction windows, that is, censoring data of both cases and controls at different time points relative to the time of event.48 A second series of algorithms will predict suicide attempt after specific healthcare contacts using the separate series of controls selected in the second sampling stage (see Control selection). Machine learning techniques will include elastic net penalised logistic regression,49 naïve Bayes classifiers,50 multivariate adaptive regression splines,51 Bayesian additive regression trees,52 random forests,53 gradient boosting,54 k-nearest neighbour algorithms,55 support vector machines56 and artificial neural networks.57 Stacking ensemble techniques (super learning)58 will be implemented to further optimise prediction accuracy, by using the predictions from the above mentioned algorithms (the base learners) as input to train new models (meta learners). To avoid model overfit, model development and tuning will be conducted on a training dataset using k-fold cross-validation, and model predictive accuracy will be evaluated in a separate validation dataset using a recalibrated algorithm. 
Sample selection bias introduced by the two-stage nested case–control design will be addressed using appropriately corrected classifiers.59 Model predictive accuracy will be evaluated by the area under the receiver operating characteristic curve, as well as accuracy measures calculated for different thresholds of the continuous predicted probability (ie, thresholds set to delineate the top x% at highest risk60), including positive predictive values (precision, or the probability that a predicted case is actually a true case), sensitivity values (recall, or the proportion of true cases that has been predicted correctly), and F1-scores (ie, the harmonic mean of recall and precision).
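The threshold-based accuracy measures named above can be sketched as follows. This is a minimal illustration, assuming predicted probabilities and true outcomes are already available; the function name and the toy data are hypothetical and not taken from the study.

```python
def top_fraction_metrics(probs, labels, fraction):
    """Precision, recall and F1 when the top `fraction` of individuals,
    ranked by predicted probability, are flagged as predicted cases."""
    n_flagged = max(1, round(fraction * len(probs)))
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    flagged = ranked[:n_flagged]

    true_positives = sum(1 for i in flagged if labels[i] == 1)
    total_cases = sum(labels)

    precision = true_positives / n_flagged            # positive predictive value
    recall = true_positives / total_cases if total_cases else 0.0  # sensitivity
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Toy data (assumed): 10 individuals, 3 true cases.
probs = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05, 0.02, 0.01]
labels = [1, 0, 1, 0, 1, 0, 0, 0, 0, 0]

# Flag the top 20% at highest predicted risk.
precision, recall, f1 = top_fraction_metrics(probs, labels, 0.2)
print(precision, recall, f1)
```

Raising the flagged fraction generally increases sensitivity at the cost of positive predictive value, which is why the measures are reported across several thresholds.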

Study limitations

A first major limitation of the CSRC-Epi study is that any history of suicide attempt before the study observation period is unknown for both cases and controls. However, indirect information will be available from two sources: (1) the Suicidal Scale of the MINI (item 6) used in the CSRC programme protocol, which assesses lifetime history of suicide attempt among the cases, although the timing of these previous attempts will be unknown and the information will be based on patients’ self-report; and (2) the EHR registers, which include episodes of self-harm before 2014 among both cases and controls, although it will not be possible to determine the suicidal intent of these episodes and EHR register data are only available from 2008 onwards. A second main limitation of the CSRC-Epi study is that, although the entire Catalan population has free access to public healthcare, about 20% of the population opts for private coverage or uses both public and private healthcare systems. This limits the population-representativeness of the EHR data to an unknown extent. As mentioned above, a third limitation is that, due to variable adherence to the CSRC surveillance programme, suicide attempt cases may still go undetected. This will be countered by identifying potentially missed cases of suicide attempt in the EHR registers. Related to this, it should be acknowledged that an unknown proportion of people who attempt suicide do not contact healthcare services, and are therefore not included in this study. This limitation is inherent to studies using EHR data, and points to the need to complement the knowledge that can be gained from registry-based studies with findings from general population epidemiological survey research.

Patient and public involvement

No patients involved.

Ethics and dissemination

The protocol of this study has been approved by the Parc de Salut Mar Clinical Research Ethics Committee (CEIC protocol 2017/7431/I). The study is in line with the principles established in the Declaration of Helsinki, with the Charter of Fundamental Rights of the European Union (2000/C 364/01), and the European Convention on Human Rights. All data for this study come from the PADRIS programme, and will involve processing of completely anonymised EHR data. For record linkage activities, the Spanish Order SAS/3470/2009 for data obtained in observational studies is followed. This is also in line with the Data Protection Directive of the European Union (Directive 95/46/EC).

We aim to create awareness of the proposed action in the general public, by providing comprehensive information on the need for detecting new constellations of suicide attempt risk factors, on the need to improve suicide attempt risk estimation, and on the ongoing exploration of clinical decision support for the improved assessment of suicide risk. The following communication measures will be taken: (1) designing a website providing clear and balanced information on the project; (2) publishing balanced newspaper articles and giving interviews to the press and (3) providing all healthcare settings with patient folders containing a clear and balanced summary of the project.

Communication with patients and with next of kin will be in lay terms only. Feasible formats are internet websites and patient forums, patient folders and carefully planned releases to the press. We will provide clear and balanced information on the project, acknowledging the unique experience of each patient and stressing our final aim of improving (not replacing) human clinical practice and care. Patient groups will also be involved in the design of the overall communication strategy (codesign).

Targeted expert audiences will consist of those involved in suicide research, as well as in general psychiatry, psychiatric epidemiology and translational psychiatry. Scientific publications will be submitted to peer-reviewed journals with open-access publishing, and added to the Pompeu Fabra University’s repository of open access articles.61 Further dissemination of results will be through scientific conferences and workshops. Hospital and government authorities involved with mental healthcare will be informed of the study results through scientific reports, which will present a series of suicide intervention prevention frameworks. In addition, we will provide the Spanish and Catalan Departments of Health with updated clinical guidelines for the assessment of suicide risk. Clinicians with mental healthcare expertise as well as emergency department clinicians and general practitioners will be informed of these recommendations through the project’s website and the professional associations’ websites, periodicals and meetings.



  • Collaborators The CODIRISC Epidemiology Study Group are: Jordi Alonso, Itxaso Alayo Bueno, Laura Ballester Coma, Jordi Blanch, María Jesús Blasco Cubedo, Ana De Inés Trujillo, Maria Teresa Campillo Sanz, Narcís Cardoner, Anna Isabel Cebrià, Cristina Colls, Matilde Elices, Anna García-Altés, Ricard Gavaldà, Manel Gené Badia, Javier Gómez Sánchez, Ronald C. Kessler, Lars Mehlum, Cristina Molina Parilla, Rosa Morros Pedrós, Philippe Mortier, Jordi Ortiz, Diego Palao, Rosa Maria Pérez Pérez, Víctor Pérez, Maria J. Portella, Bibiana Prat Pubill, Beatriz Puértolas Gracia, Raquel Suárez Pérez, Ping Qin, and Gemma Vilagut.

  • Contributors Initial draft of the protocol: PM, GV, BPG, IAB and JA. Critical review of the protocol: LM, PQ, and RCK. Initial draft of the manuscript: PM, GV, BPG, IAB, and JA. Critical review of the manuscript: PM, GV, BPG, ADIT, IAB, LBC, MJBC, NC, CC, ME, AG-A, MGB, JGS, MMS, RM, BPP, PQ, RCK, DP, VPS and JA. All authors read and approved the final manuscript.

  • Funding This project was supported by ISCIII/FEDER PI17/00521, ISCIII/FEDER PI17/01205, and Generalitat de Catalunya (2017 SGR 452). The Catalonia Suicide Risk Code surveillance programme is an initiative of the Mental Health and Addictions Plan of the Department of Health of the Catalan Government. PM has a Sara Borrell research contract awarded by the ISCIII (CD18/00049). ME has a Juan de la Cierva research contract awarded by the ISCIII (FJCI-2017–31738). VPS and ME thank the Secretaria d’Universitats i Recerca del Departament d’Economia i Coneixement (2017 SGR 134 to ‘Mental Health Research Group’) and the Generalitat de Catalunya (Government of Catalonia) for unrestricted research funding. BPG and ADIT received funding from ISCIII FI18/00012 and FPU2017-06447, respectively. LBC received funding from Ministerio de Educación, Cultura y Deporte (FPU15/05728). DP and JA received funding from ISCIII/FEDER PI17/01205.

  • Competing interests DP has received grants and also served as consultant or advisor for Angelini, Janssen, Lundbeck and Servier. The other authors have no competing interests to declare.

  • Patient and public involvement Patients and/or the public were involved in the design, or conduct, or reporting, or dissemination plans of this research. Refer to the Methods section for further details.

  • Patient consent for publication Not required.

  • Provenance and peer review Not commissioned; externally peer reviewed.