
Suicide Information Database-Cymru: a protocol for a population-based, routinely collected data linkage study to explore risks and patterns of healthcare contact prior to suicide to identify opportunities for intervention
Ann John1, M Dennis1, L Kosnes1, D Gunnell2, J Scourfield3, D V Ford1, K Lloyd1

1College of Medicine, Swansea University, Institute of Life Sciences 2, Swansea, UK
2School of Social and Community Medicine, University of Bristol, Bristol, UK
3School of Social Sciences, Cardiff University, Cardiff, UK

Correspondence to Dr Ann John; a.john{at}


Introduction Prevention of suicide is a global public health challenge extending beyond mental health services. Linking routinely collected health and social care system data records for the same individual across different services and over time has enormous potential in suicide research. Most previous research linking suicide mortality data with routinely collected electronic health records involves only one or two domains of healthcare provision such as psychiatric inpatient care. This protocol paper describes the development of a population-based, routinely collected data linkage study: the Suicide Information Database Cymru (SID-Cymru). SID-Cymru aims to contribute to the information available on people who complete suicide.

Methods and analysis SID-Cymru will facilitate a series of electronic case–control studies based in the Secure Anonymised Information Linkage (SAIL) Databank. We have identified 2664 cases of suicide in Wales between 2003 and 2011 from routinely collected mortality data using International Classification of Diseases, Tenth Revision, codes X60–X84 (intentional self-harm) and Y10–Y34 (undetermined intent). Each case will be matched by age and sex to at least five controls. Records will be collated and linked from routinely collected health and social data in Wales for each individual. Conditional logistic regression will be applied to produce crude ORs and ORs adjusted for confounders (including general practice and socioeconomic status).

Ethics and dissemination The SAIL Databank has the required ethical permissions in place to analyse anonymised data. Ethical approval has been granted by the Information Governance Review Panel (IGRP). Findings will be disseminated through peer-reviewed publications, consultations with stakeholders and national/international conference presentations. The improved understanding of the prior health, nature of previous contacts with services and wider social circumstances of those who complete suicide will assist in prevention policy, service organisation and delivery. SID-Cymru is funded through the National Institute for Social Care and Health Research, Welsh Government (RFS-12-25).


This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See:


Strengths and limitations of this study

  • Based on the general population.

  • Includes all deaths through suicide, not only those known to mental health services.

  • Based on routinely collected health and social care data linkage.

  • Linkage across multiple domains. Most suicide data linkage studies are limited to only one or two domains of healthcare provision.

  • Actual medication use can only be inferred from the prescriptions issued.


Every year approximately 800 000 people die by suicide worldwide, 1–2 in every 100 deaths. Prevention of suicide is a global public health challenge. Collaborative working across government departments, with a public health approach that extends beyond mental health services, is essential.1 Global patterns and national trends in the incidence of suicide and its key risk factors change over time, and the therapeutic and preventive challenges of understanding and responding to these changes are considerable.

Vast amounts of personal data are routinely collected on a daily basis by health and social care systems around the world to support clinical management and patient care. Linking these data records for the same individuals across different services and over time offers a powerful, population-wide resource. Such integrated data sets have been used to study a range of health issues, to identify risk and protective factors and to examine outcomes. The secondary use of these data has enormous potential in suicide research, allowing fuller consideration of the prior health, wider social circumstances and points of access to services of all individuals who complete suicide.2–4

Studies from the Nordic countries have demonstrated the usefulness of linking register data in suicide research.5–10 Others11–13 have shown that collating and linking routinely collected whole-population data, such as general practice (GP) records, outpatient data and inpatient activity, with mortality data enables more detailed analysis of risk factors for people who complete suicide. However, most previous research linking suicide mortality data with routinely collected electronic health records involves only one or two domains of healthcare provision, such as psychiatric inpatient care.

In the UK, various systems exist to examine suicide deaths. The National Confidential Inquiry into Suicide and Homicide by People with Mental Illness (NCISH) focuses on suicides among people who were in contact with mental health services in the year before their death (approximately 25% of all suicides).3 However, this provides limited information on suicide in the general population and may hamper the effectiveness of wider preventive efforts.1 In a recent report, NCISH14 examined aspects of primary healthcare before all suicides in England between 2002 and 2011; however, no linkage was made with data from other service providers such as emergency departments.

Scotland has recently established enhanced data collection in relation to suicide; however, further development is needed before the Scottish Suicide Information Database (‘ScotSID’) can examine healthcare pathways and contact with more than one health service.15 England does not currently have a dedicated repository for suicide data, though studies have made use of the Clinical Practice Research Datalink (CPRD), which represents approximately 8.5% of the UK population from 600 general practices in England.16 CPRD can be linked with data from the National Health Service (NHS) Hospital Episode Statistics (HES) and mortality data from the Office for National Statistics (ONS), but has limited emergency department data.

SID-Cymru will access and link information on the prior health, nature of previous contacts with services and wider social circumstances of all those who complete suicide (known and unknown to mental health services) within the population of Wales, using anonymised, routinely collected electronic data held in healthcare and social data sets in the Secure Anonymised Information Linkage (SAIL) Databank.17 ,18 The SAIL Databank brings together, links and anonymises the widest possible range of person-based data currently available in the UK. It was originally set up by the Health Information Research Unit (HIRU) at the College of Medicine at Swansea University.

SID-Cymru is part of the research programme related to the Health e-Research Collaboration UK (HeRC UK), led by the Medical Research Council (MRC) and based in the Centre for the Improvement of Population Health through e-Records Research (CIPHER). CIPHER is a UK Clinical Research Collaboration (UKCRC) Public Health Research Centre of Excellence within the College of Medicine at Swansea University. SID-Cymru will provide timely, robust data to inform the future strategic direction of suicide prevention and will be the first step in designing and evaluating effective interventions to prevent suicide. It will use the International Classification of Diseases, Tenth Revision (ICD-10)19 definitions and instructions for classifying causes of death, allowing comparison with other countries and thus supporting research of global relevance. Emphasis will be given to issues where there are opportunities for intervention or where electronic data linkage confers an advantage for investigation.


The main aim is to establish SID-Cymru as a population-based resource for studying factors and service contacts associated with all suicide deaths through routinely collected data linkage case–control studies.


Phase 1

  • To identify, via the SAIL Databank, all those with ICD-10 codes for probable and possible suicide in Wales between 2003 and 2011, together with matched controls.

  • To explore and address the methodological issues relating to the development of this database of completed suicides, and the linkage of data across different data sets and settings.

Phase 2

  • To investigate risk factors and trends for suicide, including: primary care diagnosis of depression; levels of treatment with antidepressants and trends in such treatment over time; rural and urban geography; educational attainment; levels of physical illness.

  • To investigate settings and pathways of care where people are in contact with services in the year leading up to their suicide across the whole population and in specific groups such as the elderly.

Methods and analyses


SID-Cymru will facilitate a series of electronic population-based, routinely collected data linkage case–control studies on completed suicide in Wales between 2003 and 2011. Wales has a population of 3.1 million20 and is part of the UK. There are approximately 32 000 deaths of Welsh nationals registered each year, of which around 300 (approximately 1%) are registered as suicides.21

Data source

SID-Cymru will be assembled within the SAIL Databank,17 a growing resource that already holds over a billion anonymised records from 13 databases, which can be anonymously linked at the individual record level.18 The SAIL Databank has been previously used for linkage of routine data.22–26

Within the SAIL Databank, a split-file approach to anonymisation is used to overcome issues of confidentiality and disclosure in health-related data warehousing.18 Demographic data are sent to a partner organisation, NHS Wales Informatics Service (NWIS), where all identifiable information is removed; clinical data are sent directly to HIRU where, for each data set within the SAIL Databank, each individual is assigned an encrypted Anonymised Linking Field (ALF). The ALF is used to link anonymised individuals across data sets, supporting longitudinal analyses of an individual's journey through multiple health, education and social data sets.17 Additionally, Residential ALFs (RALFs) have been created for all residences in Wales; these enable linkage of anonymised household and environment data with the health records of individual residents without the identity of the residences or residents being known to researchers.26
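The split-file idea can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the linking field is a keyed hash (HMAC-SHA256) of the NHS number computed by a trusted party that holds a secret key; the actual ALF encryption used by NWIS and HIRU is not described here and may well differ. The key, field names and records are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical key held only by the trusted third party; researchers never see it.
TTP_KEY = b"hypothetical-secret-key"


def make_alf(identifier: str) -> str:
    """Illustrative linking field: a keyed hash (HMAC-SHA256) of the identifier.

    An assumption for illustration, not the method actually used to create ALFs.
    """
    return hmac.new(TTP_KEY, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def anonymise(records: list) -> list:
    """Replace the direct identifier with the linking field and drop the name."""
    out = []
    for rec in records:
        clean = {k: v for k, v in rec.items() if k not in ("nhs_number", "name")}
        clean["alf"] = make_alf(rec["nhs_number"])
        out.append(clean)
    return out


def link(left: list, right: list) -> list:
    """Inner join of two anonymised data sets on the linking field."""
    by_alf = {}
    for rec in right:
        by_alf.setdefault(rec["alf"], []).append(rec)
    return [{**a, **b} for a in left for b in by_alf.get(a["alf"], [])]


# Hypothetical records from two services for the same fictitious people.
gp = [{"nhs_number": "0000000001", "name": "A", "gp_contacts": 4},
      {"nhs_number": "0000000002", "name": "B", "gp_contacts": 1}]
ed = [{"nhs_number": "0000000001", "name": "A", "ed_attendances": 2}]

linked = link(anonymise(gp), anonymise(ed))
print(linked)
```

Because the researcher-side records carry only the hash, records from two services for the same person can be joined without the NHS number or name ever being visible to the analyst.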

The primary study base will be the Welsh Demographic Service27 (WDS) hosted within the SAIL Databank. The WDS is a core data set within the SAIL Databank and part of a set of services that manage administrative (demographic) information for NHS patients in Wales. It was introduced early in 2009, replacing a similar service known as the NHS Wales Administrative Register (NHS AR). WDS data are collected from general practices via the Exeter System, and more than five million individuals are currently present in the WDS data set within the SAIL Databank. The WDS is a register of all individuals who have at some point been registered with a Welsh general practice or required some form of NHS healthcare provision in Wales. Electronic collation of WDS/NHS AR data began in 1960, and the register is updated and maintained by NWIS,27 ensuring that address changes (within and out of Wales) and death notices are included. The original (non-anonymised) version of the NHS AR has been used in the HIRU Matching Algorithm for Consistent Results in Anonymised Linkage (MACRAL): the WDS/NHS AR serves as the master list of all Welsh residents, and probabilistic matching is used to find the associated NHS numbers, which are then encrypted into ALFs.
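MACRAL's internal rules are not given here. As a generic illustration of probabilistic record matching, the sketch below uses Fellegi–Sunter agreement weights; the compared fields, the m- and u-probabilities and the decision thresholds are all hypothetical and are not those used by MACRAL.

```python
import math

# Hypothetical (m, u) probabilities per field:
#   m = P(field agrees | records belong to the same person)
#   u = P(field agrees | records belong to different people)
FIELD_PARAMS = {
    "surname": (0.95, 0.01),
    "forename": (0.95, 0.02),
    "date_of_birth": (0.98, 0.003),
    "sex": (0.99, 0.5),
    "postcode": (0.90, 0.001),
}


def match_weight(a: dict, b: dict) -> float:
    """Sum of log2 likelihood ratios across fields (Fellegi-Sunter weights)."""
    total = 0.0
    for field, (m, u) in FIELD_PARAMS.items():
        if a.get(field) is None or b.get(field) is None:
            continue  # a missing value contributes no evidence either way
        if a[field] == b[field]:
            total += math.log2(m / u)               # agreement weight (positive)
        else:
            total += math.log2((1 - m) / (1 - u))   # disagreement weight (negative)
    return total


def classify(weight: float, upper: float = 10.0, lower: float = 0.0) -> str:
    """Hypothetical thresholds: accept as a match, reject, or refer for clerical review."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "review"


# Hypothetical records for the same person after a change of address.
r1 = {"surname": "JONES", "forename": "MARY", "date_of_birth": "1970-01-01",
      "sex": "F", "postcode": "AB1 2CD"}
r2 = dict(r1, postcode="EF3 4GH")
print(round(match_weight(r1, r2), 2), classify(match_weight(r1, r2)))
```

A disagreeing postcode lowers the weight but, with strong agreement elsewhere, the pair is still accepted; records that agree on little fall into the review band or are rejected.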

Deaths in Wales should be registered within 5 days of the date of death (DOD). However, when a coroner's inquest takes place, legislation means that the death cannot be registered until the inquest is complete. Because the Office for National Statistics (ONS), the national agency that collates all deaths, has no conclusive information about a death until it is registered, there is a delay between the date the death occurred and the date it is added to the annual ONS mortality data set. The ONS found that fewer than 41% of deaths going to inquest were registered within 3 months, though 96% were registered within 1 year.28 Thus, precise information on the annual incidence of suicide may be delayed by up to 2 years. Information collected at death registration is recorded by registrars on the Registration Online system. Most of this information is normally supplied by the informant (usually a close relative of the deceased), while the cause of death (COD) is usually obtained from the Medical Certificate of Cause of Death (MCCD), completed by a medical practitioner when the death is certified, or from the coroner if there is an inquest, and is coded using ICD-10.19 Notably, a death is not included in the Annual District Deaths Extract (ADDE) until the COD has been finalised, so the year of death and the year of registration may differ. The primary data set used to construct SID-Cymru is the ADDE from the ONS. The ADDE includes Welsh residents who died outside Wales and holds information on COD, derived from death certificates, for all deaths in Wales.
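The delay between actual death and registration, and whether the two fall in different calendar years, can be tabulated per death. The minimal sketch below assumes each death is available as a pair of hypothetical dates (ADOD, registration date) and approximates 3 months as 91 days.

```python
from datetime import date


def registration_delay_days(date_of_death: date, date_of_registration: date) -> int:
    """Days between the actual date of death and the date the death was registered."""
    if date_of_registration < date_of_death:
        raise ValueError("registration cannot precede death")
    return (date_of_registration - date_of_death).days


def summarise_delays(deaths: list) -> dict:
    """Share of deaths registered within ~3 months and 1 year, and share crossing a year boundary."""
    delays = [registration_delay_days(d, r) for d, r in deaths]
    n = len(delays)
    return {
        "n": n,
        "within_3_months": sum(x <= 91 for x in delays) / n,  # 3 months approximated as 91 days
        "within_1_year": sum(x <= 365 for x in delays) / n,
        "year_differs": sum(d.year != r.year for d, r in deaths) / n,
    }


# Hypothetical (ADOD, registration date) pairs.
deaths = [
    (date(2010, 3, 1), date(2010, 3, 5)),
    (date(2010, 11, 20), date(2011, 4, 2)),
    (date(2011, 6, 10), date(2012, 8, 1)),
    (date(2009, 12, 28), date(2010, 1, 4)),
]
print(summarise_delays(deaths))
```

The last pair shows how even a short delay can move a death into the following registration year.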

Definition of suicide for cases

The true number of suicides is difficult to determine because a coroner's conclusion of suicide requires proof ‘beyond a reasonable doubt’ that the death was intentionally self-inflicted, and because in some areas coroners have increasingly (since 2001) returned narrative conclusions rather than a conclusion of suicide.29 ,30 Previously, when the coroner recorded insufficient information, ONS coders would code the death as an accident, which inevitably led to some suicides being classified as accidents or misadventure. The ONS has recently issued guidance on this issue following a review of coding practice.31 Current ONS practice includes deaths where intent is ‘undetermined whether accidentally or purposefully inflicted’; thus deaths where there may have been no intention to take life, such as those from injury or poisoning, are included in ONS suicide figures. Coroners’ narrative verdicts are not currently accessible within the SAIL Databank, so they cannot be used to review case inclusion.

There is evidence that a high proportion of deaths from poisoning and hanging that receive accidental verdicts are found, on clinical review, to be suicides.32 Such possible suicides will be included in SID-Cymru to allow further separate and combined analyses; thus the additional ICD-10 codes for ‘accidental poisoning with prescribed drugs’ (X40–X41, X43–X49) and ‘accidental hanging’ (W75–W76) may be used, along with ‘sequelae of external causes of morbidity and mortality’ (Y87, Y87.2, Y89, Y89.9).19 With SID-Cymru we therefore aim to establish a resource for studying factors associated with all probable as well as possible suicides. Possible suicides comprise accidental hanging/strangulation and accidental poisoning, excluding narcotics and psychodysleptics; probable suicides follow the traditional ONS definition, which includes suicides and deaths of undetermined intent.2

Identification of cases and controls

SID-Cymru cases encompass suicides (‘intentional self-harm’, ICD-10: X60–X84) and probable suicides (‘undetermined intent’, ICD-10: Y10–Y34, excluding Y33.9) recorded from the MCCD and presented within the ADDE as the underlying COD. Deaths coded Y33.9/U50.9 (pending verdicts) are excluded, since a large proportion of these are subsequently found to be homicides. This definition of probable suicide can be supplemented with possible suicides if required. Cases of probable suicide will be identified and extracted using the ICD-10 codes19 defined above and listed in table 1.
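The code ranges above can be expressed as a small classifier. This sketch assumes codes arrive as strings such as 'X70.0' or 'X700'; it also covers the possible-suicide codes listed earlier in this section, treating the Y87/Y89 entries literally as listed. How codes are actually stored in the ADDE extract is an assumption.

```python
PENDING = {"Y339", "U509"}                        # pending verdicts: excluded
POSSIBLE_EXACT = {"Y87", "Y872", "Y89", "Y899"}   # sequelae codes exactly as listed in the text


def normalise(code: str) -> str:
    """Upper-case, trim and drop the dot, eg ' x70.0 ' -> 'X700'."""
    return code.strip().upper().replace(".", "")


def in_range(category: str, letter: str, lo: int, hi: int) -> bool:
    """True if a three-character category such as 'X70' lies within letter+lo .. letter+hi."""
    return (len(category) == 3 and category[0] == letter
            and category[1:].isdigit() and lo <= int(category[1:]) <= hi)


def classify_cod(code: str) -> str:
    """Label an underlying cause-of-death code as probable, possible, excluded or not_included."""
    c = normalise(code)
    cat = c[:3]
    if c in PENDING:
        return "excluded"
    if in_range(cat, "X", 60, 84) or in_range(cat, "Y", 10, 34):
        return "probable"   # intentional self-harm or undetermined intent
    if ((in_range(cat, "X", 40, 49) and cat != "X42")   # accidental poisoning, excl narcotics
            or in_range(cat, "W", 75, 76)                # accidental hanging/strangulation
            or c in POSSIBLE_EXACT):
        return "possible"
    return "not_included"


for code in ["X70.0", "Y33.9", "Y34", "X42", "W76", "Y87.2", "I21.9"]:
    print(code, classify_cod(code))
```

Checking the pending-verdict codes first ensures Y33.9 is excluded even though it sits inside the Y10–Y34 range.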

Table 1

ICD-10 codes used to identify and extract cases of death through probable suicide for SID-Cymru

ONS figures include only those aged 15 years and over, because deaths in younger children coded as undetermined events may be caused by unverifiable accidents, neglect or abuse.33 SID-Cymru will allow analysis of suicides and probable suicides in the 10–14-year age band. Official ONS mortality statistics are based on the number of deaths registered in a particular calendar year, rather than the number of deaths that occurred in that year, so they include some deaths that occurred in years before the reference year (approximately 4%). Because SID-Cymru will link and review data for the period leading up to death, it is important that the match/end date reflects the correct time period for each individual case, that is, the DOD rather than the date of registration, to give an accurate picture of resource use and help-seeking behaviour. Consequently, the actual DOD (ADOD), rather than the registered DOD (RDOD) used in ONS reports, will be used in the matching criteria to establish a data review ‘end date’ for controls. Mortality data within the SAIL Databank are only available from 2003, so the earliest cases included have ADODs from 1 January 2003. To minimise underestimation of cases (ie, due to delays in COD confirmation and registration), SID-Cymru includes cases aged 10 years and over at ADOD where the ADOD fell between 1 January 2003 and 31 December 2011.
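The case inclusion rule keys on the ADOD rather than the registration date. A minimal sketch, assuming a full date of birth is available to the analyst and that each death already carries a COD label such as 'probable' or 'possible'; the function and constant names are hypothetical.

```python
from datetime import date

STUDY_START = date(2003, 1, 1)   # earliest ADOD included
STUDY_END = date(2011, 12, 31)   # latest ADOD included
MIN_AGE = 10                     # minimum age in completed years at ADOD


def age_on(dob: date, on: date) -> int:
    """Age in completed years on a given date."""
    return on.year - dob.year - ((on.month, on.day) < (dob.month, dob.day))


def is_eligible_case(dob: date, adod: date, cod_class: str, include_possible: bool = False) -> bool:
    """Apply the inclusion rule using the actual date of death, not the registration date."""
    allowed = {"probable"} | ({"possible"} if include_possible else set())
    return (cod_class in allowed
            and STUDY_START <= adod <= STUDY_END
            and age_on(dob, adod) >= MIN_AGE)


# Hypothetical death in late 2011: included whenever it was registered, because only ADOD matters.
print(is_eligible_case(date(1990, 7, 15), date(2011, 12, 20), "probable"))
```

The `include_possible` flag mirrors the option, described above, of supplementing probable suicides with possible ones.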

Table 2 presents the number of registered deaths through suicide per year and actual deaths per year between 2003 and 2011 as reported by the ONS and as identified for SID-Cymru within the SAIL Databank.

Table 2

Probable suicide* deaths for Wales 2003–2011 identified within the SAIL Databank for SID-Cymru

Matched controls will be identified within the WDS as living individuals matched on age (to the nearest year) and sex, who were registered within the WDS for at least 1 year before the matched case's ADOD. Controls will be required to be alive at the matched case's ADOD (ie, the match/end date). Using live controls limits the bias that would arise from including deaths, particularly in younger age groups, among people engaging in risky behaviours that result in premature death (eg, substance use, accidents), since these may be associated with unrecognised suicidal behaviour or known risk factors. To increase statistical power, we aim to identify at least five controls for every case.34
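The control criteria amount to sampling from the risk set at each case's ADOD. The sketch below interprets 'matched on age to the nearest year' as the same age in completed years on the case's ADOD, approximates '1 year' as 365 days, and samples without replacement within a matched set (the same person may still serve as a control for other cases). The `Person` fields are hypothetical stand-ins for WDS variables.

```python
import random
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Person:
    alf: str                    # anonymised linking field
    sex: str
    dob: date
    wds_start: date             # hypothetical: date first registered in the WDS
    dod: Optional[date] = None  # None if not known to have died


def age_on(dob: date, on: date) -> int:
    """Age in completed years on a given date."""
    return on.year - dob.year - ((on.month, on.day) < (dob.month, dob.day))


def eligible_controls(case: Person, adod: date, pool: list) -> list:
    """Same sex, same completed age at the case's ADOD, alive at ADOD, in the WDS >= 1 year."""
    case_age = age_on(case.dob, adod)
    min_start = adod - timedelta(days=365)  # '1 year' approximated as 365 days
    return [
        p for p in pool
        if p.alf != case.alf
        and p.sex == case.sex
        and age_on(p.dob, adod) == case_age
        and (p.dod is None or p.dod > adod)   # alive on the match/end date
        and p.wds_start <= min_start
    ]


def sample_controls(case: Person, adod: date, pool: list, k: int = 5, seed: int = 0) -> list:
    """Draw up to k controls at random, without replacement, from the eligible set."""
    candidates = eligible_controls(case, adod, pool)
    return random.Random(seed).sample(candidates, min(k, len(candidates)))


# Hypothetical case and candidate pool.
adod = date(2010, 6, 1)
case = Person("C1", "F", date(1970, 3, 1), date(2000, 1, 1), dod=adod)
pool = [case] + [Person(f"E{i}", "F", date(1970, i, 1), date(2000, 1, 1)) for i in range(1, 7)] + [
    Person("X1", "M", date(1970, 3, 1), date(2000, 1, 1)),                    # different sex
    Person("X2", "F", date(1970, 3, 1), date(2000, 1, 1), date(2010, 5, 1)),  # died before ADOD
    Person("X3", "F", date(1970, 3, 1), date(2010, 1, 1)),                    # registered < 1 year
    Person("X4", "F", date(1975, 3, 1), date(2000, 1, 1)),                    # different age
]
chosen = sample_controls(case, adod, pool)
print(sorted(p.alf for p in chosen))
```

Fixing the random seed makes the control selection reproducible, which matters when the same extract is re-analysed.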

Routine data sources

For SID-Cymru, the data collected on identified cases and controls, via the ADDE and the WDS respectively, will be linked to other routinely collected data sets, allowing a retrospective review of each individual's pathway through the various services. Linkage with GP system data, for example, provides information of varying depth on patients going back several years, including previous diagnoses, presenting symptoms and medications prescribed. This data set can be used to review contacts with the GP and, consequently, to infer the development or diagnosis of any new medical conditions, including depression and self-harm, before suicide. Linkage with inpatient data will allow a review of hospital contacts, and Emergency Department Data Sets will give information on crisis contacts. Together these will provide comprehensive insight into help-seeking behaviour and its management across settings. Data sets currently accessible via the SAIL Databank that will be linked to SID-Cymru are presented in table 3. Several other data sets are under consideration for future inclusion in the SAIL Databank, to which SID-Cymru could link, including Department for Work and Pensions (DWP) employment and incapacity status, Looked After Children, Fostering, Substance Misuse Services, Sexually Transmitted Infections and Police Data.

Table 3

Data sets available within the Secure Anonymised Information Linkage (SAIL) Databank for linkage with Suicide Information Database (SID)-Cymru cases and controls


Data variables/characteristics

Data to be included in SID-Cymru will be extracted from the SAIL Databank and will include basic demographics; educational data; ADOD and RDOD; numbers and percentages for deprivation; proportions known to different healthcare settings in the period before death; and medical history, including primary care contacts and diagnoses (by Read Codes), hospital/psychiatric admissions and diagnoses (by ICD-10 codes), and the nature of service contacts (eg, for self-harm or substance misuse). In this way, the routinely collected data held by the NHS and other public bodies supplying the SAIL Databank will build as full a picture as possible of each death through suicide, while being less resource intensive than psychological autopsies. The initial variables to be extracted and linked across data sets are described in table 4.

Table 4

Characteristics to be identified and collected through data linkage for SID-Cymru cases and controls

Planned analyses

The primary objective is to establish SID-Cymru as a resource for future analysis. It is important to maximise the utility of the resource, and some general principles of analysis have been determined.

Descriptive epidemiology

Phase 1

  • Identification and description of cases: number of deaths registered with ONS and available within the SAIL Databank, with relevant ICD-10 codes defined as suicides, between 2003 and 2011.

  • Identification and description of matched controls.

  • Basic demographics.

  • Table of delay (days) in registering suicides and undetermined deaths.

Phase 2

  • Proportions known to different healthcare settings: numbers and percentages (with main diagnosis) of those who had a general hospital admission, an emergency department contact for self-harm or other indications, a psychiatric admission or a primary care contact in the year before probable suicide.

  • Numbers and percentages for deprivation, employment status, educational achievement and medical history (eg, chronic pain, terminal illness, medication, previous self-harm and substance misuse) will also be sought.

  • Number of cases with missing data across data sets for variables of interest will be noted.
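The 'proportions known to services' outputs can be computed from linked contact records. This sketch assumes hypothetical (ALF, setting, date) event tuples and a match date per person; it counts contacts in the 365 days before the match date, excluding the match date itself, and treats people with no linked events as having had no contact, which is a simplifying assumption given the missing-data issues noted above.

```python
from collections import defaultdict
from datetime import date, timedelta


def proportions_known(events, match_dates, settings, window_days=365):
    """Number and percentage of people with >= 1 contact per setting before their match date.

    events: iterable of (alf, setting, event_date) tuples (hypothetical structure)
    match_dates: {alf: match/end date}, ie the matched case's ADOD
    """
    seen = defaultdict(set)
    for alf, setting, when in events:
        end = match_dates.get(alf)
        if end is None:
            continue  # event belongs to someone outside this cohort
        # Window is [end - window_days, end): contacts on the match date itself are excluded.
        if end - timedelta(days=window_days) <= when < end:
            seen[setting].add(alf)
    n = len(match_dates)
    return {s: (len(seen[s]), 100.0 * len(seen[s]) / n) for s in settings}


# Hypothetical cohort of two people and their contacts.
match_dates = {"A": date(2010, 6, 1), "B": date(2010, 6, 1)}
events = [
    ("A", "gp", date(2010, 1, 1)),
    ("A", "ed", date(2008, 1, 1)),   # outside the 1-year window
    ("B", "gp", date(2010, 5, 31)),
    ("B", "ed", date(2010, 6, 1)),   # on the match date, so excluded
    ("C", "gp", date(2010, 1, 1)),   # not in the cohort
]
print(proportions_known(events, match_dates, ["gp", "ed", "inpatient"]))
```

Whether same-day contacts should count is a design choice; including them would, for example, capture emergency attendances on the day of death.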

Area-based measures of socioeconomic deprivation

Deprivation will be measured at lower super output area (LSOA) level using the Welsh Index of Multiple Deprivation (WIMD)35 and the Townsend Index Score.36 All suicides and matched controls will be assigned to an LSOA. There are 1909 LSOAs in Wales, with an average population of 1500 people (range: 1000–3000).37 Linkage to WIMD and Townsend Index information is available in the SAIL Databank. LSOAs will be ranked by deprivation and divided into quintiles, and standardised rates will be calculated.
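Quintile assignment can be sketched as follows, assuming a higher score means greater deprivation (as for WIMD and Townsend scores) and that tied scores keep their input order; the LSOA labels are hypothetical, and the sketch does not cover the rate standardisation step. Quintile 1 is the most deprived.

```python
def deprivation_quintiles(scores: dict, higher_is_more_deprived: bool = True) -> dict:
    """Rank areas by deprivation and assign quintile 1 (most deprived) to 5 (least deprived).

    Ties keep their input order (sorted() is stable), which is an arbitrary choice.
    """
    ordered = sorted(scores, key=scores.get, reverse=higher_is_more_deprived)
    n = len(ordered)
    # ceil(rank * 5 / n) using integer arithmetic
    return {area: (rank * 5 + n - 1) // n for rank, area in enumerate(ordered, start=1)}


# Hypothetical scores for ten LSOAs (higher = more deprived).
example = {f"LSOA{i}": s for i, s in enumerate([42.1, 8.3, 15.0, 60.2, 23.9,
                                                5.1, 31.4, 12.7, 48.8, 19.5])}
print(deprivation_quintiles(example))
```

With 1909 LSOAs the quintiles hold 381 or 382 areas each, so they are as equal in size as the count allows.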

ORs for the described exposures in the case–control study

Because case–control studies using SID-Cymru will be population based, the relative risk of suicide will be estimated with conditional logistic regression models fitted in SPSS (V.20). Crude ORs will be adjusted for general practice and/or LSOA by matching cases and controls. Unadjusted estimates, confounder-adjusted estimates and their precision (eg, 95% CI) will be produced. Interactions between variables will be assessed with the likelihood ratio test, based on results from the adjusted analysis. The population-attributable risk will be calculated38 from the adjusted relative risks in the full analysis and the distribution of exposures among the cases.
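The authors plan to use SPSS; purely to illustrate the method, the sketch below fits a conditional logistic regression for a single exposure by Newton–Raphson on the conditional likelihood, in which each matched set contributes exp(βx_case) divided by the sum of exp(βx) over the case and its controls. It also shows Miettinen's case-based formula for the population-attributable fraction, PAF = p_c(OR − 1)/OR, where p_c is the proportion of cases exposed; whether reference 38 uses this exact formula is an assumption. For several covariates, established software such as statsmodels' `ConditionalLogit` would be preferable.

```python
import math


def conditional_logit_single(matched_sets, tol=1e-10, max_iter=100):
    """Conditional maximum likelihood estimate of the log OR for one exposure.

    matched_sets: list of (case_x, [control_x, ...]) tuples, one per matched set.
    Returns (beta, standard_error), where exp(beta) is the OR.
    """
    beta, info = 0.0, 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for case_x, control_xs in matched_sets:
            xs = [case_x] + list(control_xs)
            top = max(beta * x for x in xs)              # subtract max for numerical stability
            w = [math.exp(beta * x - top) for x in xs]   # unnormalised weights within the set
            total = sum(w)
            mean = sum(wi * x for wi, x in zip(w, xs)) / total
            mean_sq = sum(wi * x * x for wi, x in zip(w, xs)) / total
            score += case_x - mean              # gradient of the log conditional likelihood
            info += mean_sq - mean * mean       # information: weighted variance within the set
        if info <= 0:
            raise ValueError("no informative (discordant) matched sets")
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta, 1.0 / math.sqrt(info)


def paf_miettinen(p_cases_exposed, odds_ratio):
    """Case-based population-attributable fraction: p_c * (OR - 1) / OR."""
    return p_cases_exposed * (odds_ratio - 1.0) / odds_ratio


# Hypothetical 1:1 matched pairs with a binary exposure:
# 6 pairs with only the case exposed, 3 with only the control exposed, 15 concordant.
pairs = [(1, [0])] * 6 + [(0, [1])] * 3 + [(1, [1])] * 5 + [(0, [0])] * 10
beta, se = conditional_logit_single(pairs)
odds_ratio = math.exp(beta)
ci = (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se))
p_c = sum(case_x for case_x, _ in pairs) / len(pairs)  # proportion of cases exposed
print(round(odds_ratio, 3), [round(c, 2) for c in ci], round(paf_miettinen(p_c, odds_ratio), 3))
```

With 1:1 pairs and a binary exposure the estimate reduces to the classic ratio of discordant pairs (here 6/3), which gives a convenient check; the example uses the crude OR only for brevity, whereas the protocol specifies adjusted estimates. The estimate is unbounded if every discordant set favours one direction.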

We will also report information on the completeness of linkage with each data set.

Ethics and dissemination


A large amount of preliminary work on anonymisation methodologies was undertaken to create the SAIL Databank system,17 ,18 and the SAIL Databank has the required ethical permissions and processes in place to analyse anonymised data. It operates within a robust series of guidelines in line with the Caldicott principles and the National Information Governance Board for Health and Social Care.18 In compliance with the Information Governance Review Panel rulings and the Data Protection Act 1998,39 individual-level data and personal identifier linkage codes will not be removed from the SAIL Databank and all analyses will be carried out within the SAIL Databank gateway at Swansea University, a secure access point to data within the SAIL Databank. The key points of the MRC/Wellcome Trust data sharing policy will be followed.40 ,41


This paper describes the protocol for the development of SID-Cymru, and the research opportunities available from an electronic case–control study of suicides within a whole population. SID-Cymru will have the ability to link suicide cases anonymously to primary and secondary health information along with other social care data, allowing us to review each case's journey through these data sets. The establishment of SID-Cymru and exploration of the linkage methodologies will improve our understanding of those who complete suicide (particularly those not known to mental health services) and will be used to inform service planning and policy decision making and implementation. It will help identify key opportunities and settings for prevention of this tragic event. By so doing, SID-Cymru will join other international databases of suicide research and provide a platform for further investigation and data linkages.

For SID-Cymru to become a functional resource, it is important to be aware of the limits of the available health data. Although such data are widely used in research and offer a broad range of information about treatment and associated conditions, there are issues relating to the quality of patient records, the completeness of the data available and the conclusions that may be drawn from them, perhaps particularly for primary care records.42 Working with routinely collected data in their ‘raw’ format, where duplicates, missing and erroneous entries are common, requires a certain level of database analysis skill. While some administrative and system-based recording issues are easy to identify and account for in individual data sets, it is not always apparent what is correct and what is erroneous at the combined level. This problem is compounded when linkage reveals conflicting information, which can make routine data appear inaccessible and discourage attempts at linkage. Thus, a secondary aim for SID-Cymru is to share the skills developed in establishing a suicide database, to aid colleagues who may lack such analytical expertise, foster greater multidisciplinary collaboration and advance suicide research.

The UK has a strong track record in suicide research, reflected in a wide range of publications and expertise. Successful dedicated suicide research centres exist in Bristol, Manchester and Oxford,43–45 and Scotland has recently started work on ‘ScotSID’.46 These centres of excellence report broadly on suicide, although for logistical reasons they often retain a regional focus in project funding, analytical and interventional work. SID-Cymru will therefore build on and enhance existing UK suicide resources and infrastructure, since the level of linkage available within the SAIL Databank is unique in this field. Unique opportunities within the SAIL Databank include linkage across primary care, secondary care and emergency department data, and with education data. Findings will be disseminated through publications in peer-reviewed journals and presentations at local, national and international conferences. Communication and consultation with key stakeholders from health and social care (eg, primary care, mental health, Royal Colleges), government and other policy makers, as well as the third sector, will take place. Dissemination will be facilitated by the wider roles and responsibilities of research team members in suicide prevention, nationally and internationally.

Implications and significance

Valuable opportunities exist for a wide range of epidemiological and clinical studies on suicide in Wales and SID-Cymru has the potential to become an important resource in facilitating such research, which will be of relevance internationally. In addition to the records that have already been included for linkage with SID-Cymru it is expected that, over time, relevant information from other data sources (eg, the DWP) will be linked to SID-Cymru to provide a wider range of information on issues such as individuals’ circumstances, the nature of their deaths and their contact with extended services. Additionally, linkage can be made with non-routinely collected data sets, such as those held by NCISH. Specific hypotheses that will be explored include: recency of primary care, hospital and emergency department contacts including attendance for self-harm, primary care diagnosis of depression, levels of treatment with antidepressants and trends in such treatment over time, rural and urban geography, contacts for the elderly and levels of physical illness.

Findings of current and projected public health importance will be assessed and presented to support policy makers, commissioners and providers of health and social care in Wales. Non-identifiable information from this project will be made available to researchers in Wales, the UK and international collaborators. The initial project focus will be on identification of cases and controls, data linkage opportunities and methodological issues relating to the establishment of SID-Cymru and routine data linkage (phase 1), before starting data extraction and analysis (phase 2). The proposed data collation and linkage of primary, secondary and emergency department health information together with educational data are currently unique in the UK. Thus it is possible to develop a central repository for information relating to suicide in a whole population, which will be of relevance internationally. This paper describes the design and development of a Suicide Information Database in its infancy.



  • Contributors AJ, the principal investigator, was responsible for the conceptualisation of SID-Cymru. AJ, KL, MD, DG and JS were responsible for the design of SID-Cymru; AJ and LK were responsible for its ongoing operationalisation and drafted the manuscript. All authors read and approved the final manuscript.

  • Funding This study was funded by a grant from the National Institute for Social Care and Health Research, Welsh Government, grant number RFS-12-25. DG is a National Institute for Health Research (England) Senior Investigator.

  • Competing interests None.

  • Ethics approval Ethical approval has been granted for SID-Cymru from Health Information Research Unit's Information Governance Review Panel (IGRP) at the College of Medicine at Swansea University, an independent body consisting of a range of government, regulatory and professional agencies.

  • Provenance and peer review Not commissioned; peer reviewed for ethical and funding approval prior to submission.

  • Data sharing statement It will be possible to access the data after the publication of the results. Researchers interested in collaborations or further information are invited to contact AJ at