
Cohort profile
Cohort profile: the East London Health and Care Partnership Data Repository: using novel integrated data to support commissioning and research
  1. Amy Ronaldson (1, 2)
  2. Evangelos Chandakas (3)
  3. Qiongwen Kang (3)
  4. Katie Brennan (3)
  5. Aminat Akande (3)
  6. Irene Ebyarimpa (3)
  7. Eleanor Wyllie (4)
  8. George Howard (4)
  9. Richard Fradgley (5)
  10. Mark Freestone (2)
  11. Kamaldeep Bhui (2, 6)

  1. Department of Psychological Medicine, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
  2. Centre for Psychiatry, Wolfson Institute of Preventive Medicine, Barts & The London School of Medicine, Queen Mary University of London, London, UK
  3. NHS Tower Hamlets Clinical Commissioning Group, London, UK
  4. Healthy London Partnership, London, UK
  5. East London NHS Foundation Trust, London, UK
  6. Department of Psychiatry, University of Oxford, Oxford, UK

Correspondence to Dr Amy Ronaldson; amy.ronaldson{at}kcl.ac.uk

Abstract

Purpose The East London Health and Care Partnership (ELHCP) Data Repository was established to support commissioning decisions in London. This dataset comprises routine clinical data for the general practitioner (GP)-registered populations of two London boroughs, Tower Hamlets and City and Hackney, and provides a rich source of demographic, clinical and health service use data of relevance to clinicians, commissioners, researchers and policy makers. This paper describes the dataset in its current form, its representativeness and data completeness.

Participants There were 351 749 and 344 511 members of the GP-registered population in the two boroughs, respectively, for the financial year 2017/2018. Demographic information and prevalence data were available for 9 mental health and 15 physical health conditions. Prevalence rates from the cohort were compared with local and national data. In order to illustrate the health service use data available in the dataset, emergency department use across mental health conditions was described. Information about data completeness was provided.

Findings to date The ELHCP Data Repository provides a rich source of information about a relatively young, urban, ethnically diverse population living in areas of socioeconomic deprivation. Prevalence data were in line with local and national statistics, with some exceptions. Physical health conditions were more common in those with mental health conditions, reflecting that comorbidity is the norm rather than the exception; this has implications for integrated care. Data completeness for risk factors (eg, blood pressure, cholesterol) was high in patients with long-term conditions.

Future plans The data are being further cleaned and evaluated using imputation, Bayesian and economic methods, principally focusing on specific cohorts, including type 2 diabetes, depression and personality disorder. Data collection will continue for the foreseeable future to support commissioning decisions; as data become available at the end of each financial year, longer-term prospective analyses will also become possible.

  • mental health
  • health services
  • research cohort
  • commissioning
  • health policy

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.


Strengths and limitations of this study

  • This dataset will allow for the analysis of a relatively young, urban, ethnically diverse and considerably deprived population: groups that are hard to reach with traditional survey methods.

  • The data are linked across care settings, allowing for the evaluation of integrated interventions.

  • The large amount of data on health service utilisation and associated spend means this dataset can be used to support healthcare commissioning and health service delivery.

  • Reliance on primary care data for diagnostic information means that prevalence rates can be underestimated and certain information can be missed.

  • As is the case with all routine clinical data, the dataset requires extensive cleaning to remove errors.

Introduction

The use of observational data is fundamental to healthcare service provision and health research, and increasingly large amounts of routinely collected clinical data are becoming available. The main advantages of these datasets are their comprehensive nature and large patient numbers.1 The development of these large routine datasets also allows for the application of big data analytic techniques within health research.2 These techniques can evaluate population characteristics, identify risk factors and allow for the development of predictive models relating to clinical outcomes.1 3 Moreover, these datasets can identify areas of high health service utilisation and high-cost patients, which might support decision making around health service delivery for commissioners, policy makers and healthcare providers. To date, several large datasets comprising routine clinical data have been developed in the UK; these are being used to inform clinical practice and healthcare delivery.4–6

Recently, a large-scale routine clinical dataset has been developed in the London borough of Tower Hamlets, UK, comprising data from the general practitioner (GP)-registered populations of Tower Hamlets and its neighbouring borough, City and Hackney. The East London Health and Care Partnership (ELHCP) Data Repository was developed primarily to support commissioning decisions and health service delivery in East London, and it contains rich information relevant to population health and clinical research. These are distinctive inner-city areas with diverse populations and high levels of deprivation. Tower Hamlets is home to the largest Bangladeshi community in England.7 It also has the highest rates of poverty, child poverty and unemployment of any London borough.8 Of particular interest to researchers of severe mental illness (SMI), City and Hackney has the highest incidence of psychosis in England.9

The aim of this cohort profile is to introduce and describe the dataset in Tower Hamlets and City and Hackney, covering population demographics, prevalence rates of mental and physical health conditions, health service utilisation and data completeness. We hope that this cohort profile will facilitate the development of similar datasets elsewhere. These data should also prove useful for clinicians, researchers, commissioners and policy makers, particularly those with an interest in understanding health in urban, diverse and deprived areas.

The development of the ELHCP Data Repository

The ELHCP Data Repository was developed by Tower Hamlets Clinical Commissioning Group (CCG) (https://www.towerhamletsccg.nhs.uk/) and North East London Commissioning Support Unit (NEL CSU) (http://www.nelcsu.nhs.uk/), with funding provided by Tower Hamlets CCG and Tower Hamlets Together (THT) Vanguard (https://www.towerhamletstogether.com/). Development of the dataset started in 2015 and efforts to expand the dataset (eg, to all areas in North East London) and improve data quality are ongoing.

The dataset comprises up-to-date patient-level data for the GP-registered populations of the London boroughs of Tower Hamlets and City and Hackney and is updated every 3 months. All patient data are depersonalised and linked by replacing patient National Health Service (NHS) numbers with unique ID numbers. Data pertaining to geographic area (ie, Lower Super Output Area (LSOA)) and other population-level data are drawn from data aggregated by the Office for National Statistics. Presently, secure data streams are in place for acute (inpatient, outpatient and emergency) services, community services, mental health services, substance misuse services and primary care services (eg, diagnostic information, prescribing). Efforts are underway to establish secure data streams for social care data, public health data (eg, sexual health) and urgent care data (eg, walk-in centres). However, challenges in linking social care data mean that sensitive information relating to housing and benefits will not be integrated into the dataset.

As illustrated in figure 1, data for acute, community and mental health services flow to the CCG via NHS Digital, the Data Services for Commissioners Regional Office (DSCRO) and local CSUs. Pseudonymisation of these data is performed by the DSCRO hosted by the CSU. The data then flow to the local CSU, which formats and organises them for use at the CCG. Primary care data flow to NEL CSU, where they undergo a pseudonymisation process that mirrors the one performed by the DSCRO. These data then flow to the local CSU alongside the acute, community and mental health data, where they are added to the ELHCP Data Repository.
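The paper does not specify how the DSCRO and NEL CSU pseudonymise records. The sketch below assumes a keyed hash (HMAC-SHA256), one common approach, purely to illustrate why applying the same process to every data flow lets records for one patient be linked under a shared ID without the NHS number itself. The key, function names and record fields are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret key held only by the pseudonymising body; in practice it
# would be managed securely and never distributed with the data.
SECRET_KEY = b"example-key-held-by-pseudonymiser"


def pseudonymise(nhs_number: str, key: bytes = SECRET_KEY) -> str:
    """Map an NHS number to a stable pseudonymous ID using HMAC-SHA256.

    The same NHS number and key always give the same ID, so records from
    different data flows can be linked without revealing the NHS number.
    """
    normalised = nhs_number.replace(" ", "")  # "123 456 7890" -> "1234567890"
    digest = hmac.new(key, normalised.encode("ascii"), hashlib.sha256)
    return digest.hexdigest()


def link_streams(*streams):
    """Group records from several (name, records) data flows by pseudonymous ID."""
    linked = {}
    for stream_name, records in streams:
        for record in records:
            pid = pseudonymise(record["nhs_number"])
            # Keep only the non-identifying fields alongside the pseudonym.
            fields = {k: v for k, v in record.items() if k != "nhs_number"}
            linked.setdefault(pid, {}).setdefault(stream_name, []).append(fields)
    return linked


acute = ("acute", [{"nhs_number": "123 456 7890", "attendance": "A&E"}])
primary = ("primary_care", [{"nhs_number": "1234567890", "diagnosis": "asthma"}])
linked = link_streams(acute, primary)
```

Because both flows use the same key, the spaced and unspaced forms of the same fictional NHS number end up under one pseudonymous ID; records pseudonymised with a different key would not link.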

Figure 1

Diagram of data flows from healthcare providers to Tower Hamlets and City and Hackney CCGs. CCG, Clinical Commissioning Group; CDS, Commissioning Data Sets; CSU, Commissioning Support Unit; DCB, Data Coordination Board; DSCRO, Data Services for Commissioners Regional Office; NEL, North East London; NHS, National Health Service.

Methods

This cohort profile is a descriptive piece detailing the following:

  1. Demographic information (eg, age, sex, ethnicity and deprivation).

  2. Prevalence rates of mental and physical health conditions.

  3. Data relating to health service utilisation in patients with mental health conditions.

  4. Information about data completeness.

All data presented in this profile were from financial year 2017/2018.

Population demographics

Demographic information for these populations included age, sex, ethnicity and deprivation index; information on body mass index (BMI) and smoking status was also available. Information about age, sex, ethnicity, BMI and smoking status came from primary care records. Patients were classified into nine ethnic groups: white or not stated; Indian; Pakistani; Bangladeshi; other Asian; Black Caribbean; Black African; Chinese; other ethnic group. Deprivation was measured using the Index of Deprivation. This measure is provided for LSOAs, small areas designed to be socially homogeneous and to have a defined population size (1000–1500 residents).10 The ELHCP dataset allocates each LSOA in Tower Hamlets and City and Hackney to a decile of deprivation, with ‘1’ being the most deprived decile.
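The ELHCP dataset receives deprivation deciles ready-made, so the sketch below is only an illustration of the ranking logic, assuming a hypothetical mapping from area codes to deprivation scores in which a higher score means more deprived. It ranks areas from most to least deprived, splits them into ten groups (decile 1 = most deprived) and then computes the share of patients living in deciles 1 and 2, the summary reported later in this profile. It is not the official Index of Deprivation methodology.

```python
import math


def assign_deprivation_deciles(scores):
    """Assign each area a decile from 1 (most deprived) to 10 (least deprived).

    `scores` maps an area code (eg, an LSOA code) to a deprivation score where
    a higher score means more deprived. Areas are ranked from most to least
    deprived and split into ten (near-)equal-sized groups.
    """
    ranked = sorted(scores, key=lambda area: scores[area], reverse=True)
    n = len(ranked)
    return {area: math.floor(rank * 10 / n) + 1 for rank, area in enumerate(ranked)}


def share_in_most_deprived(patient_areas, deciles, top=2):
    """Proportion of patients whose area falls in deciles 1..top.

    Patients whose area has no decile (eg, a missing LSOA) are excluded
    from the denominator.
    """
    known = [deciles[area] for area in patient_areas if area in deciles]
    return sum(1 for d in known if d <= top) / len(known)


# Hypothetical scores for 20 areas; "L19" has the highest score (most deprived).
deciles = assign_deprivation_deciles({f"L{i}": i for i in range(20)})
share = share_in_most_deprived(["L19", "L17", "L0", "L5", "missing"], deciles)
```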

Mental and physical health conditions

Primary care data were used to identify and define 9 mental health conditions and 15 physical health conditions, which are listed in box 1. Variables pertaining to each condition were formulated by NEL CSU using EMIS Read Codes (V.2 and V.3); lists of the codes used to formulate each variable are available on request. All physical health conditions were deemed long-term conditions; therefore, any patient who had ever had a primary care diagnosis recorded was included in the relevant physical health cohort. The same approach was applied to SMI (ie, schizophrenia, psychosis, bipolar disorder), personality disorder, anorexia, bulimia, dementia and learning disabilities. As depression, low mood and anxiety can be more episodic, further work was done to determine the time frames that should be used to define these cohorts. Based on local clinical and academic advice and available evidence,11 we decided to create two patient cohorts for each of these conditions: patients who had ever received a diagnosis and patients who had a diagnosis recorded in the last 12 months.
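The two cohort definitions can be sketched as follows. This assumes, hypothetically, that coded primary care events have already been mapped to condition labels (the actual Read Code lists are available only on request), approximates "the last 12 months" as 365 days, and uses the end of financial year 2017/2018 as an example reference date.

```python
from datetime import date, timedelta


def define_cohorts(events, condition, reference_date, window_days=365):
    """Return (ever, recent) sets of patient IDs for one condition.

    `events` is an iterable of (patient_id, condition, record_date) tuples,
    eg, one row per diagnostic code recorded in primary care.
    - ever:   a code for the condition recorded on or before reference_date
    - recent: a code recorded in the `window_days` days up to reference_date
    """
    window_start = reference_date - timedelta(days=window_days)
    ever, recent = set(), set()
    for patient_id, event_condition, record_date in events:
        # Ignore other conditions and anything recorded after the reference date.
        if event_condition != condition or record_date > reference_date:
            continue
        ever.add(patient_id)
        if record_date > window_start:
            recent.add(patient_id)
    return ever, recent


events = [
    ("p1", "depression", date(2010, 5, 1)),   # diagnosed long ago
    ("p2", "depression", date(2017, 9, 1)),   # diagnosed within the year
    ("p3", "anxiety", date(2017, 9, 1)),      # different condition
    ("p4", "depression", date(2018, 6, 1)),   # after the reference date
]
ever, recent = define_cohorts(events, "depression", date(2018, 3, 31))
```

Here the "ever" cohort contains p1 and p2, while the 12-month cohort contains only p2, which illustrates how a recency window shrinks the cohort.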

Box 1

Mental and physical health conditions recorded in primary care records

Mental health conditions

Depression; low mood*; anxiety; severe mental illness; personality disorder; anorexia; bulimia; dementia; learning disability.

Physical health conditions

Type 1 diabetes, type 2 diabetes, asthma, chronic obstructive pulmonary disorder, cancer (excluding basal cell carcinoma), stroke, coronary heart disease, heart failure, atrial fibrillation, hypertension, peripheral artery disease, chronic kidney disease, liver/pancreas complications (eg, hepatitis, liver disease, chronic pancreatitis), osteoporosis, epilepsy.

  • *Low mood is often recorded in primary care in place of depression and is therefore important to consider in any analyses relating to mental health.

Examining how each definition affected activity in secondary care services indicated that the depression, low mood and anxiety cohorts were best defined as including anyone who had ever had a diagnosis recorded in primary care. This was partly because restricting the cohorts to patients diagnosed in the past 12 months omitted a considerable proportion of patients who likely had common mental health disorders (CMDs, ie, depression and/or anxiety). If required, defined time windows can be applied to conditions where full, prolonged remission or cure is possible, for example, cancer.

Prevalence rates for each condition in both Tower Hamlets and City and Hackney were calculated from the data and compared with local and national prevalence data from other sources. To illustrate the bidirectional links between mental and physical health, we also calculated the prevalence rates of all 15 physical health conditions across 6 of the mental health conditions.
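A minimal sketch of the crude (unstandardised) prevalence calculations described above, assuming each cohort is held as a set of pseudonymous patient IDs. The function names and the toy population are hypothetical.

```python
def prevalence(cohort, population):
    """Crude prevalence: percentage of `population` who are also in `cohort`."""
    return 100 * len(cohort & population) / len(population)


def prevalence_by_group(physical_cohort, group, population):
    """Prevalence of a physical condition inside and outside a patient group.

    Returns (percentage among patients in `group`, percentage among the rest).
    """
    in_group = group & population
    rest = population - in_group
    return prevalence(physical_cohort, in_group), prevalence(physical_cohort, rest)


# Hypothetical example: 100 registered patients, 10 with SMI, 5 with type 2 diabetes.
population = set(range(100))
smi = set(range(10))
type2_diabetes = {0, 1, 2, 50, 60}

overall = prevalence(type2_diabetes, population)
with_smi, without_smi = prevalence_by_group(type2_diabetes, smi, population)
```

In this toy example, 3 of 10 patients with SMI have type 2 diabetes (30%) versus 2 of the remaining 90 (about 2.2%).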

Health service utilisation

The health service utilisation data outlined in this profile cover use of emergency services as an illustrative example, described for specific mental health conditions. Data for elective and non-elective inpatient admissions and outpatient appointments are also available in the ELHCP Data Repository. These data come from the Secondary Uses Service (SUS) dataset and provide the number of attendances, admissions or appointments per patient per year. There are also data on associated spend (eg, spend per activity, spend per patient per year), which are currently being validated and are therefore not presented in this paper.

Data completeness

As the ELHCP Data Repository comprises routinely collected data from several different data flows, there is considerable variation in completeness across variables. Completeness of certain variables appears to be better in patients with diagnoses of mental/physical health conditions compared with those with no diagnoses. We used type 2 diabetes and chronic obstructive pulmonary disorder (COPD) as exemplar physical conditions, and depression and SMI as exemplar mental conditions to demonstrate how data quality differs across patient groups. We provided completeness counts on several demographic and clinical variables including age, sex, deprivation level, ethnicity, smoking status, alcohol use, BMI, weight (kg), systolic blood pressure, diastolic blood pressure and cholesterol. All variables were from primary care datasets. Alcohol use was measured using a composite variable of combined scores from the Alcohol Use Disorders Identification Test (AUDIT) and AUDIT-consumption (AUDIT-C) completed in primary care.
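Completeness here is simply the percentage of patients with a non-missing value for each variable, compared between patients with and without a given condition. The sketch below assumes, hypothetically, one dict per patient with a boolean condition flag; the field names are illustrative only.

```python
def completeness(records, variables):
    """Percentage of records with a non-missing value for each variable.

    A value of None or an absent key counts as not recorded. Returns None
    for every variable if there are no records.
    """
    n = len(records)
    if n == 0:
        return {var: None for var in variables}
    return {
        var: 100 * sum(1 for r in records if r.get(var) is not None) / n
        for var in variables
    }


def completeness_by_group(records, variables, condition_flag):
    """Completeness among patients with and without a condition flag."""
    with_flag = [r for r in records if r.get(condition_flag)]
    without_flag = [r for r in records if not r.get(condition_flag)]
    return {
        "with": completeness(with_flag, variables),
        "without": completeness(without_flag, variables),
    }


patients = [
    {"type2_diabetes": True, "bmi": 28.0, "systolic_bp": 130},
    {"type2_diabetes": True, "bmi": 31.0, "systolic_bp": None},
    {"type2_diabetes": False, "bmi": None},
    {"type2_diabetes": False, "bmi": 22.0},
]
summary = completeness_by_group(patients, ["bmi", "systolic_bp"], "type2_diabetes")
```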

Cohort description and findings to date

Population demographics

The GP-registered populations of Tower Hamlets and City and Hackney in financial year 2017/2018 were 351 749 and 344 511, respectively. See table 1 for a summary of population demographics for both boroughs. Data for age, sex and smoking status were recorded for the complete GP-registered populations in both East London boroughs. Deprivation index was recorded for most of the population (99.8% complete in both boroughs). Ethnicity data were recorded to a good level of completeness: 86.6% in Tower Hamlets and 84.6% in City and Hackney. BMI data were more problematic and were recorded in the primary care records of less than a quarter of GP-registered patients (Tower Hamlets: 23.4%, City and Hackney: 24.0%).

Table 1

Population demographics for the GP-registered populations of Tower Hamlets and City and Hackney

Demographic data suggested that both Tower Hamlets and City and Hackney are ‘young’ boroughs relative to English national data12 (median age 39 years), with mean ages of 32 years (median 31 years) and 34 years (median 33 years), respectively. In terms of ethnicity, the largest group in the GP-registered populations of both Tower Hamlets (38.2%) and City and Hackney (46.8%) was of white (or not stated) ethnicity. A considerable proportion of the population of Tower Hamlets was of Bangladeshi ethnicity (27.4%), and a sizeable proportion of the City and Hackney population belonged to ‘other ethnic group’ (16.8%). Most of the GP-registered population in both boroughs lived in areas of high deprivation: 58.8% of the population in Tower Hamlets and 54.2% in City and Hackney lived in deciles 1 and 2. Approximately 20% of the population were recorded in primary care as smokers. Although BMI is poorly recorded in primary care records, mean BMI among patients with a recorded value indicated that these patients were, on average, overweight (Tower Hamlets: 26.3±8.1; City and Hackney: 27.1±8.6).

Prevalence rates of mental and physical health conditions

Prevalence rates for GP-coded diagnoses of mental health conditions in Tower Hamlets and City and Hackney are provided in figure 2. Prevalence rates were similar across both boroughs for all conditions. Comparative local and national prevalence data for each condition are provided in table 2. Prevalence rates from the ELHCP Data Repository were not age-standardised or sex-standardised, to allow direct comparison with the unstandardised local prevalence data from Public Health England (PHE). The GP-coded prevalence of CMDs was 16.8% and 17.5% in Tower Hamlets and City and Hackney, respectively. Here, CMD comprised diagnoses of depression, anxiety and low mood. Low mood is often recorded both in place of and alongside depression in primary care and is therefore important to consider in a measure of CMD; for example, in Tower Hamlets, 33% of patients with a diagnosis of depression also had a diagnosis of low mood. CMD prevalence was in line with data from the latest Adult Psychiatric Morbidity Survey (APMS), which reported that 17% of the English population have a CMD.13 However, the GP-coded prevalence of depression in Tower Hamlets (8.7%) and City and Hackney (11.4%) was higher than PHE prevalence data14 for the same areas in the same financial year (7.3% and 9.6%, respectively). This may be because the GP-coded prevalence in this profile includes any diagnosis of depression ever recorded, which likely inflated rates somewhat.
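Because low mood is often recorded alongside depression, adding the separate prevalences of depression, anxiety and low mood would count some patients more than once. A composite CMD measure therefore needs the union of the three cohorts. The sketch below assumes hypothetical sets of pseudonymous patient IDs and also computes the depression/low-mood overlap quoted above.

```python
def cmd_summary(depression, anxiety, low_mood, population):
    """Summarise a composite common mental disorder (CMD) measure.

    CMD is the union of patients with depression, anxiety or low mood, so a
    patient with several of these codes is counted once. Returns CMD
    prevalence (%) and the percentage of patients with depression who also
    have low mood recorded.
    """
    cmd = (depression | anxiety | low_mood) & population
    cmd_prevalence = 100 * len(cmd) / len(population)
    depression_in_pop = depression & population
    overlap = 100 * len(depression_in_pop & low_mood) / len(depression_in_pop)
    return cmd_prevalence, overlap


# Hypothetical toy cohorts within 100 registered patients.
population = set(range(100))
cmd_prevalence, overlap = cmd_summary(
    depression={0, 1, 2}, anxiety={2, 3}, low_mood={0, 4, 5}, population=population
)
```

Summing the three toy cohorts would give 8 patients, but the union contains only 6 distinct patients (6%).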

Figure 2

GP-coded prevalence rates of mental health conditions in Tower Hamlets and City and Hackney, financial year 2017/2018. *CMD incorporates depression, anxiety and low mood. CMD, common mental health disorder; GP, general practitioner.

Table 2

GP-coded prevalence of mental health conditions in Tower Hamlets and City and Hackney compared with local and national prevalence data (sources: APMS 2014, PHE Public Health Profiles, PHE Psychosis Data Report 2016)

GP-coded prevalence of SMI (1.3%) from the ELHCP Data Repository was in line with the PHE Psychosis Data Report from 2016,9 which reported SMI prevalence rates of 1.32% and 1.36% for Tower Hamlets and City and Hackney, respectively. Prevalence rates for learning disability were also in line with local and national data provided by PHE.14 However, the GP-coded prevalence of personality disorder, eating disorders and dementia in the ELHCP Data Repository differed from local and national PHE14 and APMS13 data, suggesting that these conditions might be under-recorded in primary care in these boroughs.

Prevalence rates for GP-coded diagnoses of physical health conditions in the ELHCP Data Repository are provided in figure 3. Prevalence rates were similar across boroughs for most conditions; however, primary care records showed higher levels of type 2 diabetes in Tower Hamlets and higher levels of hypertension in City and Hackney. Comparative local and national prevalence data for each condition are provided in table 3; these data were sourced from PHE public health profiles.14 Overall, GP-coded prevalence of physical health conditions from the ELHCP Data Repository appears to be in line with PHE local data for the same areas, with one notable discrepancy: levels of asthma were much higher in the ELHCP Data Repository. This might be due to overdiagnosis of asthma in primary care, which has been reported in both adults15 and children.16 Moreover, PHE prevalence data are taken directly from Quality and Outcomes Framework (QOF) registers, which might differ from other sources owing to coding or definitional issues, as well as population churn. Within the ELHCP Data Repository, ‘liver disease’ comprises many liver complications, for example, hepatitis, fatty liver disease and alcoholic liver disease. It was therefore not possible to compare the prevalence of liver disease directly with local and national statistics, as PHE provide individual prevalence rates for each liver disease subtype.

Figure 3

General practitioner-coded prevalence rates of physical health conditions in Tower Hamlets and City and Hackney, financial year 2017/2018. COPD, chronic obstructive pulmonary disorder.

Table 3

GP-coded prevalence of physical health conditions in Tower Hamlets and City and Hackney compared with local and national prevalence data (source: PHE Public Health Profiles)

Prevalence of physical health conditions among mental health conditions

Mental and physical health are inextricably linked.17 Having a mental health condition predisposes an individual to develop problems with their physical health. At the same time, living with a physical health condition can greatly impact mental health. Figure 4A,B illustrates the prevalence of physical health conditions among mental health conditions in the ELHCP Data Repository. Mental conditions included in this analysis were depression, low mood, anxiety, SMI and personality disorder.

Figure 4

(A) Prevalence of physical health conditions among mental health conditions in Tower Hamlets. (B) Prevalence of physical health conditions among mental health conditions in City and Hackney. CHD, coronary heart disease; CKD, chronic kidney disease; COPD, chronic obstructive pulmonary disorder.

Results indicate that levels of all 15 physical health conditions were increased in those with mental health conditions, and this pattern was similar across boroughs. The most pronounced increases in physical health conditions were seen in patients flagged by their GP as having depression, SMI or personality disorder. Striking findings included the increased levels of type 2 diabetes and hypertension among patients flagged by their GP as having SMI. Type 2 diabetes is known to be elevated in patients with SMI.18 19 This increase might be brought about through the use of antipsychotic medications known to be both diabetogenic and linked with weight gain,20 as well as through social deprivation, poor nutrition, chronic stress and lifestyle changes.21 Associations between SMI and hypertension reported in the literature have been more mixed.21 Another striking finding in both boroughs was the high prevalence of asthma among people with a primary care diagnosis of personality disorder. This may be linked to smoking, which is more common in patients with personality disorder.22 Personality disorder has also been linked to hypochondriasis and perceived breathlessness,23 which might further contribute to higher levels of asthma diagnosis in primary care for these patients.

Accident and emergency attendances across mental health conditions

We used accident and emergency (A&E) attendance as an example of a health service utilisation outcome to illustrate some of the activity data available in the ELHCP Data Repository. Data relating to use of A&E across both boroughs are detailed in table 4. Crude A&E attendance rates (per 1000 population) were 54% higher in Tower Hamlets and 76% higher in City and Hackney among patients with GP-recorded mental health conditions than among those without. Increased attendance rates were particularly pronounced in patients with dementia, personality disorder and SMI. Similar health service utilisation data exist for inpatient admissions and outpatient services.
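The comparison above rests on two simple calculations: a crude rate per 1000 population and the relative difference between two rates. The worked example below uses hypothetical counts, not the figures in table 4, chosen so that the result matches the 54% increase reported for Tower Hamlets.

```python
def rate_per_1000(attendances, population):
    """Crude attendance rate per 1000 people (no age or sex standardisation)."""
    return 1000 * attendances / population


def percent_increase(rate_exposed, rate_reference):
    """Relative difference between two rates, as a percentage."""
    return 100 * (rate_exposed - rate_reference) / rate_reference


# Hypothetical counts for illustration only.
with_mh = rate_per_1000(1540, 4000)      # 385 attendances per 1000
without_mh = rate_per_1000(5000, 20000)  # 250 attendances per 1000
increase = percent_increase(with_mh, without_mh)
```

Because these are crude rates, part of any difference may reflect age and sex differences between the groups rather than mental health status itself.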

Table 4

The impact of mental health conditions on A&E attendances, financial year 2017/2018

Data completeness

The ELHCP Data Repository relies on data entered by healthcare provider organisations (GP practices, clinical coders in acute care settings, etc). It comprises routinely collected data whose completeness can vary across variables. Examination of four example physical and mental health conditions illustrated how data completeness varies across patient groups. Table 5 details completeness on key demographic and clinical variables for patients with type 2 diabetes, COPD, depression and SMI in Tower Hamlets; all of these data were recorded in primary care databases. Data on age, sex, deprivation level, ethnicity and smoking status were complete for all patients included in the analysis. Data on alcohol use and clinical factors differed between those with and without health conditions: they were considerably more complete for patients registered as having a condition. This is likely attributable to the financial incentive schemes introduced as part of the UK’s QOF to improve clinical management of patients with chronic conditions.4 24 This illustrates that the ELHCP Data Repository is well suited to research on individual physical and mental health conditions using primary care data, where data completeness is high.

Table 5

ELHCP data completeness (%) across physical and mental health conditions in Tower Hamlets, financial year 2017/2018

Patient and public involvement

The primary purpose of this project was for the health and care sector to use the data for system planning and effective commissioning; therefore, public stakeholders in this case comprised primary care managers, commissioners (GPs), public health experts and clinicians in primary and secondary care. A clinical expert group was convened consisting of 15 clinicians, commissioners and public health experts from primary care and secondary care (mental health) services. This group met on a quarterly basis during initial profiling of the cohort and provided feedback on the analysis and on emerging clinical and demographic characteristics.

Data security

The dataset contains sensitive patient-level data; therefore, it was and remains important to ensure appropriate protections are in place. All data are deidentified, with no information such as name, home address or NHS number. Also, certain sensitive health data such as records relating to trauma and abuse, or specific sexual health data, are not visible.

To further preserve anonymity and to ensure proportionate and appropriate use of the data, users can view only aggregated records, and only where an aggregate contains at least five individuals; searches returning fewer than five records are suppressed. Access to the complete deidentified records is further restricted to specific individuals who are employed by, or under contract with, the health and care sector, who know how to use the data effectively and who have undergone appropriate training.
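The small-number rule described above can be sketched as a simple primary suppression step. The threshold of five comes from the text; the function name and the input format (a mapping from category to patient count) are hypothetical. This sketch does not perform secondary suppression, which would also be needed if suppressed cells could be recovered by subtracting visible cells from a published total.

```python
SUPPRESSION_THRESHOLD = 5  # minimum number of individuals an aggregate may show


def suppress_small_counts(counts, threshold=SUPPRESSION_THRESHOLD):
    """Replace any aggregate count below `threshold` with None (suppressed).

    `counts` maps a category (eg, an LSOA or a condition) to a patient count.
    """
    return {
        category: (count if count >= threshold else None)
        for category, count in counts.items()
    }


shown = suppress_small_counts({"area_a": 12, "area_b": 4, "area_c": 5, "area_d": 0})
```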

This dataset has been compiled for the health and care sector to use for system planning. Plans are currently being made to deploy the data into an electronic platform for internal use. Recognising the possible value of these data to researchers, there may be scope to support analysis deemed to be in the public interest, with appropriate data protections in place. Currently, permissions must be sought on a case-by-case basis and approved by the relevant information governance (IG) groups.

Strengths and limitations

Strengths

One of the main strengths of the dataset is that it provides the opportunity to analyse a relatively young, urban, ethnically diverse and considerably deprived population: groups that are hard to reach with traditional survey methods. Given that patient records are linked across care settings, the data can be used to evaluate the impacts of integrated interventions. In this profile, we have illustrated that the ELHCP Data Repository is representative in terms of the prevalence of most mental and physical health conditions compared with other sources of local and national prevalence data, supporting its suitability for real-world research and decision making. This dataset also allows for the analysis of comorbidities and multimorbidity. Moreover, the large amount of data available on health service utilisation and associated spend means that the dataset can be used to understand what drives use and spend, and to support healthcare commissioning and health service delivery.

The dataset is in regular use by commissioners to better understand the health and care needs of the population, highlight where there is scope for improved patient care and evaluate the impact of health and care interventions on patients and the system. This has provided commissioners, and others working on health and care system integration, with more sophisticated evidence and analysis to inform decisions about where to target improvement efforts to maximise benefit to both patients and the system. The findings presented in this cohort profile have highlighted where links between mental and physical health are most prevalent in the population and helped to shape commissioning priorities accordingly. For example, analysis highlighting mental and physical comorbidity supported conversations about expanding Improving Access to Psychological Therapies (IAPT) services within long-term condition pathways, noting that this may be particularly impactful for people with type 2 diabetes, respiratory conditions and cancer. Furthermore, over the next year, Tower Hamlets is investing in a new 24/7 core fidelity crisis service and in transforming community mental health services so that they become embedded in primary care networks. These new models will support the assessment and treatment of patients in the right place at the right time. The ELHCP Data Repository described in this cohort profile will be an important tool in evaluating the impact these service changes have on patients and the healthcare system.

In terms of completeness, this paper highlighted that data were considerably more complete for patients with long-term mental or physical illness (ie, type 2 diabetes, COPD, depression, SMI) compared with patients without. This supports the suitability of the dataset for the analysis of patients with long-term conditions.

Limitations

In terms of limitations, prevalence data for some conditions were not in line with local and national data, suggesting that certain conditions, namely personality disorder, eating disorders and dementia, may be under-recorded in primary care. Also, as prevalence data rely on GP diagnoses, it is not possible to gauge disease duration, because the date on which a diagnosis is first recorded might not be the date on which the patient received the diagnosis. For example, a diagnosis might have been recorded on the day the patient registered with their GP, reflecting that date rather than the date of diagnosis. In terms of data completeness, although this is good for patients with long-term conditions, there is considerable missingness among ‘healthy’ patients in the dataset. This has also been reported in a study of large UK primary care databases, which partially attributed the discrepancy in missing data to GP incentive schemes.25 Imputation can address this to some degree. As is the case with all routine clinical data, the dataset requires extensive cleaning before analysis to remove duplicate entries and spurious values outside clinical ranges (eg, BMI, blood pressure). Another limitation is the amount of data available: currently, two financial years of data are of an acceptable standard for research and/or decision support. New data continue to be collected and to flow into the dataset, so the available time series will lengthen over time. A further limitation is the difficulty of measuring population mobility in a dataset of this kind. Length of GP registration can serve as a proxy, but this is problematic because it is not possible to determine whether patients moved within the borough or came from outside.
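The two cleaning steps mentioned above, removing duplicate entries and discarding spurious out-of-range values, might look like the sketch below. The plausibility ranges are hypothetical placeholders chosen for illustration, not rules used by the ELHCP team; real thresholds would need clinical agreement.

```python
# Hypothetical plausibility ranges for illustration only.
PLAUSIBLE_RANGES = {
    "bmi": (12.0, 80.0),
    "systolic_bp": (60, 260),
    "diastolic_bp": (30, 150),
}


def clean_measurements(rows, ranges=PLAUSIBLE_RANGES):
    """Drop exact duplicate rows and blank out values outside plausible ranges.

    `rows` is a list of dicts, eg, {"patient_id": ..., "bmi": ..., ...}.
    Out-of-range values are set to None (missing) rather than deleting the
    whole row, so other measurements on the same row are kept.
    """
    seen = set()
    cleaned = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            continue  # exact duplicate entry
        seen.add(key)
        fixed = dict(row)
        for var, (low, high) in ranges.items():
            value = fixed.get(var)
            if value is not None and not (low <= value <= high):
                fixed[var] = None
        cleaned.append(fixed)
    return cleaned


rows = [
    {"patient_id": "p1", "bmi": 27.5, "systolic_bp": 128},
    {"patient_id": "p1", "bmi": 27.5, "systolic_bp": 128},  # duplicate entry
    {"patient_id": "p2", "bmi": 275.0, "systolic_bp": 120},  # likely a typo for 27.5
]
cleaned = clean_measurements(rows)
```

Setting implausible values to missing, rather than dropping records, keeps them available for the imputation approaches mentioned above.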

Conclusions and future collaborations

This cohort profile describes a unique patient-linked dataset in East London, which can serve as a model for developing similar datasets in other parts of the UK. The dataset also offers a rare opportunity to analyse a relatively young, urban, diverse and deprived population. It is growing year on year and data quality should improve over time, making it a valuable resource for researchers, clinicians, commissioners and policy makers. The data are being further cleaned and evaluated using imputation, Bayesian and economic methods, including through engagement with machine learning work funded by the Alan Turing Institute to improve and automate the analysis of large-scale electronic medical records (KNIFE project). Queries regarding data access should be directed to Katie Brennan (katie.brennan1@nhs.net) or Nathan Cheetham (nathan.cheetham1@nhs.net) at the NHS Tower Hamlets CCG.

References

Footnotes

  • Twitter @DrRonaldson, @QiongwenKang, @ksbhui

  • Correction notice The title of the paper has been corrected.

  • Contributors AR drafted the manuscript with input and critical revisions from all authors. QK, KBrennan, EW, GH, RF, MF and KBhui provided supervision throughout. KBhui is the principal investigator and grant holder. Analysis of the data was performed by AA, EC and IE. All authors reviewed the paper before submission.

  • Funding Funding for this project was granted by East London NHS Foundation Trust (ELFT) (184091163), the Tower Hamlets Together Board, the Healthy London Partnership Mental Health Transformation Board, and the City and Hackney Transformation Board. This report is independent research supported by the National Institute for Health Research ARC North Thames.

  • Competing interests MF received research grants from East London NHS Foundation Trust and The Alan Turing Institute during the conduct of this study.

  • Patient and public involvement Patients and/or the public were involved in the design, or conduct, or reporting, or dissemination plans of this research. Refer to the Methods section for further details.

  • Patient consent for publication Not required.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data availability statement Data may be obtained from a third party and are not publicly available. The data are housed at the NHS Tower Hamlets Clinical Commissioning Group. Permission to access the data must be requested from Katie Brennan (katie.brennan1@nhs.net) or Nathan Cheetham (nathan.cheetham1@nhs.net) at the NHS Tower Hamlets Clinical Commissioning Group.
