Objectives The Italian project MATRICE aimed to assess how well cases of type 2 diabetes (T2DM), hypertension, ischaemic heart disease (IHD) and heart failure (HF) and their levels of severity can be automatically extracted from the Health Search/CSD Longitudinal Patient Database (HSD). From the medical records of the general practitioners (GP) who volunteered to participate, cases were extracted by algorithms based on diagnosis codes, keywords, drug prescriptions and results of diagnostic tests. A random sample of identified cases was validated by interviewing their GPs.
Setting HSD is a database of primary care medical records. A panel of 12 GPs participated in this validation study.
Participants 300 patients were sampled for each disease, except for HF, where 243 patients were assessed.
Outcome measures The positive predictive value (PPV) was assessed for the presence/absence of each condition against the GP's response to the questionnaire, and Cohen's κ was calculated for agreement on the severity level.
Results The PPV was 100% (99% to 100%) for T2DM and hypertension, 98% (96% to 100%) for IHD and 55% (49% to 61%) for HF. Cohen's kappa for agreement on the severity level was 0.70 for T2DM and 0.69 for hypertension and IHD.
Conclusions This study shows that individuals with T2DM, hypertension or IHD can be validly identified in HSD by automated identification algorithms. Automatic queries for levels of severity of the same diseases compare well with the corresponding clinical definitions, but some misclassification occurs. For HF, further research is needed to refine the current algorithm.
- Validation study
- electronic medical records
- PRIMARY CARE
- DIABETES & ENDOCRINOLOGY
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
Strengths and limitations of this study
This is the first validation study specifically performed in the Italian General Practitioners' (GP) medical records database Health Search.
The positive predictive value of automatic case-finding algorithms for four chronic diseases and their levels of severity was estimated, using manual assessment of the GPs themselves as a gold standard.
A total of 12 GPs contributed to the study and 300 cases were assessed for type 2 diabetes, hypertension and ischaemic heart disease, 243 for heart failure.
This study is part of a national Italian project funded by the Ministry of Health and aimed to validate information systems available in the country to monitor quality of healthcare for chronic diseases.
Participation of the GPs was voluntary, and sensitivity could not be assessed.
Italy is facing an increasing burden of chronic health conditions due to ageing of the population. To provide adequate and fair healthcare across regions, Italy was advised by the Organisation for Economic Co-operation and Development to develop a set of standards around the processes and outcomes of primary care and to develop a national quality governance model to support regions in delivering care of uniform quality across the country.1 The Italian National Agency for Regional Healthcare Services started the MATRICE Project in 2011. MATRICE was aimed at developing tools to compare the quality of healthcare across Italian regions for four chronic diseases: type 2 diabetes (T2DM), hypertension, ischaemic heart disease (IHD) and heart failure (HF). One of the objectives was to assess the validity of routine care data for monitoring the quality of healthcare supply.2 ,3
The Health Search IMS Health Longitudinal Patient Database (HSD) is a longitudinal primary care medical record database that was set up by members of the Italian College of General Practitioners (SIMG). More than 900 physicians, uniformly distributed across Italy, share their de-identified clinical records in the HSD. These data are extensively used for epidemiological and public health research.4 ,5 ,6 ,7 ,8 ,9 ,10 ,11 ,12
The HSD database is very similar to other Primary Care databases: for instance, in the UK, the Clinical Practice Research Datalink (CPRD, formerly GPRD),13 The Health Information Network (THIN)14 and QResearch;15 in Canada, the Canadian Primary Care Sentinel Surveillance Network (CPCSSN);16 in the Netherlands, the Integrated Primary Care Information database (IPCI);17 in Spain, the database Base de datos para la Investigación Farmacoepidemiológica en Atención Primaria (BIFAP).18 A common feature of these countries is that the GP serves a well-defined population and is the gatekeeper to secondary care.
In this type of medical record, every visit is recorded, and all diagnoses, prescriptions and measurements are entered as part of a general practitioner's daily practice. Moreover, information from specialist referrals is reported back to the general practitioner (GP) and stored in the same medical records. The medical records replace the paper records that once existed and may be considered a rather comprehensive list of health problems requiring care. Owing to its longitudinal, population-based nature, this type of database serves well for many research purposes, such as estimating the burden of disease, pharmacoepidemiology and health services research. Since information is collected primarily for the provision of care, diagnoses may not always be coded accurately, and coding accuracy varies by type of disease.19 ,20 Moreover, the same code may be used for different clinical definitions of a disease, when diagnostic standards are not uniform across healthcare communities or change over time. As a consequence, case-finding algorithms that retrieve participants from such data sources using diagnostic codes may unintentionally retrieve participants whose clinical condition does not correspond to the one intended for a study. On the other hand, GPs may record clinical conditions as a free text note, and a retrieval strategy using only coded diagnoses may miss some cases. As a result, algorithms using other sources, such as results from laboratory tests, have been developed to query this type of data source.21 Disease-specific validation studies of case-finding algorithms in GP medical record databases have been performed in CPRD,22 ,23 CPCSSN,20 ,24 IPCI and HSD itself.25 ,26
A panel of experts in the MATRICE Project established precise clinical definitions of T2DM, hypertension, IHD and HF, and identified levels of severity of the conditions that had to be distinguished for the purpose of monitoring healthcare quality. The aim of this study was to estimate the positive predictive value (PPV) of case-finding algorithms to detect such conditions and levels of severity from the GP medical records collected by HSD, against a gold standard based on manual comparison with the clinical definitions chosen by the MATRICE panel.
Italy has a tax-based, universal coverage national health system. Every Italian resident is entitled to choose a GP, although parents might instead opt for a specialist paediatrician for their children, up to the age of 15. Therefore, each resident from the age of 16 onwards is specifically registered with a GP. GPs are the ‘gatekeepers’ of the system, meaning that patients can only access secondary care within the healthcare system on referral of their GP.27 ,1 Secondary care is accessed either free of charge or on a small copayment.
During their daily practice, GPs record all clinical findings, diagnoses and prescriptions in their electronic medical records. GPs participating in HSD all use the same software, which requires that each prescription is associated with a specific disease code. A disease code may be labelled as ‘suspect’ when further clinical ascertainment is needed. Results from laboratory tests may be recorded as well. Moreover, free-text fields are available in the software to collect clinical notes on diagnoses, signs, symptoms and referral letters from specialists or from hospitals. Every 6 months, GPs send their data to a central repository, after anonymisation. The central repository performs quality controls, like estimation of prevalence of common diseases, and selects GPs whose data prove to be accurate.4 Currently, data of 700 of 900 GPs, uniformly distributed across Italy, are considered accurate according to data quality checking.12
Clinical definition of the diseases and of their levels of severity
A panel of cardiologists, diabetologists, epidemiologists and experts in organisation of primary care services participating in the MATRICE Project first established clinical definitions of the four diseases and of their levels of severity. The levels of severity were selected according to whether national and international clinical guidelines contained specific indications for treatment or diagnostic follow-up in the patients with that condition. For instance, a patient with IHD after an episode of acute myocardial infarction (AMI) has an indication for treatment with β-blockers;28 hence, history of AMI is a relevant level of severity for IHD.
The detailed definitions of the diseases and of the levels of severity are depicted in table 1. For T2DM, the clinical definition was at least two abnormal measurements among fasting plasma glucose, or 2-hour plasma glucose after a load of glucose, or glycated haemoglobin; or just one abnormal measurement of plasma glucose if symptoms of hyperglycaemia were observed. Four levels of severity of the disease were identified, according to the presence/absence of indication for insulin and the presence/absence of complications or organ damage. For hypertension, diagnostic criteria were two abnormal measurements for systolic and diastolic blood pressure confirmed either by a Holter blood pressure measurement or by home blood pressure monitoring. Three levels of severity were identified: no organ damage or diabetes or stroke; organ damage or diabetes or stroke without HF; hypertension with HF. For IHD, the clinical definition referred to symptoms (angina pain) or to a history of AMI or to a bioimaging observation of coronary ischaemia. Five levels of severity of IHD were identified: the most severe was HF, and among those free from HF the presence/absence of history of AMI and the presence/absence of percutaneous transluminal coronary angioplasty (PTCA) classified the four levels. HF was also identified as a condition on its own, and the clinical definition was stage C or D of the classification by the American College of Cardiology and of the American Heart Association.29
Case identification in the primary care medical records
A panel comprising epidemiologists from HSD and GPs belonging to SIMG, with expertise in the clinical areas of interest, developed ad hoc algorithms to identify from the GP medical records cases matching the clinical definitions of the MATRICE Project. In each algorithm, the inclusion and exclusion criteria of the clinical definition were mapped to a list of ICD9CM codes or, when deemed necessary, to free text and to conditions on diagnostic tests.
An invitation to participate in the validation study was circulated by SIMG to a sample of GPs, and participation was voluntary.
A data collection plugin for the medical record software was developed by HSD and installed on the computers of the GPs who agreed to participate in the study.
For each disease, the plugin applied the algorithm to the whole list of active patients on the date of data collection and selected a random sample of 25 patients from the resulting list of cases. The plugin then showed the names of the patients in the sample to the GP for assessment. On the same screen, the clinical definitions of the disease and of its levels of severity were presented. The GP was aware that the cases were selected for a specific condition among T2DM, hypertension, IHD and HF, but was blinded to the level of severity. The GP had the choice of indicating that the patient was not affected by the disease (false positive: FP) or that the patient was affected but the level of severity could not be assessed on the basis of the information available to the GP (not staged: NS), or to assign a level. In the process, the GP was free to access the patient's medical record. Figure 1 shows a screenshot of the data collection plugin (patients listed are not real).
When the GP had completed the manual assessment of the four samples of patients, the plugin applied the algorithms for the level of severity, linked the new columns to the data set resulting from the questionnaire, anonymised the final data set and transmitted it to HSD for statistical analysis.
Data collection was performed in July 2013.
The PPV, for presence/absence of the condition and for each of the levels of severity, was computed as

$$\mathrm{PPV} = \frac{TP}{TP + FP}$$

where TP is the number of patients identified by the algorithm and confirmed by the GP and FP is the number identified by the algorithm but not confirmed; 95% CIs were estimated. For the three diseases where levels of severity had been validated (T2DM, hypertension and IHD), Cohen's κ was computed on the categorical distribution of levels of severity. Cohen's κ discounts from the observed concordance $p_o$ (the percentage of participants who are classified in the same level by the algorithms and the GP) the expected concordance $p_e$ (the percentage of participants who would be classified the same if assignment had been performed randomly) by means of the following formula

$$\kappa = \frac{p_o - p_e}{1 - p_e}$$

Cohen's κ provides an overall measure of agreement about levels of severity. Analysis was performed on the whole sample of patients, irrespective of the GP. Analysis was performed using Stata V.12.
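The two statistics described above can be illustrated with a minimal Python sketch. It is not the Stata code used in the study: the function names are hypothetical, and the Wilson score interval is an assumption, since the text does not state which method was used for the 95% CIs.

```python
from collections import Counter
from math import sqrt


def ppv_with_ci(true_pos, false_pos, z=1.96):
    """Return (PPV, lower, upper) using a Wilson score interval.

    The choice of the Wilson interval is an assumption made for this
    sketch; the study only states that 95% CIs were estimated.
    """
    n = true_pos + false_pos
    if n == 0:
        raise ValueError("no validated cases")
    p = true_pos / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, max(0.0, centre - half), min(1.0, centre + half)


def cohen_kappa(algorithm_levels, gp_levels):
    """Cohen's kappa for two equally long lists of severity levels."""
    if len(algorithm_levels) != len(gp_levels) or not gp_levels:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(gp_levels)
    # Observed concordance: share of patients given the same level.
    p_o = sum(a == g for a, g in zip(algorithm_levels, gp_levels)) / n
    # Expected concordance under independent assignment, computed
    # from the marginal distributions of the two raters.
    count_a = Counter(algorithm_levels)
    count_g = Counter(gp_levels)
    p_e = sum(count_a[k] * count_g[k] for k in count_a) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (p_o - p_e) / (1 - p_e)
```

In this sketch, patients marked as false positive by the GP would enter `false_pos`, and only patients with a level assigned by both the algorithm and the GP would be passed to `cohen_kappa`; how the study handled not-staged patients in the κ calculation is not stated in the text.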
Algorithms for primary care medical records
The algorithms detecting the diseases and levels of severity from the primary care medical records are listed in table 2. Each algorithm consists of a sequence of rules, each acting as an inclusion criterion, a refinement criterion (linked to the inclusion criterion with the logical connector AND) or an exclusion criterion (linked to the inclusion criterion with the logical connectors AND NOT). Each rule is itself composed of subqueries (represented in the table by keywords in round parentheses), and subjects matching at least one of the subqueries are included in the rule, that is, the subqueries are linked to each other with the logical connector OR. No specific temporal sequence between subqueries is required. Every subquery selects records matching a specific list of codes, free text keywords and/or diagnostic test levels, which are listed in supplementary material 1. All the subqueries are applied to the whole set of longitudinal observations of the patients up to the index date, except when specified otherwise.
In all subqueries, records labelled with ‘suspect’ were excluded, except in the case of the subquery detecting patients with AMI, used to detect levels of severity 3 and 4 in patients with IHD.
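The rule structure described above (subqueries joined by OR within a rule, rules joined by AND or AND NOT, 'suspect' records skipped) can be sketched as follows. This is an illustrative Python sketch, not the plugin's implementation: the record layout, the helper names and the placeholder codes `X01` and `X99` are all hypothetical, and the restriction to observations up to the index date is omitted for brevity.

```python
from typing import Callable, Iterable, Sequence

# Hypothetical record layout: {"code": str, "text": str, "suspect": bool}
Record = dict
Subquery = Callable[[Record], bool]


def code_in(codes: Iterable[str]) -> Subquery:
    """Subquery matching records whose code is in a given list."""
    wanted = set(codes)
    return lambda rec: rec.get("code") in wanted


def text_contains(keyword: str) -> Subquery:
    """Subquery matching records whose free text contains a keyword."""
    return lambda rec: keyword.lower() in rec.get("text", "").lower()


def rule_matches(records: Sequence[Record], subqueries: Sequence[Subquery],
                 allow_suspect: bool = False) -> bool:
    """A rule matches if any record satisfies at least one subquery (OR)."""
    for rec in records:
        if rec.get("suspect", False) and not allow_suspect:
            continue  # records labelled 'suspect' are skipped by default
        if any(sq(rec) for sq in subqueries):
            return True
    return False


def is_case(records, inclusion, refinements=(), exclusions=()) -> bool:
    """inclusion AND every refinement AND NOT any exclusion."""
    if not rule_matches(records, inclusion):
        return False
    if not all(rule_matches(records, rule) for rule in refinements):
        return False
    return not any(rule_matches(records, rule) for rule in exclusions)


# Placeholder rules: "X01" and "X99" are not real diagnosis codes.
inclusion_rule = [code_in(["X01"]), text_contains("example keyword")]
exclusion_rule = [code_in(["X99"])]
```

Setting `allow_suspect=True` on a single rule would mirror the exception made for the AMI subquery, where 'suspect' records were retained.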
A panel of 12 GPs participated in the validation study. A total of 300 patients were identified and validated for each disease, except for HF, where due to low prevalence of the condition only 243 patients were included.
The PPVs of the algorithms were 100% (CI 99% to 100%) for T2DM and hypertension and 98% (CI 96% to 100%) for IHD. For HF, the PPV was 55% (CI 49% to 61%).
For T2DM, the second and fourth levels of severity had very high PPV (88% and 93%, respectively). Around 20% of patients without indication for insulin (first and third levels of severity) had their presence of complications misclassified. Both possibilities took place: patients with complications were identified as being free from them, and vice versa. Overall, Cohen's kappa was 0.70, indicating a good level of agreement (table 3).
Among hypertensive patients, every level of severity was misclassified in <20% of patients, and in the case of the middle level (organ damage and/or diabetes, no HF) almost all misclassified patients in fact had a less severe condition than the level they were automatically assigned to. Cohen's κ was 0.69, showing good agreement (table 3).
In the case of IHD, the first and fifth levels of severity had excellent PPV, while AMI was incorrectly identified by the algorithm in 22% of cases. The two levels of severity characterised by the presence of PTCA were never manually indicated by the GPs, and the automatic algorithm was in almost perfect agreement in both cases. Overall, Cohen's κ was 0.69, showing good agreement (table 3).
This study shows that almost all of the automatically detected cases of T2DM, hypertension and IHD, but only 55% of the cases of HF, were true cases as assessed by the GP on the basis of their own records and personal knowledge. Automatic classification of levels of severity of T2DM, hypertension and IHD was acceptable although less accurate. In the case of IHD, mild cases were misclassified as severe, while in T2DM and hypertension both possibilities took place: severe patients were automatically classified as mild and vice versa. Our results provide guidance on the interpretation of results of studies using those algorithms to define variables in medical records or HSD, for instance to monitor quality of healthcare.
The excellent PPV of some algorithms is not unexpected: algorithms to detect diabetes (irrespective of type) and hypertension, as well as other chronic conditions, had similarly high validity in the CPCSSN.20 ,24 The low PPV that we observed in the case of HF is unsurprising as well. Indeed, HF is a syndrome, and several different clinical definitions of HF have been used among clinicians in the recent past, such as the Framingham and European Society of Cardiology criteria. It has been shown that the changing definition has a noticeable impact on the epidemiology of the condition.30 Our definition comprises two of the four stages of the classification of the American College of Cardiology and of the American Heart Association, which has itself been revised repeatedly in recent years.29 It is most likely that the GPs in the sample diagnose HF using a definition different from the one proposed by the MATRICE panel. Further research could investigate whether a clinical definition modelled on the Italian guidelines may be more easily identified in primary care medical records. Finally, poor performance of ICD9CM codes in identifying a specific clinical definition of HF has been consistently reported in the literature.31
The results we observed for levels of severity are probably due to reasons which are more specific to HSD. In the case of T2DM, some misclassification occurred between level 1 and level 3, that is, patients with or without complications but without indication for insulin use. Adding rules that explore free text comments to the diagnostic codes may improve those algorithms. In the algorithm developed for this study, patients with IHD labelled with ‘suspect’ AMI were included in the ‘AMI’ level of severity, because in a previous study specific algorithms to identify AMI had been observed to have low sensitivity in HSD.25 As a result, our sensitive algorithm did indeed capture all the cases of AMI in patients with IHD; however, 22% of patients had not had an AMI, so this strategy needs to be reconsidered for future research. Remarkably, GPs were not aware of a single case of PTCA in their patients, as confirmed by automatic querying of their records. The absence of PTCA could be due to the fact that this procedure is only performed in hospital. In Italy, hospitals do not provide GPs with discharge letters; rather patients themselves must describe to the GP what happened during an inpatient care episode, and the PTCA procedure may be communicated inadequately to the GP.
Strategies to improve communication between hospital and primary care should be implemented in Italy for the purpose of improving quality of primary care medical records, as well as to improve healthcare for patients with severe cardiovascular conditions.
The sample of GPs was self-selected and may have been composed of those with more accurate data recording habits. In particular, two GPs in the validation sample also participated in the panel that created the algorithms. The PPV of the algorithms may be lower in the general group of GPs contributing to HSD.
Providing a direct estimate of the sensitivity of the case-finding algorithms was not an aim of this study, because it would have been unfeasible for GPs to assess a large enough sample of their patients. Indeed, the number of patients needed to estimate the sensitivity of a test is larger for less prevalent conditions: for instance, estimating the sensitivity of a case-finding algorithm for T2DM with a margin of error of 5%, even assuming 10% prevalence (an overestimation, according to common estimates in Italy32) and 95% sensitivity, would have required an additional sample of more than 700 participants; to obtain a valid estimate of sensitivity for HF, which has much lower prevalence and a lower expected sensitivity, GPs would have needed to assess thousands of participants.33
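The order of magnitude quoted above can be reproduced with a short Python sketch. It assumes a common normal-approximation formula for the sample size of a sensitivity estimate (number of true cases needed, divided by prevalence); the text cites a reference for the calculation but does not spell out the formula, so this is an illustration rather than the authors' exact method, and the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_sensitivity(sensitivity, margin, prevalence,
                                confidence=0.95):
    """Return (cases_needed, total_patients) for a sensitivity estimate.

    Assumed formula (normal approximation):
        cases_needed   = z^2 * Se * (1 - Se) / margin^2
        total_patients = cases_needed / prevalence
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    cases_needed = ceil(z * z * sensitivity * (1 - sensitivity) / margin ** 2)
    total_patients = ceil(cases_needed / prevalence)
    return cases_needed, total_patients


# Scenario from the text: 95% sensitivity, 5% margin, 10% prevalence.
cases, total = sample_size_for_sensitivity(0.95, 0.05, 0.10)
```

Under these assumptions the total exceeds 700 patients, in line with the figure given in the text, and lowering the prevalence argument shows how quickly the requirement grows for a rarer condition such as HF.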
This study shows that participants with T2DM, hypertension or IHD can be validly identified in HSD by automated identification algorithms. Automatic queries for levels of severity of the same diseases compare well with the corresponding clinical definitions, but some misclassification occurs. For HF, further research is needed to refine the current algorithm.
The authors wish to thank Adam Atkinson and Annamaria Cucinotta for linguistic revision.
Contributors RG, MJS, GM, PF, NK and MS conceived the study. CM designed and developed the data collection tool. GM, PF, FL and AP developed the algorithms. CM, AP, IC and CC conducted data collection. AP and EB conducted data analysis. VB and GR revised and translated the clinical terminology. RG wrote the manuscript and all the authors contributed with revisions and insight.
Funding This study was supported by the project named ‘Integrazione dei contenuti informativi per la gestione sul territorio di pazienti con patologie complesse o con patologie croniche’, MATRICE, funded by the Italian Ministry of Health in the framework of the MATTONI Program.
Competing interests MJS is an employee of a pharmaceutical company. Genomedics is a private company owned by IC. RG, FL, GR and MS have conducted pharmacoepidemiological studies funded by pharmaceutical companies.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement No additional data are available.