Objectives This study was conducted to assess the validity of recording (and the original diagnostic practice) of type 2 diabetes mellitus at a hospital whose records were integrated into a centralised database (the standardised common data model (CDM) of the Saudi National Pharmacoepidemiologic Database (NPED)).
Design A retrospective single-centre validation study.
Settings Data of the study participants were extracted from the CDM of the NPED (only records of one tertiary care hospital were integrated at the time of the study) between 1 January 2013 and 1 July 2018.
Participants A random sample of patients with type 2 diabetes mellitus (≥18 years old and with a code of type 2 diabetes mellitus) matched with a control group (patients without diabetes) based on age and sex.
Outcome measures The standardised coding of type 2 diabetes in the CDM was validated by comparing the presence of diabetes in the CDM versus the original electronic records at the hospital, the recording in paper-based medical records, and the physician re-assessment of diabetes in the included cases and controls, respectively. Sensitivity, specificity, positive predictive value and negative predictive value were estimated for each pairwise comparison using RStudio V.1.4.1103.
Results A random sample of 437 patients with type 2 diabetes mellitus was identified and matched with 437 controls. Only 190 of 437 (43.0%) had paper-based medical records. All estimates were above 90% except for the positive predictive value and specificity of CDM versus paper-based records (54%; 95% CI 47% to 61% and 68%; 95% CI 62% to 73%, respectively).
Conclusions This study provided an assessment of the extent to which the type 2 diabetes mellitus code alone can be used to identify patients with this disease in a Saudi centralised database. A future multi-centre study would help strengthen the study findings.
- Health informatics
- DIABETES & ENDOCRINOLOGY
Data availability statement
Data are available upon reasonable request.
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
Strengths and limitations of this study
We examined the validity of using only the code of type 2 diabetes mellitus in cohort and outcome identification versus three reference standards in the standardised electronic health records (EHRs) of a single hospital.
We further examined the validity of two additional algorithms to identify type 2 diabetes in cohort and outcome identification in the original EHRs of the hospital.
To our knowledge, our study was the first of its kind in the region.
Our study was limited by including only one centre.
Not including all type 2 diabetes mellitus-related diagnostic codes was another limitation.
Data collected electronically from the provision of routine clinical care (ie, real-world data (RWD)) have been used to generate evidence (real-world evidence (RWE)) on benefits, risks and the usage of pharmaceuticals.1–10 In Saudi Arabia, the electronic recording of health data in hospital settings has increased at the major tertiary hospitals during the last decade.11–14 In 2018, the Saudi Food and Drug Authority established the National Pharmacoepidemiologic Database (NPED) to integrate and standardise electronic health records (EHRs) from different hospitals in Saudi Arabia.11 The NPED was initiated to maximise the usage of RWE in supporting drug regulatory decision-making processes.11 The NPED will also be used in determining disease natural histories and trends in Saudi Arabia.11 The EHRs that were imported from the first hospital were standardised using the Observational Health Data Sciences and Informatics common data model (CDM).11 The standardisation process was followed by an initial data quality assessment (no alarming concerns were identified). However, this quality assessment did not include assessing the validity of the recorded data.11 Additionally, to our knowledge, no study has been published to assess the validity of the health recording practice at any of the Saudi hospitals (especially that of disease diagnostic codes).
The validity of RWD is integral to conducting pharmacoepidemiological research studies.8 15 16 Conducting validation studies in the Saudi healthcare system would assist not only in improving the quality of the generated RWE but also in supporting stakeholders in implementing their quality improvement initiatives. Validating the diagnostic codes of diabetes (especially type 2 diabetes mellitus) is a priority given its high prevalence in Saudi Arabia (up to 25% of the Saudi population was estimated to have diabetes with an increased prevalence of 51% among the 70 to 79-year-old population), and given the lack of well-designed and large-scale pharmacoepidemiological studies in the Saudi population with diabetes.17–19 The validity of recording diabetes mellitus in the context of RWD has been assessed in different health records using different types of data sources (eg, physician claims, hospital discharge data, EHRs), with different reference standards (mostly medical records, self-reported or telephone surveys) and different case definitions (eg, using one diagnostic code or one claim for diabetes mellitus (and/or another indicator of diabetes mellitus such as high glucose levels), two or more codes/claims).20–24
This study was conducted to assess the validity of the original, the extracted and the standardised diagnostic codes of type 2 diabetes mellitus of the EHRs that were imported from the first hospital to the NPED. The validity of the original diagnosis of type 2 diabetes mellitus at that hospital was also assessed. Finally, the study aimed to assess whether the diagnostic code of type 2 diabetes can be used to identify patients with type 2 diabetes (or diabetes as an outcome) in the standardised EHRs of that hospital.
Study design, data source and patient population
This study was a retrospective single-centre validation study. The study was carried out using the EHRs that were imported and mapped from a 129-bed private tertiary care hospital in Riyadh (the imported EHRs included records of at least 500 000 patients) to the NPED. A sample of patients with type 2 diabetes mellitus (one code of type 2 diabetes mellitus), who visited the hospital in the period between 1 January 2013 and 1 July 2018, was randomly selected from the standardised EHRs of the hospital (ie, CDM). A control group (patients without type 2 diabetes mellitus) was then randomly matched on age and sex using the same standardised EHRs; the control group was included for the estimation of specificities and negative predictive values. The included participants were required to be ≥18 years and have at least one health record.
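The matching step can be illustrated with a minimal Python sketch. It assumes 1:1 matching on exact age and sex, without replacement; the text does not state whether age was matched exactly or within a band, and the record fields and identifiers below are hypothetical rather than the NPED schema.

```python
import random
from collections import defaultdict


def match_controls(cases, candidates, seed=0):
    """Pair each case with one random control of the same age and sex.

    Assumptions (not stated in the study): exact age match, 1:1 ratio,
    and each control is used at most once.
    """
    rng = random.Random(seed)

    # Group candidate controls by their (age, sex) stratum.
    pool = defaultdict(list)
    for person in candidates:
        pool[(person["age"], person["sex"])].append(person)

    # Shuffle each stratum so that popping yields a random control.
    for bucket in pool.values():
        rng.shuffle(bucket)

    pairs = []
    for case in cases:
        bucket = pool[(case["age"], case["sex"])]
        control = bucket.pop() if bucket else None  # None: no control left
        pairs.append((case, control))
    return pairs


# Hypothetical example data.
cases = [
    {"id": "case-1", "age": 56, "sex": "M"},
    {"id": "case-2", "age": 61, "sex": "F"},
]
candidates = [
    {"id": "ctrl-1", "age": 56, "sex": "M"},
    {"id": "ctrl-2", "age": 56, "sex": "M"},
    {"id": "ctrl-3", "age": 61, "sex": "F"},
    {"id": "ctrl-4", "age": 40, "sex": "F"},
]
pairs = match_controls(cases, candidates, seed=42)
```

Cases for which no control with the same age and sex remains are paired with `None`, so unmatched cases stay visible instead of being silently dropped.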
The standardised diagnostic code of diabetes in the CDM was validated using a three-step validation approach. The first validation step aimed to confirm the presence (in the included cases) and the absence (in the included controls) of type 2 diabetes in the sample that was extracted from the CDM by reviewing the patients’ original EHRs at the hospital (the first reference). Diseases were coded during the study period at the hospital using the International Statistical Classification of Diseases and Related Health Problems 9th Revision (ICD-9) code (ICD-9 code of type 2 diabetes mellitus: 250.00). This validation step helps in assessing the accuracy of the CDM standardisation process at the NPED, the completeness of the EHR extraction process, and the validity of the original ICD coding of type 2 diabetes mellitus at the hospital.
The second validation assessment was performed using a different reference: the patients’ paper-based medical records. This validation step also included a comparison between the original EHRs (the first reference) versus paper-based medical records (the second reference) as an additional step to validate the former reference. In the final validation step, which was also a step to validate the original diagnosis of type 2 diabetes mellitus at the hospital, all study patients (both cases and controls) were re-assessed for the presence of type 2 diabetes by one of the study physicians (the third reference) based on the hospital criteria which are adopted from the American Diabetes Association classification and diagnosis of diabetes (box 1).25 The physician was allowed to use all resources at the hospital that are necessary to complete the diagnosis. The code of patients with type 2 diabetes in the standardised CDM was compared with the third reference. The findings from the assessment of the third reference were also compared with those of the first (original EHRs) and the second (paper-based medical records) references as an additional validation step of the latter two references. An additional step to confirm the diagnosis of type 2 diabetes mellitus in the original EHRs was performed using two algorithms:
First algorithm: type 2 diabetes code+a prescription of an antidiabetic medication.
Second algorithm: type 2 diabetes code+a prescription of an antidiabetic medication+a blood measurement reflective of diabetes.
Box 1 Criteria for diagnosing type 2 diabetes mellitus
FPG ≥126 mg/dL (7.0 mmol/L).*
2-h PG ≥200 mg/dL (11.1 mmol/L) during an OGTT.
Haemoglobin A1c ≥6.5% (48 mmol/mol).
Symptoms of hyperglycaemia or hyperglycaemic crisis (polyuria, polydipsia and unexplained weight loss), AND a random plasma glucose ≥200 mg/dL (11.1 mmol/L).
On therapy for diabetes mellitus (antidiabetic medications) and previous diagnosis of diabetes mellitus in medical records.
FPG, fasting plasma glucose; 2-h PG, 2-h plasma glucose; OGTT, oral glucose tolerance test.
*In the absence of unequivocal hyperglycaemia, these criteria should be confirmed by repeat testing on a different day.
These algorithms were chosen based on clinical judgement and based on other previously published algorithms.20 21
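The code-only definition and the two algorithms can be expressed as simple predicates. The Python sketch below is illustrative only: the `PatientRecord` structure is hypothetical, and reading "a blood measurement reflective of diabetes" as haemoglobin A1c ≥6.5% or fasting plasma glucose ≥126 mg/dL (thresholds taken from box 1) is an assumption, since the study does not list the exact measurements used.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

T2DM_ICD9 = "250.00"  # ICD-9 code of type 2 diabetes mellitus named in the text


@dataclass
class PatientRecord:
    """Hypothetical per-patient summary; not the hospital or NPED schema."""
    icd9_codes: Set[str] = field(default_factory=set)
    antidiabetic_prescribed: bool = False
    hba1c_percent: Optional[float] = None
    fasting_glucose_mg_dl: Optional[float] = None


def has_code(p: PatientRecord) -> bool:
    """Code-only definition: at least one type 2 diabetes code."""
    return T2DM_ICD9 in p.icd9_codes


def lab_reflects_diabetes(p: PatientRecord) -> bool:
    """Assumed reading of 'blood measurement reflective of diabetes'."""
    hba1c_high = p.hba1c_percent is not None and p.hba1c_percent >= 6.5
    fpg_high = (p.fasting_glucose_mg_dl is not None
                and p.fasting_glucose_mg_dl >= 126)
    return hba1c_high or fpg_high


def first_algorithm(p: PatientRecord) -> bool:
    """Code + antidiabetic prescription."""
    return has_code(p) and p.antidiabetic_prescribed


def second_algorithm(p: PatientRecord) -> bool:
    """Code + antidiabetic prescription + blood measurement."""
    return first_algorithm(p) and lab_reflects_diabetes(p)


# Hypothetical patients.
coded_only = PatientRecord(icd9_codes={"250.00"})
coded_treated = PatientRecord(icd9_codes={"250.00"}, antidiabetic_prescribed=True)
coded_treated_lab = PatientRecord(
    icd9_codes={"250.00"}, antidiabetic_prescribed=True, hba1c_percent=7.2
)
```

Each algorithm is strictly narrower than the one before it, so a patient identified by the second algorithm is always identified by the first algorithm and by the code alone.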
The degree to which the results of the algorithm assessments agree with the code-only analysis will also provide more information on whether only the code can be used to identify this population in both the CDM and EHRs. Sensitivity, specificity, the positive predictive value (PPV) and the negative predictive value (NPV) were the targeted parameters in this study. The values of these validation estimates might give an indication of the extent to which only the diagnostic code of type 2 diabetes mellitus can be used to identify type 2 diabetes mellitus as an outcome in the CDM.
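For readers less familiar with these parameters, the Python sketch below computes them from a 2×2 table of the index source (eg, the CDM code) against a reference. The study analyses were run in RStudio and the CI method is not stated; the Wilson score interval used here is an assumption, and the counts are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def wilson_ci(successes, total, level=0.95):
    """Wilson score interval for a proportion (assumed CI method)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


def validation_estimates(tp, fp, fn, tn):
    """Return {name: (point estimate, (ci_low, ci_high))}.

    tp: coded and confirmed by the reference
    fp: coded but not confirmed by the reference
    fn: not coded but diabetes present in the reference
    tn: not coded and no diabetes in the reference
    """
    return {
        "sensitivity": (tp / (tp + fn), wilson_ci(tp, tp + fn)),
        "specificity": (tn / (tn + fp), wilson_ci(tn, tn + fp)),
        "ppv": (tp / (tp + fp), wilson_ci(tp, tp + fp)),
        "npv": (tn / (tn + fn), wilson_ci(tn, tn + fn)),
    }


# Made-up counts, not study data.
estimates = validation_estimates(tp=90, fp=10, fn=5, tn=95)
for name, (point, (low, high)) in estimates.items():
    print(f"{name}: {point:.1%} (95% CI {low:.1%} to {high:.1%})")
```

Sensitivity and specificity condition on the reference status, whereas PPV and NPV condition on the coded status and therefore depend on the case-to-control ratio of the sample.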
The validation parameters were estimated for each pairwise comparison with their corresponding 95% CIs. To demonstrate a sensitivity of 85% and an expected width of 95% CI of 10% (taking into account the 25% prevalence of type 2 diabetes mellitus in Saudi Arabia), a minimum sample size of 196 patients (98 cases and 98 controls) was needed.17 18 26 The minimum total sample sizes for validating the first and second algorithms in the original EHRs were 138 and 73, respectively. All statistical analyses were performed using RStudio V.1.4.1103.
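As a rough check, a standard precision-based formula for a single proportion, n = z²·p(1−p)/d², gives 196 for an expected sensitivity of 85% when the 10% CI width is read as a half-width d of 0.05. The Python sketch below shows that calculation; it is an assumption that this corresponds to the method in the cited references, and it does not apply any prevalence adjustment.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(expected, half_width, level=0.95):
    """Minimum n so that the CI half-width around `expected` is <= half_width.

    Uses the normal-approximation formula n = z^2 * p * (1 - p) / d^2.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return ceil(z**2 * expected * (1 - expected) / half_width**2)


# Expected sensitivity 85%, total CI width 10% (half-width 0.05).
n_total = sample_size_for_proportion(expected=0.85, half_width=0.05)
print(n_total)  # 196
```

Higher expected values of the proportion require fewer patients for the same precision, because p(1−p) shrinks as p approaches 1.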
Patient and public involvement
Patients and the public were not involved in the design, conduct or reporting/dissemination of this study.
Table 1 shows the number of patients who were included in each pairwise comparison. A total of 437 randomly selected patients with type 2 diabetes mellitus (427 (98.0%) were on antidiabetics and/or had a haemoglobin A1c (HbA1c) measurement of >6.5%) were identified and matched with 437 controls. Almost one-third of the cases were identified before 2016 (141 of 437 (32.3%)). Of all matched pairs, only 190 (43.0%) had paper-based medical records. The median age of the included patients (both the cases and controls) was 56 years (IQR=21), and 522 of the included patients (60.0%) were men. Type 2 diabetes mellitus (among the cases) was diagnosed between 2007 and 2018. The majority of the cases (83.6%) had abnormal HbA1c levels at (or within 6 months of) the index date (the date of diagnosing type 2 diabetes mellitus).
The estimates of validating the standardised code of type 2 diabetes mellitus in the CDM versus two references (EHRs and the re-diagnosis) were all above 90% (table 1). The validation estimates for EHRs versus re-diagnosis were also above 90%. The PPV and specificity for CDM versus paper-based documentation were 46% and 32% lower, respectively, than those versus EHRs and re-diagnosis (table 1). Of the 190 cases that were included in the validation assessment with the paper-based documentation as a reference, 87 (46%) did not have any records for type 2 diabetes mellitus in their medical charts. Type 2 diabetes mellitus among these 87 was mostly diagnosed after 2013 (the year in which large-scale usage of EHRs began at the hospital), and only 8 patients were diagnosed with diabetes before 2013. This may explain the absence of diabetes recording in their paper-based medical charts. The results of the validation assessment of the first and second algorithms in the EHRs were comparable with those of the type 2 diabetes mellitus code-only analysis (table 2).
This study assessed an approach to identifying the population with type 2 diabetes mellitus using only the disease code in the standardised EHRs of a Saudi hospital. With the exception of the assessment versus paper-based records, all validation estimates of the standardised and the original codes of type 2 diabetes mellitus were above 90% (the estimates of the algorithms were almost comparable with these estimates). These findings might be supportive of using only the standardised diagnostic code to identify type 2 diabetes mellitus as an outcome in these records.
In our assessment of the validity of the standardised code of type 2 diabetes mellitus in the CDM, the minimum value of sensitivity (93% vs paper-based records) was higher than the average minimum value observed in the previous diabetes-case definition validation studies (26.9%).20 21 23–25 On the other hand, the minimum value of specificity (68%) was lower than the average minimum value observed in the published studies (88%).20–24 The minimum values of PPV and NPV were almost comparable with the average minimum values that were observed in the published studies (54% vs 54% and 92% vs 90.8%, respectively).20–24 Two of the references in our study (EHRs and re-diagnosis) were used as references in the previous diabetes-case validation studies.20–24 In 44.4% (8 of 18) of the published diabetes-case definition validation studies, the original EHRs were used as a reference (the other references were the physician re-diagnosis, self-reported or telephone surveys, and a multisource approach).20–23 Type 2 diabetes mellitus was confirmed in 100% of the cases in our study, which is almost comparable to the confirmation results in a previous study in which the re-diagnosis was used as a reference.24 The values of the validation estimates in our study might be supportive of using only the diagnostic code to identify patients with type 2 diabetes mellitus as cohorts or to identify diabetes as an outcome in the standardised EHRs that were imported from the first hospital.
Our study was the first diabetes-case definition validation study (and the first validation study for a diagnostic code) in the region. Three reference standards were used in our study, and validity was assessed at three different levels (code extraction, code standardisation and the original diagnosis of diabetes) and was compared with that of the algorithms. Our study has two limitations. First, the study was a single-centre study. The generalisability may improve by conducting a multi-centre study that takes the variability of hospital coding systems into account. Second, ICD-9 was used to code type 2 diabetes mellitus at the hospital during the study period; however, the hospital (and other hospitals) started upgrading their coding system to ICD-10. Additionally, we did not include other type 2 diabetes-related ICD-9 codes (codes for uncontrolled diabetes and diabetic complications) in our assessment. Including the same hospital in a future single-centre or multi-centre study with an updated sample of ICD-10 codes of type 2 diabetes mellitus and/or its complications would help strengthen the study findings.
We assessed the validity of the (standardised) diagnostic code of type 2 diabetes mellitus at different recording levels and provided an indication of the extent to which this code can be used to identify this disease as an outcome. A future multi-centre study that includes an updated sample from the hospital with ICD-10 codes of type 2 diabetes mellitus and/or its complications would help strengthen the study findings.
Patient consent for publication
The study was approved by the SFDA ethics committee (ethics approval number: 2020_012).
We thank the information technology (IT) and the medical records (MR) departments in the Kingdom Hospital for the help in accessing the patient records and retrieving the data required for this research.
Contributors TAA, MWK and TMA designed the study. MMA, RTT, RAA and FAA contributed to the data collection process. All authors were involved in designing, analysing and conducting the study. MAM was involved in confirming the diagnosis. TAA drafted the report. All authors critically reviewed the manuscript. TAA and TMA accept full responsibility for the work and conduct of the study, had full access to the data and controlled the decision to publish. TMA is responsible for the overall content as the guarantor.
Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests None declared.
Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.
Provenance and peer review Not commissioned; externally peer reviewed.