Introduction In an increasingly digital age for healthcare around the world, administrative data have become rich and accessible tools for potentially identifying and monitoring population trends in diseases including epilepsy. However, it remains unclear (1) how accurate administrative data are at identifying epilepsy within a population and (2) the optimal algorithms needed for administrative data to correctly identify people with epilepsy within a population. To address this knowledge gap, we will conduct a novel systematic review of all identified studies validating administrative healthcare data in epilepsy identification. We provide here a protocol that will outline the methods and analyses planned for the systematic review.
Methods and analysis The systematic review described in this protocol will be conducted to follow the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. MEDLINE and Embase will be searched for studies validating administrative data in epilepsy published from 1975 to current (01 June 2018). Included studies will validate the International Classification of Disease (ICD), Ninth Revision (ICD-9) onwards (ie, ICD-9 code 345 and ICD-10 codes G40–G41) as well as other non-ICD disease classification systems used, such as Read Codes in the UK. The primary outcome will be providing pooled estimates of accuracy for identifying epilepsy within the administrative databases validated using sensitivity, specificity, positive and negative predictive values, and area under the receiver operating characteristic curves. Heterogeneity will be assessed using the I2 statistic and descriptive analyses used where this is present. The secondary outcome will be the optimal administrative data algorithms for correctly identifying epilepsy. These will be identified using multivariable logistic regression models. 95% confidence intervals will be quoted throughout. We will make an assessment of risk of bias, quality of evidence, and completeness of reporting for included studies.
Ethics and dissemination Ethical approval is not required as primary data will not be collected. Results will be disseminated in peer-reviewed journals, conference presentations and in press releases.
PROSPERO registration CRD42017081212.
- factual database
- administrative claims
- validation studies
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
Strengths and limitations of this study
The protocol describes what will be the first systematic review to conduct a worldwide assessment of the accuracy of administrative data in identifying epilepsy and the optimal disease-identification algorithms.
This protocol also describes what will be the first systematic review to make an assessment of risk of bias, quality of evidence, and completeness of reporting for studies validating administrative healthcare data in epilepsy identification.
The review described in this protocol will be limited to assessing the use of administrative data in diagnosing epilepsy within observational studies, which are more prone to bias than randomised controlled trials.
A systematic review of the diagnostic accuracy of administrative data within randomised controlled trials in epilepsy remains to be completed and is out of the scope of the current review.
Administrative healthcare databases are electronic data sources that consist of demographic, diagnostic and clinical information routinely collected about patients when they use a healthcare service.1 They are often national and mandatory, and therefore they have the potential to provide a relatively cheap, widely available and less intrusive resource for medical research.1 However, the accuracy of the information held in an administrative database needs to be validated before such use can be made. This is because the administrative data were not originally collected for research, but for other purposes such as assisting in health insurance claims. The clinical information held may therefore lack the rigour in accuracy that might be expected in scientifically collected data. Furthermore, the data may be limited by inaccurate or incomplete hospital discharge letters or clinical coding transcription errors.2
The validation of administrative data involves comparing the diagnostic codes held within the administrative database against a reference standard (such as medical records) in order to quantify the number of instances in which the administrative diagnosis made matches the diagnosis in the reference standard (deemed to be the true diagnosis). In this way, the administrative database can be handled like a diagnostic test and measures of disease-identification accuracy calculated. These measures usually include the sensitivity, specificity and the positive or negative predictive value (PPV or NPV, respectively). Optimal disease-identification algorithms can also be determined by making relative comparisons of predictive values after adding in data from other variables recorded in an administrative database, such as drug combinations, investigations, and procedures.
There are many administrative databases worldwide in which the process of validation has been successfully performed for many diagnostic codes.3–9 There are also examples of where the results of these have been pooled successfully into systematic review to increase confidence in the estimates made and scrutinise the quality of evidence, and this has led to changes in practice.4 There has been limited systematic review of the validation of administrative databases in capturing epilepsy as a diagnosis. The only systematic review10 on this subject included only studies from the USA or Canada and therefore excluded 121 studies because the data sources were not from these two countries. Furthermore, the 11 studies included were published between 2000 and 2010, making the conclusions nearly a decade old now. With health informatics now at the forefront of epidemiological disease surveillance, it is important to have an update on performance of the administrative disease-identification codes. Only one of the included studies evaluated the performance of the International Classification of Disease (ICD), 10th Revision (ICD-10) system in capturing epilepsy within administrative datasets11; the remainder evaluated the older ICD, 9th Revision (ICD-9) system.10 The review also made no assessment of risk of bias, quality of evidence and completeness of reporting for included studies. This limits the confidence with which conclusions can be interpreted. There is now need for a more contemporary systematic review of the validation of administrative databases in capturing epilepsy. 
This should include evaluating performance of the ICD-10 system, as well as other non-ICD disease classification systems used, such as Read Codes in the UK.12 13 The review should include studies from anywhere in the world in order to give clinicians and researchers a representative picture of the performance of administrative data in capturing epilepsy as a diagnosis and in order to allow more generalisable diagnostic algorithms to be suggested. Furthermore, the review should make an assessment of risk of bias, quality of evidence and completeness of reporting for included studies. These are the aims of the proposed systematic review described in this protocol. This will help researchers and clinicians better understand the accuracy of global estimates for incidence, prevalence and population characteristics in epilepsy, which have largely been made using administrative data.
Aims and objectives
The study hypotheses are:
Administrative data can correctly identify people with epilepsy within a population with a high degree of accuracy. We predict the PPV to be above 80%.
The optimal disease-identification algorithms for epilepsy within administrative datasets take into account diagnoses, investigations and drug combinations.11 14–20
The aim of the systematic review is to quantify the disease-identification accuracy and algorithm performance of administrative healthcare data in epilepsy. To this end, the research questions are:
How accurately do administrative data identify epilepsy within a population as measured by sensitivity, specificity, PPV, NPV, or area under the receiver operating characteristic curve (AUC) analysis (which are the approved accuracy measures described in the Standards for Reporting Diagnostic Accuracy (STARD) statement)?21 This will be the primary outcome.
What are the optimal administrative data algorithms for correctly identifying epilepsy within a population? This will be the secondary outcome.
A preliminary feasibility search of the MEDLINE database via PubMed identifies at least nine studies validating diagnostic epilepsy codes held within administrative databases around the world that could be used to answer these research questions.11 13–20
Methods and analysis
This protocol follows the Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols (PRISMA-P) checklist.22 23 The systematic review will follow the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) checklist.24
We will include studies according to the following criteria:
Language: there will be no language restriction on full-length articles, although abstracts will need to be in English to allow the authors to screen them. We will seek translations for full-length articles not written in English that appear eligible in abstract. These will remain in the section for ‘studies still awaiting classification’ and will feature in subsequent updates to the review if not translated by the time of initial publication.
Setting: there will be no restrictions by study location worldwide. Where possible, we will show pooled accuracy and best algorithm data for administrative datasets from individual countries in addition to pooled global estimates of these measures.
Databases: the data sources will be routine administrative healthcare databases. This means that the data should have been routinely and passively collected without an a priori research question.4 We will include databases containing diagnostic codes for epilepsy classified on the ICD system, restricted to studies using ICD-9 (active from 1975 to 1994) onwards.25 This is because, although the ICD system is currently in its 10th revision (ICD-10; active from 1994 to present), which is the primary coding system used by many countries around the world, a significant proportion of countries, particularly developing ones, still use the ICD-9 system.11 ICD-9 code 345 and ICD-10 codes G40 and/or G41 will be used to identify epilepsy diagnoses. We will provide summary measures of accuracy and best algorithms for any other disease classification systems used in studies separately, for example, the primary care Read Code system used in the UK.12 13
Study design: prospective or retrospective observational studies including cohort or case–control designs that are community-based/population-based or primary/secondary/tertiary care-based and have used administrative databases.
Participants: people with epilepsy of all ages. Where available, we will additionally report data for adults and children separately.
Observations and outcomes: studies will need to employ a validation process for diagnostic epilepsy codes, that is, estimate the disease-identification accuracy of the epilepsy codes held within the database using sensitivity, specificity, PPV, NPV or AUC analysis.21 In this, true positives (TPs) and false negatives (FNs) will be defined as cases in which the patient has the disease and the administrative diagnosis is positive or negative, respectively. True negatives (TNs) and false positives (FPs) will be defined as cases in which the patient does not have the disease and the administrative diagnosis is negative or positive, respectively. Sensitivity will be considered as the ability of the administrative database to correctly identify those patients with the disease (TP/(TP+FN)). Specificity will be considered as the ability of the administrative database to correctly identify those patients without the disease (TN/(TN+FP)). PPV will be considered as how likely it is that a patient has the disease if the administrative diagnosis is positive (TP/(TP+FP)). NPV will be considered as how likely it is that a patient does not have the disease if the administrative diagnosis is negative (TN/(TN+FN)). AUC analysis will be considered as plotting the true positive rate (sensitivity) against the false positive rate (1 − specificity), with the diagonal line of equality shown as the reference for a test with no discriminative ability; accuracy is then measured as the area under the curve produced by the observed values.26
Studies may also use diagnostic as well as other variables (eg, admissions, drugs or investigations) to calculate optimal disease-identification algorithms for epilepsy within the database. Studies will need to provide a clinical reference standard. An appropriate clinical reference standard will be medical records, clinical assessment, or a validated disease registry.21
Timeframe: studies conducted from 1 January 1975 to 01 June 2018. The year 1975 represents the advent of ICD-9.25
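The accuracy measures defined under ‘Observations and outcomes’ above can be illustrated with a short Python sketch that derives them from a single 2×2 cross-tabulation. The class name and the counts are hypothetical and are used for illustration only; they are not taken from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Cross-tabulation of administrative diagnosis against the reference standard."""
    tp: int  # disease present, administrative diagnosis positive
    fp: int  # disease absent, administrative diagnosis positive
    fn: int  # disease present, administrative diagnosis negative
    tn: int  # disease absent, administrative diagnosis negative

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)


# Hypothetical counts, for illustration only.
table = TwoByTwo(tp=80, fp=20, fn=10, tn=890)
print(f"sensitivity = {table.sensitivity():.3f}")
print(f"specificity = {table.specificity():.3f}")
print(f"PPV = {table.ppv():.3f}")
print(f"NPV = {table.npv():.3f}")
```

Studies that sample only people carrying an epilepsy code will have no FN or TN counts, in which case only the PPV can be computed from such a table.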
We will exclude studies according to the following criteria:
Data reported in systematic reviews unless we can identify the primary data, for example, by contacting authors of the original source.
Conference proceedings abstracts for which we are unable to obtain the meta-data from authors, and studies not written in English for which full-length manuscript translations remain awaited.
Studies will be identified from the following sources:
Electronic databases: we will search 01 January 1975–01 June 2018 for studies meeting the inclusion criteria within the MEDLINE (Ovid interface) and Embase (Ovid interface) databases. The search strategies are outlined in table 1.
Conference proceedings: for conference abstracts that appear to meet the inclusion criteria but do not have a full-length article published, we will contact authors directly to request metadata.
Reference lists: we will also identify studies meeting the inclusion criteria from the reference lists of included studies and relevant reviews identified through the electronic database searches.
Literature search results will be uploaded onto Review Manager 5, an internet-based software program that facilitates collaboration among reviewers. Citation titles and abstracts will be uploaded. GKM will then screen titles and abstracts to identify and exclude duplicate publications. Duplicate publications will be identified by comparing author names, study titles, sample sizes, outcomes used, and any other information held in the abstracts. All review authors will have access to the systematic review process via the internet-based review software program, and this will create an audit trail of studies included/excluded, data analysis steps, and subsequent manuscript revisions. All data will be held within the management software and password protected.
Once duplicates have been excluded, two review authors (GKM and KB) will independently screen the titles and abstracts yielded from the database searches against the inclusion criteria. Where titles and abstracts indicate that a study may meet the inclusion criteria or where there is uncertainty about this, the full-length manuscripts will be downloaded and used to help decide. Where details in the manuscript are still insufficient for a decision to be made about eligibility, we will seek additional information from the study authors and automatically exclude studies where there is no response from authors after three weeks. We will record the reasons for excluding all excluded studies. The two review authors will compare their lists of included and excluded studies and any disagreements will be resolved by mutual discussion and, where necessary, adjudication by a third reviewer (RFMC/SED/CRS). Review authors will not be blind to the journal titles, study authors, or institutions.
Data collection process and items
Two review authors (GKM and KB) will independently abstract data about the primary and secondary outcomes from included studies using the data collection tool shown below. The additional information extracted maps onto the items contained within the STARD guidelines modified for epilepsy studies reporting diagnostic accuracy of administrative databases.21 This will allow us to extract sufficient information to make a quality assessment of the completeness of included studies against the STARD checklist.21 Any disagreements in the contents of data abstraction will be resolved by mutual discussion and, where necessary, adjudication by a third reviewer (RFMC/SED/CRS).
Data collection tool
What is the study title?
Who are the study authors?
What is the year of study publication?
What is the journal of publication?
What country or countries was the study conducted in?
Does the study explicitly identify as utilising ‘administrative data’ (yes/no)?
If not, how is this identified by the reviewer? For example, from descriptions given of the databases utilised and background knowledge about them or from correspondence with authors.
Does the abstract provide a structured summary of study design, methods, results and conclusions (yes/no/unclear)?
Does the introduction give a scientific and clinical background including intended use and clinical role of administrative data (yes/no/unclear)?
Are the study objectives and hypotheses described (yes/no/unclear)?
If so, what are they?
What was the intended study sample size and how was it determined?
What is the study design?
Was a study/participant flow diagram used?
What are the eligibility criteria for participation in the validation cohort (ie, the cohort of patients to which the reference standard will be applied)?
Where, when and how were potentially eligible validation cohort participants identified? Include within this:
What is the name of the administrative database(s) on which the validation cohort was identified?
What are the setting and location of the administrative database(s) from which the validation cohort was identified? For example, is it primary care, secondary care, tertiary care, outpatient care or emergency care?
What are the names of any hospitals/organisations affiliated with or using the administrative database routinely?
What is the size of the administrative database(s) on which the validation cohort was identified? That is, how many people/records does it hold in total?
What were the epilepsy ICD codes (or other disease classification system codes) used to identify the validation cohort within the administrative database(s)? That is, what are the diagnostic epilepsy codes that will be validated by the study?
What was the size of the validation cohort identified by the epilepsy codes? That is, give the number of participants identified by these diagnostic epilepsy codes.
Did the validation cohort include identifying a sample of people (1) without epilepsy and (2) with epilepsy ‘mimicker codes’?
If so, give details of the codes used and the number of participants for each of these groups.
Samples of people without epilepsy are often used to help calculate the specificity and NPV of an administrative database. ‘Mimicker codes’ are often interrogated as the conditions may resemble epilepsy. These may include, for example, classical migraine (ICD-9 code 346.x and ICD-10 code G43.1), transient cerebral ischaemia (ICD-9 code 435 and ICD-10 code G45), syncope (ICD-9 code 780.2 and ICD-10 code R55) or convulsion (ICD-9 code 780.3 and ICD-10 code R56.0 or R56.8), which are intended to be used for organic convulsions but not for epilepsy.11
What other information was obtained about an individual to help identify epilepsy on the administrative database? For example, describe if they linked an individual’s ICD epilepsy codes with the investigations they underwent (such as an electroencephalogram (EEG)) or the antiepileptic drug (AED) they were taking.
What were the demographic and clinical characteristics of the validation cohort? That is, age, gender, type of epilepsy, comorbidities and AEDs.
Describe the reference standard. Include the following:
Name of the reference standard used.
What type of reference standard it was, for example, clinical assessment, medical records, or validated disease registry.
Any rationale given for choosing this reference standard (if alternatives exist).
What were the number, training and expertise of persons reading the reference standard?
If more than one person read the reference standard, what were the measures of consistency given? For example, kappa statistic.
Describe any methods used to blind persons reading the reference standard to how the validation cohort were coded diagnostically on the administrative database; that is, how a person reviewing the medical records diagnosis of an individual was made unaware of their administrative ICD diagnosis.
Describe any methods used to blind persons reading the diagnostic administrative data codes to results of the reference standard diagnoses.
What method was used to estimate the disease-identification accuracy of the administrative database (ie, sensitivity, specificity, PPV, NPV or AUC)?
What were the results of this?
That is, provide figures for these estimates and, where possible, also the individual TP, FP, TN and FN figures.
Include the results of any cross-tabulation of the administrative data diagnoses results against the results of the reference standard diagnoses.
Describe the methods used and results of the measures used to estimate variability or precision of the diagnostic accuracy results (eg, 95% CI).
What method was used to compare the diagnostic accuracy of variables within the administrative database? That is, describe the method used to determine an optimal diagnostic algorithm.
What were the results of this?
Describe the methods used and results of the measures used to estimate variability or precision of the algorithm estimates (eg, 95% CI).
How were indeterminate or missing administrative database diagnoses or reference standard results handled?
What were the time interval and any clinical interventions given between reference standard diagnosis and administrative database diagnosis?
Describe any adverse events found from using the administrative database or reference standard.
Summarise the study limitations described by the authors including any sources of potential bias, statistical uncertainty, generalisability limitation, and what they described as implications for practice.
What were the study’s sources of funding?
Systematic review and meta-analysis outcomes
The primary outcome will be providing pooled disease-identification accuracy estimates of the included administrative databases using sensitivity, specificity, PPV, NPV and AUC as the measures of accuracy. This will answer research question 1: how accurately do administrative data identify epilepsy within a population? We will provide an overall estimate of the accuracy of the ICD-9 coding system and that of the ICD-10 coding system in correctly identifying epilepsy cases. This will be done by pooling together individual accuracy estimates from all included studies in which ICD-9 or ICD-10 were used provided there is no significant heterogeneity between studies (as measured using the I2 statistic). The preferred estimators will be means with standard errors (SEs) or medians with interquartile ranges (IQRs), dependent on distribution. We will also quote the 95% CIs for estimates. Where there is significant heterogeneity (I2 statistic >50%), we will provide a descriptive analysis of the results and include ranges. Just as there may be heterogeneity introduced by making comparisons across different trial designs, we might expect there to be heterogeneity introduced by making comparisons across different healthcare systems. This is because there are likely to be differences in the accuracy of administrative healthcare data owing to differences in coding practice and/or the quality of reference standards between different healthcare systems. Therefore, we will also conduct subgroup analysis in which diagnostic accuracy results are pooled together for studies that have used the same or similar healthcare systems, for example, the National Health Service in the UK, Veterans Health Administration in the USA, healthcare systems with geographical overlap, and private-funded versus state-funded healthcare systems. 
We will also create subgroups in which results are pooled together within the following study design groups: prospective cohorts, retrospective cohorts, case–control studies, primary care studies, secondary care studies, tertiary care studies, paediatric studies (age <18 years), adult studies, and studies from the same country. Differences in the results for each subgroup may provide an important guide for future studies in the field, and they may also help to explain any statistical heterogeneity seen.
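As a rough illustration of the pooling and heterogeneity steps described above, the Python sketch below pools a proportion (for example, the PPV) across studies and reports the I2 statistic. The protocol does not specify a pooling model, so the choices here are assumptions: a fixed-effect inverse-variance model on the logit scale with a 0.5 continuity correction, and I2 derived from Cochran's Q. The function names and study counts are hypothetical.

```python
import math


def expit(x):
    """Inverse of the logit transform."""
    return 1.0 / (1.0 + math.exp(-x))


def logit_and_variance(events, total):
    """Logit of a proportion and its approximate variance.

    A 0.5 continuity correction is always applied (an assumption of this sketch).
    """
    a = events + 0.5
    b = total - events + 0.5
    return math.log(a / b), 1.0 / a + 1.0 / b


def pool_proportions(studies):
    """Pool (events, total) pairs; return (estimate, (ci_low, ci_high), i2_percent)."""
    effects = [logit_and_variance(e, n) for e, n in studies]
    weights = [1.0 / var for _, var in effects]
    total_weight = sum(weights)
    pooled = sum(w * y for w, (y, _) in zip(weights, effects)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    # Cochran's Q and the I2 statistic (percentage of variation due to heterogeneity).
    q = sum(w * (y - pooled) ** 2 for w, (y, _) in zip(weights, effects))
    df = len(studies) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    ci = (expit(pooled - 1.96 * se), expit(pooled + 1.96 * se))
    return expit(pooled), ci, i2


# Hypothetical (true positives, all administrative positives) per study.
ppv_studies = [(80, 100), (150, 200), (45, 50)]
estimate, ci, i2 = pool_proportions(ppv_studies)
print(f"pooled PPV = {estimate:.3f} (95% CI {ci[0]:.3f} to {ci[1]:.3f}), I2 = {i2:.1f}%")
if i2 > 50:
    print("Significant heterogeneity: report a descriptive analysis with ranges.")
```

The same function could be applied within each subgroup (for example, by healthcare system or study design) to see whether heterogeneity falls when similar studies are pooled together.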
The secondary outcome will be the optimal administrative data algorithms for correctly identifying epilepsy within a population. For this, we will assign a dummy variable with a binary 0 = ‘no’ or 1 = ‘yes’ category to participants having the following:
A. A reference standard diagnosis of epilepsy (yes/no).
B. An administrative diagnosis code for epilepsy (yes/no).
C. Multiple administrative epilepsy diagnoses codes over time (yes/no).
D. Having previously had an EEG (yes/no).
E. Having previously had a computed tomography (CT) or magnetic resonance image (MRI) of the brain (yes/no).
F. Having previously had epilepsy surgery (yes/no).
G. Being on an individual AED (yes/no).
H. Being on two or more AEDs (yes/no).
Multivariable logistic regression models with A as the outcome variable and B–H individually and in combinations as the independent variables will be used in order to demonstrate the algorithm(s) best fitting the data across the included studies and to assess the significance of each variable’s contribution to the model. The results of the logistic models will be displayed as sensitivity, specificity, PPV, NPV and AUC where possible, with 95% CIs and measures of interstudy heterogeneity provided using the I2 statistic.
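The modelling step can be sketched in Python as follows. This is a minimal illustration, not the planned analysis: it fits a logistic regression by plain batch gradient ascent rather than with a statistical package, uses only two of the binary predictors (an administrative epilepsy code and being on an AED), and runs on made-up participant counts. All function names, parameter values and data are hypothetical.

```python
import math


def predict(beta, x):
    """Predicted probability of a reference standard diagnosis of epilepsy."""
    z = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, outcomes, learning_rate=0.5, epochs=2000):
    """Fit a logistic regression by batch gradient ascent on the log-likelihood.

    Returns [intercept, coefficient_1, ..., coefficient_k].
    """
    beta = [0.0] * (len(rows[0]) + 1)
    n = len(rows)
    for _ in range(epochs):
        gradient = [0.0] * len(beta)
        for x, y in zip(rows, outcomes):
            error = y - predict(beta, x)
            gradient[0] += error
            for j, xi in enumerate(x):
                gradient[j + 1] += error * xi
        beta = [b + learning_rate * g / n for b, g in zip(beta, gradient)]
    return beta


# Hypothetical data, for illustration only.
# Key: (administrative epilepsy code, on an AED), each coded 0 = no, 1 = yes.
# Value: (number with, number without) a reference standard diagnosis.
counts = {
    (1, 1): (40, 5),
    (1, 0): (15, 15),
    (0, 1): (5, 20),
    (0, 0): (2, 98),
}
rows, outcomes = [], []
for features, (with_epilepsy, without_epilepsy) in counts.items():
    rows += [features] * (with_epilepsy + without_epilepsy)
    outcomes += [1] * with_epilepsy + [0] * without_epilepsy

beta = fit_logistic(rows, outcomes)
print("intercept, code and AED coefficients:", [round(b, 2) for b in beta])
print("P(epilepsy | code and AED):", round(predict(beta, (1, 1)), 2))
print("P(epilepsy | code only):", round(predict(beta, (1, 0)), 2))
```

In practice, an established statistical package would be preferable, since it also reports standard errors and confidence intervals for each coefficient.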
Risk of bias analysis
We will use the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2)27 tool to assess risk of bias within and across studies, modified for studies validating administrative data. This is summarised in table 2 and will be completed independently by two review authors (GKM and KB) for each study, with disagreements resolved by mutual discussion and, where necessary, adjudication by a third reviewer (RFMC/SED/CRS). The tool consists of four key domains (see row 1) covering: (1) patient selection, (2) the administrative database, (3) the reference standard and (4) flow of patients through the study and timing of the administrative database and reference standard. Each domain is assessed in terms of the risk of bias (graded as high, low or unclear; see row 4) and the first three domains are also assessed in terms of concerns regarding applicability (see row 5). The description (see row 2) contains information used to support the risk of bias judgement. To help reach a judgement on the risk of bias, signalling questions are included (see row 3). These flag aspects of study design related to the potential for bias and aim to help reviewers make risk of bias judgements. If all signalling questions for a domain are answered ‘yes’, then risk of bias is judged ‘low’. If any signalling question is answered ‘no’, then risk of bias is judged ‘high’. If any signalling question is answered ‘unclear’, then risk of bias is judged ‘unclear’. Applicability sections are structured in a similar way to the bias sections but do not include signalling questions. Review authors are asked to record the information on which the judgement of applicability is made and then to rate their concern that the study does not match the review question.27 28
On completing the QUADAS-2 table, we will provide a risk of bias and applicability concerns graph demonstrating the review authors’ judgements about each domain, presented as percentages across included studies. We will also provide a risk of bias and applicability concerns summary demonstrating review authors’ judgements about each domain for each included study. We will use Deeks’ test29 to test for publication bias. This test is specifically designed for detecting funnel plot asymmetry in reviews of diagnostic studies.28
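The decision rule for signalling questions can be written as a small Python function. One point is an assumption of this sketch rather than something stated above: when a domain has both ‘no’ and ‘unclear’ answers, the ‘no’ answer is given precedence and the domain is judged ‘high’. The function name is hypothetical.

```python
def judge_risk_of_bias(answers):
    """Return 'low', 'high' or 'unclear' for one QUADAS-2 domain.

    `answers` holds the responses to that domain's signalling questions,
    each of which must be 'yes', 'no' or 'unclear'.
    """
    normalised = [answer.strip().lower() for answer in answers]
    if not normalised:
        raise ValueError("at least one signalling question is required")
    invalid = set(normalised) - {"yes", "no", "unclear"}
    if invalid:
        raise ValueError(f"unexpected answers: {sorted(invalid)}")
    # Assumption: 'no' takes precedence over 'unclear' when both occur.
    if "no" in normalised:
        return "high"
    if "unclear" in normalised:
        return "unclear"
    return "low"


print(judge_risk_of_bias(["yes", "yes", "yes"]))
print(judge_risk_of_bias(["yes", "unclear", "yes"]))
print(judge_risk_of_bias(["yes", "unclear", "no"]))
```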
Confidence in cumulative evidence
We will use the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) approach to assess strength of the body of evidence.30 The GRADE system classifies the quality of evidence into one of four grades:
High: further research is very unlikely to change our confidence in the estimate of effect.
Moderate: further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate.
Low: further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate.
Very low: any estimate of effect is very uncertain.31
A judgement is made on the individual studies used to provide the pooled effect estimates, and the quality of evidence is then downgraded by the cumulative presence of: (1) bias (see risk of bias analysis), (2) inconsistency (ie, heterogeneity present on I2 statistic), (3) indirectness (ie, high concerns regarding applicability; see table 2), (4) imprecision (small sample sizes, wide CIs, and inadequately powered studies) and (5) publication bias (see Deeks’ test29 comments).32 GRADE classifications will be independently conducted by two review authors (GKM and KB) with any disagreements resolved by mutual discussion and, where necessary, adjudication by a third reviewer (RFMC/SED/CRS).
We will rate the completeness of reporting for each study out of 30 using the STARD 2015 checklist.21 A score of 0–10, 11–20 and 21–30 will indicate a low, moderate and high quality of completeness of reporting, respectively.
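The banding of STARD scores described above amounts to a simple lookup, sketched here in Python. The function name is hypothetical, and the sketch assumes one point per checklist item reported, with integer scores from 0 to 30.

```python
def stard_band(score):
    """Map a STARD 2015 completeness score (0 to 30) to a quality band."""
    if not isinstance(score, int) or not 0 <= score <= 30:
        raise ValueError("score must be an integer between 0 and 30")
    if score <= 10:
        return "low"
    if score <= 20:
        return "moderate"
    return "high"


for example_score in (7, 15, 26):
    print(example_score, stard_band(example_score))
```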
Patient and public involvement
Patients and the public were not involved in development of the research question and outcome measures, nor the study design. The study does not involve patient recruitment, and patients were not involved in conduct of the study. We plan to liaise closely with patients, special interest groups, and charities in the dissemination of our results in printed and electronic media. Meta-data and information about the study will also be made available through our website (www.muirmaxwellcentre.com).
Ethics and dissemination
Ethical approval is not required as primary data will not be collected. Results will be disseminated in peer-reviewed journals, conference presentations, and in press releases. Meta-data and information about the study will also be made available through our website (www.muirmaxwellcentre.com) and via social media.
Contributors GKM and RFMC conceived the idea for the protocol and made the main contribution to planning and preparation of timelines for its completion. GKM and KB put together and tested various search strategies for the protocol and, after consultation with CRS, had these reviewed and approved by Marshall Dozier at the University of Edinburgh library whose support we acknowledge. GKM and RFMC planned the data extraction and statistical analysis, as well as the risk of bias, quality of evidence and completeness of reporting assessments. GKM designed the tables and wrote the first draft of the manuscript, which was then reviewed and amended by KB, CRS, SED and RFMC. All authors then approved the final written manuscript. RFMC is the guarantor for the work.
Funding This work was supported by Epilepsy Research UK (R44007) and the Juliet Bergqvist Memorial Fund.
Disclaimer The funders had no role in the design of the protocol, its preparation, analyses, interpretation of the data, manuscript preparation or decision to submit.
Competing interests None declared.
Patient consent Not required.
Provenance and peer review Not commissioned; externally peer reviewed.
Author note Any future amendments of the protocol will be listed in this section along with a date, description, and rationale for each amendment.