Model-based methods for case definitions from administrative health data: application to rheumatoid arthritis

Kristine Kroeker; Jessica Widdifield; Saman Muthukumarana; Depeng Jiang; Lisa M Lix

doi:10.1136/bmjopen-2017-016173

Kristine Kroeker¹, Jessica Widdifield²,³, Saman Muthukumarana⁴, Depeng Jiang¹, Lisa M Lix¹

¹ Department of Community Health Sciences, University of Manitoba, Winnipeg, Canada
² Institute for Clinical Evaluative Sciences, Toronto, Canada
³ Research Institute of the McGill University Health Centre, McGill University, Montreal, Canada
⁴ Department of Statistics, University of Manitoba, Winnipeg, Canada

Correspondence to Dr. Lisa M Lix; lisa.lix{at}med.umanitoba.ca

Abstract

Objective This research proposes a model-based method to facilitate the selection of disease case definitions from validation studies for administrative health data. The method is demonstrated for a rheumatoid arthritis (RA) validation study.

Study design and setting Data were from 148 definitions to ascertain cases of RA in hospital, physician and prescription medication administrative data. We considered: (A) separate univariate models for sensitivity and specificity, (B) univariate model for Youden’s summary index and (C) bivariate (ie, joint) mixed-effects model for sensitivity and specificity. Model covariates included the number of diagnoses in physician, hospital and emergency department records, physician diagnosis observation time, duration of time between physician diagnoses and number of RA-related prescription medication records.

Results The most common case definition attributes were: 1+ hospital diagnosis (65%), 2+ physician diagnoses (43%), 1+ specialist physician diagnosis (51%) and 2+ years of physician diagnosis observation time (27%). Statistically significant improvements in sensitivity and/or specificity for separate univariate models were associated with (all p values <0.01): 2+ and 3+ physician diagnoses, unlimited physician diagnosis observation time, 1+ specialist physician diagnosis and 1+ RA-related prescription medication records (65+ years only). The bivariate model produced similar results. Youden’s index was associated with these same case definition criteria, except for the length of the physician diagnosis observation time.

Conclusion A model-based method provides valuable empirical evidence to aid in selecting a definition(s) for ascertaining diagnosed disease cases from administrative health data. The choice between univariate and bivariate models depends on the goals of the validation study and number of case definitions.

Keywords: administrative health data; chronic disease; diagnosis; regression; rheumatoid arthritis; validation

This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/

https://doi.org/10.1136/bmjopen-2017-016173

Strengths and limitations of this study

Studies about the validity (ie, sensitivity and specificity) of disease case definitions for administrative health data typically rely on descriptive methods to select one or more case definitions for use. Our study proposes and demonstrates a model-based method that provides empirical evidence to support case definition development.
Our method can be applied to diseases that are ascertained from diagnoses and/or prescription medication information in administrative health data, for which one or more validity measures are produced: sensitivity, specificity, positive predictive value, negative predictive value and summary measures such as Youden’s index.
A limitation of our method is that it cannot be applied to validation studies with a small number of case definitions (ie, <50 case definitions).

Introduction

Administrative health data are widely used for research and surveillance studies because they are relatively inexpensive to access, cover entire populations and can be linked to create longitudinal patient-specific records of healthcare use. However, one limitation of administrative health data is their potentially low sensitivity and specificity for ascertaining patients with chronic diseases.1–4 Therefore, validation studies are an essential tool for assessing data quality. A validation study compares cases ascertained from administrative health data with clinically confirmed cases and produces accuracy estimates (eg, sensitivity and specificity) for one or more case definitions.4 5 Many studies routinely test multiple case definitions. For example, in the rheumatoid arthritis (RA) validation literature, several studies reported more than 40 case definitions.6–8

Selecting a single case definition from among the many that may be tested in a single study is not a straightforward process; sensitivity and specificity estimates often vary with case definition criteria such as the number of diagnosis codes, number of years of data used to ascertain cases and patient characteristics (eg, age and sex). Published guidelines recommend selecting a case definition by prioritising a single validity measure.9 Moreover, these guidelines recommend that validation studies report all case definitions and at least four validity measures.10 11 Thus, a single validation study may result in a large volume of case definition information. Many researchers rely solely on descriptive analyses to summarise these data and select a case definition from among those that are tested.1 3 6 However, the case definition with the highest diagnostic validity estimate may not be more accurate than the case definition with the next highest diagnostic validity estimate, due to sampling error in the estimates. Inferential methods can be used to support the selection of a case definition; they can provide valuable empirical evidence about the case definition criteria associated with validity estimates. However, to date, published guidelines have made no recommendations on the use of inferential methods to analyse diagnostic validity estimates.10 11

We propose a model-based method to facilitate the selection of disease case definitions from validation studies for administrative health data. The objectives are to: (A) test administrative health data criteria associated with the validity of case definitions, and (B) compare competing models applied to case definition validity estimates. The model-based method is demonstrated for an RA validation study.

Methods

Data source

Study data were from an RA validation study6 conducted using administrative data from 1 April 1991 to 31 March 2011 for patients from Ontario, Canada. Case definitions for administrative health data were developed using medical records for 450 patients from 18 rheumatology clinics as the gold standard. Physician billing claims, hospital discharge abstracts and emergency room (ER) records were used to develop case definitions for all patients; in addition, pharmacy data were used to develop case definitions for patients aged 65+ years.

The published study data reported on validity estimates for 61 case definitions. Validity estimates for an additional 87 case definitions not reported in the publication were provided by the first author. Thus, a total of 148 case definitions were available for analysis. Of this number, 57 case definitions (38.5%) were tested for individuals 20+ years and 91 case definitions (61.5%) were tested for individuals 65+ years. All case definitions tested in the 20+ years age group were also tested in the 65+ years age group. The remaining 34 case definitions for the 65+ years age group included prescription medication criteria.

Study variables

The case definitions were described using the following criteria (table 1): age group, number of diagnoses in hospital discharge records, number of diagnoses in ER records, number of diagnoses in physician records, number of specialist physician diagnoses, length of physician diagnosis observation time, ≥60 days of separation between physician diagnoses, exclusion criteria A, exclusion criteria B, number of RA-related medications including steroids and number of RA-related medications excluding steroids. Diagnoses in hospital discharge records and diagnoses in ER records were ascertained using an unlimited observation period. A case definition with no physician diagnoses corresponds with having no physician diagnosis observation time. Exclusion criteria A was defined by the authors as follows: exclude individuals with at least two physician diagnosis codes for a rheumatology diagnosis other than RA; this includes osteoarthritis, gout, polymyalgia rheumatica, other seronegative spondyloarthropathy, ankylosing spondylitis, psoriasis, synovitis/tenosynovitis/bursitis, connective tissue disorder, vasculitis and others. Exclusion criteria B was defined by the authors as follows: exclude individuals who had an RA diagnosis code not confirmed by a specialist. The RA-related medication criteria were set to missing for the case definitions applied to the 20+ age group, because medication data were not available for this age group.

Table 1

RA case definition criteria and attributes

The study dataset included each case definition as an observation. The case definition criteria and estimates of sensitivity and specificity were included as variables. Youden’s index (ie, sensitivity + specificity −1)12 was calculated from the estimates of sensitivity and specificity.
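
The calculation of Youden’s index can be sketched as follows. This is a minimal illustration, assuming sensitivity and specificity are stored as proportions between 0 and 1; the function name is hypothetical and not taken from the study’s SAS code. The example inputs are the mean estimates reported in the descriptive analyses for the 20+ years case definitions.

```python
def youden_index(sensitivity: float, specificity: float) -> float:
    """Youden's index J = sensitivity + specificity - 1 (inputs as proportions)."""
    for name, value in (("sensitivity", sensitivity), ("specificity", specificity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be a proportion between 0 and 1")
    return sensitivity + specificity - 1.0


# Mean estimates for the 20+ years case definitions, expressed as proportions.
j_20_plus = youden_index(0.909, 0.822)
```

Because the index is a simple sum, different pairs of sensitivity and specificity (for example 0.9 and 0.7, or 0.7 and 0.9) give the same value.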

Statistical analyses

Descriptive analyses of the case definition attributes and the estimates of sensitivity, specificity and Youden’s index were conducted using frequencies, percentages and means to inform the model fitting process. All criteria were treated as ordinal measures. Spearman correlation coefficients were used to identify potential collinearity (defined as a correlation of 0.70 or greater13) among the case definition criteria.
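
The collinearity screen described above can be sketched in plain Python. This is an illustration only: the criteria names and the 0/1 coding below are hypothetical toy data, not the study data, and applying the 0.70 threshold to the absolute value of the correlation is an assumption.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


def collinear_pairs(criteria, threshold=0.70):
    """Return (name_a, name_b, r) for pairs whose |r| meets the threshold."""
    flagged = []
    for a, b in combinations(criteria, 2):
        r = spearman(criteria[a], criteria[b])
        if abs(r) >= threshold:
            flagged.append((a, b, round(r, 2)))
    return flagged


# Hypothetical 0/1 coding of three criteria across eight case definitions.
toy = {
    "exclusion_a": [0, 0, 0, 0, 0, 1, 1, 1],
    "exclusion_b": [0, 0, 0, 0, 0, 0, 1, 1],
    "hospital_dx": [0, 1, 0, 1, 0, 1, 0, 1],
}
flagged = collinear_pairs(toy)
```

In this toy example only the pair of exclusion criteria is flagged, so one of the two would be entered in a model at a time.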

The following models were fit to the data based on previous research14: (A) separate univariate fixed-effects models for sensitivity and specificity, (B) univariate fixed-effects model for Youden’s index and (C) bivariate (ie, joint) mixed-effects model for sensitivity and specificity. For the univariate models, sensitivity, specificity or Youden’s index were the outcome variables, and the case definition criteria were covariates. The bivariate mixed-effects model jointly modelled sensitivity and specificity as the outcome variables, and the case definition criteria were covariates. In the bivariate model, estimates of sensitivity and specificity were treated as repeated measures to account for their dependence.

Each covariate (ie, case definition criteria) was first tested in unadjusted models. Multivariable models were subsequently fit to the data; only the covariates that were statistically significant at α=0.01 and explained more than 1% of the variation in the unadjusted models (based on the pseudo-R² statistic15) were retained. The pseudo-R² statistic was calculated using the likelihood statistics from the unadjusted model and the null model (ie, model with no covariates) using McFadden’s method.15 A nominal α=0.01, based on the Bonferroni correction, was used to evaluate statistical significance in the multivariable model to limit the overall probability of a Type I error.16 The adjusted models were fit to the data for all age groups (n=148) and then fit to the data only for the 65+ age group (n=91). Univariate models were reported with the percent of explained variance. All model estimates were reported with 99% CIs.
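
The screening rule above can be sketched as follows. This is a hedged illustration: McFadden’s statistic is written in its usual form, 1 − lnL(model)/lnL(null), the function names are hypothetical, and the log-likelihood values are invented for the example rather than taken from the study.

```python
def mcfadden_r2(loglik_model: float, loglik_null: float) -> float:
    """McFadden's pseudo-R-squared: 1 - lnL(model) / lnL(null).

    The usual 0-to-1 reading assumes both log-likelihoods are negative;
    for continuous densities they can be positive, and the source does not
    state how that situation was handled.
    """
    if loglik_null == 0:
        raise ValueError("null log-likelihood must be non-zero")
    return 1.0 - loglik_model / loglik_null


def retain_covariate(p_value, loglik_model, loglik_null, alpha=0.01, min_r2=0.01):
    """Screening rule from the text: p < alpha and pseudo-R-squared > 1%."""
    return p_value < alpha and mcfadden_r2(loglik_model, loglik_null) > min_r2


# Hypothetical log-likelihoods (not values from the study).
r2 = mcfadden_r2(loglik_model=-80.0, loglik_null=-100.0)
keep = retain_covariate(0.001, loglik_model=-80.0, loglik_null=-100.0)
```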

The data were modelled using a beta distribution with a logit link function as recommended in previous research.14 The mixed-effects bivariate model used the Cholesky decomposition to ensure that the estimated variance–covariance matrix of the random effects was positive semidefinite and the model converged.17 All analyses were conducted using SAS V.9.3.18
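
The two modelling ingredients named above can be illustrated with a short sketch. It assumes the common mean-precision parameterisation of the beta distribution (shape parameters μφ and (1 − μ)φ) and a 2×2 random-effects covariance matrix; the function and parameter names are hypothetical and this is not the authors’ SAS implementation.

```python
from math import exp, lgamma, log


def inv_logit(eta: float) -> float:
    """Inverse of the logit link: map a linear predictor to a mean in (0, 1)."""
    return 1.0 / (1.0 + exp(-eta))


def beta_logpdf(y: float, mu: float, phi: float) -> float:
    """Log-density of Beta(mu * phi, (1 - mu) * phi) at y, for 0 < y < 1."""
    a, b = mu * phi, (1.0 - mu) * phi
    return (lgamma(a + b) - lgamma(a) - lgamma(b)
            + (a - 1.0) * log(y) + (b - 1.0) * log(1.0 - y))


def covariance_from_cholesky(l11: float, l21: float, l22: float):
    """Build the 2x2 matrix L @ L.T from a lower-triangular factor L.

    Any real values of l11, l21 and l22 give a positive semidefinite matrix,
    which is why optimising over the factor keeps the estimate valid.
    """
    return [[l11 * l11, l11 * l21],
            [l11 * l21, l21 * l21 + l22 * l22]]


# Hypothetical values: a linear predictor of 0 gives a mean of 0.5.
mean = inv_logit(0.0)
cov = covariance_from_cholesky(1.0, 0.5, 2.0)
```

With a mean of 0.5 and precision 2 the beta distribution reduces to the uniform distribution, so its log-density is 0 everywhere on (0, 1); this gives a quick sanity check on the parameterisation.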

Results

Descriptive analyses

As shown in table 2, two-thirds (64.9%) of the case definitions that were applied to the data for all age groups used 1+ hospital discharge record in an unlimited observation period as a criterion. At least one diagnosis in ER records in an unlimited observation period was a criterion of 6.8% of the case definitions. Physician claims were used in 94.6% of the case definitions. The case definitions used physician diagnosis observation periods of never (ie, when physician claims were not used, 5.4%), ≤1 year (25.7%), ≤2 years (27.0%), ≤5 years (29.7%) and an unlimited physician diagnosis observation period (12.2%) to ascertain physician diagnoses. At least one specialist diagnosis was included as a criterion in half of the case definitions (51.4%). A time separation of ≥60 days between two physician diagnoses was used in 14 (9.5%) case definitions. Exclusion criteria A and B were infrequently used in the case definitions (6.8% and 5.4%, respectively). Of the 91 case definitions for the 65+ age group, 11.5% required 1+ RA-related medication including steroids to ascertain cases and 11.5% of case definitions used 1+ RA-related medication excluding steroids to ascertain cases. Compared with the case definitions for the 65+ years age group, the case definitions for the 20+ age group had slightly lower average estimates of sensitivity (20+ years: 90.9 and 65+ years: 91.3), specificity (20+ years: 82.2 and 65+ years: 86.1) and Youden’s index (20+ years: 73.1 and 65+ years: 77.4).

Table 2

Descriptive statistics for RA case definition criteria (n=148)

The following case definition criteria were highly correlated (data not shown): exclusion criteria A and B (r=0.89; p<0.0001), exclusion criteria A and ≥60 days of separation between physician claims (r=0.83; p<0.0001) and exclusion criteria B and ≥60 days of separation between physician claims (r=0.74; p<0.0001). These combinations of case definition criteria were not included in the same model; rather, one criterion from each pair was used in a model at a time.

Inferential analyses

Table 3 reports the percent pseudo-R² statistics for the univariate and bivariate unadjusted models for all case definitions. The number of physician diagnoses and physician diagnosis observation time were the two criteria that explained the most variation in all models (19.2%–82.6% and 16.0%–77.4%, respectively). In the unadjusted univariate models, sensitivity was significantly associated (p<0.01) with the number of physician diagnoses, length of physician diagnosis observation time, ≥60 days of separation between physician diagnoses, exclusion criteria A and B and number of RA-related medications excluding steroids. Specificity was significantly associated (p<0.01) with the number of physician diagnoses, number of specialist diagnoses, length of physician diagnosis observation time, ≥60 days of separation between physician diagnoses and number of RA-related medications excluding steroids. The unadjusted univariate models also revealed that Youden’s index was significantly associated (p<0.01) with the number of diagnoses in ER records, number of physician diagnoses, number of specialist diagnoses, length of physician diagnosis observation time, ≥60 days of separation between physician diagnoses and exclusion criteria A and B. The joint estimates of sensitivity and specificity were significantly associated (p<0.01) with the number of hospital diagnoses, number of physician diagnoses, number of specialist diagnoses, length of physician diagnosis observation time, ≥60 days of separation between physician diagnoses, exclusion criteria A and B and number of RA-related medications excluding steroids in the bivariate models.

Table 3

Percent pseudo-R² statistics for unadjusted univariate models of sensitivity and specificity, univariate model of Youden’s index and bivariate model of sensitivity and specificity (n=148)

When all case definitions (n=148) were considered, the adjusted univariate model of sensitivity showed that increasing the physician diagnosis observation time from ≤1 year to unlimited was associated with a statistically significant increase in sensitivity, whereas requiring ≥60 days of separation between physician diagnoses significantly decreased sensitivity (figure 1). Similar relationships were found in the models for the 65+ years age group, for which prescription medication data were available (n=91).

Figure 1

Logit estimates and 99% CIs for adjusted univariate models of sensitivity and specificity for case definitions applied to: (A) all age groups and (B) 65+ age group. Reference categories were: 1+ physician diagnoses, ≤1 year physician diagnosis observation time, 0 specialist diagnoses, no 60+ day separation between physician diagnoses and 0 RA-related medications excluding steroids. RA, rheumatoid arthritis.
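
Estimates on the logit scale can be hard to read directly. The sketch below shows one way to interpret them, assuming Wald-type intervals (the source does not state how the 99% CIs were computed) and using hypothetical numbers; the function names are illustrative only.

```python
from math import exp
from statistics import NormalDist


def wald_interval(estimate: float, std_error: float, level: float = 0.99):
    """Two-sided Wald interval on the logit scale."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return estimate - z * std_error, estimate + z * std_error


def shift_mean(baseline: float, logit_estimate: float) -> float:
    """Apply a logit-scale effect to a baseline proportion."""
    odds = baseline / (1.0 - baseline) * exp(logit_estimate)
    return odds / (1.0 + odds)


# Hypothetical estimate and standard error (not values from the study).
low, high = wald_interval(0.5, 0.1)
shifted = shift_mean(0.8, 0.5)
```

An interval that excludes 0 on the logit scale corresponds to a statistically significant effect at the matching α level, and a positive estimate moves the expected proportion above the reference category.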

When all case definitions (n=148) were considered, the univariate model of specificity showed that using 2+ or 3+ physician diagnoses was associated with a statistically significant increase in specificity compared with 1+ physician diagnosis; increasing the physician diagnosis observation time from ≤1 year to ≤2 years or ≤5 years significantly decreased specificity; and 1+ specialist diagnosis significantly increased specificity (figure 1). When only the case definitions for the 65+ age group (n=91) were considered, the results showed similar relationships. In addition, the number of RA-related medications excluding steroids was associated with a statistically significant increase in specificity.

Based on the unadjusted models, the univariate model with all case definitions (n=148) for Youden’s index included the case definition criteria of number of diagnoses in ER records, number of physician diagnoses, physician diagnosis observation time, number of specialist visits and ≥60 days of separation between physician diagnoses. However, the number of ER records and physician diagnosis observation time criteria were not statistically significant and model fit improved when they were removed (data not shown). The adjusted univariate model of Youden’s index showed that using 2+ and 3+ physician diagnoses to ascertain cases significantly increased Youden’s index compared with 1+ physician diagnosis, 1+ specialist diagnosis significantly increased Youden’s index and a time separation ≥60 days between diagnoses significantly decreased Youden’s index (figure 2). When the case definitions for the 65+ population (n=91) were considered, a similar pattern emerged. Using the number of RA-related medications excluding steroids to ascertain RA cases resulted in a statistically significant increase in Youden’s index.

Figure 2

Logit estimates and 99% CIs for adjusted univariate models of Youden’s index for case definitions applied to: (A) all age groups and (B) 65+ age group. Reference categories were: 1+ physician diagnoses, 0 specialist diagnoses, no 60+ day separation between physician diagnoses and 0 RA-related medications excluding steroids. RA, rheumatoid arthritis.

When all case definitions were analysed using the adjusted bivariate model, 2+ and 3+ physician diagnoses were associated with a statistically significant increase in specificity, but were not associated with sensitivity, compared with 1+ physician diagnosis (figure 3). Increasing the physician diagnosis observation time from ≤1 year to ≤2 years, ≤5 years or an unlimited observation period was associated with a statistically significant increase in sensitivity. Increasing the physician diagnosis observation time from ≤1 year to ≤2 years or ≤5 years was associated with a statistically significant decrease in specificity. Using ≥60 days of separation between physician diagnoses was associated with a statistically significant decrease in sensitivity but no significant change in specificity. Increasing the number of specialist diagnoses significantly increased specificity. When only the case definitions applied to the 65+ age group (n=91) were analysed, the relationships found in the model for all age groups remained statistically significant. Including 1+ RA-related medications excluding steroids decreased sensitivity and increased specificity.

Figure 3

Logit estimates and 99% CIs for adjusted bivariate models of sensitivity and specificity for case definitions applied to: (A) all age groups and (B) 65+ age group. Reference categories were: 1+ physician diagnoses, ≤1 year physician diagnosis observation time, 0 specialist diagnoses, no 60+ day separation between physician diagnoses and 0 RA-related medications excluding steroids. RA, rheumatoid arthritis.

Discussion

This study applied regression models in a secondary analysis of administrative health data to identify case definition criteria associated with validity estimates from a study about RA case definitions. Based on the results of the adjusted univariate model, sensitivity was associated with the number of physician diagnoses, physician diagnosis observation time and length of time between physician diagnoses. Based on the results of the univariate model, specificity was associated with the number of physician diagnoses, physician diagnosis observation time, specialist diagnoses and RA-related medications excluding steroids. Based on the univariate model results, Youden’s index was associated with the number of physician diagnoses, specialist diagnoses, length of time between physician diagnoses and number of RA-related medications excluding steroids. For the bivariate model, sensitivity was associated with the number of physician diagnoses, physician diagnosis observation time, amount of time between physician diagnoses and number of RA-related medications excluding steroids. In this same model, specificity was associated with the number of physician diagnoses, physician diagnosis observation time, specialist diagnoses and RA-related medications excluding steroids.

All of the models resulted in similar performance in our numeric example, but this may not always be the case. Selection of one model over competing alternatives depends on the study goals and the number of case definitions. Overall, however, the bivariate model is recommended when the number of case definitions is large and sensitivity and specificity are moderately or highly correlated. The univariate model applied to Youden’s index is recommended when the researcher places equal weight on maximising sensitivity and specificity.12 19–21 However, Youden’s index can result in the same estimate for different combinations of sensitivity and specificity. Thus, univariate models applied separately to sensitivity and specificity are recommended when the researcher does not place equal weight on these validity measures.22

A validation study is typically used to produce recommendations about selecting one or more case definitions for maximum accuracy in ascertaining disease cases. Using our model-based method, one can identify the case definition criteria associated with one or more measures of validity and use them to construct a case definition. Based on the univariate models for all age groups, the recommended RA case definition has the following attributes: (A) 2+ physician diagnoses, (B) unlimited physician diagnosis observation time and (C) 1+ specialist diagnosis. At least two physician diagnoses and 1+ specialist diagnosis were associated with specificity but not with sensitivity. An unlimited observation time was significantly associated with improvements in sensitivity. For the 65+ age group, the recommended RA case definition had the same case definition attributes and also included 1+ RA-related medication excluding steroids. The univariate model of Youden’s index and bivariate model of sensitivity and specificity produced similar results.

The recommended case definition based on the univariate sensitivity and specificity models is simpler than the recommended case definition from Widdifield et al;6 however, both case definitions produce similar diagnostic accuracy estimates. The primary difference between the two recommended case definitions is that Widdifield et al recommended using one diagnosis in hospital discharge records to ascertain cases, while our model-based approach did not support this. Our recommendation derived from a model-based approach might lead to subsequent reanalysis of the original validation data to produce estimates of sensitivity and specificity for the model-supported case definition.

The case definition with the highest sensitivity or specificity estimates may not be significantly more accurate than other case definitions. A model-based approach provides empirical evidence about the case definition criteria that are associated with significant increases/decreases in validity estimates.

While this research focused on inferential techniques for diagnostic validation studies, design of such studies is also an important consideration to ensure that the effects of the criteria can be accurately estimated. Ensuring that all possible combinations of criteria are investigated is an important consideration.23 When a criterion is included in only a small number of case definitions, the power to detect the effect of the criterion on diagnostic validity estimates may be low.

This study has some limitations. Other parametric and non-parametric models have been proposed to combine estimates from a single case definition across multiple studies, such as copula techniques,14 24 mixture models25 26 and mixed models of summary receiver operating characteristic curves.27 28 However, the models selected for this study have many applications and are most likely to be familiar to researchers. A beta distribution, which is defined on the interval from 0 to 1, may not always be an appropriate choice for Youden’s index, because this index can, in theory, range from −1 to +1. However, in practice, values of Youden’s index less than zero are rare.

Inferential methods cannot be applied to the estimates from all diagnostic validation studies; overfitting the data may be a problem when the number of case definitions is small, or when the number of case definition criteria is large relative to the total number of definitions. Green29 suggested a minimum of 50 observations plus eight observations for every parameter estimated from a multiple regression model. Based on this, our model-based approach would require a minimum of 50 case definitions, and preferably more, in order to be implemented.29 When fewer case definitions are available, descriptive analyses must be relied on to select a case definition. Lastly, the validation study design may limit the ability to test interactions between criteria. Testing interaction effects would require as many combinations as possible of the criteria to be included in the validation study to allow for reasonable model power.
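
Green’s rule of thumb can be turned into a quick feasibility check. The sketch below is illustrative; the function names are hypothetical, and the rule is applied to the two sample sizes used in this study (148 and 91 case definitions).

```python
def green_minimum_n(n_parameters: int) -> int:
    """Green's rule of thumb: at least 50 + 8 * m observations for m parameters."""
    return 50 + 8 * n_parameters


def max_parameters(n_definitions: int) -> int:
    """Largest m with 50 + 8 * m <= n_definitions; returns 0 when no positive m fits."""
    return max(0, (n_definitions - 50) // 8)


all_ages = max_parameters(148)   # all case definitions
age_65_plus = max_parameters(91)  # 65+ years case definitions only
```

Under this rule, the full set of 148 case definitions supports roughly 12 estimated parameters and the 65+ subset roughly 5, which is one reason to limit the number of criteria entered in the adjusted models.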

This study has a number of strengths. First, the models applied to the case definition data from a single study have previously been used in meta-analyses to combine diagnostic validity estimates for a single case definition across multiple studies.14 30 31 Second, these methods enable modelling of case definitions from published validation studies, including when the individual-level administrative health data are not available. Third, the effects of publication bias on the study results were minimised by analysing all case definition data provided by the study authors. Finally, the methods used in this study can be applied to other chronic diseases and other diagnostic validity measures such as positive predictive value and/or negative predictive value.

Conclusion

This study applied model-based methods to a single validation study to select case definition criteria associated with validity measures such as sensitivity and specificity. The model-based results can be used by researchers to empirically guide the selection of a case definition for implementation in subsequent cohort studies and surveillance initiatives.32 33 Empirical methods can be used to quantify the magnitude of change in estimates of accuracy associated with different case definition criteria. This research contributes to accurate methods for using administrative health data to study chronic diseases such as RA.

References

1.↵
2. Quan H ,
3. Khan N ,
4. Hemmelgarn BR , et al
. Validation of a case definition to define hypertension using administrative data. Hypertension 2009;54:1423–8.doi:10.1161/HYPERTENSIONAHA.109.139279
OpenUrl CrossRef PubMed
2.↵
2. Hux JE ,
3. Ivis F ,
4. Flintoft V , et al
. Diabetes in Ontario: determination of prevalence and incidence using a validated administrative data algorithm. Diabetes Care 2002;25:512–6.doi:10.2337/diacare.25.3.512
OpenUrl Abstract/FREE Full Text
3.↵
2. Tu K ,
3. Campbell NR ,
4. Chen ZL , et al
. Accuracy of administrative databases in identifying patients with hypertension. Open Med 2007;1:e18–26.
OpenUrl PubMed
4.↵
2. Lix LM ,
3. Yogendran MS ,
4. Leslie WD , et al
. Using multiple data features improved the validity of osteoporosis case ascertainment from administrative databases. J Clin Epidemiol 2008;61:1250–60.doi:10.1016/j.jclinepi.2008.02.002
OpenUrl CrossRef PubMed Web of Science
5.↵
2. Lix LM ,
3. Yan L ,
4. Blackburn D , et al
. Validity of the RAI-MDS for ascertaining diabetes and comorbid conditions in long-term care facility residents. BMC Health Serv Res 2014;14:17.doi:10.1186/1472-6963-14-17
OpenUrl
6.↵
2. Widdifield J ,
3. Bernatsky S ,
4. Paterson JM , et al
. Accuracy of canadian health administrative databases in identifying patients with rheumatoid arthritis: a validation study using the medical records of rheumatologists. Arthritis Care Res 2013;65:n/a–91.doi:10.1002/acr.22031
OpenUrl
7.↵
2. Widdifield J ,
3. Bombardier C ,
4. Bernatsky S , et al
. An administrative data validation study of the accuracy of algorithms for identifying rheumatoid arthritis: the influence of the reference standard on algorithm performance. BMC Musculoskelet Disord 2014;15:216.doi:10.1186/1471-2474-15-216
OpenUrl CrossRef PubMed
8.↵
2. Benchimol EI ,
3. Guttmann A ,
4. Mack DR , et al
. Validation of international algorithms to identify adults with inflammatory bowel disease in health administrative data from Ontario, Canada. J Clin Epidemiol 2014;67:887–96.doi:10.1016/j.jclinepi.2014.02.019
OpenUrl
9.↵
2. Chubak J ,
3. Pocobelli G ,
4. Weiss NS
. Tradeoffs between accuracy measures for electronic health care data algorithms. J Clin Epidemiol 2012;65:343–9.doi:10.1016/j.jclinepi.2011.09.002
OpenUrl CrossRef PubMed
10.↵
2. Benchimol EI ,
3. Manuel DG ,
4. To T , et al
. Development and use of reporting guidelines for assessing the quality of validation studies of health administrative data. J Clin Epidemiol 2011;64:821–9.doi:10.1016/j.jclinepi.2010.10.006
OpenUrl CrossRef PubMed
11.↵
2. Whiting P ,
3. Rutjes AW ,
4. Reitsma JB , et al
. The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews. BMC Med Res Methodol 2003;3:25.doi:10.1186/1471-2288-3-25
OpenUrl CrossRef PubMed
12.↵
2. Youden WJ
. Index for rating diagnostic tests. Cancer 1950;3:32–5.doi:10.1002/1097-0142(1950)3:1<32::AID-CNCR2820030106>3.0.CO;2-3
OpenUrl CrossRef PubMed Web of Science
13. Tabachnick BG, Fidell LS. Using multivariate statistics. 2nd ed. New York: Harper & Row, 1989:87–8.
14. Kuss O, Hoyer A, Solms A. Meta-analysis for diagnostic accuracy studies: a new statistical model using beta-binomial distributions and bivariate copulas. Stat Med 2014;33:17–30. doi:10.1002/sim.5909
15. Domencich TA, McFadden D. Urban travel demand: a behavioral analysis. Amsterdam: North-Holland Publishing Company, 1975.
16. Shaffer JP. Multiple hypothesis testing. Annu Rev Psychol 1995;46:561–84. doi:10.1146/annurev.ps.46.020195.003021
17. Menke J. Bivariate random-effects meta-analysis of sensitivity and specificity with SAS PROC GLIMMIX. Methods Inf Med 2010;49:54–64. doi:10.3414/ME09-01-0001
18. SAS Institute Inc. SAS/STAT 9.3 User’s Guide. Cary, NC, 2011.
19. Marrie RA, Yu BN, Leung S, et al. Rising prevalence of vascular comorbidities in multiple sclerosis: validation of administrative definitions for diabetes, hypertension, and hyperlipidemia. Mult Scler 2012;18:1310–9. doi:10.1177/1352458512437814
20. Butt DA, Tu K, Young J, et al. A validation study of administrative data algorithms to identify patients with parkinsonism with prevalence and incidence trends. Neuroepidemiology 2014;43:28–37. doi:10.1159/000365590
21. Tu K, Wang M, Jaakkimainen RL, et al. Assessing the validity of using administrative data to identify patients with epilepsy. Epilepsia 2014;55:335–43. doi:10.1111/epi.12506
22. Zaslavsky AM, Shaul JA, Zaborski LB, et al. Combining health plan performance indicators into simpler composite measures. Health Care Financ Rev 2002;23:101–15.
23. Vanniyasingam T, Cunningham CE, Foster G, et al. Simulation study to determine the impact of different design features on design efficiency in discrete choice experiments. BMJ Open 2016;6:e011985. doi:10.1136/bmjopen-2016-011985
24. Hoyer A, Kuss O. Meta-analysis of diagnostic tests accounting for disease prevalence: a new model using trivariate copulas. Stat Med 2015;34:1912–24. doi:10.1002/sim.6463
25. Schlattmann P, Verba M, Dewey M, et al. Mixture models in diagnostic meta-analyses--clustering summary receiver operating characteristic curves accounted for heterogeneity and correlation. J Clin Epidemiol 2015;68:61–72. doi:10.1016/j.jclinepi.2014.08.013
26. Böhning D, Böhning W, Holling H. Revisiting Youden's index as a useful measure of the misclassification error in meta-analysis of diagnostic studies. Stat Methods Med Res 2008;17:543–54. doi:10.1177/0962280207081867
27. Doebler P, Holling H, Böhning D. A mixed model approach to meta-analysis of diagnostic studies with binary test outcome. Psychol Methods 2012;17:418–36. doi:10.1037/a0028091
28. Hamza TH, Reitsma JB, Stijnen T. Meta-analysis of diagnostic studies: a comparison of random intercept, normal-normal, and binomial-normal bivariate summary ROC approaches. Med Decis Making 2008;28:639–49. doi:10.1177/0272989X08323917
29. Green SB. How many subjects does it take to do a regression analysis. Multivariate Behav Res 1991;26:499–510. doi:10.1207/s15327906mbr2603_7
30. Riley RD, Abrams KR, Sutton AJ, et al. Bivariate random-effects meta-analysis and the estimation of between-study correlation. BMC Med Res Methodol 2007;7:3. doi:10.1186/1471-2288-7-3
31. Reitsma JB, Glas AS, Rutjes AW, et al. Bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews. J Clin Epidemiol 2005;58:982–90. doi:10.1016/j.jclinepi.2005.02.022
32. Widdifield J, Paterson JM, Bernatsky S, et al. The epidemiology of rheumatoid arthritis in Ontario, Canada. Arthritis Rheumatol 2014;66:786–93. doi:10.1002/art.38306
33. Widdifield J, Moura CS, Wang Y, et al. The longterm effect of early intensive treatment of seniors with rheumatoid arthritis: a comparison of 2 population-based cohort studies on time to joint replacement surgery. J Rheumatol 2016;43:861–8. doi:10.3899/jrheum.151156

Footnotes

Contributors KK led the conception and design of the study, analysis and interpretation of the data and drafted the article. LML provided guidance on the conception and design of the study, assisted in the analysis and interpretation of the data and was involved in revising the article. JW provided access to the study data, assisted in interpretation of the data and was involved in revising the article. DJ and SM provided guidance on the conception and design of the study, assisted in the analysis and interpretation of the data and were involved in revising the article. All authors read and approved the final manuscript.
Funding The first author was supported by the Canadian Institutes of Health Research (CIHR) Drug Safety and Effectiveness Network grant TD3.137716 through the scholarship from the Canadian Network for Advanced Interdisciplinary Methods for comparative effectiveness research (CAN.AIM) team. This work was supported by the Canadian Institutes of Health Research (CIHR) (www.cihr.irsc.gc.ca) through Canadian Master’s Scholarship funding from CIHR to the first author.
Competing interests None declared.
Patient consent This study does not involve human subjects.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement The datasets used and analysed during the current study are available from JW on reasonable request.
Correction notice This paper has been amended since it was published Online First. Owing to a scripting error, some of the publisher names in the references were replaced with 'BMJ Publishing Group'. This only affected the full text version, not the PDF. We have since corrected these errors and the correct publishers have been inserted into the references.
