A prediction model to estimate completeness of electronic physician claims databases
  1. Lisa M Lix (1),
  2. Xue Yao (2),
  3. George Kephart (3),
  4. Hude Quan (4),
  5. Mark Smith (5),
  6. John Paul Kuwornu (1),
  7. Nitharsana Manoharan (6),
  8. Wilfrid Kouokam (7),
  9. Khokan Sikdar (4)
  1. Department of Community Health Sciences, University of Manitoba, Winnipeg, Canada
  2. Winnipeg Regional Health Authority, Winnipeg, Canada
  3. Department of Community Health Sciences, Dalhousie University, Halifax, Canada
  4. Department of Community Health Sciences, University of Calgary, Calgary, Canada
  5. Manitoba Centre for Health Policy, University of Manitoba, Winnipeg, Canada
  6. Institute for Clinical Evaluative Sciences, Toronto, Canada
  7. Faculty of Sciences and Engineering Sciences, Université de Bretagne-Sud, Vannes, France
  Correspondence to Dr Lisa M Lix; lisa.lix{at}med.umanitoba.ca

Abstract

Objectives Electronic physician claims databases are widely used for chronic disease research and surveillance, but quality of the data may vary with a number of physician characteristics, including payment method. The objectives were to develop a prediction model for the number of prevalent diabetes cases in fee-for-service (FFS) electronic physician claims databases and apply it to estimate cases among non-FFS (NFFS) physicians, for whom claims data are often incomplete.

Design A retrospective observational cohort design was adopted.

Setting Data from the Canadian province of Newfoundland and Labrador were used to construct the prediction model and data from the province of Manitoba were used to externally validate the model.

Participants A cohort of diagnosed diabetes cases was ascertained from physician claims, insured resident registry and hospitalisation records. A cohort of FFS physicians who were responsible for the diagnosis was ascertained from physician claims and registry data.

Primary and secondary outcome measures A generalised linear model with a γ distribution was used to model the number of diabetes cases per FFS physician as a function of physician characteristics. The expected number of diabetes cases per NFFS physician was estimated.

Results The diabetes case cohort consisted of 31 714 individuals; the mean cases per FFS physician was 75.5 (median=49.0). Sex and years since specialty licensure were significantly associated (p<0.05) with the number of cases per physician. Applying the prediction model to NFFS physician registry data resulted in an estimate of 18 546 cases; only 411 were observed in claims data. The model demonstrated face validity in an independent data set.

Conclusions Comparing observed and predicted disease cases is a useful and generalisable approach to assess the quality of electronic databases for population-based research and surveillance.

  • STATISTICS & RESEARCH METHODS
  • PUBLIC HEALTH
  • EPIDEMIOLOGY

This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/

Strengths and limitations of this study

  • This study developed a prediction model to estimate the completeness of non-fee-for-service electronic physician claims for capturing services to populations.

  • The prediction model developed in this study is an efficient and potentially generalisable tool for routine estimation of the magnitude of administrative data completeness.

  • This study emphasises that incomplete electronic physician claims data should be supplemented with other data sources, including electronic medical records, to ensure comprehensive data for chronic disease research and surveillance.

  • The study focuses on completeness of electronic physician claims databases for diabetes; the research should be extended to other chronic diseases to ensure its generalisability.

Introduction

Electronic administrative health databases are widely used for population-based health research and surveillance.1 ,2 The popularity of these databases has arisen for several reasons: they are available in a timely manner, provide information about large numbers of individuals, and are relatively inexpensive to access and use. Electronic physician claims databases, which contain information on outpatient healthcare contacts, capture information on a larger proportion of the population than inpatient hospital records, but the quality of claims databases tends to be poorer than that of hospital records, for which standards for data collection and coding exist.3 ,4 Studies about the quality of claims databases are therefore essential to evaluate and improve their accuracy. However, most studies about physician claims databases have focused only on the validity of diagnosis codes,5–8 while other elements of data quality that could affect the usefulness of these data for research and surveillance have infrequently been examined.9

Incompleteness of physician claims databases, which can result in substantially biased estimates of disease prevalence and healthcare utilisation, may arise for a number of reasons. The information in these databases is used to remunerate physicians for services provided to patients, usually on a fee-for-service (FFS) basis. However, physicians not remunerated by FFS methods may inconsistently record patient encounters in these databases. Specifically, non-FFS (NFFS) physicians, who are often paid via salaries and contracts, are not always required to submit claims in the same way as FFS physicians;10 when they do submit claims, the practice is known as shadow billing. Incomplete capture of NFFS physician claims can have serious consequences; previous research has demonstrated substantial underestimation of diabetes prevalence associated with a lack of shadow billing.11

Possible methods to estimate completeness of electronic administrative databases12–16 include: (1) comparing observed to expected numbers of cases, where expected cases are estimated from a parametric or non-parametric model, (2) comparing the number of cases ascertained in administrative databases to cases ascertained from a validated database, (3) using capture-recapture models and (4) conducting database audits. These methods have primarily been applied to cancer registry and hospital records, but not to physician claims databases. Therefore, the purpose of this study was to develop a population-based model to predict prevalent diabetes cases from FFS physician claims and apply it to estimate cases among NFFS physicians, for whom claims data may be incomplete. We focus on diabetes because electronic administrative health databases have demonstrated good sensitivity and specificity for diabetes case identification, and surveillance of diabetes is of interest worldwide.6

Methods

Data sources for prediction model

Data to construct the prediction model were from the eastern Canadian province of Newfoundland and Labrador (NL), which has a population of approximately 515 000 according to the 2011 Statistics Canada Census. NL physicians remunerated by NFFS methods do not submit shadow-billed claims to the provincial ministry of health,17 while physicians remunerated by FFS methods submit all of their claims to the ministry. NL has a larger proportion of NFFS physicians than most other Canadian provinces.18

Physician claims, physician registry records, hospital discharge abstracts and insured resident registry records from 1 April 2002 to 31 March 2004 were used to conduct the study. We selected these years because the NL physician registry contains comprehensive information on all registered physicians in this time period but is incomplete in later years; the registry includes information about physician remuneration methods, sex, age, specialty, year the medical degree was obtained and health region of the practice location. Each physician claim contains a single three-digit diagnosis code recorded using the International Classification of Diseases, Ninth revision (ICD-9) and date of service. Hospital discharge abstracts contain dates of admission and discharge and up to 20 ICD-9 and ICD-10-CA diagnosis codes. The resident registry contains dates of health insurance coverage, sex, date of birth and health region for all residents eligible for health insurance benefits. Physician claims, hospital separation abstracts and insured resident registry records are linkable using a unique, anonymised patient identifier. Physician claims and the physician registry are also linkable using an anonymised physician identifier.

Study cohort for prediction model

The diabetes case cohort comprised all individuals who met a validated case definition, which requires at least one hospitalisation or at least two physician billing claims (ICD-9 code 250; ICD-10-CA code E10-E14) within a 730-day period.5 ,19 Individuals <20 years of age or without health insurance coverage at the date of the case-qualifying diagnosis were excluded. For cases ascertained from hospital discharge abstracts, the date of the case-qualifying diagnosis was the date of hospital admission; for cases ascertained from physician claims, the date of the case-qualifying diagnosis was the date of the physician claim for the second diagnosis within the 730-day period. Diabetes cases were classified into three mutually exclusive groups: (1) cases ascertained only from hospital discharge abstracts, (2) cases ascertained from physician claims for which the case-qualifying diagnosis was from an FFS physician and (3) cases ascertained from physician claims for which the case-qualifying diagnosis was from an NFFS physician. The last group comprises cases from the claims of a small number of NFFS physicians who receive a portion of their remuneration by FFS payments. While cases in the latter two groups could have a hospital discharge abstract with a diabetes diagnosis, they qualified as a case based on having at least two physician billing claims with a diabetes diagnosis.
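The case definition above can be sketched in code. The following Python fragment is a minimal illustration only, not the authors' SAS implementation; the function name, the input format and the treatment of the 730-day boundary (a gap of exactly 730 days counts as within the period) are assumptions made for the example.

```python
from datetime import date, timedelta

WINDOW = timedelta(days=730)  # assumed: a gap of exactly 730 days still qualifies


def classify_case(hospital_admissions, claim_dates):
    """Return (source, case-qualifying date), or None if the definition is not met.

    hospital_admissions: admission dates of hospitalisations with a diabetes code
    claim_dates: service dates of physician claims with a diabetes code
    Mirrors the text: a person with two qualifying claims is a claims-based
    case even if a hospital record also exists.
    """
    claims = sorted(claim_dates)
    # If any two claims fall within the window, some consecutive pair does too.
    for first, second in zip(claims, claims[1:]):
        if second - first <= WINDOW:
            return ("physician claims", second)
    if hospital_admissions:
        return ("hospital", min(hospital_admissions))
    return None


# Hypothetical records for three individuals.
print(classify_case([], [date(2002, 5, 1), date(2003, 6, 1)]))
print(classify_case([date(2003, 1, 10)], [date(2002, 5, 1)]))
print(classify_case([], [date(2002, 4, 1), date(2004, 4, 2)]))
```

The age and insurance-coverage exclusions, and the split of claims-based cases by FFS or NFFS physician, would be applied after this step using the registry data.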

The physician cohort included all members of the physician registry who had at least one claim for an individual in the diabetes case cohort. Each member of the diabetes case cohort in the second and third groups was assigned to a physician based on the physician identification number found on the billing claim for the case-qualifying diabetes diagnosis.

Statistical analyses for prediction model

The diabetes case and physician cohorts were described using means, SDs, medians, frequencies and percentages. The mean and median numbers of diabetes cases per physician were estimated and stratified by physician cohort characteristics.
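As a small illustration of the per-physician summary, the sketch below counts cases per physician from a case-to-physician assignment and reports the mean and median. The identifiers and data are hypothetical, and the function names are assumptions for the example.

```python
from collections import Counter
from statistics import mean, median


def cases_per_physician(case_to_physician):
    """Count how many cases are assigned to each physician identifier."""
    return dict(Counter(case_to_physician.values()))


def summarise(counts):
    """Mean and median number of cases per physician."""
    values = list(counts.values())
    return {"mean": mean(values), "median": median(values)}


# Hypothetical assignment of six cases to three physicians.
assignment = {
    "case1": "docA", "case2": "docA", "case3": "docB",
    "case4": "docA", "case5": "docC", "case6": "docC",
}
counts = cases_per_physician(assignment)
print(counts, summarise(counts))
```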

A multivariable generalised linear regression model with a γ distribution was fitted to the number of diabetes cases for each FFS physician.20 The model covariates were years since specialty licensure (quartiles; reference=lowest quartile), physician sex (reference=female), health region of practice (reference=Labrador, a remote region of NL) and specialty (reference=specialist). Years since specialty licensure was highly correlated with years since medical degree and age (r≥0.80), hence the latter two variables were excluded. A main effects model was compared with a model with main and two-way interaction effects.20 Penalised goodness-of-fit measures, including the Akaike Information Criterion (AIC),21 were used to select the best-fitting model. The ratio of the deviance to degrees of freedom was used to assess model dispersion.
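The two model diagnostics named above can be written out directly. The sketch below, using only the Python standard library, computes the γ deviance, the deviance-to-degrees-of-freedom ratio and the AIC. It is an illustration under assumptions: the data are hypothetical, the model has a single binary covariate (so the fitted means are simply the group means), and none of this reproduces the authors' SAS analysis.

```python
import math


def gamma_deviance(observed, fitted):
    """Gamma deviance: 2 * sum((y - mu) / mu - log(y / mu)); requires y, mu > 0."""
    return 2.0 * sum((y - mu) / mu - math.log(y / mu)
                     for y, mu in zip(observed, fitted))


def dispersion_ratio(observed, fitted, n_params):
    """Deviance divided by the residual degrees of freedom (n - p)."""
    return gamma_deviance(observed, fitted) / (len(observed) - n_params)


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 log L (smaller is better)."""
    return 2.0 * n_params - 2.0 * log_likelihood


# Hypothetical cases per physician, grouped by one binary covariate.
female = [20.0, 35.0, 50.0]
male = [40.0, 80.0, 120.0]
observed = female + male
# With a single categorical covariate, the fitted means equal the group means.
fitted = [sum(female) / 3] * 3 + [sum(male) / 3] * 3
ratio = dispersion_ratio(observed, fitted, n_params=2)
print(round(ratio, 3))
```

A full multivariable fit would normally be done with a statistical package; the functions here only show how the reported diagnostics are defined.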

Model validation

We selected the Canadian province of Manitoba (MB), which has a population of 1.2 million according to the 2011 Statistics Canada Census, for external validation. NFFS physicians in this province submit shadow-billed claims to the provincial ministry of health. Watson et al22 reported that among family physicians practising in Winnipeg, the only major centre in MB (680 000+ population), up to 90% of physicians remunerated by NFFS methods submit claims for services provided to patients. However, rates of shadow billing are expected to be lower in other regions of the province.

The same data sources were available in MB as in NL, with minor differences in database characteristics. Specifically, physician claims in MB contain diagnosis codes based on ICD-9-CM (ie, Clinical Modification).23 The MB physician registry does not contain information on year of medical degree. Five health regions, defined by the ministry of health for planning the delivery of healthcare services, were used to identify patient residence and physician practice locations.

Internal validation was conducted for both the NL and MB models. Measures of prediction accuracy, which included bias, mean absolute error (MAE) and root mean square error (RMSE),24 were calculated based on 10-fold cross-validation.25 ,26
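The accuracy measures and the 10-fold procedure can be sketched as follows. This is a simplified illustration: bias is taken here as the mean of predicted minus observed (the text reports bias as a percentage, and its exact formula is not given), the folds are formed without shuffling, and an intercept-only predictor (the training mean) stands in for the regression model. All names are assumptions for the example.

```python
import math


def accuracy(observed, predicted):
    """Bias (mean error), mean absolute error and root mean square error."""
    errors = [p - o for o, p in zip(observed, predicted)]
    n = len(errors)
    return {
        "bias": sum(errors) / n,
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
    }


def k_fold_indices(n, k=10):
    """Yield (train, test) index lists for k folds of near-equal size."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for test in folds:
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        yield train, test


def cross_validate(y, k=10):
    """Per-fold accuracy, predicting each held-out value by the training mean."""
    results = []
    for train, test in k_fold_indices(len(y), k):
        mu = sum(y[i] for i in train) / len(train)
        results.append(accuracy([y[i] for i in test], [mu] * len(test)))
    return results


# Hypothetical cases-per-physician values for 40 physicians.
per_fold = cross_validate([float(v) for v in range(1, 41)], k=10)
print(min(r["mae"] for r in per_fold), max(r["mae"] for r in per_fold))
```

Reporting the minimum and maximum of each measure across folds matches the way ranges are presented in the Results.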

Model prediction

The final fitted model for NL was used to predict the number of prevalent diabetes cases per NFFS physician. However, given that not all NFFS physicians provide services to patients with diabetes, we used the ratio of FFS physicians in the physician cohort to the total number of FFS physicians in the province27 to select a random prediction sample. A similar process was used to predict the number of cases from the MB data. In MB, we also compared the predicted number of diabetes cases for NFFS physicians to the observed number of cases from the shadow-billed claims of NFFS physicians.
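One reading of the sampling step is that the proportion of FFS physicians who appear in the physician cohort is applied to the NFFS physicians, and a random sample of that size is drawn for prediction. The sketch below follows that reading; the function name, the rounding rule, the seed and all numbers are assumptions for illustration.

```python
import random


def prediction_sample(nffs_ids, n_ffs_in_cohort, n_ffs_total, seed=0):
    """Randomly select NFFS physicians in the proportion observed among FFS physicians."""
    proportion = n_ffs_in_cohort / n_ffs_total
    sample_size = round(proportion * len(nffs_ids))
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return rng.sample(list(nffs_ids), sample_size)


# Hypothetical registry: 200 NFFS physicians; 360 of 600 FFS physicians in the cohort.
nffs_registry = [f"nffs{i}" for i in range(200)]
sample = prediction_sample(nffs_registry, n_ffs_in_cohort=360, n_ffs_total=600)
print(len(sample))
```

The fitted model would then be applied to the characteristics of the sampled physicians, and the predictions summed to give the expected number of cases.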

The total number of prevalent diabetes cases in each province was estimated as the sum of: (1) observed cases ascertained from hospital discharge abstracts only, (2) observed cases ascertained from claims of FFS physicians and (3) predicted cases for NFFS physicians. Denominators of the prevalence estimates were based on 2001 Statistics Canada Census data; 95% CIs were calculated using the binomial distribution.
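The prevalence calculation can be illustrated with a short sketch. The text states only that the binomial distribution was used for the CIs; the normal approximation below is an assumption, as are the function name and the input values, which are hypothetical rather than taken from the study.

```python
import math


def prevalence_with_ci(hospital_only, ffs_cases, predicted_nffs, population, z=1.96):
    """Prevalence from the three case sources, with a normal-approximation 95% CI."""
    total = hospital_only + ffs_cases + predicted_nffs
    p = total / population
    half_width = z * math.sqrt(p * (1 - p) / population)
    return p, p - half_width, p + half_width


# Hypothetical counts and denominator.
p, lower, upper = prevalence_with_ci(2000, 28000, 18000, population=400000)
print(f"{100 * p:.1f}% (95% CI {100 * lower:.1f}% to {100 * upper:.1f}%)")
```

With denominators of this size the interval is very narrow, which is consistent with the tight CIs reported in the Results.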

All analyses were conducted using SAS V.9.3. Data access approval was provided by the Newfoundland and Labrador Centre for Health Information and the Manitoba Health Information Privacy Committee.

Results

Descriptive analyses

A total of 31 714 prevalent diabetes cases were identified from the NL administrative data (table 1); 91.1% (n=28 989) of cases were identified from billing claims of physicians remunerated by FFS, while 1.3% (n=411) of cases were ascertained from billing claims submitted by NFFS physicians who received a portion of their remuneration by FFS. About three-fifths (60.7%) of diabetes cases from FFS physician claims were residents of the Eastern health region, which contains the largest city in NL (200 000+ population); 40.5% were 65+ years.

Table 1

Characteristics of diabetes case cohort by ascertainment source and province

In the MB external validation data, 51 031 prevalent diabetes cases were identified (table 1), of which 84.1% were ascertained from the billing claims of FFS physicians. Three-quarters (75.9%) of prevalent cases ascertained from the shadow-billed claims of NFFS physicians were from non-Winnipeg health regions.

There were 388 individuals in the NL physician cohort (table 2). Among FFS physicians (93.3%), the majority were general practitioners (80.4%), and most were from the Eastern health region (71.3%). The MB physician cohort contained more than 1200 physicians, of whom 80.4% were FFS physicians. Among these FFS physicians, more than half (57.8%) were in the 40–64 years age group. The NFFS physicians (n=270) were primarily <40 years (68.5%) and almost 80% practised outside of the urban Winnipeg health region.

Table 2

Characteristics of the physician cohort by method of remuneration and province

Table 3 describes the mean and median number of prevalent diabetes cases per FFS physician. In NL, the average number of prevalent cases per FFS physician was 75.5 and the median was 49.0. The mean and median were higher for general practitioners than for specialists and also for males than females. For MB, the average number of prevalent diabetes cases per FFS physician was 43.4 and the median was 25.0.

Table 3

Mean (SD) and median number of prevalent cases in the diabetes case cohort per fee-for-service physician in the physician cohort

Prediction model

For NL, the main effects model provided a good fit to the data, as judged by the ratio of model deviance to degrees of freedom (ratio=1.0) and the AIC was smaller for a main effects model than for one with main and two-way interaction effects (3833.1 vs 3830.4); likelihood ratio tests revealed statistically significant main effects for sex (p<0.0001) and years since specialty (p=0.0006).

The regression analyses produced similar results in the MB external validation data; the ratio of model deviance to degrees of freedom was close to 1.0 for the main effects model. The model with main and two-way interaction effects resulted in a negligible decrease in the AIC. The main effects of sex (p<0.0001), specialty (p=0.0021) and years since specialty licensure (p<0.0001) were statistically significant.

With respect to the internal cross-validation, for the NL model absolute bias estimates ranged from 0.2% to 12.9% across the 10 data folds, while for the MB model the estimates ranged from 0.6% to 13.8%. The MAE ranged from 40.1 to 67.5 for the NL model and from 26.7 to 43.2 for the MB model. Finally, the RMSE ranged from 56.5 to 131.2 for the NL model and from 33.8 to 151.0 for the MB model.

Using the MB model results, we compared the observed and expected number of prevalent diabetes cases per FFS and NFFS physician (table 4) for the entire province and by health region of practice. The provincial and regional figures were similar for FFS physicians, supporting the internal validity of the model. For NFFS physicians, the expected number of cases was 51% higher than the observed number for the entire province. When we examined these values by health region, we found that the expected value was 8.2% lower than the observed value for the Winnipeg health region. However, for the remaining health regions, the expected values were much higher than the observed values.
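The observed-versus-expected comparison reduces to a percentage difference relative to the observed value. The sketch below shows the arithmetic; the per-physician values are hypothetical and were chosen only so that the results match the magnitudes quoted in the text (51% higher and 8.2% lower).

```python
def percent_difference(expected, observed):
    """Percentage by which the expected value exceeds (+) or falls below (-) the observed."""
    return 100.0 * (expected - observed) / observed


# Hypothetical average cases per NFFS physician.
print(round(percent_difference(expected=30.2, observed=20.0), 1))
print(round(percent_difference(expected=18.36, observed=20.0), 1))
```

A positive value indicates that the claims data capture fewer cases than the model predicts, which is the signal of incomplete shadow billing.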

Table 4

Observed and predicted average number of diabetes cases per fee-for-service (FFS) and non-FFS (NFFS) physician in Manitoba's physician cohort

Figure 1 shows the percentage of diabetes cases ascertained from each data source in both provinces. In NL, the prediction model resulted in a 37.2% increase in the number of diabetes cases ascertained from the administrative databases, while in MB it resulted in a 16.3% increase. In NL, crude diabetes prevalence based on cases ascertained only from hospital data and FFS physician claims was 8.1%, while the estimate based on observed and expected cases was 13.0% (95% CI 12.9% to 13.0%). In MB, the crude diabetes prevalence estimate based on cases ascertained from hospital data and FFS physician claims was 5.6%, while the estimate based on both observed and expected cases was 6.7% (95% CI 6.7% to 6.8%).

Figure 1

Per cent of observed and predicted diabetes cases by ascertainment data source and Canadian province (FFS, fee-for-service; NFFS, non-FFS).

Discussion

This study developed a prediction model for linked administrative health databases to estimate the completeness of electronic physician claims data; the model was applied to estimate under-ascertainment of prevalent diabetes cases but could be applied to other chronic or acute conditions that are primarily managed or treated in non-acute care settings. When the model was applied to data from the Canadian province of NL, the results revealed that close to 40% of diabetes cases were missed because NFFS physicians do not report contacts with patients in claims data. When the model was externally validated in MB, a province in which some NFFS physicians submit some claims, the modelling results indicated that <20% of diabetes cases were missed, but this percentage varied substantially by region; there was less bias in the Winnipeg health region, which contains the largest city in MB, and more substantial bias in non-Winnipeg health regions where there is a higher proportion of NFFS physicians.

Data from the 2005 Canadian Community Health Survey,28 a national survey used for regional chronic disease surveillance, revealed a crude diabetes prevalence of 6.8% for NL and 4.4% for MB for the population 12+ years, a difference of more than 50%. When we compared crude prevalence estimates for the two provinces using only FFS claims and hospital records, rates in NL were just 8.9% higher than those in MB. However, after adjustment for potential missed cases using our prediction model, crude prevalence was 45.1% higher in NL than in MB, producing a similar difference in estimates to those observed in survey data.

Incomplete capture of claims for NFFS physicians is similar to unit non-response in survey data, both of which can bias parameter estimates and increase variance estimates. Unit non-response in surveys is often difficult to adjust for, because information about non-responders is rarely available to the researcher. In fact, administrative data have been used in previous research to estimate the effect of survey non-response bias in estimates of healthcare use.29 However, our study suggests that the use of administrative data for evaluating survey non-response should be adopted with caution, as administrative databases may themselves be incomplete.

While the proposed prediction model provides a useful tool to estimate bias in disease prevalence due to incomplete claims data, it is equally important to consider how other databases can be used to address gaps in these data. Electronic medical records are increasingly being adopted in population-based chronic disease research and surveillance studies,30 and could represent an important additional source of data for case ascertainment. Pharmacy databases have also been used for case ascertainment31 when the medications used for disease treatment have high specificity for case capture.

Limitations of the study include the restricted set of explanatory variables available to develop the prediction model. Residual confounding due to factors such as physician productivity,10 type of practice and even characteristics of the patients seen by a physician may affect prediction accuracy.32 Strengths of the study include the use of a validated case definition to ascertain diabetes cases and the internal and external validation process.

Further research could examine the validity of the prediction model by applying it to other chronic diseases and in other jurisdictions;33 the utility of the model is not limited to Canadian administrative data, as a similar approach has been proposed to evaluate the completeness of cancer registry data.16 Simulation could also be used to assess the impact of patient, physician and health system characteristics on estimates of completeness.34 For example, the model assumes that physician characteristics will have the same distribution and association with the number of prevalent diabetes cases in FFS and NFFS populations, which may not be a valid assumption.35

In summary, this study revealed that completeness of physician claims data is associated with method of physician remuneration and that a predictive model can be used to estimate the magnitude of data incompleteness for disease surveillance. This predictive model makes use of routinely collected linked data, and therefore is feasible to implement over time and across jurisdictions.

Acknowledgments

The authors are indebted to Manitoba Health, Healthy Living, and Seniors (HIPC 2012/2013-04) and the Newfoundland and Labrador Centre for Health Information for the provision of data.

References

Footnotes

  • Contributors LML, GK, HQ, MS and KS designed the analysis and acquired the study data. JPK, XY, NM and WK conducted the analyses. LML, JPK and NM drafted the manuscript and all remaining authors read and revised it substantially. All authors approved the final version of the manuscript before submission.

  • Funding This research was funded by the Canadian Institutes of Health Research (funding reference number 123357). LML was supported by a Research Chair from the Manitoba Health Research Council.

  • Competing interests None declared.

  • Ethics approval University of Manitoba Health Research Ethics Board and the NL Health Research Ethics Board.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement No additional data are available.
