
Identifying factors associated with high use of acute care in Canada: protocol of a population-based retrospective cohort study
  1. Mengmeng Zhang (1)
  2. Jinhui Ma (1)
  3. Feng Xie (1, 2)
  4. Lehana Thabane (1, 3)
  1. Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
  2. Centre for Health Economics and Policy Analysis (CHEPA), McMaster University, Hamilton, Ontario, Canada
  3. Biostatistics Unit/FSORC, Saint Joseph’s Healthcare Hamilton, Hamilton, Ontario, Canada
  Correspondence to Dr Lehana Thabane; thabanl{at}


Introduction High-cost users (HCUs) account for a small proportion of the population but use a disproportionately large share of healthcare resources. Although HCUs exist in all types of healthcare, acute care is the most expensive type of service and the largest contributor to expenditures among HCUs. This study aims to determine the demographic, socioeconomic and clinical factors associated with being an HCU among adult patients (≥18 years) receiving acute care in Canada.

Methods and analysis This is a population-based analysis using a national linked dataset. Adult patients who had at least one interaction with acute care facilities in any year from 2011 to 2014 were captured in the dataset; those living in institutions or other collective residences were not covered. The primary outcome is being an HCU of acute care (yes/no), defined as being within the top 10% of acute care cost users in the patient’s province. Multilevel logistic regression will be used to identify factors associated with being an HCU and to examine provincial variations in these risk factors. Sensitivity analyses will also be performed to investigate how different high user definitions and missing data influence the study results.

Ethics and dissemination All researchers will follow the codes and rules set by Statistics Canada and the Research Data Centre and will give priority to the confidentiality of the data during and after this study. The study findings will be published in peer-reviewed journals and disseminated at academic conferences.

  • health economics
  • health policy
  • accident & emergency medicine

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:


Strengths and limitations of this study

  • The linked dataset used in our study allows us to analyse a broad range of demographic, socioeconomic and clinical factors that are potentially associated with high use of healthcare resources.

  • This study will be conducted using national data, allowing us to compare similarities and differences of the acute care high-cost user characteristics across different provinces in Canada.

  • Variations in patient characteristics across different definitions of high system users will be investigated through sensitivity analyses.

  • Populations not covered by the surveys (eg, people living in institutions or other collective residences) are not included in our analysis and are therefore likely to be under-represented.

  • The information for each patient is retrieved from different databases collected at different times, so there is a time lag between the demographic and socioeconomic information collected in the 2006 Census/2011 National Household Survey and patients’ status as high system users or non-high system users from 2011 to 2014.


Acute care is a type of short-term care for patients who are sick, injured or recovering from treatment. It includes emergency medicine, trauma care, prehospital emergency care, acute care surgery, critical care, urgent care and inpatient care.1 2 Acute care is crucial to preventing death and disability, but it is also the costliest type of healthcare in developed countries, including Canada.2–4 Recent studies reported that acute care accounted for the largest share of healthcare expenditures in Canada in 2018 (28.3%) and was expected to increase by 2% in 2019.5 6 The substantial and growing demand for acute care services places pressure on healthcare systems and calls for further research into the distribution and determinants of healthcare costs and into strategies to reduce them.

Research shows that a small proportion of patients consume a disproportionately large amount of resources. A recently published systematic review found that roughly 68%, 55% and 24% of healthcare costs in developed countries or regions were spent on the top 10%, 5% and 1% of patients, respectively.7 In Ontario, 61% of hospital and community care expenditures were consumed by only 5% of patients.4 For physician services, 5% of patients in British Columbia used 30% of all physician services.8 These high system users (HSUs) are commonly defined using metrics such as cumulative costs, length of stay, frequency of hospitalisations and frequency of emergency department (ED) visits.9 Research has shown that, within 2 years of their index physician visits, HSUs were about 12 times more likely to be admitted to hospital (69.2% vs 5.4%) and eight times more likely to die (13.0% vs 1.7%) than non-high users.8 In addition, an increased number of ED visits among patients with mental disorders is associated with higher mortality within 2 years of their index ED visits.10 Healthcare costs are rising and resources are limited, while HSUs have poorer health outcomes and higher mortality rates. Understanding how HSUs consume healthcare resources is therefore a critical step towards improving the efficiency and sustainability of healthcare systems.4 11

According to previous studies, acute care is the largest source and driver of spending among HSUs.12 One study estimated that acute care accounted for 62% of high-cost user (HCU) costs in Ontario.4 Meanwhile, compared with non-HCUs, HCUs were found to be more likely to use acute care.12–14 Previous Canadian studies have identified a range of contributors to high acute care costs. These include older age, socioeconomic disadvantage (eg, personal or family low-income status) and medical complexity (eg, a higher level of comorbidity).7 13–17 However, few studies have adopted a national perspective or addressed provincial variations in the characteristics of HCUs of acute care. Prior studies have shown that the age, sex and socioeconomic distributions of populations and the delivery of healthcare services vary across the provinces of Canada.18 19 Large provincial variations exist in patients’ access to and experience of healthcare services, and in the performance and quality of the healthcare system.20 These differences might result in variations in HCU characteristics. Furthermore, most studies were based on healthcare administrative databases or health surveys. This limited their capacity to address individual socioeconomic characteristics that could be heterogeneous among clinically similar patients and amenable to interventions.13–17 Thus, there is a lack of national-level evidence on risk factors, especially socioeconomic factors, associated with being an HCU of acute care, and it is unknown how these factors vary across provinces. This study aims to bridge these knowledge gaps by identifying the socioeconomic, demographic and clinical factors associated with being an HCU of acute care among adult patients (≥18 years) in Canada.

Methods and data analysis

Data sources and study population

We will perform a retrospective cohort study using a national linked dataset on high users: the HSUs linked to T1 Family File-Census of the Population Long-Form-National Household Survey (HSUS-T1FF-NHS, hereinafter referred to as ‘the linked dataset’), released by Statistics Canada and the Canadian Institute for Health Information (CIHI).17 The confidential Master Data File for the linked dataset will be used to address our research question.21 The original cohorts in the linked dataset were generated anonymously from three internal CIHI datasets:

  • the Discharge Abstract Database (DAD);
  • the National Ambulatory Care Reporting System (NACRS);
  • the Ontario Mental Health Reporting System (OMHRS).

Only patients captured by the DAD at least once between fiscal years 2011/2012 and 2014/2015 are included in the linked dataset. Patients who interacted only with NACRS or OMHRS, and never with the DAD, are not included. Using encrypted patient identity numbers, anonymised patient records from the T1FF, the 2006 Census of Population Long-Form (2006 Census 2B) and the 2011 NHS will be linked to all the cohorts at the individual level.22 The T1FF database includes all individuals who have filed a T1 tax return.23 The cross-sectional 2006 Census enumerated the entire Canadian population. This included Canadian citizens, landed immigrants, non-permanent residents (people who hold a work permit or study permit) and their families living in Canada.24 The long-form questionnaire (Census 2B) was administered to a 20% sample of private dwellings in Canada. It did not include Canadian citizens living temporarily outside Canada, full-time members of the Canadian Forces stationed outside Canada, or people living in collective dwellings (eg, hospitals, nursing homes and hotels). The cross-sectional NHS provides information about all persons living in Canada except foreign residents and those excluded from the long-form Census.25 These three databases complement each other to provide sociodemographic and socioeconomic information on the included population. The DAD contains demographic, administrative and clinical data on all discharges from acute inpatient facilities.26 Thus, the linked dataset will combine information about patients’ hospitalisations with their demographic and socioeconomic characteristics.21

The linked dataset comprises eight sub-cohorts. Each reflects an annual HSU definition based on one of four metrics (acute care cost, total length of stay, number of hospitalisations or number of ED visits), constructed separately for adults (≥18 years) and children.22 A cut-off of 10% is used to define HSUs in all of these cohorts. Acute care costs in the linked dataset are calculated using provincial values of the Cost of a Standard Hospital Stay (CSHS) in conjunction with Resource Intensity Weights (RIW).22 The CSHS is the average full cost of treating an average acute inpatient in a hospital, and it measures the cost efficiency of the hospital’s acute care services.22 The RIW estimates each patient’s cost weight relative to the average acute inpatient and thus measures the intensity of resource use. The total length of stay, number of hospitalisations and number of ED visits are cumulative annual values for each patient. Our study will focus on the adult acute care cost cohort, which covers adult patients (≥18 years) hospitalised in an acute care facility between fiscal years 2011/2012 and 2014/2015. In this cohort, HCUs are defined as the top 10% of adult users (≥18 years) by acute care cost in each province, in each fiscal year from 2011/2012 to 2014/2015.22 Non-HCUs are patients randomly selected from the remaining 90% of adult patients in the same province and year, at a ratio of four non-HCUs per HCU. The remaining adult patients in this acute care cost cohort are classified as neither HCUs nor non-HCUs. No matching between the HCU group and the non-HCU group was performed in the linked dataset.
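The classification rule described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure Statistics Canada or CIHI ran; the raw cost data are not available to us. The sketch makes the following assumptions:

  • a stay’s cost is its RIW multiplied by the provincial CSHS;
  • the CSHS values and patient records are hypothetical;
  • ties and the rounding of the 10% cut-off are handled in an arbitrary way.

```python
import random
from collections import defaultdict


def stay_cost(riw: float, cshs: float) -> float:
    """ASSUMPTION: cost of one inpatient stay = RIW x provincial CSHS."""
    return riw * cshs


def classify_users(stays, cshs_by_province, top_share=0.10, non_hcus_per_hcu=4, seed=2011):
    """Label each (province, fiscal_year, patient_id) as 'HCU', 'non-HCU' or 'neither'.

    stays: iterable of (patient_id, province, fiscal_year, riw), one tuple per discharge.
    cshs_by_province: hypothetical mapping of province -> CSHS in dollars.
    """
    # 1. Accumulate each patient's annual acute care cost within a province.
    annual_cost = defaultdict(float)
    for patient_id, province, year, riw in stays:
        annual_cost[(province, year, patient_id)] += stay_cost(riw, cshs_by_province[province])

    # 2. Group patients by province and fiscal year.
    groups = defaultdict(list)
    for (province, year, patient_id), cost in annual_cost.items():
        groups[(province, year)].append((cost, patient_id))

    rng = random.Random(seed)
    labels = {}
    for (province, year), users in groups.items():
        # 3. Rank by annual cost (highest first); the top 10% are HCUs.
        users.sort(reverse=True)
        n_hcu = max(1, round(top_share * len(users)))
        hcus = [pid for _, pid in users[:n_hcu]]
        rest = [pid for _, pid in users[n_hcu:]]
        # 4. Randomly sample four non-HCUs per HCU from the remaining 90%.
        sampled = set(rng.sample(rest, min(len(rest), non_hcus_per_hcu * n_hcu)))
        for pid in hcus:
            labels[(province, year, pid)] = "HCU"
        for pid in rest:
            labels[(province, year, pid)] = "non-HCU" if pid in sampled else "neither"
    return labels


# Hypothetical example: 50 patients, one stay each, in a single province-year.
stays = [(i, "ON", "2011/2012", float(i + 1)) for i in range(50)]
labels = classify_users(stays, {"ON": 6000.0})
```

With 50 hypothetical patients in one province-year, this yields 5 HCUs, 20 sampled non-HCUs and 25 patients classified as neither.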

The definitions and selections of HCUs and non-HCUs in our analysis will be consistent with the methods used in the linked dataset, for three reasons. First, the raw cost data of the linked dataset are unavailable, and each patient has already been classified as HCU, non-HCU or neither in the dataset provided to us. Second, although not all of the remaining 90% of patients are included as comparators, sampling four non-HCUs per HCU provides adequate power for our study and is efficient for our analysis.27 28 Third, matching is not necessary, because the objective of our study is risk factor identification rather than estimation of effects or hazards.28

Selected variables

For the primary purpose of this study, variables will be selected according to the findings of previous studies and identified from the linked dataset. The data sources used in our study are described in table 1, and the dependent variable and the potential independent variables are listed in table 2.

Table 1

Descriptions of data sources to be used in the analyses

Table 2

Dependent and independent variables selected for the study

Dependent variable (outcome measure)

The dependent variable for the primary analysis is a dichotomous variable indicating whether a patient is an HCU or a non-HCU of acute care. A patient is an HCU if he/she is among the top 10% of adult patients in his/her province by cumulative acute care cost in a specific fiscal year. Non-HCUs are patients randomly selected from the remaining 90% of that year’s adult acute care cohort. The characteristics of HSUs can differ depending on the metric used to define this population.9 Sensitivity analyses will therefore use HSU status defined by total length of stay, number of hospitalisations and number of ED visits as dependent variables. These analyses will examine the robustness of the primary results and explore how HSU characteristics differ across definitions (table 3). All the dependent variables will be obtained from the HSUS database.

Table 3

Proposed methods for primary analysis and sensitivity analysis

Clinical factors

Patients’ admission type and diagnosis codes will be obtained from the DAD.26 The admission type reflects the circumstances under which a patient is admitted and indicates the priority and urgency of the admission, which ranges from urgent to elective. The diagnosis codes are the International Classification of Diseases, 10th Revision, with Canadian Enhancements (ICD-10-CA) codes assigned to patients. They will be used to classify a patient’s major condition into one of the 31 categories of the Elixhauser comorbidity index and to derive the Elixhauser comorbidity score with the van Walraven algorithm.29–31
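A minimal sketch of how a van Walraven-weighted Elixhauser score is built from diagnosis codes: each category present contributes its weight once, and the score is the sum of those weights. The sketch has three simplifications:

  • The ICD-10 prefix map is a hypothetical, heavily truncated stand-in for a full coding algorithm (such as the ICD-10 Elixhauser definitions of Quan and colleagues).
  • Only four of the published van Walraven weights are included. Both the map and the weights should be checked against the original sources before use.
  • Real implementations also apply hierarchy rules (eg, metastatic cancer overriding solid tumour without metastasis), which are omitted here.

```python
# HYPOTHETICAL, heavily truncated map from ICD-10 three-character prefixes to
# Elixhauser categories; a real algorithm covers many more codes per category.
ICD_PREFIX_TO_CATEGORY = {
    "I50": "congestive_heart_failure",
    "C78": "metastatic_cancer",
    "E66": "obesity",
    "F32": "depression",
}

# A subset of the van Walraven weights (verify against the original publication).
VW_WEIGHTS = {
    "congestive_heart_failure": 7,
    "metastatic_cancer": 12,
    "obesity": -4,
    "depression": -3,
}


def elixhauser_categories(codes):
    """Return the set of Elixhauser categories flagged by a patient's diagnosis codes."""
    categories = set()
    for code in codes:
        prefix = code.replace(".", "")[:3].upper()
        category = ICD_PREFIX_TO_CATEGORY.get(prefix)
        if category is not None:
            categories.add(category)
    return categories


def van_walraven_score(codes):
    """Sum the weight of each category present; repeated codes count only once."""
    return sum(VW_WEIGHTS[c] for c in elixhauser_categories(codes))


score = van_walraven_score(["I50.0", "C78.7", "F32.9"])  # 7 + 12 - 3
```

Because categories are collected in a set, a patient with several heart failure codes still contributes the heart failure weight only once.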

Demographic and socioeconomic factors

The variables to be included in our study are listed in table 2. Demographic factors include age, sex, rurality of residence, marital status, immigrant status and visible minority status. Socioeconomic factors include work activity during the reference year, occupation classification, the after-tax low-income status of the family, the income adequacy decile among Canadian residents and the highest level of education. The rurality of a patient’s residence will be categorised as rural or urban, consistent with the categorisations in Census 2006 and NHS 2011.24 25 For patients’ income, the after-tax income and the income adequacy deciles among Canadian residents will be used.23 The cut-off value will be the low-income measure after tax (LIM-AT), a fixed 50% of median census family income adjusted for family needs. Using this cut-off, after-tax family income will be classified into two levels: low income and non-low income.23 The income adequacy deciles will be used to assess a person’s income gap ratio relative to the LIM-AT value across the Canadian population.23 The higher the decile, the larger the after-tax income gap ratio, and the less adequate the person’s family income is for his/her family needs. The other variables will be categorised according to the options available in the dataset.
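The LIM-AT classification can be illustrated with a short sketch. For illustration only, it assumes that family needs are adjusted by dividing after-tax family income by the square root of family size. The equivalence scale used to derive the low-income variable in the linked dataset may differ, and the family records below are hypothetical.

```python
import math
from statistics import median


def adjusted_income(after_tax_income: float, family_size: int) -> float:
    """ASSUMPTION: square-root equivalence scale used to adjust for family needs."""
    return after_tax_income / math.sqrt(family_size)


def low_income_flags(families):
    """families: list of (after_tax_family_income, family_size) pairs.

    Returns (flags, threshold), where threshold is 50% of the median adjusted
    income and a family is flagged as low income if it falls below the threshold.
    """
    adjusted = [adjusted_income(income, size) for income, size in families]
    threshold = 0.5 * median(adjusted)
    return [a < threshold for a in adjusted], threshold


# Hypothetical families: (after-tax income in dollars, number of members).
families = [(40000, 1), (40000, 4), (10000, 1), (60000, 1), (30000, 1)]
flags, threshold = low_income_flags(families)
```

In this toy example, the four-person family with $40 000 has an adjusted income of $20 000. That is above the threshold of half the median, so only the single-person family with $10 000 is flagged as low income.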

Sample size

As a general rule, the number of events per variable (EPV) should be at least 10 to prevent major problems in logistic regression (eg, overestimation or underestimation of regression coefficients).32 33 To be more conservative, we will use an EPV of 20. The 12 variables in our study have 21 degrees of freedom in total, so the minimum number of events (ie, HCUs) is 20 × 21 = 420. Since four non-HCUs are sampled for every HCU, the minimum total sample size is 420 × 5 = 2100. According to the data published on the website of the linked dataset,22 the province with the smallest number of HCUs has around 7000 HCUs and non-HCUs in total over the 4-year period. This exceeds the minimum sample size and is sufficient for the analysis.
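The sample size arithmetic can be reproduced in a few lines. The degrees of freedom (21), the EPV (20) and the four non-HCUs per HCU are taken from the text; the function name is illustrative.

```python
def minimum_sample_size(degrees_of_freedom: int, epv: int = 20, non_hcus_per_hcu: int = 4):
    """Return (minimum number of HCUs, minimum total of HCUs plus non-HCUs)."""
    min_events = epv * degrees_of_freedom  # events are HCUs
    min_total = min_events * (1 + non_hcus_per_hcu)  # each HCU plus its sampled non-HCUs
    return min_events, min_total


events, total = minimum_sample_size(21)
print(events, total)  # 420 2100
```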

Data analyses

The data analysis will involve two steps. The first step will be to compare the characteristics of HCUs and non-HCUs (table 4). Categorical variables will be summarised using counts and percentages. Continuous variables will be summarised using the mean and SD for normally distributed data and the median and IQR for non-normally distributed data. The second step will be multilevel mixed-effects logistic modelling to identify risk factors associated with high acute care costs. The plans for the primary analysis and sensitivity analyses in this step are listed in table 3. Among the included clinical, demographic and socioeconomic factors, we hypothesise that the following are associated with a higher risk of being an HCU:7 11 34 35

  • older age;
  • male sex;
  • increased rurality;
  • low income;
  • immigrant status;
  • visible minority status;
  • a higher level of comorbidity;
  • certain types of conditions and occupations.

We also hypothesise that being married, higher work activity and having a certificate, diploma or degree are associated with a lower risk of being an HCU.7 A significance level of 0.05 will be used to identify significant factors. For each independent variable, the unadjusted OR and 95% CI will be estimated to determine whether it is statistically significant (table 5).
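For a binary independent variable, the unadjusted OR and its 95% CI can be computed from a 2×2 table using the Woolf (log) method. This gives the same estimate as a univariable logistic regression with that single binary predictor. A minimal sketch with hypothetical counts follows; multicategory or continuous variables would instead need a regression model.

```python
import math
from statistics import NormalDist


def unadjusted_or(a: int, b: int, c: int, d: int, level: float = 0.95):
    """Woolf (log) OR and confidence interval from a 2x2 table.

    a = HCUs with the factor,    b = non-HCUs with the factor,
    c = HCUs without the factor, d = non-HCUs without the factor.
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: the log OR is undefined without a correction")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% CI
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


or_, lower, upper = unadjusted_or(30, 70, 20, 80)  # hypothetical counts
significant = not (lower <= 1.0 <= upper)  # a 95% CI excluding 1 corresponds to p < 0.05
```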

Table 4

Characteristics of HCUs of acute care, 2011/2012–2014/2015, HSUS-T1FF-CENSUS-NHS

Table 5

Results of primary analysis and sensitivity analysis, HSUS-T1FF-CENSUS-NHS

The response variable is dichotomous, and the data are hierarchical, with individuals nested within provinces. We will therefore conduct mixed-effects logistic regression to address the dependence between observations and to explore provincial variations in the risk factors associated with being an HCU.36 We will use a multilevel model (MLM), which estimates subject-specific effects. We prefer it to the generalised estimating equations (GEE) approach, which estimates marginal or population-averaged effects.37 38 Both methods are commonly used to analyse binary outcome data that violate the independence assumption of traditional regression models. However, MLM treats the dependence between observations as being of substantive interest and accounts for it more efficiently, whereas GEE treats it as a nuisance.37 38 MLM can also partition the covariance structure of the outcomes into within-province and between-province components, which makes it more appropriate for our research question.
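The sketch below illustrates the distinction between subject-specific and population-averaged effects. It is a conceptual illustration, not part of the planned analysis, and it makes these assumptions:

  • a hypothetical random-intercept logistic model, logit P(Y=1 | x, u) = β0 + β1x + u, with u ~ N(0, σ²);
  • arbitrary parameter values.

The code numerically averages over u to obtain the marginal log OR that a GEE-type analysis would target. For comparison, it also includes a widely used closed-form approximation, β1/√(1 + c²σ²) with c = 16√3/(15π).

```python
import math


def expit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def marginal_prob(eta: float, sigma2: float, steps: int = 4000) -> float:
    """Population-averaged P(Y=1): expit(eta + u) averaged over u ~ N(0, sigma2).

    Uses a midpoint rule on [-8 SD, 8 SD], which captures essentially all the mass.
    """
    if sigma2 == 0:
        return expit(eta)
    sd = math.sqrt(sigma2)
    lo, hi = -8.0 * sd, 8.0 * sd
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        u = lo + (i + 0.5) * h  # midpoint of the i-th interval
        density = math.exp(-u * u / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)
        total += expit(eta + u) * density * h
    return total


def marginal_log_or(beta0: float, beta1: float, sigma2: float) -> float:
    """Marginal (population-averaged) log OR for x = 1 versus x = 0."""
    return logit(marginal_prob(beta0 + beta1, sigma2)) - logit(marginal_prob(beta0, sigma2))


def approx_marginal_log_or(beta1: float, sigma2: float) -> float:
    """Closed-form approximation of the attenuated, population-averaged coefficient."""
    c = 16.0 * math.sqrt(3.0) / (15.0 * math.pi)
    return beta1 / math.sqrt(1.0 + c * c * sigma2)


# Hypothetical values: subject-specific log OR of 1 and random-intercept variance of 1.
numeric = marginal_log_or(0.0, 1.0, 1.0)
approx = approx_marginal_log_or(1.0, 1.0)
```

With σ² = 1, the marginal log OR is attenuated towards zero relative to the subject-specific β1 = 1. This is why MLM and GEE estimates answer different questions and are not directly interchangeable.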

Because the management and delivery of healthcare services in Canada are highly decentralised across provinces,39 we will treat province as a random effect rather than a fixed effect in our primary analysis. The analysis will model within-province and between-province variation simultaneously. Patient-level risk factors for being an HCU are listed in table 2. Since province will be included in the model as a random effect, province-level predictors will not be included as independent factors in our analysis. Examples include the percentage of patients older than 65 years and gross domestic product per capita. Interactions between predictors will be explored in consultation with experts in health economics. Sensitivity analyses will also be performed for two purposes. The first is to investigate variations when high users are defined using different metrics, including length of stay, frequency of hospitalisations and frequency of ED visits. The second is to examine the robustness of findings when missing data are handled using different methods (complete case analysis and multiple imputation using the fully conditional specification algorithm).23 37
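Between-province variation from a random-intercept logistic model is often summarised with two quantities derived from the estimated province-level variance σ²_u:

  • the latent-scale intraclass correlation, σ²_u / (σ²_u + π²/3);
  • the median odds ratio, exp(√(2σ²_u) × Φ⁻¹(0.75)).

The protocol does not state which summaries will be reported, so the sketch below assumes one possible presentation. The variance value in it is hypothetical and is not an estimate.

```python
import math
from statistics import NormalDist


def latent_icc(sigma2_u: float) -> float:
    """Share of latent-scale variance lying between provinces (patient-level variance = pi^2/3)."""
    return sigma2_u / (sigma2_u + math.pi ** 2 / 3)


def median_odds_ratio(sigma2_u: float) -> float:
    """Median OR comparing otherwise identical patients in a higher-risk vs a lower-risk
    province, over randomly chosen pairs of provinces."""
    return math.exp(math.sqrt(2 * sigma2_u) * NormalDist().inv_cdf(0.75))


sigma2_u = 0.10  # hypothetical province-level variance
summary = (latent_icc(sigma2_u), median_odds_ratio(sigma2_u))
```

A median OR of 1 (σ²_u = 0) would indicate no unexplained provincial variation. Values further above 1 indicate that the province of residence matters more for a patient’s odds of being an HCU.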

Logistic regression models rely on several assumptions:

  • linearity between the logit and continuous independent variables;
  • absence of multicollinearity;
  • a binomial distribution of errors.40

A violation of any of these assumptions can result in biased or invalid effect estimates, so these assumptions will be tested in our study. A smoothed scatter plot will be used to check the linearity of the logit graphically, and the fractional polynomial method will be used to test the linearity assumption formally.40 A non-significant test result will be taken as no evidence against linearity in the logit. Collinearity will be assessed using tolerance statistics.40 A tolerance of less than 0.20 indicates a concern about collinearity among the independent variables, and highly correlated risk factors will be removed from the model.41
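Tolerance for predictor j is 1 − R²_j, where R²_j comes from regressing predictor j on all the other predictors. Equivalently, it is the reciprocal of the jth diagonal element of the inverse correlation matrix (1/VIF). The following is a minimal, dependency-free sketch for numeric predictors that applies the 0.20 threshold from the text. Categorical variables would first need to be dummy-coded, and the example data are hypothetical.

```python
import math
from typing import List, Sequence


def correlation_matrix(cols: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pearson correlation matrix of the predictor columns."""
    n = len(cols[0])
    means = [sum(c) / n for c in cols]
    norms = [math.sqrt(sum((x - m) ** 2 for x in c)) for c, m in zip(cols, means)]
    k = len(cols)
    return [
        [
            sum((a - means[i]) * (b - means[j]) for a, b in zip(cols[i], cols[j]))
            / (norms[i] * norms[j])
            for j in range(k)
        ]
        for i in range(k)
    ]


def invert(matrix: List[List[float]]) -> List[List[float]]:
    """Gauss-Jordan inversion with partial pivoting."""
    k = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(k)] for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        if abs(p) < 1e-12:
            raise ValueError("perfect collinearity: correlation matrix is singular")
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def tolerances(cols) -> List[float]:
    """Tolerance_j = 1 - R^2_j = 1 / VIF_j, where VIF_j is the j-th diagonal of R^-1."""
    r_inv = invert(correlation_matrix(cols))
    return [1.0 / r_inv[j][j] for j in range(len(cols))]


def collinearity_flags(cols, threshold: float = 0.20) -> List[bool]:
    """Flag predictors whose tolerance falls below the threshold."""
    return [t < threshold for t in tolerances(cols)]
```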

The discrimination of the final model will be assessed using the C-statistic.40 A value above 0.7 is conventionally taken to indicate acceptable discrimination. All the data analyses will be performed using R statistical software, V.4.0.1.
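The C-statistic is the proportion of all HCU/non-HCU pairs in which the HCU has the higher predicted probability, with ties counted as one half. The following is a minimal sketch using the pairwise definition and hypothetical predictions. It is fine for illustration but quadratic in sample size; rank-based formulas are used in practice for large samples.

```python
from typing import Sequence


def c_statistic(predicted: Sequence[float], outcome: Sequence[int]) -> float:
    """Proportion of (HCU, non-HCU) pairs in which the HCU has the higher predicted risk."""
    hcu = [p for p, y in zip(predicted, outcome) if y == 1]
    non_hcu = [p for p, y in zip(predicted, outcome) if y == 0]
    if not hcu or not non_hcu:
        raise ValueError("need at least one HCU and one non-HCU")
    score = 0.0
    for p_case in hcu:
        for p_ctrl in non_hcu:
            if p_case > p_ctrl:
                score += 1.0  # concordant pair
            elif p_case == p_ctrl:
                score += 0.5  # tied pair
    return score / (len(hcu) * len(non_hcu))


print(c_statistic([0.9, 0.3, 0.5, 0.1], [1, 1, 0, 0]))  # 0.75
```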

Patient and public involvement

There will be no patient or public participation in the design, conduct, reporting and dissemination of this study.

Implications of the study results

It is known that heterogeneity exists among high-need, high-cost patients.9 42 Our study will provide insights into the heterogeneity and social complexity of acute care HCUs and will inform HCU prediction and policymaking. The national perspective adopted in our study will provide a full picture of the demographic, socioeconomic and clinical characteristics of acute care HCUs in Canada. By examining potential provincial variations in HCU characteristics, our findings could help inform decision-making at the provincial level. For example, HCUs in Northern Canada could be younger and have more acute disorders than HCUs elsewhere in Canada. This may call for strategies that target the prevention of acute disorders rather than the management of chronic disorders. By exploring how high user characteristics vary across different definitions of HSUs, the results can be used to develop management strategies that target specific metrics, such as hospital admissions or ED visits.

Ethics and dissemination

The researchers in this study will follow the Code of Conduct and the Values and Ethics Code of Statistics Canada and the security and confidentiality requirements of the Research Data Centre (RDC) at McMaster University. Only researchers who are listed on the Microdata Research Contract and have completed the personnel security clearance can examine the data, using authorised computers in the RDC. High priority will be given to the confidentiality of respondents’ personal information in the database. All researchers will adhere to the principles of physical protection, computer protection, confidentiality vetting and ‘deemed employee’ responsibilities to maintain the culture of confidentiality. The manuscript will be reviewed by all researchers and submitted by the principal investigator on behalf of the research team. The study findings will be communicated through peer-reviewed journals and at academic conferences to inform further research and policymaking about HCUs.



  • Contributors MZ and LT conceptualised the study. FX and JM were involved in the design of the study. MZ drafted the manuscript, and all the other authors contributed to the revision of it. All the authors have read and approved the final manuscript.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests None declared.

  • Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

  • Patient consent for publication Not required.

  • Provenance and peer review Not commissioned; externally peer reviewed.