
Medical expenditure clustering and determinants of the annual medical expenditures of residents: a population-based retrospective study from rural China
  1. Yan Zhang1,2,
  2. Shan Lu1,2,
  3. Yadong Niu1,2,
  4. Liang Zhang1,2
  1. 1 School of Medicine and Health Management, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
  2. 2 Research Centre for Rural Health Service, Key Research Institute of Humanities & Social Sciences of Hubei Provincial Department of Education, Wuhan, Hubei, China
  1. Correspondence to Dr Liang Zhang; zhangliang{at}


Objective To identify the characteristics of high-cost (HC) patients and the determinants of the annual medical expenditures of Chinese rural residents.

Methods Medical expenditure clustering was measured with the Lorenz curve and Gini index. t-tests and χ2 tests were used to compare the characteristics of the respondents, and a multilevel regression model was used to examine the determinants of their annual medical expenditures.

Design A cluster sampling study was performed to identify residents who used healthcare services and to assign them to high-cost (HC; top 5%), moderate-cost (MC; 6%–30%) and low-cost (LC; remaining residents) groups based on their annual medical expenditures.

Setting Annual healthcare utilisation was calculated from the 2014 population-based database of the New Rural Cooperative Medical System.

Participants A total of 478 051 residents who used healthcare services in 2014 were included in this retrospective study. Their annual medical expenditures were the outcome of interest.

Results The total medical expenditures of Macheng city residents in 2014 had a Gini index of 0.81, and 68.01% of these expenditures were attributable to HC patients. Females (51.5% of the HC group), persons aged over 60 years (34.48%) and patients with diseases that are difficult to diagnose had a high tendency to accumulate high medical costs. Residents living in the same village or town tended to have similar annual medical expenditures. Age, disease category, inpatient status, healthcare utilisation and utilisation level were identified as the determinants of annual medical expenditures.

Conclusions The medical expenditures of rural residents are highly concentrated, and HC patients bear a heavy economic burden. Policy-makers should therefore guide these patients towards appropriate healthcare services and strengthen healthcare quality management to reduce their unnecessary healthcare utilisation.

Trial registration number ChiCTR-OOR-14005563.

  • medical expenditure
  • high-cost
  • health economics
  • patient flow

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See:


Strengths and limitations of this study

  • This study is the first to introduce the medical expenditure clustering technique, the findings of which can supplement the results of previous research on high-cost patients.

  • The annual medical expenditures of residents are seldom reported at the population level. This study was conducted in Macheng, a city in Hubei province with 889 160 residents according to the New Rural Cooperative Medical System database, which also stores the inpatient and outpatient records of these residents.

  • The Lorenz curve and Gini index were used to measure the clustering of annual medical expenditures, and a three-level linear regression model was used to account for the aggregation of these data at the village and town levels.

  • The age, gender, hospitalisation, geographical and disease data of the sample were included in the regression model. However, some individual factors were not included in the model.


The rapid increase in health expenditures greatly impedes the development of the New Rural Cooperative Medical System (NRCMS), the largest basic social health insurance system in rural China, which covers 603.46 million rural residents. Specifically, health expenditures per capita in China increased from ¥513.8 (US$83.6) in 2012 to ¥1279.2 (US$208.2) in 2017, with an annual growth rate of 25.6%, much higher than the annual growth in fundraising per capita (16.02%).1 Medical expenditure clustering is considered an important driver of this rapid increase.2

Studies on the distribution of diseases within a population have defined clustering as the uneven distribution of disease morbidity in time or space.3 4 By analogy, medical expenditure clustering refers to the uneven distribution of medical expenditures within a given population. In recent years, researchers have shown great interest in medical expenditure clustering and have focused on high-cost (HC) patients, who accumulate high annual medical expenditures5 and are defined as the top 5% of spenders in healthcare.2 Previous studies have shown that HC patients account for more than 50% of the medical expenditures of the entire population.6 For instance, HC patients accounted for 52.3% of total medical expenditures in the USA in 2014.5 These patients and their healthcare quality management have attracted much research attention because of their high healthcare utilisation and inappropriate use of healthcare services.7 Improving the healthcare quality of these patients can discourage such inappropriate utilisation, reduce their health expenditures, conserve social health insurance funds and promote horizontal equity.

Medical expenditure clustering has also become a major concern in low-income and middle-income countries. In rural China, the rapid development of NRCMS substantially increased the healthcare utilisation of residents. For instance, the annual hospitalisation rate in rural China increased from 8.7% in 2008 to 14.9% in 2017.1 An empirical analysis of seven counties in China revealed that 78.6% of inpatient services in 2015 were used by one-third of all inpatients in the area, and one patient in Qianjiang District used inpatient services 27 times in 2014. Another study of 12 600 families in Jiangsu found that HC patients accounted for 44.9% of the total medical expenditures of the entire population.8 Moreover, medical expenditures are much more concentrated at the patient level than at the family level. We therefore suspect that China has a very high degree of medical expenditure clustering and that HC patients incur unnecessary medical expenditures because of the fragmented healthcare delivery system in rural China. Rural residents seek healthcare services from a three-tier (village–town–county) delivery system in which the higher tiers provide better services and charge higher medical costs.9 Because patients are not required to follow any referral order when seeking care, and general practitioners or consultants are unavailable in most parts of rural China, rural residents tend to make uninformed choices among hospitals and types of healthcare services, which leads to inappropriate utilisation.

For instance, some of these residents, especially HC patients, may receive inpatient services when they require only outpatient services, or may be unnecessarily admitted to higher-level hospitals, thereby incurring higher medical costs.10 Yingchun reported that the inappropriate hospital admission rate in five counties reached 27.6% in 2014.11

To reduce the economic burden of rural residents in China, several policies and strategies, such as the Tiered Healthcare System and Serious Illness Medical Insurance (2016), have been proposed. However, the effectiveness of these programmes has received mixed feedback and their targeting remains poorly substantiated, which may be ascribed to the difficulty in identifying high-demand and HC patients.12 Waxmonsky et al argued that HC patients with multiple or complex conditions, behavioural disorders or socioeconomic problems are particularly difficult to monitor.13 Moreover, because the data needed to explore annual medical expenditures are lacking, few studies have investigated the characteristics and determinants of HC patients by using population-level data. Therefore, the critical values, average expenditures and inpatient service utilisation of HC patients in rural China remain unclear.

Previous studies show that HC patients often maintain a high level of medical expenditure in the following year. For instance, Robst found that 49.2% of patients under the Florida Medical Assistance Programme (Florida Medicaid) were continuously classified as HC patients from 2005 to 2010.14 Meanwhile, Wodchis et al found that one-third of residents with public health insurance in Ontario, Canada, continuously incurred high medical expenditures from 2009 to 2011.15 Thus, although HC patients appear to share certain persistent characteristics, the distribution of their medical expenditures remains unclear.

To improve the health insurance system of a given area, the growth of health expenditures must be controlled and the efficiency of insurance funds enhanced. Medical expenditure clustering can be used to monitor the cost efficiency of healthcare services. Based on the above findings, identifying HC patients is a necessary procedure, and measuring the clustering of a population's medical expenditures is the first step towards this goal.

This study examines medical expenditure clustering, the distribution and characteristics of HC patients, and the determinants of the annual medical expenditures of residents. It also aims to help policy-makers and health planners predict and plan for the future needs of these patients.


Study setting

We calculated the annual medical expenditures of residents from the outpatient and inpatient services they used within a calendar year. A population-based retrospective study was performed in Macheng, a county-level city in Hubei province, central China, and a typical rural area (figure 1). Macheng has a total population of 889 160 and a GDP per capita of ¥22 758 (US$3704.83).

Figure 1

Map of Macheng city and geographical distribution of the residents.

Macheng has 2 county hospitals, 22 township hospitals and 207 village clinics. Its residents are enrolled in NRCMS, which reimburses their medical expenditures for any type of healthcare service (eg, outpatient and inpatient services in various medical institutions, including tertiary hospitals in urban areas). The healthcare utilisation of these residents is recorded in the NRCMS database whenever they request reimbursement. This study therefore uses the 2014 Macheng city data from the NRCMS database.

Data processing

Residents of Macheng who used healthcare services in 2014 were identified retrospectively. After data screening, 478 051 of the 889 160 resident records stored in the NRCMS database were included in the sample and processed in MS Excel 2010. First, the outpatient and inpatient cases were combined into a single Excel sheet. Second, cases with the same patient identifier (ID number) were sorted chronologically. Third, the annual medical (inpatient and outpatient) expenditures of each patient, the numbers of outpatient and inpatient cases and other information were recorded. Fourth, the annual records were entered into a new database in which each case represents the annual healthcare utilisation of one resident. Finally, the residents were sorted in descending order of annual medical expenditure. Residents in the top 5% and in the 6%–30% range were assigned to the HC and moderate-cost (MC) groups, respectively, and the remaining residents to the low-cost (LC) group. These three groups represent different degrees of medical expenditure clustering.
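The Excel workflow above (collapsing case-level claims into one annual record per resident, then ranking residents) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' procedure: the record layout `(patient_id, case_type, cost)` and the helper names `annual_totals` and `assign_cost_groups` are hypothetical, and residents tied at a cut-off simply keep their input order.

```python
from collections import defaultdict


def annual_totals(cases):
    """Collapse case-level claims into one annual record per patient.

    `cases` is an iterable of (patient_id, case_type, cost) tuples, where
    case_type is "inpatient" or "outpatient" and cost is in yuan
    (a hypothetical layout assumed for this sketch).
    """
    totals = defaultdict(lambda: {"total": 0.0, "inpatient": 0, "outpatient": 0})
    for patient_id, case_type, cost in cases:
        record = totals[patient_id]
        record["total"] += cost   # annual medical expenditure
        record[case_type] += 1    # annual number of cases of this type
    return dict(totals)


def assign_cost_groups(totals, hc_cut=0.05, mc_cut=0.30):
    """Rank patients by annual expenditure (highest first) and label the
    top 5% as HC, the 6%-30% range as MC and everyone else as LC."""
    ranked = sorted(totals, key=lambda pid: totals[pid]["total"], reverse=True)
    n = len(ranked)
    n_hc = round(n * hc_cut)
    n_mc = round(n * mc_cut)  # cumulative cut-off: HC plus MC
    groups = {}
    for rank, pid in enumerate(ranked):
        if rank < n_hc:
            groups[pid] = "HC"
        elif rank < n_mc:
            groups[pid] = "MC"
        else:
            groups[pid] = "LC"
    return groups
```

With 20 residents, for example, the single highest spender is labelled HC, the next five MC and the remaining 14 LC. The cut-offs are passed as parameters so that the same ranking can be reused for other thresholds.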

The main data processing techniques used in this study include Excel functions (eg, COUNTIF, SUMPRODUCT, LOOKUP and IF) and case processing operations (eg, splitting columns and removing duplicates). The main disease of each resident was marked, and the original International Classification of Diseases (ICD)-10 codes were collapsed into broader categories (eg, the code for chronic obstructive pulmonary disease (COPD) was adjusted from J44.900 to J44). Township hospitals were divided into four levels according to their scale and service capacity. The distance from and arrival time to county hospitals were captured individually by using Google Maps. The 2014 exchange rate used was US$1=¥6.1428.

The landform of each town, the healthcare capacity of township hospitals, and the residents' sociodemographic characteristics (eg, gender and age), arrival time to county hospitals, disease categories, healthcare utilisation (eg, annual length of stay (LOS)) and annual medical expenditures were collected to build the final database.16
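The code harmonisation and currency conversion described above can be sketched as follows. The sketch assumes that a "broader" ICD-10 code means the three-character category (a letter followed by two digits), as in the J44.900 to J44 example; the helper names `broad_icd10` and `rmb_to_usd` are hypothetical.

```python
import re

RMB_PER_USD_2014 = 6.1428  # 2014 exchange rate stated in the text


def broad_icd10(code: str) -> str:
    """Collapse a detailed ICD-10 code (eg, 'J44.900') to its
    three-character category (eg, 'J44')."""
    match = re.match(r"[A-Z]\d{2}", code.strip().upper())
    if match is None:
        raise ValueError(f"not an ICD-10 code: {code!r}")
    return match.group(0)


def rmb_to_usd(amount_rmb: float) -> float:
    """Convert yuan to US dollars at the 2014 rate, rounded to cents."""
    return round(amount_rmb / RMB_PER_USD_2014, 2)


print(broad_icd10("J44.900"))  # J44
print(rmb_to_usd(100_000))     # 16279.22
```

The second call reproduces the US$16 279.22 equivalent of the ¥100 000 NRCMS reimbursement figure quoted later in the paper.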

Statistical analysis

First, the clustering of the residents' medical expenditures was measured with the Gini coefficient and the Lorenz curve. The Gini coefficient is a numerical summary of medical expenditure clustering: the larger the coefficient, the higher the degree of clustering. Second, the characteristics of the residents in the HC, MC and LC groups were compared with t-tests and χ2 tests in IBM SPSS Statistics V.22.0. Finally, the determinants of annual medical expenditure were examined with linear regression analysis.

Two key observations were made at this stage. First, the data had a hierarchical structure, so the determinants of annual medical expenditure were examined with a multilevel linear regression analysis in MLwiN V.2.30.17 The patient, village and town were assigned to levels 1, 2 and 3, respectively. Second, as expected, annual medical expenditure had a skewed distribution, so it was log10-transformed to approximate a normal distribution. The following regression model was obtained:

[Equation: three-level linear regression model of log10-transformed annual medical expenditure]

where β0jk denotes the fixed-effects parameter, and u0j and w0jk denote the random effects at the village and town levels, respectively.
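For readers unfamiliar with multilevel models, a generic three-level random-intercept specification of this kind is shown below. It is a textbook form written with its own symbols (i for patients, j for villages, k for towns; v for the town effect, u for the village effect, e for the patient-level residual; x for generic covariates) and is not a reproduction of the authors' equation. The variance partition coefficients quantify the idea that residents of the same village or town tend to have similar expenditures.

```latex
\log_{10}(y_{ijk}) = \beta_0 + \sum_{p=1}^{P} \beta_p x_{p,ijk} + v_{0k} + u_{0jk} + e_{0ijk}

v_{0k} \sim N(0, \sigma_v^2), \quad u_{0jk} \sim N(0, \sigma_u^2), \quad e_{0ijk} \sim N(0, \sigma_e^2)

\rho_{\text{town}} = \frac{\sigma_v^2}{\sigma_v^2 + \sigma_u^2 + \sigma_e^2}, \qquad
\rho_{\text{village}} = \frac{\sigma_v^2 + \sigma_u^2}{\sigma_v^2 + \sigma_u^2 + \sigma_e^2}
```

Here ρ_town is the correlation between two residents of the same town but different villages, and ρ_village is the correlation between two residents of the same village, who necessarily also share a town.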

Patient and public involvement

No patients or members of the public were involved in this research.

The patient information was anonymised and deidentified before the analysis.


Medical expenditure clustering in the sample area

Table 1 presents the clustering results for the medical expenditures of Macheng residents in 2014. The top 5% and top 20% of residents accounted for 68.01% and 90% of the city's total medical expenditures, respectively. Figure 2 shows the corresponding Lorenz curve, which has a Gini coefficient of 0.814.
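The Lorenz curve, Gini coefficient and top-spender shares in table 1 and figure 2 can all be computed from the per-resident annual totals. The sketch below uses a standard trapezoidal calculation, assumed here for illustration because the paper does not state its exact formula; `lorenz_points`, `gini` and `top_share` are hypothetical helper names.

```python
def lorenz_points(expenditures):
    """Points (cumulative share of residents, cumulative share of expenditure),
    with residents ordered from the lowest to the highest spender."""
    values = sorted(expenditures)
    total = sum(values)
    if total <= 0:
        raise ValueError("total expenditure must be positive")
    n = len(values)
    points = [(0.0, 0.0)]
    running = 0.0
    for k, value in enumerate(values, start=1):
        running += value
        points.append((k / n, running / total))
    return points


def gini(expenditures):
    """Gini = 1 - 2 * (area under the Lorenz curve), via the trapezoidal rule."""
    pts = lorenz_points(expenditures)
    area = sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
    return 1 - 2 * area


def top_share(expenditures, fraction):
    """Share of total expenditure held by the top `fraction` of spenders."""
    values = sorted(expenditures, reverse=True)
    k = round(len(values) * fraction)
    return sum(values[:k]) / sum(values)
```

Applied to a list of annual totals, `top_share(totals, 0.05)` and `top_share(totals, 0.20)` correspond to the 68.01% and 90% figures. With n residents, this discrete Gini reaches at most (n − 1)/n, when one resident accounts for all spending; for a sample of 478 051 residents, the gap from 1 is negligible.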

Table 1

Medical expenditure clustering results for Macheng residents in 2014

Figure 2

Lorenz curve of the medical expenditure clustering for Macheng in 2014.

Table 2 shows the medical expenditure distribution of the HC, MC and LC groups. The HC group has an average annual expenditure per capita of over ¥15 000 (US$2441.6) and a minimum expenditure of ¥4985.80 (US$811.61) while the LC group has a maximum expenditure of ¥347.29 (US$56.54). Figure 3 presents the expenditure composition of these groups.

Table 2

Expenditure distribution of the HC, MC and LC groups (¥)

Figure 3

Cost composition of the three groups of rural residents in Macheng in 2014.

Characteristics of HC patients at various clustering levels

Table 3 shows the demographic characteristics of the patients. The three groups differed significantly (p<0.001) on nine demographic items. Females accounted for 51.5% of the HC group and 47.42% of the LC group. Residents aged over 60 years were most strongly represented in the HC group. Members of the HC group generally had smaller families (mean size 4.01) and members of the LC group larger ones (4.23); distance from and arrival time to county hospitals followed the same trend as family size. The HC group had the highest proportion of residents living near a high-capacity township hospital (15.19%), whereas 30.76% of the LC group lived near a low-capacity township hospital. Geographical and traffic conditions showed a similar pattern: the LC group had high proportions of members living in poor areas (mountains, 12.54%; county roads, 32.36%). In addition, members of the HC group were more likely to have cancer, circulatory, digestive and urinary diseases or haematological disorders, whereas members of the LC group more often had respiratory diseases.
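Group comparisons of categorical items, such as the gender split across the HC, MC and LC groups, rest on Pearson's χ2 test of independence. A minimal sketch of the test statistic follows; the function name is hypothetical, and the counts in the example are made up for illustration and are not the study's data.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c
    table of counts (eg, rows = gender, columns = HC, MC, LC)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Made-up counts: rows = female, male; columns = HC, MC, LC
stat, dof = chi_square_independence([[52, 260, 700], [48, 240, 700]])
```

If SciPy is available, the p-value is `scipy.stats.chi2.sf(stat, dof)`, and `scipy.stats.chi2_contingency` runs the whole test in one call. Note that `chi2_contingency` applies Yates' continuity correction by default when there is one degree of freedom.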

Table 3

Distribution of the demographic characteristics of residents in the three cost groups (n=478 051)

Table 4 shows the healthcare utilisation of the three groups in 2014. The average number of inpatient cases (2.36, 0.45 and 0.02 for the HC, MC and LC groups, respectively) and the annual LOS (25.69, 3.28 and 0.1) decreased from the HC to the LC group. The opposite trend was observed in the proportion of residents without any hospitalisation (0.07%, 62.94% and 99.95%), while the average number of outpatient cases (8.28, 13.57 and 5.05) peaked in the MC group. Outpatient utilisation also showed an inverted V-shaped distribution at the town and clinic levels: the MC group had the highest average numbers of outpatient cases at the town (3.69) and clinic (8.61) levels, whereas the average number of outpatient cases at the county level decreased from the HC to the LC group (1.76, 1.27 and 0.21).

Table 4

Distribution of the healthcare utilisation characteristics of the three groups (n=478 051)

Determinants of the annual medical expenditures of residents

A three-level linear regression was performed in which the patient, village and town were assigned to levels 1, 2 and 3, respectively. Table 5 presents the estimates of the three-level variance components model with its explanatory variables. Age, disease category and healthcare utilisation were identified as the major determinants of the annual medical expenditures of residents. No significant association was found between annual medical expenditure and gender, family size, distance to county hospitals, arrival time to county hospitals, geography, traffic condition or capacity of township hospitals. Holding the other factors constant, medical expenditures increased with age, number of outpatient cases, LOS and healthcare utilisation level. Patients with cancer, circulatory, digestive, urinary, obstetric and gynaecological diseases or haematological disorders tended to incur higher annual medical expenditures than patients with respiratory, ENT, endocrine, skeletal and muscular diseases.

Table 5

Three-level linear regression model analysis of annual medical expenditures (n=478 051)


Clustering of the medical expenditures of residents in rural China

The medical expenditures of rural residents were extremely unevenly distributed. The Gini coefficient of 0.814 in 2014 indicates that the medical expenditures of this population were concentrated among a small minority of residents. The HC group accounted for 68.01% of the total medical expenditures, much higher than the share recorded in the USA (52.3%).5 Meanwhile, the LC group, which included 70% of the residents in the sample, accounted for only 2.42% of the total medical expenditures. The annual medical expenditure per capita of the entire population was ¥1222.49 (US$199.01), similar to that of the MC group (¥1261.36, US$205.18) but only 7.35% of that of the HC group. The maximum annual medical expenditure in this population was ¥424 962.1 (US$69 180.52), more than four times the maximum annual reimbursement provided by NRCMS (¥100 000, US$16 279.22).

The three groups showed clear differences in their expenditure structures. Hospitalisation expenditures accounted for over 95% of the medical expenditures of the HC group, consistent with the findings of Driessen et al.18 Outpatient expenditures accounted for most of the medical expenditures of the LC group, whereas outpatient and hospitalisation expenditures contributed roughly equally to those of the MC group. The medical expenditures of the HC group were widely dispersed, with an SD much higher than the mean (¥22 817.95 vs ¥16 618.78, US$3714.58 vs US$2705.41). By contrast, the medical expenditures of the MC and LC groups were relatively concentrated.

The HC group faces a very high economic burden. Residents with annual medical expenditures of roughly ¥5000 (US$813.96) or more were included in this group. The group had an annual medical expenditure per capita of over ¥16 000 (US$2604.68), of which only around 50% was reimbursed by NRCMS. HC patients therefore had an average out-of-pocket expenditure of over ¥8000 (US$1302.34), nearly 80% of the total consumer spending per capita of rural residents (¥10 129.8, US$1649.05).1 Such a burden can easily lead to catastrophic health expenditure.
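The 7.35% ratio between the population mean and the HC-group mean follows directly from the concentration figures above and serves as a quick consistency check. Writing s for a group's share of total expenditure and p for its share of residents, a group's mean equals (s/p) times the overall mean:

```latex
\bar{y}_{\text{HC}} = \frac{s_{\text{HC}}}{p_{\text{HC}}}\,\bar{y}
  = \frac{0.6801}{0.05} \times 1222.49 \approx 16\,628

\frac{\bar{y}}{\bar{y}_{\text{HC}}} = \frac{p_{\text{HC}}}{s_{\text{HC}}}
  = \frac{0.05}{0.6801} \approx 7.35\%
```

The implied HC mean of about ¥16 628 is close to the ¥16 618.78 reported for the HC group below; the small gap is consistent with rounding of the published shares.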
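The out-of-pocket figure and its ratio to consumer spending follow from two lines of arithmetic, using the rounded values above (about ¥16 000 per capita, about 50% reimbursed):

```latex
\text{OOP}_{\text{HC}} \approx 16\,000 \times (1 - 0.50) = 8000 \text{ yuan}

\frac{8000}{10\,129.8} \approx 0.79
```

That is, the average HC patient's out-of-pocket expenditure equals roughly 79% of the annual per capita consumer spending of rural residents.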

Aggregating annual medical expenditures at the village and town levels

The multilevel linear regression revealed a hierarchical (town–village–resident) structure in the data. Annual medical expenditures were clustered at the town and village levels; that is, residents living in the same village or town tended to have similar annual medical expenditures. Geography, traffic condition and capacity of township hospitals differed significantly across the cost groups in table 3 but were not significant determinants in the multilevel model in table 5. Their effects were mostly captured at the town level, because residents of the same town share the same geography, traffic conditions and township hospital, whereas the 30 towns in the sample differ in their social customs, geographical locations and township hospital capacity. Taken together, these findings suggest that residents living in areas with favourable geographical conditions (eg, plains and national standard roads) and high-capacity hospitals tend to accumulate high medical expenditures. Similarly, the effects of distance from and arrival time to county hospitals were mostly observed at the village level. Given their convenient locations, residents living near county hospitals are more likely to use healthcare services in these institutions and to accumulate higher medical expenditures than those living far from these hospitals.

Determinants of annual medical expenditures

Annual medical expenditures are directly determined by healthcare utilisation. Table 5 shows that higher utilisation of healthcare services corresponds to higher medical expenditures. Outpatient cases at the village, town and county levels had regression coefficients of 0.031, 0.049 and 0.134, respectively, which were lower than the corresponding coefficients for inpatient cases. As with hospital utilisation, an increase in LOS corresponds to an increase in medical expenditures.
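Because the outcome was log10-transformed, each coefficient can be read as a multiplicative effect: a one-unit increase in a covariate multiplies the expected (geometric-mean) expenditure by 10 raised to the power β, with other variables held constant. The sketch below assumes that the reported coefficients are on this log10 scale, as the Methods imply; the helper names are hypothetical.

```python
def multiplicative_effect(beta: float) -> float:
    """Ratio of expected (geometric-mean) expenditure for a one-unit
    increase in a covariate when the outcome is log10(expenditure)."""
    return 10 ** beta


def percent_change(beta: float) -> float:
    """The same effect expressed as a percentage change."""
    return (multiplicative_effect(beta) - 1) * 100


# Outpatient-case coefficients for the village, town and county levels
for level, beta in [("village", 0.031), ("town", 0.049), ("county", 0.134)]:
    print(f"{level}: +{percent_change(beta):.1f}% per additional outpatient case")
```

Under this reading, one more outpatient case is associated with roughly 7%, 12% and 36% higher annual expenditure at the village, town and county levels, respectively.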

In addition, older age corresponds to higher expenditure. This is particularly true for residents aged over 60 years, who accounted for 34.48% of the HC group; in other words, 8.5% of the elderly residents were HC patients, compared with only 5% of the total population. Such high expenditures can be attributed to their poor physical condition and high tendency to develop one or multiple diseases. A WHO report revealed that population ageing would increase a country's healthcare expenditures, but to a lesser extent than expected. Although health-related needs often increase with age, the relationship between healthcare utilisation and health expenditure changes as people grow older.19 In fact, the medical expenditures of people in high-income countries gradually decline after the age of 75. Other studies even show that people aged 80 years and above have a much lower share of medical resource consumption than the total population. In terms of gender, the average medical expenditures of female residents did not differ from those of male residents; the relationship between gender and medical expenditures warrants further study.20
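The two percentages quoted for older residents (34.48% of the HC group versus 8.5% of the elderly) are linked by Bayes' rule. Here "aged > 60" denotes residents aged over 60 years, and P(HC) = 0.05 by construction of the groups:

```latex
P(\text{HC} \mid \text{aged} > 60)
  = \frac{P(\text{aged} > 60 \mid \text{HC})\, P(\text{HC})}{P(\text{aged} > 60)}

0.085 = \frac{0.3448 \times 0.05}{P(\text{aged} > 60)}
  \;\Rightarrow\; P(\text{aged} > 60) = \frac{0.01724}{0.085} \approx 0.203
```

The figures are therefore consistent with residents aged over 60 making up about 20% of the sample (a share implied by this calculation, not reported in this section). They also imply that elderly residents were 8.5/5 = 1.7 times as likely as the average resident to be HC patients.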

Patients with respiratory, ENT, endocrine, skeletal and muscular diseases had similar annual medical expenditures. By contrast, patients with cancer, circulatory, digestive, obstetric and gynaecological diseases or haematological disorders had significantly higher annual medical expenditures, with cancer having the highest regression coefficient (0.759). Diseases that are life-threatening and difficult to diagnose can easily lead to high medical expenditures.21 These findings can be attributed to the limited capability of township healthcare professionals, most of whom lack specialisation in treating urinary and cardiovascular diseases or haematological disorders; residents with such diseases therefore tend to be admitted to county hospitals or to hospitals outside their counties.22

Redesigning the healthcare delivery system for the HC group

The intense concentration of medical expenditures reveals a great imbalance in the healthcare utilisation of Macheng city residents. Therefore, the rural healthcare delivery system, as well as current efforts to reduce costs and improve the cost efficiency of healthcare services, should focus on HC patients.

First, a monitoring mechanism for HC patients should be established, with the NRCMS database serving as a source of information for monitoring these patients. Although HC patients are identified from their medical expenditures or health claims, the likelihood that a patient will become an HC patient can be predicted in advance from several risk factors. Robst and Wodchis et al found that a patient identified as HC in one year is more than 40% likely to be identified as HC in the following year, because these patients often maintain a high level of medical expenditure.14 15 Therefore, residents who have remarkably high healthcare utilisation, are exposed to many risk factors and were identified as HC patients in the previous year warrant special attention. Moreover, HC patients need guidance when choosing healthcare services and should be given priority access to primary healthcare.
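Year-on-year persistence of the kind reported by Robst and Wodchis et al can be measured once each year's HC group has been identified from consecutive years of NRCMS claims. A minimal sketch follows, assuming that HC status is available as a collection of patient IDs per year; the function name and IDs are hypothetical.

```python
def hc_persistence(hc_year1, hc_year2):
    """Share of year-1 HC patients who are HC again in year 2."""
    year1 = set(hc_year1)
    if not year1:
        raise ValueError("no HC patients in year 1")
    return len(year1 & set(hc_year2)) / len(year1)


# Hypothetical patient IDs: two of the four year-1 HC patients remain HC
print(hc_persistence({"a", "b", "c", "d"}, {"b", "d", "e"}))  # 0.5
```

A persistence rate well above the 5% base rate would support using prior-year HC status as a monitoring flag.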

Second, enhancing the healthcare quality management of patients plays a vital role in discouraging their unnecessary utilisation of healthcare services, such as inappropriate admission to hospitals and excessive utilisation of inpatient services.23

Third, exploring new mechanisms, such as comprehensive management of patients with chronic diseases, global budgets for multilevel institutions, integrated management programmes and integrated delivery of medical and nursing services to older residents, can motivate doctors to deliver continuous care to HC patients. In doing so, healthcare professionals can avoid unnecessary service duplication, improve the efficiency of care and help their patients make better use of primary healthcare services.24


From a city-level population perspective, the medical expenditures of rural residents in China are highly clustered. Apart from patient characteristics (eg, age and disease), healthcare utilisation was identified as a primary determinant of medical expenditure clustering. Policy-makers should therefore guide HC patients in choosing healthcare services and improve healthcare quality management to discourage their unnecessary utilisation of healthcare services. Doctors should also be motivated to deliver continuous care to this group of patients.


This study has two limitations. First, hospitalisation information, geographical factors, referral status and diseases were included in the regression model, whereas other individual factors, such as economic status, education and preferences, were not. Second, several studies show that HC patients with multiple chronic diseases tend to use multidisciplinary and multi-institutional services.25 However, this study considered only the main disease of each patient as recorded in the NRCMS database. These limitations may affect the robustness of the findings and should be addressed in further studies.


  1. 1.
  2. 2.
  3. 3.
  4. 4.
  5. 5.
  6. 6.
  7. 7.
  8. 8.
  9. 9.
  10. 10.
  11. 11.
  12. 12.
  13. 13.
  14. 14.
  15. 15.
  16. 16.
  17. 17.
  18. 18.
  19. 19.
  20. 20.
  21. 21.
  22. 22.
  23. 23.
  24. 24.
  25. 25.


  • Contributors YZ and SL participated in the conception, design, analyses and writing of the manuscript. YN participated in the data collection and statistical analysis. LZ helped draft, review and revise the manuscript. All authors gave their approval to publish this version of the manuscript.

  • Funding This research is supported by the National Youth Natural Science Foundation of China (Grant No: 71603088).

  • Competing interests None declared.

  • Patient consent Parental/guardian consent obtained.

  • Ethics approval The study protocol conformed to the guidelines of the Ethics Committee of the Tongji Medical College of Huazhong University of Science and Technology and was registered in the Chinese Clinical Trial Registry (ChiCTR-OOR-14005563).

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement The anonymised dataset can be requested by sending an email to the corresponding author.