ICF Generic Set as new standard for the system wide assessment of functioning in China: a multicentre prospective study on metric properties and responsiveness applying item response theory
  1. Cristina Ehrmann1,2,3,
  2. Birgit Prodinger1,2,3,
  3. Gerold Stucki1,2,3,
  4. Wenzhi Cai4,
  5. Xia Zhang5,
  6. Shan Liu4,
  7. Shouguo Liu6,
  8. Jianan Li6,
  9. Jan D Reinhardt1,2,7
  1. 1 Swiss Paraplegic Research, Nottwil, Switzerland
  2. 2 Department of Health Sciences and Health Policy, University of Lucerne, Luzern, Switzerland
  3. 3 ICF Research Branch, a cooperation partner within the WHO Collaborating Centre for the Family of International Classifications in Germany (at DIMDI), Nottwil, Switzerland
  4. 4 Shenzhen Hospital of the Southern Medical University, Guangzhou, China
  5. 5 Department of Rehabilitation Medicine, Third Affiliated Hospital of Peking University, Beijing, China
  6. 6 Department of Rehabilitation Medicine, First Affiliated Hospital of Nanjing Medical University, Nanjing, China
  7. 7 Department of Health Sciences, Institute for Disaster Management and Reconstruction, Sichuan University and Hong Kong Polytechnic University, Chengdu, China
  1. Correspondence to Dr Jan D Reinhardt; reinhardt{at}scu.edu.cn

Abstract

Objectives To examine metric properties and responsiveness of the International Classification of Functioning, Disability and Health (ICF) Generic Set when used in routine clinical practice to assess functioning.

Design Prospective multicentre study.

Setting 50 hospitals from 20 provinces of Mainland China.

Participants 4510 adult inpatients admitted to the departments of Pulmonology, Cardiology, Neurology, Orthopaedics, Cerebral Surgery or Rehabilitation Medicine.

Main outcome measures The ICF Generic Set (ICF Generic 6 Set) applied with an 11-point numeric rating scale (0=no problem to 10=complete problem) was fitted to the Partial Credit Model (PCM) to create an interval score of functioning.

Results PCM assumptions were found to be fulfilled after accounting for Differential Item Functioning. With an average improvement by 7.86 points of the metric ICF Generic 6 score (95% CI 7.53 to 8.19), the ICF Generic 6 Set proved sensitive to change (Cohen’s f²=0.41). Ceiling and floor effects on detecting change in functioning were cancelled or reduced by using the metric score.

Conclusion The ICF Generic 6 Set can be used for the assessment of functioning in routine clinical practice and an interval score can be derived which is sensitive to change.

  • functional status
  • sensitivity to change
  • psychometrics
  • Rasch analysis
  • ICF

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

Strengths and limitations of this study

  • This study introduces a new International Classification of Functioning, Disability and Health (ICF)-based standard for collecting reliable information on patients’ functioning in routine clinical practice in hospitals across China.

  • The metric ICF Generic 6 Set score derived in this study can be used to compare functioning across health conditions, clinical departments, hospitals and over time.

  • The non-random selection of hospitals in this study may, however, limit generalisability of the results.

Introduction 

To plan treatments optimally and document the outcomes of interventions, diagnostic information should be complemented by information on functioning, that is, physiological and mental functions, activities of daily living and participation in society.1 2 Previous research indicates that diagnosis alone cannot sufficiently predict relevant health outcomes such as hospitalisation,3 length of stay,4 social integration5 and mortality.6 7 Information on functioning, in turn, has been shown to add value in predicting those outcomes.6–9 Moreover, interval scales of functioning can be used to quantify the impact of interventions within and across patient populations.10

The International Classification of Functioning, Disability and Health (ICF) is the reference for the systematic documentation of meaningful domains of functioning such as memory, pain, walking, self-care and social interactions, which are the units of classification and are called categories.1 The list of more than 1400 ICF categories is organised into two parts, each with two components: (1) Functioning and Disability, with the components Body functions and structures (b and s) and Activities and participation (d), and (2) Contextual factors, with the components Environmental factors (e) and Personal factors. Personal factors have not yet been classified. The ICF categories are designated by the letters b, s, d and e, followed by a numeric code starting with the chapter number (first level, one digit), followed by the second level (two digits) and the third and fourth levels (one digit each). A detailed description of functioning using the ICF usually involves the selection of second-level, third-level or fourth-level categories. To facilitate the assessment of functioning in clinical, research and other health-related settings, ICF Core Sets11 and the ICF Generic Set have been developed.12 While an ICF Core Set is a subset of the ICF codes for describing patient functioning in populations with a specific health condition or in a specific setting, the ICF Generic Set defines a minimum set of information on functioning that should be collected across health conditions and clinical settings as well as in the community.12 The ICF categories of the Generic Set are, from the component Body Functions, (1) energy and drive functions (b130), (2) emotional functions (b152) and (3) sensation of pain (b280), and, from the component Activities and Participation, (4) carrying out daily routine (d230), (5) walking (d450), (6) moving around (d455) and (7) remunerative employment (d850). These ICF categories address four out of the eight World Health Survey domains of functioning.
They were shown to be sufficiently explanatory for self-perceived health in general and clinical populations. The selection of these categories was based on a psychometric approach using data from three sources: (1) the German National Health Interview and Examination Survey 1998, (2) the United States National Health and Nutrition Examination Survey 2007/2008 and (3) the ICF Core Set studies.12 The ICF Generic Set was designed as a response to the challenge of creating a common metric of functioning to ensure comparability of data across studies and populations.12 Data corresponding to this minimum set of functioning information can be generated with two different approaches: (1) mapping existing assessment tools of functioning to the ICF13 14 and identifying items operationalising the functioning domains of the ICF Generic Set or (2) using the categories of the ICF Generic Set in combination with a rating scale as items.15 Regarding the first approach, Oberhauser et al showed that a psychometrically sound metric can be developed for tracking and comparing functioning in people living in private households in England.16 The second approach was first tested within a Chinese initiative to build a national ICF-based data system for evaluating and monitoring health systems performance.14 15 In a pilot study, rehabilitation professionals used the seven categories of the ICF Generic Set in routine clinical practice to collect functioning information, with the generic ICF qualifier (a five-point ordinal scale: 0 no problem, 1 mild problem, 2 moderate problem, 3 severe problem, 4 complete problem) as the rating scale.
Reinhardt et al demonstrated the feasibility of using the ICF Generic Set in clinical practice, with an average assessment time of about 6 min, and the possibility of aggregating information across categories of the ICF Generic Set into a functioning score that was sensitive to change during inpatient rehabilitation treatment.17 However, results from that study and feedback from clinical raters also revealed several limitations that future studies would need to address: (1) clinical raters reported difficulties in applying the generic ICF qualifier scale, in particular with regard to differentiation between scale points, (2) the descriptions of the ICF categories were not always consistently understood, (3) remunerative employment had to be removed from the scale as clinicians found themselves unable to appraise this category in the inpatient setting and (4) the study was confined to the rehabilitation setting.17

To address the above issues, a large multicentre study was conducted as a follow-up. (1) Instead of the generic ICF qualifier scale, where each qualifier is defined, a numeric rating scale from 0 (no problem) to 10 (complete problem), where only the extremes are defined, was used and (2) clinically meaningful descriptions of ICF categories developed in a consensus conference were employed.14 (3) Information about remunerative employment (d850) was collected but not included in creating the sum score of functioning in the inpatient setting. (4) The study was conducted across various clinical departments.

The objective of this paper was to examine the psychometric properties of the ICF Generic Set when used in routine clinical practice to assess functioning. The specific aims were (1) to identify whether it is possible to aggregate information across categories contained in the ICF Generic 6 Set and assessed on an 11-point numeric rating scale into a metric functioning score, (2) to examine the ICF Generic 6 Set’s sensitivity to change and (3) to investigate ceiling and floor effects affecting the detection of change.

Methods

Study design and setting

This was a prospective multicentre study conducted from 5 November 2014 to 28 February 2015. Patients admitted to the departments of Pulmonology, Cardiology, Neurology, Orthopaedics, Cerebral Surgery or Rehabilitation Medicine were included in this study. Inclusion criteria were: (1) adults aged 18 years and older; (2) with definite medical diagnosis and (3) with complete data at admission and study endpoint (discharge, death, transfer or end of study period). Participating centres comprised grade II and grade III hospitals from 20 provinces of Mainland China. Grades II and III refer to the size and available resources of the hospitals, with grade III being province-level hospitals meeting the highest medical standards and grade II being smaller but still well-equipped city-level hospitals. The study was presented at the annual conference of the Chinese Nursing Association in Guangzhou and partners from participating hospitals were recruited there as well as through personal networks of the authors. The study protocol was available to the participating hospitals.

The study was performed according to the principles of the Helsinki Declaration and informed written or verbal (in case of illiteracy) consent was obtained from all study participants. We received ethical approval for the analysis and publication of the data for research purposes from Shenzhen Southern Medical University, Guangzhou, China, where the study centre was located and the data were hosted, on 20 September 2017 (No. NYSZYYEC20170013).

Study population

Patients with different health conditions admitted to the participating hospitals and departments within the timeframe specified above were recruited for this study. Based on their International Classification of Diseases (ICD)-10 diagnosis at admission, patients were assigned to six different health condition groups: (1) musculoskeletal health condition group including patients with limb dysfunctions or bone and joint diseases, (2) neurological health condition group including patients with stroke, traumatic brain injury or cerebral apoplexy, (3) cancer health condition group, for example, patients with lung cancer or bone tumours, (4) cardiovascular health condition group, for example, patients with hypertension or coronary heart disease, (5) respiratory health condition group, for example, patients with pneumonia or bronchiectasis and (6) group comprising other health conditions that could not be classified into one of the above.

Measures and procedures

Six out of seven categories of the ICF Generic Set (excluding d850, remunerative employment) were used by clinical nurses to assess patients’ functioning on an 11-point numeric rating scale (0 (no problem) to 10 (complete problem)) at admission and discharge or study endpoint. Each ICF category was accompanied by a simple, clinically intuitive description.14 For example, d230 (Carrying out daily routine) refers to ‘actions of planning, managing and completing activities of daily living’ as opposed to the original ICF description ‘Carrying out simple or complex and coordinated actions in order to plan, manage and complete the requirements of day-to-day procedures or duties, such as budgeting time and making plans for separate activities during the day’. The patients were not involved in rating their functioning. In rating each category, the assessors considered all previous data routinely collected in the hospital department in question: information from anamnesis, clinical examinations, single item scales like visual analogue scale for pain or standardised assessment tools such as the Barthel Index. Nurses received formal training on how to assess functioning with the ICF Generic Set by the authors (JR, XZ, WC, SL). The functioning of each patient was assessed by the same trained nurse at admission and at the study endpoint. Mean time between assessments was about 13.5 days (SD: 9.1, Minimum: 1, Maximum: 70). Mean assessment time was about 9.1 min at admission (SD: 5.3, Minimum: 1, Maximum: 36) and 7.1 min at discharge or study endpoint (SD: 4.2, Minimum: 1, Maximum: 30). Demographic (gender, age) and diagnostic data (ICD-10) were extracted from hospitals’ patient records by the authors (JR, XZ, WC, SL).

Patient and public involvement

As clinicians were to rate patients in ICF categories based on available routinely collected information, patients were not involved in the design of the study or recruitment procedures and conduct. They were, however, informed about the purpose of the study and informed consent was obtained. After academic publication, patients and the public will be informed about the results in patient magazines and through social media.

Statistical analysis

Descriptive statistics

Descriptive statistics were used to describe the study population and response distributions. The marginal homogeneity test was used to test change in response patterns for each ICF category between admission and study endpoint.

Rasch analysis

A one-parameter item response model, also known as Rasch model, was used to test if a valid interval score of functioning could be derived by aggregating responses across ICF Generic Set items, that is, ICF-categories combined with the 11-point numeric rating scale.18 19 Although the Rasch model requires more assumptions than non-parametric item response models for measuring persons and items, it offers high stability of model parameter estimates and person ability estimation.20 The RUMM2030 package was used for carrying out the Rasch analysis.21 Based on item parameters estimated using a pairwise conditional method, RUMM2030 calculates person parameters using weighted maximum likelihood. In addition, item thresholds, that is, equal probability points between two adjacent response options, were estimated for each item. Thresholds should be ordered to be interpretable since they are supposed to reflect an increase on the functioning trait.

The three assumptions of the Rasch model, that is, local independence, unidimensionality and invariance, were iteratively tested. First, the unidimensionality assumption of items being homogeneous in the sense of measuring a single latent trait of functioning was tested using the principal component analysis (PCA) method proposed by Smith.22 For each patient, two separate abilities were estimated, one from the Rasch calibration of the set of items with positive loadings and one from the Rasch calibration of the set of items with negative loadings on the first residual component from the PCA, and these were compared with pairwise t-tests. The proportion of significant t-tests should be below 5% to indicate unidimensionality. Second, the local independence assumption, implying no relations between pairs of items and unbiased parameter estimates, was tested. The presence of local dependence among items after accounting for the trait (residual correlations of items) may be an indicator for additional dimensions, which again would violate the unidimensionality assumption.23 To this end, Yen’s Q3 statistic, representing correlations between item residuals of the Rasch analysis, was used.24 The critical value for Yen’s Q3 statistic, Q3*, that is, the difference between Q3 and the average correlation, was calculated based on the parametric bootstrapping procedure implemented by Christensen et al.25 Testlets, that is, super items combining individual items, were created for locally dependent items to absorb local dependency and improve model fit.26 The iterative process in the testlet design is the same as in the single item design, except that under the testlet design ordered thresholds are not expected.27 Third, to assess Differential Item Functioning (DIF) across age groups (above or below the mean age, that is, 58 years), gender, health condition groups (musculoskeletal, neurological, cancer, cardiovascular, respiratory and others) and time of assessment (admission vs study endpoint), analysis of variance (ANOVA) tests based on an overall significance level of 0.05 with Bonferroni correction for the number of items were carried out.28 A significant main effect of the respective group variable indicates that subgroups respond in a systematically different way (indicated by parallel item characteristic curves). Items demonstrating DIF were split into specific questions for each of the levels in the groups showing DIF.
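As a rough illustration of the local dependence check, the following Python sketch computes Yen’s Q3 as pairwise correlations between item residuals and then subtracts the average correlation, following the description in the text. The residual values, function names and data layout are hypothetical, and the bootstrapped critical value is not reproduced here; it is simply passed in as a parameter.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def q3(residuals):
    """Yen's Q3: correlation of Rasch residuals for every pair of items."""
    return {
        (i, j): pearson(residuals[i], residuals[j])
        for i, j in combinations(residuals, 2)
    }


def q3_star(q3_values):
    """Difference between each pairwise Q3 and the average Q3."""
    average = sum(q3_values.values()) / len(q3_values)
    return {pair: value - average for pair, value in q3_values.items()}


def locally_dependent(q3_star_values, critical_value):
    """Item pairs whose Q3* exceeds the supplied critical value."""
    return [pair for pair, value in q3_star_values.items() if value > critical_value]


# Made-up residuals (observed minus expected score) for four patients.
toy_residuals = {
    "b130": [0.5, -0.2, 0.1, -0.4],
    "b152": [0.4, -0.1, 0.2, -0.5],
    "d450": [-0.3, 0.3, -0.2, 0.2],
}
toy_q3 = q3(toy_residuals)
toy_q3_star = q3_star(toy_q3)
flagged = locally_dependent(toy_q3_star, critical_value=0.12)
```

In this toy example only the pair of Body Functions items is flagged; in practice the residuals would come from the fitted Rasch model rather than being typed in by hand.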

The Partial Credit Model (PCM) was chosen after a likelihood ratio test was performed with the output of the initial analysis to identify which version of the polytomous Rasch model (Rating Scale or Partial Credit) was appropriate.29 30 Item fit was examined with individual item χ² probability values, whereas the overall fit of the data to the Rasch model was checked based on the global χ² of the items.30 31
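To make the role of item thresholds in the PCM concrete, the short Python sketch below computes the category probabilities of a single polytomous item for a given person location. It uses the standard PCM formulation; the function name and the threshold values are illustrative assumptions and are not estimates from this study.

```python
import math


def pcm_probabilities(theta, thresholds):
    """Category probabilities of one item under the Partial Credit Model.

    theta: person location in logits.
    thresholds: list of m thresholds in logits, defining m + 1 categories.
    """
    # Category k gets the sum of (theta - threshold) over the first k
    # thresholds; category 0 gets the empty sum, which is 0.
    cumulative = [0.0]
    for tau in thresholds:
        cumulative.append(cumulative[-1] + (theta - tau))
    # Subtract the largest exponent before exponentiating for stability.
    peak = max(cumulative)
    weights = [math.exp(value - peak) for value in cumulative]
    total = sum(weights)
    return [weight / total for weight in weights]


# Hypothetical, ordered thresholds for an 11-category item (10 thresholds).
example_thresholds = [-2.5, -2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
probabilities = pcm_probabilities(0.0, example_thresholds)
```

When the person location equals a threshold, the two adjacent response options are equally likely, which is the sense in which thresholds are equal probability points between adjacent categories.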

The targeting of the functioning scale with regard to the sample was studied by comparing the distribution of person and item locations.32 Reliability was studied with the person separation index (PSI) from the Rasch analysis with an adequate expectation of 0.70 or above at the group level.31 33

Two stratified random samples, called development sample and validation sample, were selected across admission and study endpoint so that each person was represented only once while ensuring equal representation of the two time points. Health condition (six groups), age (younger than 58 years vs 58 years and older) and gender were used as criteria for stratification when selecting patients for each random sample, with equal representation of each subgroup. The subgroup of females older than 58 years suffering from cancer had the smallest number of 32 patients and thus defined the size of the other subgroups. Stratified random samples for development and validation thus comprised 768 patients (32×6×2×2) each.
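The sample size arithmetic and the idea of drawing equally sized strata can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the field names, the function name and the seed are hypothetical, and the sketch does not reproduce the additional constraints of the study (balancing the two time points and keeping the development and validation samples distinct).

```python
import random
from collections import defaultdict


def stratified_sample(patients, per_stratum, seed=0):
    """Draw the same number of patients from every stratum.

    patients: list of dicts with the hypothetical keys
    'condition', 'age_group' and 'gender'.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for patient in patients:
        key = (patient["condition"], patient["age_group"], patient["gender"])
        strata[key].append(patient)
    sample = []
    for key in sorted(strata):
        # rng.sample raises ValueError if a stratum is smaller than per_stratum.
        sample.extend(rng.sample(strata[key], per_stratum))
    return sample


# 32 patients per cell x 6 health conditions x 2 age groups x 2 genders.
n_per_sample = 32 * 6 * 2 * 2
```

The smallest stratum caps the number that can be drawn from every other stratum, which is why the cell of 32 patients determines the total of 768.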

After obtaining the final logit score from the Rasch model, a user-friendly scale from 0 to 100 and a transformation table were created allowing deriving scores for the overall sample at both admission and discharge or study endpoint.
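The exact rescaling formula is not stated here, so the following Python sketch assumes a simple linear (min–max) transformation of logit scores onto the 0 to 100 range. The function name and the logit values are hypothetical and serve only to show how such a transformation table could be built.

```python
def rescale_logits(logits, lower=0.0, upper=100.0):
    """Linearly map logit scores onto the interval [lower, upper].

    Assumes a linear transformation anchored at the smallest and largest
    logit; the study's actual transformation may differ.
    """
    lowest, highest = min(logits), max(logits)
    span = highest - lowest
    return [lower + (value - lowest) * (upper - lower) / span for value in logits]


# Made-up logit estimates, one per raw-score level of a toy scale.
toy_logits = [-4.0, -2.0, 0.0, 2.0, 4.0]
toy_scores = rescale_logits(toy_logits)

# A transformation table pairs each raw score with its interval score.
toy_table = dict(zip(range(len(toy_logits)), toy_scores))
```

A linear rescaling preserves the interval properties of the logit metric, so differences on the 0 to 100 scale remain proportional to differences in logits.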

Sensitivity to change

We assumed that on average, patients’ functioning scores should improve during clinical treatment from admission to study endpoint. As we had to account for repeated measurements as well as for clustering of patients in evaluators in hospitals, we used mixed effects regression with maximum likelihood estimation to estimate 95% CIs for the average change in patients between admission and study endpoint. We compared all nested models with each other using likelihood ratio tests. The fully nested model employing random intercepts for patients, evaluators and hospitals showed superior fit. Cohen’s f² was used as a measure for standardised effect size with values above 0.15 considered moderate and those above 0.35 considered large.17 34 We, moreover, conducted stratified analysis by health condition group.
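For orientation, the Python sketch below shows the textbook definition of Cohen’s f² in terms of explained variance, together with the cut-offs quoted above. How the explained variance was obtained from the mixed effects models is not described here, so the R² input is a hypothetical value and the function names are illustrative.

```python
def cohens_f2(r_squared):
    """Cohen's f-squared from a proportion of explained variance.

    Textbook definition f2 = R2 / (1 - R2); for mixed models, variants
    based on the difference in R2 between nested models are also used.
    """
    return r_squared / (1.0 - r_squared)


def effect_size_label(f2):
    """Label an effect size using the cut-offs quoted in the text."""
    if f2 > 0.35:
        return "large"
    if f2 > 0.15:
        return "moderate"
    return "low"


# Hypothetical R2 value, chosen only to illustrate the calculation.
example_f2 = cohens_f2(0.5)
example_label = effect_size_label(example_f2)
```

With these cut-offs, effect sizes of 0.41, 0.37 and 0.63 would be labelled large and 0.12 would be labelled low.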

Effect of ceiling and floor effects at baseline on detection of change

We used boxplots to study whether ceiling and floor effects may prevent the detection of change for patients in each of the baseline quintiles of the ICF Generic 6 Set raw score in comparison with the interval ICF Generic 6 Set score.35 In addition, for both the ICF Generic 6 Set raw score and the interval ICF Generic 6 Set score, an F-test (based on ANOVA) followed, when significant, by Tukey’s Honestly Significant Difference (HSD) post hoc tests was used to determine whether average change in functioning differed significantly across groups of patients corresponding to different quintiles of the ICF Generic 6 Set baseline score.36 To test how strongly the transformation of raw scores into interval scores was linked with the presence of ceiling and floor effects on detecting change, eta-squared measures, η², were calculated (as an indicator of the association between the total variability in the change of functioning and patients’ membership of the different baseline quintiles of the ICF Generic 6 Set raw score).37
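The eta-squared measure can be illustrated with a few lines of Python. The sketch below uses the usual one-way ANOVA definition (between-group sum of squares divided by total sum of squares); the function name and the change scores are made up for illustration and do not come from the study data.

```python
def eta_squared(groups):
    """Eta-squared for a one-way layout.

    groups: list of lists of change scores, one list per baseline quintile.
    Returns the between-group sum of squares divided by the total sum of squares.
    """
    all_values = [value for group in groups for value in group]
    grand_mean = sum(all_values) / len(all_values)
    ss_total = sum((value - grand_mean) ** 2 for value in all_values)
    ss_between = sum(
        len(group) * ((sum(group) / len(group)) - grand_mean) ** 2
        for group in groups
    )
    return ss_between / ss_total


# Toy change scores: group means differ, so group membership explains
# all of the variability in change.
fully_explained = eta_squared([[1.0, 1.0], [3.0, 3.0]])

# Toy change scores: group means are identical, so it explains none.
not_explained = eta_squared([[1.0, 3.0], [1.0, 3.0]])
```

A value near 0 means that change does not depend on the baseline quintile, which is the pattern expected when ceiling and floor effects are absent.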

Results

Baseline characteristics of study participants

Table 1 shows descriptive characteristics of the 4510 adult patients considered in this analysis after excluding children, 308 adults with no defined medical diagnosis and 58 adults with incomplete data at admission and study endpoint. Of the 4510 adult patients, more than half were male and the mean age was about 58 years. While musculoskeletal and neurological conditions were the most common diagnoses, cancer was the least common. A total of 915 patients underwent surgery during inpatient treatment (510 from the musculoskeletal, 54 from the cancer, 217 from the cardiovascular, 13 from the respiratory and 83 from the neurological health condition group, and 38 from the group comprising other health conditions).

Table 1

Descriptive information on sample demographics, diagnostic groups, departments and provinces at admission (n=4510)

The sample of patients for each region was determined by the number and grade of hospitals. Table 2 shows the distribution of response options, mean and median for individual categories of the ICF Generic 6 Set at admission and study endpoint.

Table 2

Distribution of response options from 0 (no problem) to 10 (complete problem) and average and median item scores at admission and discharge

Rasch analysis

For both the development sample (Sample A) and the validation sample (Sample B), Body Functions items loaded positively on the first residual component from the PCA, while the Activities and Participation items loaded negatively. The ICF Generic 6 Set showed unidimensionality in both samples, as less than 5% of pairwise t-tests were statistically significant (Sample A: 4.23% (2.75–5.72); Sample B: 3.61% (2.24–4.97)). The local independence assumption was not met in either sample. According to the critical value of 0.12 for Yen’s Q3*, the following pairs of items showed local dependence in both samples A and B: Energy and drive functions (b130) and Emotional functions (b152); Emotional functions (b152) and Sensation of pain (b280); Carrying out daily routine (d230) and Walking (d450); Walking (d450) and Moving around (d455). Moreover, for both samples A and B, all items showed DIF for health condition group and the item Sensation of pain (b280) showed DIF for time of assessment. None of the items showed DIF by gender or age group. For both samples, two testlets were created: a Body Functions testlet comprising Energy and drive functions (b130), Emotional functions (b152) and Sensation of pain (b280), and an Activities and Participation testlet comprising Carrying out daily routine (d230), Walking (d450) and Moving around (d455). When the testlet design was examined for DIF, the Body Functions testlet showed DIF for time of assessment and the Activities and Participation testlet for health condition group. For the Activities and Participation testlet, the item characteristic curves of the musculoskeletal and neurological disorders group were parallel with the item characteristic curves of the respiratory and cardiovascular disorders group and of the cancer and other disorders group.
Therefore, the DIF for the health condition groups was accommodated by splitting this testlet into three specific items for the musculoskeletal and neurological disorders group, the respiratory and cardiovascular disorders group and the cancer and other disorders group. Splitting the Activities and Participation testlet improved item fit, but the DIF for the Body Functions testlet for time of assessment was still present. We, however, did not adjust this testlet for time of assessment DIF since this DIF was found to be inconsistent, with only small differences (below 0.5 logits) in the testlet’s mean locations between the times of assessment across all class intervals. Item locations and fit statistics and the targeting of the scale are shown in table 3. According to the PSI values, the reliability of the scale was just below 0.7 for both samples A and B (table 3).

Table 3

Individual item locations and fit statistics, including targeting, unidimensionality, reliability, local dependency and DIF for both samples A and B for final solution

Figure 1 illustrates the targeting of patients included in sample A and sample B as well as of the overall sample of 4510 patients at both admission and study endpoint. The functioning abilities for the overall sample were estimated using the item difficulties from the validation sample. When comparing the distribution of item thresholds with the persons’ abilities, ICF Generic 6 Set items did not discriminate well between persons with a very low level of difficulties.

Figure 1

Histogram of people’s functioning (grey columns) and item thresholds (small vertical black lines) for both samples A and B and for overall sample at admission and study endpoint.

After fit to the Rasch model was achieved for the ICF Generic 6 Set, logit-scores were transformed into a user-friendly scale ranging from 0 (no problem) to 100 (complete problem) and a transformation table for total raw scores into an interval scale for use in parametric analyses was created (table 4).

Table 4

Transformation of the ICF Generic 6 Set raw scores into interval ICF Generic 6 Set scores ranging from 0 (no problem) to 100 (complete problem) by health condition groups

Sensitivity to change

Patients from all health condition groups apart from cancer showed improvement from admission to study endpoint (figure 2). Across all health conditions, patients improved by 7.86 points of the Rasch transformed overall score (95% CI 7.53 to 8.19). Effect size in terms of Cohen’s f² was 0.41 (large). Average improvement for the musculoskeletal and neurological health condition group was 6.75 (95% CI 5.89 to 7.61) with a Cohen’s f² of 0.37 (large) and 10.88 for cardiovascular and pulmonary diagnoses (95% CI 9.15 to 12.62) with a Cohen’s f² of 0.63 (large). With 4.19 points (95% CI 2.51 to 5.88), the cancer and other health conditions group showed the smallest improvement over time, which was also reflected in the lowest standardised effect size of 0.12 (low).

Figure 2

Distribution of interval-scale ICF Generic 6 Set scores (0–100 scale) at admission and study endpoint for each health condition group. The central rectangle spans the first quartile to the third quartile, with the central segment showing the median and the whiskers above and below the box extend until each reaches no more than 1.5 times the height of the box (third quartile–first quartile). The points above and below the end of whiskers are the outliers. The white points indicate the mean.

Effect of ceiling and floor effects at baseline on detection of change

Both the box-plots and non-significant Tukey’s HSD test for the interval ICF Generic 6 Set score showed that a floor effect present in detecting change when using the ICF Generic 6 Set raw score was cancelled when using the interval score for patients with musculoskeletal and neurological health conditions (figure 3) as well as respiratory and cardiovascular health conditions. For patients with cancer and other health conditions, a floor effect was present in both raw and interval scales (figure 3 and significant differences between patients corresponding to the first baseline quintile of the ICF Generic 6 Set raw score and patients corresponding to all other baseline quintile groups). For all health conditions groups, there was a significant difference between patients corresponding to the fifth baseline quintile of the ICF Generic 6 Set raw score and patients corresponding to the other baseline quintile groups, with a larger decrease of scores for the fifth quintile. However, the transformation of raw scores into the interval score reduced this ceiling effect as indicated by the respective eta-squared measures (musculoskeletal and neurological health conditions: η²—raw scores=0.20, η²—interval scores=0.08; respiratory and cardiovascular health conditions: η²—raw scores=0.40, η²—interval scores=0.02; cancer and other health conditions: η²—raw scores=0.23, η²—interval scores=0.04).

Figure 3

Distribution of ICF Generic 6 Set raw scores (0–60 scale; upper part of the figure) and interval-scale ICF Generic 6 Set scores (0–100 scale; lower part of the figure) stratified at baseline quintiles (Q1=the best initial functioning, Q5=the worst initial functioning) for each health condition group. The central rectangle spans the first quartile to the third quartile, with the central segment showing the median and the whiskers above and below the box extend until each reaches no more than 1.5 times the height of the box (third quartile–first quartile). The points above and below the end of whiskers are the outliers. The white points indicate the mean.

Discussion

This nationwide validation study demonstrated that the ICF Generic 6 Set in combination with an 11-point numeric rating scale can be used for the assessment of functioning in routine clinical practice and across a variety of hospital departments and health conditions. After accounting for local dependence of items by creating a Body Functions testlet and an Activities and Participation testlet, and for DIF across health condition groups, unidimensional interval scores for three different health condition groups could be established for the ICF Generic 6 Set. The interval ICF Generic 6 Set score was sensitive to change with large standardised effect sizes (with the exception of the cancer and other health conditions group). We could also show that ceiling and floor effects in the detection of change were reduced or cancelled when transforming the raw scores into Rasch-based interval scores.

In the application of the PCM, the assumptions of unidimensionality, local independence and absence of DIF were tested. Our results confirmed previous findings from our pilot study.17 Irrespective of the type of rating scale used for the ICF Generic 6 Set, local dependency among Body Functions items and Activities and Participation items was present. The dependent sets of items identified based on the critical value for the Yen’s Q3* statistic are also content-dependent items when following ICF. Thus, the fit statistics from the PCM were better than those of the individual items.38 In both studies, DIF for health condition groups was found for all items of the ICF Generic 6 Set. After items were combined into two testlets, the DIF for health condition groups disappeared at the level of the Body Functions testlet. This could be explained by the heterogeneity of functioning of individual items nested within the testlet. Moreover, due to the existence of the testlet effect, the DIF for time of assessment was amplified for this testlet.39 The DIF between health condition groups reflects the complexity of each health condition. This did not cause a major problem as we could adjust for it statistically by providing different transformation tables for three health condition groups.

With respect to the distribution of persons and items along the continuum of functioning of the ICF Generic 6 Set, the items did not completely match the patients’ expected abilities at the lower end of the continuum. This finding is also reflected in the PSI values for both samples A and B. Although we used a heterogeneous sample in this study, the reliability was slightly better than in the pilot study.17 Further research is needed as to whether this result is due to the use of the 11-point numeric rating scale instead of the ICF qualifiers.

While in the pilot study we split the Body Functions testlet by time of assessment, in this study we ignored this issue since the testlet showed good fit. In contrast to the pilot study, Sensation of pain (b280) fitted the Rasch model better. The DIF for time of assessment groups could be neglected as the testlet that included this item showed an overall good fit. However, Sensation of pain was clinician administered, and further research is necessary, also in other countries, to clarify its fit to the metric of functioning.17 Patients’ self-reported pain could be used instead, since pain is subjective. While pain is listed as a body function in WHO’s ICF and ICD-11, it may be debated whether it is a function or a symptom.40 Furthermore, there is a difference between the actual sensation of pain and cases where this sensation is impaired, that is, where patients are not able to feel pain in certain body parts in spite of tissue damage. Moreover, there are the issues of neuropathic pain and phantom pain. Future research is needed on how this category is actually understood by raters as well as by patients in different situations.

Remunerative employment (d850) was not included in the Rasch model since the category was difficult to assess in the clinical setting. However, since previous research claimed that this ICF category is relevant to community follow-up, Remunerative employment (d850) should be assessed and reported alongside the interval ICF Generic 6 Set score.41

As in our pilot study, the interval ICF Generic 6 Set score was sensitive to improvement of patients’ functioning during inpatient treatment. With regard to the new COSMIN guidelines for testing sensitivity to change (responsiveness) of a measure, we could not assess the correlation between the change in the interval ICF Generic 6 Set score and the change in another functioning measure.42 In line with the results of the pilot study, we would expect moderate to large treatment effects. In contrast to the other health condition groups, the standardised effect size was, however, small for cancer and other unclassified diagnoses. This may be due to progression of disease counteracting treatment effects. In addition, the heterogeneity of this group may have posed an issue. Furthermore, beyond standardised effect sizes, minimal clinically important differences in scores need to be determined in future research. In contrast to the untransformed raw scores of the ICF Generic 6 Set, the Rasch-transformed interval score could largely reduce or eliminate ceiling or floor effects in identifying change for patients with very low or very high baseline scores. This finding is promising in that it shows a wide applicability of the interval-scaled ICF Generic 6 Set across the patient spectrum.
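As a hedged illustration of how sensitivity to change can be quantified, the Python sketch below computes two common indices from paired admission and discharge scores: a standardised effect size (mean change divided by the SD at admission) and the standardised response mean (mean change divided by the SD of change). Which exact formula was used in the study is not stated in this section, so both definitions, the function names and the data are assumptions for illustration only.

```python
from statistics import mean, stdev


def change_scores(admission, discharge):
    """Improvement per patient; higher scores mean more problems, so admission minus discharge."""
    return [a - d for a, d in zip(admission, discharge)]


def standardised_effect_size(admission, discharge):
    """Mean change divided by the SD of admission scores."""
    return mean(change_scores(admission, discharge)) / stdev(admission)


def standardised_response_mean(admission, discharge):
    """Mean change divided by the SD of the change scores."""
    changes = change_scores(admission, discharge)
    return mean(changes) / stdev(changes)


# Hypothetical interval scores (0 to 100) for five patients.
admission = [60, 55, 70, 65, 50]
discharge = [45, 50, 50, 55, 40]

es = standardised_effect_size(admission, discharge)
srm = standardised_response_mean(admission, discharge)
print(round(es, 2), round(srm, 2))
```

Both indices are unit free, which is what allows responsiveness to be compared across health condition groups.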

We note several limitations to our study. First, most of the patients had neurological and musculoskeletal conditions, which may limit the generalisability to other diagnostic groups such as cancer. We, however, accounted for this in determining samples for the Rasch analysis by equal representation of all health conditions, genders and age groups. Second, patients were from grade III and II hospitals that were recruited at a nursing conference or through authors’ personal networks. We thus cannot exclude selection bias further limiting generalisability of our study. It should, however, be noted that there are usually no more than 2–3 grade III hospitals per province so that our sample should be at least fairly representative for the 20 included provinces. Third, the assessment of functioning at admission and discharge by the same nurse might affect applicability of our results to clinical practice. We, however, collected additional data on 703 patients with functioning rated by two independent nurses at each time point for investigating interrater reliability. A follow-up study will report the results of the respective analysis. Fourth, although most floor effects in the detection of change were no longer present after transformation of the ordinal raw score to an interval score based on Rasch-abilities, reduced floor effects remained for the cancer and other health conditions groups. Clinical studies using the ICF Generic 6 Set as an outcome measure could deal with this problem by employing Tobit models.43 Fifth, although it is possible to assess how a person manages daily routine along a wide range of activities of daily living (ADL) in hospital environments, these environments differ from those which patients face when discharged to their homes and communities. Performance in managing daily routine (d230), but also walking (d450) or moving around (d455) once discharged to their homes and communities can thus only be inferred from what patients are able to do in the hospital. How good this inference is must be evaluated in future studies examining the use of the ICF Generic 6 Set in community follow-up.
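The Tobit approach mentioned under the fourth limitation above can be sketched in a few lines. The Python example below is a minimal illustration, assuming an intercept-only model with left-censoring at a known floor and a coarse grid search instead of a proper optimiser. The scores are invented and the function names are our own; a real analysis would include covariates and use an established statistics package.

```python
import math


def normal_logpdf(z):
    """Log density of the standard normal distribution."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def normal_logcdf(z):
    """Log of the standard normal CDF; erfc keeps precision in the lower tail."""
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))


def tobit_loglik(values, mu, sigma, floor):
    """Log-likelihood of an intercept-only Tobit model, left-censored at `floor`."""
    total = 0.0
    for y in values:
        if y <= floor:
            # Censored: only know the latent value lies at or below the floor.
            total += normal_logcdf((floor - mu) / sigma)
        else:
            # Observed: ordinary normal density contribution.
            total += normal_logpdf((y - mu) / sigma) - math.log(sigma)
    return total


def fit_tobit_grid(values, floor):
    """Coarse grid search for (mu, sigma); for illustration only."""
    best = None
    for i in range(-50, 51):
        for j in range(5, 61):
            mu, sigma = i / 10.0, j / 10.0
            loglik = tobit_loglik(values, mu, sigma, floor)
            if best is None or loglik > best[0]:
                best = (loglik, mu, sigma)
    return best[1], best[2]


# Hypothetical scores with three patients sitting at the floor of 0.
scores = [0, 0, 0, 1, 2, 3, 4, 5]
naive_mean = sum(scores) / len(scores)
mu_hat, sigma_hat = fit_tobit_grid(scores, floor=0)
print(naive_mean, mu_hat, sigma_hat)
```

The estimated latent mean falls below the naive mean because the model treats scores at the floor as censored rather than as true zeros.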

This study marks the first attempt to apply the ICF nationwide and generate a reliable interval score of functioning. Further research studying the interrater reliability, convergent validity, known-groups validity and predictive validity of the ICF Generic 6 Set is underway. In line with the current efforts to validate the ICF Core Sets across the six WHO regions, similar attempts to apply the ICF Generic Set are needed, and have in fact been initiated, in other countries. For instance, under the leadership of the European Union of Medical Specialists, Board of Physical and Rehabilitation Medicine, an initiative was launched towards developing ICF-based clinical data collection tools following the approach described in this study.44 45 Collaborations across countries will allow the development of a universal scoring algorithm of functioning, which will ultimately allow comparison of functioning outcomes across health conditions and clinics, as shown in this study, and across countries.

Conclusion

In conclusion, the ICF Generic 6 Set in combination with an 11-point numeric rating scale can be used for creating an interval score of functioning that is sensitive to change in clinical practice and across a wide range of health conditions. We recommend the use of the ICF Generic 6 Set on an 11-point numeric rating scale in clinical practice and research within Mainland China. However, the reliability of the ICF Generic 6 Set in terms of PSI was only moderate. Our findings also revealed that some items, for example, Sensation of pain (b280), require specific attention. Based on the evidence gained in this study, future studies are needed to test the ICF Generic Set as a standard for the reporting of functioning information in different healthcare systems and countries.

Acknowledgments

The authors of this study would like to acknowledge the support of the National Health and Family Planning Commission of the People’s Republic of China, the Chinese Association of Rehabilitation Medicine and the Rehabilitation Nursing Committee of the Guangdong Nursing Association. We also express our deep gratitude to all participating hospitals and clinical raters. The participating hospitals were: Affiliated Hospital of Guangdong Medical College, Guangzhou Panyu Central Hospital, Zhujiang Hospital, First Affiliated Hospital of Guangzhou Traditional Chinese Medical University, Guangzhou Huiai Hospital (Guangzhou Brain Hospital), People’s Hospital of Baoan Shenzhen, People’s Hospital of New District Longhua Shenzhen, Shenzhen Baoan Hospital of Traditional Chinese Medicine, Shantou Central Hospital, Haojiang Hospital (First Affiliated Hospital of Shantou Medical College), Beijiao Hospital of Southern Medical University, People’s Hospital of Nanhai District of Foshan, Guangdong Tongjiang Hospital, Third People’s Hospital of Huizhou, Zhaoqing Gaoyao People’s Hospital, Dongguan Kanghua Hospital, Fuzhou General Hospital of Nanjing Military, West China Hospital of Sichuan University, Rehabilitation Hospital of Sichuan Province, Nanchong Central Hospital, Sichuan Provincial People’s Hospital, First Affiliated Hospital of Shanxi Medical University, First Affiliated Hospital of the Fourth Military Medical University, Jiangsu Provincial Hospital, Jiangsu Provincial Hospital of Traditional Chinese Medicine, Qingdao Municipal Hospital, Second Affiliated Hospital of Shandong University of Traditional Chinese Medicine, The Affiliated Hospital of Qingdao University, Hai Nan General Hospital, Affiliated Hospital of Hainan Medical University, People’s Hospital of Fuyang, First People’s Hospital of Hefei City (Third Affiliated Hospital of Anhui Medical University), Second Affiliated Hospital of Wenzhou Medical College, General Hospital of Ningxia Medical University, Cardiovascular Disease Hospital of Ningxia Medical University, Jilin University Sino-Japanese Friendship Hospital, People’s Hospital of Xinjiang Uygur Autonomous Region, First Affiliated Hospital of Chongqing Medical University, Second Affiliated Hospital of Chongqing Medical University, Guizhou Provincial People’s Hospital, Affiliated Hospital of Guiyang Medical College, People’s Hospital of Xinyu, Rehabilitation Hospital of Heilongjiang Provincial Seafarers General Hospital, First Affiliated Hospital of Jiamusi University, Third Affiliated Hospital of Jiamusi University, Second Affiliated Hospital of Kunming Medical University, General Hospital of the Yangtze River Shipping (Wuhan Brain Hospital), Affiliated Hospital of Qinghai University, Elderly Hospital of Shanghai Jingan, Shanghai Ledu Hospital.

Footnotes

  • Contributors CE contributed to the conception and design of the study, performed and interpreted all the analyses of the data and drafted the article. JDR, GS and BP contributed to conception of the design of the study and interpretation of the data and provided supervision. WC, XZ, SL, SL and JL contributed to the acquisition of the data, conception and the design of the study. All authors contributed to critically revising the paper for important intellectual content and all authors read and approved the final article.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests None declared.

  • Patient consent Obtained.

  • Ethics approval This study was approved by the Chinese Association of Rehabilitation Medicine and was exempt from hospital ethics approval since it involved non-invasive, clinician-based assessment of patients based on routinely collected clinical data.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement The materials and datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.
