

Study on the development of an infectious disease-specific health literacy scale in the Chinese population
  1. Xiangyang Tian1,
  2. Zeqing Di2,
  3. Yulan Cheng1,
  4. Xuefeng Ren1,
  5. Yan Chai1,
  6. Fan Ding3,
  7. Jibin Chen2,
  8. Jodi L Southerland4,
  9. Zengwei Cui2,
  10. Xiuqiong Hu1,
  11. Jingdong Xu5,
  12. Shuiyang Xu6,
  13. Guohong Qian7,
  14. Liang Wang8
  1. Chinese Center for Health Education, Beijing, China
  2. Chinese Association of Preventive Medicine, Beijing, China
  3. Office for Public Health Hazard Response Public Health Emergency Center, Chinese Center for Disease Control and Prevention (CDC), Beijing, China
  4. Department of Community and Behavioral Health, College of Public Health, East Tennessee State University, Johnson City, Tennessee, USA
  5. Health Education Institute, Hubei Provincial CDC, Wuhan City, Hubei Province, China
  6. Health Education Institute, Zhejiang Provincial CDC, Hangzhou, Zhejiang Province, China
  7. Gansu Provincial Center for Health Education, Lanzhou, Gansu Province, China
  8. Department of Biostatistics and Epidemiology, College of Public Health, East Tennessee State University, Johnson City, Tennessee, USA
Correspondence to Dr Liang Wang; WANGL2{at}


Objectives To develop a scale to assess infectious disease-specific health literacy (IDSHL) in China and test its initial psychometric properties.

Methods Item pooling, reduction and assessment of psychometric properties were conducted. The scale was divided into 2 subscales; subscale 1 assessed an individual's skills to prevent/treat infectious diseases and subscale 2 assessed cognitive ability. In 2014, 9000 people aged 15–69 years were randomly sampled from 3 provinces and asked to complete the IDSHL questionnaire. Cronbach's α was calculated to assess reliability. Exploratory factor analysis, t-test, correlations, receiver operating characteristic (ROC) curve and logistic regression were used to examine validity.

Results Each of the 22 items in subscale 1 had a content validity index >0.8. In total, 8858 people completed the scale. The principal components factor analysis suggested a 5-factor solution. All factor loadings were >0.40 (p<0.05). The IDSHL score was 22.07±7.91 (mean±SD; total score=38.62). Significant differences were observed across age (r=−0.276), sex (males: 21.65±8.03; females: 22.47±7.78), education (14.16±8.19 to 26.55±6.26), 2-week morbidity (present: 20.62±8.17, absent: 22.35±7.83; p<0.001) and health literacy of the highest and lowest 27% score groups (all p<0.05). The ROC curve indicated that 76.2% of respondents were adequate in IDSHL. Binary logistic regression analysis revealed 12 predictors of IDSHL adequacy (p<0.05). Among the 22 remaining items, Corrected Item-Total Correlation ranged from 0.316 to 0.504 and Cronbach's α values ranged from 0.754 to 0.810 if the items were deleted. The overall α value was 0.839 and the difficulty coefficient ranged from 1.19 to 4.08. For subscale 2, there were statistically significant differences between the mean scores of those with a correct/incorrect answer (all p<0.001).

Conclusions The newly developed 28-item scale provides an efficient, psychometrically sound and user-friendly measure of IDSHL in the Chinese population.


Strengths and limitations of this study

  • This study filled a gap in the literature by developing an infectious disease-specific health literacy (IDSHL) scale in China.

  • This study had a sufficient sample size to test and validate the scale, and to detect statistically significant differences in health literacy across sociodemographic categories.

  • The newly developed scale provides an efficient, psychometrically sound and user-friendly measure of IDSHL.

  • Data collected retrospectively might result in recall bias, whereas survey administration techniques may result in information bias.

  • Unique environmental conditions present during survey administration may have influenced survey responses and outcomes.


Health literacy is conceptualised as a mechanism through which individuals can exert control over their health and those factors associated with health outcomes, for example, health determinants.1,2 Health literacy is thus a skill set enabling individuals to comprehend health-related information,3 to make health-related decisions in the context of everyday life, and to maintain a healthy lifestyle.4–6 Hence, individuals who are health literate should: (1) possess necessary health awareness; (2) demonstrate basic health knowledge; (3) grasp necessary health skills; (4) be able to make reasonable decisions that benefit health; and (5) be proficient in reading, writing, numeracy and basic communication skills for acquiring, accessing and practising health information.7–9 Low literacy levels are associated with increased health risk behaviours, negative health outcomes and increased medical costs.3,10–14

It is argued that a disease-specific or context-specific health literacy tool may be more useful and relevant when it is applied to populations in need of managing a particular chronic illness or condition.15 Furthermore, health literacy surveys can provide health professionals with useful assessments of people's health education needs while also functioning as an effective evaluation tool for targeted disease-specific interventions. There is currently no measure to assess infectious disease-specific health literacy (IDSHL), though Sun et al16 developed a skills-based instrument for measuring the health literacy of respiratory infectious diseases.

Over the past two decades, nationwide health initiatives have led to significant reductions in infectious disease prevalence in China.17 Yet infectious disease prevalence remains high when compared with developed countries.18 Improving health literacy is one of China's top health priorities towards preventing the spread of infectious disease. Funded by the National Sci-Tech Plan Project (code number 2013BAI06B06), this paper reports on the development and psychometric properties of the IDSHL instrument that was used in China.


The following steps were employed to develop the IDSHL instrument: (1) defining and conceptualising the IDSHL; (2) domain and item development; (3) instrument construction; and (4) assessing the psychometric properties of the instrument in the target population.19

Conceptualisation and constructs of the IDSHL indicator framework

We focused on three core principles to guide conceptualisation of the IDSHL instrument: cognition, decision-making and self-efficacy to prevent or treat infectious diseases. We used these core principles to facilitate four focus groups among individuals living in Beijing (average education level: middle school) in order to gain a comprehensive understanding of the domains which should be included in the IDSHL. A conceptual model consisting of six domains was finally formed: five inter-related domains assessed one's skills to prevent/treat infectious diseases and the remaining domain assessed cognitive ability. Next, we developed a second-tier indicator framework to interpret each of the domains. An expert panel consisting of 10 people with expertise in infectious disease prevention and control assessed the face validity of the initial framework as well as relevance, appropriateness and accuracy of each indicator in the framework. The final version of the framework included only those indicators where 80% agreement was obtained among the panel.

During the second stage of instrument development, we conducted two rounds of the Delphi survey to elicit expert opinion regarding the specific indicators that should be included in the IDSHL measure.20 Twenty-three of the 30 invited health workers with expertise in infectious disease control in China participated in both rounds. At the conclusion of the second round, the expert panel reached consensus on the domains included in the two-tier IDSHL indicator framework (table 1).

Table 1

IDSHL indicator framework and domains

Selection of initial infectious disease-specific items and development of the scale

On the basis of the framework, we developed initial items (questions) to form the questionnaire. An item pool of 60 questions was subsequently developed by research staff and divided into two subscales. Subscale 1 consisted of 54 questions and assessed domains 1–5 with the purpose of measuring the necessary awareness, knowledge and skills of individuals to prevent or treat infectious diseases; subscale 2 (6 questions) assessed cognitive ability. A 10-person expert panel was organised from participants of the Delphi survey. Experts were required to rate each item on a five-point Likert scale21 ranging from 5 (most relevant to IDSHL) to 1 (least relevant), and were asked to assess the clarity and conciseness of the close-ended items by using ‘yes’ or ‘no’ responses on each item. The content validity index (CVI) of the measure was calculated for each category and item. A CVI value of >0.80 was set as the cut-point for acceptable validity.22 Eventually, 10 items were removed and the final questionnaire contained 50 items with 44 items in subscale 1 and 6 items in subscale 2.
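The item-level CVI screening described above can be sketched as follows. This is only an illustration: the source gives the five-point rating scale and the >0.80 cut-point, but not how a rating is counted as "relevant", so the threshold used here (a rating of 4 or 5), the function names and the example ratings are all assumptions.

```python
def item_cvi(ratings, relevant_from=4):
    """Share of experts rating the item at least `relevant_from`.

    The threshold of 4 is an assumption; the source does not state it.
    """
    if not ratings:
        raise ValueError("at least one expert rating is required")
    relevant = sum(1 for r in ratings if r >= relevant_from)
    return relevant / len(ratings)


def retain_items(ratings_by_item, cut_point=0.80):
    """Keep items whose CVI exceeds the cut-point (>0.80 in the text)."""
    return [item for item, ratings in ratings_by_item.items()
            if item_cvi(ratings) > cut_point]


# Hypothetical ratings from a 10-person panel (not data from the study).
example_ratings = {
    "Q1": [5, 5, 4, 4, 5, 4, 5, 5, 4, 3],  # 9 of 10 experts rate it 4 or 5
    "Q2": [5, 4, 3, 3, 4, 2, 5, 4, 3, 4],  # 6 of 10 experts rate it 4 or 5
}
print(retain_items(example_ratings))
```

Under these assumed ratings only Q1 clears the cut-point, which mirrors how items falling at or below 0.80 would be dropped from the pool.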

Population testing

To examine the utility of the IDSHL measure in China, 9000 residents were randomly sampled and asked to complete the questionnaire.


We used a three-stage stratified cluster sampling method to select study participants. First, we sampled three provinces (ie, Zhejiang, Hubei and Gansu) based on the socioeconomic development level (ie, competitive, average and distressed). From each sampled province, we then selected one city representing the ‘average’ socioeconomic development level. Second, we selected one urban district and one rural county from each of the three sampled cities. We then sampled two residential areas, two senior high schools, four hotels and four construction sites from each urban district; and from each county, we selected two villages and two senior high schools. Third, from each of the urban residential areas and rural villages, we employed a systematic random sampling technique to select 50 households from the household registration list. All family members of the sampled households aged 15–69 years were surveyed. From each of the sampled schools, 250 students were sampled using randomised cluster sampling methods. From each of the hotels and construction sites, 125 workers were sampled because of the relatively smaller staff size. Overall, 9000 respondents were eligible to participate in the survey.
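The household selection step can be illustrated with a minimal sketch of systematic random sampling, in which a random start is followed by a fixed sampling interval. The register of 500 households, the function name and the seed are hypothetical; the source states only that 50 households were drawn from each registration list.

```python
import random


def systematic_sample(register, n, rng=None):
    """Select n entries at a fixed interval after a random start."""
    if n <= 0 or n > len(register):
        raise ValueError("n must be between 1 and the register length")
    rng = rng or random.Random()
    step = len(register) / n        # sampling interval (may be fractional)
    start = rng.random() * step     # random start inside the first interval
    last = len(register) - 1
    # min() guards against a floating-point index just past the end.
    return [register[min(int(start + i * step), last)] for i in range(n)]


# Hypothetical household registration list of 500 entries.
households = list(range(1, 501))
selected = systematic_sample(households, 50, random.Random(0))
print(selected[:5])
```

With 500 registered households and 50 to select, the interval is 10, so every tenth household after the random start is chosen.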

Data collection

Written consent was obtained prior to survey administration. Trained research assistants (RAs) provided instructions to respondents who then completed the self-administered questionnaire. Among respondents who had low reading comprehension, RAs read the instructions and questions without offering any additional interpretation or explanation. Most respondents spent about 20–30 min completing the questionnaire. The field survey was completed in 2014. For the 44 questions in subscale 1, we performed item reduction, reliability and validity analysis, and for the 6 questions of subscale 2, we conducted independent t-tests based on the score value.

Subscale 1

Item reduction

The 44-item subscale 1 was carefully examined so as to create a parsimonious yet psychometrically sound scale. Items retained in the subscale were required to meet the following criteria: (1) internal consistency and reliability; (2) discriminative ability; and (3) theoretical relevance and congruence with infectious disease-specific context and practices.

Statistical analysis


Cronbach's α was used to assess the internal consistency and reliability of the composite measure, and Corrected Item-Total Correlation and Cronbach's α If Item Deleted were calculated. Items were selected for removal if they showed a relatively low item-total correlation (<0.30), Cronbach's α value (<0.75),23 or discriminative coefficient (<0.30), or a difficulty coefficient outside the range 1.05–10 (ie, <1.05 or >10).
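For readers unfamiliar with these reliability statistics, the sketch below computes Cronbach's α, the corrected item-total correlation and α if item deleted from a small response matrix. It uses the standard textbook formulas rather than the SPSS procedure used in the study, and the function names and the toy 0/1 responses are assumptions made for illustration.

```python
from statistics import mean, variance


def cronbach_alpha(rows):
    """rows: one list of item scores per respondent."""
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def item_diagnostics(rows):
    """Per item: (corrected item-total correlation, alpha if item deleted)."""
    results = []
    for j in range(len(rows[0])):
        item = [r[j] for r in rows]
        rest = [[v for i, v in enumerate(r) if i != j] for r in rows]
        rest_totals = [sum(r) for r in rest]  # total score without item j
        results.append((pearson(item, rest_totals), cronbach_alpha(rest)))
    return results


# Toy responses (1 = correct, 0 = incorrect); not data from the study.
toy_rows = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
alpha = cronbach_alpha(toy_rows)
diagnostics = item_diagnostics(toy_rows)
```

The correlation is "corrected" because each item is compared with the total of the other items only, so an item cannot inflate its own correlation.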

Construct validity

We performed exploratory factor analysis (EFA) with principal components factor analysis to determine the underlying factor structure of the questionnaire.24 The initial solution was rotated using varimax rotation and the scree plot was inspected. The number of factors was examined using the following criteria: (1) an eigenvalue >1, (2) scree plot characteristics and (3) interpretability. Specifically, items were removed when: (1) the item-factor loading was <0.40; (2) the loading(s) on each variable was (were) not significant; (3) the cross-loadings indicated relatively high loadings on more than one factor and (4) the item did not contribute to factor interpretability.

Discriminative validity

We defined difficulty coefficient (score) of the item as the reciprocal of the correct response rate (eg, if the correct response rate for a specific item was 20%, then the difficulty coefficient (score) would be 100/20=5). By using this scoring method, every respondent earned a cumulative score after completing the questionnaire. Then we compared the mean score of the top 27% to the lowest 27% of respondents to test the discriminative efficiency of each item in assessing the individual's health literacy. If p<0.05, the item was considered discriminatively efficient.
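The scoring scheme above can be sketched in a few lines. The difficulty coefficient follows the definition in the text (100 divided by the percentage answering correctly). The rule that a correct answer earns the item's difficulty coefficient is an assumption, although it is consistent with the reported maximum score being a sum of item coefficients. The function names and toy responses are hypothetical.

```python
def difficulty_coefficients(responses):
    """100 / (percentage of respondents answering correctly), per item."""
    n = len(responses)
    coefficients = []
    for j in range(len(responses[0])):
        pct_correct = 100 * sum(r[j] for r in responses) / n
        if pct_correct == 0:
            raise ValueError(f"item {j} was never answered correctly")
        coefficients.append(100 / pct_correct)
    return coefficients


def cumulative_scores(responses, coefficients):
    """Assumed rule: each correct answer earns that item's coefficient."""
    return [sum(c for c, answer in zip(coefficients, r) if answer)
            for r in responses]


def extreme_groups(scores, fraction=0.27):
    """Return the lowest and highest `fraction` of scores (27% in the text)."""
    ranked = sorted(scores)
    size = max(1, round(len(ranked) * fraction))
    return ranked[:size], ranked[-size:]


# Toy responses (1 = correct, 0 = incorrect); not data from the study.
toy_responses = [[1, 1], [0, 1], [0, 1], [0, 1], [0, 0]]
coeffs = difficulty_coefficients(toy_responses)
scores = cumulative_scores(toy_responses, coeffs)
low_group, high_group = extreme_groups(scores)
```

In the toy data the first item is answered correctly by 20% of respondents, so its coefficient is 100/20=5, matching the worked example in the text. The two extreme groups would then be compared with a t-test, as the study describes.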

We conducted receiver operating characteristic (ROC) analysis and used self-reported health status to determine the possible cut-off of the instrument.25 We assessed the performance of the instrument in classifying respondents as having adequate health literacy using the ROC curve. The cut-off point was identified, and its sensitivity and specificity were evaluated.
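A minimal sketch of how a cut-off might be read from ROC data is given below. The source does not say which rule was used to pick the cut-off, so maximising Youden's J (sensitivity + specificity − 1) is an assumption here, as are the coding of the reference variable (1 for the positive reference class), the function names and the toy data.

```python
def sensitivity_specificity(scores, reference, cut_off):
    """Classify score >= cut_off as positive; compare with reference (1/0)."""
    pairs = list(zip(scores, reference))
    tp = sum(1 for s, y in pairs if y == 1 and s >= cut_off)
    fn = sum(1 for s, y in pairs if y == 1 and s < cut_off)
    tn = sum(1 for s, y in pairs if y == 0 and s < cut_off)
    fp = sum(1 for s, y in pairs if y == 0 and s >= cut_off)
    return tp / (tp + fn), tn / (tn + fp)


def best_cut_off(scores, reference):
    """Try each observed score as a cut-off; keep the one with the largest J."""
    best = None
    for candidate in sorted(set(scores)):
        sens, spec = sensitivity_specificity(scores, reference, candidate)
        j = sens + spec - 1  # Youden's J (assumed selection rule)
        if best is None or j > best[1]:
            best = (candidate, j, sens, spec)
    return best


# Toy scores and reference labels; not data from the study.
toy_scores = [10, 12, 15, 18, 20, 25, 30, 35]
toy_reference = [0, 0, 1, 0, 1, 1, 1, 1]
cut, youden_j, sens, spec = best_cut_off(toy_scores, toy_reference)
```

Other selection rules, such as the point closest to the top-left corner of the ROC plot, can give a different cut-off from the same data.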

We performed preliminary assessments of discriminative validity by calculating the questionnaire's correlations with sociodemographic characteristics.26–29 Pearson's correlation coefficient was used to show the relationship between score and age, and an independent sample t-test was used to test the relationship between score and sex. A binary logistic regression was used to calculate the strength of association between health literacy score and sociodemographic characteristics. Health literacy was dichotomised as 1 (≥cut-off point of ROC, ie, 16.74) or 0 (<cut-off point of ROC, ie, 16.74) and all independent variables were categorised or dichotomised.
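The dichotomisation step can be shown with a short sketch. It applies the reported cut-off of 16.74 and then computes an unadjusted odds ratio from a 2×2 table, which equals the exponentiated coefficient of a logistic regression with a single binary predictor. It is not the multivariable model fitted in the study, and the function names and toy data are hypothetical.

```python
CUT_OFF = 16.74  # ROC-derived cut-off reported in the text


def dichotomise(scores, cut_off=CUT_OFF):
    """1 = adequate (score >= cut-off), 0 = inadequate."""
    return [1 if s >= cut_off else 0 for s in scores]


def odds_ratio(predictor, outcome):
    """Unadjusted odds ratio for two binary (1/0) variables."""
    pairs = list(zip(predictor, outcome))
    a = sum(1 for x, y in pairs if x == 1 and y == 1)
    b = sum(1 for x, y in pairs if x == 1 and y == 0)
    c = sum(1 for x, y in pairs if x == 0 and y == 1)
    d = sum(1 for x, y in pairs if x == 0 and y == 0)
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when b or c is zero")
    return (a * d) / (b * c)


# Toy scores and a toy binary predictor; not data from the study.
toy_scores = [10.0, 16.74, 22.1, 30.5, 12.3, 25.0, 14.0, 19.9]
toy_female = [0, 1, 1, 1, 1, 0, 0, 0]
adequate = dichotomise(toy_scores)
unadjusted_or = odds_ratio(toy_female, adequate)
```

An adjusted estimate, as reported in the study, would require fitting all predictors together in one logistic model.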

Subscale 2

To determine the discriminative efficiency of the reading comprehension materials, we conducted independent-samples t-tests (Levene's Test for Equality of Variances) to identify the score difference between those who correctly answered a reading comprehension question and those who incorrectly answered the question.
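The comparison described above can be sketched with the two usual forms of the independent-samples t statistic. In SPSS output, Levene's test indicates which form to read: the pooled-variance form when variances can be treated as equal, and Welch's form otherwise. The sketch computes only the statistics and degrees of freedom, not p values, and the function names and toy scores are assumptions.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Student's t and degrees of freedom (equal variances assumed)."""
    na, nb = len(group_a), len(group_b)
    sp2 = ((na - 1) * variance(group_a)
           + (nb - 1) * variance(group_b)) / (na + nb - 2)
    t = (mean(group_a) - mean(group_b)) / sqrt(sp2 * (1 / na + 1 / nb))
    return t, na + nb - 2


def welch_t(group_a, group_b):
    """Welch's t and approximate degrees of freedom (unequal variances)."""
    va = variance(group_a) / len(group_a)
    vb = variance(group_b) / len(group_b)
    t = (mean(group_a) - mean(group_b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(group_a) - 1)
                           + vb ** 2 / (len(group_b) - 1))
    return t, df


# Toy subscale 1 scores by answer to one reading comprehension question;
# not data from the study.
correct = [24, 26, 22, 28]
incorrect = [18, 20, 16, 22]
t_pooled, df_pooled = pooled_t(correct, incorrect)
t_welch, df_welch = welch_t(correct, incorrect)
```

When the two groups have equal sizes and equal variances, as in this toy example, both forms give the same statistic.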

All analyses were conducted using SPSS V.24.0 software package for Windows (SPSS Inc, Chicago, Illinois, USA). We defined statistical significance with an α of 0.05.


Sociodemographic characteristics and health status of the sample

The sociodemographic characteristics of the sampled respondents are shown in table 2. Of the 9000 respondents sampled in the study, 8858 (mean age 31.39, SD 14.76 years) completed the questionnaire. The majority of respondents were of Han ethnicity (96.8%); the sexes were roughly balanced (49.1% male) and about half of the respondents were married (51.3%). Only 3.9% of respondents were illiterate, whereas the majority had completed high school (48.9%) or a higher level of education (12.4%). One-third of respondents (33.7%) were students and nearly two-thirds (64.1%) of respondents rated their health as good/very good. The 2-week morbidity rate was 16.7%.

Table 2

Sociodemographic characteristics of the sampled respondents in China in 2014 (n=8858)

Reliability and validity of subscale 1

Initial reliability testing

Twenty-one items were removed from the 44-item subscale 1 due to the relatively low item-total correlation or Cronbach's α value.

Validity testing

Content validity

Each of the 23 remaining items had a high content validity (CVI>0.8) based on expert ratings.

Construct validity

The results of this last factor analysis with varimax rotation showed a Kaiser-Meyer-Olkin value of 0.923, indicating sample adequacy for EFA. Bartlett's test of sphericity was significant (p<0.001), indicating the appropriateness of the data for further factor analysis. The principal components factor analysis and the scree plot of the initial analysis using varimax rotation suggested a five-factor solution (table 3 and figure 1). Q20 was removed because its item-factor loading was <0.40 (0.397 on factor 3 and 0.391 on factor 4). Eventually, five factors with eigenvalues >1 were generated and 22 items were retained. The eigenvalues ranged from 1.003 to 5.343. These five factors explained 46.27% of the variance. All factor loadings were >0.40 (p<0.05). These results corresponded very closely with what was predicted with the conceptual framework (table 1). For example, the six items that loaded highest on factor 1 were associated with infectious disease-related knowledge and values (domain 1). The seven items loading highest on factor 2 were related to infectious disease prevention (domain 2). The four items loading highest on factor 3 were related to management or treatment of infectious diseases (domain 3). The four items loading highest on factor 4 were associated with identification of pathogens and infection sources (domain 4). Finally, the two items loading highest on factor 5 were associated with transmission of infectious diseases (domain 5). Further evaluation suggested that Q6 and Q28 were more closely related to domain 1; therefore, domain 5 was merged into domain 1. The final model consists of four domains (table 3).

Table 3

Rotated component matrix of principal components factor analysis

Figure 1

Scree plot of principal components factor analysis.

Discriminative validity

The maximum score on the 22-item subscale 1 was 38.62. The mean score value of the respondents was 22.07±7.91 (mean±SD; n=8728). The two-tailed Pearson's correlation between the age and score was r=−0.276 (p<0.01). The independent sample t-test (Levene's Test for Equality of Variances) showed a statistically significant difference in the mean score value between males (21.65±8.03) and females (22.47±7.78; F=5.632, p<0.0001). One-way analysis of variance results showed a statistically significant and positive association between the mean score on the subscale and education levels: illiterate (14.16±8.19), primary school (17.00±8.20), junior high school (19.64±7.91), senior high school (23.85±6.76), college/university diploma or higher (26.55±6.26; all p<0.01 between and within groups). Those experiencing morbidity in the past 2 weeks had a statistically significant lower mean score value (20.62±8.17) than those without 2-week morbidity (22.35±7.83; F=58.064, p<0.001). Each of the remaining items had a statistically significant difference in the score value between the top 27% and lowest 27% of respondents (p<0.05).

The area under the ROC curve for predicting adequate health literacy was 0.643 (95% CI 0.615 to 0.671, p<0.001). The curve for the instrument showed that scores ≥16.74 on the instrument had a sensitivity of 77.3% and a specificity of 45.1% for predicting adequate health literacy (figure 2). Of the respondents, 76.2% had adequate health literacy levels.

Figure 2

Receiver operating characteristic curve analysis by score and self-reported health status.

Table 4 shows the results of the binary logistic regression analysis. Higher IDSHL rates were found among females, elders and those with higher education levels. Business/service staff, office clerks, self-employed entrepreneurs, students and healthcare personnel were also found to have higher IDSHL rates. In addition, respondents who perceived their health as very good/good/normal and those who did not report morbidity over the past 2 weeks had higher health literacy. Increased time spent surfing on the internet per day and preferring to obtain health information on the internet were also associated with higher IDSHL. In contrast, off-farm workers were more likely to have inadequate IDSHL.

Table 4

Results of binary logistic analysis for potential risk factors of IDSHL

Final reliability testing

For the remaining 22 items, the Corrected Item-Total Correlation ranged from 0.316 to 0.504. Each of the Cronbach's α Values If Item Deleted was lower than 0.839 (0.754–0.810). The overall α value was 0.839 and the difficulty coefficient (score) ranged from 1.19 to 4.08 (table 5).

Table 5

Reliability and difficulty coefficient of subscale 1

Subscale 2: cognitive ability

Slightly more than half (58.4%) of the respondents were regular internet users. About two-fifths (39.8%) reported obtaining health information or knowledge via television, 30.3% through surfing the internet and 8.9% through healthcare professionals (table 2).

The independent sample t-test (Levene's Test for Equality of Variances) showed a statistically significant difference in the mean score value between those who correctly answered the four questions and those who incorrectly answered the questions. The mean score of those with a correct/incorrect answer was 24.67±6.82/20.17±8.12 (F=125.321, p<0.001), 25.05±6.56/20.95±8.09 (F=134.749, p<0.001), 23.30±7.23/17.07±8.59 (F=108.723, p<0.001) and 24.32±6.74/18.28±8.29 (F=202.181, p<0.001), respectively.


Infectious diseases are among the top 10 causes of death worldwide.30 Low health literacy is associated with poorer health outcomes including higher morbidity and mortality from infectious diseases.31 Although many tools have been developed to measure health literacy including disease-specific health literacy,32 research in IDSHL is lacking. The current study developed a 28-item IDSHL scale with high reliability and validity. To the best of our knowledge, this is the first study to develop an IDSHL tool and test its efficacy in a large population.

In this study, the results of the EFA indicated that the 22-item subscale 1 is a well-constructed and acceptable tool for measuring IDSHL. All items had loading values >0.40 and loaded on only one factor, suggesting that the underlying factors are meaningful. The eigenvalues of the five factors ranged from 1.003 to 5.343, and all components accounted for 46.27% of the total variance, indicating that the instrument is acceptable for capturing the attributes of IDSHL among sample respondents.

The correlations showed that the instrument had good discriminative validity. Respondents who were younger, female, had higher education, did not report morbidity in the past 2 weeks or those who were in the top 27% in cumulative score had significantly higher scores than did their counterparts. The results of binary logistic regression analysis verified, when controlling for other factors, that gender, profession, 2-week morbidity, internet use and methods for obtaining health information were associated with IDSHL. ROC analysis identified a cut-off point of 16.74 for the instrument. Thus, those with scores <16.74 may require help in increasing IDSHL. These results suggest that the questionnaire is an appropriate tool for examining IDSHL.

Because cognitive ability is an essential component of health literacy, we developed a separate six-item subscale (subscale 2) to assess it. The results demonstrated that the questions could efficiently discriminate between individuals with higher IDSHL and those with lower IDSHL.

The present instrument has important public health utility. We tested the instrument in a large sample in China. The results indicate that the difficulty level is acceptable. In particular, the scale is relatively easy to use and administer and can be completed in 20–30 min. This instrument can be used by healthcare professionals to screen patients who may be at risk for misinterpreting key health information. It can also be used as a population-level IDSHL assessment tool in public health promotion and prevention activities and research.

This study has some limitations. First, 2-week morbidity was collected retrospectively and might result in recall bias. Second, information bias may be present as a result of the survey administration techniques of the RAs. Third, the unique environmental conditions present during survey administration may have influenced survey responses and outcomes. However, this scale has been tested among a large population and demonstrated good psychometric properties; therefore, it is an acceptable tool to measure IDSHL in China.


This study developed and validated a 28-item IDSHL scale. This newly developed instrument provides an efficient, psychometrically sound and user-friendly measure of IDSHL in the Chinese population.


The authors gratefully acknowledge Changning Li, Director of the Chinese Center for Health Education, for his great support and coordination of this research project. They would also like to extend their sincere thanks to Dr Shichang Xia, Director of the Zhejiang Provincial CDC, Dr Qiaoyun Li, Deputy Director of the Hubei Provincial CDC and Dr Peijun Lu, Director of the Gansu Provincial Center for Health Education, for their kind help in the coordination of field research.




  • Contributors XT, ZD and YC designed the study. XT and XH performed the statistical analysis. XT, XH and LW drafted the manuscript. XR, YC, XH, FD, JC, ZC, JX, SX and GQ coordinated the field research and data collection with administrative, technical or material support. LW and JLS helped revise and edit the manuscript.

  • Funding This study was financially supported by the Chinese National Sci-Tech Plan Project (number 2013BAI06B06).

  • Competing interests None declared.

  • Ethics approval Chinese Center for Health Education.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement Additional data are available by emailing XT at
