
Original research
What should the standard be for passing and mastery on the Critical Thinking about Health Test? A consensus study
  1. Allen Nsangi1,
  2. Diana Aranza2,
  3. Roger Asimwe3,4,
  4. Susan Kyomuhendo Munaabi-Babigumira5,
  5. Judith Nantongo6,
  6. Lena Victoria Nordheim7,
  7. Robert Ochieng8,
  8. Cyril Oyuga9,
  9. Innocent Uwimana10,
  10. Astrid Dahlgren11,
  11. Andrew Oxman12
  1. Department of Medicine, Makerere University College of Health Sciences, Kampala, Uganda
  2. University Department for Health Studies, University of Split, Split, Croatia
  3. Lower Secondary School Section, Group Scolaire Nduba, Kigali, Rwanda
  4. Secondary School Teaching, Ministry of Education, Kigali, Rwanda
  5. Department of Global Health, Norwegian Institute of Public Health, Oslo, Norway
  6. Biology Department, Baptist High School, Kitebi, Uganda
  7. Department of Health and Functioning, Faculty of Health and Social Sciences, Western Norway University of Applied Sciences, Bergen, Norway
  8. Lower Secondary Section, Kibos Secondary School, Kondele, Kenya
  9. Research and Knowledge Management Department, Kenya Institute of Curriculum Development, Nairobi, Kenya
  10. Basic Education, Rwanda Education Board, Kigali, Rwanda
  11. Faculty of Health Sciences, Oslo Metropolitan University, Oslo, Norway
  12. Centre for Informed Health Choices, Norwegian Institute of Public Health, Oslo, Norway
  Correspondence to Dr Andrew Oxman; oxman@online.no

Abstract

Objective Most health literacy measures rely on subjective self-assessment. The Critical Thinking about Health Test is an objective measure that includes two multiple-choice questions (MCQs) for each of the nine Informed Health Choices Key Concepts included in the educational resources for secondary schools. The objective of this study was to determine cut-off scores for passing (the border between having and not having a basic understanding and the ability to apply the nine concepts) and mastery (the border between having mastered and not having mastered them).

Design Using a combination of two widely used methods, Angoff’s and Nedelsky’s, a panel judged the likelihood that an individual on the border of passing, and another on the border of having mastered the concepts, would answer each MCQ correctly. The cut-off scores were determined by summing the probabilities of answering each MCQ correctly. The judges’ independent assessments were summarised and discussed, and a nominal group technique was used to reach a consensus.

Setting The study was conducted in secondary schools in East Africa.

Participants The panel included eight individuals with 5 or more years’ experience in the following areas: evaluation of critical thinking interventions, curriculum development, lower secondary school teaching and evidence-informed decision-making.

Results The panel agreed that for a passing score, students had to answer 9 of the 18 questions correctly, and for a mastery score, 14 of the 18 questions.

Conclusion There was wide variation in the judgements made by individual panel members for many of the questions, but they quickly reached a consensus on the cut-off scores after discussions.

  • medical education & training
  • community child health
  • public health
  • education & training
  • health services administration & management


This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.


STRENGTHS AND LIMITATIONS OF THIS STUDY

  • The cut-off scores were determined using a combination of robust methods.

  • The judging panel had content expertise and familiarity with the context.

  • The panel included eight people, a number on the lower end of the spectrum recommended for both methods used.

Introduction

Critical thinking is one of the competencies most often included in education systems around the world.1–3 However, there is little agreement on its definition4 or on how it should be taught and evaluated.5

If health literacy is the ability to access, understand, appraise and apply health information,6 then critical health literacy is potentially a higher-order thinking process (critical thinking) that could be developed through education to critically appraise information relevant to health.7

Within the educational sector, critical thinking focuses on dispositions and abilities that help people to decide what to do or what to believe. While critical thinking and health are widely included in primary and secondary school curricula, critical health literacy or critical thinking about health is not.8–10

Individuals with higher levels of health literacy are more likely to make healthy choices in life. Poor health literacy has been found to be a barrier to accessing basic health services such as screening,8 and is associated with lower uptake of preventive actions such as vaccination and insufficient understanding of the role of antibiotics.11

People with higher health literacy levels make better decisions about their health, are more capable of adhering to treatments and make more efficient use of resources.9

Health literacy assessment is recognised as an important consideration in delivering appropriately tailored effective healthcare and achieving better health outcomes.12 However, health literacy assessment tools continue to primarily focus on individuals and are slow in shifting from a medical perspective towards a societal one.13

The most frequently used tools reported in the literature are the Rapid Estimate of Adult Literacy in Medicine-Short Form, which tests reading ability through word recognition and pronunciation14; the Test of Functional Health Literacy in Adults, which requires patients to read and complete missing sections of selected passages of information to measure reading comprehension, as well as to read and apply the information on prescription labels and appointment slips to assess numeracy15; and the Newest Vital Sign, a quick assessment of reading comprehension and numeracy, requiring patients to read an ice cream nutritional label, then answer six problem-solving questions.16 17

All of these health literacy assessment tools, and other instruments used with children and adolescents,18 focus on functional literacy and do not assess critical health literacy, particularly people’s ability to appraise health information. Health literacy tools that do include measures of critical health literacy, such as the European Health Literacy Survey Questionnaire, tend to rely on subjective self-assessment, which does not correlate with cognitive skills, rather than on objective performance, which does.18

The Informed Health Choices (IHC) project has developed learning resources based on a framework of concepts that people should understand and apply to assess healthcare claims and make informed health choices.19 We initially developed resources for primary school children (10–12 years old). Those resources included a textbook, a workbook, a teachers’ guide, a set of cards for one of the lessons and a classroom poster. They were found to be effective when evaluated in a cluster randomised trial in Uganda.20 Those resources addressed 12 of the IHC Key Concepts.21 22

Building on this body of work and context analyses in Kenya, Rwanda and Uganda,23 we have developed digital resources for lower secondary school students (ages 14–16 years) in East Africa. Those resources, which address nine prioritised Key Concepts (table 1),23 are being evaluated in cluster randomised trials.24–26

Table 1

Key Concepts included as learning goals in the IHC lower secondary school learning resources

The primary outcome measure for the trials, an objective measure of critical health literacy, is a test with multiple-choice questions (MCQs) from the Claim Evaluation Tools item bank. The item bank contains MCQs that can be used to measure an individual’s ability to apply each of the 49 IHC Key Concepts.27 The MCQs can be used to assess learners’ abilities, evaluate the effectiveness of interventions or map people’s abilities.

The ‘Critical Thinking about Health (CTH) Test’ includes two MCQs for each of the nine Key Concepts addressed by the IHC lower secondary school resources. The primary outcome for the trials is the proportion of students who have a passing score on the CTH Test. Determining the proportion of students who pass requires determining a cut-off score, above which learners pass. In this context, a passing score indicates that learners:

  • Have a basic understanding of the concepts and how to apply them.

  • Do not need to repeat lessons or receive some other additional or alternative instruction.

  • Are ready to go on to subsequent lessons that reinforce learning of the same concepts and introduce new concepts.

Setting a standard is essential to ensure that the test results will be meaningful, interpretable and defensible.28 There is currently no relevant empirical literature on setting a standard for the CTH Test. Interpreting average differences in scores for a test or other continuous (or count) outcome measures is challenging.29 It requires a basis for judging the importance of an average difference. For instance, when comparing two groups of learners, a small average difference in test scores might be due to most students doing a little better or to a few students doing a lot better. The difference in the proportion of learners who achieve a passing score is more meaningful and easier to interpret than an average difference in test scores. However, a major statistical drawback of dichotomising this continuous variable is the loss of descriptive information about the performance of the study population. For example, the nature and extent of differences between individuals with poorer performance are lost when scores are dichotomised as having or not having achieved a passing or mastery score.30
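This interpretability point can be illustrated with a small, hypothetical example (the scores, group sizes and the cut-off of 9 below are invented for illustration; only the 18-item test length comes from this study):

```python
# Hypothetical illustration: two groups can show the same average
# improvement over a control group for different reasons -- most
# students doing a little better, or a few students doing much better --
# while the proportion passing a cut-off distinguishes the two cases.

control       = [8, 8, 9, 9, 10, 10]    # made-up scores on an 18-item test
most_a_little = [9, 9, 10, 10, 11, 11]  # every student scores 1 point higher
few_a_lot     = [8, 8, 9, 9, 13, 13]    # two students score 3 points higher

def mean(scores):
    return sum(scores) / len(scores)

def pass_rate(scores, cutoff=9):        # hypothetical passing cut-off
    return sum(s >= cutoff for s in scores) / len(scores)

print(mean(most_a_little) - mean(control))  # 1.0
print(mean(few_a_lot) - mean(control))      # 1.0 -> same mean difference
print(pass_rate(most_a_little))             # 1.0
print(pass_rate(few_a_lot))                 # 0.67 -> pass rates differ
```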

Objectives

The objectives of this study were to determine cut-off scores for passing (having at least a borderline ability to apply the concepts) and mastery (having mastered the concepts) for the secondary school resources.

Methods

We applied a modification of Nedelsky’s and Angoff’s methods to determine an absolute standard.31 Both methods rely on expert judges and the concept of individuals who are on the border of passing or failing. In Nedelsky’s method, judges eliminate the response options that a borderline learner would be able to eliminate.32 The chance of getting each question correct is then one divided by the number of remaining response options. For example, if two response options remain (one of which is the correct option), the chance of a borderline individual answering the question correctly is one-half, or 50%. The cut-off score is then determined by adding up the probabilities for all the questions.
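As a minimal sketch of this calculation (the numbers of remaining options below are hypothetical, not judgements from this study):

```python
# Nedelsky's method: after judges eliminate the options a borderline
# test taker could rule out, the probability of answering question i
# correctly is 1 / (remaining options for i); the cut-off score is
# the sum of these probabilities over all questions.

remaining_options = [2, 3, 2, 4]  # hypothetical judgements for 4 MCQs

probabilities = [1 / n for n in remaining_options]
cutoff = sum(probabilities)

print([round(p, 2) for p in probabilities])  # [0.5, 0.33, 0.5, 0.25]
print(round(cutoff, 2))                      # 1.58
```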

Angoff’s method, one of the most widely used, has the judges assess the difficulty of each question as a whole.33 It relies on subject matter experts who examine the content of each question (item) and predict the proportion of minimally qualified test takers who would answer the item correctly.

The judges used a combination of Nedelsky’s and Angoff’s methods. Starting with Nedelsky’s method, they increased or decreased the probability of answering each question correctly based on an overall assessment. Nedelsky’s method gave them a logical approach to making an initial judgement about the difficulty of each question. The overall assessment then allowed them to adjust for uncertainty about the number of response options a borderline individual would eliminate, the difficulty of the stem (scenario) for the question, the difficulty of the concept, and anything else that may have made a question more or less difficult.
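A sketch of this combined judgement for a single question might look as follows (the starting probability and the adjustment are hypothetical, and the clamping to the 0–1 range is our assumption, not a rule stated by either method):

```python
# Combined approach: start from the Nedelsky probability, then apply
# an Angoff-style overall adjustment for stem difficulty, concept
# difficulty and anything else making the question easier or harder.

nedelsky_start = 1 / 2  # e.g. a borderline student eliminates 2 of 4 options
adjustment = -0.10      # judge considers the scenario unusually hard

judged_probability = min(1.0, max(0.0, nedelsky_start + adjustment))
print(judged_probability)  # 0.4
```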

For each method, there are five stipulated steps:

  1. Selection of judges.

  2. Defining ‘borderline’ knowledge and ability.

  3. Training of the selected judges in the use of the method.

  4. Collection of their judgements.

  5. Combining the judgements to determine a cut-off.

Selection of the judges

In March 2022, we purposively selected and recruited four types of judges: lower secondary school teachers who had participated in the pilot in each country, to ensure that the judgements were appropriate for the target audience and context; health systems researchers and individuals who teach evidence-informed decision-making; curriculum developers; and educational researchers with experience evaluating educational interventions designed to teach critical thinking skills (table 2).

Table 2

Judges

The recommended number of judges for Angoff’s method ranges from 5 to 30.34 For this study, we recruited eight judges,35 a number we considered manageable and adequate for making the required judgements, drawing on our previous experience establishing a standard for passing and mastery for our earlier primary school resources.36

We initially contacted nine individuals, all of whom agreed to participate apart from one, who cited a busy work schedule. This left us with enough judges to enable valid inferences to be made, as well as meaningful participation in the discussions.

A commonly held view in the scientific literature is that cut-off scores may become more accurate as the judges’ subject expertise increases, but that assertion has not been empirically confirmed.35 37 38

For this study, we aimed to ensure diversity within the panel of judges by selecting experienced individuals (5 or more years) with the following types of expertise:

  • Health researchers and people who teach evidence-informed decision-making.

  • Educational researchers with experience evaluating interventions to teach critical thinking skills.

  • Curriculum or examination developers.

  • Lower secondary school teachers in East Africa.

We invited at least one teacher from each country who had participated in the pilot study of the learning resources, to help ensure that the cut-offs were appropriate for the context in which the learning resources were to be evaluated. That context is lower secondary schools in East Africa, characterised by high teacher–student ratios of about 1:60 on average, limited resources and students for whom English is a second or third language. Judges were provided with instructions in advance on how judgements would be made (online supplemental appendix 1).

Definition of borderline knowledge and ability

We defined a student on the border of passing as an individual who may or may not have a basic understanding of the concepts and the ability to apply them, may or may not need additional instruction, and may or may not be ready to go on to subsequent lessons. We defined a student on the border of mastery as an individual on the border between having mastered and not having mastered the nine Key Concepts: between having a basic understanding of the concepts and how to apply them and having a clear understanding of them; between not needing and clearly not needing additional or alternative instruction; and being ready to go on to lessons that reinforce learning of the same concepts and introduce new concepts.

We created personas that were characteristic of people on the border of passing and of people on the border of having mastered the concepts (online supplemental appendix 2).

Training of the selected judges

The training of the selected judges occurred remotely. We sent the training materials (protocol, CTH Test, instructions and personas) a few days before a 1-hour online discussion, during which the judges were given an opportunity to ask questions.

The instructions provided to the judges were discussed in detail before they started making their judgements (online supplemental appendix 1). The main objective of the training was to enable the judges to assess the difficulty of each question for two types of test takers: (1) those with a borderline understanding of the concepts needed to assess claims about treatment effects and (2) those who have mastered the concepts. The judges took the CTH Test before making judgements about the difficulty of the questions. On completion of the test, we gave them the correct answers as a reference for when they made their judgements. We anticipated that attempting the test themselves and then seeing the correct answers would give them a sense of how difficult the questions were. We did not assess their individual performance on the test, since some of the judges had participated in teaching the concepts in pilot schools, which would have given them an unfair advantage.

The judges completed a practice round with six MCQs of varying difficulty before making their individual judgements. This exercise informed a discussion of what made a question easy or difficult. It also alerted them to any tendency to be more or less pessimistic than the other judges about the probability of a borderline student answering questions correctly.

Although the CTH Test was made available to the judging panel for the purpose of setting the cut-off scores, it is not provided with this manuscript, to avoid contamination before the planned evaluations in the three East African countries for which the cut-off scores are being set.

However, the Claim Evaluation Tools item bank is open access and free for non-commercial use.

Collecting judgements and combining the judgements to determine a cut-off

The judges independently made their judgements for all 18 MCQs. One of us (AN) calculated the mean and the median for each MCQ and for the cut-off score, and presented these, together with the range, to the judges. After making their judgements, the judges were also shown the difficulty of the MCQs based on the results of the Rasch analysis (online supplemental appendix 3). AN and AO moderated an online discussion during which disagreements were discussed and resolved.
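A minimal sketch of how such judgements can be summarised (the probabilities below are hypothetical, for 4 judges and 3 of the 18 MCQs):

```python
from statistics import mean, median

# MCQ id -> judged probabilities of a correct answer, one per judge
judgements = {
    "Q1": [0.50, 0.40, 0.60, 0.45],
    "Q2": [0.35, 0.30, 0.55, 0.40],
    "Q3": [0.70, 0.65, 0.80, 0.60],
}

for mcq, probs in judgements.items():
    print(mcq, round(mean(probs), 2), round(median(probs), 2),
          (min(probs), max(probs)))      # mean, median and range per MCQ

# Each judge's implied cut-off is the sum of their probabilities across
# all questions (here, across the 3 illustrative MCQs):
per_judge_cutoffs = [sum(col) for col in zip(*judgements.values())]
print([round(c, 2) for c in per_judge_cutoffs])  # [1.55, 1.35, 1.95, 1.45]
```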

We used a nominal group technique to reach a consensus.39 We initially shared all the judgements for each MCQ with the judges. We then invited those from each end of the spectrum to provide reasons for their judgements, before inviting others to comment. After the final cut-off score was agreed upon, we checked to make sure that all the judges agreed with the cut-off scores, and adjustments were made, if needed, based on the consensus of all the judges.

The same approach was used to determine a cut-off score for passing and for mastery.

Patient and public involvement

There was no patient or public involvement in the study. However, in addition to participating in the process for establishing a standard for passing and mastery, the study participants (the judges) were involved in the interpretation of the study results and the writing of this manuscript.

Results

The discussions and consensus meetings were conducted online on 9 March and 22 March 2022. During the practice round, the judges agreed on the following:

  • Given the combination of prolonged school closures in East Africa due to the COVID-19 pandemic and English being a second or third language for many of the test takers, the judges agreed, as a rule, to always decrease the probability of answering a question correctly by at least 10% for both borderline and mastery test takers, to account for reading errors (see the sketch after this list).

  • For the purpose of determining the cut-off scores, the judges agreed on the importance of keeping in mind the contexts in which the test and the cut-off scores would be used.
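A minimal sketch of the reading-error rule (the text does not state whether the decrease of "at least 10%" is relative or in percentage points; this sketch assumes a relative decrease):

```python
def adjust_for_reading_errors(probability, relative_decrease=0.10):
    """Decrease a judged probability of a correct answer by 10%
    (relative) to account for reading errors due to school closures
    and English being a second or third language -- an assumption
    about how the agreed rule was applied."""
    return probability * (1 - relative_decrease)

print(adjust_for_reading_errors(0.50))  # 0.45
print(adjust_for_reading_errors(0.80))  # 0.72
```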

During our discussions about the judges’ reasoning, we found that different judges had different reasons for their judgements, and each judge tended to apply the same reasoning across the MCQs.

Apart from one of the judges, who had participated in teaching the content to lower secondary school students during the piloting of an earlier version of the resources, the judges were not consistently biased towards underestimating or overestimating the difficulty of the MCQs. There were substantial differences in the panel’s independent judgements about the difficulty of each MCQ (online supplemental appendix 4). However, there was less disagreement when the probabilities for each MCQ were summed to determine the cut-offs, and the judges quickly came to a consensus about the difficulty of each MCQ and the cut-offs after two deliberations, each lasting at least an hour (table 3).

Table 3

Individual and consensus summary judgements

Following discussions of each MCQ and the cut-offs, the judges agreed that at least 9 of the 18 questions needed to be answered correctly to pass, and at least 14 to demonstrate mastery of the concepts.

Discussion

Empirical studies have shown that when judges use a common definition of minimally competent test takers, this tends to increase judgement consensus when determining cut-off scores.38 Although there was substantial variation in the judges’ independent assessments of the difficulty of each MCQ, the judges quickly reached a consensus, which is consistent with findings from multiple studies that determined cut-off scores.38 40

We provided the judges with performance data after they made their independent judgements, and made them aware that, although the data provided an indication of the relative difficulty of the questions, they did not indicate the probability of a borderline test taker answering a question correctly, since most of the Rasch analysis data41 came from a mix of people, most of whom had not been taught and were not familiar with the IHC Key Concepts. Some studies have indicated that when judges view normative data, it tends to contaminate the process and systematically lower cut-off scores.38 40 However, there is no indication that this occurred in this study.
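For reference, under the standard Rasch model the probability that a person with ability $\theta$ answers an item of difficulty $b$ correctly is

$$P(X = 1 \mid \theta, b) = \frac{e^{\theta - b}}{1 + e^{\theta - b}},$$

so an item’s difficulty $b$ only locates it relative to the abilities in the sample analysed; it does not by itself give the probability that a borderline test taker, as defined in this study, would answer correctly.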

When assembling a panel of judges, both Angoff’s and Nedelsky’s methods recommend between 20 and 30 judges who are representative of the population to which the standards will be applied.19 However, there is little agreement on the appropriate number of judges,12 16 and several studies have found that between 5 and 10 judges is a manageable and sufficient number to determine cut-offs. In this study, the eight judges who participated in determining the cut-off scores came from different disciplines (education, health and evidence-based practice) and countries (Croatia, Kenya, Norway, Rwanda and Uganda).

Evidence suggests that cut-off judgements made using Angoff’s method are reproducible,37 but cut-offs determined by different groups of judges may vary as experience and context are brought into play.42 There is no gold standard for setting a passing score. However, to ensure that the resulting cut-off is reproducible and unbiased, the approach used should ensure the credibility of the judges and collect their judgements systematically. The key aspects to consider when selecting judges are their content expertise, familiarity with the context and examinees, and a good balance of gender and ethnicity.43 This study met all of these standards.

Strengths and limitations

We determined the cut-off scores using a combination of two robust methods, Angoff’s and Nedelsky’s, working with a panel of judges who had content expertise and were familiar with the context. The cut-offs were established for students in lower secondary schools in East Africa. It is uncertain whether the same cut-offs are appropriate for other contexts.

However, the methods used in this study are robust and efficient and could be used in other settings, as well as for other tests using questions from the Claim Evaluation Tools database or other MCQs.36

Although the number of judges recommended by the Angoff and Nedelsky methods ranges from 5 to 30, we were at the lower end of that spectrum, with only eight members on the judging panel. We found this number manageable, but it may have excluded significant contributions to the judgements from additional judges.

Conclusion

Although there was wide variation in many of the individual judgements, it was possible to reach a consensus on the cut-off scores for passing and mastery in an online meeting that lasted less than 90 min.

The use of a combination of Angoff’s and Nedelsky’s methods, together with initial agreement on some general guidance following a practice round, ensured an appropriate process that resulted in absolute standards for having a basic understanding (passing) and mastery of the nine concepts addressed in the IHC secondary school resources.

Data availability statement

Data are available upon reasonable request. Data have been made available as an appendix to this manuscript. Any extra data will be made available on reasonable request.

Ethics statements

Patient consent for publication

Ethics approval

We obtained ethics approval from the following institutions: (1) the Rwanda National Ethics Committee (approval number 916/RNEC/2010) for the Rwandan study site; (2) the Masinde Muliro University of Science and Technology Institutional Ethics Review Committee and the Kenyan National Commission for Science, Technology and Innovation (approval number NACOSTI/P119/1986) for the Kenyan study site; (3) the Makerere University School of Medicine Research Ethics Committee and the Uganda National Council for Science and Technology (reference number HS91ES) for Uganda. Participants gave informed consent to participate in the study before taking part.

Acknowledgments

We are greatly indebted to the Informed Health Choices (IHC) team for their invaluable feedback during the planning of this study. We would also like to thank the students and teachers who participated in the pilot and validation (Rasch analysis study) of the Critical Thinking about Health Test, whose findings we drew on to provide context for the judges on the panel.

References

Supplementary materials

Footnotes

  • Twitter @AllenNsangi, @AsiimweRoger10

  • Contributors AN, AO and AD were responsible for study conception, wrote the protocol, conducted the study, and led data acquisition, analysis and interpretation. DA, RA, SKM-B, JN, LVN, RO, CO and IU provided feedback during the process, and participated in data acquisition and interpretation. AN drafted this paper, while AO, AD, DA, RA, SKM-B, JN, LVN, RO, CO and IU provided substantial input to the draft. AN is the article guarantor.

  • Funding This study was funded by the Norwegian Research Council (project number: 284683, grant number 69006) awarded to AO through the Norwegian Institute of Public Health, in collaboration with Makerere University, Uganda, Tropical Institute of Community Health and Development, Kenya and the University of Rwanda, Rwanda.

  • Disclaimer The funder had no role in the study design, preparation of the manuscript and publication decision.

  • Competing interests None declared.

  • Patient and public involvement No patients and/or the public were involved in the design, or conduct, or reporting, or dissemination plans of this research. Refer to the Methods section for further details.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.