Abstract
Objectives Within cost-effectiveness models, prevalence figures can inform transition probabilities. The methodological quality of studies can inform the choice of prevalence figures but no single obvious candidate tool exists for assessing quality of the observational epidemiological studies for selecting prevalence estimates. We aimed to compare different tools to assess the risk of bias of studies reporting prevalence, and develop and compare possible numerical scoring systems using these tools to set a threshold for inclusion of reports of prevalence in an economic analysis of neonatal hypoglycaemia.
Design Assessments of bias using two tools (Joanna Briggs Institute (JBI) Checklist for Prevalence Studies and a modified version of Risk Of Bias In Non-randomised Studies-of Interventions (ROBINS-I)) were compared for 18 studies relevant to a single setting (neonatal hypoglycaemia). Inclusions of studies for use in a decision analysis model were considered based on summary scores derived from these tools.
Results Both tools were considered easy to use and produced dispersed scores for each of the 40 study–outcome combinations. The modified ROBINS-I scores were more skewed than the JBI scores, particularly at higher thresholds. The studies selected for inclusion were generally the same using either tool; if 50% were used as the cut-off threshold with the Applicable Score, both tools would yield the same results. However, the JBI tool is shorter and may be easier to interpret and apply to studies that do not involve a control group, while the modified ROBINS-I tool assesses more methodological detail in studies that include a control group.
Conclusion Both tools performed well for systematically assessing studies that report on outcome prevalence and provided similar discrimination between studies for risk of bias. This convergent validity supports use of both tools for the purpose of assessing risk of bias and selecting studies that report prevalence for inclusion in economic analyses.
- health economics
- statistics & research methods
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
Strengths and limitations of this study
This study addresses a methodological task for which no single obvious candidate tool exists.
Assessments of candidate tools and approaches to use of the tools were undertaken independently by the three authors.
Convergent validity between the tools examined supports the use of either approach to guide the inclusion of prevalence reports in economic modelling.
Studies were assessed by each researcher using one tool, followed immediately by the other, in a consistent order. For assessment items that are similar, responses to the first tool may therefore have influenced responses to the second tool.
Introduction
The probability of an outcome occurring is a fundamental parameter required in the creation of a decision analytical model. It represents the likelihood that patients in a cohort will move from one health state to another in a decision tree or state transition model (eg, Markov model), and is thus often referred to as a transition probability.1 When referring to clinical outcomes, the transition probability is equivalent to the prevalence of that outcome in the population represented in the model.
The evidence base from which model parameters are drawn often involves more than a single data source, and developing the model may involve aggregation of these data.2 The process of deciding which values to use as the key inputs in a model, including the transition probabilities, should be based on a systematic review of the literature, and a description of this process should accompany the model,3–5 with the use of a source and any translational steps justified.1 4 Published studies used as a source for transition probabilities should have their validity transparently assessed by applying critical appraisal criteria.4
In 2016, Sterne et al observed that, in terms of assessing study validity, there has been a shift in focus away from analysis of methodological quality to assessments of risk of bias, often in a domain-oriented manner, that is, considering different domains of bias in turn.6 The potential for bias and types of bias in non-randomised studies may differ from those in randomised studies.7 A number of instruments have been developed for assessing the risk of bias in non-randomised studies.7 In 2003, Deeks et al identified six that were considered to have utility for systematic reviews, although they noted that none had been formally validated.7 In 2007, Sanderson et al concluded that there was a lack of a single obvious candidate tool for assessing quality of observational epidemiological studies.8 They identified three domains as being fundamental in assessing risk of bias (appropriate selection of patients, appropriate measurement of variables and appropriate control of confounding), but noted that these were present in only approximately half of the checklists that they evaluated.8
Subsequent to these systematic reviews, Sterne et al developed the ‘Risk Of Bias In Non-randomised Studies-of Interventions’ (ROBINS-I) tool to evaluate the risk of bias in studies that do not use randomisation to allocate participants to comparison groups.6 The ROBINS-I includes a total of seven bias domains: selection of comparison groups, confounding, classification of interventions, deviations from intended interventions, missing data, measurement of outcomes and selection of the reported result. These domains can be further compartmentalised into pre-intervention (confounding and participant selection), intervention (classification of interventions) and post-intervention (the remainder) categories.6 The ROBINS-I assesses risk of bias using an absolute scale, as distinct from the approach commonly used by other similar tools of comparing against a theoretical, perfect observational study or a high quality randomised trial.9 ROBINS-I was constructed with an objective of allowing the risk-of-bias assessment to determine the degree to which the rating of a study is downgraded.6 This would facilitate comparison between ratings of randomised trials and ratings of non-randomised studies when using the Grading of Recommendations Assessment, Development and Evaluation (GRADE) system. If we consider the intervention to be an exposure (eg, the occurrence of neonatal hypoglycaemia), the ROBINS-I provides a systematic approach that can assess a non-interventional observational study for risk of bias within the seven specified domains.
In 2015, Munn et al observed a lack of guidance for authors undertaking systematic reviews of observational epidemiological studies, including those reporting prevalence or incidence information.10 That absence of guidance included the lack of a standard method for conducting critical appraisals of the studies used in systematic reviews of prevalence data.11 The same authors also observed a significant increase in the volume of systematic reviews being performed and published that focused on questions of prevalence.11 This combination of factors led to the establishment of a working group, composed of researchers from the Joanna Briggs Institute (JBI, University of Adelaide, Australia), to create guidance for conducting systematic reviews of studies reporting incidence and prevalence parameters.10 This guidance has been published as a checklist with supporting explanatory information.12 When applied to prevalence studies, reported risks of bias in the JBI tool cover an array of concepts similar to those in the ROBINS-I tool.
The ROBINS-I and JBI tools were selected for comparison in this study in light of the conclusions by Sanderson et al in their comprehensive 2007 systematic review that, despite the existence at that time of 86 candidate tools developed to assess the quality of evidence from observational epidemiological studies, none could be recommended as a single ideal candidate.8 Both the ROBINS-I and JBI tools were developed subsequent to that review, and address a number of the recommendations from Sanderson et al, particularly those relating to rigour in their development and appropriate coverage of key domains.
The ROBINS-I domain pertaining to bias in ascertainment of exposures is notably lacking from the JBI tool, which was not designed with the explicit intent of assessing reports of prevalence after a nominated exposure. It does not explicitly inquire about such concepts as whether exposure was measured prior to determination of outcome; whether exposure measures were defined, reliable, and consistently applied; whether different levels of exposure were considered; or whether the exposure was assessed more than once over time. The JBI tool also does not explicitly address bias in reporting of results, particularly the implications of performing multiple measurements or analyses of the exposure–outcome relationship.
Conversely, although the ROBINS-I tool does assess a number of concepts related to measurement of the outcomes, it does not explicitly examine the validity of outcome ascertainment, and it does not downgrade on the basis of sample size alone. Further, the ROBINS-I tool contains a series of assessment items examining the appropriateness of methods for selecting a control group; a topic not included in the JBI tool.
Reported prevalence for the same or similar outcomes varies for a number of reasons, including methodological differences, differences in definitions of the outcomes, and differences in the populations being examined. We wished to select published reports of prevalence of outcomes of neonatal hypoglycaemia for use in a decision analytic model for an economic analysis. A wide range of prevalence figures have been reported for these outcomes, in part because of inconsistencies in the definition of neonatal hypoglycaemia, particularly the blood glucose concentration threshold used to diagnose asymptomatic cases, changes in that definition over time, and differences in approaches to screen for and identify the condition. The blood glucose concentration threshold for diagnosing neonatal hypoglycaemia has ranged from 20 mg/100 mL (1.11 mmol/L)13 in earlier studies to 2.6 mmol/L,14 and has variably included additional criteria such as a requirement for low results on consecutive measurements.
In an economic analysis, each prevalence parameter needs to be informed by the available information, even if that information is of poor quality. This means that the question becomes how to decide which sources of information to include and not include for each outcome, rather than determining a single inclusion threshold across all studies. We therefore undertook this study to examine the use of risk-of-bias assessments to assist with these decisions.
Objective
We aimed to (1) undertake a comparison of different tools to assess the risk of bias of studies reporting prevalence for use as data sources for economic analyses, and (2) develop and compare possible numerical scoring systems using these tools to set a threshold for inclusion of reports of prevalence in an economic analysis, using the example of outcomes of neonatal hypoglycaemia.
Methods
Both the ROBINS-I tool6 and the JBI Checklist for Prevalence Studies12 were selected for initial assessment. We chose these two tools based on their applicability to observational studies and/or studies reporting prevalence, consistency with the GRADE approach to assessment of uncertainty, and advice from local researchers familiar with candidate instruments. A modified version of the ROBINS-I tool was preformatted into a spreadsheet for ease of use by assessors. The ROBINS-I assessment item pertaining to the bias domain of deviations from intended interventions was excluded as the topic of interest was an exposure at a point in time rather than an intervention over time. Instead, we added three assessment items pertaining to the domain of study design (clarity of the statement of objective, inclusion of sample size justification or similar, inclusion of an unexposed group) and three pertaining to external validity (specification of the study population, relevance of the cohort to the target population and drop-out rate) (online supplemental table 1). For each domain, the overall bias was summarised as high, low, or uncertain.
From the pool of non-randomised studies that reported, or allowed the calculation of, prevalence of outcomes of neonatal hypoglycaemia, we selected three that covered a range of methodologies and study population sizes and focused on a single outcome.15–17 All three researchers assessed these three studies using both assessment tools, discussed discrepancies and reached consensus on how the questions should be interpreted. A further 18 studies18–35 reporting prevalence of outcomes after neonatal hypoglycaemia were then each assessed by combinations of two of the three researchers using both tools.
Three summary scores were formed to facilitate further comparison between studies (online supplemental table 2).
Count Score(s): sums of responses in each column. That is, the total number of responses indicating low risk of bias, and the total number of responses indicating high risk of bias. Two separate values are thus generated. For the sum of responses indicating low risk of bias, a higher score represents a lower risk of bias; for the sum of responses indicating high risk of bias, a higher score represents a higher risk of bias. These are presented as a percentage of the total value possible on the tool. (Note that the total number of questions, and therefore the maximum total value, is 12 on the modified ROBINS-I tool and 9 on the JBI tool.)
Composite Score: calculated by subtracting the total number of responses indicating high risk of bias from the total number of responses indicating low risk of bias. A higher score represents a lower risk of bias. Negative values are possible for studies that score a greater number of high risk of bias elements/domains than low risk of bias elements. This is presented as a percentage of the total value possible on the tool.
Applicable Score: conversion of the Composite Score into a percentage by dividing the Composite Score by the maximum score possible after subtracting any ‘not applicable’ responses. A higher score represents a lower risk of bias. Negative values are also possible using this approach.
All three scores have a maximum value of 100%. Excluding ‘not applicable’ responses in the Applicable Score is intended to more accurately reflect which elements of the tool are relevant to the study being assessed.
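The three summary scores described above can be illustrated with a minimal sketch. The response labels (`"low"`, `"high"`, `"unclear"`, `"na"`), the function name `summary_scores` and the example responses are assumptions made for illustration only; they are not taken from the study spreadsheets, and the sketch applies no weighting between items.

```python
from collections import Counter


def summary_scores(responses):
    """Return the summary scores (as percentages) for one assessment.

    `responses` holds one label per assessment item. The labels used here
    ("low", "high", "unclear", "na") are hypothetical, chosen for this sketch.
    """
    n_items = len(responses)
    if n_items == 0:
        raise ValueError("at least one assessment item is required")
    counts = Counter(responses)
    low, high, na = counts["low"], counts["high"], counts["na"]

    # Composite Score: low-risk responses minus high-risk responses.
    composite = low - high
    # Applicable Score: same numerator, but 'not applicable' items are
    # removed from the maximum possible score.
    applicable_items = n_items - na

    return {
        "count_low_pct": 100 * low / n_items,
        "count_high_pct": 100 * high / n_items,
        "composite_pct": 100 * composite / n_items,
        "applicable_pct": (
            100 * composite / applicable_items if applicable_items else None
        ),
    }


# Hypothetical 9-item assessment (the JBI tool has 9 items).
example = summary_scores(
    ["low", "low", "low", "low", "low", "high", "high", "unclear", "na"]
)
print(example)
```

In this hypothetical assessment the Composite Score is 3 of a possible 9, while the Applicable Score is 3 of a possible 8, because one item was marked not applicable.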
Patient and public involvement
This work is a research methods paper, and as such was undertaken without patient involvement.
Results
Ease of use and assessor agreement for initial three studies
All three researchers reported that the assessment tools and spreadsheets were generally easy to use, and that, because of the structural and content similarities between the two tools (online supplemental table 1), assessment using both tools did not result in a large increase in time required compared with assessment using a single tool. However, since the JBI tool includes fewer assessment items it may have a modest time advantage over the modified ROBINS-I tool.
After the initial training assessment of three studies, chance-corrected AC1 agreement (a more valid measure of inter-rater reliability than the kappa statistic36) between the two assessors ranged from 0.51 (95% CI 0.18 to 0.84) to 0.93 (95% CI 0.81 to 1.00) for the modified ROBINS-I tool and from 0.39 (95% CI 0.06 to 0.71) to 0.79 (95% CI 0.56 to 1.00) for the JBI tool. There were no consistent patterns in the assessment fields for which scores were discrepant across the 12 modified ROBINS-I or 9 JBI domains from 18 studies. All discrepancies were resolved by discussion before inclusion in subsequent evaluation.
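For readers unfamiliar with the AC1 statistic, the following is a minimal sketch of Gwet's AC1 as it is commonly defined for two raters. The article does not describe how its agreement values or confidence intervals were computed, so the function name `gwet_ac1`, the category labels and the example ratings are all illustrative assumptions, and no confidence interval is calculated.

```python
def gwet_ac1(rater_a, rater_b, categories):
    """Gwet's AC1 for two raters rating the same items (no CI computed).

    `categories` lists every category on the rating scale, including any
    that neither rater happened to use.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty item set")
    q = len(categories)
    if q < 2:
        raise ValueError("at least two categories are required")
    n = len(rater_a)

    # Observed agreement: proportion of items given the same rating.
    p_a = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: based on the mean proportion of ratings that
    # fall in each category across the two raters.
    p_e = 0.0
    for k in categories:
        pi_k = (rater_a.count(k) + rater_b.count(k)) / (2 * n)
        p_e += pi_k * (1 - pi_k)
    p_e /= q - 1

    return (p_a - p_e) / (1 - p_e)


# Hypothetical ratings of six assessment items by two assessors.
scale = ("low", "high", "unclear")
assessor_1 = ["low", "low", "high", "unclear", "low", "high"]
assessor_2 = ["low", "low", "high", "low", "low", "high"]
ac1 = gwet_ac1(assessor_1, assessor_2, scale)
print(round(ac1, 2))
```

Here the assessors agree on five of six items, and the chance-agreement term is computed from the pooled category proportions rather than from each rater's marginal totals separately, which is the main difference from the kappa statistic.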
Assessment tool scores and agreement
When used by combinations of two researchers to assess 40 study–outcome combinations (hereafter ‘assessments’) from the 18 studies, both the modified ROBINS-I and JBI tools resulted in a wide distribution of scores for each outcome (figures 1 and 2), potentially allowing selection of studies for inclusion at a wide range of thresholds. The distribution of scores with the modified ROBINS-I tool was generally skewed slightly higher than the distribution of scores with the JBI tool.
Using the Count Scores, the difference between the two tools in the number of studies selected for inclusion or exclusion varies with the threshold in a non-linear manner (table 1). For lower thresholds (eg, 25%), there is greater difference between the two tools than for higher thresholds (eg, 50%, 75%), with more studies being included using the modified ROBINS-I than using the JBI at the lower thresholds.
Using the Composite or the Applicable Scores, the modified ROBINS-I and JBI tool each resulted in very similar numbers of studies included (table 1). If 50% was used as the cut-off threshold for inclusion or exclusion of studies based on their risk of bias, both tools would give the same results using the Applicable Score (figure 3). The level of agreement fell (ie, some studies would be included using one tool but not the other) with either higher or lower cut-off thresholds.
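The effect of the cut-off threshold on agreement between tools can be sketched as follows. The two score pairs are the Applicable Scores reported in this article for the two outlying studies (modified ROBINS-I 42% and JBI 0%; modified ROBINS-I 18% and JBI -56%). The inclusion rule (score at or above the threshold), the function names and the study labels are assumptions made for illustration.

```python
def included(score_pct, threshold_pct):
    """Assumed inclusion rule: a study is included if its score is at or
    above the threshold."""
    return score_pct >= threshold_pct


def discordant(scores, threshold_pct):
    """Return labels of studies that one tool would include and the other
    would exclude at the given threshold."""
    return [
        label
        for label, (robins_pct, jbi_pct) in scores.items()
        if included(robins_pct, threshold_pct) != included(jbi_pct, threshold_pct)
    ]


# (modified ROBINS-I, JBI) Applicable Scores for the two reported outliers.
scores = {
    "study 30": (42, 0),
    "study 32": (18, -56),
}

for threshold in (0, 25, 50):
    print(threshold, discordant(scores, threshold))
```

Under these assumptions, both outlying studies are excluded by both tools at a 50% threshold, whereas at lower thresholds one or other of them is included by the modified ROBINS-I tool only.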
Notable outliers where the scores were very different using the two tools (figure 3) were one study(30) on the outcomes of learning disabilities and epilepsy (modified ROBINS-I Applicable Score 42%, JBI Applicable Score 0%) and one study(32) on epilepsy and vision disorders (modified ROBINS-I Applicable Score 18%, JBI Applicable Score -56%). Both of these studies have low numbers of subjects (39 and 45 cases, respectively). For both studies, items relating to bias due to confounding were graded as being at high risk of bias when using the modified ROBINS-I tool, but at low risk of bias using the JBI tool. These differences related to scoring of bias in selection of comparison groups and in measurement of outcomes. For the selection of comparison groups, the modified ROBINS-I tool items were scored as uncertain or not applicable, but the JBI tool items were scored as high risk of bias. For the measurement of outcomes, the modified ROBINS-I tool items were scored as low risk of bias, while the JBI tool items were scored as uncertain.
Discussion
Both of the domain-based assessment tools we considered performed well for systematically assessing studies that report on outcome prevalence and provided similar discrimination between studies with higher and lower risk of bias. Although the selection of a threshold for inclusion or exclusion of prevalence studies is subjective, the application of a standardised risk of bias assessment before selecting a threshold does allow discrimination between the upper and lower ranges of risk of bias among the candidate studies.
Although presented with different wording, the component questions of the modified ROBINS-I and JBI assessment tools include variations on the same concepts, and overlap in a number of domains. Both tools address overall applicability, selection and description of the study population(s), reporting and appropriateness of sample size and statistical analyses, risk of bias due to measurement of outcomes, and the response rate/missing data.
Both assessment tools were perceived as being simple to use, with a minimal learning curve; after an initial set of training assessments, agreement between assessors was high and unanimity was readily reached with brief discussion where required. Numerically, the ROBINS-I tool (both original and modified) includes more components that need to be considered to complete the domain-level assessments and covers greater breadth of potential bias domains. The JBI tool, however, was designed to specifically critique studies including reports of prevalence, and its component items may be more focused on this goal.
Although neither tool is designed to output a numeric score, both tools gave similar results using any of the three different scoring systems that we devised to determine whether particular studies should be included or excluded from use in estimating prevalence of an outcome, particularly at higher thresholds. The distributions of scores were wide enough with both tools to allow selection for inclusion at a number of different thresholds, or to stratify studies into different levels of risk of bias as a component of consideration for inclusion. This convergent validity supports both tools for the purpose of assessing risk of bias and selecting studies that report prevalence. The selection of a specific threshold may be based on the number of applicable studies available or the relative or absolute number needed for inclusion.
The two studies assessed as having very different scores using the two tools had low population numbers, which the JBI tool penalises to a greater extent than the modified ROBINS-I, and both included ‘unclear’/‘unknown’ responses in their JBI assessments, which reduces the denominator in the Applicable Score calculation, thus increasing the impact of the remaining assessment items on the score calculation. The specific differences between the modified ROBINS-I and JBI tools that accounted for the different scores were (1) those related to data being gathered from both a sample and control and the potential for confounding due to patient characteristics (covered in modified ROBINS-I) as compared with measuring outcomes in a sample population only (JBI) and (2) those related to blinding of outcome assessors (covered only in modified ROBINS-I). Scores are therefore lower using the modified ROBINS-I tool for reports of prevalence in studies that measure outcomes in an exposure population, but not a non-exposed control group, and studies in which the assessor is not blind to the exposure. Such blinding may not be practical in many of the studies in which outcome prevalences are reported.
These differences between tools are likely to be more important where a lower threshold for inclusion is used, either because most available studies are at higher risk of bias, or there are few studies available reporting a particular outcome. The JBI tool may be easier to interpret and apply to studies where a control group is not present, whereas the modified ROBINS-I tool addresses a slightly wider range of parameters related to overall methodological quality when assessing studies that compare the rates of outcomes between an exposure group and a control population.
Limitations
Conversion of the tools to numeric scores does not apply any differential weighting to the assessment domains. Arguably, this may result in inclusion of some studies with severe bias in a critical domain or exclusion of studies with bias in domains that the researcher considers less critical for the purposes of the planned economic analysis. However, forming a numeric score does not preclude researchers also using qualitative assessments before making a final decision. Where many potential data sources exist, risk of bias tools may supplement such judgements by suggesting an initial ordering of candidate studies.
We utilised published risk-of-bias assessment tools: one modified and one unmodified. Although we did not assess the impact of our modifications of the ROBINS-I tool, adding methodological domains not present in the original version may be useful for other researchers who wish to include domains deemed relevant to a planned economic analysis.
In assessing these tools, studies were assessed by each researcher using one tool, followed immediately by the other, in a consistent order. For assessment items that are similar, responses to the first tool may therefore have influenced responses to the second tool.
Summary
Either the modified ROBINS-I or the JBI risk-of-bias assessment tool can be used to select observational studies reporting prevalence for inclusion in an economic analysis. The results of the risk-of-bias assessments can be converted into numerical scores, and thresholds for inclusion can be selected at an appropriate level to include more or fewer studies as required. Particularly at higher thresholds, the studies selected for inclusion are generally the same using either tool. However, the JBI tool is slightly shorter and may be easier to interpret and apply to studies that do not involve a control group, whereas the modified ROBINS-I tool assesses more methodological detail, particularly in studies that include a control group.
References
Footnotes
Contributors All listed authors contributed to the planning, conduct, and reporting of the accompanying work. MJG wrote the first draft, while RE and JEH contributed supervision/oversight, and review and editing of the manuscript. Each author listed has seen and approved the submission of this version of the manuscript and takes full responsibility for the manuscript.
Funding Financial support for this study was provided in part by a grant (Douglas Goodfellow Medical Research Fellowship, grant number 1417003) from the Auckland Medical Research Foundation. The funding agreement ensured the authors’ independence in study design, data collection, data analysis, data interpretation, writing of the report, and the decision to submit the paper for publication.
Competing interests None declared.
Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.
Patient consent for publication Not required.
Ethics approval Each author listed on the manuscript has seen and approved the submission of this version of the manuscript and takes full responsibility for the manuscript.
Provenance and peer review Not commissioned; externally peer reviewed.
Data availability statement All data relevant to the study are included in the article or uploaded as online supplemental information.