Introduction The Health of the Nation Outcome Scales (HoNOS) for adults, and equivalent measures for children and adolescents and older people, are widely used in clinical practice and research contexts to measure mental health and functional outcomes. Additional HoNOS measures have been developed for special populations and applications. Stakeholders require synthesised information about the measurement properties of these measures to assess whether they are fit for use with intended service settings and populations and to establish performance benchmarks. This planned systematic review will critically appraise evidence on the measurement properties of the HoNOS family of measures.
Methods and analysis Journal articles meeting inclusion criteria will be identified via a search of seven electronic databases: MEDLINE via EBSCOhost, PsycINFO via APA PsycNET, Embase via Elsevier, Cumulative Index to Nursing and Allied Health Literature via EBSCOhost, Web of Science via Thomson Reuters, Google Scholar and the Cochrane Library. Variants of ‘Health of the Nation Outcome Scales’ or ‘HoNOS’ will be searched as text words. No restrictions will be placed on setting or language of publication. Reference lists of relevant studies and reviews will be scanned for additional eligible studies. Appraisal of reliability, validity, responsiveness and interpretability will be guided by the COnsensus-based Standards for the selection of health Measurement INstruments checklist. Feasibility/utility will be appraised using definitions and criteria derived from previous reviews. For reliability studies, we will also apply the Guidelines for Reporting Reliability and Agreement Studies to assess quality of reporting. Results will be synthesised narratively, separately for each measure, and by subgroup (eg, treatment setting, rater profession/experience or training) where possible. Meta-analyses will be undertaken where data are adequate.
Ethics and dissemination Ethics approval is not required as no primary data will be collected. Outcomes will be disseminated to stakeholders via reports, journal articles and presentations at meetings and conferences.
PROSPERO registration number CRD42017057871.
- systematic review
- measurement properties
- outcome measures
- mental health services
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
Strengths and limitations of this study
This systematic review will apply structured checklists that standardise the appraisal of available evidence.
The review potentially extends previous systematic reviews on the topic by including meta-analyses of relevant measurement property metrics, if data are adequate.
The review focuses on clinician-rated versions of the Health of the Nation Outcome Scales (HoNOS) family of measures; it does not include self- or proxy-completed versions.
To reduce potential language bias, the review will include all relevant studies regardless of language of publication and studies using translated versions of the HoNOS family of measures.
The search strategy does not include dissertations and reports; this may mean that a small amount of relevant information is missed.
In 1992, the UK’s Health of the Nation strategy set a target to ‘improve significantly the health and social functioning of mentally ill people’.1 The Health of the Nation Outcome Scales (HoNOS)2 was developed under the auspices of the Royal College of Psychiatrists as a means of quantifying progress against this target. The HoNOS was developed as a clinician-rated measure for use with working-age adults in contact with mental health services. It comprises 12 scales assessing behaviour, impairment, symptoms and social functioning. Each scale is rated on a five-point scale representing maximum severity over the rating period, typically the previous 2 weeks (0=no problem; 1=minor problem requiring no action; 2=mild problem but definitely present; 3=moderately severe problem; 4=severe to very severe problem). Scoring is guided by a glossary that provides specific anchor points for each scale. Subscale and total scores can be derived. Ratings are based on clinical judgement and can be used to guide patient treatment, supporting clinical decision-making and the provision of targeted care.3 In addition to its application in clinical practice, the HoNOS has also been used by researchers and policy-makers to monitor mental health service quality and effectiveness.4
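To make the scoring structure concrete, the minimal Python sketch below derives subscale and total scores from one set of 12 HoNOS ratings. It assumes the commonly used grouping of scales 1–3 (behaviour), 4–5 (impairment), 6–8 (symptoms) and 9–12 (social functioning), and assumes that a rating of 9 (or a blank) marks a scale as not known. The function name `score_honos` and the rule of simply summing the non-missing ratings are illustrative assumptions, not a prescribed scoring protocol; services may apply their own rules about how many missing ratings are tolerated.

```python
from typing import Optional, Sequence

# Assumed grouping of the 12 HoNOS scales into four subscales (1-based scale numbers).
SUBSCALES = {
    "behaviour": (1, 2, 3),
    "impairment": (4, 5),
    "symptoms": (6, 7, 8),
    "social": (9, 10, 11, 12),
}
MISSING_CODES = {9}  # assumption: 9 denotes "not known"


def score_honos(ratings: Sequence[Optional[int]]) -> dict:
    """Return subscale and total scores for one set of 12 HoNOS ratings.

    Valid ratings are 0-4; None or a code in MISSING_CODES is treated as missing
    and contributes nothing to the sums (an illustrative rule only).
    """
    if len(ratings) != 12:
        raise ValueError("expected 12 scale ratings")

    def valid(r: Optional[int]) -> bool:
        return r is not None and r not in MISSING_CODES

    for r in ratings:
        if valid(r) and not 0 <= r <= 4:
            raise ValueError(f"rating out of range: {r}")

    scores = {
        name: sum(ratings[i - 1] for i in items if valid(ratings[i - 1]))
        for name, items in SUBSCALES.items()
    }
    scores["total"] = sum(r for r in ratings if valid(r))
    scores["n_missing"] = sum(1 for r in ratings if not valid(r))
    return scores


# The maximum possible total is 12 scales x 4 = 48.
print(score_honos([4] * 12)["total"])  # 48
```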
In the decade following the development of the HoNOS, it was acknowledged that additional variants were required for use with specific populations.5 6 This resulted in the development of the HoNOS for Children and Adolescents (HoNOSCA),5 and the HoNOS for older adults (HoNOS 65+).6 Since then, the HoNOS family of measures has been extended to include the following: the HoNOS for adults confined to a secure facility (HoNOS-secure)7; the HoNOS for Acquired Brain Injury (HoNOS-ABI)8 and the HoNOS for People with Learning Disabilities (HoNOS-LD).9 Other variants of the HoNOS have been developed for administrative and research purposes, including the HoNOS for Payment by Results (HoNOS PbR)10 for casemix classification. All HoNOS measures apply the same scale scoring approach, but the number and content of the scales, and the subscale and total score structures, are tailored to the population or purpose (see online supplementary appendix 1 for an overview of the HoNOS measures).
Supplementary file 1
HoNOS measures are widely used in clinical practice and research contexts. The HoNOS, HoNOSCA and HoNOS 65+ are now the most widely used routine outcome measures in mental health services in England.4 11 In Australia, these three measures have been implemented in inpatient, residential and ambulatory settings within public sector mental health services,12 and the HoNOS is used in private hospitals with psychiatric beds.13 In New Zealand, the HoNOS, HoNOSCA, HoNOS 65+, HoNOS-LD and HoNOS-secure have been mandated for routine collection in mental health services.14 Elsewhere, the routine implementation of HoNOS measures has occurred in local contexts or is under active consideration. For example, in Canada, the HoNOS, HoNOSCA and HoNOS 65+ are used in at least two provinces.15 In the Netherlands, the HoNOS, HoNOSCA and HoNOS 65+ have been used by various mental health services and are among the instruments recommended by benchmarking systems for routine use.16 In Germany, the HoNOS has been used in rehabilitative mental healthcare.17 In Norway, the national Norwegian Patient Register is preparing for the possible use of the HoNOS and HoNOSCA as routine outcome measures,18 and the HoNOSCA is used routinely within several child and adolescent mental health services in Denmark, Sweden and Norway. Various HoNOS measures have been translated into languages including Norwegian, Danish, Dutch, Spanish, Italian, Greek, German, Lithuanian, French and Thai. The HoNOS measures were designed as clinician-rated measures, although self- and proxy-completed versions of the HoNOS, HoNOSCA and HoNOS 65+ have also been developed.19–21
Implementing routine outcomes monitoring in mental health services involves a substantial commitment of resources in training and education, data management, and analysis and reporting.12 It is therefore important that the selected measures have acceptable measurement properties (ie, reliability, validity and responsiveness or sensitivity to change) and practical characteristics (ie, interpretability and feasibility/utility in practice).22 23 Consumers, carers, clinicians, managers, policy-makers and researchers require up-to-date, synthesised evidence about the performance of the HoNOS measures to help them decide whether these measures are fit for use with the intended service settings and populations. Systematic reviews can inform such decision-making, as they provide an opportunity to compare findings from individual studies on a ‘level playing field’ and consider the reasons for agreements and disagreements. They may also provide an opportunity to establish benchmarks for measurement property metrics, against which clinicians can compare the measures’ performance in their own environment. Researchers also require this information to support their choice of measures when designing or reporting new studies.24
A preparatory scoping search located a number of reviews25–32 that have sought to systematically identify and evaluate available information on the measurement properties of one or more members of the HoNOS family of measures. These reviews report that, for the most part, the HoNOS measures demonstrate acceptable performance on various measurement properties, and several have identified HoNOS measures as suitable candidates for routine outcomes monitoring for certain populations and purposes when evaluated against a set of predetermined criteria.25–27 33 These reviews have also been useful in highlighting areas for potential improvement. For example, reviews have shown that the four HoNOS scales measuring aspects of social functioning (relationships, activities of daily living, living conditions, and occupation and activities) tend to perform less well than the other individual scales, the other subscales and the total score.25 Reviews have highlighted areas of conflicting evidence, for example with respect to the factor structure of the HoNOS,28 HoNOSCA28 and HoNOS 65+31 and aspects of their feasibility or utility.28 31 34 Reviews have also highlighted the absence of data, at particular points in time, for certain measurement properties: for example, the content validity of the HoNOSCA,28 test–retest reliability and construct validity of the HoNOS 65+,28 31 the test–retest reliability of the HoNOS-secure26 29 and the internal consistency, test–retest reliability and criterion validity of the HoNOS-LD.32
There are now more than 20 years of accumulated knowledge about the performance of the HoNOS and its variants, but there is no single, up-to-date evaluation of a comprehensive range of measurement properties for all HoNOS measures. Given the widespread implementation and policy importance of the HoNOS measures, and their ongoing development, such an evaluation would be an important reference point for all stakeholders. The most comprehensive review of the HoNOS, HoNOSCA and HoNOS 65+ was conducted by Pirkis and colleagues,28 who examined evidence regarding their reliability, validity, responsiveness, interpretability and feasibility/utility from studies published up to 2005, regardless of treatment setting or population. A later review supplemented these results with studies published up to 2011 and expanded coverage to include the HoNOS-LD and HoNOS-secure, but did not consider evidence of interpretability and feasibility/utility for any measure.32 Other subsequent reviews have examined the HoNOS25 27 30 33 and the HoNOSCA,27 but have been narrower in scope, focusing on studies from a single population subgroup,27 31 33 a subset of HoNOS scales25 or only one or two measurement properties.27 30 33 The HoNOS-secure has been the subject of two additional reviews, one considering literature up to 201129 and the other up to 2015.26 However, because the HoNOS-secure is a relatively new measure, the number of available studies was small.
The existing reviews of the measurement properties of the HoNOS family of measures are also, collectively, limited in several methodological respects. First, existing reviews have often excluded studies published in languages other than English, or have been unclear about whether such studies were excluded. Given the number of languages into which HoNOS measures have been translated, excluding papers published in languages other than English may significantly under-represent the body of available evidence. Second, none of the previous reviews have applied detailed structured checklists35 36 that standardise the appraisal of the quality of evidence on measurement properties. The benefits of structured checklists are that they provide guidance regarding which measurement properties are important and how to investigate them, and increase the likelihood that extracted results will be comparable across raters and studies.36 Third, all of the previous reviews have been descriptive; none have sought to apply meta-analytic techniques to pool information on measurement property metrics across studies, as has been done for other measures.24 37–39 Doing so may improve confidence in the reliability of findings, if they are replicated across multiple studies, and may increase the external validity of findings, if findings are maintained when samples of varying composition are pooled.40 41 Fourth, the existing reviews have not formally examined whether measurement properties differ between subgroups defined by patient or study characteristics; factors such as patient mix and rater profession/experience or training may be important sources of variation.
The current systematic review will be undertaken to address the aforementioned gaps. Its objectives are to:
determine the extent of available evidence regarding the measurement properties of clinician-rated measures in the HoNOS family, in relation to their reliability, validity, responsiveness (or sensitivity to change), interpretability and feasibility/utility (or acceptability);
examine the measurement properties of scales, subscales and totals for each measure in the HoNOS family, and whether performance varies across subgroups (eg, defined by treatment setting, clinical grouping, age group and rater profession/experience or training);
apply meta-analytic techniques to generate pooled estimates of relevant measurement property metrics (eg, intraclass correlation coefficient (ICC), kappa, Cronbach’s alpha and Pearson’s correlation coefficient) for each clinician-rated HoNOS measure’s scales, subscales and totals (by subgroup), where possible.
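As an illustration of one of the metrics named above, the minimal Python sketch below computes a weighted kappa for two raters who have each scored the same patients on a single HoNOS scale (ratings 0–4). The use of quadratic weights is an assumption made here; included studies may report linear-weighted or unweighted kappa instead, which is one reason pooled kappa estimates need to compare like with like. The function name is hypothetical.

```python
from typing import Sequence


def weighted_kappa(rater_a: Sequence[int], rater_b: Sequence[int], k: int = 5) -> float:
    """Quadratic-weighted kappa for two raters using categories 0..k-1."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("need two equal-length, non-empty rating lists")

    # Observed joint proportions and each rater's marginal proportions.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1 / n
    marg_a = [sum(row) for row in observed]
    marg_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    # Quadratic disagreement weights: 0 on the diagonal, 1 at maximum distance.
    def w(i: int, j: int) -> float:
        return (i - j) ** 2 / (k - 1) ** 2

    obs_dis = sum(w(i, j) * observed[i][j] for i in range(k) for j in range(k))
    exp_dis = sum(w(i, j) * marg_a[i] * marg_b[j] for i in range(k) for j in range(k))
    if exp_dis == 0:
        raise ValueError("kappa is undefined when no disagreement is expected")
    return 1 - obs_dis / exp_dis


# One disagreement of a single scale point out of five patients.
print(round(weighted_kappa([0, 1, 2, 3, 4], [0, 1, 2, 4, 4]), 3))  # 0.957
```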
Methods and analysis
Design and registration
The review protocol was lodged with PROSPERO (International Prospective Register of Systematic Reviews) on 22 February 2017 (CRD42017057871) and updated on 23 November 2017. It was developed in accordance with the Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols (PRISMA-P) statement42 (see online supplementary appendix 2). The completed review will be reported according to the PRISMA guidelines.43 Any protocol amendments will be documented on PROSPERO, together with the date of each amendment, a description of the changes and the rationale.
Supplementary file 2
We will identify relevant peer-reviewed journal articles via a search of seven electronic databases from their inception: MEDLINE via EBSCOhost, PsycINFO via APA PsycNET, Embase via Elsevier, Cumulative Index to Nursing and Allied Health Literature via EBSCOhost, Web of Science via Thomson Reuters, Google Scholar and the Cochrane Library. Variants of the following terms will be searched as text words or keywords (as appropriate): ‘Health of the Nation Outcome Scales’ or ‘HoNOS’ (see online supplementary appendix 3 for the full search strategy). Searches were initially undertaken in January 2017 and updated in February 2018. Exploratory searches of non-English language databases (China Academic Journals of China National Knowledge Infrastructure, Scientific Electronic Library Online via Web of Science, Russian Citation Index via Web of Science and Korean Citation Index–Korean Journal Database via Web of Science) were also undertaken, but these did not improve capture of eligible studies and so were not incorporated into the search strategy. Reference lists of relevant articles will be scanned for additional studies. We will also contact the Royal College of Psychiatrists’ Research Unit in the UK for details of any additional publications they may be aware of.
Supplementary file 3
The search will focus on identifying original studies published as articles in peer-reviewed journals. Some relevant studies may be published in doctoral theses and government or other reports; however, these sources are more difficult to search comprehensively. Exploratory searches for thesis and report material yielded only a small amount of material that was not also published in peer-reviewed journal articles, hence we excluded these sources from the search strategy. Database search results, including abstracts and full-text articles as relevant, will be downloaded into the EndNote reference management software package (EndNote V.X5.0.1).
Selection of studies
A study will be included if all the following six criteria are fulfilled: (1) it is published in full in a peer-reviewed journal (ie, abstracts, letters and other short communications will be excluded unless they provide sufficient information to make the majority of required ratings); (2) its primary aim is to assess relevant measurement properties (within the domains of reliability, validity and responsiveness, as well as interpretability or feasibility/utility) for any of the HoNOS family of measures including: HoNOS, HoNOSCA, HoNOS 65+, HoNOS-secure, HoNOS-LD, HoNOS-ABI, HoNOS PbR or another HoNOS measure whose development has been described in the literature; (3) the HoNOS measure is clinician rated, regardless of mode of completion (eg, paper-and-pencil, directly entered into electronic medical record); (4) the clinical characteristics of participants are consistent with the HoNOS measures’ target population; (5) relevant results are not duplicated elsewhere (where the same data are reported in more than one source, the more comprehensive version will be included) and (6) a full text version can be obtained. We will restrict studies to those with the primary aim of developing or testing the measurement properties of a HoNOS measure, as evidence from studies designed for other purposes may be difficult to interpret.44 There will be no restrictions placed on study design, setting, context of study or language of publication. Studies using any language versions of HoNOS measures will be included.
Two reviewers will independently screen the titles and abstracts yielded by the search against the eligibility criteria. Full-text articles will be obtained when either reviewer concludes that the title or abstract indicates that a HoNOS measure was potentially used in the study, or when no abstract is available. Relevant excerpts from articles published in languages other than English will be translated into English. Two reviewers will independently read the full-text articles and decide whether they meet the eligibility criteria. Disagreements will be resolved via discussion or, where consensus cannot be reached, with recourse to a third review author.
Appraisal of the methodological quality of included studies
The primary outcomes of this review will be the measurement properties of the HoNOS measures. The scope of measurement properties considered in this review is based on the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) taxonomy.36 In this taxonomy, measurement properties are clustered into three domains: reliability (internal consistency, test–retest reliability, inter-rater reliability and measurement error), validity (content validity, construct validity and criterion validity) and responsiveness. We will include interpretability, which COSMIN does not consider a measurement property but which is an important characteristic of a measurement instrument. We will also include feasibility/utility, which is important when assessing instruments for use in routine service delivery contexts. The definitions for each domain, measurement property and specific aspects to be assessed are shown in table 1.
The methodological quality of the evaluation of measurement properties in included studies will be appraised using the COSMIN checklist.36 45 46 The checklist includes 12 boxes; each box corresponds to a measurement property or aspect and contains between 4 and 18 items, and each item is scored on a four-point rating scale (excellent, good, fair, poor) guided by descriptive anchor points.47 The items focus on study attributes specific to the evaluation of measurement properties, including the adequacy of sample sizes and whether appropriate statistical tests were performed. An overall quality score for a given measurement property is determined by taking the lowest rating assigned to any constituent item (the ‘worst score counts’ principle). For measurement properties where a particular item is consistently under-reported, we may consider reporting the overall quality ratings with and without that item, informed by existing precedents (eg, Sitnikova et al48). A quality score will not be derived for feasibility/utility because it is not included in COSMIN. For studies that used item response theory (IRT) methods, an additional set of items is provided for recording whether the requirements for IRT models have been met. These ratings are then taken into account when assigning an overall rating on each measurement property assessed in that study.
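The ‘worst score counts’ rule can be sketched in a few lines of Python. This is a minimal illustration, assuming each COSMIN box is recorded as a mapping from item identifiers to one of the four ratings; the item identifiers below are hypothetical, and the optional `exclude` argument mirrors the idea of reporting a box rating with and without a consistently under-reported item.

```python
from typing import Dict, Iterable, Optional

# COSMIN four-point item ratings, ordered from worst to best.
RATING_ORDER = ["poor", "fair", "good", "excellent"]


def box_quality(item_ratings: Dict[str, str], exclude: Iterable[str] = ()) -> Optional[str]:
    """Overall quality for one COSMIN box: the lowest rating of any item counts.

    `exclude` names items to leave out, eg to report the rating with and without
    an item that is consistently under-reported. Returns None if no items remain.
    """
    skip = set(exclude)
    kept = [rating for item, rating in item_ratings.items() if item not in skip]
    for rating in kept:
        if rating not in RATING_ORDER:
            raise ValueError(f"unknown rating: {rating!r}")
    if not kept:
        return None
    return min(kept, key=RATING_ORDER.index)


# Hypothetical item identifiers, for illustration only.
box = {"sample_size": "fair", "time_interval": "excellent",
       "statistical_method": "good", "missing_items": "poor"}
print(box_quality(box))                             # poor
print(box_quality(box, exclude=["missing_items"]))  # fair
```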
Complementary information about the quality of reporting of reliability studies will be evaluated using the Guidelines for Reporting Reliability and Agreement Studies (GRRAS).49 GRRAS comprises 15 guidelines that should be followed when agreement and reliability are reported. These cover the reporting of title and abstract (one guideline), introduction (four guidelines), methods (five guidelines), results (three guidelines), discussion (one guideline) and auxiliary material (one guideline). GRRAS guidance is narrative only; the guidelines are not accompanied by a standardised system for coding the information extracted from included studies.
Data will be extracted from each study into a series of prespecified templates. For each study, we will capture information about:
publication information (including authors, year);
questionnaire characteristics (including the HoNOS measure(s) under evaluation; version number if relevant; items used in the study or modifications made to the measure);
study characteristics (year(s) of study entry; country; language of measure(s); study design; setting; eligibility criteria; assessment occasion(s); sampling method; treatment/intervention type; experimental or routine conditions; and mode of HoNOS measure completion).
Some studies report multiple findings for a given measurement property, based on different subgroups (defined by patient characteristics, setting, assessment occasion or other study characteristics) or using more than one metric. Therefore, for each subgroup, we will capture information about:
subgroup characteristics (including clinical profile; mean age and SD or age range; percentage female; treatment setting; point in care; and rater profession/experience and training in rating HoNOS);
details of analysis (including number of patients involved; whether results pertain to scale, subscale or total score; metric reported and analyses undertaken);
results reported (values on an appropriate metric or a narrative statement of results, as appropriate).
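The two-level structure of the extraction templates (study-level information captured once, then one or more subgroup findings per study) can be sketched as a pair of Python dataclasses. This is a minimal illustration only: the class and field names are hypothetical and cover just a subset of the fields listed above, and the review team’s prespecified templates may be organised differently.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SubgroupFinding:
    """One reported result for a subgroup (hypothetical field names)."""
    clinical_profile: str
    treatment_setting: str
    rater_profession: str
    n_patients: int
    level: str                     # "scale", "subscale" or "total"
    metric: str                    # eg "ICC", "weighted kappa", "Cronbach's alpha"
    value: Optional[float] = None  # None when only a narrative statement is given
    narrative: str = ""


@dataclass
class StudyRecord:
    """Study-level fields, captured once per included study (subset only)."""
    authors: str
    year: int
    measure: str                   # eg "HoNOS", "HoNOSCA", "HoNOS 65+"
    language: str
    country: str
    design: str
    findings: List[SubgroupFinding] = field(default_factory=list)

    def values_for(self, metric: str, level: str) -> List[float]:
        """Numeric results for one metric at one level, skipping narrative-only entries."""
        return [f.value for f in self.findings
                if f.metric == metric and f.level == level and f.value is not None]
```

Keeping numeric values and narrative statements in the same record, with `value` left empty when only a narrative statement is reported, lets the same template feed both the narrative synthesis and any later pooling.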
The COSMIN checklist with four-point rating scale will be completed for each study. For studies reporting on reliability, additional templates will be developed to capture the information required to assess the study against the GRRAS. For example, to assess the quality of reporting against GRRAS guideline 13 ‘report estimates of reliability and agreement including statistical measures of uncertainty’, information to be extracted will include: the reliability or agreement metric used; whether or not uncertainty was reported; whether a sample size calculation was performed; whether results were pooled across subgroups and, if so, the pooling procedure and evidence of heterogeneity.
Data extraction will be undertaken independently by two reviewers; discrepancies that cannot be resolved via discussion will be referred to a third review author. Where necessary, we will contact authors for clarification of published data. The data extraction templates will be piloted on several studies and modified as needed before use.
Methodological appraisal of the measurement properties within the reliability, validity and responsiveness domains will be interpreted using the adequacy criteria proposed by the COSMIN group,50 modified as necessary by drawing on precedents set by previous reviews.24 51–53 For each measurement property examined in a given study, the reported findings will be compared with a statistical threshold for adequacy and rated on a three-point system indicating whether or not the threshold is met (‘+’, positive rating; ‘?’, indeterminate rating; ‘−’, negative rating). By way of example, the original COSMIN adequacy criteria for inter-rater reliability are: ‘+’, ICC or weighted kappa ≥0.70; ‘?’, ICC or weighted kappa not reported; and ‘−’, ICC or weighted kappa <0.70. For interpretability and feasibility/utility, information regarding relevant aspects will be evaluated against available thresholds or, where thresholds are not available, via a narrative synthesis.24 50
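The three-point adequacy rating for inter-rater reliability can be expressed as a small Python function. This is a minimal sketch: the default threshold of 0.70 follows the original COSMIN criterion quoted above, while exposing it as a parameter is an assumption intended to accommodate modified criteria; the function name is hypothetical, and the ASCII ‘-’ stands in for the ‘−’ used in the text.

```python
from typing import Optional


def rate_inter_rater_reliability(estimate: Optional[float], threshold: float = 0.70) -> str:
    """Adequacy rating for inter-rater reliability.

    `estimate` is the reported ICC or weighted kappa (None if not reported).
    Returns '+' (meets threshold), '?' (indeterminate) or '-' (negative rating).
    """
    if estimate is None:
        return "?"
    if not -1.0 <= estimate <= 1.0:
        raise ValueError("ICC or kappa should lie in [-1, 1]")
    return "+" if estimate >= threshold else "-"


print([rate_inter_rater_reliability(v) for v in (0.82, None, 0.55)])  # ['+', '?', '-']
```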
We will start by conducting a narrative synthesis of the information captured in the templates, comparing and contrasting methodological parameters and results across studies and over time. Following COSMIN recommendations,44 a best evidence synthesis will then be undertaken at the measurement property level. First, an overall rating (positive, indeterminate, negative) of the adequacy of each measurement property will be made. Second, a level of evidence (strong, moderate, limited, conflicting, unknown) will be assigned to each measurement property, taking into account the methodological quality of the included studies and the consistency of their findings. The ratings will be guided by the original criteria developed by Terwee and colleagues47 and any relevant modifications.24 Qualitative findings regarding interpretability and feasibility/utility will be synthesised narratively. For reliability studies, quality of reporting information will be synthesised narratively, guided by the GRRAS.49
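The level-of-evidence step can be sketched as follows. This Python function encodes one plausible reading of the Terwee-style criteria, and should be checked against the original before use: strong evidence means consistent findings in at least two studies of good or excellent quality, or in one excellent study; moderate means consistent findings in several studies of at least fair quality, or in one good study; limited means a single fair study; conflicting means inconsistent findings; and unknown means only poor-quality studies. The function name is hypothetical, and setting aside poor-quality studies before checking consistency is an assumption.

```python
from typing import List, Tuple

QUALITY_RANK = {"poor": 0, "fair": 1, "good": 2, "excellent": 3}


def level_of_evidence(studies: List[Tuple[str, str]]) -> str:
    """Assign a level of evidence for one measurement property.

    `studies` holds (COSMIN quality, adequacy finding) pairs, with findings '+' or '-'.
    The direction of the evidence is given by the shared finding.
    """
    usable = [(q, f) for q, f in studies if QUALITY_RANK[q] > 0]
    if not usable:
        return "unknown"            # only poor-quality studies (or none)
    if len({f for _, f in usable}) > 1:
        return "conflicting"        # positive and negative findings disagree
    n_good_plus = sum(1 for q, _ in usable if QUALITY_RANK[q] >= 2)
    if n_good_plus >= 2 or any(q == "excellent" for q, _ in usable):
        return "strong"
    if len(usable) >= 2 or n_good_plus == 1:
        return "moderate"
    return "limited"                # a single fair-quality study


print(level_of_evidence([("good", "+"), ("fair", "+")]))  # moderate
```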
In undertaking these syntheses, we will initially consider information from different language versions of the same measure separately, as it cannot be assumed that different language versions will have the same measurement properties.53
In addition, where the volume of available data is adequate, meta-analyses will be undertaken using Comprehensive Meta-Analysis software V.3 to generate pooled estimates of relevant measurement property metrics (eg, ICC, kappa, Cronbach’s alpha and Pearson’s correlation coefficient) for individual scales, subscales and total scores. Syntheses will involve combining data at the level reported in the paper (eg, individual scale, subscale, 10-item total or 12-item total), not across levels. It is not our intention to synthesise data across variants of the HoNOS (eg, HoNOSCA and HoNOS-LD) or across versions of a given HoNOS measure (eg, if modifications have been made).
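Correlation-type metrics are usually pooled on a transformed scale. The minimal Python sketch below assumes Pearson correlations with known sample sizes. It applies Fisher’s z transformation (z = atanh r, with sampling variance 1/(n − 3)), takes an inverse-variance weighted mean and back-transforms the estimate and its 95% CI with tanh. This is a generic fixed-effect illustration, not the procedure implemented in Comprehensive Meta-Analysis. ICCs and Cronbach’s alpha have their own variance formulas and are not handled here, and the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def pool_correlations_fixed(rs: Sequence[float],
                            ns: Sequence[int]) -> Tuple[float, Tuple[float, float]]:
    """Fixed-effect pooled Pearson r via Fisher's z, with a 95% CI."""
    if not rs or len(rs) != len(ns):
        raise ValueError("rs and ns must be non-empty and the same length")
    if any(abs(r) >= 1 for r in rs):
        raise ValueError("correlations must lie strictly between -1 and 1")
    if any(n <= 3 for n in ns):
        raise ValueError("each study needs n > 3")
    zs = [math.atanh(r) for r in rs]        # Fisher's z transform
    ws = [n - 3 for n in ns]                # weight = 1 / var(z) = n - 3
    z_bar = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    se = 1 / math.sqrt(sum(ws))
    lo, hi = z_bar - 1.96 * se, z_bar + 1.96 * se
    return math.tanh(z_bar), (math.tanh(lo), math.tanh(hi))  # back to the r scale
```

Pooling on the z scale keeps the confidence interval within (−1, 1) after back-transformation, which a simple average of raw correlations would not guarantee.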
We will start by including all relevant values on the properties of interest in a meta-analysis and then investigate whether heterogeneity is lower for subgroups that would be expected to be more homogeneous by design. We will assess statistical heterogeneity using the I² statistic and, where heterogeneity is present, will use a random-effects model. If indicated, outcomes for subgroups of interest (eg, treatment setting, rater profession/experience or training) will be explored using meta-regression or subgroup analysis. Where there are more than 10 studies for a particular outcome, we will assess publication bias using funnel plots. To examine the impact of study quality, we will conduct sensitivity analyses excluding studies of ‘poor’ quality according to the COSMIN checklist. Where pooling of data across studies is not feasible, we will undertake narrative syntheses to explore differences in findings across subgroups and according to study quality.
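For readers unfamiliar with the heterogeneity statistics involved, the Python sketch below shows a standard DerSimonian–Laird random-effects calculation, including Cochran’s Q, the between-study variance τ² and I² (the percentage of total variation attributable to heterogeneity rather than chance). The protocol does not name a between-study variance estimator, so the choice of DerSimonian–Laird here is an assumption, and the function name is hypothetical. Effects must be on an additive scale, eg Fisher’s z-transformed correlations with variance 1/(n − 3).

```python
import math
from typing import Dict, Sequence


def random_effects_dl(effects: Sequence[float], variances: Sequence[float]) -> Dict[str, float]:
    """DerSimonian-Laird random-effects pooling with Cochran's Q and I^2.

    `effects` must be on an additive scale (eg Fisher's z); `variances` are the
    corresponding within-study sampling variances.
    """
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies, each with a variance")
    w = [1.0 / v for v in variances]                       # inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = k - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                          # between-study variance
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0  # % of variation from heterogeneity
    w_re = [1.0 / (v + tau2) for v in variances]           # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return {"pooled": pooled, "se": 1.0 / math.sqrt(sum(w_re)),
            "Q": q, "tau2": tau2, "I2": i2}
```

A sensitivity analysis of the kind described above amounts to calling the same function again on the subset of studies not rated ‘poor’ and comparing the pooled estimates.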
Ethics and dissemination
No primary data will be collected, therefore ethics approval is not required. We intend to summarise the outcomes of this study in one or more reports and/or presentations to the funder of this review. It is also planned that outcomes will be reported in peer-reviewed journal publications and in presentations at academic conferences and other meetings.
Contributors MGH and PMB are the guarantors and they conceived the study. MGH, PMB and JP designed the study, and CS, RS, TC, TR, SK, KH-B and JS provided critical inputs. The search strategy was developed by MGH, PMB, JP, CS, RS and SK. The selection criteria were developed by MGH, PMB, JP and CS. The analysis strategy was developed by MGH, PMB, CS, SK, KH-B and JS. Preliminary searches were conducted by RS and CS. Data extraction templates were developed by MGH and CS. MGH and CS drafted the protocol manuscript, and RS, TC, JP, TR, SK, KH-B, JS and PMB provided critical revisions. The final draft of the protocol manuscript was approved by all authors.
Funding This work is being led by the Australian Mental Health Outcomes and Classification Network which is funded by the Australian Government Department of Health.
Disclaimer The funding source had no involvement in the study design; collection, analysis and interpretation of data; the writing of the protocol; or in the decision to submit the protocol for publication.
Competing interests MGH, CS, RS, TC, JP, SK and PMB report that they or their institution received funding from the Australian Government Department of Health to conduct this work. TR, KH-B and JS have nothing to disclose.
Patient consent Not required.
Provenance and peer review Not commissioned; externally peer reviewed.