
Original research
Clinical outcome measures and their evidence base in degenerative cervical myelopathy: a systematic review to inform a core measurement set (AO Spine RECODE-DCM)
  1. Alvaro Yanez Touzet1,
  2. Aniqah Bhatti2,
  3. Esmee Dohle2,
  4. Faheem Bhatti2,
  5. Keng Siang Lee3,
  6. Julio C Furlan4,5,6,
  7. Michael G Fehlings7,
  8. James S Harrop8,
  9. Carl Moritz Zipser9,
  10. Ricardo Rodrigues-Pinto10,11,
  11. James Milligan12,
  12. Ellen Sarewitz13,
  13. Armin Curt9,
  14. Vafa Rahimi-Movaghar14,
  15. Bizhan Aarabi15,
  16. Timothy F Boerger16,
  17. Lindsay Tetreault17,
  18. Robert Chen17,18,
  19. James D Guest19,
  20. Sukhvinder Kalsi-Ryan6,
  21. Angus GK McNair20,21,
  22. Mark Kotter22,23,
  23. Benjamin Davies24
  24. On behalf of the AO Spine RECODE-DCM Steering Committee
  1. 1School of Medical Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, Manchester, Manchester, UK
  2. 2School of Clinical Medicine, University of Cambridge, Cambridge, UK
  3. 3Bristol Medical School, Faculty of Health Sciences, University of Bristol, Bristol, UK
  4. 4Department of Medicine, Division of Physical Medicine and Rehabilita, University of Toronto, Toronto, Ontario, Canada
  5. 5Division of Physical Medicine and Rehabilitation, Toronto Rehabilitation Institute, University Health Network, Toronto, Ontario, Canada
  6. 6KITE Research Institute, University Health Network, Toronto, Ontario, Canada
  7. 7Division of Neurosurgery and Spinal Program, Toronto Western Hospital, University of Toronto, Toronto, Ontario, Canada
  8. 8Thomas Jefferson University, Jefferson Health System, St Louis, Philadelphia, USA
  9. 9Spinal Cord Injury Center, Balgrist University Hospital, Zurich, Switzerland
  10. 10Spinal Unit (UVM), Department of Orthopaedics, Centro Hospitalar Universitário do Porto, Porto, Portugal
  11. 11Instituto de Ciências Biomédicas Abel Salazar, Porto, Portugal
  12. 12Department of Family Medicine, McMaster University, Hamilton, Ontario, Canada
  13. 13Myelopathy.org, Cambridge, UK
  14. 14Academic Department of Neurological Surgery, Sina Trauma and Surgery Research Center, Tehran University of Medical Sciences, Tehran, Tehran, Iran
  15. 15Division of Neurosurgery, University of Maryland School of Medicine, Baltimore, Maryland, USA
  16. 16Department of Neurosurgery, Medical College of Wisconsin, Milwaukee, Wisconsin, USA
  17. 17Toronto Western Hospital, University of Toronto, Toronto, Ontario, Canada
  18. 18Krembil Research Institute, Toronto, Ontario, Canada
  19. 19Department of Neurological Surgery and The Miami Project to Cure Paralysis, University of Miami Miller School of Medicine, Miami, Florida, USA
  20. 20Centre for Surgical Research, Bristol Medical School: Population Health Sciences, University of Bristol, Bristol, Avon, UK
  21. 21GI Surgery, North Bristol NHS Trust, Bristol, UK
  22. 22Department of Clinical Neurosurgery, University of Cambridge, Cambridge, UK
  23. 23Department of Clinical Neurosciences, Ann McLaren Laboratory of Regenerative Medicine, Cambridge, UK
  24. 24Department of Clinical Neurosciences, University of Cambridge, Cambridge, UK
  1. Correspondence to Dr Benjamin Davies; bd375{at}cam.ac.uk

Abstract

Objectives To evaluate the measurement properties of outcome measures currently used in the assessment of degenerative cervical myelopathy (DCM) for clinical research.

Design Systematic review

Data sources MEDLINE and EMBASE were searched through 4 August 2020.

Eligibility criteria Primary clinical research published in English and whose primary purpose was to evaluate the measurement properties or clinically important differences of instruments used in DCM.

Data extraction and synthesis Psychometric properties and clinically important differences were both extracted from each study, assessed for risk of bias and presented in accordance with the Consensus-based Standards for the selection of health Measurement Instruments criteria.

Results Twenty-nine outcome instruments were identified from 52 studies published between 1999 and 2020. They measured neuromuscular function (16 instruments), life impact (five instruments), pain (five instruments) and radiological scoring (five instruments). No instrument had evaluations for all 10 measurement properties and <50% had assessments for all three domains (ie, reliability, validity and responsiveness). There was a paucity of high-quality evidence. Notably, there were no studies that reported on structural validity and no high-quality evidence that discussed content validity. In this context, we identified nine instruments that are interpretable by clinicians: the arm and neck pain scores; the 12-item and 36-item short form health surveys; the Japanese Orthopaedic Association (JOA) score, modified JOA and JOA Cervical Myelopathy Evaluation Questionnaire; the neck disability index; and the visual analogue scale for pain. These include six scores with barriers to application and one score with insufficient criterion and construct validity.

Conclusions This review aggregates studies evaluating outcome measures used to assess patients with DCM. Overall, there is a need for a set of agreed tools to measure outcomes in DCM. These findings will be used to inform the development of a core measurement set as part of AO Spine RECODE-DCM.

  • degenerative cervical myelopathy
  • cervical spondylotic myelopathy
  • spinal cord compression
  • outcome measures
  • core measurement set

Data availability statement

All data relevant to the study are included in the article or uploaded as supplementary information.


This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.


Strengths and limitations of this study

  • Consensus-based reporting guidelines were used to evaluate the properties and clinically important differences of degenerative cervical myelopathy measurement instruments.

  • Only instruments that are currently in use were evaluated in this study.

  • Interpretability was used as an important characteristic to make recommendations, a posteriori, due to the absence of category A Consensus-based Standards for the selection of health Measurement Instruments (COSMIN) recommendations.

  • Interpretability and feasibility were evaluated using bespoke criteria adapted, a priori, from the COSMIN methodology.

Introduction

The most common adult spinal cord disease, degenerative cervical myelopathy (DCM), is both measured and reported inconsistently across clinical research.1–4 DCM is a progressive spinal cord disease caused by degenerative changes in the cervical spine that lead to stress and injury to the cervical spinal cord. It usually initially presents as a loss of digital dexterity, subtle gait disturbances and mild pain which, if left untreated, can potentially lead to tetraplegia and wheelchair dependence.5

In 2019, AO Spine launched the Research Objectives and Common Data Elements for Degenerative Cervical Myelopathy (AO Spine RECODE-DCM; www.aospine.org/recode) initiative with the aim of creating a ‘research toolkit’ to help accelerate knowledge discovery and improve outcomes in DCM.3 6 The initiative identified the need to improve consistency in measurement and reporting across DCM research to enable studies to be compared and/or aggregated, and to ensure the most meaningful aspects of the disease are captured.7 8 This process started by creating a list of essential outcomes (ie, core outcome set) and baseline characteristics (ie, core data elements). To truly enable consistent reporting, however, these datasets should be partnered with a core measurement set (CMS): a set of agreed tools that are used to measure the outcomes and data elements of DCM.9–17

Several approaches have been employed to form a CMS, ranging from the development of novel measurement instruments to adopting the use of existing ones.18–20 For AO Spine RECODE-DCM, it was decided to recommend existing instruments and, preferably, those already used in DCM. This was to allow a more rapid introduction of the CMS, cognisant that many new tools are in development and the CMS can be updated in the future.

Consequently, we sought to examine the tools used in DCM research and assess their quality21 using objective criteria. In recognition of variable quality among reported outcome measures, the Consensus-based Standards for the selection of health Measurement Instruments (COSMIN) initiative has developed clinimetric tools to assess instrument quality.22 We searched the literature for studies evaluating one or more psychometric properties defined by the COSMIN guidelines, as well as studies that defined clinically important differences such as the minimal clinically important difference (MCID) and substantial clinical benefit (SCB). Data were rated, aggregated and assessed for methodological bias using the COSMIN manual for systematic reviews of patient-reported outcome measures (PROMs).23–25 This work builds on the protocol for the AO Spine RECODE-DCM initiative3 6 26 and complements two earlier reviews of outcome measures in DCM.2 21

Methods

Search

A search string was developed to identify original research assessing the psychometric properties of instruments currently used in the clinical research of DCM.27 This comprised synonyms of ‘psychometric’ and ‘DCM’ (online supplemental table 1). The search was developed with oversight of a medical librarian (IK) and informed by previously developed search filters for DCM.27–29 The search was applied to MEDLINE and EMBASE, from inception until 4 August 2020, using OVID (Wolters Kluwer, Netherlands). The search also focused on DCM tools identified in previous scoping reviews.2 21 30

Study selection

All titles and abstracts were screened independently against a set of predefined eligibility criteria by four reviewers (AYT, AB, ED and FB). The full list of inclusion and exclusion criteria is given in table 1.

Table 1

Inclusion and exclusion criteria

Potentially eligible studies were selected for full-text analysis. In the event of multiple publications analysing the same cohort for the same purpose, the most recent paper was used for evaluation. At each stage, two reviewers (from AYT, AB, ED and FB) independently reviewed all the screened studies for inclusion to ensure reliability of study selection (online supplemental table 2). Disagreements were resolved by consensus or appeal to a third senior reviewer (BMD).

Quality assessment

The quality of included studies was assessed using the COSMIN risk of bias checklist.23–25 Briefly, the COSMIN risk of bias tool assesses 10 measurement properties, including nine psychometric properties (ie, content validity, structural validity, internal consistency, cross-cultural validity/measurement invariance, reliability, measurement error, criterion validity, hypotheses testing for construct validity and responsiveness) and clinically important differences. A list of definitions is presented in table 2. Interpretability and feasibility were also evaluated using criteria adapted a priori from the COSMIN methodology (online supplemental tables 3 and 4, respectively). Namely, interpretability was evaluated for each measurement instrument through the availability of anchor-based MCIDs,23–25 while feasibility was assessed with respect to the ease of application of the instrument.

Table 2

Definitions of domains, measurement properties and aspects of measurement properties, adapted from the COSMIN guidelines23–25 48 and studies of clinically important differences49 50

The methodological quality of each study was scored as ‘very good’, ‘adequate’, ‘doubtful’, ‘inadequate’ or ‘not applicable’. Overall ratings were then made for each property using the modified Grading of Recommendations Assessment, Development, and Evaluation approach from the COSMIN risk of bias checklist.23–25 For each study, one review author (AYT) assessed the quality, feasibility and interpretability from included studies and a second (BD) checked the assessments. Disagreements were resolved by consensus.
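The grading step described above can be illustrated with a minimal sketch. It assumes that evidence starts at 'high' quality and is downgraded one level per penalty point for risk of bias, inconsistency, indirectness and imprecision. The imprecision thresholds used here (one level for a total sample below 100, two levels below 50) are an assumption based on our reading of the COSMIN manual; only the <100 threshold is mentioned in this article. The function names are hypothetical.

```python
# Quality levels ordered from best to worst.
LEVELS = ["high", "moderate", "low", "very low"]


def imprecision_penalty(total_n: int) -> int:
    """Downgrade steps for imprecision (assumed thresholds: <50 and <100)."""
    if total_n < 50:
        return 2
    if total_n < 100:
        return 1
    return 0


def grade_quality(total_n: int, risk_of_bias: int = 0,
                  inconsistency: int = 0, indirectness: int = 0) -> str:
    """Start at 'high' and move down one level per penalty point."""
    steps = (risk_of_bias + inconsistency + indirectness
             + imprecision_penalty(total_n))
    # Quality cannot fall below 'very low'.
    return LEVELS[min(steps, len(LEVELS) - 1)]
```

Under these assumptions, a pooled sample of 868 patients with no other concerns stays at 'high', whereas a single study of 80 patients drops to 'moderate' on imprecision alone.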

Data extraction

A proforma adapted from COSMIN was employed by one reviewer (AYT) to extract the following: study details, sample size, patient demographics, measurement properties and qualitative and/or quantitative results for each property. This was checked by a second reviewer (BD) and any disagreements were resolved by consensus. Examples of qualitative and quantitative results included observations (eg, narrative syntheses) and statistics (eg, correlation coefficients). These result types are specific for each measurement property and are listed in the COSMIN guidelines.23–25

Data analysis

Each result was rated as ‘sufficient’, ‘indeterminate’ or ‘insufficient’. All results were qualitatively summarised and given an overall rating as ‘sufficient’, ‘indeterminate’, ‘inconsistent’ or ‘insufficient’. The definitions of these ratings are available in the COSMIN guidelines.23–25 Measurement instruments were categorised into three recommendation groups:

  A. Instruments with evidence of sufficient content validity and at least low-quality evidence of sufficient internal consistency.

  B. Instruments not categorised in A or C.

  C. Instruments with high-quality evidence of an insufficient measurement property.23–25
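The three recommendation groups listed above (referred to in the Results as categories A, B and C) amount to a simple decision rule, sketched here for illustration. The class and function names are hypothetical. The sketch also assumes that high-quality evidence of an insufficient property takes precedence when an instrument would otherwise qualify for the first group; the article does not state how such a conflict is resolved.

```python
from dataclasses import dataclass


@dataclass
class PropertyEvidence:
    rating: str   # 'sufficient', 'indeterminate', 'inconsistent' or 'insufficient'
    quality: str  # 'high', 'moderate', 'low' or 'very low'


def recommendation_category(evidence: dict) -> str:
    """Map {property name: PropertyEvidence} to category 'A', 'B' or 'C'."""
    # Category C: any property with high-quality evidence of insufficiency
    # (assumed to take precedence over category A).
    for ev in evidence.values():
        if ev.rating == "insufficient" and ev.quality == "high":
            return "C"

    content = evidence.get("content validity")
    internal = evidence.get("internal consistency")
    # Category A: sufficient content validity (any quality level) and
    # at least low-quality evidence of sufficient internal consistency.
    if (content is not None and content.rating == "sufficient"
            and internal is not None and internal.rating == "sufficient"
            and internal.quality in {"low", "moderate", "high"}):
        return "A"

    # Category B: everything else.
    return "B"
```

An instrument whose only evidence is rated 'indeterminate' therefore falls into category B by default, which is why missing evidence and poor evidence are not distinguished by the category alone.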

Recommendations for each instrument were presented in tandem with interpretability and feasibility assessments and reported as a narrative synthesis.31 We used the Preferred Reporting Items for Systematic Reviews and Meta-Analyses checklist when writing our report.32

Patient and public involvement

This project forms part of a larger, international multi-stakeholder co-production initiative called AO Spine RECODE-DCM, which aims to develop a framework to accelerate knowledge discovery that can improve outcomes in DCM. Patients and the public were therefore involved in its overall design, conduct, management, and dissemination, and are recognised among the authors of this article. For further information, please refer to www.aospine.org/recode.

Results

Literature search

The primary literature search identified a total of 3239 unduplicated studies (MEDLINE: 2389, EMBASE: 1550). Abstract and full-text screening excluded 3187 studies. Therefore, this review included a total of 52 studies (figure 1 and online supplemental table 2).
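The counts above can be cross-checked with a few lines of arithmetic. This sketch assumes that the per-database figures are record counts before deduplication; the number of duplicates is derived here and is not reported in the article.

```python
# Figures taken from the paragraph above.
medline_records = 2389
embase_records = 1550
unique_records = 3239
excluded_records = 3187

# Derived value (assumes per-database counts precede deduplication).
duplicates_removed = medline_records + embase_records - unique_records

# Studies remaining after abstract and full-text screening.
included_studies = unique_records - excluded_records

print(duplicates_removed, included_studies)
```

The second value matches the 52 studies included in the review.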

Figure 1

Preferred Reporting Items for Systematic Reviews and Meta-Analyses flow chart. A systematic review of MEDLINE and EMBASE was conducted through 4 August 2020 to identify original research on the measurement properties of instruments currently used in degenerative cervical myelopathy research.

Study properties

The 52 included studies assessed a total of 7395 patients worldwide (female: 3217, male: 4178) with 29 instruments (table 3). These were classified into four domains based on the DCM core outcome set33: neuromuscular function, life impact, pain, and radiological scoring.

Table 3

Study properties

Table 4

Summary of findings

Measurement properties

The measurement properties of the 29 instruments were evaluated using the COSMIN methodology for systematic reviews.23–25 A summary of findings is presented in table 4: (1) the overall feasibility rating, (2) the overall interpretability rating and (3) the overall recommendation category based on existing evidence. Included studies reported on at least one of the 10 COSMIN properties for all instruments. No instrument had evidence for all 10 properties and <50% (13/29) of instruments had evidence for at least one property per measurement domain (figure 2).
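To make the domain-coverage criterion concrete, the following sketch checks whether an instrument has at least one evaluated property in each of the three measurement domains. The property-to-domain mapping follows the COSMIN taxonomy as we understand it and is an assumption here, since table 2 is not reproduced in this text; clinically important differences are deliberately left unmapped. The function name is hypothetical.

```python
# Assumed mapping of the nine psychometric properties to COSMIN domains.
DOMAIN_OF_PROPERTY = {
    "internal consistency": "reliability",
    "reliability": "reliability",
    "measurement error": "reliability",
    "content validity": "validity",
    "structural validity": "validity",
    "cross-cultural validity": "validity",
    "criterion validity": "validity",
    "construct validity": "validity",
    "responsiveness": "responsiveness",
}
ALL_DOMAINS = {"reliability", "validity", "responsiveness"}


def covers_all_domains(evaluated_properties) -> bool:
    """True if at least one evaluated property falls in every domain."""
    domains = {DOMAIN_OF_PROPERTY[p] for p in evaluated_properties
               if p in DOMAIN_OF_PROPERTY}
    return domains == ALL_DOMAINS


# Share of instruments meeting the criterion, using the reported counts.
share_covering_all_domains = 13 / 29
```

With the reported counts, the share is just under 45%, consistent with the '<50%' stated above.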

Figure 2

Number of studies for each outcome measure and property (normalised). Included studies reported on at least one of the 10 COSMIN properties for all instruments. No instrument had evidence for all 10 properties and <50% (13/29) of instruments had evidence for at least one property per measurement domain (see table 2 for definitions). Notably, no instruments were evaluated for structural validity, attained sufficient evidence for content validity or obtained a category A recommendation based on COSMIN criteria. 30MWT, 30‐m Walking Test; BBS, Berg Balance Scale; COSMIN, Consensus-based Standards for the selection of health Measurement Instruments; EQ-5D, EuroQol-5 Dimension; JOA, Japanese Orthopaedic Association; JOACMEQ, Japanese Orthopaedic Association Cervical Myelopathy Evaluation Questionnaire; MDI, Myelopathy Disability Index; mJOA, modified Japanese Orthopaedic Association; NDI, Neck Disability Index; P-mJOA, patient-derived version of the mJOA; SF-12, 12-Item Short Form Health Survey; SF-36, 36-Item Short Form Health Survey; VAS, Visual Analogue Scale; WHOQOL-Bref, World Health Organisation Quality of Life

Content validity

Only three measurement instruments were evaluated for content validity: the JOA Cervical Myelopathy Evaluation Questionnaire (JOACMEQ), the modified JOA (mJOA) score and the Berg Balance Scale (BBS) (online supplemental table 5). The overall ratings for content validity, however, were indeterminate due to the uncertainty of the methods used to assess comprehensibility, and the very low quality of the evidence.

Structural validity

No instruments were assessed for structural validity.

Internal consistency

Ten measurement instruments were evaluated for internal consistency, including the JOACMEQ, JOA, mJOA, 12-Item Short Form Health Survey (SF-12) and SF-36 (online supplemental table 6). Since structural validity is required for the interpretation of internal consistency, the overall ratings for internal consistency were indeterminate, given the aforementioned absence of studies on structural validity.

Cross-cultural validity

Only three measurement instruments were evaluated for cross-cultural validity: JOACMEQ, JOA and mJOA (online supplemental table 7). The overall ratings were indeterminate due to the absence of multiple-group factor analyses and differential item functioning analyses. The quality of evidence was also very low due to the uncertainty of the approaches used to analyse the data.

Reliability

Seventeen measurement instruments were evaluated for reliability, including JOACMEQ, JOA and mJOA (online supplemental table 8). The reported measures of reliability were test–retest reliability, intraobserver reliability and interobserver reliability. No instrument attained high-quality evidence for sufficient or insufficient reliability due to (1) imprecision (sample sizes <100), (2) serious inconsistency and/or (3) serious risk of bias.

Measurement error

Nine instruments were evaluated for measurement error, including JOACMEQ, JOA, mJOA, NDI, SF-36 and Visual Analogue Scale (VAS) for pain (online supplemental table 9). The measures of error reported were minimal detectable change and distribution-based MCID.23–25 34 The mJOA was the only score to attain high-quality evidence for sufficiency (distribution-based MCID range: 1.2–1.4, total sample size: 868). Due to the inconsistency of results, the quality of the evidence of most other instruments could not be rated.
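For readers less familiar with these error measures, the sketch below shows one common way of deriving them. It assumes the standard error of measurement (SEM) is estimated from a baseline SD and a reliability coefficient such as the intraclass correlation coefficient (ICC), and that the minimal detectable change is taken at the 95% confidence level. The included studies may have used other formulas, and the input values in the usage note are hypothetical.

```python
import math


def standard_error_of_measurement(sd: float, icc: float) -> float:
    """SEM = SD * sqrt(1 - ICC)."""
    return sd * math.sqrt(1.0 - icc)


def minimal_detectable_change_95(sem: float) -> float:
    """MDC95 = 1.96 * sqrt(2) * SEM.

    The sqrt(2) term reflects error in both of the two measurements
    that make up a change score.
    """
    return 1.96 * math.sqrt(2.0) * sem
```

For a hypothetical instrument with a baseline SD of 3.0 points and an ICC of 0.90, `standard_error_of_measurement(3.0, 0.90)` gives roughly 0.95 points, and the corresponding MDC95 is roughly 2.6 points; changes smaller than this cannot be distinguished from measurement error.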

Criterion validity

Twelve measurement instruments were evaluated for criterion validity, including the JOACMEQ, JOA, mJOA, NDI and SF-36 (online supplemental table 10). Both the mJOA and the patient-derived version of the mJOA (P-mJOA) attained high-quality evidence for sufficient criterion validity as whole scales. However, three of four items of the mJOA, along with the 10 s step test and foot tapping test, attained high-quality evidence for insufficient criterion validity (ie, these subdomains lack criterion validity for their use as separate measures). The quality of the evidence of most of the remaining instruments was not high due to (1) imprecision (ie, sample sizes <100) or (2) important methodological flaws in the design or statistical methods.

Construct validity

Sixteen measurement instruments were evaluated for construct validity, including JOACMEQ, JOA, mJOA, NDI, arm and neck pain scores and SF-12 (online supplemental table 11). From these, 8 of 16 attained high-quality evidence for sufficient construct validity; these included the NDI, arm and neck pain scores and SF-12. Two instruments achieved high-quality evidence for insufficient construct validity. Notably, the mJOA had both high-quality sufficiency and insufficiency depending on the comparator tool (eg, sufficiency with respect to the NDI and SF-36 and insufficiency with respect to the 30 m walking test (30MWT) and EuroQol-5 Dimension (EQ-5D)). While the designs and statistical methods applied were adequate for the research questions posed, the quality of the evidence of most of the remaining tools ranged from ‘low’ to ‘moderate’ due to imprecision (ie, sample sizes <100). Importantly, only one study formulated a hypothesis a priori.35

Responsiveness

Sixteen measurement instruments were evaluated for responsiveness, including the JOACMEQ, JOA, mJOA, NDI, SF-12 and SF-36 (online supplemental table 12). The mJOA was the only score to attain high-quality evidence for sufficient responsiveness (effect size range: 0.87–1.0, total sample size: 352). The 30MWT, on the other hand, was the only score to attain high-quality evidence for insufficient responsiveness (standardised response mean: 0.3, total sample size: 484). The quality of the evidence of most of the remaining tools ranged from ‘very low’ to ‘moderate’ due to (1) imprecision (ie, sample sizes <100) and (2) uncertainty of the statistical methods.
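The two responsiveness statistics quoted above differ only in their denominator. The sketch below uses their usual definitions (effect size: mean change divided by the SD of baseline scores; standardised response mean: mean change divided by the SD of the change scores). It is not taken from the included studies, which may have used variants, and the function names are hypothetical.

```python
from statistics import mean, stdev


def effect_size(baseline, follow_up) -> float:
    """Mean change divided by the SD of the baseline scores."""
    change = [f - b for b, f in zip(baseline, follow_up)]
    return mean(change) / stdev(baseline)


def standardised_response_mean(baseline, follow_up) -> float:
    """Mean change divided by the SD of the change scores."""
    change = [f - b for b, f in zip(baseline, follow_up)]
    return mean(change) / stdev(change)
```

Because the standardised response mean depends on how variable the change is between patients, an instrument can show a large effect size and a small standardised response mean (or the reverse) in the same cohort, so the two statistics are not interchangeable across studies.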

Clinically important differences

Ten measurement instruments were evaluated for clinically important differences, including the JOACMEQ, JOA, mJOA, NDI, arm and neck pain scores, SF-12, SF-36 and VAS for pain (online supplemental table 3). From these, 7 of 10 attained a sufficient rating, including the JOACMEQ, JOA, mJOA, NDI and SF-36. Only anchor-based measures were accepted for the assessment of the MCID.23–25 36–39

Interpretability and feasibility

Interpretability and feasibility were described using criteria adapted from the COSMIN methodology (online supplemental tables 3 and 4, respectively). Interpretability was summarised in terms of the degree to which clinicians may assign qualitative meaning to the scores or change in scores (ie, the clinically important differences), while feasibility was described in terms of the ease of application of the measurement instrument. No or minimal application barriers were identified for most outcome measures (table 4). Nine instruments were, however, deemed uninterpretable due to the absence of anchor-based MCIDs.23–25

Recommendations

No category A recommendations were made as no measurement instrument had sufficient evidence for content validity (table 4 and figure 2). Furthermore, five instruments were recommended for category C due to the availability of high-quality evidence for insufficient criterion validity, construct validity and/or responsiveness. Most instruments were classed into category B due to the notable absence of high-quality evidence for most measurement properties.

In light of these results, and given both (1) the very strict quality standards of the COSMIN framework and (2) that the absence of category A evidence is not the same as presence of poor-quality evidence, we propose that instruments most suitable for use should be interpretable by clinicians and offer qualitative meaning to either clinicians or people with lived experience of DCM (ie, they should have an available assessment of clinically important differences). To this end, the measurement properties of the nine interpretable instruments are presented in table 5: the arm and neck pain scores; SF-12 and SF-36; JOA, mJOA and JOACMEQ; NDI; and VAS for pain. These include one score with insufficient criterion and construct validity (ie, mJOA) and six scores with barriers to application.

Table 5

Interpretable measurement instruments

Discussion

DCM is measured and reported inconsistently across clinical trials.1–4 In light of these inconsistencies, AO Spine launched RECODE-DCM (www.aospine.org/recode) with the aim of creating a ‘research toolkit’ that helps to accelerate knowledge discovery and improve outcomes in DCM. One of the objectives of the RECODE-DCM initiative was to develop a CMS.3 6 26 This systematic review is an initial step towards building this CMS: it identifies tools that have been used in DCM research and examines their quality in accordance with the COSMIN standards.23–25

Overall, we identified 29 instruments with at least 1 of 10 measurement properties evaluated (figure 2); none, however, had evaluations for all 10 properties and <50% had at least one property evaluated per measurement domain (ie, reliability, validity and responsiveness) (table 2). We also noted a paucity in the quantity and quality of studies evaluating DCM instruments; this is evident from the absence of category A recommendations and the classification of most tools in category B (table 4). Acknowledging both the stringency of the COSMIN standards and that absence of category A evidence is not equivalent to presence of poor-quality evidence, we proposed nine instruments that seem interpretable to clinicians and appear to offer qualitative meaning to clinicians and people with lived experience of DCM. These instruments are the arm and neck pain scores; SF-12 and SF-36; JOA, mJOA and JOACMEQ; NDI; and VAS for pain (table 5).

The fact that most outcomes received B-category recommendations due to absence of high-quality evidence is not unexpected. In this review, the most common reasons for low-quality evidence, as per the COSMIN guidelines, were (1) important methodological flaws in study design or statistical methods, (2) uncertainty of approaches used to analyse the data and (3) imprecision due to sample size below the recommended power and significance levels. The rigour (or stringency) of the COSMIN standards may have accentuated these limitations due to the highly specific nature of some standards and the expectation of psychometric expertise within the DCM context. For example, results for internal consistency must be rated ‘indeterminate’ if there is not at least low-quality evidence for structural validity. No such studies were available in this review, possibly because this is a more recent and complex criterion, or because of the search or selection criteria. Similarly, studies on content validity cannot score higher than ‘inadequate’ if there are no recordings/verbatim transcriptions of patient focus groups or interviews. Likewise, analyses of reliability cannot score higher than ‘doubtful’ if statistics other than the Pearson or Spearman correlation coefficients are used. These thresholds of acceptability may account for some of the lacking information and are an important entry challenge for instruments into DCM research—a field where the routine involvement of stakeholders with lived experience is at an early stage,3 8 inconsistent study reporting is prevalent,2 4 few studies have involved >100 patients, and where there is a bias in the availability of measurement literature (ie, some tools, such as the SF-12, are used because they are the only tools available and, therefore, have available literature due to their routine use). 
From the application of these COSMIN criteria in other research fields, however, it appears that these methodological deficiencies are not exclusive to DCM instruments, including those in current use.40–42 The lack of high-quality assessments, thus, should not necessarily imply that (1) the identified outcome measures are generally inadequate, or (2) the COSMIN standards are not fit for the DCM context.

Measurement rigour is universally important and, in DCM, particularly relevant as the development of new instruments is a top 10 research priority. This rank reinforces the decision of the steering committee to make the initial CMS recommendations based on existing tools, rather than on tools under development.26 This decision was taken recognising that the success of a CMS requires widespread adoption, and that the adoption of clinical recommendations can be challenging without stakeholder awareness, familiarity and/or confidence.43–46 We hypothesised that asking the global field to align with innovations would be more challenging, and premature, at this stage. Thus, for this first iteration of the DCM CMS, there is a focus on current instruments in academic usage. While, currently, few have met the bar set by the COSMIN methodology, there are nine reasonable candidates using our post-hoc thresholds (table 5). Ultimately, the CMS process will need to lean significantly on the expertise of those involved in the consensus phase in order to make final recommendations that are methodologically rigorous and representative of those with lived experience.

Despite its conscientious design, this systematic review has limitations. In searching for existing instruments, we have neither identified nor assessed tools under development, or those currently being translated into clinical or research settings or published in languages other than English. To the extent that DCM instruments are currently in use, however, this review only identified tools in four of the six core domains from RECODE-DCM’s minimum dataset,33 and did not consider the construct of the disease as a factor in evaluating the outcomes. For those missing outcomes, focused scoping reviews (informed by a gap analysis that will be published separately) will be conducted in the future. Next, clinician-reported outcome measures and performance-based outcome measures were analysed with the exact same methods as PROMs. While COSMIN explicitly allows this,23–25 methods may be differentially adapted to tailor to these distinct instrument types; we chose not to do so out of prudence and consistency, and results across these instrument groups should be interpreted accordingly. Feasibility and interpretability were also evaluated using bespoke criteria which, despite being adapted from the COSMIN methodology, may not weigh all criteria accurately. Importantly, our decision to shortlist the clinically interpretable instruments was made a posteriori due to the unexpected absence of category A recommendations. This decision was informed by our judgement that instruments in a CMS should be interpretable by clinicians and offer qualitative meaning to clinicians and people with lived experience. While the COSMIN taxonomy does indeed class interpretability as an important and stand-alone characteristic,23–25 the aforementioned shortlist may inevitably represent a placement bias. 
Notably, some nuances of different versions of measurement instruments (eg, mJOA) were not extensively evaluated.47 Lastly, and as is frequently the case in this body of reviews,40–42 none of the authors is specifically trained in measurement theory and, therefore, this work represents our best attempt to implement the guidelines and standards set forward by the COSMIN methodology in the context of DCM.

Conclusions

Currently, none of the measurement instruments used in DCM holds sufficient evidence to meet the COSMIN criteria for a strong recommendation for use. However, there are leading contenders that appear to offer qualitative meaning to clinicians and people with lived experience of DCM; namely, the SF-12 and SF-36; JOA, mJOA, and JOACMEQ; NDI; and VAS for pain. The findings of this review will inform a consensus process to form a CMS for DCM. As the development of new assessments for DCM is an active research priority, greater awareness of the COSMIN framework is pertinent to DCM researchers.


Ethics statements

Patient consent for publication

Ethics approval

This study does not involve human participants.

Acknowledgments

We thank Isla Khun for her assistance with the systematic search and all patients who advised, and continue to advise, the AO Spine RECODE-DCM initiative.


Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

Footnotes

  • Twitter @AYanezTouzet, @angusgkmcnair

  • Contributors BD was responsible for conceiving the article and is the guarantor. KSL conducted the search and AYT, AB, ED and FB conducted the screening. AYT and BD extracted and analysed the data and wrote the manuscript. AYT, JCF, MGF, JSH, CMZ, RR-P, JM, ES, AC, VR-M, BA, TFB, LT, RC, JDG, SK-R, AGKM, MK and BD provided critical appraisal of the manuscript. All authors critically revised and approved the manuscript.

  • Funding This work was supported by AO Spine through the AO Spine Knowledge Forum Spinal Cord Injury, a focused group of international Spinal Cord Injury experts. AO Spine is a clinical division of the AO Foundation, which is an independent medically guided not-for-profit organisation. Study support was provided directly through the AO Spine Research Department. An award/grant number is not applicable.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.