Predictive models for short-term and long-term improvement in women under physiotherapy for chronic disabling neck pain: a longitudinal cohort study
Tony Bohman1, Matteo Bottai2, Martin Björklund3,4
1 Department of Neurobiology, Care Sciences and Society, Karolinska Institutet, Stockholm, Sweden
2 Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden
3 Department of Community Medicine and Rehabilitation, Umeå University, Umeå, Sweden
4 Department of Occupational Health Sciences and Psychology, University of Gävle, Gävle, Sweden
Correspondence to Tony Bohman; tony.bohman{at}


Objectives To develop predictive models for short-term and long-term clinically important improvement in women with non-specific chronic disabling neck pain during the clinical course of physiotherapy.

Design Longitudinal cohort study based on data from a randomised controlled trial evaluating short-term and long-term effects on sensorimotor function over 11 weeks of physiotherapy.

Participants and settings Eighty-nine women aged 31–65 years with non-specific chronic disabling neck pain from Gävle, Sweden.

Measures The outcome, clinically important improvement, was measured with the Patient Global Impression of Change Scale (PGICS) and the Neck Disability Index (NDI), assessed by self-administered questionnaires at 3, 9 and 15 months from the start of the interventions (baseline). Twelve baseline prognostic factors were considered in the analyses. The predictive models were built using random-effects logistic regression. The predictive ability of the models was measured by the area under the receiver operating characteristic curve (AUC). Internal validity was assessed with cross-validation using the bootstrap resampling technique.

Results Factors included in the final PGICS model were neck disability and age, and in the NDI model, neck disability, depression and catastrophising. In both models, the odds for short-term and long-term improvement increased with higher baseline neck disability, while the odds decreased with increasing age (PGICS model), and with increasing level of depression (NDI model). In the NDI model, higher baseline levels of catastrophising indicated increased odds for short-term improvement and decreased odds for long-term improvement. Both models showed acceptable predictive validity with an AUC of 0.64 (95% CI 0.55 to 0.73) and 0.67 (95% CI 0.59 to 0.75), respectively.

Conclusion Age, neck disability and psychological factors seem to be important predictors of improvement, and may inform clinical decisions about physiotherapy in women with chronic neck pain. Before using the developed predictive models in clinical practice, however, they should be validated in other populations and tested in clinical settings.

  • prediction
  • prognosis
  • non-specific neck pain
  • neck pain
  • longitudinal analyses
  • cohort
  • clinical important improvement
  • discrimination

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:

Strengths and limitations of this study

  • Strengths of this study are a well-defined sample and thorough development of predictive models.

  • The longitudinal design with a short-term and long-term follow-up time, and the inclusion of biopsychosocial prognostic factors in the analyses further strengthens the study.

  • A limitation of this study is the relatively small sample.


Neck pain is a common health problem and a cause of substantial disability which has a considerable social and economic impact throughout the world.1 Hogg-Johnson et al estimated the 12-month prevalence of neck pain to range between 30% and 50%, with a higher prevalence among women.1 The prevalence of neck pain has increased during the last decades, and in 2015 neck and low back pain were the leading causes of disability.2 3 Neck pain is often unrelated to any other specific pathology.4 Although single episodes of neck pain generally resolve over time, they are likely to recur and may become chronic.5 A review from Carroll et al concluded that 50%–85% of persons with neck pain do not experience complete resolution of their pain.5

Prognostic research can help understand the course and future outcome in individuals with neck pain.6 Guzman et al summarised the work of the 2000–2010 Neck Pain Task Force concluding that younger age, no previous pain, good physiological and psychological health, good coping, good social support, exercise and sports, and no prior sick leave may increase the chance of recovery from neck pain.7 Walton et al surveyed prognostic factors for prolonged recovery from neck pain in 13 systematic reviews.8 They found evidence only for history of musculoskeletal disorders and older age to prolong recovery in non-whiplash related neck pain. For the remaining factors evaluated, there was insufficient evidence for the influence on neck pain, and more research was recommended.

Neck-pain patients commonly seek physical therapy, but predicting treatment outcome and prognosis for these patients is challenging.9 Predictive models based on multiple prognostic factors could guide healthcare providers, such as physiotherapists, in determining which patients are more likely to improve and to help all patients develop informed expectations.10 11 It is generally recommended to build predictive models that are applicable to a well-defined patient population and the healthcare system.10 Women have a higher risk of chronic neck pain than men, and it has been suggested that women should be assessed separately from men in prognostic research.12 13 Using the search strategy for predictive models in physiotherapy and musculoskeletal complaints suggested by van Oort et al, we found only one study assessing chronic non-specific neck pain, and none with a model specific for women.14 15

The individual perception of pain and disability varies during the course of an episode, and the effect of prognostic factors is therefore expected to change over time following physiotherapy.15 16 The analysis of longitudinal data is effective in evaluating these potential changes over time, as it can separate the change between individuals from that within any given patient. Longitudinal analyses have been recommended in prospective cohort studies on musculoskeletal problems.17 18 The aim of this study was to develop predictive models for short-term and long-term clinically important improvement in women with chronic disabling neck pain in the clinical course of physiotherapy.


Design and study population

This longitudinal cohort study sought to develop predictive models, including multiple prognostic factors jointly, for chronic neck pain in women. Predictive models are one of the four themes of the Prognosis Research Strategy Framework for Prognostic Studies.6 11 Carroll et al classify predictive models as ‘Phase II’ studies in a 3-level hierarchy of evidence from longitudinal studies, and suggest that they are suitable for predicting recovery from neck pain.5

We used data from a randomised controlled trial (RCT) evaluating the short-term and long-term effects, following 11 weeks of physiotherapy interventions (coordination exercise, strength training and massage), on sensorimotor function in 108 women with non-specific chronic disabling neck pain.19 The trial was carried out in Gävle, Sweden, in 2008.

The RCT included Swedish-speaking women aged 25–65 years with non-specific disabling neck pain lasting for 3 months or more (chronic). The women were recruited via the social insurance agency and with advertisement in local papers and invitations at municipality and county council work sites, primary and occupational healthcare units. Non-specific chronic disabling neck pain was defined as neck pain reported by the patient as the ‘most painful area’ along with disability, measured as >9 normalised points of the first 19 items in the Disability Arm Shoulder Hand questionnaire.20 21 These 19 items refer to disability in activities of daily living regarding neck, shoulders and arms. Excluded were individuals with trauma to the head or neck, a diagnosis of rheumatic, neurological, connective tissue, inflammatory or endocrine disease or psychiatric diagnosis affecting their everyday life, fibromyalgia, cancer, stroke, cardiac infarction, diabetes type 1, cervical radiculopathy, vestibular disorders, surgery or fracture to the back, neck or shoulder in the last 3 years or shoulder luxation in the last year. Finally, strenuous exercise more than three times per week during the previous 6 months also led to exclusion.19

Patient and public involvement

As this study was based on secondary data from an RCT, patients were not directly involved in the design and completion of the study.

Data collection and variables

Before the start of the intervention (baseline), participants filled in a self-administered questionnaire, containing instruments and questions to measure the potential prognostic factors. Outcome information was collected by follow-up questionnaires at 3, 9 and 15 months from baseline. Participants who did not provide information on the outcome at any of the three follow-ups were excluded, resulting in a study population of 89 participants (figure 1).

Figure 1

Included participants and progress of participants along the follow-up period concerning the outcome measures. NDI, The Neck Disability Index; PGICS, The Patient Global Impression of Change Scale; RCT, randomised controlled trial.


The Patient Global Impression of Change Scale (PGICS) provided information for the outcome ‘global perceived change of general health’ by comparing general health at follow-up with general health at baseline. PGICS is a 7-point ordinal Likert scale (very much improved, much improved, minimally improved, not changed, minimally worse, much worse and very much worse).22 The scale categories were dichotomised into ‘improved’ (very much improved, much improved) and ‘not improved’ (all the other categories). According to the Initiative on Methods, Measurement, and Pain Assessment in Clinical Trials (IMMPACT) recommendations for clinically important outcomes in chronic pain, the 7-point scale is recommended when assessing general health. The categories ‘very much improved’ and ‘much improved’ reflect what patients consider to be a clinically important improvement in general health.23 The Neck Disability Index (NDI) provided information for the second outcome.24 The NDI has 10 items (pain intensity, personal care, lifting, reading, headaches, concentration, work, driving, sleeping and recreational activities) with six possible answers in each item, scored 0 (no limitation) to 5 (major limitations). The total NDI score ranges from 0 to 50. The NDI has high reliability, strong internal consistency and strong validity, when compared with other instruments for evaluating patients with neck pain.24 In the present study, we used the normalised NDI (NDI%, 0–100). The minimal important change (MIC) for the NDI% was set equal to 6.3.25 Using MIC as a cut-off, a dichotomised NDI outcome was constructed for each follow-up, where a clinically important improvement was defined as an NDI% decrease of more than 6.3 between the baseline and the follow-up, and ‘no improvement’ as an increase, or a decrease less than or equal to 6.3 NDI%. Participants with an NDI% of 0 at follow-up were also defined as having a clinically important improvement regardless of the baseline score.
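The two dichotomisation rules described above can be sketched in code. This is a minimal illustration and not the authors' analysis code (the study used Stata); all function and constant names are hypothetical, and rescaling a raw 0–50 NDI total to NDI% by doubling is an assumption that holds only when all 10 items are answered.

```python
# Illustrative sketch only; names are hypothetical, not from the study's code.

# PGICS categories counted as 'improved' in the text.
PGICS_IMPROVED = {"very much improved", "much improved"}

# Minimal important change (MIC) for NDI%, as stated in the text.
NDI_MIC = 6.3


def pgics_improved(category: str) -> bool:
    """True when a PGICS answer counts as clinically important improvement."""
    return category.strip().lower() in PGICS_IMPROVED


def ndi_percent(raw_score: int) -> float:
    """Rescale a raw NDI total (0-50) to NDI% (0-100).

    Assumption: all 10 items were answered, so NDI% = raw score x 2.
    """
    if not 0 <= raw_score <= 50:
        raise ValueError("raw NDI score must lie between 0 and 50")
    return raw_score * 2.0


def ndi_improved(baseline_pct: float, followup_pct: float) -> bool:
    """Improved if NDI% fell by more than the MIC, or follow-up NDI% is 0."""
    if followup_pct == 0:
        return True
    return (baseline_pct - followup_pct) > NDI_MIC
```

Under these rules, a participant whose NDI% falls from 30 to 22 counts as improved (a decrease of 8), whereas a fall from 30 to 24 (a decrease of 6) does not.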

Potential prognostic factors

The selection of potential prognostic factors was based on systematic reviews and prospective studies on the prognosis of non-specific neck pain, clinical considerations and availability in the data (table 1). Eight potential prognostic factors had support from the literature: age, neck disability, average pain intensity, depression, fear of movement, catastrophising, social support and leisure physical activity.5 8 26 27 Chronic widespread pain, recovery expectation, physical work load and dual-working were considered potential prognostic factors based on clinical considerations.

Table 1

Potential prognostic factors and corresponding bibliographical references to definition and psychometric properties of the factors

Average pain intensity was not limited to neck pain, but all participants reported neck pain as the ‘most painful area’, as this was a criterion for inclusion in the RCT.19 Dual-working was a combination of three items: (1) Working/not working, (2) What does your household look like? (living alone, living alone with children, living with another adult, living with another adult and children) and (3) Who is primarily performing the housework in your family? (myself, someone else and equally shared). The factor dual-working was then dichotomised according to the answers into: yes (working, living with children and/or another adult and performing the housework) and no (any other combination of the three items). Dual-working has, to the best of our knowledge, not been evaluated in previous studies, but in our opinion, it is possibly associated with the prognosis of neck pain in women.
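The construction of the dual-working factor can be illustrated with a short sketch. The function name and the answer strings are hypothetical encodings of the three questionnaire items. Treating only the answer ‘myself’ as ‘performing the housework’ is an assumption, because the text does not say how ‘equally shared’ was handled.

```python
# Illustrative sketch only; the encoding of answers is hypothetical.

HOUSEHOLD_OPTIONS = {
    "living alone",
    "living alone with children",
    "living with another adult",
    "living with another adult and children",
}
HOUSEWORK_OPTIONS = {"myself", "someone else", "equally shared"}


def dual_working(working: bool, household: str, housework: str) -> bool:
    """Return True ('yes') for working women who live with children and/or
    another adult and who primarily perform the housework themselves.

    Assumption: only the answer 'myself' counts as performing the housework.
    """
    if household not in HOUSEHOLD_OPTIONS:
        raise ValueError(f"unknown household answer: {household!r}")
    if housework not in HOUSEWORK_OPTIONS:
        raise ValueError(f"unknown housework answer: {housework!r}")
    lives_with_others = household != "living alone"
    return working and lives_with_others and housework == "myself"
```

Any other combination of the three items yields ‘no’, mirroring the dichotomisation described in the text.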

Statistical methods

Descriptive statistics are presented as means and SD or median and IQR when appropriate. The level of the CIs was set at 95% and that of the tests at 5%. All p values were two-sided. Stata V.14.2 was used in the analyses.

Random-effects logistic regression models were used to estimate the probability of a clinically important improvement of the outcomes from baseline to 3 and 15 months follow-up.17 Data from all three follow-ups, 3, 9 and 15 months, were used in the analyses. Random-effects logistic regression can model longitudinal data efficiently, while taking into account the potential dependence of the repeated measures taken on each participant.17 The regression models were estimated for each of the two outcomes, PGICS and NDI, separately. Both outcomes showed strong associations between the follow-ups. The linearity of the relationship between the logit of the probability of the outcome variables and numeric independent variables (potential prognostic factors) was verified by means of restricted cubic splines.17 28 None of the numeric independent variables showed evidence against linearity.

Random-effects logistic regressions, including data from all three follow-ups, were used for all potential prognostic factors one at a time to estimate the time-specific ORs with 95% CI for each of the two outcome measures, separately. If the interaction term (factor-by-time) was not statistically significant, the analyses were repeated without the interaction term. This resulted in a single OR for the factor from 3 to 15 months. When the interaction was significant, the OR at 3 and 15 months were reported separately. Potential prognostic factors with a p value ≤0.2 for the estimated OR were considered candidate factors for the predictive models.28 Based on an a priori decision of clinical relevance, statistical effect measure modifications were tested between pain intensity and depression or recovery expectations. If the effect measure modification was statistically significant, this was included in the subsequent analyses.28 29
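As a sketch of the model structure described above, a random-intercept logistic model with one prognostic factor and a factor-by-time interaction can be written as follows. The notation is ours and does not appear in the article. In particular, the coding of time is an assumption: we use a single indicator $t_j$ equal to 0 at the 3-month and 1 at the 15-month follow-up, and omit the 9-month follow-up for brevity.

```latex
\operatorname{logit}\,\Pr(Y_{ij}=1 \mid x_i, u_i)
  = \beta_0 + \beta_1 x_i + \beta_2 t_j + \beta_3\,(x_i \times t_j) + u_i,
\qquad u_i \sim N(0, \sigma_u^2)

\mathrm{OR}_{3\ \text{months}} = e^{\beta_1},
\qquad
\mathrm{OR}_{15\ \text{months}} = e^{\beta_1 + \beta_3}
```

Here $Y_{ij}$ indicates improvement of participant $i$ at follow-up $j$, $x_i$ is the baseline factor and $u_i$ is the participant-specific random intercept that accounts for the repeated measures. When the interaction coefficient $\beta_3$ is not statistically significant and the term is dropped, the model yields a single odds ratio $e^{\beta_1}$ for the whole follow-up period.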

Developing the models

A sequential backward manual selection procedure with the random-effects logistic regressions was used to build the predictive models.28 All candidate factors were included in the initial model. The factor with the highest p value (Wald test) was excluded, one at a time, until all factors in the model had a p value ≤0.1.28 If a candidate factor showed effect-modification with time, then the factor was included together with the effect measure modification in the model. During the development process, all the analyses were adjusted for the assigned RCT interventions. ‘Intervention’ was then removed from the final predictive models before evaluating the models and presenting the results. The association between the prognostic factor and the outcome was reported as beta-coefficient (β) with SE, OR with 95% CI and associated p value.
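The backward selection loop described above can be sketched generically. This is not the authors' procedure, which was carried out manually in Stata. The `fit_p_values` callback is a hypothetical stand-in for refitting the random-effects logistic regression and returning one Wald p value per term, and the handling of interaction terms together with their main effects is deliberately left out.

```python
from typing import Callable, Dict, List, Sequence


def backward_select(
    candidates: Sequence[str],
    fit_p_values: Callable[[List[str]], Dict[str, float]],
    threshold: float = 0.1,
    forced: Sequence[str] = (),
) -> List[str]:
    """Drop the factor with the highest p value until all are <= threshold.

    `fit_p_values` (hypothetical) refits the model on the given terms and
    returns a Wald p value for each. Terms in `forced` (for example the
    assigned intervention) stay in every fit and are never removed.
    """
    selected = list(candidates)
    while selected:
        p_values = fit_p_values(selected + list(forced))
        # Only the non-forced candidate factors are eligible for removal.
        worst = max(selected, key=lambda name: p_values[name])
        if p_values[worst] <= threshold:
            break  # every remaining factor meets the retention criterion
        selected.remove(worst)
    return selected
```

The model is refitted after every removal because the p values of the remaining factors can change once a correlated factor leaves the model.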

Evaluation of the models

The bias-corrected area under the receiver operating characteristic curve (AUC) with 95% CI was obtained by cross-validation based on 100 design-matrix bootstrap replicates and used to determine the internally validated predictive ability of the models.28 29 The AUC represents a summary of the sensitivity and specificity of the model in distinguishing participants who improve during the follow-up period from those who do not. The AUC ranges from 0.5 (no predictive ability) to 1.0 (perfect predictive ability). Overfitting was assessed by calculating the heuristic shrinkage factor.30 A shrinkage factor of 1.0 indicates no overfitting of the model. The shrinkage-corrected beta-coefficients (Sβ) were calculated as Sβ = shrinkage factor × β.
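The quantities used in the evaluation can be illustrated with a short sketch. It computes the apparent AUC only and does not reproduce the bootstrap bias correction. The article does not give the formula for the heuristic shrinkage factor, so the commonly cited form (model χ² − df)/model χ² is assumed here. All function names are hypothetical.

```python
from typing import Dict, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Apparent AUC: probability that a randomly chosen improved participant
    has a higher predicted score than a non-improved one (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one participant in each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


def heuristic_shrinkage(model_chi2: float, df: int) -> float:
    """Assumed formula: (model chi-square - df) / model chi-square,
    with df the number of estimated predictor coefficients."""
    if model_chi2 <= 0:
        raise ValueError("model chi-square must be positive")
    return (model_chi2 - df) / model_chi2


def shrink(betas: Dict[str, float], factor: float) -> Dict[str, float]:
    """Shrinkage-corrected coefficients: S-beta = shrinkage factor x beta."""
    return {name: factor * beta for name, beta in betas.items()}
```

A shrinkage factor below 1.0 pulls every coefficient towards zero, which makes predictions for new patients less extreme than those from the unshrunk model.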

Sensitivity analyses

To compare the study population with the non-responders, we used the χ2 test for categorical variables, and the t-test or the Wilcoxon rank-sum test for continuous variables.


Study population

Baseline characteristics of the study population are listed in table 2. The women’s age ranged between 31 and 65 years, and all women reported neck pain duration of more than 8 months, with a median of 120 months (IQR 60–216). Eighty women (91%) were working and 66 (74%) reported no sick leave because of neck pain during the previous 6 months. Using PGICS as outcome measure, 47 (53%) women were categorised as improved at 3-month follow-up, 31 (35%) at 9-month follow-up and 26 (30%) at 15-month follow-up, while using NDI as outcome measure, 39 (44%) women had improved at 3-month follow-up, 37 (42%) at 9-month follow-up and 31 (36%) at 15-month follow-up. Only one participant reported ‘no neck disability’ (NDI%=0) at the 3-month and 15-month follow-ups, respectively, while none reported ‘no neck disability’ at the 9-month follow-up.

Table 2

Baseline characteristics of the study population, n=89

Development and evaluation of the predictive model

The analyses on each potential prognostic factor are presented in table 3. The follow-up data observations used for the analyses differed slightly between the two outcome measures because of missing responses (figure 1). No effect measure modification was observed between pain intensity and depression or recovery expectations. In the analyses with PGICS as outcome measure, age, neck disability and average pain intensity met the criteria for inclusion into the predictive model analysis. No potential prognostic factor appeared to be modified by time. In the corresponding analyses with NDI as outcome measure, five potential prognostic factors met the criteria for inclusion: age, neck disability, depression, catastrophising and leisure physical activity. The effect of catastrophising changed over time; therefore, an effect measure modification term (catastrophising × time) was added to the predictive model analysis.

Table 3

Random-effects regression analyses* for potential short-term and long-term prognostic factors with PGICS and NDI as outcome, n=89

The resulting predictive models from the sequential selection procedure are presented in table 4. Internal validation showed acceptable predictive ability for the models of 0.64 and 0.67, respectively (table 4). The calculated shrinkage factor was 0.73 for the PGICS model and 0.65 for the NDI model.

Table 4

The final predictive models* of clinically important improvement in chronic disabling neck pain with PGICS and NDI as outcome

Sensitivity analyses

Non-responders (n=19) showed no statistically significant differences in the characteristics from the study population except for average pain intensity, where non-responders had a higher median of 6.5 compared with a median of 5 (p=0.03) in the study population.


We developed and internally validated predictive models for clinically important short-term and long-term improvement from chronic disabling neck pain among women in the clinical course of physiotherapy. The models were developed for the ‘global perceived change of general health’ (PGICS) and the NDI, separately. Both models had acceptable predictive ability.29 The results showed that, when assessing potential biopsychosocial prognostic factors, perceived disabilities related to neck pain, depression, catastrophising and age are of importance for predicting short-term and long-term improvements in these women. Further, we found that prognostic factors could predict different results depending on the follow-up time, as for catastrophising in the NDI model. This is important in the management of patients in clinical settings. Similarly to previous studies, we found that the prognostic factors in the models could differ depending on the outcome.16 31

Our results indicate that the odds for clinically important improvement, with PGICS as outcome measure, decrease with age and increase with higher baseline levels of disability related to neck pain. With NDI as an outcome measure, the odds for improvement increased with higher baseline levels of neck pain-related disability and decreased with higher levels of depression. These results were valid over the total follow-up time of 15 months. However, the NDI model indicated increased odds for short-term improvement, but decreased odds for long-term improvement with higher levels of catastrophising at baseline. This interesting finding was made possible by the longitudinal nature of our study design. To the best of our knowledge, this is the first study using longitudinal analyses in predictive models for neck pain. Interestingly, depression was found to be a predictor even though women with a psychiatric diagnosis affecting their everyday life were excluded from the study.

The relative strength of the incorporated factors in the model should be interpreted with great caution as their independent effect on the outcome was not thoroughly examined.

Our findings are somewhat different from those of other similar studies, because of the different study populations, potential prognostic factors examined, outcomes and follow-up times. We found only one prediction study on patients with chronic non-specific neck pain in physiotherapy, but with a study population also including males.15 Poor outcome was predicted by pain medication at discharge and at 1 year follow-up, and by catastrophising at 1 year follow-up, the latter in line with our results. Similar to our study, the outcome was assessed as a minimal clinical important difference, but was based on the Northwick neck pain questionnaire.

We found four reports on non-specific neck pain predictive models with reasonably similar outcomes, methods and follow-up time as in the present study.31–34 However, all four reports included both male and female patients with different durations of pain (acute, subacute and chronic), and were not performed in the course of physiotherapy only. Similar to us, Hill et al and Verhagen et al found the psychological factors catastrophising and depression/distress to predict their outcomes.31 34 Also, baseline neck disability was included in the models by Hill et al and by Kjellman et al.31 33 Other factors in their models were pain intensity, treatment expectations, concomitant back pain, manual social class, well-being and somatisation, while we found that treatment expectations and pain intensity were not important predictors in our study.31 33 34 Schellingerhout et al presented a predictive model with a set of nine predictors for global perceived recovery in an adult primary care population.32 With the exception of age, none of the predictors found were similar to our predictors, a discrepancy that may be explained by different study settings, population and outcome compared with our study. In their population, only 34% reported chronic neck pain and 60% were women. Further, only 28% of the patients were referred to physiotherapy while the others were referred to ‘usual care’, spinal manipulation or a behavioural graded activity programme. Even though Schellingerhout et al reported a model including nine predictors, its predictive ability (AUC=0.66) was similar to that of our models (AUC of 0.64 and 0.67). One could speculate that as prognostic factors in research related to neck pain most often are weak, reaching higher levels of predictive ability may be hard even with large samples and well conducted analyses.5 8

Our study meets most of the important criteria for deriving predictive models in relation to musculoskeletal disorders and physiotherapy suggested by Beneciuk et al: a well-defined study population, longer follow-up times than 6 months and psychological and psychosocial assessments incorporated in the model development.35 Other strengths of our study are a follow-up rate of 82% and a small number of missing data. We used longitudinal analyses, an advantage when assessing individual change over time.18 The outcome, clinical important improvement, was based on the IMMPACT recommendations (PGICS), and on responsiveness analyses in which the same sample as in the present study was included (NDI).23 25 The use of conservative ‘cut-points’ for the Wald test in the selection procedure in order to decrease the risk of type II errors is also supported in the literature.28 29

Our study also has limitations. The sample size of 89 participants could be considered small when conducting a prediction study. A small sample size increases the risk of overdispersion during analyses. However, the sample size was large enough to meet the recommendation of 10–15 participants for each parameter estimate (coefficient) included in the analyses.28 35 Furthermore, our results are based on secondary analyses of data, possibly limiting information on additional prognostic factors which could have influenced the derivation of the models. Of the 108 women included at baseline, 19 did not respond to any of the three follow-ups and were excluded from the analyses. They showed similar characteristics to the included women except for an average pain intensity with a median of 6.5 compared with a median of 5 in the study population. Therefore, attrition might have influenced the results. Some of the participants were recruited by advertisement and could have different characteristics than ordinary healthcare patients.36 However, the baseline levels of average pain intensity, neck pain-related disability and fear of movement in the study were similar to those in the study by Schellingerhout et al, who investigated prognostic factors for neck pain in a sample of patients referred to primary care.32

The clinical implication of our study findings is that baseline neck disability seems to be an important factor to consider, as a higher baseline disability was associated with higher odds for clinical important improvement in both our models. This is in line with the findings of Bot et al in exploring predictors for patients with neck or shoulder pain in general practice.37 They found that being more disabled at baseline predicted a larger reduction in disability at 3 and 12 months follow-up. Furthermore, psychological factors are important to consider in the clinical course of physiotherapy concerning women with chronic neck pain. As psychological factors are modifiable, they may be targeted in order to increase the possibility for short-term and long-term improvement in the management of these women. The information about the prognostic factors and the outcomes included in our models is easily collected at the first consultation by a physiotherapist, and the models could be a valuable tool to help physiotherapists manage these patients.

While the predictive ability of our models was acceptable, it also indicated that there may still be other factors that could help predict the outcome more precisely. It would therefore be valuable to include variables in future investigations that were not included in the present study, for example, lifestyle factors such as tobacco and alcohol use, and psychological factors related to work.38 39 Our models will make clinicians aware of what factors are important to consider when predicting which patients will have the best chance of a clinically important improvement. Also notable to clinicians is that factors can vary depending on the outcome measured, and that the chance of improvement can change over time for some factors, as for catastrophising in the current study. Furthermore, our results facilitate future prognostic research related to chronicity in neck pain. A predictive model development, including internal validation, is only the first step in the process of deriving a model that could be implemented in the clinic. A second step should be external validation, for example, by using data from RCTs including other types of interventions targeting chronic neck pain. Finally, further external validation should be done by investigating the impact of the model in clinical practice before implementation.11 Because all women in our sample underwent physiotherapy, our predictive models should be restricted to this population and setting. The validity in other populations and with other healthcare providers will remain unknown until external validations are available.


The developed predictive models evaluating clinical important improvement from chronic non-specific neck pain among women in the clinical course of physiotherapy showed acceptable predictive ability. Age, neck pain-related disability, depression and catastrophising seem to be factors that can be of guidance for physiotherapists trying to predict short-term and long-term clinical important improvement in these patients. With the exception of neck pain-related disability, the different outcome measures had different sets of prognostic factors. The effect of some factors may be modified by time. The present study is the first step towards developing predictive models for clinical practice. The next steps will include external validations in different neck pain populations and clinical settings.


We thank Maria Frykman and Thomas Rudolfsson for excellent administration and data acquisition.




  • Contributors TBo, MBj and MBo contributed to the design of the study. TBo made the statistical analyses supported by MBo and wrote the first manuscript version. All authors contributed to the interpretation of the data and critically revised all versions of the manuscript and finally approved the last version.

  • Funding The study was funded by Alfta Research Foundation, grants from the Swedish Council for Working Life and Social Research (2006-1162), Länsförsäkringar Forskning och Framtid (51-1010/06) and Forte Centre Working Life ‘The body at work—from problem to potential’ (2009-1761).

  • Competing interests None declared.

  • Ethics approval The ethics review board in Uppsala, Sweden, approved the study (registration number 207/206).

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement No additional data available.

  • Patient consent for publication Not required.
