
Development and validation of a screening tool to predict the risk of chronic low back pain in patients presenting with acute low back pain: a study protocol
  1. Adrian Traeger1,2,
  2. Nicholas Henschke3,
  3. Markus Hübscher1,2,
  4. Christopher M Williams4,5,
  5. Steven J Kamper5,
  6. Chris G Maher5,
  7. G Lorimer Moseley2,6,
  8. James H McAuley1,2
  1. 1School of Medical Sciences, University of New South Wales, Sydney, New South Wales, Australia
  2. 2Neuroscience Research Australia, Sydney, New South Wales, Australia
  3. 3Institute of Public Health, University of Heidelberg, Heidelberg, Germany
  4. 4Hunter Medical Research Institute and School of Medicine and Public Health, University of Newcastle, Newcastle, New South Wales, Australia
  5. 5The George Institute for Global Health, University of Sydney, Sydney, New South Wales, Australia
  6. 6Sansom Institute for Health Research, University of South Australia, Adelaide, South Australia, Australia
  1. Correspondence to Dr James McAuley; j.mcauley{at}


Introduction Around 40% of people presenting to primary care with an episode of acute low back pain develop chronic low back pain. In order to reduce the risk of developing chronic low back pain, effective secondary prevention strategies are needed. Early identification of at-risk patients allows clinicians to make informed decisions based on prognostic profile, and researchers to select appropriate participants for secondary prevention trials. The aim of this study is to develop and validate a prognostic screening tool that identifies patients with acute low back pain in primary care who are at risk of developing chronic low back pain. This paper describes the methods and analysis plan for the development and validation of the tool.

Methods/analysis The prognostic screening tool will be developed using methods recommended by the Prognosis Research Strategy (PROGRESS) Group and reported using the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement. In the development stage, we will use data from 1248 patients recruited for a prospective cohort study of acute low back pain in primary care. We will construct 3 logistic regression models to predict chronic low back pain according to 3 definitions: any pain, high pain and disability at 3 months. In the validation stage, we will use data from a separate sample of 1643 patients with acute low back pain to assess the performance of each prognostic model. We will produce validation plots showing Nagelkerke R2 and Brier score (overall performance), area under the curve statistic (discrimination) and the calibration slope and intercept (calibration).

Ethics and dissemination Ethical approval from the University of Sydney Ethics Committee was obtained for both of the original studies that we plan to analyse using the methods outlined in this protocol (Henschke et al, ref 11-2002/3/3144; Williams et al, ref 11638).


This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See:


Strengths and limitations of this study

  • First prognostic tool in low back pain to follow, a priori, the Prognosis Research Strategy (PROGRESS) framework, and Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) reporting guidelines for prognostic research.

  • Prespecifies statistical analysis plan and informative levels of tool performance to increase transparency of results and of the final report.

  • Restricted to only the use of predictor variables measured in previous data sets.

  • Minor differences in the way variables are measured between the development and the validation data sets.


Acute non-specific low back pain (LBP) is widely reported to have a favourable prognosis;1 pain intensity reduces rapidly in the first few weeks,2 and around 60% have fully recovered (return to work, no disability, and no pain) by 3 months.3 However, for the 40% of patients who continue to report pain at 3 months, or ‘chronic’ LBP, the prognosis is much poorer.4 Despite this difference in prognosis, many randomised trials apply the same treatment to all patients with acute LBP, which is an inefficient approach. An alternative approach is to target specific interventions to those at higher risk of developing chronic LBP. Research evaluating this ‘stratified’ approach to acute LBP management has been identified as a priority.5

Secondary prevention in LBP refers to preventing patients with acute LBP from developing chronic LBP.6 The first step in secondary prevention is to identify factors that are associated with poor outcome, or prognostic factors.7 Recently, the Prognosis Research Strategy (PROGRESS) group proposed a framework for conducting prognosis studies that included standards for identifying prognostic factors.8 Once prognostic factors have been identified they can be combined to produce prognostic models9 ,10 to assist clinicians in management decisions and researchers in trial design. For example, prognostic models applied to patients with LBP have predicted outcome better than chance or clinical judgment alone.11 ,12 Models that are used to screen patients into risk groups are known as prognostic screening tools.

The prognostic screening tools that are currently available for LBP are of limited use in secondary prevention.13 For example, the majority of existing tools were developed in secondary care,14 ,15 and in groups of patients with acute LBP and chronic LBP,16 ,17 making their routine application in primary care problematic without further testing. In addition, the predictive validity of these tools is invariably poorer when used outside of the sample in which they were developed.18

These limitations demonstrate the importance of external validation,19 which is rarely reported.20 ,21 To assess the external validity of a prognostic model, discrimination (the probability that patients who develop the health outcome of interest are allocated higher risk scores) and calibration (how closely predicted outcomes match actual outcomes) should ideally be assessed in a separate sample from that used to initially derive the model. For the few tools that have been externally validated in acute LBP,22–24 discrimination ranged from poor to moderate, and calibration was only reported in one study.23

There is also variability in the outcomes that prognostic tools have been developed to predict. Although these tools should predict reliable and clinically important outcomes,25 in LBP there is no consensus on which outcomes are the most important.26 Most published tools predict disability or return to work outcomes, rather than pain, which is the most common outcome used to define chronic LBP.21 ,27 Von Korff and Saunders28 have argued that pain, in particular high pain intensity and/or disability, is the most important outcome to assess at 3 months. Three months is also the time at which a marked change in prognosis occurs.3 A recently formed research task force27 agreed on a uniform definition of chronic LBP: 3 months’ worth of pain days in the past 6 months. The task force also emphasised the importance of grading the impact of chronic pain and disability, though validated cut-offs were not available. The recent progress made to define chronic LBP highlights the need to identify not only the patients at risk of ongoing pain for 3 months, but also the patients at risk of developing high-impact chronic pain and disability. Such patients are logical targets for early secondary prevention strategies. We are not aware of any tool that was developed to predict the onset of chronic pain.

The aim of this protocol is to describe the method and analysis plan for the development and validation of a prognostic screening tool for acute LBP that is suitable for secondary prevention.


Our study design is informed by the PROGRESS framework and specific recommendations for statistical approaches to prognostic research (table 1). We will report the study in accordance with the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement for prediction studies.34

Table 1

Recommendations for prognostic research

The PROGRESS framework outlines four types of prognostic research: (1) fundamental prognosis research; (2) prognostic factor research; (3) prognostic model research; (4) stratified medicine research. The proposed study is Type 3, prognostic model research (table 1). To develop the prognostic screening tool, we will consider the recommendations of Kamper et al29 and Royston et al.9 To validate the prognostic screening tool, we will use the method suggested by Altman et al30 and Steyerberg et al.35

Development of the tool

Development sample

The development sample data for the proposed study are from a cohort study that assessed the prognosis of acute LBP in primary care. Details of this study have been published elsewhere.3 In short, consecutive patients were recruited between November 2003 and July 2005 from primary care practices in the Sydney Metropolitan area, Australia. In the Australian healthcare system, ‘primary care’ includes first contact care provided by general practitioners, allied health practitioners (physiotherapists, chiropractors), and pharmacists. The original study recruited an inception cohort of 1248 patients with acute LBP (<4 weeks’ duration), 973 of whom had pain of less than 2 weeks’ duration. All patients were managed according to the Australian National Health and Medical Research Council guidelines for acute LBP.36 Outcomes were measured via telephone at 6 weeks, 3 months and 12 months. Key design features of the study are described in table 2 below.

Table 2

Prognosis study by Henschke et al3 adherence to PROGRESS recommendations



The primary outcome will be the presence of chronic LBP. Chronic LBP will be defined as having greater than ‘very mild’ pain 3 months after the initial assessment. In the development sample, this was classified as >2 on a 6-point Likert scale (How much back pain have you had in the past week? 1=‘none’, 2=‘very mild’, 3=‘mild’, 4=‘moderate’, 5=‘severe’, or 6=‘very severe’).41 The validation sample used a different pain rating scale (a 0–10 numeric rating scale), so we have defined greater than ‘very mild’ pain on that scale as ≥3/10. We chose these cut-offs because a large proportion of patients with ‘very mild’ pain (1–2/10) consider themselves to be recovered.26

We will also develop two additional prognostic screening tools that consider the impact of pain and disability when defining chronic LBP.27 For the tool predicting high pain, patients will be classified as having ‘chronic LBP (high pain)’ if they reported ‘moderate’, ‘severe’ or ‘very severe’ pain intensity on the 6-point scale, or ≥5 on the 11-point scale in the validation sample, at 3-month follow-up.42 Because there is still no consensus on what constitutes ‘high impact’ or ‘severe’ chronic LBP, we have selected our own cut-off in line with Von Korff and Saunders'28 recommendation of ‘moderate’ or greater pain intensity at 3 months. For the tool predicting disability, patients will be classified as having ‘chronic LBP (disability)’ if they reported ≥2/5 on a 5-point disability scale (During the past week, how much has LBP interfered with your normal work (including both work outside the home and housework)? 1=‘not at all’, 2=‘a little bit’, 3=‘moderately’, 4=‘quite a bit’, 5=‘extremely’)41 at 3-month follow-up. Once again, because no established cut-offs exist,27 we selected our own cut-off for what might constitute clinically important disability. We chose a disability score of ≥2/5 on the 5-point scale used in the development sample,41 or ≥7/24 on the 24-point Roland Morris Disability scale used in the validation sample. When both are converted to a 0–10 scale, these values approximate each other (7/24×10≈3/10, or 1.5/5).2 Choosing this disability cut-off will allow comparison to two recently published prognostic screening tools, which also selected a cut-off of 7 on the Roland scale.16 ,24
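The three outcome definitions amount to simple threshold rules on the follow-up scores. The sketch below is illustrative only: the cut-offs are those stated in the text, while the function and argument names are hypothetical and are not taken from the study's analysis code.

```python
# Illustrative sketch: cut-offs come from the protocol text,
# all function and argument names are hypothetical.

def classify_development(pain_likert, disability_likert):
    """Development sample: 6-point pain scale (1-6), 5-point disability scale (1-5)."""
    return {
        "chronic_lbp": pain_likert > 2,        # greater than 'very mild'
        "high_pain": pain_likert >= 4,         # 'moderate', 'severe' or 'very severe'
        "disability": disability_likert >= 2,  # 'a little bit' or more interference
    }


def classify_validation(pain_nrs, roland_morris):
    """Validation sample: 0-10 numeric rating scale, 24-point Roland Morris scale."""
    return {
        "chronic_lbp": pain_nrs >= 3,
        "high_pain": pain_nrs >= 5,
        "disability": roland_morris >= 7,
    }


def rescale_to_ten(score, scale_max):
    """Express a score as a fraction of its scale maximum, on a 0-10 scale."""
    return score / scale_max * 10


print(classify_development(pain_likert=3, disability_likert=1))
print(round(rescale_to_ten(7, 24), 1))  # Roland Morris cut-off on a 0-10 scale
```

Keeping the two samples in separate functions makes the scale differences between the data sets explicit rather than hiding them behind a shared threshold.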


Candidate predictors will be selected from those measured at baseline in the cohort study if they are: (1) simple and reliable to measure in practice and (2) have a theoretical association with the development of chronic LBP. Candidate predictors are listed in online supplementary appendix A.

Statistical analysis

Cases with missing values will be removed from the dataset if follow-up rates are higher than 95%. If missing data exceeds 5%, ‘single imputation’43 will be used. As per the recommendation from the PROGRESS group (table 1), we will ensure at least 10 cases per candidate predictor variable, to adequately power the regression analysis.9 ,44
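The two rules in this paragraph can be written down directly. This is a minimal sketch with hypothetical function names; reading "10 cases per candidate predictor" as 10 patients in the smaller outcome group is an assumption (the usual events-per-variable convention), not something the text states.

```python
# Illustrative sketch with hypothetical names; thresholds are from the text.

def missing_data_strategy(proportion_missing):
    """Complete-case analysis up to 5% missing data, single imputation above that."""
    if proportion_missing > 0.05:
        return "single imputation"
    return "complete-case analysis"


def max_candidate_predictors(n_events, n_non_events, cases_per_predictor=10):
    """Largest number of candidate predictors the sample supports.

    Assumption: the limiting count is the smaller of the two outcome groups.
    """
    return min(n_events, n_non_events) // cases_per_predictor


# Hypothetical counts, for illustration only
print(missing_data_strategy(0.03))
print(max_candidate_predictors(n_events=500, n_non_events=748))
```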

Variable selection

A logistic regression analysis will be used to investigate the relationship between the prognostic variables at baseline and the measures of chronic LBP. Age, sex and duration of the pain episode will be entered into block 1 of a multivariable analysis to reduce the complexity of the final model for clinical use, and ensure maximum sensitivity of the final models. In block 2, each potential prognostic factor will be added stepwise to the model by using an automated forward selection procedure. We will set a liberal significance level (p<0.10) to select variables that remain in the model. The predicted probability of chronic LBP will be modelled using the logistic regression equation, which in its standard form is $p = 1 / (1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)})$, where $p$ is the predicted probability of chronic LBP, $\beta_0$ is the intercept, and $\beta_1, \dots, \beta_k$ are the regression coefficients of the predictor variables $x_1, \dots, x_k$.
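As a minimal sketch of how a fitted logistic model turns predictor values into a predicted probability, assuming the standard logistic form: the intercept, coefficient values and predictor names below are invented for illustration and are not estimates from the study.

```python
import math

# Illustrative sketch: the coefficients and predictor names are hypothetical.

def predicted_probability(intercept, coefficients, values):
    """Standard logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    linear_predictor = intercept + sum(
        coefficients[name] * values[name] for name in coefficients
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


example_coefficients = {"age": 0.01, "pain_intensity": 0.30}  # hypothetical
example_patient = {"age": 45, "pain_intensity": 6}            # hypothetical

print(predicted_probability(-2.5, example_coefficients, example_patient))
```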

Continuous predictor variables will be treated as linear in the first multivariable regression model. After the initial predicted probabilities of chronic LBP are calculated, the linearity of continuous predictor variables will be examined with the predicted probability of chronic LBP as the dependent variable using scatter plots and the Box-Tidwell transformation.45 The Box-Tidwell transformation involves log transforming each continuous predictor variable, producing an interaction term between the original variable and its natural log (eg, pain×ln(pain)), and including this term in the regression analysis. A significant interaction term indicates significant non-linearity. Continuous predictor variables that demonstrate a non-linear relationship with the dependent variable (predicted probability of chronic LBP) will be transformed using a fractional polynomial procedure.46 Regression analyses will be performed using SPSS.46
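The interaction term used in the Box-Tidwell check is simply the predictor multiplied by its natural log. The sketch below only builds that term; testing its significance would be done by adding it to the regression model in the statistics package. The function name is hypothetical, and the requirement for strictly positive values is a property of the logarithm, so predictors that can be zero would first need shifting (an assumption about handling that the text does not discuss).

```python
import math

# Illustrative sketch with a hypothetical function name.

def box_tidwell_term(values):
    """Return x * ln(x) for each value of a continuous predictor."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Tidwell terms require strictly positive values")
    return [v * math.log(v) for v in values]


pain_scores = [1, 3, 5, 8]  # hypothetical baseline pain scores
print(box_tidwell_term(pain_scores))
```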


Each individual will be allocated a risk score. The risk score will be calculated as the sum of the products of each predictor variable's value and its regression coefficient.47 The full algorithm will be used to produce a score in the first instance to maximise predictive capacity. For the purpose of examining the performance of the predictive tool, patients will be classified as low, medium or high risk, based on their quartile of risk. Those in the highest quartile will be classified as high risk and those in the lowest quartile as low risk. The middle two quartiles will be classified as medium risk.
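The scoring and quartile-based grouping described above can be sketched as follows. The names are hypothetical, and two details are assumptions rather than statements from the text: quartile cut points are taken from Python's `statistics.quantiles` default method, and a score exactly on the lower cut point is counted as low risk.

```python
import statistics

# Illustrative sketch; names and boundary handling are assumptions.

def risk_score(coefficients, values):
    """Sum of each predictor value multiplied by its regression coefficient."""
    return sum(coefficients[name] * values[name] for name in coefficients)


def classify_by_quartile(scores):
    """Label each score low (bottom quartile), high (top quartile) or medium."""
    q1, _median, q3 = statistics.quantiles(scores, n=4)
    labels = []
    for score in scores:
        if score <= q1:
            labels.append("low")
        elif score > q3:
            labels.append("high")
        else:
            labels.append("medium")
    return labels


scores = [1, 2, 3, 4, 5, 6, 7, 8]  # hypothetical risk scores
print(classify_by_quartile(scores))
```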


To examine the apparent performance (internal validity) of the prognostic screening tools, we will assess measures of overall performance, calibration and discrimination. Overall performance will be assessed using the Nagelkerke R2 and Brier score. The Brier score is a method of quantifying differences between actual binary outcomes and their predictions, that is, average prediction error.35 The Brier score ranges from 0 to 0.25; values close to 0 represent a useful model and values close to 0.25 a non-informative model. Calibration, that is, the agreement between observed and predicted frequencies of a given outcome, will be determined by plotting the mean predicted versus observed cases of chronic LBP for 10 risk stratification levels. The calibration slope and calibration-in-the-large statistic (intercept) will be calculated by constructing calibration plots. Discrimination, that is, the ability of the tool to discriminate between patients who did (+ve case) or did not (−ve case) develop chronic LBP, will be determined by using a receiver operating characteristic (ROC) curve analysis, by calculating discrimination slope (box plots) and by examining risk-stratified likelihood ratios. Performance indices and plots will be calculated using R software.48 Rules for interpretation of these statistics are presented in the Discussion.
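To make the performance measures concrete, the sketch below computes the Brier score, the area under the ROC curve (as the proportion of case/non-case pairs ranked correctly) and the Nagelkerke R2 from observed outcomes and predicted probabilities. It is a plain-Python illustration with hypothetical names and made-up data, not the R code the study will use.

```python
import math

# Illustrative sketch with hypothetical names and data.

def brier_score(outcomes, probabilities):
    """Mean squared difference between predicted probability and outcome (0/1)."""
    return sum((p - y) ** 2 for y, p in zip(outcomes, probabilities)) / len(outcomes)


def auc(outcomes, probabilities):
    """Probability that a random case receives a higher prediction than a random non-case."""
    cases = [p for y, p in zip(outcomes, probabilities) if y == 1]
    non_cases = [p for y, p in zip(outcomes, probabilities) if y == 0]
    wins = sum(
        1.0 if c > n else 0.5 if c == n else 0.0 for c in cases for n in non_cases
    )
    return wins / (len(cases) * len(non_cases))


def nagelkerke_r2(outcomes, probabilities):
    """Cox-Snell R2 rescaled by its maximum attainable value."""
    n = len(outcomes)
    prevalence = sum(outcomes) / n
    ll_model = sum(
        y * math.log(p) + (1 - y) * math.log(1 - p)
        for y, p in zip(outcomes, probabilities)
    )
    ll_null = sum(
        y * math.log(prevalence) + (1 - y) * math.log(1 - prevalence) for y in outcomes
    )
    cox_snell = 1 - math.exp(2 * (ll_null - ll_model) / n)
    max_cox_snell = 1 - math.exp(2 * ll_null / n)
    return cox_snell / max_cox_snell


observed = [0, 0, 1, 1]             # hypothetical outcomes
predicted = [0.1, 0.4, 0.35, 0.8]   # hypothetical predicted probabilities
print(brier_score(observed, predicted), auc(observed, predicted))
print(nagelkerke_r2(observed, predicted))
```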

After the performance indices have been calculated, we will internally validate the model using bootstrapping techniques suggested by Moons et al33 (see online supplementary appendix B, Table B). Bootstrapping will be performed in SPSS using syntax available online. To assess model fit and optimism, bootstrapped estimates of the Nagelkerke R2 and its SE will be compared with the original model estimates. We will conduct a sensitivity analysis to assess performance of the tool for patients in different settings (general practice, physiotherapy, chiropractic).

Validation of the tool

Validation sample

The validation sample consists of 1643 participants from a randomised trial conducted across 235 primary care centres in Greater Metropolitan Sydney, Australia. The trial, published elsewhere,49–51 found no effect of paracetamol on recovery from acute LBP compared with placebo. In short, all participants were adults with acute non-specific LBP (<6 weeks’ duration) who had presented to primary care (GP, physiotherapist or pharmacist) between November 2009 and March 2013. Participants received up to 4 weeks of regular paracetamol, as-needed paracetamol, or placebo, and were followed up for 3 months on measures of pain and recovery.



Data on chronic LBP incidence (pain and disability at 3 months) will be extracted from the validation sample. Where necessary, all outcomes extracted from the validation sample will be transformed to match the format of the development sample. Patients will be classified as having chronic LBP, chronic LBP (high pain) and chronic LBP (disability), using the same definitions as were used in the development stage.

Statistical analysis

Each prognostic model will be assessed in the validation sample using the same statistical procedures as in the development stage. We will calculate estimates of overall performance (Nagelkerke R2, Brier score), discrimination (area under the curve (AUC), likelihood ratios, discrimination slope), and calibration (calibration plot, Hosmer-Lemeshow test).
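One of the discrimination measures listed here, the risk-stratified likelihood ratio, can be computed per risk group as the proportion of cases falling in that group divided by the proportion of non-cases falling in it. The sketch below uses hypothetical names and made-up data; returning `None` when a group contains no non-cases is an implementation choice, not part of the protocol.

```python
# Illustrative sketch with hypothetical names and data.

def stratum_likelihood_ratios(outcomes, strata):
    """Likelihood ratio for each risk stratum.

    LR = (cases in stratum / all cases) / (non-cases in stratum / all non-cases)
    """
    total_cases = sum(outcomes)
    total_non_cases = len(outcomes) - total_cases
    ratios = {}
    for stratum in sorted(set(strata)):
        in_stratum = [y for y, s in zip(outcomes, strata) if s == stratum]
        cases = sum(in_stratum)
        non_cases = len(in_stratum) - cases
        if non_cases == 0:
            ratios[stratum] = None  # ratio undefined without non-cases
        else:
            ratios[stratum] = (cases / total_cases) / (non_cases / total_non_cases)
    return ratios


outcomes = [1, 0, 0, 0, 0, 1, 1, 1, 1, 0]   # hypothetical: 1 = chronic LBP
strata = ["low"] * 5 + ["high"] * 5          # hypothetical risk groups
print(stratum_likelihood_ratios(outcomes, strata))
```

Ratios well above 1 in the high-risk group and well below 1 in the low-risk group indicate that the groups separate cases from non-cases.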

Posterior probability

We will calculate posterior probability along with 95% confidence limits according to the method recommended by Haskins et al.52 The 95% confidence limits will be calculated in Microsoft Excel using β distribution equations. [Equation image not available]

Validation plots

Validation plots of predicted versus observed risks showing the intercept, slope, AUC, scaled Brier score, and R2 will be produced to summarise and compare performance of the tool in the development and validation samples.

Updating the tool

Because the validation sample is more recent, we will consider updating and extending the model in the validation sample. We will update the model using a recalibration method described by Steyerberg et al,53 which involves multiplying the regression coefficients and intercept from the original logistic equation by the calibration slope (β) and adding the calibration intercept (α, or ‘calibration-in-the-large statistic’) from the validation data, to produce a new logistic equation. To extend the model, we will add new, potentially useful predictors from the validation sample, for example, sleep quality.
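A minimal sketch of the recalibration step, assuming the usual form in which the updated linear predictor equals the calibration intercept plus the calibration slope times the original linear predictor. All names and numeric values are hypothetical.

```python
import math

# Illustrative sketch: assumes updated_lp = alpha + beta * original_lp.

def recalibrate_coefficients(intercept, coefficients, alpha, beta):
    """Return the updated intercept and coefficients after recalibration."""
    new_intercept = alpha + beta * intercept
    new_coefficients = {name: beta * value for name, value in coefficients.items()}
    return new_intercept, new_coefficients


def recalibrated_probability(original_linear_predictor, alpha, beta):
    """Predicted probability from the recalibrated logistic equation."""
    updated = alpha + beta * original_linear_predictor
    return 1.0 / (1.0 + math.exp(-updated))


# Hypothetical original model and calibration estimates
new_intercept, new_coefficients = recalibrate_coefficients(
    intercept=-1.0, coefficients={"pain_intensity": 0.5}, alpha=0.2, beta=0.8
)
print(new_intercept, new_coefficients)
```

With a slope of 1 and an intercept of 0 the model is returned unchanged, which is the pattern expected from a tool that is already well calibrated in the new sample.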

If, based on our prespecified criteria below, we find the predictive validity of the prognostic model to be informative, we will consider simplifying it for clinical use (table 1). This process may involve steps such as refining and specifying measurement of the predictors, simplifying and clearly describing calculation of the prediction score/strata, and producing an electronic or paper-based form designed for clinical application.


We have described the methods and statistical analysis plan to develop and validate a prognostic screening tool for acute LBP. To our knowledge, this tool will be the first of its kind in LBP to follow, a priori, the PROGRESS framework and TRIPOD reporting guidelines for prognostic research. Importantly for secondary prevention, the tool will be developed specifically to predict the onset of chronic LBP at its inception at 3 months. The study is limited to using predictors measured in previously collected data sets.

We will use contemporary statistical methods to assess the calibration and discrimination of the screening tool in the two samples. The relative importance of calibration and discrimination ultimately depends on the purpose of the screening tool. For example, if the purpose of the tool is to aid clinical decision-making and provide accurate estimates of risk to patients, then calibration is an important consideration. If a clinician were to inform their patient that they had a 10% chance of chronic LBP, this estimate would be misleading if the tool was not well calibrated and, for example, 40% of patients with the same level of risk actually developed chronic LBP. If, on the other hand, the purpose of the tool is to select appropriate patients to include in a randomised trial, for example, in a trial to prevent chronic LBP, then adequate discrimination is important. In this example, a poorly discriminating tool would misclassify a large proportion of patients, including a number of inappropriate (low risk) patients and excluding appropriate (high risk) patients.

Ideally, a useful screening tool should have discrimination and calibration that are considered informative for its purpose. In general, Hosmer and Lemeshow suggest that for a logistic regression model, an AUC statistic of <0.7 represents poor discrimination, 0.7–0.8 acceptable discrimination, 0.8–0.9 excellent discrimination and 0.9–1.0 outstanding discrimination.54 However, Steyerberg points out that for clinical decisions that are close to a ‘toss-up’, a tool with an AUC of 0.6 can be informative.35 Published LBP tools report AUC values that range between 0.62 (ref 23) and 0.75 (ref 55) for pain, and between 0.68 (ref 56) and 0.83 (ref 55) for disability at 3–6 months of follow-up. On the basis of the reference standard values and those of previous work, we will consider AUC values of less than 0.6 to be non-informative. We also plan to use our additional measures of discrimination (likelihood ratios, discrimination plots) to determine whether or not the tool is informative. For example, overlapping likelihood ratio estimates among low-risk, medium-risk and high-risk groups would indicate poor discrimination. If the posterior probability CI includes the prevalence rate, we will consider the tool to be non-informative and unlikely to be clinically useful.
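The prespecified interpretation rules in this paragraph can be encoded as simple checks. The sketch below uses hypothetical function names; because the published bands share their boundary values (for example 0.8 appears in two bands), treating each lower bound as inclusive is an assumption.

```python
# Illustrative sketch; boundary handling is an assumption.

def interpret_auc(auc):
    """Hosmer and Lemeshow bands for discrimination, as quoted in the text."""
    if auc < 0.7:
        return "poor"
    if auc < 0.8:
        return "acceptable"
    if auc < 0.9:
        return "excellent"
    return "outstanding"


def auc_is_informative(auc, threshold=0.6):
    """Protocol rule: AUC values below 0.6 are considered non-informative."""
    return auc >= threshold


def posterior_is_informative(ci_lower, ci_upper, prevalence):
    """Protocol rule: non-informative if the posterior probability CI includes the prevalence."""
    return not (ci_lower <= prevalence <= ci_upper)


print(interpret_auc(0.65), auc_is_informative(0.65))
print(posterior_is_informative(ci_lower=0.45, ci_upper=0.60, prevalence=0.40))
```

Note that a tool can fall in the "poor" band yet still pass the 0.6 threshold, which mirrors the distinction the text draws between the general bands and decisions close to a toss-up.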

Acceptable calibration of the tool will be based on the results of the calibration plots. If observed frequencies of chronic LBP in the validation sample fall within 5% of predicted frequencies, the tool will be considered to have acceptable calibration. The calibration-in-the-large statistic (intercept) should be close to 0 and the slope close to 1. With such a large sample, the p value of the Hosmer-Lemeshow test will be interpreted with caution.
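The calibration criterion can be sketched by grouping patients by predicted risk and comparing the mean predicted risk with the observed frequency in each group. The names and data below are hypothetical, and reading "within 5%" as an absolute difference of 5 percentage points is an assumption.

```python
# Illustrative sketch; "within 5%" is read as 5 percentage points (assumption).

def calibration_table(outcomes, probabilities, n_groups=10):
    """Mean predicted risk and observed frequency for each risk-ordered group."""
    pairs = sorted(zip(probabilities, outcomes))
    n = len(pairs)
    table = []
    for group in range(n_groups):
        chunk = pairs[group * n // n_groups:(group + 1) * n // n_groups]
        if not chunk:
            continue  # fewer patients than groups
        mean_predicted = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        table.append((mean_predicted, observed))
    return table


def acceptable_calibration(table, tolerance=0.05):
    """True if every group's observed frequency is within the tolerance of predicted."""
    return all(abs(observed - predicted) <= tolerance for predicted, observed in table)


# Hypothetical data: 10 patients predicted at 20% risk, 10 at 80% risk
probabilities = [0.2] * 10 + [0.8] * 10
outcomes = [1, 1] + [0] * 8 + [1] * 8 + [0, 0]
table = calibration_table(outcomes, probabilities, n_groups=2)
print(table, acceptable_calibration(table))
```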


This protocol outlines the design of development and validation studies for a prognostic screening tool in acute LBP. Results coming from this study will be interpreted for both clinical and research purposes.


Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.


  • Contributors CGM, NH, SJK, JM and CMW acquired the original data for this research. AT, JM, MH, CMW, SJK, GLM and NH formulated the methods and designed the protocol. AT drafted the manuscript. All authors contributed to revisions and approved the final version of the manuscript.

  • Funding AT and HL are supported by National Health and Medical Research Council PhD scholarships. GLM, CGM and SJK are supported by National Health and Medical Research Council research fellowships. JM and MH are supported by a National Health and Medical Research Council research grant ID 1047827. The two original studies to be used in the planned analysis were funded by National Health and Medical Research Council research grants ID 2536343 and ID 352576.23

  • Competing interests None declared.

  • Ethics approval University of Sydney Ethics Committee.

  • Provenance and peer review Not commissioned; externally peer reviewed.