Original research
Intrarater reliability of the Abilitator—a self-report questionnaire on work ability and functioning aimed at the population in a weak labour market position: a test–retest study
Miia Wikström (1,2), Anne Kouvonen (2,3), Matti Joensuu (1)
  1. Finnish Institute of Occupational Health, Helsinki, Finland
  2. Faculty of Social Sciences, University of Helsinki, Helsinki, Uusimaa, Finland
  3. Centre for Public Health, Queen's University Belfast, Belfast, UK
Correspondence to Miia Wikström; miia.wikstrom{at}ttl.fi

Abstract

Objectives The Abilitator is a patient-reported outcome measure (PROM) of work ability and functioning of those in a weak labour market position. It covers items for work ability and self-rated health, for example, and summary scales for social, psychological, cognitive and physical functioning, as well as everyday skills. The aim of this study was to evaluate the intrarater test–retest reliability, internal consistency and basic psychometric properties of the Finnish version of the Abilitator.

Design, setting and outcome The test–retest study was conducted in European Social Fund projects in 2018–2019. The participants completed the Abilitator questionnaire twice, 7–14 days apart. The internal consistency analysis was based on data collected in 2017–2019 in services for the long-term unemployed. Reliability was assessed using correlations (Pearson’s r, Spearman’s rs and the intraclass correlation coefficient (ICC)), agreement using Bland-Altman analysis and internal consistency using Cronbach’s alpha.

Participants The test–retest study had 67 participants (52% men, mean age 43.9 years) and the internal consistency study had 10 923 (48% men, mean age 38.90 years). Of all the participants, 80% had been unemployed for over a year.

Results The test–retest r or rs ranged from 0.71 to 0.93 and ICC from 0.74 to 0.93 for the items and summary scales. An exception was the life satisfaction item, with an rs of 0.60 and ICC of 0.45. A statistically significant difference was observed in the summary scale for social functioning (t=−2.01, p=0.049). Agreement was observed for all variables except social functioning. Alphas for summary scales ranged from 0.74 to 0.91.

Conclusions The Finnish version of the Abilitator is a reliable PROM for the target group and has acceptable to excellent intrarater test–retest reliability and internal consistency, apart from the life satisfaction item. Further testing is needed for the social functioning summary scale.

  • public health
  • health services administration & management
  • health economics

Data availability statement

Data are available upon reasonable request. Access to the data can be requested from the corresponding author in line with the FIOH data sharing policy.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

Strengths and limitations of this study

  • This was the first test–retest reliability and internal consistency study of the Abilitator.

  • It was a strength that the study was conducted, and the data were collected, in a realistic service setting.

  • It was a strength that the study covered a wide range of participants in a weak labour market position.

  • It was a limitation that some real-life service activities took place during the test–retest period.

  • It was a limitation that the sample size in the test–retest study was too small to conduct further subgroup analyses, such as with different language versions of the Abilitator and participants using mobility aids.

Introduction

In Western European welfare states, a broad selection of services promotes the health, rehabilitation, social well-being, education and employment of the working-age population. These services often require an individual assessment of the clients’ work ability and functioning, to identify any needs for support and to suggest participation in appropriate service programmes. Organisations providing employment and other welfare services should also assess the effectiveness of their services.1 Reliable data on the clients’ needs and changes during the services are essential for planning the allocation of resources. The use of patient-reported outcomes (PROs) has begun to increase, and their use is now widely recommended.2–5

PROs can be defined as measurements of the patients’ or service clients’ self-reported health, functioning, well-being and health-related quality of life.6–9 Patient-reported outcome measurements (PROMs) are standardised, reliable and validated instruments that include and elicit PROs.6 7 10 11 PROMs can be either generic or specific, depending on whether they measure the clients’ perceived health status in general or in relation to specific conditions or diseases.7 11 PROMs can be either unidimensional, when they measure a single characteristic (construct), or multidimensional, when multiple different constructs are measured by the same instrument.12

Currently, only a few generic, feasible and validated PROMs exist for multidimensionally assessing work ability and functioning of the unemployed.13–15 The need for PROMs with a broad view of health and well-being, especially suitable for those with comorbidities, is apparent.10 A PROM with rapid administration, good feasibility and informational output could be a starting point to support decisions and actions in practice.9

The concept of work ability is a combination of health, functioning, basic standard competence and the relevant occupational traits required for managing reasonable work tasks in an acceptable environment.16 17 Functioning is closely related to health and comprises a psychological, social, physical and cognitive dimension.18–24

Unemployed people generally have poorer health and work ability than those who are employed.25–28 The pressures of modern working life accumulate among those who are in a weak labour market position. This is a heterogeneous group of working-age people who persistently face challenges in attaining employment. This can be due to a lack of employment history, a low level of education, disabilities, chronic or multiple health problems, prolonged unemployment or a migrant background.29–32

The Abilitator is a generic, multidimensional PROM for measuring the self-reported work ability and functioning of the population in a weak labour market position.33 34 It consists of a digital questionnaire34 (online supplemental file 1), analyses the responses and produces two kinds of reports: an individual written feedback report for the respondent (online supplemental file 2) and a group-level report (online supplemental file 3). The group-level report can be constructed by the user organisation from their accumulated data in the Abilitator’s digital service, in either a cross-sectional or longitudinal setting.34

As an unemployed person has no job, the content related to work ability and functioning in the Abilitator corresponds to the general demands of working life, including the constructs of employability and inclusion.33 The Abilitator’s purpose is to help the service clients to identify their strengths and challenges in terms of their work ability and functioning. It is designed to provide the client with a basis for individual goal setting in a dialogue with service professionals. Another purpose is to help the professionals to implement the most suitable services for each client. It also provides the professionals with a means to follow the client’s progress in a positive and empowering way.33

A high-quality PROM is valid, reliable, responsive and interpretable.6 A previous study on the Abilitator’s content validity found that it adequately covered the aspects needed to assess the overall work ability and functioning of the population in a weak labour market position.33 The instrument also had acceptable concurrent validity for assessing different aspects of the functioning of working-age people.35 However, further evidence of the Abilitator’s psychometric properties is needed, because the demand to implement the instrument in different services is growing. The reliability of the Abilitator has not been examined before.

Aims

The first aim of this study was to evaluate the Abilitator’s intrarater test–retest reliability and measurement error. The second aim was to analyse the Abilitator’s internal consistency. The third aim was to evaluate the basic psychometric properties of the Abilitator.

Methods

The Abilitator self-report questionnaire

The Abilitator was developed in 2014–2017 at the Finnish Institute of Occupational Health (FIOH) in the national coordination project Social Inclusion and the Change of One’s Work Ability and Capacity (Solmu), funded by the European Social Fund (ESF) Priority 5 programme (2014–2022).33 The Abilitator is currently used in Finland to support primary-level decision-making in different services through which large numbers of clients meet professionals with various occupational backgrounds.34 36 37

The Abilitator has the following nine domains (sections): A. Personal information, B. Well-being, C. Inclusion, D. Mind, E. Everyday life, F. Skills, G. Body, H. Background information and I. Work and the Future.34 Each section contains 4–14 items (online supplemental file 1). The measure of each reported section is a summary scale, with a score of 0%–100% of the selected items, which cover different aspects of work ability and functioning33 (figure 1). In this study, we analysed the reliability of the five individual items in section B and the summary scales derived from sections C, D, E, F and G, covering the dimensions of social functioning and inclusion, psychological functioning, everyday skills, cognitive functioning and physical functioning, respectively. The Overall situation score, which is the mean of the sums derived from these five summary scales, was also included in the analysis. The Abilitator is currently available in nine languages, but we only included the Finnish version in this study.

Figure 1

The Abilitator’s domains, total number of items in each domain, items included in the summary scales and summary scales included in the overall situation score.

Study populations and procedure

The data collection for the test–retest study was conducted by FIOH in 2018–2019 in cooperation with the ESF Priority 3 and 5 projects. The Priority 3 programme aims to promote employment among young unemployed adults and other groups in a weak labour market position.38 The Priority 5 programme aims to improve the work ability and functioning of people outside working life to help them proceed on employment paths and to strengthen social inclusion.38 First, we recruited Priority 3 and 5 projects that were already using the Abilitator as part of their client assessment procedures. The recruitment was conducted via Solmu’s Facebook page and by directly contacting the managers of potential projects. Second, the client participants were recruited face to face at group meetings arranged in cooperation with the ESF projects’ staff and FIOH’s researchers. All phases of the test–retest study were incorporated in the preplanned daily activities of the ESF projects. These activities included, for example, social interaction, light physical activity, gardening, cooking and rehabilitative work tasks such as sorting clothes for recycling.

The Consensus-based Standards for the selection of health Measurement Instruments (COSMIN) recommendation for a minimum sample size in reliability studies with two repeated measurements and an intraclass correlation coefficient (ICC) of 0.8 (95% CI ±0.1) is at least 50 participants.6 However, we aimed for an additional 30% (n=15) to account for possible non-response.39 The participants completed the questionnaire at two time points (figure 2). They first voluntarily responded to the Abilitator as a part of the ESF project procedures (test 1). The second questionnaire (test 2) was preplanned in cooperation with the projects’ staff, to be completed within 7–14 days of test 1. Before completing the second questionnaire, the project participants were informed both verbally and in writing of the ongoing test–retest reliability study, data protection and the voluntary nature of participation. The participants who agreed to participate in the study completed both questionnaires independently on paper, mostly as part of group activity sessions during which one or two project employees were available to help if any questions arose. Some participants completed both questionnaires outside group sessions, but always with a project employee nearby. For the internal consistency analysis, we used a larger set of data (n=14 895), which were collected between 2017 and 2019 in an online database maintained by FIOH. These data were accrued from the ESF projects and other organisations that provide services for the unemployed. FIOH obtained the organisations’ consent to use these data for research purposes.

Figure 2

Flow chart of the test–retest study protocol with the number of non-respondents and analysed cases.

Patient and public involvement

The Abilitator was codeveloped and assessed by members of academic expert panels, practical expert panels of service professionals and target group clients.33 In this study, neither the participants nor the service professionals were involved in the design phase or in the dissemination of the study results. However, the latter group was involved in participant recruitment, in scheduling the test–retest period, and in conducting the data collection in cooperation with research scientists from FIOH.

Data analysis

The data analysis included the items and summary scales described in the Abilitator’s group-level report34 (online supplemental file 3). The study followed the guidelines of the COSMIN panel.6 First, we described the study population and analysed the number of missing cases. We also used the independent samples t-test to determine: (1) whether the two ESF groups in the test–retest sample differed from each other, and (2) whether the larger sample used for the internal consistency analysis differed from the test–retest sample in terms of the reported variables. The significance level was set at p≤0.05 for all the analyses. In accordance with de Vet et al,6 the data were analysed for possible floor and ceiling effects. Such effects were considered present if more than 15% of all responses fell at the lowest or highest score of a scale. Second, we analysed the relative reliability of the test–retest data using the paired samples t-test (t) and Wilcoxon signed ranks test (z). Depending on whether the data were normally distributed, the correlations between the two testing points were analysed using Pearson’s r (r) or Spearman’s rho (rs), and a value of ≥0.70 was considered a strong correlation.40 41 We also calculated the consistency-type ICC (ICCconsistency) to assess the consistency of the test–retest measurements. The ICC values were interpreted as follows: ≥0.70 acceptable, ≥0.80 acceptable for research use, and ≥0.90 acceptable for clinical use.6 We further analysed the test–retest data for effect sizes for gender and ESF priority axes using two-way analysis of variance (ANOVA), where a partial eta-squared (η²) of <0.06 was considered a small effect, 0.06–0.13 a medium effect and ≥0.14 a large effect.42 Third, we quantified absolute reliability using the Bland-Altman method by plotting the differences (d) between test 1 and test 2 against the means of the two measurements with 95% CI and 95% limits of agreement (LOA).43 In this study, an acceptable result was that the line of equality, meaning zero, fell within the 95% CI of d.44 45 The measurement error reflects the intraindividual variation in the scores and was estimated as the SE of measurement (SEMpooled).6 The SEMpooled values were further converted into the smallest detectable change (SDC95), which indicates the smallest within-person change that can be interpreted, with 95% confidence, as a true change above the measurement error.46 Fourth, an internal consistency analysis was conducted on the summary scales (C-G, Overall situation), and the acceptable level for Cronbach’s alpha (α) was set at between 0.70 and 0.90.6 Participants who responded in a language other than Finnish or had no language choice information available were excluded from the analysis (n=3972). All analyses were conducted using IBM SPSS Statistics V.27.47
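The absolute reliability quantities described above can be illustrated with a short Python sketch. It is not the authors' SPSS procedure: the function name `absolute_reliability` and the sample scores are hypothetical, the 95% CI of d uses a normal approximation (z≈1.96) rather than the t distribution, and the SEM is derived from the SD of the paired differences (SD of differences divided by √2), which may differ from how SEMpooled was computed in the study. SDC95 follows the commonly used formula 1.96 × √2 × SEM.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def absolute_reliability(test1, test2, level=0.95):
    """Bland-Altman summary plus SEM and SDC for paired test-retest scores.

    Assumptions (not taken from the article): normal-approximation critical
    value, and SEM estimated as SD of the differences divided by sqrt(2).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)          # about 1.96 for 95%
    diffs = [a - b for a, b in zip(test1, test2)]      # d = test 1 - test 2
    n = len(diffs)
    d = mean(diffs)                                    # mean difference
    sd = stdev(diffs)                                  # SD of the differences
    se = sd / sqrt(n)                                  # standard error of d
    sem = sd / sqrt(2)                                 # SE of measurement
    return {
        "d": d,
        "ci_d": (d - z * se, d + z * se),              # CI of the mean difference
        "loa": (d - z * sd, d + z * sd),               # limits of agreement
        "sem": sem,
        "sdc": z * sqrt(2) * sem,                      # smallest detectable change
        "zero_in_ci": d - z * se <= 0 <= d + z * se,   # acceptance rule in the text
    }


# Hypothetical summary-scale scores (0-100%) for four respondents.
result = absolute_reliability([50, 60, 70, 80], [52, 59, 73, 80])
print(result)
```

The `zero_in_ci` flag mirrors the acceptance criterion used in this study: agreement is considered acceptable when zero lies within the 95% CI of d.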

Results

Participant characteristics

Table 1 presents the participant characteristics in the test–retest study. In all, 67 participants responded to the Abilitator during test 1, 46% of whom were participants of the ESF Priority 3 projects, and 54% of the ESF Priority 5 projects. The two priority groups were statistically significantly different in all reported variables other than item B1 and scale D. Mind. On average, the Priority 3 group had higher Abilitator scores even though they were significantly older. Most participants had good perceived general functioning (B3) and perceived work ability (B4), but in the Priority 3 group, more participants were distributed in the higher point categories. The duration of unemployment (I2) was also statistically significantly different, as the participants in the Priority 5 group had been unemployed for a longer time and more participants had never been in employment. The test 2 response rate was 70%–100% across different sections (figure 2). The median response time from test 1 to test 2 was 14 days, with 86% (n=56) responding within 7–14 days. No floor or ceiling effects were found, as less than 15% of the responses per analysed item or summary scale scored at the lowest or highest score of the scale. The data set including the participants in the internal consistency analysis (n=10 923) was statistically significantly different (p<0.05) from the test–retest participants in all reported variables other than items B1 and B2 and scales C. Inclusion, D. Mind, and G. Body. On average they were younger, with a mean age of 38.90 (95% CI 38.68 to 39.12, SD 13.32), and their scores were lower in both B3, with a mean of 6.86 (95% CI 6.82 to 6.89, SD 2.13), and B4, with a mean of 6.15 (95% CI 6.11 to 6.19, SD 2.53). In terms of duration of unemployment (I2), the participants in the internal consistency analysis were statistically significantly different from the Priority 3 group but not the Priority 5 group.
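The 15% rule for floor and ceiling effects is simple to express in code. The following is a minimal sketch with hypothetical function names and made-up responses on a 0–10 item; it only illustrates the criterion and does not reproduce the study data.

```python
def floor_ceiling_shares(scores, scale_min, scale_max):
    """Return the share of responses at the lowest and highest possible score."""
    n = len(scores)
    floor_share = sum(1 for s in scores if s == scale_min) / n
    ceiling_share = sum(1 for s in scores if s == scale_max) / n
    return floor_share, ceiling_share


def has_floor_or_ceiling_effect(scores, scale_min, scale_max, threshold=0.15):
    """Flag an effect when more than `threshold` of responses sit at either end."""
    floor_share, ceiling_share = floor_ceiling_shares(scores, scale_min, scale_max)
    return floor_share > threshold or ceiling_share > threshold


# Hypothetical responses to a 0-10 item.
evenly_spread = list(range(0, 11))                 # one response per score
top_heavy = [0, 3, 5, 5, 6, 7, 8, 10, 10, 10]      # 30% at the maximum

print(has_floor_or_ceiling_effect(evenly_spread, 0, 10))
print(has_floor_or_ceiling_effect(top_heavy, 0, 10))
```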

Table 1

Participant characteristics by the European Social Fund (ESF) Priority groups in the test–retest study.

Relative intrarater test–retest reliability of the Abilitator

Table 2 presents the results of the relative test–retest reliability analysis of the Abilitator. The values obtained from the Pearson’s r (r) and Spearman’s rho (rs) tests showed strong positive (≥0.70) to very strong positive correlations (≥0.90) between test 1 and test 2. An exception was item B1, which showed a moderate positive correlation of 0.60. The paired samples t-test showed that only C. Inclusion differed statistically significantly in the repeated measurements, with a higher mean in test 2. The Wilcoxon signed ranks test revealed that even though the median of C. Inclusion did not change statistically significantly (p>0.05), test 2 had more positive ranks. The difference was statistically significant when the paired values of variable C. Inclusion were analysed using the Abilitator’s feedback categories: the Wilcoxon signed ranks test (z=−2.00, p=0.046) showed a positive shift, as 10 participants reached one category higher in test 2. The effect sizes were small in both the paired samples t-test (d<0.50) and the Wilcoxon signed ranks test (r<0.30). The effect sizes analysed using two-way ANOVA revealed that neither gender nor participation in the ESF Priority 3 or 5 programmes influenced the Abilitator’s test–retest results (η²<0.06). The ICC values were above the acceptable level of 0.70 in all items other than B1.
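To make concrete what a consistency-type ICC measures, the sketch below computes the single-measures, two-way consistency ICC from the mean squares of a respondents × occasions table. Whether the study reported single or average measures is not stated in this text, so that choice is an assumption; the function name `icc_consistency` and the data are hypothetical.

```python
def icc_consistency(test1, test2):
    """Single-measures, two-way consistency ICC for two occasions.

    ICC = (MS_rows - MS_error) / (MS_rows + (k - 1) * MS_error), with k = 2.
    Systematic shifts between occasions go into the column term and do not
    lower this coefficient, which is what "consistency" refers to.
    """
    n, k = len(test1), 2
    rows = list(zip(test1, test2))
    grand = (sum(test1) + sum(test2)) / (n * k)
    row_means = [sum(r) / k for r in rows]
    col_means = [sum(test1) / n, sum(test2) / n]

    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)    # between respondents
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)    # between occasions
    ss_error = ss_total - ss_rows - ss_cols                   # residual

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


# A constant shift between occasions leaves consistency untouched.
shifted = icc_consistency([1, 2, 3, 4], [3, 4, 5, 6])
# Swapping neighbouring ranks adds residual error and lowers the ICC.
noisy = icc_consistency([1, 2, 3, 4], [2, 1, 4, 3])
print(shifted, noisy)
```

The first example shows why a consistency ICC can stay high even when a scale mean moves between test 1 and test 2, as happened with C. Inclusion; a shift of that kind is picked up by the paired t-test and the Bland-Altman analysis instead.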

Table 2

The relative test–retest reliability of the Abilitator

Absolute intrarater test–retest reliability and the internal consistency of the Abilitator

Table 3 presents the results of the absolute test–retest reliability analysis and the measurement error. In the Bland-Altman analysis, zero was included within the 95% CI of d in all measurements other than C. Inclusion. This implies that changes occurred in summary scale C within the 14-day period. The SDC95 for the Abilitator’s summary scales varied from 5.10% to 7.71%. The internal consistency values (α, 95% CI) of the summary scales were C. Inclusion 0.91 (0.909 to 0.914), D. Mind 0.88 (0.876 to 0.883), E. Everyday life 0.86 (0.859 to 0.865), F. Skills 0.87 (0.863 to 0.870), G. Body 0.74 (0.733 to 0.748) or 0.75 (0.685 to 0.796) if mobility aid was used, and Overall situation 0.86 (0.856 to 0.867). The α level was within the acceptable range in all summary scales other than C. Inclusion, in which it was slightly higher than recommended.
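Cronbach's alpha, used here for internal consistency, can be computed from item variances and the variance of the total score. The sketch below is illustrative only: the function names and item data are hypothetical, and the 0.70–0.90 range check simply encodes the acceptance band stated in the Data analysis section.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of item columns (same respondents in each).

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
    """
    k = len(items)
    totals = [sum(responses) for responses in zip(*items)]   # per-respondent sums
    item_variance_sum = sum(variance(column) for column in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def within_recommended_range(alpha, low=0.70, high=0.90):
    """True when alpha lies in the band treated as acceptable in this study."""
    return low <= alpha <= high


# Hypothetical scores of four respondents on three items.
items = [
    [1, 2, 3, 4],
    [2, 1, 4, 3],
    [1, 3, 2, 4],
]
alpha = cronbach_alpha(items)
print(round(alpha, 3), within_recommended_range(alpha))
```

Under this band, a value such as the 0.91 reported for C. Inclusion falls just above the upper limit, which is usually read as a hint that some items may be redundant.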

Table 3

The absolute test–retest reliability of the Abilitator with SE of measurement and smallest detectable change

Discussion

In this study, we assessed the intrarater test–retest reliability, measurement error, internal consistency and basic psychometric properties of the Finnish version of the Abilitator, in line with the COSMIN guidelines. The study covered the items and the summary scales presented in the Abilitator’s group-level report. The results showed that the relative reliability of the Abilitator varied from acceptable to excellent. The test–retest correlations (r, rs) were high or very high and the intraclass correlations (ICC) fulfilled the acceptable, research or clinical use criteria. In addition, the means at the two time points did not differ. An exception was item B1. Life satisfaction, with unacceptable rs and ICC. It thus seems to describe the respondent’s situation only at the moment of responding. In addition, the mean of summary scale C. Inclusion showed a small but statistically significant positive change during the test–retest period. This change was also statistically significant when analysed using the Abilitator’s feedback categories. However, the effect sizes indicate that the impact of these changes was small. In terms of absolute reliability, a small degree of variation in the measurements among individuals was observed. The analyses showed acceptable agreement (LOA) between test 1 and test 2, with no systematic pattern, except in summary scale C. Inclusion, in which a small but true change occurred beyond SEM and SDC. The Abilitator had good to excellent internal consistency, which shows that the items in each summary scale are closely related. However, in summary scale C. Inclusion, α slightly exceeded the recommended upper limit of 0.90. In addition, neither floor nor ceiling effects were observed for any of the analysed items or summary scales, which means that neither the items nor the scales need to be reduced or extended when this instrument is used in the target population.

There are four possible explanations for the significant differences in summary scale C. Inclusion. First, this scale might not be a reliable PROM. The analyses conducted do not support this explanation, as the scale’s test–retest correlation was high and intraclass correlation reached acceptable criteria. However, the internal consistency of summary scale C. Inclusion was slightly too high, and therefore item reduction could be considered in the future. Second, this scale might be very responsive and sensitive to small changes in social functioning and social inclusion. This aspect will be analysed in more detail in a future study of the Abilitator’s responsiveness. Third, between test 1 and test 2, the participants took part in ESF project activities. These activities focused on strengthening social inclusion and social interaction. Examination of the test–retest results of the individual items revealed that the significant positive change occurred in items C7. ‘I feel part of society’ and C16. ‘I find it easy to get to know new people’. These aspects of inclusion and social functioning might have been strengthened by the services in which the study population participated. Another test–retest study should thus be conducted without service activities during the test–retest period. Fourth, the dimensions of social functioning and social inclusion may be very sensitive to change in general, and they may be even more challenging to reliably measure in the target population. It has been suggested that another person, such as a family member, could assess social functioning objectively alongside the respondent’s self-report.48 However, it is known that prolonged unemployment may lead to social exclusion, marginalisation, reduced self-esteem and feelings of shame.49–54 Therefore, it might be more encouraging for the target population for the assessment procedures to be based on self-reports.

As there are no gold standard measures of self-reported work ability and functioning for the population in a weak labour market position, we compared the results of the Abilitator with two validated, commonly used self-report instruments: the Work Ability Index (WAI)14 and the WHO Disability Assessment Schedule (WHODAS 2.0).15 The WAI is used by occupational health services and research to assess employee work ability.14 It has also been administered to the unemployed.54 Overall, the WAI has been found to be a reliable instrument for self-reported work ability.55–60 The mean ICC values ranged from 0.72 to 0.84,55 56 60 and α levels from 0.65 to 0.82.57–59 WHODAS 2.0 is a generic PROM instrument for measuring health and disability.15 The test–retest reliability of WHODAS 2.0 was good, with ICC at the domain level ranging from 0.93 to 0.96, and at the item level from 0.69 to 0.89. The internal consistency of WHODAS 2.0 was α=0.96 overall and ranged from 0.79 to 0.96 at the domain level.61 62 The WAI and WHODAS 2.0 studies had larger sample sizes and longer response periods than the current study of the Abilitator. Those studies with a test–retest period over 14 days may have tested the stability of the phenomenon rather than reliability of the method.5 Moreover, those studies did not clearly state whether any intervention was implemented during the test–retest period, whereas in our study service activities did take place during that period. The Abilitator reached similar test–retest reliability and internal consistency to that of the WAI and WHODAS 2.0. However, WHODAS 2.0 reached ICC levels high enough for clinical use more often than the WAI and the Abilitator. The main difference in the test–retest study results was that the WAI and WHODAS 2.0 had no items or sections with significant differences within the response period, but the Abilitator did, in summary scale C. Inclusion.

This study had several strengths. First, it was conducted in a realistic service setting with a target population that used real-life services. Second, the study covered a wide range of participants within the target group, as the population in a weak labour market position is very heterogeneous. As anticipated, the ESF Priority 5 participants were in a weaker labour market position than the ESF Priority 3 participants. The larger population in the internal consistency analysis had poorer work ability and general functioning than the test–retest population. Despite these strengths, the study also had limitations. The first was that the population in the test–retest study used services within the response time period. Therefore, the results of the test–retest analyses might have been influenced by the activities conducted in the ESF projects. The second limitation was the sample size. Even though it was sufficient for the test–retest analysis, a larger sample might have enabled subgroup analyses. For example, in terms of summary scale G. Body, we had to exclude those who used a mobility aid, because the sample size was too small for a reliable analysis in this subgroup.

There are also practical implications of this study. First, those who use the Abilitator can now assume that the instrument is reliable. However, the results are only generalisable to the target group and to those responding in Finnish. Second, the new information about the Abilitator’s SEM and SDC allows for easier interpretation of its results. Third, this study showed that life satisfaction is not a reliable aspect to measure in the target group. It also showed that rapid positive changes in social functioning and inclusion may occur when the target group is participating in low-threshold service activities such as those that the ESF projects provide.

Conclusion

This study showed that the Finnish version of the Abilitator is a reliable tool to use with individuals in a weak labour market position, and that it has acceptable to excellent intrarater test–retest reliability and internal consistency. Overall, the Abilitator meets the reliability requirements for systematic use in different services aimed at the target group. For the most part, the Abilitator also meets the reliability requirements for research and clinical use. However, conclusions about the respondents’ life satisfaction based on item B1 should be drawn with caution, and further testing is needed to determine the reliability of summary scale C. Inclusion. Future research on the Abilitator’s psychometric properties should cover its structural and predictive validity and responsiveness. Moreover, item reduction and the definition of a clinically important change could promote the Abilitator’s usability and strengthen its position as a systematically implemented PROM in a wide range of primary level services.

Ethics statements

Patient consent for publication

Ethics approval

All parts of this study were approved by the ethics board of the Finnish Institute of Occupational Health in June 2017, reference number ETR Joensuu 6/2017. The three external expert panels involved in the co-development process were informed both verbally and in writing of the research and voluntary participation was emphasised before co-development took place.

Acknowledgments

The staff of the ESF Priority 5 Project Solmu participated in recruitment of ESF Priority 3 and 5 projects that took part in this study. The staff and participants of one ESF Priority 3 project, and three ESF Priority 5 projects volunteered to cooperate in data collection. In addition, Adjunct Professor Vesa A. Niskanen, PhD from the University of Helsinki generously consulted with us in the data analysis.

Footnotes

  • Twitter @WikstromMiia, @AKouvonen

  • Contributors MW was responsible for the study design, data collection, data analyses, and writing of the manuscript. MW is acting as the guarantor of this study. AK and MJ contributed to the study design, the data analyses and the critical revision of the manuscript. MJ also participated in the data collection. MJ is the principal investigator of the Abilitator studies at FIOH. All the authors have read and approved the manuscript.

  • Funding This work was supported by the European Social Fund (ESF) Priority 5 programme via the Solmu project, grant number S20237. The funders played no role in the study design, data collection, data analysis, interpretation or writing of the report.

  • Competing interests All authors have completed the ICMJE uniform disclosure form at http://www.icmje.org/disclosure-of-interest/ and declare: no support from any organisation for the submitted work; no financial relationships with any organisations that might have an interest in submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.