

Inter-rater reliability of the Berg Balance Scale, 30 s chair stand test and 6 m walking test, and construct validity of the Berg Balance Scale in nursing home residents with mild-to-moderate dementia
Elisabeth Wiken Telenius1, Knut Engedal2, Astrid Bergland1
1 Faculty of Health Sciences, Department of Physiotherapy, Oslo and Akershus University College of Applied Sciences, Oslo, Norway
2 Department of Psychiatry, Norwegian Centre of Aging and Health, Vestfold Health Trust, Tønsberg, Norway
Correspondence to Elisabeth Wiken Telenius; elisabeth-wiken.telenius{at}


Objective When testing physical function, patients must be alert and able to understand and respond to instructions. Patients with dementia may have difficulty fulfilling these requirements and, therefore, the reliability of the measures may be compromised. We aimed to assess the inter-rater reliability between pairs of observers independently rating participants on the Berg Balance Scale (BBS), 30 s chair stand test (CST) and 6 m walking test. We also wanted to investigate the internal consistency of the BBS.

Design Cross-sectional study.

Setting We included 33 nursing home patients with a mild-to-moderate degree of dementia and tested them once with two evaluators present. One evaluator gave instructions and both evaluators scored the patients’ performance. Weighted κ, intraclass correlation coefficient (ICC) model 2.1 with 95% CIs and minimal detectable change (MDC) were used to measure inter-rater reliability. Cronbach's α was calculated to evaluate the internal consistency of the BBS sum score.

Results The mean BBS scores given by the two evaluators were 38.0±13.7 and 38.0±13.8, respectively. Weighted κ scores for the BBS items varied from 0.83 to 1.0. The ICC for the BBS sum score was 0.99, and the MDC95 and MDC95% were 2.7 points and 7%, respectively. The Cronbach's α of the BBS sum score was 0.95. The ICCs of the CST and 6 m walking test were 1.0 and 0.97, respectively. The MDC95 and MDC95% of the 6 m walking test were 0.08 m/s and 15.2%, respectively.

Conclusions The results reveal excellent relative inter-rater reliability of the BBS, CST and 6 m walking test, as well as high internal consistency of the BBS, in a population of nursing home residents with mild-to-moderate dementia. The absolute reliability (MDC95) was 2.7 points on the BBS and 0.08 m/s on the 6 m walking test.


This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See:


Strengths and limitations of this study

  • The study included a well-defined population of older people living in nursing homes and scoring 1 or 2 on the Clinical Dementia Rating (CDR) scale.

  • Three commonly used clinical tests were evaluated.

  • The number of participants was limited.


The worldwide number of people with dementia is estimated to nearly double every 20 years, reaching 40.8 million in 2020 and 90.3 million in 2040.1 Dementia affects balance, mobility and gait performance,2–4 and people with dementia have a twofold increased risk of falls compared with older people without dementia.5 Although the literature is equivocal, studies show important benefits of exercise and physical activity for older adults with dementia, both for physical health, including activities of daily living (ADL), and for mental health.6–9 Consequently, physical therapists are likely to be treating an increasing number of people with dementia.10 For this reason, the demand for reliable and valid measures of physical function in these patients will increase.11 According to Hauer and Oster,12 testing of physical function assumes that test participants are able to (1) comprehend the test commands, (2) develop an adequate physical action and sequence and (3) remember both during execution of the test. Another prerequisite is that test participants show adequate attention during testing. Dementia is likely to impair these abilities and could thereby reduce reliability.

The lack of reliability-tested physical function instruments for nursing home patients with dementia has been repeatedly noted in the literature.13 ,14 To the best of our knowledge, only one other study has investigated the reliability of the BBS in a population of nursing home residents.15 In that study, 67% of the participants had dementia. The authors demonstrated a high ICC value but relatively low absolute reliability, with a minimal detectable change of 7.7 points. However, inter-rater reliability was not tested. Suttanon et al16 found that the reliability of different mobility and balance measures ranged from fair to excellent in a population of mostly community-dwelling older people with Alzheimer's disease. The authors stressed the importance of considering reliability when deciding which balance and mobility measures to use for this group.

Three functional tests were investigated in this study: the Berg Balance Scale (BBS), 30 s chair stand test (CST) and 6 m walking test. Balance is often impaired in older people with dementia, and improvement in balance is an important goal of rehabilitation.17 Measuring balance can assist the clinician in selecting the most appropriate therapy and outcome measure.18 ,19 The BBS is used extensively in the clinic, has frequently been compared with other balance measures and is considered the gold standard for measuring balance.20 ,21 The BBS has been found to have high intrarater and inter-rater reliability, but variable absolute reliability.22 The 30 s CST is one of the most important clinical functional tests because it measures lower body strength, which is related to the most demanding activities of daily life.23 ,24 Lower limb muscle weakness has been identified as a risk factor for falls and for the inability to perform lower extremity functional tasks such as walking, sitting-to-standing transfers, climbing steps and lower body dressing.25–27 Slow walking speed is associated with reduced balance and an increased risk of falling, and walking speed can predict health status, survival and hospital costs.28–30 Walking speed tests are frequently used to evaluate mobility in older people.31 ,32

Test-retest reliability has been investigated more frequently than inter-rater reliability.10 ,33 However, during rehabilitation, an older patient may be assessed by more than one physiotherapist, so high reliability between scorings made by different evaluators is essential. This is also important when testing in multicentre research projects. We aimed to assess the inter-rater reliability between pairs of observers independently rating participants on the BBS, CST and 6 m walking test. We also wanted to assess the internal consistency of the BBS.



We included 33 participants residing in four different nursing homes in the area around Oslo, Norway. They were recruited from a randomised controlled trial that aimed to investigate the effect of a high-intensity exercise programme in nursing home residents with dementia. The inclusion criteria were: being above 55 years of age, having dementia to a mild or moderate degree, as measured by the Clinical Dementia Rating scale (CDR 1 or 2), being able to stand up alone or with the help of one person and being able to walk 6 m with or without a walking aid. The exclusion criteria were: patients being medically unstable, psychotic or having severe communication problems. Details about the participants can be found in table 1.

Table 1

Demographic characteristics of participants


The study was carried out by two physiotherapists. The examiners were trained in the standardised instructions of the tests and had experience from testing 120 patients in a study 3 months earlier. The patients were tested only once, in the following order: the BBS first, followed by the CST and the 6 m walking test; the whole test procedure took about 30 min. The two physiotherapists scored each performance simultaneously without knowledge of each other's rating (‘blind’), and alternated between instructing and observing the participant, so that each of them administered the tests to half of the participants. We chose this design because some of the participants were undergoing rehabilitation and could have improved; had they been tested on two different days within a week, their performance could have changed, thus biasing the reliability estimates. Certain steps were taken to optimise communication with the participants on all tests.34 The progression of cueing was predefined and based on suggestions by Vogelpohl et al.35 The first step was verbal cueing, which progressed to demonstrating/mirroring, and then to tactile guidance and physical assistance.


The BBS is a performance-based instrument originally developed by Berg et al36 for assessment of functional balance in older adults. The BBS assesses performance on five levels, from 0 (cannot perform) to 4 (normal performance), on 14 different tasks involving functional balance control, including transfer, turning and stepping, giving a score between 0 (poor) and 56 (normal). It takes 15–20 min to complete. We used the Norwegian version of the test.37
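The BBS sum score described above is simply the total of 14 ordinal item scores. A minimal Python sketch, assuming the item scores are already recorded as integers 0–4; the function name `bbs_sum_score` is hypothetical and only illustrates the range checks implied by the scoring rules:

```python
from typing import Sequence

N_ITEMS = 14   # the BBS has 14 tasks
MAX_ITEM = 4   # each task is scored 0 (cannot perform) to 4 (normal performance)


def bbs_sum_score(items: Sequence[int]) -> int:
    """Return the BBS total (0-56) after checking that every item score is valid."""
    if len(items) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item scores, got {len(items)}")
    for number, score in enumerate(items, start=1):
        if not 0 <= score <= MAX_ITEM:
            raise ValueError(f"item {number}: score {score} is outside 0-{MAX_ITEM}")
    return sum(items)


# Hypothetical participant: 4 points on ten items and 2 points on the other four.
print(bbs_sum_score([4] * 10 + [2] * 4))  # 48
```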

The 30 s CST measures lower limb muscle strength. The score equals the number of rises from a chair in 30 s with arms folded across the chest.23 In the 6 m walking test, the participant walks 6 m at a comfortable speed with or without a walking aid. The time in seconds was recorded and converted to metres per second.38
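Converting the recorded time into gait speed is a single division over the fixed 6 m course. A minimal sketch (the function name `walking_speed` is hypothetical); the example uses the group average reported in the results, 6 m in 12 s:

```python
WALK_DISTANCE_M = 6.0  # course length of the 6 m walking test


def walking_speed(time_s: float, distance_m: float = WALK_DISTANCE_M) -> float:
    """Convert a recorded walking time in seconds into gait speed in metres per second."""
    if time_s <= 0:
        raise ValueError("walking time must be positive")
    return distance_m / time_s


print(walking_speed(12.0))  # 0.5
```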

To measure the patients’ dependence/independence in ADL, we employed the Barthel Index (BI), a widely used questionnaire for assessing ADL.39 ,40 The CDR scale and the mini-mental state examination (MMSE) were used to measure cognition. We used the CDR to validate the dementia diagnosis of the patients. Two Norwegian studies have shown that CDR staging is a valid method for rating dementia and dementia severity among nursing home patients.41 ,42 The MMSE was used to assess global cognition and consists of 20 items concerning orientation, word registration and recall, attention, naming, reading, writing, following commands and figure copying.43 Information about the participants’ medical history was obtained from their medical records.


Written and verbal information about the study was given to the patients and their relatives by their primary caregiver. All the participants gave written consent to participate and were informed that they could refuse to participate at any stage in the study.


Inter-rater reliability for the sum score of the BBS, CST and 6 m walking test was measured with intraclass correlation coefficients (ICCs) in SPSS V.22. The ICC quantifies relative reliability by examining the relationship between two or more sets of measurements. An ICC of 1 corresponds to perfect agreement. An ICC of 0.8 or higher reflects high relative reliability, between 0.6 and 0.8 moderate reliability, and less than 0.6 poor reliability.44 Following Shrout and Fleiss (1979),45 we used case 2 (ICC model 2.1) because the evaluators are considered a random sample from a population of potential raters. To assess absolute reliability, we calculated the standard error of measurement (SEM), the minimal detectable change at the 95% confidence level (MDC95) and MDC95%, which expresses MDC95 as a percentage of the mean score:46

$$\mathrm{SEM} = \mathrm{SD}\sqrt{1-\mathrm{ICC}}, \qquad \mathrm{MDC}_{95} = 1.96 \times \sqrt{2} \times \mathrm{SEM}, \qquad \mathrm{MDC}_{95}\% = \frac{\mathrm{MDC}_{95}}{\text{mean}} \times 100$$
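The ICC model and the absolute-reliability measures above can be sketched in Python with NumPy. This is an illustration, not the SPSS procedure used in the study: `icc_2_1` follows Shrout and Fleiss's two-way random-effects, single-rater formula (absolute agreement), and `absolute_reliability` assumes that the SD in the SEM formula is the SD of all pooled scores, which is one common choice. Both function names and the example scores are hypothetical.

```python
import numpy as np


def icc_2_1(scores) -> float:
    """Shrout & Fleiss ICC(2,1): two-way random effects, absolute agreement,
    single rater. `scores` has shape (n_subjects, k_raters)."""
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_subjects = k * ((x.mean(axis=1) - grand) ** 2).sum()
    ss_raters = n * ((x.mean(axis=0) - grand) ** 2).sum()
    ss_error = ((x - grand) ** 2).sum() - ss_subjects - ss_raters
    bms = ss_subjects / (n - 1)               # between-subjects mean square
    jms = ss_raters / (k - 1)                 # between-raters mean square
    ems = ss_error / ((n - 1) * (k - 1))      # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


def absolute_reliability(scores, icc: float) -> tuple[float, float, float]:
    """Return (SEM, MDC95, MDC95%), taking SD and mean over all pooled scores."""
    x = np.asarray(scores, dtype=float)
    sem = x.std(ddof=1) * np.sqrt(1 - icc)
    mdc95 = 1.96 * np.sqrt(2) * sem
    mdc95_pct = 100 * mdc95 / x.mean()
    return float(sem), float(mdc95), float(mdc95_pct)


# Hypothetical BBS sum scores from two evaluators (rows = participants).
bbs = np.array([[45, 46], [30, 30], [52, 51], [18, 18], [38, 39], [41, 41]])
icc = icc_2_1(bbs)
sem, mdc95, mdc95_pct = absolute_reliability(bbs, icc)
print(f"ICC(2,1)={icc:.3f}  SEM={sem:.2f}  MDC95={mdc95:.2f}  MDC95%={mdc95_pct:.1f}")
```

As a check on the last formula with the values reported in this study, 2.7 / 38 × 100 ≈ 7.1%, consistent with the reported MDC95% of 7% for the BBS.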

Inter-rater agreement on individual items of the BBS was analysed with weighted κ. The weighted κ measures the agreement between raters, adjusted for the amount of agreement expected by chance and for the magnitude of disagreement.47 A κ value of 0.75 or higher indicates excellent agreement, a value between 0.40 and 0.74 indicates fair to good agreement and a value below 0.40 indicates poor agreement.48 Weighted κ was calculated in Excel V.2011 for Mac with the Real Statistics Resource Pack. Cronbach's α was calculated for each evaluator's scores to assess the internal consistency of the BBS. Cronbach's α is regarded as excellent when it is higher than 0.9, good between 0.7 and 0.9 and acceptable between 0.6 and 0.7.49 Internal consistency of the BBS was also tested by item-to-total correlation, which shows the degree of association between each individual item and the total score of the other items in the scale. An item-to-total correlation is considered adequate if it is above 0.4.44
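The weighted κ described above can be sketched directly from its definition: one minus the ratio of observed to chance-expected weighted disagreement. The source does not state whether linear or quadratic weights were used, so both are offered here as an assumption; the function name `weighted_kappa` and the example scores are hypothetical, and this is not the Real Statistics Resource Pack implementation. BBS items use the five categories 0–4.

```python
import numpy as np


def weighted_kappa(rater1, rater2, n_categories: int = 5, weights: str = "linear") -> float:
    """Cohen's weighted kappa for two raters using ordinal categories 0..n_categories-1."""
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")
    r1 = np.asarray(rater1, dtype=int)
    r2 = np.asarray(rater2, dtype=int)
    # Observed joint distribution of the two raters' scores, as proportions.
    observed = np.zeros((n_categories, n_categories))
    np.add.at(observed, (r1, r2), 1)
    observed /= observed.sum()
    # Distribution expected by chance: product of the two raters' marginals.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))
    # Disagreement weights: 0 on the diagonal, increasing with distance between scores.
    i, j = np.indices((n_categories, n_categories))
    w = np.abs(i - j) / (n_categories - 1)
    if weights == "quadratic":
        w = w ** 2
    expected_disagreement = (w * expected).sum()
    if expected_disagreement == 0:
        raise ValueError("kappa is undefined when both raters use one single category")
    return 1 - (w * observed).sum() / expected_disagreement


# Hypothetical scores on one BBS item from two evaluators.
e1 = [4, 4, 3, 2, 4, 1, 0, 3]
e2 = [4, 3, 3, 2, 4, 1, 0, 3]
print(round(weighted_kappa(e1, e2), 3))
```

Linear weights penalise disagreements in proportion to their distance on the 0–4 scale, while quadratic weights penalise large disagreements more heavily; with mostly one-point disagreements, as on the BBS, the choice can change κ noticeably.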


Demographic characteristics

Thirty-three nursing home residents (25 women, 8 men) with mild-to-moderate dementia participated in this study. The mean length of stay at the nursing home was almost 2 years; however, it ranged from 3 months to 9 years. Four of the participants used a wheelchair, and 17 used Zimmer frames to move about. The most common neurological diseases among the participants were stroke (n=3) and migraine (n=3). The most common cardiovascular diseases were hypertension (n=10), atrial fibrillation (n=4) and angina pectoris (n=3), and the most common musculoskeletal diseases were osteoporosis (n=4) and arthritis of the knee or hip (n=2). Characteristics are presented in table 1.

Distribution of scores

The mean total score ±SD of the BBS was similar between the evaluators (table 2). Table 3 demonstrates the distributions on the BBS for each evaluator. The table shows the number of patients with a score of zero, one, two, three and four on each item. On the CST, the two evaluators scored identically. On average, the participants walked 6 m in 12 s, which equals a speed of 0.5 m/s.

Table 2

ICC of BBS, CST and 6 m walking test

Table 3

Distribution of Berg Balance Scale scores from each evaluator: evaluator 1 (E1) and evaluator 2 (E2)

Inter-rater reliability

Weighted κ scores for each of the 14 items on the BBS varied from 0.83 to 1 (table 4). On the BBS, the evaluators scored differently on only 32 of the 462 item scorings (33 participants × 14 items), giving a percentage agreement of 93.1%. The ICC for the BBS sum score was very high (0.99). The MDC95 of 2.7 points indicates that a change of almost three points can be caused by the effect of being tested by a different evaluator rather than by clinical change. The CST had an ICC of 1, while the 6 m walking test had an ICC of 0.98 with an MDC95 of 0.08 m/s (table 2).
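The percentage agreement is the share of all item scorings (participants × items) on which the evaluators gave identical scores. A minimal sketch with a hypothetical function name, followed by the arithmetic behind the reported 93.1%:

```python
import numpy as np


def percent_agreement(e1, e2) -> float:
    """Percentage of item scorings where two evaluators gave the same score.

    e1, e2: arrays of shape (n_participants, n_items) with item scores.
    """
    a, b = np.asarray(e1), np.asarray(e2)
    if a.shape != b.shape:
        raise ValueError("both evaluators must score the same participants and items")
    return 100.0 * float(np.mean(a == b))


# Reported counts: 33 participants x 14 items = 462 scorings, 32 of them different.
total = 33 * 14
print(total, round(100 * (total - 32) / total, 1))  # 462 93.1
```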

Table 4

Weighted κ of the individual items of the BBS

Construct validity

Cronbach's α coefficient of the BBS was 0.948. The correlation matrix, which included the 14 items of the BBS and the sum score, is presented in table 5. The item-to-total correlations were r>0.4 for all items except item 3, on which the scores were very uniform: one participant scored 0 and the rest scored 4 points.
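Cronbach's α and the item-to-total correlations reported above can be computed from a participants × items score matrix. A minimal NumPy sketch with hypothetical function names and example scores; it uses the corrected item-to-total correlation (each item against the sum of the other items), matching the definition given in the methods. An item with almost no variance, such as item 3 here, yields an unstable correlation, and a constant item yields NaN (NumPy also emits a warning).

```python
import numpy as np


def cronbach_alpha(items) -> float:
    """Cronbach's alpha for a (n_participants, n_items) score matrix."""
    x = np.asarray(items, dtype=float)
    k = x.shape[1]
    sum_item_var = x.var(axis=0, ddof=1).sum()   # sum of the item variances
    total_var = x.sum(axis=1).var(ddof=1)        # variance of the sum scores
    return k / (k - 1) * (1 - sum_item_var / total_var)


def corrected_item_total(items) -> np.ndarray:
    """Correlation of each item with the total of the remaining items."""
    x = np.asarray(items, dtype=float)
    totals = x.sum(axis=1)
    return np.array([np.corrcoef(x[:, j], totals - x[:, j])[0, 1]
                     for j in range(x.shape[1])])


# Hypothetical scores: 5 participants x 4 items, each scored 0-4.
scores = np.array([[4, 4, 3, 4],
                   [2, 3, 2, 2],
                   [1, 1, 0, 1],
                   [3, 4, 3, 3],
                   [0, 1, 1, 0]])
print(round(cronbach_alpha(scores), 3))
print(np.round(corrected_item_total(scores), 2))
```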

Table 5

Correlation matrix


The weighted κ in the current study ranged between 0.83 and 1, indicating excellent inter-rater reliability of the BBS in a population of nursing home residents with dementia. These results agree with those of studies in other populations.36 ,50–52 The ICC of the BBS sum score was very high, which also concurs with studies of patients with multiple sclerosis52 and lower limb amputees.53 In the current study, the MDC95 was 2.7 points, which means that a difference of almost 3 points between evaluators must be allowed for. In agreement with other studies,37 ,52 our findings indicate high internal consistency of the BBS. All of the item-to-total correlation coefficients were 0.6 or above (except item 3, because of little variability in scores). The high internal consistency of the BBS suggests that the items of this instrument measure the same concept. Some of the items showed fairly high correlations, and a few correlation coefficients exceeded 0.9, which may indicate item redundancy. This should be investigated further.

In our study, the mean BBS score was 38 points. A study from three nursing homes in Sweden reported a mean BBS score of 30 points.15 Reasons for this discrepancy may be that our participants took part in an exercise study and were therefore fitter than the general nursing home population, and that we had somewhat stricter inclusion criteria regarding physical function. However, the current population had a lower mean MMSE score (16 points) than the Swedish study population (17.5 points). It is interesting to note that even when testing a fitter group of nursing home residents, there does not seem to be a ceiling effect of the BBS, as none of the participants achieved the maximum score.54 Only one participant scored 0 points, so no floor effect was detected in this population. Floor and ceiling effects have been shown in other studies.51 ,55 Our results concur with the results of Halsaa et al.37
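Floor and ceiling effects can be checked by counting how many participants obtain the lowest (0) or highest (56) possible BBS score. The sketch below is illustrative only: the 15% cut-off is a commonly used convention that is assumed here, not a criterion stated in this study, and the function name `floor_ceiling` is hypothetical.

```python
import numpy as np


def floor_ceiling(totals, min_score: int = 0, max_score: int = 56,
                  threshold: float = 0.15) -> dict:
    """Proportion of participants at the floor and ceiling of a scale,
    flagging an effect when a proportion exceeds `threshold` (assumed 15%)."""
    t = np.asarray(totals)
    at_floor = float(np.mean(t == min_score))
    at_ceiling = float(np.mean(t == max_score))
    return {
        "floor": at_floor,
        "ceiling": at_ceiling,
        "floor_effect": at_floor > threshold,
        "ceiling_effect": at_ceiling > threshold,
    }


# Hypothetical sample mirroring the pattern described above:
# one participant at 0 points and nobody at the maximum of 56 among 33.
example = [0] + [38] * 32
print(floor_ceiling(example))
```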

The ICC of the 6 m walking test was also very high, as others have found in similar populations.56 That study demonstrated high inter-rater reliability for both the 4 and 6 m walking tests, with ICCs of 0.96 and 0.88, respectively, in a group of older participants with cognitive impairment from both a day centre and a nursing home. The participants in the current study scored lower on the CST (6±3.2) than a similar population in a study by Blankevoort et al13 (8.1±2.95). They also had a slower walking speed (0.5±0.2 versus 0.8±0.3 m/s). To the best of our knowledge, inter-rater reliability has not previously been investigated for the CST. In our study, the two evaluators scored identically on the CST. Discrepancies in deciding when not to approve a repetition (for example, when the participant fails to fully extend the hip/knee or does not sit down between counts) were expected, but the two evaluators agreed on all 33 performances. Both the CST and 6 m walking test have been found to have good test-retest reliability in a similar population of older people with dementia, living at home or in a nursing home, with a mean MMSE score of 19 (range 10–28).13

Limitations of the study

We had a relatively small sample size; nevertheless, there was sufficient information to make interesting observations in a population not frequently included in research studies. One limitation of the study is that the inclusion criteria restrict our findings to nursing home residents who can rise from a chair with one person's help and who are able to walk 6 m with or without a walking aid. Although some of the participants used an electric wheelchair and could walk 6 m only with the help of a walking aid, the frailest residents were not included. In the clinic there may be more than two raters; it may therefore be considered a limitation that this study only investigated two evaluators. The evaluations were performed simultaneously, which may lead to an overestimation of reliability because one evaluator watches the other instruct and score. The second evaluator may thereby gain information about the instructor's scoring through his/her positioning, body language or choice of words.

Implications for practice

This study indicates that the BBS, 30 s CST and 6 m walking test have very good inter-rater reliability in older people with dementia living in nursing homes, and that the tests can be used both in research and for clinical purposes to assess physical functioning. Studies report that older individuals with cognitive impairment benefit from exercise regimens.7 ,57 Our study shows that patients with mild-to-moderate dementia are able to follow test instructions, which makes reliable assessment possible.


The results reveal excellent relative inter-rater reliability of the BBS, CST and 6 m walking test, as well as high internal consistency of the BBS, in a population of nursing home residents with mild-to-moderate dementia. The absolute reliability (MDC95) was 2.7 points on the BBS and 0.08 m/s on the 6 m walking test.




  • Twitter Follow Elisabeth Telenius at @bissilissi

  • Contributors EWT, KE and AB contributed to the design of the study, are accountable for all aspects of the work and approved the published version. EWT drafted the work. KE and AB were responsible for revising the work.

  • Funding This study is funded by the Norwegian Extra Foundation for Health and Rehabilitation.

  • Competing interests None declared.

  • Patient consent Obtained.

  • Ethics approval The study was approved by the Regional Committee for Medical Ethics in South East Norway on 5 September 2012.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement No additional data are available.
