Objectives The American Orthopaedic Foot and Ankle Society (AOFAS) Ankle-Hindfoot Scale is among the most used questionnaires for measuring functional recovery after a hindfoot injury. Recently, this instrument was translated and culturally adapted into a Dutch version. In this study, the measurement properties of the Dutch language version (DLV) were investigated in patients with a unilateral hindfoot fracture.
Design Multicentre, prospective observational study.
Setting This multicentre study was conducted in three Dutch hospitals.
Participants In total, 118 patients with a unilateral hindfoot fracture were included. Three patients were lost to follow-up.
Primary and secondary outcome measures Patients were asked to complete the AOFAS-DLV, the Foot Function Index and the Short Form-36 on three occasions. Descriptive statistics (including floor and ceiling effects), reliability (ie, internal consistency), construct validity, reproducibility (ie, test–retest reliability, agreement and smallest detectable change (SDC)) and responsiveness were determined.
Results Internal consistency was inadequate for the AOFAS-DLV total scale (α=0.585), but adequate for the function subscale (α=0.863). The questionnaire had adequate construct validity (82.4% of predefined hypotheses were confirmed), but inadequate longitudinal validity (70.6%). No floor effects were found, but ceiling effects were present in all AOFAS-DLV (sub)scales, most pronounced from 6 to 24 months after trauma onwards. Responsiveness was only adequate for the pain and alignment subscales, with an SDC of 1.7 points.
Conclusions The AOFAS Ankle-Hindfoot Scale DLV has adequate construct validity and is reliable, making it a suitable instrument for cross-sectional studies investigating functional outcome in patients with a hindfoot fracture. The inadequate longitudinal validity and responsiveness, however, hamper the use of the questionnaire in longitudinal studies and for assessing long-term functional outcome.
Trial registration number NTR5613; Post-results.
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
Strengths and limitations of this study
This prospective, multicentre, observational study provides substantial, previously unknown information about the performance of the American Orthopaedic Foot and Ankle Society (AOFAS) Ankle-Hindfoot Scale.
The topic of the clinical study is relevant for orthopaedic trauma surgeons, since there is a growing need for translated and validated patient-reported outcome measures that can be used for determining functional outcome over time.
The methodological design of the study is strong, and statistical analyses complied with the COnsensus-based Standards for the selection of health Measurement INstruments guidelines.
Although the study is mostly relevant for the Dutch-speaking regions, it is also informative for other regions.
Implementation of the AOFAS Ankle-Hindfoot Scale is limited by the fact that a clinician is required to complete the physician-reported part of the questionnaire. This hampers its use in, for example, large-scale registers.
Hindfoot fractures are rare but disabling injuries. Because most patients are of working age and the resulting disabilities are long-lasting, these injuries have a high socioeconomic impact.1 2 The incidence rate of calcaneal fractures is 11.5 per 100 000 person-years and these fractures occur 2.4 times more frequently in men than in women.3 Fractures of the talus are even rarer, with a reported annual incidence of 3.2 per 100 000, and occur 4.5 times more often in men.4 Although these fractures are relatively rare, they have received considerable attention in recent literature, presumably because of the long recovery and the associated socioeconomic burden.
To monitor functional outcome, quality of life and recovery after treatment, patient-reported outcome measures and other instruments are increasingly used in clinical practice and clinical research. The American Orthopaedic Foot and Ankle Society (AOFAS) Ankle-Hindfoot Scale is one of the most used assessment tools in foot surgery.5 This clinical rating system combines a patient-reported part and a physician-reported part. In its original language version, the AOFAS Ankle-Hindfoot Scale as a complete scale has been shown to be responsive and valid.6–9 The study populations involved non-traumatic diagnoses, such as general ankle-hindfoot complaints,8 pending ankle or foot surgery10 and end-stage ankle osteoarthritis.7
Recently, a Dutch version of the AOFAS Ankle-Hindfoot Scale became available.11 It was translated and culturally adapted to the Dutch population according to the guideline for Cross-Cultural Adaptation of Self-Report Measures.12 13 The AOFAS Ankle-Hindfoot Scale was shown to be valid, reliable and responsive in patients with an ankle fracture.11 This study aimed to determine the measurement properties of the AOFAS-Dutch language version (DLV) in patients who sustained a hindfoot fracture.
Study design and ethics statement
This multicentre, prospective, observational study was performed at three hospitals. The study is registered at the Netherlands Trial Register (NTR5613). A detailed study protocol is published elsewhere.13 The Medical Research Ethics Committees or Local Ethics Boards of all participating centres approved the study.
Patients were recruited from 1 May 2014 to 1 November 2016. Patients were identified from hospital records, based on their International Classification of Diseases, 10th revision (ICD-10) code or Diagnosis-Related Group code. Inclusion criteria were: (1) unilateral hindfoot fracture; (2) age 18 years or older and (3) provision of informed consent. Exclusion criteria were: (1) multiple trauma affecting the outcome scores; (2) pathological fracture; (3) severe physical comorbidity (ie, American Society of Anaesthesiologists ≥3); (4) patient was non-ambulatory prior to the injury; (5) insufficient comprehension of the Dutch language and (6) expected problems with maintaining follow-up.
A total of 118 individual patients were included; 78 completed t=1 and t=2, and 113 completed t=2 and t=3 (figure 1). Three patients were lost to follow-up during the course of the study.
The median age was 51 years (P25–P75 36–58) and the majority of patients (n=69; 61.1%) were men (table 1). The most common injuries were calcaneal fractures (n=82; 72.6%) and talar fractures (n=36; 31.9%). Fractures were mostly treated non-operatively (n=72; 73.6%).
Questionnaires and data collection
Demographic, injury and treatment data were collected from the patient’s medical files. To complete the physician-reported part of the AOFAS Ankle-Hindfoot Scale-DLV, a research physician or research assistant performed the physical examination using a standardised protocol. Patients were asked to complete the AOFAS-DLV patient-reported part, Foot Function Index (FFI-DLV) and the Short Form Health Survey (SF-36-DLV) questionnaires on three occasions: between 3 and 6 months after trauma (t=1), 5–6 months later (t=2) and 2–3 weeks later (t=3). Patients were allowed to participate in both the responsiveness and the test–retest parts. A physician completed the physician-reported part of the AOFAS-DLV.
The AOFAS Ankle-Hindfoot Scale consists of three subscales: pain, function and alignment and includes a total of nine items. The minimum score is 0 points (indicating severe pain and impairment), the maximum score is 100 points (no impairment).
The FFI is a questionnaire that focusses on disabilities and measures the impact of foot disorders. The FFI includes three subscales (pain, disability and activity limitations), which are spread over a total of 23 items. In this scoring system, a score of 0 points means ‘no disability’ and 100 points implies the highest level of disability.14
The SF-36 Health Survey is a generic measure of health status.15–22 It consists of 36 items, representing eight domains that are grouped into a physical component summary (PCS) and a mental component summary (MCS). All (sub)scales are normalised to a mean of 50 points with a SD of 10 points.
Statistical Package for Social Sciences (SPSS V.21) was used for analysis. Data are reported following the STrengthening the Reporting of OBservational studies in Epidemiology (STROBE) statement.23 Missing data were not imputed. Patient characteristics and questionnaire scores were analysed using descriptive statistics. Measurement properties of the AOFAS Ankle-Hindfoot Scale were determined in compliance with the COnsensus-based Standards for the selection of health Measurement Instruments guidelines.24 The previously validated FFI and SF-36 (sub)scales served as comparators for the AOFAS-DLV. A summary of the measurement properties and statistical analysis is given in table 2. A more detailed description is published in the study protocol.13
Supplementary file 1
The changes over time in AOFAS-total, FFI-total, SF-36 PCS and SF-36 MCS are shown in figure 2. In the period from t=1 to t=2, the AOFAS, SF-36 PCS and (less pronounced) SF-36 MCS scores increased. The FFI score decreased, as expected, since this questionnaire focusses on disabilities. Scores at t=2 and t=3 were similar for all instruments.
Floor and ceiling effects
A floor effect was only present in the SF-36 RP and RE subscales at all follow-up moments. The percentage of patients reporting the minimum score varied between 52.6% (t=1) and 32.4% (t=3) for SF36 RP and between 25.6% (t=1) and 19.0% (t=3) for the SF36 RE subscale (figure 3A).
Ceiling effects were seen in several (sub)scales, especially at longer follow-up (figure 3B). The AOFAS as a total scale only showed a ceiling effect at t=3; 16.2% of patients reported the maximum score. The AOFAS pain and alignment subscales had a ceiling effect from t=1 onwards (12.8% and 62.8%, respectively). The AOFAS function subscale showed ceiling effects from t=2 onwards (22.7%). The FFI pain and disability subscales showed ceiling effects from t=2 onwards. The FFI limitation, SF-36 RP, SF and RE subscales showed ceiling effects at all follow-up moments.
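As a minimal sketch of how floor and ceiling effects can be quantified, the share of patients at the lowest and highest possible score is computed and compared with a cut-off. The function name, the example scores and the 15% default cut-off are assumptions for illustration; the cut-off actually applied in this study is not stated in this text and should be taken from the study protocol.

```python
def floor_ceiling(scores, min_score, max_score, threshold=0.15):
    """Return the share of respondents at each scale extreme.

    threshold is an ASSUMED cut-off (a commonly used value), not one
    quoted in the article.
    """
    n = len(scores)
    if n == 0:
        raise ValueError("scores must not be empty")
    floor_share = sum(1 for s in scores if s == min_score) / n
    ceiling_share = sum(1 for s in scores if s == max_score) / n
    return {
        "floor_share": floor_share,
        "ceiling_share": ceiling_share,
        "floor_effect": floor_share > threshold,
        "ceiling_effect": ceiling_share > threshold,
    }


# Hypothetical AOFAS total scores (0-100) for ten patients, not study data
example_scores = [100, 100, 87, 92, 75, 100, 68, 90, 85, 100]
result = floor_ceiling(example_scores, min_score=0, max_score=100)
print(result)
```

Keeping the cut-off as a parameter makes it easy to rerun the check with whatever threshold a given protocol prescribes.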
For the AOFAS total scale, Cronbach’s α was 0.585 (table 3). This may suggest inadequate internal consistency, but because the total scale comprises three subscales, this value should be interpreted carefully. Cronbach’s α for the AOFAS function subscale was 0.863, representing adequate internal consistency. Because the AOFAS pain and alignment subscales are single-item domains, Cronbach’s α could not be determined for them.
The FFI scale only showed adequate internal consistency for the subscale activity limitation (α=0.841). The internal consistency was not adequate for the FFI scale as a total (α=0.599) and for the subscales pain (α=0.653) and disability (α=0.558). For the total scale, this may be due to the fact that it is not unidimensional. Except for the subscale GH (α=0.627), all SF-36 (sub)scales showed adequate internal consistency.
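For readers less familiar with Cronbach’s α, the sketch below implements the textbook formula α = k/(k−1) × (1 − Σ item variances / variance of the sum score). It is an illustration only (the study itself used SPSS); the function name and the example data are hypothetical. It also shows why α is undefined for single-item subscales: the formula needs at least two items.

```python
from statistics import variance


def cronbach_alpha(items):
    """items: list of per-item score lists, one score per patient in each list."""
    k = len(items)
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    n = len(items[0])
    if any(len(item) != n for item in items):
        raise ValueError("all items must have the same number of respondents")
    # Sum score per patient across all items
    totals = [sum(item[i] for item in items) for i in range(n)]
    item_var_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


# Hypothetical three-item subscale answered by five patients, not study data
hypothetical_items = [
    [4, 7, 10, 7, 4],
    [5, 8, 10, 8, 5],
    [3, 6, 10, 8, 4],
]
print(round(cronbach_alpha(hypothetical_items), 3))
```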
Spearman’s rank correlations regarding construct validity are shown in table 4. Construct validity was adequate only for the AOFAS total scale and the function subscale; for both, 82.4% of the predefined hypotheses were confirmed. For the pain subscale, only 8 out of 17 correlations (47.1%) were in accordance with the predefined hypotheses; for the alignment subscale, this was 12 (70.6%). Both percentages were below the 75% threshold.
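The percentages above follow from simple counting against the 75% threshold. The sketch below reproduces that arithmetic; the function name is hypothetical, and the count of 14 confirmed hypotheses for the total scale and function subscale is inferred from the reported 82.4% of 17 rather than stated in the text.

```python
def hypotheses_confirmed(confirmed, total, threshold=0.75):
    """Return (percentage confirmed, adequate?) for a set of predefined hypotheses."""
    share = confirmed / total
    return round(100 * share, 1), share > threshold


# Counts out of 17 hypotheses; 14 is inferred from the reported 82.4%
counts = {"total scale": 14, "function": 14, "pain": 8, "alignment": 12}
for name, confirmed in counts.items():
    pct, adequate = hypotheses_confirmed(confirmed, 17)
    print(f"{name}: {pct}% confirmed, adequate={adequate}")
```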
The intraclass correlation coefficient (ICC), indicating the reliability, of each (sub)scale is shown in table 5. The ICC for all AOFAS (sub)scales ranged from 0.89 to 0.97, indicating adequate test–retest reliability. For all FFI and SF-36 (sub)scales, the ICC was also adequate (>0.70).
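The ICC model used is not specified in this text. As an illustration, the sketch below computes a two-way random-effects, absolute-agreement, single-measurement ICC, often written ICC(2,1), from the usual ANOVA mean squares. The choice of model, the function name and the example scores are assumptions, not the authors’ documented method.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table with one row per patient and one column per occasion."""
    n = len(ratings)          # number of patients
    k = len(ratings[0])       # number of measurement occasions
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Sums of squares for patients (rows), occasions (columns) and residual
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical test (t=2) and retest (t=3) scores for six patients, not study data
test_retest = [[82, 84], [90, 90], [67, 70], [100, 100], [75, 73], [88, 87]]
print(round(icc_2_1(test_retest), 3))
```

Values above 0.70 would be read as adequate under the criterion applied in the text.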
Agreement and smallest detectable change
The level of agreement is indicated by the smallest detectable change (SDC) and the corresponding Reliable Change Index (RCI) (table 5). The SDC was 1.7 (RCI: 1.7%) for the AOFAS total scale, −2.9 (RCI: −2.9%) for the FFI total scale, 0.16 (RCI: 0.2%) for the SF-36 PCS subscale and −0.29 (RCI: −0.4%) for the SF-36 MCS subscale.
The Bland and Altman analysis shows that for each (sub)scale the 95% limits of agreement for the mean change in scores contains zero; this confirms that there is no bias in measurements (figure 4 and table 5).
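To make the agreement statistics concrete, the sketch below derives the mean test–retest difference (bias), the 95% limits of agreement and an SDC from paired scores. It assumes one widely used convention, SEM = SD of the differences / √2 and SDC = 1.96 × √2 × SEM; the exact formulas applied in this study are described in the study protocol and may differ. The function name and data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def agreement(test, retest):
    """Bland-Altman bias, 95% limits of agreement and SDC for paired scores."""
    diffs = [b - a for a, b in zip(test, retest)]
    bias = mean(diffs)
    sd_diff = stdev(diffs)
    sem = sd_diff / sqrt(2)        # standard error of measurement (assumed convention)
    sdc = 1.96 * sqrt(2) * sem     # algebraically equal to 1.96 * sd_diff
    limits = (bias - 1.96 * sd_diff, bias + 1.96 * sd_diff)
    return {"bias": bias, "sd_diff": sd_diff, "sem": sem, "sdc": sdc, "limits": limits}


# Hypothetical test (t=2) and retest (t=3) scores, not study data
out = agreement([80, 90, 70, 85], [82, 89, 71, 86])
print(out)
```

If the limits of agreement include zero, as in this example, the paired measurements show no evidence of systematic bias.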
Spearman’s rank correlation coefficients for longitudinal validity are shown in table 6. Longitudinal validity was adequate for the AOFAS pain and alignment subscale; out of 17 correlations, 15 (88.2%) were in line with predefined hypotheses for the pain subscale and 17 (100.0%) for the AOFAS alignment subscale. Longitudinal validity was not sufficient for the function subscale (10/17; 58.8%) and for the total scale (12/17; 70.6%).
The standardised response mean (SRM) and the effect size (ES) of the instruments are shown in table 7. The magnitude of change was moderate for the AOFAS total scale (SRM 0.79, ES 0.63) and for the function subscale (SRM 0.94, ES 0.61). It was small for the one-item subscales pain (SRM 0.26) and alignment (SRM 0.06).
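The two indices differ only in their denominator: the SRM divides the mean change by the SD of the change scores, whereas the ES divides it by the SD of the baseline scores. A minimal sketch using these standard definitions follows; the function name and data are hypothetical and do not reproduce the study’s values.

```python
from statistics import mean, stdev


def responsiveness(baseline, follow_up):
    """Return (SRM, ES) for paired baseline and follow-up scores."""
    change = [b - a for a, b in zip(baseline, follow_up)]
    srm = mean(change) / stdev(change)    # standardised response mean
    es = mean(change) / stdev(baseline)   # effect size
    return srm, es


# Hypothetical scores at t=1 and t=2 for four patients, not study data
srm, es = responsiveness([60, 70, 80, 50], [70, 75, 95, 60])
print(round(srm, 2), round(es, 2))
```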
The results of this study showed that the AOFAS Ankle-Hindfoot Scale (AOFAS-DLV) has adequate construct validity and is reliable for measuring functional outcome in patients with a hindfoot fracture. However, longitudinal validity and responsiveness were inadequate in the study population.
Floor effects were not present for the AOFAS-DLV, but all (sub)scales showed an increasing ceiling effect over time. This suggests that an increasing number of patients achieved full recovery over time, which is in line with previous findings.11 17 The single-item subscales pain and alignment showed a ceiling effect from t=1 onwards. This could be because (minor) extra-articular fractures may not affect alignment. The high rate of operative treatment may also have improved alignment, especially for the intra-articular fractures. Alternatively, the limited number of answer options for the pain and alignment subscales, and the choice to administer the AOFAS-DLV for the first time at 3–6 months after trauma, may also have contributed to the ceiling effects.
Adequate construct validity of the AOFAS total scale and function subscale is also consistent with previous research.10 11 The AOFAS subscales pain and alignment did not show adequate construct validity, in contrast with earlier data in ankle fractures.11 The AOFAS pain and alignment subscales consist of only one item each. In the hindfoot series, the correlations with other (sub)scales were generally overestimated for the pain subscale and underestimated for the alignment subscale. This difference is unlikely to be due to heterogeneity in (sub)scale scores between the ankle and hindfoot fracture cohorts. There is also no clear pathophysiological explanation for this difference, other than the fact that hindfoot and ankle fractures are different injuries. Another possible explanation may be a difference in the follow-up moments used for hindfoot and ankle fractures.
With a Cronbach’s α above 0.7, internal consistency of the AOFAS-DLV function subscale was adequate. For the total scale, this remains inconclusive; the Cronbach’s α of 0.585 should be interpreted carefully as the total scale is not unidimensional. In ankle fractures,11 ankle sprains25 and ankle arthroplasty and arthrodesis,26 the Cronbach’s α for the total scale ranged from 0.92 to 0.95. To our knowledge, no recent literature on this topic is available for hindfoot fractures. Deleting the pain question increases Cronbach’s α to 0.843 (data not shown). This may suggest that the pain question is difficult to answer for patients. This could be due to the fact that three out of four answers combine pain severity and frequency. Such linguistic issues have been noted before.26 27
The ICC values between 0.89 and 0.97 confirm adequate test–retest reliability of the AOFAS-DLV total scale and all subscales. Similar ICCs (ranging from 0.89 to 0.95) were found for the Turkish and Portuguese versions of the AOFAS Ankle-Hindfoot Scale in patients with foot and ankle disorders.11 28 29
Responsiveness is a product of magnitude of change and longitudinal validity. The longitudinal validity of the AOFAS subscales pain and alignment was adequate (ie, >75% of the hypothesised correlations predicted correctly). However, the AOFAS subscale function and the total scale were not proven adequate, as only 58.8% and 70.6% of the predefined hypotheses were confirmed, respectively. The inadequate longitudinal validity makes the AOFAS-DLV less useful for longitudinal studies measuring recovery over time in patients with a hindfoot fracture. Longitudinal validity was adequate for all (sub)scales of the AOFAS-DLV in patients with ankle fractures in previous research.11 In the hindfoot series, the correlations of the difference in score between t=1 and t=2 with other (sub)scales were generally overestimated for the AOFAS function subscale and total scale. As with construct validity, there is no clear pathophysiological explanation for this difference, other than the difference in (severity of) the injuries and the follow-up moments used.
The magnitude of change was moderate for the AOFAS Ankle-Hindfoot Scale DLV as a total, with an SRM of 0.79 and an ES of 0.63. This is comparable to the magnitude of change for the total FFI (SRM 0.89, ES 0.60) and the SF-36 subscales PCS, PF and RP as in our recent study on ankle fractures.11 Previous data for hindfoot injuries are not available.
The Bland and Altman analysis confirmed absence of systematic bias for repeated recordings of the AOFAS Ankle-Hindfoot Scale-DLV. With an SDC of 1.7 points, the measurement error is very small. This measurement error was lower than reported for a variety of foot and ankle disorders in the Turkish population (SDC 13.3) and for ankle fractures in the Dutch population (SDC 12.0).11 28
The AOFAS Ankle-Hindfoot Scale DLV has adequate construct validity and is reliable, making it a suitable instrument for cross-sectional studies investigating functional outcome in patients with a hindfoot fracture. The inadequate longitudinal validity and responsiveness, however, hamper the use of the questionnaire in longitudinal studies and for assessing long-term functional outcome.
Contributors EMMVL, ASDB, DEM, CHVdV, PTDH, WET and MHJV developed the study. ASDB and EMMVL drafted the manuscript. EMMVL acted as trial principal investigator. ASDB, CHVdV, PTDH, DEM and MHJV participated in patient inclusion and outcome assessment. ASDB, WET and EMMVL performed statistical analysis of the study data. All authors have read and approved the final manuscript.
Competing interests None declared.
Patient consent Obtained.
Ethics approval This study has been exempted by the medical research ethics committee (MREC) Erasmus MC (Rotterdam, The Netherlands). Each participant provided written consent to participate and remained anonymised during the study. The study is registered at the Netherlands Trial Register (NTR5613; 05 Jan 2016).
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement All data are processed in this manuscript. There are no further unpublished data from this study available.
Collaborators D A Newhall, J Romeo, R J C Tjioe, F Van der Sijde, E N Van der Velden–Macauley, L Vellekoop