Objective To determine the diagnostic accuracy of tuning fork tests for detecting fractures.
Design Systematic review of primary studies evaluating the diagnostic accuracy of tuning fork tests for the presence of fracture.
Data sources We searched MEDLINE, CINAHL, AMED, EMBASE, SPORTDiscus, CAB Abstracts and Web of Science from inception to November 2012. We manually searched the reference lists of any review papers and any identified relevant studies.
Study selection and data extraction Two reviewers independently reviewed the list of potentially eligible studies and rated the studies for quality using the QUADAS-2 tool. Data were extracted to form 2×2 contingency tables. The primary outcome measure was the accuracy of the test as measured by its sensitivity and specificity with 95% CIs.
Data synthesis We included six studies (329 patients), with two types of tuning fork tests (pain induction and loss of sound transmission). The studies included patients aged 7–60 years. The prevalence of fracture ranged from 10% to 80%. The sensitivity of the tuning fork tests was high, ranging from 75% to 100%. The specificity of the tests was highly heterogeneous, ranging from 18% to 95%.
Conclusions Based on the studies in this review, tuning fork tests have some value in ruling out fractures, but are not sufficiently reliable or accurate for widespread clinical use. The small sample size of the studies and the observed heterogeneity make generalisable conclusions difficult.
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
Strengths and limitations of this study
Based on the studies in this review, tuning fork tests have value in ruling out some fractures, but current evidence is insufficient to state the circumstances in which they are reliable.
Quantification of the degree and causes of heterogeneity of the studies was not feasible, because of small sample size and varying methods of the studies.
Therefore, this review does not support the current clinical use of tuning forks as a triage test for the diagnosis of fractures.
Although imaging for suspected fractures is generally cheap and readily accessible, there are situations, such as remote settings, where imaging is not readily available. Other clinical tests for fracture may then assist in decision making. One test, proposed at least 60 years ago, is the use of a tuning fork.1
Two methods of using tuning forks to detect fracture(s) have been developed. The first method uses a vibrating tuning fork placed directly over, or closely proximal to the suspected fracture site. Because the periosteum is heavily innervated, mechanical vibration over a fracture site stimulates the overlying periosteum, causing pain.2 The pain stops or decreases with the removal of the tuning fork. The second method uses a vibrating tuning fork placed over a bony prominence distal to the fracture site. Using a stethoscope to listen to the sound over a bony prominence proximal to the fracture site, the fracture is detected by a reduction in the sound conducted along the bone compared to the unaffected limb.1
The aim of this review was to identify the techniques used to diagnose fractures using a tuning fork and assess all studies of the diagnostic accuracy of tuning fork tests for the presence of fracture.
The inclusion criteria for the review were primary studies that assessed the diagnostic accuracy of tuning forks, using either pain or reduction of sound as the index test, measured against a recognised reference standard, such as X-ray, MRI or bone scan for the diagnosis of fractures. We included studies that enrolled patients of all ages and in all clinical settings with no exclusion by the language of publication. We excluded case series, case–control studies and narrative review papers.
We searched MEDLINE, CINAHL, AMED, EMBASE, SPORTDiscus, CAB Abstracts and Web of Science from inception to November 2012. We also searched the reference lists of any identified studies or review papers, and looked for any systematic reviews or meta-analyses carried out on this diagnostic test.
The MEDLINE search strategy is shown in box 1, and was run without a methodological filter.
Ovid MEDLINE (<1948 to November Week 3 2012>)
tuning fork*.tw. (302)
barford test*.tw. (1)
tf test*.tw. (79)
exp Fractures, Bone/ (133424)
5 and 8 (20)
Data extraction and management
We selected studies in a two-stage process. The titles and abstracts of all search results were screened by two authors (KM and JD) and full manuscripts for all potentially relevant papers were obtained. Two review authors (KM and JD) independently reviewed each paper for inclusion according to the predefined inclusion criteria, rated the study quality and then extracted relevant data. In the case of duplicate publication, we selected the most complete version of the study. We resolved disagreements through discussion with the third author (PG).
The primary outcome measure of interest was the accuracy of the test as measured by its sensitivity and specificity. Wherever possible, we used the raw data to construct 2×2 tables. 95% CIs for sensitivity and specificity were calculated with the Wilson score method and 95% CIs for positive and negative likelihood ratios were calculated with the method described by Simel et al.3 ,4 We appraised each article using the QUADAS-2 tool.5
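As an illustration of the calculations described above, the following is a minimal Python sketch, not the code used in the review. It computes sensitivity and specificity with Wilson score intervals and likelihood ratios with a log-scale interval of the kind described by Simel et al. The function names and the 2×2 counts in the example are hypothetical and are not taken from any included study. The likelihood ratio interval is undefined when a cell is zero, so such tables would need separate handling.

```python
import math


def wilson_ci(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (95% by default)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def likelihood_ratio_ci(num_events, num_total, den_events, den_total, z=1.96):
    """Ratio of two proportions with a log-scale interval.

    Assumes all counts are non-zero; the standard error is computed on
    the log scale and the limits are back-transformed.
    """
    lr = (num_events / num_total) / (den_events / den_total)
    se = math.sqrt(
        1 / num_events - 1 / num_total + 1 / den_events - 1 / den_total
    )
    return lr, lr * math.exp(-z * se), lr * math.exp(z * se)


def accuracy_summary(tp, fp, fn, tn):
    """Summarise a 2x2 table as (estimate, lower, upper) tuples."""
    fractured = tp + fn      # reference standard positive
    not_fractured = fp + tn  # reference standard negative
    return {
        "sensitivity": (tp / fractured, *wilson_ci(tp, fractured)),
        "specificity": (tn / not_fractured, *wilson_ci(tn, not_fractured)),
        # LR+ = sensitivity / (1 - specificity)
        "lr_positive": likelihood_ratio_ci(tp, fractured, fp, not_fractured),
        # LR- = (1 - sensitivity) / specificity
        "lr_negative": likelihood_ratio_ci(fn, fractured, tn, not_fractured),
    }


# Hypothetical counts, for illustration only.
summary = accuracy_summary(tp=18, fp=12, fn=2, tn=28)
for name, (estimate, lower, upper) in summary.items():
    print(f"{name}: {estimate:.2f} ({lower:.2f} to {upper:.2f})")
```

The Wilson interval is used here because it behaves sensibly for the small samples and proportions near 100% that are typical of these studies.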
Literature identification and study quality
We identified 62 citations from the electronic and bibliographic searches. Sixteen articles in full text were obtained for further scrutiny. Six primary studies (329 patients) were included in the final review (figure 1).
The characteristics of the participants and the methods of testing are shown in table 1. Most studies included only adults; one study included paediatric patients. The prevalence of fracture ranged from 10% to 80%. Two studies used the tuning fork test to investigate any suspected fracture,2 ,6 one suspected femoral neck fracture,7 one ankle inversion injury8 and two stress fractures.9 ,10 The studies investigating any fracture, femoral or ankle fractures used X-ray as a reference standard and the studies of stress fractures used either bone scan or X-ray and bone scan as a reference standard. The study of patients with ankle inversion injuries included patients who had tested positive to the ‘Ottawa ankle rule’.
Four studies detected fractures using pain induced by the vibrating tuning fork,2 ,8–10 while two studies used reduced sound conduction.6 ,7 Four studies used a 128 Hz tuning fork alone,6–9 but two studies compared the diagnostic accuracy of different frequency tuning forks within the studies.2 ,10
The methodological quality of the included studies was modest, with important elements that may indicate a risk of bias being unclear or not reported. For example, in most studies it was either unclear or not stated whether the tuning fork test had been interpreted blind to, and independently of, the reference standard (table 2).
Figure 2 shows sensitivity versus 1−specificity (receiver operating characteristic plot) for the six included studies. The sensitivity of the tuning fork tests was generally high, ranging from 75% to 100%. In the study to rule out fracture in patients who had tested positive to the ‘Ottawa ankle rule’, the use of the tuning fork on either the tip of the lateral malleolus or the distal fibula shaft gave a sensitivity of 100%, although there were only five patients with fractures.8 However, the specificity of the test in the six studies was highly heterogeneous, ranging from 18% to 95%.
Two studies showed reasonable overall diagnostic accuracy with diagnostic ORs >10, but other studies showed only modest values (table 3). The two studies that compared the diagnostic accuracy of different frequency tuning forks on the same patients found no differences between frequencies.2 ,10 One study assessed the differences between pain ratings but differences were small. The study that assessed inter-tester reliability showed only low reliability.9
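For readers unfamiliar with the diagnostic OR, the sketch below shows how it can be derived from a 2×2 table; it is an illustrative Python example rather than the analysis code of this review. The function name and the counts are hypothetical. Adding 0.5 to every cell when any cell is zero is a common continuity correction and is an assumption here, not a method stated in the review.

```python
import math


def diagnostic_odds_ratio(tp, fp, fn, tn, z=1.96):
    """Diagnostic OR with a log-scale interval, as (estimate, lower, upper)."""
    cells = [tp, fp, fn, tn]
    # Assumed convention: add 0.5 to every cell if any cell is zero,
    # otherwise the ratio or its standard error is undefined.
    if any(c == 0 for c in cells):
        tp, fp, fn, tn = (c + 0.5 for c in cells)
    dor = (tp * tn) / (fp * fn)
    se = math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
    return dor, dor * math.exp(-z * se), dor * math.exp(z * se)


# Hypothetical counts, for illustration only.
estimate, lower, upper = diagnostic_odds_ratio(tp=18, fp=12, fn=2, tn=28)
print(f"Diagnostic OR: {estimate:.1f} ({lower:.1f} to {upper:.1f})")
```

With small studies the interval around a diagnostic OR is typically wide, so a point estimate above 10 should be read alongside its interval.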
Two forms of tuning fork test, one based on pain induction and the other on sound transmission, showed modest diagnostic accuracy with some ability to rule out fractures. However, the estimated sensitivity (ranging from 75% to 100%) is not sufficient to be relied on to rule out fractures based on a negative test. The specificity is particularly heterogeneous, potentially resulting in a high proportion of false-positive test results. The reasons for this variation in accuracy are unclear, but may be related to the way the test is performed, to characteristics of the injuries and fractures, or to both.
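To make the point about negative tests concrete, the following Python sketch applies Bayes' theorem to estimate the probability that a fracture is still present after a negative tuning fork test. The function name is hypothetical, and the scenarios simply combine the extremes of the ranges reported in this review (prevalence 10% to 80%, sensitivity 75%, specificity 18% to 95%); they do not correspond to any single included study.

```python
def prob_fracture_after_negative(prevalence, sensitivity, specificity):
    """Probability of fracture given a negative test (1 - NPV)."""
    false_negatives = prevalence * (1 - sensitivity)
    true_negatives = (1 - prevalence) * specificity
    return false_negatives / (false_negatives + true_negatives)


# Illustrative scenarios built from the reported ranges, not study data.
scenarios = [
    (0.10, 0.75, 0.95),
    (0.10, 0.75, 0.18),
    (0.80, 0.75, 0.95),
    (0.80, 0.75, 0.18),
]
for prevalence, sensitivity, specificity in scenarios:
    missed = prob_fracture_after_negative(prevalence, sensitivity, specificity)
    print(
        f"prevalence={prevalence:.0%}, sensitivity={sensitivity:.0%}, "
        f"specificity={specificity:.0%}: "
        f"P(fracture | negative test)={missed:.1%}"
    )
```

The residual probability depends strongly on prevalence and specificity as well as sensitivity, which is one reason a single sensitivity figure cannot establish whether a negative test is safe to act on.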
The low inter-tester reliability suggests that the techniques would benefit from standardisation and training. Wilder et al10 compared different frequencies and found a higher induction of fracture pain using 256 Hz, but pain also occurred in patients without fractures resulting in a low specificity.
Based on the results in this review, the tuning fork test was less accurate for stress fractures than other types of fractures, but a number of features of this type of injury may modify the accuracy. Lesho9 suggests that in the early stages, stress fractures might not be identified by the tuning fork test, because the bone shell is still more or less intact. A bone scan, however, would show an increased activity in the fractured area. Timing may also affect the accuracy of the test.
A mineralised callus where fracture healing has been initiated might not be identified by these tests. It is unclear whether a discontinuity of the cortical bone is required in order to give a positive test result. Both types of tuning fork tests seem to be more accurate in diagnosing transverse fractures than other types of fractures. It is also unclear whether swelling or bruising in the area of the injury might affect the results.
A systematic review,11 which examined a variety of methods for the diagnosis of stress fractures, included only two of the six studies we used in this review.
In conclusion, both tuning fork methods have some discrimination ability, but current techniques are not sufficiently reliable or accurate to rule fractures in or out and currently should have only limited use in clinical practice. The small sample size of the studies and the observed heterogeneity make generalisable conclusions difficult. However, these tests might be clinically useful in remote areas or on athletic fields with no easy access to other options.
The authors extend their gratitude to Sarah Thorning (Trial search coordinator, Bond University) for valuable help with the literature search, and Elaine Beller (Statistician, Bond University) for statistical support.
Contributors KM, JD, BK and PG contributed to the concepts of the work and acquisition, analysis and interpretation of data. KM drafted the work. JD, BK and PG revised the work critically for important intellectual content. All authors approved the final version.
Funding Kayalvili Mugunthan was supported by a Primary Health Care Research Evaluation & Development (PHCRED) fellowship, Bond University.
Competing interests None.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement No additional data are available.