

Development and validation of a screening tool for the identification of inappropriate transthoracic echocardiograms
  1. Ricardo Fonseca1,
  2. Faraz Pathan1,
  3. Thomas H Marwick2
  1. 1Menzies Institute for Medical Research, University of Tasmania, Hobart, Tasmania, Australia
  2. 2Baker IDI Heart and Diabetes Institute, Melbourne, Victoria, Australia
  1. Correspondence to Dr Thomas H Marwick; tom.marwick{at}


Objective We sought to determine whether simple clinical markers could be used in a questionnaire to recognise inappropriate (or rarely appropriate, RA) tests at the point of service. Most applications of appropriateness criteria (AC) for transthoracic echocardiogram (TTE) have been at the point of order, but a simple means of identifying RA tests in an audit process would be of value.

Design, setting and participants The study was performed in 2 major hospitals in Tasmania. 2 reviewers created a questionnaire based on 4 questions most commonly associated with RA (suspected endocarditis with no positive blood cultures or new murmur, lack of cardiovascular symptoms or no change in clinical status or cardiac examination, routine surveillance and previous TTE within a year) in a derivation cohort of 814 patients. This was prospectively applied to 499 TTEs to calculate sensitivity and specificity for prediction of RA, and validated in the external group (n=880).

Results Of 499 prospective TTEs, the questionnaire selected 18% of requests as being potentially RA. As 7.4% were actually RA (κ 89%), the sensitivity and specificity of the questionnaire were 84% and 87%, respectively. In the external validation cohort, the model found that 11% of requests needed to be screened for appropriateness, with a sensitivity and specificity of 80% and 95%, respectively.

Conclusions A questionnaire based on 4 questions detects a high proportion of RA TTE, and could be used for audit.


Strengths and limitations of this study

  • Four binary questions encapsulate characteristics of rarely appropriate (RA) tests according to the appropriate use criteria for echocardiography.

  • The questionnaire, applied to the transthoracic echocardiogram requests, selected around one in five requests as being potentially RA.

  • Two or more affirmative answers had a high sensitivity and specificity to discriminate RA tests.

  • It is a feasible tool which can be used at the point of service to screen for inappropriate tests with a low impact on the workflow.

  • The use of this approach requires review of medical records to adjudicate appropriateness when inadequate information is provided on the request form.


Investigations constitute an important component of resource utilisation and waste in medicine. The appropriateness criteria (AC) for transthoracic echocardiography (TTE)1 have become widely adopted in the USA, and seek to control resource usage, reduce variability of practice and to improve decision-making and patient care.2 Evaluations of appropriate use have exposed potential targets for improvement,3 ,4 but despite attempts to decrease rarely appropriate (RA) testing at the point of order, there is limited evidence of a decline in the number of tests,5–9 and in some cases, results are contradictory.4 ,10 AC are probably not responsible for the reduction in imaging over the past decade;11 the trend of appropriate and RA tests has not improved during this time.12 This lack of clear improvement in the rate of RA tests12 contrasts with the heightened level of awareness on RA test use.2 ,5 ,13 A recent review noted that guidelines for quality in cardiovascular imaging advocate implementation of AC, though there are limited effective methods to reduce the rate of inappropriate testing. They contend that part of the complexity of implementing AC may be unfamiliarity with the classification and the time required to review each imaging request with the guidelines.14

Various processes have been devised to police RA requests by screening at point-of-order, for example, based on the use of radiology benefit managers or software integrated with the ordering process.15–17 An alternative approach might be laboratory-based,6 but matching patient details against >100 AC is impractical and inefficient. In this study, we sought to determine whether a simple point-of-service questionnaire (PSQ) based on the most common RA characteristics according to the 2011 AC for TTE1 ,18 could facilitate recognition of these tests in the echocardiography laboratory.


We sought to develop the PSQ and then perform a diagnostic accuracy analysis.19 The model was developed in three steps (figure 1). Eligible requests were selected from two separate hospitals and at the times specified in steps 1–3. TTE requests were excluded from analysis if the patient was <18 years, or if classification was not possible because of insufficient clinical documentation.

Figure 1

Design of the study.

The study design pertained to appropriate selection of tests already ordered by the patient's physician. We elected not to discuss the uncertainties about appropriate use with these patients. Nonetheless, we have discussed appropriate use with patient representatives in meetings about medical expenditure in Australia. Because of the copayments associated with outpatient echocardiography in this country, this is perceived as a very important topic.

Our target condition was ‘inappropriate’ (also known as ‘rarely appropriate’, RA) TTE requests at the point of service, as adjudged by the reference standard, the ‘Appropriateness Criteria for echocardiography (AC)’.1 We developed and then assessed the index test (PSQ). We compared the accuracy of our index PSQ model with the reference AC standard by evaluating sensitivity, specificity, positive and negative predictive value for each of the responses as well as for each of the possible cut points: one affirmative answer versus two affirmative answers versus three affirmative answers.
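The cut-point comparison described above can be sketched in Python as follows. This is an illustration only: the study used R, the function name is hypothetical, the toy records are invented rather than study data, and treating each cut point as "at least that many affirmative answers" is an assumption.

```python
def evaluate_cut_points(records, cut_points=(1, 2, 3)):
    """Sensitivity and specificity of the PSQ at each cut point.

    records: iterable of (affirmative_count, is_ra) pairs, where is_ra is
    the reference-standard (AC) classification of the request.
    A request is 'flagged' when affirmative_count >= cut point (assumption).
    """
    records = list(records)  # allow several passes over the data
    results = {}
    for cut in cut_points:
        tp = sum(1 for n_yes, is_ra in records if n_yes >= cut and is_ra)
        fn = sum(1 for n_yes, is_ra in records if n_yes < cut and is_ra)
        fp = sum(1 for n_yes, is_ra in records if n_yes >= cut and not is_ra)
        tn = sum(1 for n_yes, is_ra in records if n_yes < cut and not is_ra)
        results[cut] = {
            # None when the denominator is empty (no RA or no non-RA requests)
            "sensitivity": tp / (tp + fn) if (tp + fn) else None,
            "specificity": tn / (tn + fp) if (tn + fp) else None,
        }
    return results


# Invented toy records, for illustration only
toy_records = [
    (0, False), (0, False), (1, False), (1, False),
    (2, True), (2, False), (3, True), (1, True),
]
results = evaluate_cut_points(toy_records)
```

Raising the cut point trades sensitivity for specificity, which is why each threshold is evaluated separately.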

  1. Derivation of model: The most common causes of inappropriate tests described by the 2011 AC for TTE were identified from a retrospective group of 814 TTE requests at a teaching hospital. After this analysis, four questions which summarise those characteristics were identified. The ‘PSQ’ comprised binary questions based on the characteristics of RA tests in our derivation cohort and also accounted for published characteristics seen in the Choosing Wisely programme14 and published research.18

  2. Internal validation: We then tested these questions in an internal validation cohort to ascertain their ability to identify RA requests as judged against the gold standard (AC code for a specific request). The four most common characteristics for RA tests (PSQ) were applied prospectively to all the requests (n=499) for a TTE at the same tertiary referral hospital, between March and May 2015.

  3. External validation: The PSQ was applied to a cohort of 880 requests at a regional referral hospital between May and August 2015.

Patient demographic information, inpatient/outpatient distribution, referring physician (cardiologist or not) and the indication for the study were determined from the request form. Investigators reviewed the digital medical record to capture any additional information, especially when confronted with inadequate information. The time required to access additional information was recorded whenever such an action was necessary and the result was averaged for all such requests.

For each of the steps, a general physician (RF) and a cardiologist (FP) independently recorded the results for each of the questions in the PSQ. Appropriateness of requests was scored by the same observers, independently of the PSQ evaluation, using the 2011 AC.1 Each study was scored as appropriate, RA (previously described as ‘inappropriate’) or may be appropriate (previously ‘uncertain’). If the main indication was not listed in the AC, investigators were asked to select ‘Not classifiable’.2 When there was disagreement, a consensus between reviewers was reached. If no consensus was attained, a third investigator (THM) reviewed the data and determined the AC score. This AC score served as the ‘Gold Standard’ by which the PSQ was assessed. Where a repeat study was performed to guide management (eg, repeat TTE to evaluate reverse remodelling after 3–6 months of medical therapy), despite two affirmative responses, this request was classified as appropriate as per the AC guidelines on studies which are used to guide management.

The sensitivity, specificity, positive likelihood ratio and OR were used to define the PSQ accuracy for the prediction of RA requests using R software (R Development Core Team. R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. 2008). The above parameters were analysed for each affirmative response and cumulatively. During individual analysis, a single affirmative response (question 1, 2, 3 or 4) was compared with ‘no affirmative responses’. For the assessment of the cumulative affirmative responses, comparison was made to ‘no affirmative responses’ and to ‘no or one affirmative response’.
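All of the accuracy measures named above derive from a 2×2 table of PSQ result against the AC reference standard. The Python sketch below shows the arithmetic only; the study itself used R, and the function name is hypothetical. The counts in the usage line are not reported in the text: they are illustrative values chosen because they are consistent with the rounded summary figures published for the internal validation cohort.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Accuracy measures for a 2x2 table (index test vs reference standard).

    tp/fp/fn/tn are counts of true positives, false positives,
    false negatives and true negatives. Raises ZeroDivisionError if a
    required denominator is zero (e.g. fp == 0 or fn == 0 for the OR).
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_positive": sensitivity / (1 - specificity),
        "odds_ratio": (tp * tn) / (fp * fn),
    }


# Illustrative counts only (assumed, not taken from the paper's tables)
metrics = diagnostic_metrics(tp=31, fp=61, fn=6, tn=401)
```

With these assumed counts, sensitivity rounds to 0.84, specificity to 0.87 and the OR to 33.96, in line with the internal validation summary.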

Interobserver variability in the scoring process to determine level of appropriateness was defined using κ statistics.
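As a sketch of how agreement between two raters can be summarised, the snippet below computes an unweighted Cohen's κ from paired category labels. The text does not state which κ variant was used, so the unweighted form, the function name and the toy ratings are all assumptions for illustration.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Proportion of items on which the two raters agree
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    # Undefined (division by zero) if both raters use one identical category
    return (observed - expected) / (1 - expected)


# Toy ratings (A = appropriate, RA = rarely appropriate), not study data
ratings_a = ["A", "A", "RA", "A", "RA", "A", "A", "A"]
ratings_b = ["A", "A", "RA", "A", "A", "A", "A", "A"]
kappa = cohen_kappa(ratings_a, ratings_b)
```

Here the raters agree on 7 of 8 items, but κ is lower than the raw agreement because much of that agreement is expected by chance when one category dominates.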


Predictors of RA tests were sought in a group of 814 patients among whom 9% were RA. Our results revealed the RA requests corresponded with indications where there were no new symptoms or no change in clinical status or cardiac examination (indications 35, 53, 10, 8 of the AC) and indications for routine surveillance (indications 88, 11, 13, 40, 28). We sought to distil the underlying markers of an inappropriate request using ‘routine studies’ and the ‘absence of a change in clinical status or new symptoms’, which accounted for 88% of RA requests. Furthermore, 28% of RA tests had a TTE within the previous year. Additionally, there was a specific clinical situation that accounted for 26% of RA tests (Indication 53: transient fever without evidence of bacteraemia or new murmur).

We identified four features associated with RA tests: evaluation in the absence of symptoms/signs of cardiac disease or no change in clinical status, routine surveillance, existence of a previous TTE within the year of the new TTE request and suspected endocarditis with no positive blood cultures or new murmur (table 1).

Table 1

Rarely appropriate tests found in derivation group

Based on the above, we developed a PSQ composed of four binary questions:

  • Q1: Was the scan requested in the absence of new cardiovascular symptoms, or change in clinical status or cardiac examination? Note that it requires symptoms to be cardiovascular (this would include transient ischaemic attacks, strokes). Pre-existing symptoms or signs, such as a long-standing murmur or dyspnoea, which have been evaluated and have not changed would score as a ‘yes’ response. Therefore, a ‘yes’ response (affirmative response) to question 1 means the patient does not have any new cardiovascular sign or symptom or, in those with pre-existing cardiovascular illness, there has not been a worsening of their clinical status.

  • Q2: Is this a routine surveillance scan? This captures tests being considered for a ‘periodic’ evaluation since a certain period of time has elapsed. The test is not being ordered due to the anticipation of changing clinical decision-making or guiding therapy.

  • Q3: Has there been a previous TTE within the last year?

  • Q4: Is the test requested for suspected endocarditis with no positive blood cultures or new murmur?
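The four questions above can be represented as a simple scoring record. The Python sketch below is illustrative: the class, field and function names are hypothetical, and the default threshold of two affirmative answers reflects the cut point highlighted in this article rather than a fixed rule. A flagged request is a prompt to check the AC, not a final judgement of appropriateness.

```python
from dataclasses import dataclass


@dataclass
class PSQResponse:
    """Answers to the four binary PSQ questions (True = 'yes')."""
    no_new_symptoms_or_change: bool      # Q1
    routine_surveillance: bool           # Q2
    tte_within_last_year: bool           # Q3
    endocarditis_without_support: bool   # Q4: no positive cultures or new murmur

    def affirmative_count(self) -> int:
        """Number of questions answered 'yes'."""
        return sum([
            self.no_new_symptoms_or_change,
            self.routine_surveillance,
            self.tte_within_last_year,
            self.endocarditis_without_support,
        ])


def needs_ac_review(response: PSQResponse, threshold: int = 2) -> bool:
    """Flag a request for review against the AC when enough answers are 'yes'."""
    return response.affirmative_count() >= threshold


# Example: stable patient with a TTE in the last year -> flagged for review
flagged = needs_ac_review(PSQResponse(True, False, True, False))
# Example: a single affirmative answer -> not flagged at the default threshold
not_flagged = needs_ac_review(PSQResponse(False, False, True, False))
```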

The PSQ was applied to 501 studies in the internal validation cohort at a tertiary referral hospital and to 881 TTE requests at a regional hospital (external validation cohort). Two requests within the internal validation cohort and one request in the external validation cohort were not classifiable as it was not possible to collect information to answer the questionnaire. Final analysis was made in 499 TTE requests in the internal validation and 880 TTE requests in the external validation groups, respectively.

Table 2 shows study characteristics and appropriateness classification by group. The internal and external validation cohorts are well matched; however, the former group had a lower proportion of outpatients. The 10 most common RA indications in the internal validation group are described in table 3. Inter-rater agreement for scoring between both reviewers was high (κ=89%).

Table 2

Study characteristics and appropriateness classification according to groups

Table 3

Ten most common rarely appropriate indications in prospective internal validation cohort

When question 1 (no change in symptoms/no change in clinical status/no change in cardiac examination) was answered affirmatively, it had a higher OR, greater positive predictive value and positive likelihood ratio to pick up possible RA tests when compared independently with other questions. This was driven by examination of asymptomatic or stable patients (see online supplementary table S1).

supplementary table

diagnostic tests for each of the questions in each of the groups

In the internal validation cohort, 18% of the tests had ≥2 affirmative answers (two ‘yes’ responses). A PSQ with ≥2 affirmative responses had an OR 33.96 (13.61 to 84.78), sensitivity 0.84 (0.68 to 0.94) and specificity of 0.87 (0.83 to 0.90) for RA.

In the external validation group (n=880), 11% of total requests had ≥2 affirmative answers; this cut point had a sensitivity of 80%, specificity of 95% and OR 83.01 (table 4).

Table 4

Diagnostic tests of the point-of-service questionnaire for rarely appropriate requests

The PSQ with ≥2 affirmative responses identified 84% and 80% of the inappropriate tests in the internal and external validation cohorts, respectively.

Around 20% of the TTE requests provided inadequate information and it was necessary to check the digital medical records (table 5).

Table 5

Differences in time when the digital medical record (DMR) needed to be checked


The approach proposed in this study has been to encapsulate the essence of the RA tests into four binary questions which can be used rapidly to screen for these inappropriate requests. The questions which we have used are consistent with published literature, with the Choosing Wisely programme identifying ‘Routine studies’ and ‘no change in signs or symptoms’ as unnecessary repeat testing.14 Similarly, other authors identified that 54% of inappropriate requests had a TTE within the last year.18 Although only 28% of the inappropriate studies in the derivation cohort had a TTE within the previous year, we included this question to improve the strength of our model. Finally, given the high prevalence of inappropriate endocarditis requests (26% in our derivation cohort), it was included as the final question in our model.

The primary finding of this study is that the application of the PSQ is feasible, and it identifies a high proportion of RA tests without the need to review all the TTE requests against the AC. An affirmative response to any of the questions increases the likelihood of a test being deemed RA, and when two (or more) of the questions were answered affirmatively, the odds of a test being deemed inappropriate increased more than 33 times. The results were consistent in the different cohorts and scenarios (inpatients/outpatients, test referred by cardiologists/non-cardiologists). We propose a PSQ-based method for screening appropriateness (figure 2). Using this model, less than one-fifth of TTE requests would need to be audited against the AC, thereby minimising interruption of workflow. Nevertheless, occasional requests such as asymptomatic severe mitral regurgitation surveillance within 1 year (AC indication 45: uncertain) or repeat echocardiography for a heart failure patient on optimal medical therapy without a change in signs or symptoms to guide therapy (AC indication 73: appropriate) could still be performed, despite two affirmative responses.

Figure 2

Comparison between the appropriateness criteria model and point-of-service-questionnaire model.

Radiology benefit management companies (RBMs) still play an important role in the performance of cardiac imaging, although one of the aims of the AC was to reduce the need for those companies.20 Prior authorisation and claim denials continue to be the top challenges of the process.21 The results of this study show that the use of the questionnaire provides a transparent solution which can be implemented with minimal delay at point-of-service, thereby minimising the need for RBMs or other middlemen.

Several attempts have been made to improve appropriateness at the point-of-order by implementing software to control requests for inappropriate tests.7 ,17 ,22 However, the use of those tools at the point-of-order is susceptible to indication drift: for example, the real indication for testing may be RA, but the request may cite an inactive problem for which testing would be appropriate. Perhaps for this reason, the appropriate use criteria (AUC) literature has shown little or no improvement in requesting behaviour.7 ,17 Our proposed method could be used to facilitate appropriate use audits (simplifying to 4 questions from >200 AUC, which is useful where these data are not available in electronic format), or added to the current appropriate use process at the point-of-service. At that level, the proposed approach provides a simple screening tool to flag possibly inappropriate tests at the time of scheduling.

In our study, the prevalence of RA tests varied between 7% and 9%. These values fall on the lower end of the prevalence distribution of RA tests documented in various studies.4 ,12 ,18 ,22–25 A comprehensive plan detailing the management of inappropriate requests found at the point of service is lacking. At the very least, a clear and effective communication strategy needs to be in place to inform a discussion before a ‘rarely appropriate’ test is scheduled.

This questionnaire should act as a prompt to refer to the AC rather than an absolute assessment of appropriateness. Education has resulted in increased awareness of the AC without a significant change in clinical practice.12 It is intuitive to entrust point-of-care policing regarding the appropriateness of an investigation to those with the greatest experience. However, such a policy may result in interruption of workflow, delays and a greater burden on already busy echocardiography units. Our results show that the use of four simple binary questions identifies RA tests with a high sensitivity and specificity. The questionnaire identifies potential RA requests which can then be confirmed by reference to the AC (available online, in print or as a mobile phone application) or by discussing the case with a cardiologist at the echocardiographic laboratory or with the referring physician.

The use of the questionnaire may face some challenges. First, the need to review medical records to adjudicate appropriateness when inadequate information was provided on request forms is a potential limitation. However, corroboration of clinical history is a common clinical practice in all imaging units as it is mandatory to establish the question being asked of an investigation and is essential to implementing a Bayesian approach to reporting. This raises a second issue which is quality control of echocardiography requests. We identified 20% of requests as inadequate requiring further corroboration of clinical history. This result was similar to recent published data which found in a review of 1303 requests that 26.2% were inadequate to determine adherence to AC.26 The study concluded that the top three reasons for inadequacy were failure to report change in clinical status or cardiac examination, date of prior echocardiogram and type and severity of valvular lesion. Though the last two would be easily accessible in any echocardiographic laboratory (assuming the previous studies were performed within that laboratory), the first failure is critical in determining appropriateness. Clearly, it follows that access to electronic medical records is a necessary and essential component of an echocardiographic laboratory's workflow.

Third, the possible difference in referral patterns between hospitals and private labs may impact on the positive predictive value of our questionnaire. The private/public divide varies tremendously across countries and to date has not been assessed in regard to compliance with AC. Though our criteria address the issue of awareness and simplicity, perhaps the greatest challenge facing an overburdened medical infrastructure is the systemic dependence on investigations.
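The dependence of positive predictive value on referral patterns noted above follows from Bayes' theorem: for fixed sensitivity and specificity, the predictive value of a flagged request rises with the prevalence of RA tests. The Python sketch below illustrates this using the internal validation sensitivity and specificity (84% and 87%) together with two RA rates mentioned in this article (7.4% and 20%). The function name is hypothetical, and the calculation assumes that sensitivity and specificity stay constant across settings, which has not been tested.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """PPV from Bayes' theorem, assuming fixed sensitivity and specificity."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1 - specificity) * (1 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Internal validation accuracy applied at two RA prevalences from the text
ppv_low_prevalence = positive_predictive_value(0.84, 0.87, 0.074)
ppv_high_prevalence = positive_predictive_value(0.84, 0.87, 0.20)
```

Under these assumptions the predictive value of a flagged request is roughly 0.34 at a 7.4% RA rate and roughly 0.62 at a 20% RA rate, so the same questionnaire would generate proportionally fewer false alarms where inappropriate requests are more common.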

Furthermore, although our rate of inappropriate use is lower than that reported in other institutions, and could be seen as a limitation, we overcame this issue by performing an analysis of over 800 requests in the derivation process. We also validated this questionnaire in over 1200 patients (internal and external validation cohorts).

Finally, previous researchers have sought to differentiate the appropriateness of a study from its clinical utility arguing an RA test does not necessarily mean a clinically useless one nor does an appropriate request always correspond with a useful one.18 ,25 Ward demonstrated that 17% of inappropriate tests had ‘new important TTE abnormalities’ and Matulevicius showed that 21.7% of RA tests led to an active change in management. Thus, while the identification of inappropriate tests is a step on the path to improving quality and appropriateness in cardiovascular imaging, decision-making has to be informed by individual characteristics.

How we handle RA requests will ultimately have financial and clinical implications. The proportion of inappropriate studies varies between 7% and 23%.12 ,14 Our study shows an inappropriate rate of 7.4–8.5%. Experience from elsewhere in Australia has demonstrated an inappropriate rate of 20%.27 In 2015, the total Medicare reimbursement for TTE was ∼A$186.0 million.28 Assuming a rate of inappropriate echocardiography between 7% and 20%, the cost to the Australian health system of inappropriate TTE would be between A$13.0 million and A$37.2 million. The healthcare costs are clearly proportionate to the use of TTE and prevalence of appropriate use.
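The cost range quoted above is a simple proportion of the total reimbursement. The short Python sketch below reproduces the arithmetic; the function name is hypothetical, and the calculation assumes that an inappropriate TTE attracts the same average reimbursement as any other TTE.

```python
TOTAL_TTE_REIMBURSEMENT_MILLIONS = 186.0  # 2015 Medicare figure cited in the text (A$ million)


def inappropriate_cost_millions(total_millions, inappropriate_rate):
    """Estimated spend on inappropriate tests, assuming equal cost per test."""
    return total_millions * inappropriate_rate


low_estimate = inappropriate_cost_millions(TOTAL_TTE_REIMBURSEMENT_MILLIONS, 0.07)
high_estimate = inappropriate_cost_millions(TOTAL_TTE_REIMBURSEMENT_MILLIONS, 0.20)
```

Rounded to one decimal place, the two estimates are A$13.0 million and A$37.2 million.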

A mandatory AC score (appropriate or otherwise) or point-of-service score tied to funding would enhance compliance with AC and enable continuous auditing of resource usage. However, there are >100 categories of appropriateness and the incremental workflow issues are prohibitive. We propose an alternative approach where response to the PSQ serves as a less cumbersome beacon of appropriateness.

We have demonstrated that ≥2 affirmative answers on a simple PSQ detect a high proportion of RA tests. This approach can be used as a red flag for inappropriate examinations and a prompt to further discussion about the suitability for testing in individual patients. We propose that this PSQ can be a quality control tool that captures the majority of inappropriate use, in the absence of the infrastructure that supports AC in North America, and a simple marker for departmental and regional audits.




  • RF and FP should be regarded as joint first authors.

  • Contributors THM and RF designed the study. RF and FP were responsible for data collection and data analysis under supervision of THM. RF and FP prepared the manuscript draft. THM edited and approved the final manuscript. All the authors had full access to all of the data (including statistical reports and tables) in the study.

  • Funding RF is supported by a scholarship from the Farrell Foundation, Hobart, Australia.

  • Competing interests None declared.

  • Ethics approval This study was approved by the Tasmanian Human Research Ethics Committee, reference number: H0014017.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement No additional data are available.
