Diagnostic accuracy of transthoracic echocardiography for pulmonary hypertension: a systematic review and meta-analysis
  1. Jin-Rong Ni1,2,3,4,
  2. Pei-Jing Yan5,6,7,
  3. Shi-Dong Liu1,2,
  4. Yuan Hu2,
  5. Ke-Hu Yang5,6,7,8,
  6. Bing Song2,
  7. Jun-Qiang Lei1,3,4,9
  1. The First Hospital (the First Clinical Medical School) of Lanzhou University, Lanzhou, China
  2. Department of Cardiovascular Surgery, the First Hospital of Lanzhou University, Lanzhou, China
  3. Intelligent Imaging Medical Engineering Research Center of Gansu province, Lanzhou, China
  4. Precision Image and Collaborative Innovation International Scientific and Technological Cooperation Base of Gansu province, Lanzhou, China
  5. Institute of Clinical Research and Evidence Based Medicine, Gansu Provincial Hospital, Lanzhou, China
  6. Evidence-Based Social Science Research Center, Lanzhou University, Lanzhou, China
  7. Key Laboratory of Evidence-based Medicine and Knowledge Translation of Gansu Province, Lanzhou, China
  8. Evidence-Based Medicine Center, School of Basic Medical Sciences, Lanzhou University, Lanzhou, China
  9. Department of Radiology, the First Hospital of Lanzhou University, Lanzhou, China
  1. Correspondence to Dr Jun-Qiang Lei; leijunqiangldyy{at}; Bing Song; songbingldyyxwk{at}


Objective To evaluate the diagnostic accuracy of transthoracic echocardiography (TTE) in patients with pulmonary hypertension (PH).

Design Systematic review and meta-analysis.

Data sources and eligibility criteria Embase, the Cochrane Library for clinical trials, PubMed and Web of Science were searched for studies from inception to 19 June 2019. Studies using both TTE and right heart catheterisation (RHC) to diagnose PH were included.

Main results A total of 27 studies involving 4386 subjects were eligible for analysis. TTE had a pooled sensitivity of 85%, a pooled specificity of 74%, a pooled positive likelihood ratio of 3.2, a pooled negative likelihood ratio of 0.20, a pooled diagnostic OR of 16 and an area under the summary receiver operating characteristic curve of 0.88. The subgroup with the shortest time interval between TTE and RHC had the best diagnostic performance, with sensitivity, specificity and area under the curve (AUC) of 88%, 90% and 0.94, respectively. TTE had lower sensitivity (81%), specificity (61%) and AUC (0.73) in the subgroup of patients with definite lung diseases. Subgroup analysis also showed that different TTE thresholds resulted in different diagnostic performance for PH.

Conclusion TTE has clinical value in diagnosing PH, although it cannot yet replace RHC, which is considered the gold standard. The accuracy of TTE may be improved by shortening the time interval between TTE and RHC and by developing an appropriate threshold. TTE may not be suitable for assessing pulmonary arterial pressure in patients with pulmonary diseases.

PROSPERO registration number PROSPERO CRD42019123289.

  • hypertension
  • echocardiography
  • diagnostic radiology

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:

Strengths and limitations of this study

  • A comprehensive search was conducted in the main databases; more studies were included and a large sample size was obtained.

  • Detailed subgroup analysis and sensitivity analysis were performed.

  • The types of pulmonary hypertension included in the studies could not be distinguished.

  • Significant heterogeneity in our study limits the interpretation of the results.


Introduction

The prevalence of pulmonary hypertension (PH) is estimated at 1% in the general population, and as high as 10% in the 600 million people older than 65 years.1 Early detection and accurate assessment are vital to obtain better outcomes for PH patients.2 Right heart catheterisation (RHC) is the gold standard in the diagnosis of PH,3 but it is invasive and cannot be used frequently or repeatedly.4 The latest guideline for PH recommends transthoracic echocardiography (TTE) as a non-invasive test for screening.3

High-quality meta-analysis is considered one of the key tools for generating evidence.5 6 Three systematic reviews and meta-analyses regarding the diagnostic accuracy of TTE for PH were published between 2010 and 2013.7–9 Studies included in these meta-analyses were all published before 2010. In addition, two of them included fewer studies and performed only a simple diagnostic data synthesis.8 9 The other included a relatively large number of studies, but did not perform a detailed subgroup analysis.7 In recent years, TTE has continued to be used in the clinical diagnosis of PH, and many new original studies have been published.10–13 Therefore, the purpose of our study was to undertake a comprehensive systematic review and quantitative meta-analysis on the accuracy of TTE in the diagnosis of PH.


Methods

The present study is reported according to the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) statement and the published recommendations.14 15 The detailed protocol is accessible in PROSPERO.16 17

Data sources and search

A systematic search in Embase, the Cochrane Library for clinical trials, PubMed and Web of Science was performed to find the relevant literature from inception to 19 June 2019. Subject words were combined with free words, and the search strategy was developed and adapted for each database. and the trials registers on the WHO International Clinical Trials Registry Platform were used to search for unpublished trials. The references of the included studies and of other systematic reviews and meta-analyses were also reviewed to obtain a comprehensive list of included studies.

Study selection

Studies were selected based on the following inclusion criteria: studies that diagnosed PH by TTE; a study population of patients with suspected PH; TTE measurement of systolic pulmonary artery pressure (SPAP) based on tricuspid regurgitation; and RHC as the gold standard for the diagnosis of PH.

The exclusion criteria were the following: insufficient data to construct a 2×2 table; fewer than 20 subjects; and duplicate data (in this case, the largest sample or the latest study was selected).

Two reviewers (J-RN and P-JY) independently screened the eligible studies for suitability. Disagreements were resolved by consensus. If consensus could not be reached, the disagreement was referred to a third reviewer (S-DL) for arbitration. No language restriction was applied. If a study was not written in a language the authors could read, professional translation software could be used.

Data extraction

The data were extracted independently by two reviewers (J-RN and P-JY) according to a predefined data extraction sheet. The following variables were extracted from the included studies: lead author, publication year, country of study, study design, study population demographics, sample size, mean age, male ratio, time interval between TTE and RHC, cut-off threshold levels for TTE and RHC and number of true-positive (TP), false-negative (FN), true-negative (TN) and false-positive (FP) observations. Extracted data were cross-checked and disagreements were resolved via discussion or referral to a third reviewer (YH).

Quality assessment

The Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool was used to assess the risk of bias and clinical applicability concerns of the included studies according to the Cochrane Collaboration recommendation.18 19 Two reviewers (J-RN and P-JY) independently evaluated QUADAS-2 items, and all emerging conflicts were resolved by consensus.

Data synthesis and statistical analysis

Statistical analysis was performed using Stata/SE V.15.1 (StataCorp, College Station, Texas) and Review Manager V.5.3 software (Copenhagen, Denmark, Nordic Cochrane Centre, Cochrane Collaboration, 2014). All tests were two-tailed. A p value <0.05 was considered statistically significant.

The correlation coefficient between the logarithm of sensitivity and logarithm of one minus specificity was calculated to test whether the threshold effect was one of the sources of heterogeneity.20 Deeks’ test was used to test for publication bias.21 The bivariate model for diagnostic meta-analysis was used to obtain pooled estimates of sensitivity and specificity.22 Statistical heterogeneity among studies was explored using the I² statistic.
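The threshold-effect check described above can be illustrated with a short sketch. The text does not say which correlation coefficient was used, so a Spearman rank correlation is assumed here, and the 2×2 counts are invented purely for illustration (they are not data from the included studies). By convention, a strong positive correlation between sensitivity and the false-positive rate across studies is read as a sign of a threshold effect.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def threshold_effect_correlation(tables):
    """Spearman correlation (assumed) between log(sensitivity) and
    log(1 - specificity); each table is (TP, FN, FP, TN)."""
    log_sens = [math.log(tp / (tp + fn)) for tp, fn, fp, tn in tables]
    log_fpr = [math.log(fp / (fp + tn)) for tp, fn, fp, tn in tables]
    return pearson(average_ranks(log_sens), average_ranks(log_fpr))

# Hypothetical per-study 2x2 tables (TP, FN, FP, TN), for illustration only.
example_tables = [(60, 40, 10, 90), (70, 30, 20, 80),
                  (80, 20, 30, 70), (90, 10, 40, 60)]
r = threshold_effect_correlation(example_tables)
print(f"Spearman r = {r:.2f}")
```

Because the coefficient is rank-based, any monotonic transform (log or logit) gives the same value; studies with a zero cell would need a continuity correction before taking logarithms, which this sketch does not handle.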

Pooled sensitivity, specificity, diagnostic OR (DOR), positive likelihood ratio (PLR), negative likelihood ratio (NLR) and area under the summary receiver operating characteristic (SROC) curve were calculated from the numbers of TPs, FNs, FPs and TNs. The 95% CI was estimated for each metric.
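As a minimal sketch of how these metrics follow from a single 2×2 table, the function below computes per-study values from TP, FN, FP and TN. It is not the bivariate pooling model used in the analysis, which was run in Stata. The function name, the 0.5 continuity correction for zero cells and the example counts are assumptions made for illustration.

```python
import math

def accuracy_from_2x2(tp, fn, fp, tn):
    """Per-study diagnostic accuracy metrics from one 2x2 table."""
    # Assumption: add 0.5 to every cell if any cell is zero, a common
    # continuity correction that keeps the ratios finite.
    if 0 in (tp, fn, fp, tn):
        tp, fn, fp, tn = (c + 0.5 for c in (tp, fn, fp, tn))
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    plr = sens / (1 - spec)          # positive likelihood ratio
    nlr = (1 - sens) / spec          # negative likelihood ratio
    dor = (tp * tn) / (fp * fn)      # diagnostic odds ratio
    # 95% CI for the DOR on the log scale (Woolf's method).
    se_log_dor = math.sqrt(1 / tp + 1 / fn + 1 / fp + 1 / tn)
    dor_ci = (math.exp(math.log(dor) - 1.96 * se_log_dor),
              math.exp(math.log(dor) + 1.96 * se_log_dor))
    return {"sensitivity": sens, "specificity": spec,
            "plr": plr, "nlr": nlr, "dor": dor, "dor_ci": dor_ci}

# Hypothetical counts, not taken from any included study.
metrics = accuracy_from_2x2(tp=85, fn=15, fp=26, tn=74)
for name, value in metrics.items():
    print(name, value)
```

Note that the DOR equals PLR divided by NLR, so the three quantities are not independent summaries of a table.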

Subgroup analyses were performed based on the following variables: the time interval between TTE and RHC, disease classification of the study population, publication year of the study, study design (prospective or retrospective) and cut-off threshold of TTE to diagnose PH. Sensitivity analysis was undertaken by excluding low-quality studies (according to the QUADAS-2 quality assessment) or trials with characteristics different from the others.


Results

Study selection and characteristics

Figure 1 shows the PRISMA flow chart of the literature screening. A total of 27 articles involving 4386 subjects met our inclusion criteria (table 1).10–13 23–45 Habash’s study was divided into two independent parts because of the differences between the case group (Habash-1) and the control group (Habash-2).27

Figure 1

Flowchart for identification of the studies. *Habash’s study was divided into two independent parts because of the differences between the case group (Habash-1) and the control group (Habash-2). A total of 27 studies were included, but 28 sets of data were analysed.

Table 1

Characteristics of each study included in this meta-analysis

Of the 27 eligible studies, 14 (52%) were published between 2010 and 2019,10–13 26 27 30 33 34 39 41 43–45 and 13 (48%) were published before 2010.23–25 28 29 31 32 35–38 40 42 Twelve (44%) studies were performed in Europe,12 24 26 32–35 37–39 43 44 nine (33%) in the USA,10 13 23 25 27 28 31 40 42 two (7%) in East Asia,29 36 three (11%) in the Middle East11 41 45 and one (4%) in Australia.30 Most of the studies (15/27, 56%)11 12 23 24 28–32 35 36 38 39 41 45 were of prospective design versus 44% (12/27)10 13 25–27 33 34 37 40 42–44 retrospective.

All included studies used the tricuspid maximal regurgitation velocity (TRVmax) to estimate SPAP; the majority of these studies (23/27, 85%) used the classical method to calculate SPAP: 4 × TRVmax² + right atrial pressure (RAP).10 11 13 23–28 31–37 39–45 The RAP was calculated through the diameter and collapse rate of the inferior vena cava (IVC) during spontaneous respiration in 16 (59%) studies,10 23 25–27 31 33 35–37 39–42 44 45 through the jugular vein pressure in one study (4%),24 and using a fixed value (5 or 10 mm Hg) in three studies (11%).28 32 34 Three studies (11%) did not report their method for calculating RAP.11 13 43 Four studies (15%) used the tricuspid gradient (4 × TRVmax²) instead of SPAP.12 29 30 38
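The classical calculation above can be written as a short sketch. It assumes TRVmax is expressed in m/s, so that the simplified Bernoulli term 4 × TRVmax² gives a gradient in mm Hg; the function names and the example values are illustrative and not taken from any included study.

```python
def tricuspid_gradient(trv_max_m_per_s):
    """Tricuspid regurgitation pressure gradient in mm Hg: 4 * TRVmax^2."""
    if trv_max_m_per_s < 0:
        raise ValueError("TRVmax must be non-negative")
    return 4.0 * trv_max_m_per_s ** 2

def estimate_spap(trv_max_m_per_s, rap_mm_hg):
    """Estimated SPAP in mm Hg: tricuspid gradient plus right atrial pressure."""
    if rap_mm_hg < 0:
        raise ValueError("RAP must be non-negative")
    return tricuspid_gradient(trv_max_m_per_s) + rap_mm_hg

# Illustrative values: TRVmax of 3.0 m/s with a fixed RAP of 5 mm Hg.
print(estimate_spap(3.0, 5.0))  # 4 * 9 + 5 = 41.0 mm Hg
```

The sketch makes the dependence on RAP explicit: the same TRVmax gives an SPAP 5 mm Hg higher when a fixed RAP of 10 mm Hg is assumed instead of 5 mm Hg, which is one reason some studies reported the gradient alone.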

The majority of the studies (22/27, 81%) reported the time interval (mean or maximum) between TTE and RHC,10–13 23–29 31–35 38 40–42 44 45 while five (5/27, 19%) did not.30 36 37 39 43 Nine studies (33%) considered time intervals greater than 1 week,10 13 24 25 27 31 38 40 42 while 13 studies (48%) considered time intervals of less than 1 week.11 12 23 26 29 32–35 37 39 41 44 The time interval between TTE and RHC ranged from 4 hours to 3 months.

Quality assessment

The quality assessment of the included studies according to the QUADAS-2 tool is shown in figure 2. Overall, the quality of the included studies was modest. The included studies were of good quality regarding the applicability concerns, but most were of low quality with respect to risk of bias. In 20 (74%) studies,10–13 23 24 26 28–32 34 35 37–39 41 44 45 consecutive subjects were enrolled, with no inappropriate exclusions. The risk of bias during patient recruitment was unclear in the remaining seven (26%) studies,25 27 33 36 40 42 43 as patient recruitment was not reported. In six (22%) studies, investigators used single-blind methods for TTE.10 12 23 26 39 45 Double blinding in imaging assessment was not mentioned in any study. The risk of bias on flow and timing between the index test and reference standard was categorised as unclear in 14 (52%) studies that did not explicitly state that both the index and reference tests were successfully performed in all included patients.24 30–40 42 43

Figure 2

Risk of bias and applicability concerns summary: review authors' judgements regarding each domain for each included study (n=28).

Quantitative analysis

The SROC curve for TTE is shown in figure 3. Four studies fall within the 95% CI.11 26 34 44 The area under the curve (AUC) was 0.88 (95% CI 0.85 to 0.90). The pooled sensitivity and specificity for TTE were 85% (95% CI 81% to 90%) and 74% (95% CI 64% to 81%), respectively (figure 4). The pooled PLR and NLR were 3.2 (95% CI 2.3 to 4.4) and 0.20 (95% CI 0.15 to 0.26), respectively. The pooled DOR for TTE was 16 (95% CI 10 to 27).
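To make the pooled likelihood ratios more concrete, the sketch below converts a pre-test probability into a post-test probability using Bayes' rule in odds form. The pooled PLR (3.2) and NLR (0.20) are the values reported above; the 10% pre-test probability is a purely hypothetical input, and the function name is an assumption for illustration.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Post-test probability from a pre-test probability and a likelihood ratio."""
    if not 0.0 < pre_test_probability < 1.0:
        raise ValueError("pre-test probability must be strictly between 0 and 1")
    if likelihood_ratio < 0:
        raise ValueError("likelihood ratio must be non-negative")
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)

POOLED_PLR = 3.2   # pooled positive likelihood ratio reported above
POOLED_NLR = 0.20  # pooled negative likelihood ratio reported above
HYPOTHETICAL_PRE_TEST = 0.10  # illustrative assumption, not a study result

after_positive = post_test_probability(HYPOTHETICAL_PRE_TEST, POOLED_PLR)
after_negative = post_test_probability(HYPOTHETICAL_PRE_TEST, POOLED_NLR)
print(f"after a positive TTE: {after_positive:.3f}")
print(f"after a negative TTE: {after_negative:.3f}")
```

Under this assumed pre-test probability, a positive TTE raises the probability of PH to roughly one in four, while a negative TTE lowers it to about 2%, which is consistent with TTE being more useful for ruling PH out than for confirming it.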

Figure 3

Summary receiver operating characteristic graph with 95% confidence region and 95% prediction region for transthoracic echocardiography in the diagnosis of pulmonary hypertension (n=28).

Figure 4

Forest plot of the sensitivity and specificity of each individual study, summary sensitivity and specificity and I² statistic for heterogeneity (n=28).

The heterogeneity in our study was significant. The threshold test indicated that the threshold effect was not the source of heterogeneity (r=−0.34, p=0.12). Deeks’ test for funnel plot asymmetry suggested no publication bias (p=0.69). The results of the subgroup analysis are presented in table 2. The sensitivity (87%, 95% CI 81% to 91%), specificity (74%, 95% CI 62% to 83%) and AUC (0.89, 95% CI 0.86 to 0.91) of TTE to diagnose PH were higher for studies published in 2010 and later compared with those published before 2010. Among the time interval subgroups, the group with the shortest time interval between TTE and RHC had the best diagnostic performance, with sensitivity, specificity and AUC of 88% (95% CI 73% to 95%), 90% (95% CI 53% to 99%) and 0.94 (95% CI 0.92 to 0.96), respectively. The disease composition of the study population also affected the diagnostic accuracy of TTE. Compared with patients with other diseases, TTE had lower sensitivity (81%, 95% CI 70% to 88%), specificity (61%, 95% CI 53% to 69%) and AUC (0.73, 95% CI 0.69 to 0.77) in the subgroup of patients with definite lung diseases.

Table 2

Subgroup analysis

Subgroup analysis of different cut-off thresholds to diagnose PH based on TTE showed that the subgroup with a cut-off threshold of 35 mm Hg had a higher diagnostic accuracy than that at 40 mm Hg. The sensitivity, specificity and AUC of the former were 92% (95% CI 88% to 94%), 65% (95% CI 43% to 83%) and 0.92 (95% CI 0.89 to 0.94), respectively, while the sensitivity, specificity and AUC at 40 mm Hg were 84% (95% CI 75% to 91%), 52% (95% CI 31% to 71%) and 0.80 (95% CI 0.76 to 0.83), respectively.

The sensitivity analysis results are shown in table 3. After excluding low-quality studies and studies with specific characteristics, the sensitivity analysis did not reveal a source for the heterogeneity in the diagnostic accuracy analysis. Overall, the pooled meta-analysis results were in accordance with our sensitivity analyses.

Table 3

Sensitivity analysis for diagnostic accuracy meta-analysis


Discussion

Our study found that TTE has good sensitivity but only moderate specificity for the detection of PH. In addition, shortening the time interval between TTE and RHC and developing an appropriate threshold could improve the accuracy of TTE. However, the accuracy of TTE to diagnose PH in patients with lung diseases was low.

Although PH is a chronic disease, we still believe that the shortest possible time interval between TTE and RHC is more favourable. Otherwise, changes in the patient's condition and the application of intervention measures would lead to an increase in the deviation of the results of the two examinations. A detailed subgroup analysis was performed according to the time interval between TTE and RHC. As expected, the diagnostic accuracy was the highest when the time interval was less than or equal to 24 hours. The results also showed that the efficacy of TTE in the diagnosis of PH was gradually reduced with the extension of the time interval.

Subgroup analysis based on the disease composition of the population suggested that the diagnostic accuracy of TTE was lower in patients with lung diseases. Changes associated with chronic pulmonary disease, including a marked increase in intrathoracic gas, consolidation of lung tissue, expansion of the thoracic cage and alterations in the position of the heart, adversely affect the imaging quality and the parameter measurement of TTE.46 Therefore, the use of TTE to measure pulmonary pressure in patients with lung diseases might not be an ideal choice.

The guideline recommends the use of IVC width and collapse rate to estimate RAP,3 which was not done in some of the included studies. The sensitivity analysis for this point showed that studies which calculated RAP through the IVC do not seem to have a higher diagnostic performance. In order to avoid errors caused by RAP estimation, TRVmax was also considered as an indicator to evaluate the possibility of PH. The four studies using the tricuspid regurgitation pressure gradient (TRPG) (4 × TRVmax²) instead of SPAP were grouped into a subgroup, which had good diagnostic specificity but poor sensitivity.

The sensitivity analysis based on the mean pulmonary artery pressure (MPAP) threshold of 25 mm Hg did not result in a higher diagnostic value than the overall analysis, indicating that the overall results were stable. Previous work suggested that the MPAP threshold of 25 mm Hg was chosen arbitrarily and that lowering it to 20 mm Hg (two SDs above the mean MPAP for the population) is more scientifically grounded.47 However, some scientists insist that it is premature to reduce the MPAP threshold to 20 mm Hg because of the risk of over-diagnosis, unclear treatment implications and additional psychological burden on patients.48 Since none of the studies we included used MPAP >20 mm Hg as the diagnostic threshold for RHC, subgroup analysis on the two thresholds of 20 mm Hg and 25 mm Hg could not be performed. Therefore, we hope that more studies will be performed in the future to verify the appropriate threshold of RHC.

In our review, the cut-off thresholds of SPAP ranged from 30 to 50 mm Hg. Subgroup analysis showed that the subgroup with a cut-off of 35 mm Hg had higher diagnostic accuracy. The sensitivity analysis that excluded studies with a high TTE cut-off value showed that a high cut-off value increased the specificity and reduced the sensitivity of TTE. Because of the small sample size of the subgroups in this study, the value of the cut-off threshold still needs to be determined by further large, multicentre, prospective studies.

Subgroup analysis according to the publication year showed that studies published in 2010 or later had only a slightly higher diagnostic accuracy than earlier studies. Despite improvements in TTE technology and instruments over the past 10 years, the accuracy of TTE in diagnosing PH has not improved significantly, which directs attention to other TTE parameters.49 50 This could be a new direction for future studies on PH diagnosis.


Limitations

Several limitations are present in our study. First, systematic review and meta-analysis is a secondary research method based on original research, and the quality of the included studies affects the results. In addition, relevant articles may have been missed, and significant heterogeneity may limit the interpretation of the results. Second, the accuracy of echocardiography relies heavily on the operator's ability, experience and operational discipline. In order to obtain more original studies, we did not consider this aspect as an exclusion criterion. Third, the studies included in this review involve several different types of PH, and some of the included studies do not describe the underlying disease and PH type in detail. It is clear that pulmonary lesions can affect the quality of TTE imaging, leading to underestimated results.


Conclusion

TTE has clinical value in the diagnosis of PH thanks to its good sensitivity and moderate specificity, but it cannot yet replace RHC, which is considered the gold standard. Shortening the time interval between TTE and RHC and developing an appropriate threshold can improve the accuracy of TTE. TTE may not be suitable for assessing pulmonary arterial pressure in patients with pulmonary disease. Future research may need to combine multiple TTE parameters and conduct large, multicentre studies to further improve the accuracy of TTE in the diagnosis of PH.



  • J-RN and P-JY are joint first authors.

  • Contributors The joint corresponding authors (J-QL and BS) are responsible for the design and implementation of the study. S-DL is responsible for the quality control of study selection. YH performed the quality control of data extraction. K-HY provided guidance in literature retrieval and data processing methodology and was responsible for the quality evaluation part. J-RN and P-JY performed the systematic review of the literature and extracted the data. J-RN conducted the meta-analyses, and two authors (J-RN, P-JY) substantially contributed to the interpretation of the data and wrote the article. All authors repeatedly revised the article. The corresponding authors (J-QL and BS) and J-RN take responsibility for the integrity of the analyses.

  • Funding This study was supported by the Key Laboratory of Evidence Based Medicine and Knowledge Translation Foundation of Gansu Province (Grant No. GSXZYZH2018006).

  • Competing interests None declared.

  • Patient consent for publication Not required.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data availability statement All data relevant to the study are included in the article or uploaded as supplementary information.
