
Original research
Reliability and validity of assessment methods available in primary care for bladder outlet obstruction and benign prostatic obstruction in men with lower urinary tract symptoms: a systematic review
  1. Tom Vredeveld (1, 2),
  2. Esther van Benten (1, 3),
  3. Rikie E P M Beekmans (4),
  4. M Patrick Koops (5),
  5. Johannes C F Ket (6),
  6. Jurgen Mollema (7),
  7. Stephan P J Ramaekers (2),
  8. Jan J M Pool (3),
  9. Michel W Coppieters (1, 8),
  10. Annelies L Pool-Goudzwaard (1, 9)
  1. Human Movement Sciences, Faculty of Behavioural and Movement Sciences, Amsterdam Movement Sciences, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
  2. Centre of Expertise Urban Vitality, Faculty of Health, Amsterdam University of Applied Sciences, Amsterdam, The Netherlands
  3. HU University of Applied Sciences Utrecht, Institute of Movement Sciences, Utrecht, The Netherlands
  4. Physiotherapy Practice Emmastraat, Enschede, The Netherlands
  5. Physiotherapy Practice De Werfheegde, Haaksbergen, The Netherlands
  6. Medical Library, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
  7. Medical Library, HU University of Applied Sciences Utrecht, Utrecht, The Netherlands
  8. Menzies Health Institute Queensland, Griffith University, Brisbane and Gold Coast, Queensland, Australia
  9. SOMT University of Physiotherapy, Amersfoort, The Netherlands
Correspondence to Tom Vredeveld; t.vredeveld{at}vu.nl

Abstract

Objectives To systematically review the literature regarding the reliability and validity of assessment methods available in primary care for bladder outlet obstruction or benign prostatic obstruction in men with lower urinary tract symptoms (LUTS).

Design Systematic review with best evidence synthesis.

Setting Primary care.

Participants Men with LUTS due to bladder outlet obstruction or benign prostatic obstruction.

Review methods PubMed, Ebsco/CINAHL and Embase databases were searched for studies on the validity and reliability of assessment methods for bladder outlet obstruction and benign prostatic obstruction in primary care. Methodological quality was assessed with the COSMIN checklist. Studies with poor methodology were excluded from the best evidence synthesis.

Results Of the 5644 studies identified after removal of duplicates, 61 were scored with the COSMIN checklist and 37 were included in the best evidence synthesis: 18 evaluated bladder outlet obstruction, 17 evaluated benign prostatic obstruction and 2 evaluated both. Overall, reliability was poorly evaluated. Transrectal and transabdominal ultrasound showed moderate to good validity to evaluate bladder outlet obstruction. Prostate volume measured with these ultrasound methods showed moderate to good accuracy to identify benign prostatic obstruction, supported by a moderate to high level of evidence. Uroflowmetry for bladder outlet obstruction showed poor to moderate diagnostic accuracy, depending on the cut-off values used. Questionnaires were supported by high-quality evidence, although correlations and diagnostic accuracy were poor to moderate compared with criterion tests. Other methods were supported by low-level evidence.

Conclusion Clinicians in primary care can incorporate transabdominal and transrectal ultrasound or uroflowmetry in the evaluation of men with LUTS but should not rely solely on these methods, as their diagnostic accuracy is insufficient and their reliability remains insufficiently researched. Low-to-moderate levels of evidence for most assessment methods were due to methodological shortcomings and inconsistency in the studies. This highlights the need for better study designs in this domain.

  • primary care
  • prostate disease
  • epidemiology

Data availability statement

All data relevant to the study are included in the article or uploaded as online supplemental information.


This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.


Strengths and limitations of this study

  • This review is based on a broad, systematic search of the PubMed, Ebsco/CINAHL and Embase databases for studies on the evaluation of bladder outlet obstruction and benign prostatic obstruction in men with lower urinary tract symptoms.

  • The identified studies evaluated a variety of assessment methods and were thoroughly appraised with the COSMIN checklist on all aspects of reliability and validity.

  • A level of evidence was estimated based on the methodological quality of the studies and the precision, direction and consistency of the results.

  • Studies with poor COSMIN scores were excluded from the best evidence synthesis, to strengthen conclusions and recommendations.

  • Due to low methodological quality of many studies and inconsistencies in findings regarding the diagnostic accuracy, only a best-evidence synthesis was possible.

Introduction

Lower urinary tract symptoms (LUTS) include problems with storage of urine, voiding and postvoiding.1 The prevalence of one or more of these symptoms in men aged 50 years and older is 50%–75% and increases with age.2–4 Men with LUTS often experience a reduced quality of life and reduced mental and physical health.5 6 These symptoms are not organ-specific and may be related to various underlying pathophysiologic mechanisms.7

Bladder outlet obstruction (BOO) may cause LUTS in 24% of men.8 It has different causes but frequently occurs due to benign prostatic obstruction (BPO), caused by benign prostatic hyperplasia.9 This benign growth of the prostate is harmless until it compresses the urethra and obstructs the flow of urine. Therefore, prostate size and specific measurements of the prostate are used to evaluate BPO.

Men presenting with LUTS are often evaluated through comprehensive history taking, including urological history, physical examination and questionnaires, as recommended by the European Association of Urology (EAU) guideline on non-neurogenic male LUTS.10 11 Nonetheless, in primary care, patients are frequently referred to a urologist or urology clinic for further diagnostic procedures (eg, MRI or urodynamic studies), although this is expensive and not always warranted.11

Accurate evaluation of men with LUTS may result in distinct treatment pathways. Less bothersome symptoms could be targeted by conservative treatment, including watchful waiting, medication, pelvic floor muscle training or lifestyle changes.12–14 These therapies can be provided by general practitioners or men’s health physiotherapists. Bothersome symptoms with a medium (>30 mL) to large (>80 mL) prostate may require surgery.11 Yet frequent referral to urologists, even of men with a small prostate, leads to waiting lists and increased healthcare costs, and does not always appear to benefit men with LUTS.15 Therefore, accurate assessment of the role of BOO and BPO in primary care could reduce the need for referral and allow early treatment in primary care. However, it is unknown which methods are valid and reliable.

A recent review discouraged the use of noninvasive tests, such as uroflowmetry or penile cuff tests, instead of pressure flow studies to diagnose BOO, although the role of BPO was not specifically researched and not all aspects of validity (eg, correlations with a criterion) and reliability were covered.16

Therefore, this study aimed to systematically review the literature to determine the reliability and validity of assessment methods available in primary care to evaluate BOO and BPO in men with LUTS and to provide recommendations for the best assessment methods for clinicians in primary care.

Methods

Study design

A systematic review with best evidence synthesis was conducted, according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.17

Search

Embase, PubMed and Ebsco/CINAHL databases were searched from inception up to 26 November 2020. A previously validated search strategy for terms on reliability, validity and reproducibility of measurements was used.18 This search strategy was combined with relevant terms for BOO and BPO. Studies on benign prostatic hyperplasia were also included, as the term is frequently used to describe the clinical condition rather than its histological features. The currently accepted definitions of BOO and BPO are provided in table 1. The full search strategy was developed by medical-information specialists (JCFK and JM) and is provided for each database in online supplemental table 1A–C.


Table 1

Currently accepted definitions of BOO and BPO

Screening and selection

Studies were included if they investigated the reliability and/or validity of assessment methods to evaluate BOO, BPO or benign prostatic hyperplasia in men with LUTS. A wide variety of tests and methods for the evaluation of BOO and BPO is described in the literature. To provide a comprehensive overview, all types of assessment methods available in primary care were considered, including all types of questionnaires and all forms of clinical examination techniques and tests (eg, ultrasound imaging, free uroflowmetry).

Studies were excluded if they evaluated invasive techniques, CT or MRI, 3D/4D ultrasound imaging, urodynamic studies or postvoid residue measurement, or if they aimed to estimate prostate size to predict prostate cancer. Studies of assessment methods for BOO or BPO in men with LUTS due to known pathology (eg, prostate cancer, neurological diseases) were also excluded.

Studies were first screened on title, then on abstract and subsequently on full text. Each full-text article was independently screened by two investigators from a group of investigators (TV, EvB, REPMB, MPK, SPJR, JJMP and ALP-G). In case of disagreement, a third investigator from the same group was consulted.

Methodological quality

The methodological quality of the included studies was evaluated using the COSMIN (COnsensus-based Standards for the selection of health Measurement INstruments) checklist for measurement properties. The COSMIN checklist consists of nine sections (called ‘boxes’) to score different aspects of validity, reliability and responsiveness. Each box contains 5–18 items, each rated on a 4-point scale (poor, fair, good or excellent), and the overall score of a box is the lowest rating of any item within it. Grading studies with the COSMIN checklist follows a tailor-made approach, as only the measurement properties of the assessment method researched by the study are graded.19
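The box-level rating rule above (often summarised as ‘worst score counts’) and the inclusion rule applied later in this review (at least one box rated fair, good or excellent) can be sketched in a few lines of Python. This is a minimal illustration, not COSMIN software; the item names in the usage example are hypothetical placeholders rather than actual COSMIN item labels.

```python
# Minimal sketch of COSMIN box rating ("worst score counts") and of the
# inclusion rule for the best evidence synthesis (at least one box >= fair).
RATINGS = ("poor", "fair", "good", "excellent")  # ordered lowest -> highest


def box_score(item_ratings: dict) -> str:
    """Return the box rating: the lowest rating given to any item in the box."""
    if not item_ratings:
        raise ValueError("a box needs at least one rated item")
    for rating in item_ratings.values():
        if rating not in RATINGS:
            raise ValueError(f"unknown rating: {rating!r}")
    return min(item_ratings.values(), key=RATINGS.index)


def eligible_for_synthesis(box_scores: list) -> bool:
    """True if at least one box of a study is rated fair, good or excellent."""
    return any(RATINGS.index(score) >= RATINGS.index("fair") for score in box_scores)


# Usage with hypothetical item names.
criterion_box = box_score({"sample_size": "good", "reference_test": "fair", "analysis": "excellent"})
print(criterion_box)                             # fair
print(eligible_for_synthesis([criterion_box]))   # True
print(eligible_for_synthesis(["poor", "poor"]))  # False
```

Taking the minimum over an explicitly ordered tuple keeps the rule transparent: a single poorly rated item caps the whole box.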

Data collection and analysis

Data extraction included: population characteristics, index and reference tests, reliability (including percentage agreement, kappa values, intraclass correlation coefficient (ICC)) and validity measures (including correlation coefficients, sensitivity and specificity, likelihood ratios).

Primary outcomes were measures of reliability (test–retest reliability, inter-rater reliability and agreement) and validity (hypothesis testing for construct validity, and criterion validity). Secondary outcomes included other measurement properties regarding reliability (internal consistency, measurement error) or validity (face validity, cross-cultural validity) as evaluated through the COSMIN checklist.

Best evidence synthesis was performed based on criteria that include the methodological quality, imprecision, indirectness and inconsistency of results.19 Based on these criteria, the level of evidence was estimated, as described in detail in table 2.20 Per assessment method, between-study results were graded as consistent, inconsistent or indeterminate. Findings were defined as consistent if the measures of validity and reliability were similar or fell in adjacent categories (eg, moderate-good). The level of evidence was downgraded one level if inconsistency was found.
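The consistency rule and the one-level downgrade described above can be expressed as a small Python sketch. It assumes that each study contributes a single result category (poor, moderate or good) and, as a labelled assumption, that a single study yields ‘indeterminate’ consistency. The ordered list of evidence levels is also an assumption, because the full rating scheme in table 2 is not reproduced here.

```python
# Sketch of between-study consistency grading and the one-level downgrade.
# ASSUMPTIONS: one result category per study; a single study is graded
# "indeterminate"; the ordering of LEVELS is hypothetical (see table 2).
CATEGORIES = ("poor", "moderate", "good")
LEVELS = ("very low", "low", "moderate", "high")  # hypothetical ordering


def grade_consistency(study_categories: list) -> str:
    """Consistent if all categories are identical or adjacent (eg, moderate-good)."""
    if len(study_categories) < 2:
        return "indeterminate"
    positions = [CATEGORIES.index(c) for c in study_categories]
    return "consistent" if max(positions) - min(positions) <= 1 else "inconsistent"


def apply_consistency(level: str, consistency: str) -> str:
    """Downgrade the level of evidence by one step if findings are inconsistent."""
    if consistency == "inconsistent":
        return LEVELS[max(LEVELS.index(level) - 1, 0)]
    return level


print(grade_consistency(["moderate", "good", "good"]))  # consistent
print(grade_consistency(["poor", "good"]))              # inconsistent
print(apply_consistency("high", "inconsistent"))        # moderate
```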

Table 2

Level of evidence rating

Patient and public involvement

No patients were involved.

Results

Search results and grading of evidence

The search identified 7224 articles. After removal of duplicates, 5644 articles remained, which were screened on title. Of these, 337 articles were screened on abstract. Full-text screening was then performed for 152 studies, of which 61 met all selection criteria and were scored using the COSMIN checklist. Subsequently, 37 studies received a COSMIN score of fair, good or excellent.21–57 Twenty-four studies58–81 received ‘poor’ COSMIN scores and were therefore excluded from the best evidence synthesis. Of the 37 included studies, 18 evaluated the assessment of BOO, 17 evaluated BPO and 2 studied aspects relevant to both BOO and BPO. A flow chart of the study selection is provided in figure 1.
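The selection counts reported above can be cross-checked with simple arithmetic. The Python sketch below uses only the numbers stated in this section; the records dropped at each stage are derived by subtraction rather than taken from separately reported figures.

```python
# Counts taken from the study selection described in the text (figure 1).
flow = {
    "identified": 7224,
    "after_duplicates": 5644,
    "abstract_screened": 337,
    "full_text_screened": 152,
    "cosmin_scored": 61,
}
included, poor_cosmin = 37, 24
boo_only, bpo_only, both = 18, 17, 2

# Records dropped between consecutive stages (derived by subtraction).
stages = list(flow.items())
dropped = {
    f"{prev_name} -> {cur_name}": prev_n - cur_n
    for (prev_name, prev_n), (cur_name, cur_n) in zip(stages, stages[1:])
}

# Consistency checks on the reported totals.
assert included + poor_cosmin == flow["cosmin_scored"]
assert boo_only + bpo_only + both == included

boo_total = boo_only + both  # studies contributing to the BOO synthesis
bpo_total = bpo_only + both  # studies contributing to the BPO synthesis
print(dropped)
print(boo_total, bpo_total)  # 20 19
```

The derived totals of 20 studies on BOO and 19 on BPO match the counts given at the start of the corresponding subsections.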

Figure 1

PRISMA flow chart of the inclusion of studies. COSMIN, COnsensus-based Standards for the selection of health Measurement INstruments; PRISMA, Preferred Reporting Items for Systematic Reviews and Meta-Analyses.

With 33 scores, the COSMIN box for criterion validity was scored most often, followed by reliability (8 scores), hypothesis testing (6 scores) and measurement error (4 scores). None of the studies were scored for internal consistency, content validity, structural validity, cross-cultural validity or responsiveness (see table 3 for BOO and table 4 for BPO).

Table 3

Included studies on BOO with at least one COSMIN score: fair, good or excellent

Table 4

Included studies on BPO with at least one COSMIN score: fair, good or excellent

Sensitivity, specificity, positive and negative predictive values in this section are reported as percentages. A summary of findings, including scores for consistency of findings and levels of evidence, is presented in table 5. Extracted data are provided in online supplemental table 2A–C.
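As background to these percentages, the Python sketch below shows how sensitivity, specificity and positive and negative predictive values follow from a 2×2 table of an index test against a reference test. The counts are hypothetical and do not come from any included study. Unlike sensitivity and specificity, predictive values also depend on the prevalence of the condition in the studied sample.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Index test result cross-tabulated against the reference test."""
    tp: int  # index positive, reference positive
    fp: int  # index positive, reference negative
    fn: int  # index negative, reference positive
    tn: int  # index negative, reference negative

    def sensitivity(self) -> float:
        return 100 * self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return 100 * self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return 100 * self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return 100 * self.tn / (self.tn + self.fn)


# Hypothetical counts for illustration only.
table = TwoByTwo(tp=45, fp=20, fn=5, tn=30)
print(f"sensitivity {table.sensitivity():.1f}%, specificity {table.specificity():.1f}%")
print(f"PPV {table.ppv():.1f}%, NPV {table.npv():.1f}%")
```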

Table 5

Summary of findings

Assessment methods for BOO

Twenty studies evaluated assessment methods for BOO. The reference tests were BOO-related measures (eg, obstruction grade number or maximum urine flow rate) rather than reference tests for prostate size.22–25 28 29 34 35 37–39 41 44 47 50 51 55–57 An overview of the COSMIN scores per study on BOO is provided in table 3.

Transrectal ultrasound

Prostate size measured by transrectal ultrasound to indicate BOO demonstrated poor correlations with obstruction grade numbers (r=0.22 and r=0.29),55 56 maximum flow rate (r=0.20 and r=−0.11)39 55 and postvoid residue (r=0.05 and r=0.21).55 39 The measurement of peripheral zone thickness, transitional zone volume and transitional zone index by transrectal ultrasound yielded comparable, poor correlations.39 Different cut-off values for prostate size yielded poor to good diagnostic accuracy: >25 mL (sensitivity: 85%; specificity: 27%)44 and >40 mL (sensitivity: 66%; specificity: 64%)51 indicating obstruction, and <40 mL (sensitivity: 43%; specificity: 83%)51 and <25 mL (sensitivity: 21%; specificity: 92%)51 indicating no obstruction. The COSMIN scores ranged from fair to good for hypothesis testing39 and criterion validity.44 51 55 56 Based on the COSMIN scores, consistent findings, the number of studies (n=5) and the total sample size (n=1731), the level of evidence for the validity of assessing BOO with transrectal ultrasound was graded as high (see table 5). Measures of reliability were not reported (see online supplemental tables 2A,B).
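Likelihood ratios, one of the validity measures extracted in this review, can be derived from the sensitivity and specificity reported above using the standard formulas LR+ = sensitivity / (1 − specificity) and LR− = (1 − sensitivity) / specificity. The Python sketch below applies them to the two transrectal ultrasound cut-offs indicating obstruction. It is an illustrative recalculation, not an analysis reported by the included studies; ratios close to 1 shift the probability of obstruction only slightly.

```python
def likelihood_ratios(sens_pct: float, spec_pct: float) -> tuple:
    """Return (LR+, LR-) from sensitivity and specificity given in percent."""
    sens, spec = sens_pct / 100, spec_pct / 100
    if not (0 <= sens <= 1 and 0 < spec <= 1):
        raise ValueError("need 0-100% sensitivity and specificity above 0% up to 100%")
    # LR+ is unbounded when specificity is 100% (no false positives).
    lr_pos = float("inf") if spec == 1 else sens / (1 - spec)
    lr_neg = (1 - sens) / spec
    return lr_pos, lr_neg


# Transrectal ultrasound prostate-size cut-offs indicating obstruction (values from the text).
cutoffs = {">25 mL": (85, 27), ">40 mL": (66, 64)}
for label, (sens, spec) in cutoffs.items():
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    print(f"{label}: LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```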

Transabdominal ultrasound

The diagnostic accuracy of transabdominal ultrasound to indicate BOO ranged from poor to good, depending on the prostate size cut-off value for obstruction: >40 mL (sensitivity: 58%; specificity: 67%)35 or >45 mL (sensitivity: 86%; specificity: 26%).22 The area under the receiver operating characteristic curve (ROC-AUC) was 0.658, although the analysed data appear to have been heavily skewed.47 Correlations of prostate size measured by transabdominal ultrasound with maximum flow rate (r=−0.40)57 and with the BOO index from urodynamic studies (r=0.24 to r=0.40)22 35 47 were poor.
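Because ROC-AUC summarises accuracy across all possible cut-offs, it can be computed without choosing one. The Python sketch below uses the rank-based formulation: the AUC equals the probability that a randomly chosen obstructed case has a higher index value than a randomly chosen non-obstructed case, with ties counted as one half (equivalent to the Mann-Whitney U statistic divided by the number of pairs). The prostate volumes are hypothetical.

```python
def roc_auc(positives: list, negatives: list) -> float:
    """Rank-based ROC-AUC; ties between a positive and a negative count as 0.5."""
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    score = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                score += 1.0
            elif p == q:
                score += 0.5
    return score / (len(positives) * len(negatives))


# Hypothetical prostate volumes (mL) grouped by urodynamic obstruction status.
obstructed = [38, 45, 52, 60, 41]
non_obstructed = [28, 35, 45, 30, 50]
print(f"ROC-AUC = {roc_auc(obstructed, non_obstructed):.2f}")
```

The pairwise loop is quadratic in sample size, which is fine for illustration; statistical packages typically compute the same quantity from ranks.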

Intravesical prostatic protrusion measured by transabdominal ultrasound showed moderate to good diagnostic accuracy to indicate BOO. Two different cut-off values to determine obstruction were used: 8 mm (sensitivity: 80%; specificity: 80%)22 and 10 mm (sensitivity: 65%–81.6%; specificity: 40%–84.9%).24 35 47 Intravesical prostatic protrusion correlated poorly with the BOO index (r=0.59 to r=0.69).47 22 35 The COSMIN scores for criterion validity ranged from fair47 57 to excellent22 for measurement of prostate size, and from good24 35 to excellent22 for measurement of intravesical prostatic protrusion. Based on the COSMIN scores, inconsistent findings, the number of studies (n=5) and the total sample size (n=536), the level of evidence for the validity of assessing BOO with transabdominal ultrasound was graded as moderate (see table 5). No measures of reliability were reported (see online supplemental table 2B).

One study compared bladder weight measured by transabdominal ultrasound with pressure flow studies to indicate infravesical obstruction, and demonstrated good sensitivity (85.3%) and specificity (87.1%).38 With a fair COSMIN score for criterion validity from one study with a sample size of n=65, the level of evidence was low for the measurement of bladder weight using transabdominal ultrasound to indicate obstruction (see table 5). Measures of reliability were not reported (see online supplemental table 2B).

Transperineal ultrasound uroflowmetry

Transperineal ultrasound uroflowmetry with radio frequency reflection measurement to evaluate BOO was compared with pressure flow studies and demonstrated a high ROC-AUC of 0.96, with good sensitivity (88%) and specificity (95%).25 The COSMIN score was fair for criterion validity. Based on the COSMIN score from one study with a sample size of n=45, the level of evidence was graded as low to evaluate BOO using transperineal ultrasound uroflowmetry (see table 5). No measures of reliability were reported (see online supplemental table 2B).

Uroflowmetry at home

Uroflowmetry at home with a compartment-meter showed good sensitivity (79%–99%) and specificity (68%–90%) compared with uroflowmetry at the clinic, using different maximum flow rate cut-offs (<10 mL/s, <15 mL/s, <19 mL/s).28 Agreement between scores was highest after 10 measurements (kappa: 0.84).28 The COSMIN scores were fair for criterion validity and good for reliability. Based on these COSMIN scores from one study with a sample size of n=186, the level of evidence was graded as low to indicate BOO based on uroflowmetry at home (see table 5 and online supplemental table 2B).
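The kappa value above expresses agreement beyond what would be expected by chance. A minimal Python sketch of Cohen’s kappa for two sets of categorical classifications of the same men is shown below; the home and clinic classifications are hypothetical, and the original study may have used a different kappa variant.

```python
from collections import Counter


def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """Cohen's kappa for two sets of categorical ratings of the same subjects."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: products of the two raters' marginal counts per category.
    categories = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / n**2
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return (observed - expected) / (1 - expected)


# Hypothetical classifications of maximum flow rate (low vs normal) for 10 men.
home = ["low", "low", "low", "normal", "normal", "low", "normal", "normal", "low", "normal"]
clinic = ["low", "low", "normal", "normal", "normal", "low", "normal", "low", "low", "normal"]
print(f"kappa = {cohens_kappa(home, clinic):.2f}")
```

Here the two series agree for 8 of 10 men, but because chance agreement is 50%, kappa is considerably lower than the raw percentage agreement.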

Uroflowmetry

Uroflowmetry to indicate BOO demonstrated moderate to good sensitivity (68%–99%) and poor to moderate specificity (39%–73%). A higher flow rate cut-off point (7, 10, 15 mL/s) resulted in higher sensitivity and lower specificity to identify non-obstructed men.44 Comparable sensitivity and specificity trade-offs were found to identify obstruction based on maximum flow rate below cut-off values (8, 10, 12, 15 mL/s). Using the mean maximum flow rate of three or four voids would increase diagnostic accuracy.48 The maximum urinary flow rate correlated poorly with the Abrams-Griffiths number (τ=−0.41), urethral resistance factor (τ=0.26) and Schäfer’s obstruction grade (τ=−0.43).56 The COSMIN scores for criterion validity were good.44 48 56
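The trade-off between sensitivity and specificity across maximum flow rate cut-offs, together with the use of the mean of several voids, can be illustrated with a short Python sketch. Each hypothetical man has three recorded voids and a reference (urodynamic) obstruction status; he is classified as obstructed when his mean maximum flow rate falls below the cut-off. The data are invented for illustration and do not reproduce any included study.

```python
from statistics import mean

# Hypothetical men: (obstructed on urodynamics?, maximum flow rates of three voids in mL/s)
MEN = [
    (True, [7.5, 9.0, 8.1]),
    (True, [10.5, 11.8, 9.9]),
    (True, [13.0, 14.2, 12.6]),
    (True, [6.0, 7.2, 6.6]),
    (False, [16.0, 18.5, 17.1]),
    (False, [11.0, 12.4, 13.1]),
    (False, [19.5, 21.0, 20.2]),
    (False, [14.0, 15.1, 13.8]),
]


def sens_spec(men: list, cutoff: float) -> tuple:
    """Sensitivity and specificity (%) when mean Qmax below cutoff counts as obstructed."""
    tp = fn = tn = fp = 0
    for obstructed, voids in men:
        positive = mean(voids) < cutoff  # mean of the recorded voids
        if obstructed and positive:
            tp += 1
        elif obstructed:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    return 100 * tp / (tp + fn), 100 * tn / (tn + fp)


for cutoff in (8, 10, 12, 15):
    sens, spec = sens_spec(MEN, cutoff)
    print(f"Qmax < {cutoff} mL/s: sensitivity {sens:.0f}%, specificity {spec:.0f}%")
```

Raising the cut-off classifies more men as obstructed, so sensitivity rises while specificity eventually falls, mirroring the pattern described in the text.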

Based on uroflowmetry, Chen et al developed a new nomogram to distinguish obstructed (≤10 mL/s) from non-obstructed (≥15 mL/s) male patients.29 Compared with the Abrams-Griffiths obstruction number from urodynamic studies, sensitivity (81%) and specificity (91%) were good, with an ROC-AUC of 0.86.29 The COSMIN score was fair for criterion validity.
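A simplified Python sketch of classifying a maximum flow rate into the zones named above is shown below. It is not a reimplementation of Chen et al’s nomogram, whose construction is not described here; treating values between 10 and 15 mL/s as ‘equivocal’ is an assumption made for illustration.

```python
def flow_zone(qmax_ml_s: float) -> str:
    """Classify a maximum flow rate (mL/s) using the thresholds named in the text.

    Values strictly between 10 and 15 mL/s are labelled "equivocal" (an assumption).
    """
    if qmax_ml_s < 0:
        raise ValueError("flow rate cannot be negative")
    if qmax_ml_s <= 10:
        return "obstructed"
    if qmax_ml_s >= 15:
        return "non-obstructed"
    return "equivocal"


for qmax in (8.5, 10.0, 12.3, 15.0, 21.0):
    print(qmax, flow_zone(qmax))
```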

Based on the COSMIN scores, the consistency of findings, the number of studies (n=4) and the total sample size (n=1001), the level of evidence for the validity of uroflowmetry to identify BOO was graded as high (see table 5). No measures of reliability were reported (see online supplemental table 2B).

Penile cuff uroflowmetry and penile compression manoeuvre

One study analysed manual penile compression by the patient to interrupt the flow midstream.23 This manoeuvre was performed while voiding into a uroflowmetry device, and an index was created based on two points from the recorded flow. With a cut-off index value of 96.4%, BOO based on Schäfer’s obstruction grade could be predicted with moderate sensitivity (74%) and good specificity (94%).23 The level of evidence was graded as moderate for the validity of using a penile compression manoeuvre to indicate BOO, based on a good COSMIN score for criterion validity from a single study with a sufficient sample size (n=135). No measures of reliability were reported for this method (see table 5).

Uroflowmetry using an automated inflating penile cuff was analysed by three studies.34 37 50 One study evaluated a compression index similar to the one described above and found moderate sensitivity (78%) and good specificity (84%) at an index cut-off value of 160% to predict BOO.34 The sensitivity of the maximum urinary flow measured with the penile cuff to predict BOO was good (80%37 and 100%50), with fair to good specificity (56%50 and 100%37).

Based on the fair34 50 or good37 COSMIN scores for criterion validity, the inconsistency of findings, the number of studies (n=3) and the total sample size (n=253), the level of evidence was graded as low for the validity of uroflowmetry with an automated inflating penile cuff to indicate BOO (see table 5). Measures of reliability were not reported (see online supplemental table 2B).

Combination of assessment methods

One study developed and evaluated a BOO number based on prostate size measured by transrectal ultrasound, and mean voided volume and maximum urinary flow rate from uroflowmetry. It demonstrated poor correlations (r=0.48 to r=0.52) with obstruction indices. The overall diagnostic accuracy, through ROC-AUC analysis, was good against each reference value: Abrams-Griffiths number (0.83), Schäfer’s obstruction grade (0.82) and urethral resistance factor (0.87).56 Measures of reliability were not reported. Based on a good COSMIN score for criterion validity from one study with a sufficient sample size (n=160), the level of evidence was graded as moderate for the validity of using a combination of assessment methods to indicate BOO (see table 5 and online supplemental table 2B).

Questionnaire

The International Prostate Symptom Score (IPSS) was the only questionnaire identified for detecting BOO. It demonstrated poor correlations with maximum flow rate, postvoid residue and obstruction grade numbers (r=−0.07 to r=0.06).55 Different IPSS item-score cut-off values to indicate obstruction, defined by uroflowmetry as a maximum flow rate of <10, <15 or <19 mL/s, demonstrated poor to moderate sensitivity (25%–74%) and moderate to good specificity (55%–86%).28 A lower flow rate cut-off point resulted in lower sensitivity and higher specificity. When studied as the American Urological Association questionnaire, the IPSS yielded poor correlations with uroflowmetry recordings,41 detrusor pressure at maximum flow (r=0.18)51 and obstruction grade numbers (r=0.15 to r=0.16).56 The COSMIN scores were fair for hypothesis testing41 and ranged from fair28 55 to good51 56 for criterion validity. Based on the COSMIN scores, the inconsistency of findings, the number of studies (n=5) and the total sample size (n=788), the level of evidence for the validity of detecting BOO using questionnaires was graded as moderate (see table 5). However, measures of reliability were not reported (see online supplemental table 2B).
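As background to the item-score cut-offs discussed above, the IPSS consists of seven symptom items, each scored 0 to 5, giving a total of 0 to 35; totals are conventionally classified as mild (0–7), moderate (8–19) or severe (20–35). The Python sketch below implements that scoring. The dictionary keys and the single-item threshold helper are hypothetical illustrations; the specific item-score cut-offs used in the included study are not reproduced here.

```python
# Hypothetical keys for the seven IPSS symptom items (each scored 0-5).
IPSS_ITEMS = (
    "incomplete_emptying", "frequency", "intermittency", "urgency",
    "weak_stream", "straining", "nocturia",
)


def ipss_total(answers: dict) -> int:
    """Sum the seven symptom items after checking that each is present and scored 0-5."""
    missing = set(IPSS_ITEMS) - set(answers)
    if missing:
        raise ValueError(f"missing items: {sorted(missing)}")
    if any(not 0 <= answers[item] <= 5 for item in IPSS_ITEMS):
        raise ValueError("each item must be scored 0-5")
    return sum(answers[item] for item in IPSS_ITEMS)


def ipss_severity(total: int) -> str:
    """Conventional severity bands: mild 0-7, moderate 8-19, severe 20-35."""
    if not 0 <= total <= 35:
        raise ValueError("total must be 0-35")
    if total <= 7:
        return "mild"
    return "moderate" if total <= 19 else "severe"


def item_flag(answers: dict, item: str, cutoff: int) -> bool:
    """Hypothetical single-item rule: flag possible obstruction if item score >= cutoff."""
    return answers[item] >= cutoff


answers = {"incomplete_emptying": 2, "frequency": 3, "intermittency": 1, "urgency": 2,
           "weak_stream": 4, "straining": 1, "nocturia": 2}
total = ipss_total(answers)
print(total, ipss_severity(total), item_flag(answers, "weak_stream", 3))  # 15 moderate True
```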

Assessment methods for BPO

Nineteen studies evaluated assessment methods based on prostate size, intravesical prostatic protrusion, peripheral zone volume, transitional zone volume or transitional zone index to evaluate BPO due to benign prostatic enlargement, unless specified otherwise.21 26 27 30–33 36 39 40 42 43 45 46 49 52–55 An overview of the COSMIN scores per study on BPO is provided in table 4.

Digital rectal examination

The diagnostic accuracy of digital rectal examination to determine prostate size varied between studies. With a cut-off value of ≥30 mL for prostate size, sensitivity was good (94%) and specificity moderate (78%).54 Agreement between prostate size estimated by digital rectal examination performed by a general practitioner and by a urologist (reference test) was poor (κ=0.28).27 Correlations with prostate size measured by transrectal ultrasound were poor to good (r=0.56 to r=0.72).49 The COSMIN scores for criterion validity were fair49 and good.27 54 Based on the COSMIN scores, consistent findings, the number of studies (n=3) and the total sample size (n=1067), the level of evidence for the validity of digital rectal examination to determine prostate size was high (see table 5).

One study found the reliability of digital rectal examination to be poor to good, depending on the grading scale used (ICC: 0.58–0.86).49 The COSMIN score for reliability was fair, based on one study with a sufficient sample size (n=121). Therefore, the level of evidence was low for the reliability of digital rectal examination (see online supplemental table 2A,C).
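The ICC values above depend on the ICC model chosen, which is not stated here. As one common option, the Python sketch below computes the two-way random-effects, absolute-agreement, single-measure ICC (often written ICC(2,1)) from a matrix of ratings with one row per subject and one column per examiner. The volumes are hypothetical, and this is not necessarily the model used in the cited study.

```python
from statistics import mean


def icc_2_1(ratings: list) -> float:
    """Two-way random-effects, absolute-agreement, single-measure ICC.

    `ratings` is a list of rows, one per subject, each with one value per rater.
    """
    n = len(ratings)
    k = len(ratings[0]) if ratings else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a rectangular matrix with >= 2 subjects and >= 2 raters")
    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_rows - ss_cols                 # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n)


# Hypothetical prostate volume estimates (mL) by two examiners for five men.
ratings = [[30, 32], [45, 40], [25, 28], [60, 55], [38, 40]]
print(f"ICC(2,1) = {icc_2_1(ratings):.3f}")
```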

Transabdominal ultrasound

Prostate size measured by transabdominal ultrasound showed good diagnostic accuracy compared with transrectal ultrasound at a cut-off prostate volume of ≤80 mL (sensitivity: 95%; specificity: 96%).40 Another study investigated a cut-off prostate volume of <80 cc against a specimen weight of <80 g, with fair sensitivity (56%) and a specificity of 100%.52 Prostate size measured by transabdominal ultrasound correlated well with prostate size measured by transrectal ultrasound (r=0.82 to r=0.98)53 40 45 and with enucleated tissue weight (r=0.73 to r=0.82).33 31 52 Correlations with transitional zone volume measured by transrectal ultrasound were good (r=0.95).45 The COSMIN scores for criterion validity ranged from fair31 33 45 52 to good.40 53 Based on the COSMIN scores, the inconsistency of findings, the number of studies (n=6) and the total sample size (n=395), the level of evidence was graded as moderate for the validity of measuring prostate size using transabdominal ultrasound (see table 5 and online supplemental table 2C).

A low interobserver error (5%) was demonstrated, with a fair score on the COSMIN boxes for reliability and measurement error.45 Based on the COSMIN scores from a single study with a sample of n=95, the level of evidence for reliability was graded as low for prostate size measurements with transabdominal ultrasound (see table 5 and online supplemental table 2A).

Transperineal ultrasound

Good correlations were demonstrated for prostate size measured by transperineal ultrasound, compared with enucleated tissue weight (r=0.89).46 Based on a fair COSMIN score for criterion validity from a single study with a sample size of n=80, the level of evidence was graded as low for the measurement of prostate size using transperineal ultrasound (see table 5). Measures of diagnostic accuracy and reliability were not reported (see online supplemental table 2C).

Transrectal ultrasound

Prostate size measured by transrectal ultrasound showed poor to good correlations with enucleated tissue weight (r=0.67 to r=0.95).42 26 30 31 33 52 One study found that transrectal ultrasound underestimated prostate volume, with a mean difference of −12.5 g and limits of agreement between −38 and 13 g.42 Other studies assessed the criterion validity of various transrectal ultrasound formulas or outline methods to calculate and measure prostate size. Correlations with the frequently used ellipsoid formula or step planimetry method were good (r=0.88 to r=0.96),43 21 and no significant differences36 between size measurements were found.
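A mean difference with limits of agreement, as reported above, is the output of a Bland-Altman analysis, in which the limits are conventionally the mean difference ± 1.96 standard deviations of the paired differences. The reported limits (−38 and 13 g) are centred on the reported mean difference of −12.5 g, which is consistent with this approach. The Python sketch below shows the calculation on hypothetical paired measurements, assuming both methods are expressed in comparable units.

```python
from statistics import mean, stdev


def bland_altman(index: list, reference: list) -> tuple:
    """Return (mean difference, lower limit, upper limit) of agreement.

    Differences are index minus reference; limits are mean +/- 1.96 SD.
    """
    if len(index) != len(reference) or len(index) < 2:
        raise ValueError("need at least two paired measurements of equal length")
    diffs = [a - b for a, b in zip(index, reference)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical data: ultrasound volume (mL) vs enucleated weight (g),
# ASSUMING 1 mL of tissue corresponds to roughly 1 g for this comparison.
trus = [40, 55, 62, 80, 95]
weight = [50, 62, 78, 90, 110]
bias, lower, upper = bland_altman(trus, weight)
print(f"mean difference {bias:.1f}, limits of agreement {lower:.1f} to {upper:.1f}")
```

A negative mean difference indicates that the index method underestimates relative to the reference, as described for transrectal ultrasound in the text.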

Transitional zone volume measured by transrectal ultrasound correlated well with enucleated tissue weight (r=0.87 to r=0.97),30 33 52 and subgrouping men with a larger prostate slightly increased the correlation.30 Correlations with total prostate volume measured by transrectal ultrasound were good (r=0.82 and r=0.96).21 33 To identify volumes under 80 cc, with a specimen weight under 80 g as reference, sensitivity was good (93%) and specificity fair (61%).52

Calculation of the transitional zone index, based on prostate zones measured with transrectal ultrasound, yielded good sensitivity (91%–100%) and poor to good specificity (19%–91%) for different cut-off values, although reference values were unclear.33 The transitional zone index correlated poorly with resected tissue weight (r=0.55).33
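The ellipsoid formula for prostate volume and the transitional zone index are commonly computed as in the Python sketch below: volume is approximated as length × width × height × π/6 (a factor of about 0.52), and the transitional zone index as transitional zone volume divided by total prostate volume. The diameters are hypothetical, and the included studies may have used variants of these calculations.

```python
import math


def ellipsoid_volume(length_cm: float, width_cm: float, height_cm: float) -> float:
    """Ellipsoid approximation: length x width x height x pi/6 (cm^3, equivalent to mL)."""
    if min(length_cm, width_cm, height_cm) <= 0:
        raise ValueError("dimensions must be positive")
    return length_cm * width_cm * height_cm * math.pi / 6


def transitional_zone_index(tz_volume_ml: float, total_volume_ml: float) -> float:
    """Transitional zone volume divided by total prostate volume."""
    if total_volume_ml <= 0 or not 0 <= tz_volume_ml <= total_volume_ml:
        raise ValueError("need total volume > 0 and 0 <= transitional zone volume <= total")
    return tz_volume_ml / total_volume_ml


# Hypothetical diameters (cm) for the whole prostate and the transitional zone.
total = ellipsoid_volume(5.0, 4.5, 4.0)
tz = ellipsoid_volume(3.5, 3.0, 3.0)
print(f"total {total:.1f} mL, transitional zone {tz:.1f} mL, "
      f"index {transitional_zone_index(tz, total):.2f}")
```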

The COSMIN scores for criterion validity ranged from fair21 26 31 33 36 52 to good,30 42 and one study scored fair for hypothesis testing.43 Based on the COSMIN scores, the consistency of findings, the number of studies (n=9) and the total sample size (n=1854), the level of evidence for the validity of measuring prostate size using transrectal ultrasound was graded as high (see table 5).

Inter-rater reliability for peripheral zone thickness was good (ICC: 0.87),39 and a low interobserver error (4%) was found,45 both with fair COSMIN scores for reliability. Based on the COSMIN scores from two studies with an overall sample of n=1104 and consistent findings, the level of evidence was graded as moderate for the reliability of transrectal ultrasound (see online supplemental table 2C).

Combination of assessment methods

One study combined transitional zone volume measured by transrectal ultrasound with free uroflowmetry to calculate a nomogram-based index score. Compared with a Schäfer’s obstruction grade of ≥3 to indicate BPO (as described by the study), moderate diagnostic accuracy (sensitivity: 74%; specificity: 79%) was demonstrated, with a good ROC-AUC of 0.76.32 With a good COSMIN score for criterion validity from a single study with a sufficient sample size (n=449), the level of evidence was graded as moderate for the validity of a combination of assessment methods to indicate BPO (see table 5). Measures of reliability were not reported (see online supplemental table 2C).

Questionnaires

The construct validity (hypothesis testing) of the subdomain and total scores of the IPSS was assessed by correlation with prostate size, peripheral or transitional zone volume and transitional zone index. The COSMIN score for hypothesis testing was good.39 All correlations were poor (r=−0.17 to r=0.15).39 43 55 Sensitivity and specificity of the IPSS for matching the urologist’s final diagnosis were moderate (sensitivity: 58%; specificity: 59%) and changed slightly when age was included in the model (sensitivity: 57%; specificity: 64%).27 The COSMIN scores for criterion validity ranged from fair43 55 to good.27 Based on the COSMIN scores, the consistency of findings, the number of studies (n=4) and the total sample size (n=1916), the level of evidence for the validity of the IPSS to determine BPO was graded as high (see table 5). Measures of reliability were not reported (see online supplemental table 2C).

Discussion

Statement of principal findings

Through this review, a variety of assessment methods were identified to assess BOO and to identify the role of BPO in men with LUTS due to BOO. To evaluate BOO, transrectal or transabdominal ultrasound and uroflowmetry were identified as the most adequate assessment methods for clinicians in primary care. These methods showed moderate to good diagnostic accuracy and are supported by moderate- to high-quality evidence. However, their correlations with urodynamic studies were poor.

The IPSS questionnaire is frequently used to assess the severity of symptoms in men with BOO.11 Nonetheless, the IPSS should not be recommended for identifying BOO because of its poor to moderate sensitivity and specificity and its poor correlations with parameters from urodynamic studies. The validity of noninvasive uroflowmetry-related methods, such as home-uroflowmetry, penile cuff or penile compression manoeuvres, is supported by low- to moderate-quality evidence. Correlations of these methods with urodynamic studies were moderate to good.

In men with BOO, the role played by BPO could be adequately assessed by measuring prostate size with transrectal or transabdominal ultrasound. Studies suggest that intravesical prostatic protrusion, a specific enlargement of the prostate towards the bladder floor, is important: it showed higher correlations and diagnostic accuracy24 35 47 57 with BOO than total prostate volume.82 Similarly, transitional zone volume can also be assessed with these ultrasound modalities, and some studies reported higher correlations with reference tests for this measure than for total prostate volume.26 30 33 Although these prostate parameters can be evaluated with transabdominal or transrectal ultrasound, they have been researched less frequently and may be best evaluated after an initial inspection of total prostate volume to improve the evaluation of BPO.
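Ultrasound-based prostate volume is commonly estimated with the prolate ellipsoid formula (length × width × height × π/6). The review does not state which formula each included study used, so the sketch below is an assumption for illustration only. It also computes a transitional zone index as transitional zone volume divided by total prostate volume. The diameters are hypothetical.

```python
import math


def ellipsoid_volume_ml(length_cm: float, width_cm: float, height_cm: float) -> float:
    """Prolate-ellipsoid estimate: length x width x height x pi/6 (cm^3, i.e. mL)."""
    return length_cm * width_cm * height_cm * math.pi / 6.0


def transition_zone_index(tz_volume_ml: float, total_volume_ml: float) -> float:
    """Transitional zone volume as a fraction of total prostate volume."""
    return tz_volume_ml / total_volume_ml


# Hypothetical ultrasound diameters in cm, for illustration only.
total_ml = ellipsoid_volume_ml(4.5, 5.0, 4.0)
tz_ml = ellipsoid_volume_ml(3.0, 3.5, 3.0)
print(f"total={total_ml:.1f} mL, TZ={tz_ml:.1f} mL, "
      f"TZ index={transition_zone_index(tz_ml, total_ml):.2f}")
```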

Digital rectal examination may be used to estimate prostate volume, although its correlations with reference tests were lower than those of transabdominal and transrectal ultrasound. In addition, its reliability ranged from poor to good, depending on the classification scale used to estimate size. Therefore, outcomes of digital rectal examination to indicate BPO should be interpreted with care.
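Inter-rater reliability of a categorical classification, such as a size scale for digital rectal examination, is typically expressed with Cohen's kappa. Kappa corrects observed agreement for the agreement expected by chance. The sketch below implements unweighted kappa. The size categories and ratings are hypothetical, and for ordered categories a weighted kappa is often preferred.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Unweighted Cohen's kappa for two raters classifying the same men."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must classify the same, non-empty set of men")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement implied by each rater's marginal category frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / n**2
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical size classifications by two examiners, for illustration only.
examiner_1 = ["small", "small", "medium", "medium", "large",
              "large", "small", "medium", "large", "medium"]
examiner_2 = ["small", "medium", "medium", "medium", "large",
              "medium", "small", "medium", "large", "small"]
print(f"kappa={cohens_kappa(examiner_1, examiner_2):.2f}")
```

Here the examiners agree on 7 of 10 men, but half of that agreement would be expected by chance alone. The resulting kappa is about 0.54.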

Clinical implications and therapeutic strategies

According to the EAU guidelines on non-neurogenic male LUTS, one of the main goals in evaluating men presenting with LUTS is to obtain a diagnosis that may help outline the multifactorial causes of LUTS.11 A man presenting in primary care with LUTS may undergo thorough urological history taking, followed by transrectal or transabdominal ultrasound or uroflowmetry, to unravel the specific roles of BOO and BPO. Low-cost and effective therapeutic strategies could then be initiated early in primary care. These could include watchful waiting, pelvic floor muscle training, support of self-management, lifestyle advice or pharmacological treatment, including α1-adrenoceptor antagonists and 5α-reductase inhibitors.11 83 These options may suffice for some men to address their level of bother and could prevent unwanted surgery or invasive and burdensome diagnostic procedures in specialised urology clinics.84

Strengths and weaknesses of the study

Thirty-seven of the 61 studies scored 'fair', 'good' or 'excellent' on at least one COSMIN box. Nonetheless, this systematic review was hindered by a lack of definitions of the measured constructs and by minimal or no description of patient characteristics or of how the diagnosis of BOO, BPO or benign prostatic hyperplasia was established.

Thirty-three of the included studies evaluated the criterion validity of assessment methods. The COSMIN describes criterion validity as the degree to which the scores of an instrument are an adequate reflection of a 'gold standard'.85 In the evaluation of BOO, urodynamic studies were often used to calculate a BOO Index (or Abrams-Griffiths number), Schäfer's obstruction grade or International Continence Society nomogram to define obstruction.22–25 29 34 35 37 38 47 48 50 51 55–57 However, diagnostic accuracy may have been overestimated or underestimated, because studies handled intermediate scores differently. Men with a score of >40 on the BOO Index are defined as obstructed and those with a score below 20 as non-obstructed.86 Men with scores in between are described as equivocal. Some studies excluded these men from the analysis, while others included them in the non-obstructed group.
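The sketch below computes the BOO Index using its usual definition, PdetQmax − 2 × Qmax. PdetQmax is the detrusor pressure at maximum flow in cmH2O, and Qmax is the maximum flow rate in mL/s. The sketch then classifies the index with the thresholds given above and shows how the handling of equivocal men changes the apparent accuracy of an index test. The patient data are hypothetical and the function names are illustrative.

```python
from typing import Optional


def booi(pdet_qmax_cmh2o: float, qmax_ml_s: float) -> float:
    """BOO Index (Abrams-Griffiths number): PdetQmax - 2 x Qmax."""
    return pdet_qmax_cmh2o - 2.0 * qmax_ml_s


def classify_booi(index: float) -> str:
    """Thresholds as described in the text: >40 obstructed, <20 non-obstructed."""
    if index > 40:
        return "obstructed"
    if index < 20:
        return "non-obstructed"
    return "equivocal"


def accuracy(patients: list[tuple[float, bool]],
             equivocal_as: Optional[str] = None) -> tuple[float, float]:
    """Sensitivity and specificity of an index test against BOOI categories.

    patients: (BOOI value, index test positive?) pairs.
    equivocal_as: None excludes equivocal men; "non-obstructed" pools them with that group.
    """
    tp = fn = tn = fp = 0
    for value, positive in patients:
        label = classify_booi(value)
        if label == "equivocal":
            if equivocal_as is None:
                continue  # strategy 1: drop equivocal men from the analysis
            label = equivocal_as  # strategy 2: reassign them to another group
        if label == "obstructed":
            if positive:
                tp += 1
            else:
                fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical sample: 10 obstructed, 10 equivocal and 10 non-obstructed men.
patients = ([(60.0, True)] * 8 + [(60.0, False)] * 2
            + [(30.0, True)] * 6 + [(30.0, False)] * 4
            + [(10.0, True)] * 2 + [(10.0, False)] * 8)
print("equivocal excluded:", accuracy(patients))
print("equivocal as non-obstructed:", accuracy(patients, "non-obstructed"))
```

In this hypothetical sample, sensitivity stays at 0.80 under both strategies. Specificity, however, falls from 0.80 to 0.60 when equivocal men are counted as non-obstructed, because several of them test positive.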

Also, in the evaluation of BPO, transrectal ultrasound was frequently used as the reference test to measure prostate size. This is questionable, as its inter-rater reliability is only moderate for smaller prostates (<30 mL).87 If an imperfect reference test is used, correct results of the index test may be misclassified as false positives or false negatives.88
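A short calculation illustrates the effect of an imperfect reference test. The sketch below assumes that the index and reference tests err independently of each other given the true disease status (conditional independence). This is a simplifying assumption. The accuracy values and prevalences are hypothetical.

```python
def apparent_accuracy(prevalence: float, se_index: float, sp_index: float,
                      se_ref: float, sp_ref: float) -> tuple[float, float]:
    """Apparent (sensitivity, specificity) of an index test judged against an
    imperfect reference test, assuming conditional independence of the two tests."""
    p, q = prevalence, 1.0 - prevalence
    # Probability that both tests are positive / both negative, summed over true status.
    both_pos = p * se_index * se_ref + q * (1 - sp_index) * (1 - sp_ref)
    both_neg = p * (1 - se_index) * (1 - se_ref) + q * sp_index * sp_ref
    ref_pos = p * se_ref + q * (1 - sp_ref)  # probability the reference test is positive
    ref_neg = 1.0 - ref_pos
    return both_pos / ref_pos, both_neg / ref_neg


# Hypothetical index test with true sensitivity and specificity of 0.80.
for prevalence in (0.5, 0.2):
    print(prevalence,
          apparent_accuracy(prevalence, 0.80, 0.80, se_ref=1.0, sp_ref=1.0),  # perfect reference
          apparent_accuracy(prevalence, 0.80, 0.80, se_ref=0.90, sp_ref=0.90))  # imperfect reference
```

Consider a truly 80%/80% index test judged against a 90%/90% reference test. At a prevalence of 0.5, the apparent sensitivity and specificity are both 0.74. At a prevalence of 0.2, the apparent sensitivity falls to about 0.62 and the apparent specificity rises to about 0.78. This also shows how differences in prevalence between study samples can shift reported accuracy.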

These limitations lead to poorly defined populations, unclear classification of the disease and arbitrary categorisation of participants. The variability in prevalence among the study samples may lead to an underestimation or overestimation of the diagnostic accuracy and introduces heterogeneity among the selected studies.89 90 This may be the main reason for the inconsistent results found for some assessment methods, which led to downgrading of the level of evidence. Consequently, only a best-evidence synthesis was performed, rather than a meta-analysis pooling the diagnostic accuracy of the selected studies.91

Another reason for the inconsistency of results could be the inappropriate use of statistical methods in some primary studies. Examples include the use of Cohen's kappa statistic to correlate instruments and the use of Pearson's correlation for heavily skewed data.29 47 These studies could not be excluded from the synthesis, as the lowest COSMIN score for incorrectly applied statistical methods is 'fair'.
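The sketch below shows why Pearson's correlation can mislead with heavily skewed data. For a perfectly monotonic but strongly right-skewed relationship, Spearman's rank correlation is 1, while Pearson's r is noticeably lower. The data are hypothetical, and Spearman's coefficient is computed here as Pearson's r on average ranks. The sketch assumes neither variable is constant.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rank correlation: Pearson's r computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical, strongly right-skewed but perfectly monotonic data.
x = list(range(1, 11))
y = [2 ** (v - 1) for v in x]
print(f"Pearson r={pearson(x, y):.2f}, Spearman rho={spearman(x, y):.2f}")
```

Here Pearson's r is about 0.80, even though every increase in x is matched by an increase in y.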

These limitations threaten the validity of the reported diagnostic accuracy of the methods considered and therefore compromise their overall added value in the evaluation of men with LUTS suspected of BOO.

Strengths and weaknesses in relation to other studies

Current guidelines on the evaluation of men with LUTS describe the use of questionnaires and suggest digital rectal examination and ultrasound measurements.11 92 93 The present review supports the recommendation to use transabdominal or transrectal ultrasound, while digital rectal examination may be less informative. Ultrasound assessment methods require time and some experience.94 The cost of ultrasound systems used to be a limiting factor, but affordable, good-quality portable systems and promising hand-held devices connected to tablets address this limitation. Questionnaires such as the IPSS provide accurate insight into the severity of complaints and should be used solely for that purpose.95 96

A review by Malde et al did not recommend uroflowmetry to indicate BOO.16 Our review found poor to good diagnostic accuracy for uroflowmetry, home-uroflowmetry or combinations of such measures, supported by only a low to moderate level of evidence. In our opinion, these methods look promising and may help to identify BOO in men presenting in primary care. However, a conclusive diagnosis should not be derived solely from the outcomes of these methods.

Unanswered questions and future research

Overall, none of the assessment methods in this review has sufficient diagnostic accuracy to be used as a stand-alone test for a conclusive diagnosis. Therefore, adequate history taking and clinical reasoning by the clinician are required in the evaluation of men with LUTS suspected of BOO. A cluster of measurements, including questionnaires and other assessment methods, may be more promising than comparing single outcomes with a criterion. Additionally, Bossuyt et al presented several diagnostic pathways that may prove helpful in the evaluation of men with LUTS and BOO in primary care.97 Many of the included studies compared methods with a criterion test or measured construct without in-depth analysis of pretest or post-test probability. New studies should further investigate the best possible diagnostic pathway by implementing quick and easy triage tests with high sensitivity, followed by add-on tests with high specificity. This could help to improve the accuracy of referrals and to select appropriate treatment pathways, with less misclassification of men with BOO.
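Pretest and post-test probability, and a triage test followed by an add-on test, can be sketched with Bayes' theorem in odds form. The example assumes a hypothetical pretest probability of BPO of 30%. It uses the sensitivity (74%) and specificity (79%) reported for the nomogram-based index in the results. The combined-accuracy formula for two tests in series assumes conditional independence, and the characteristics of the triage and add-on tests are hypothetical.

```python
def post_test_probability(pretest: float, likelihood_ratio: float) -> float:
    """Bayes' theorem in odds form: post-test odds = pretest odds x likelihood ratio."""
    pretest_odds = pretest / (1.0 - pretest)
    post_odds = pretest_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


def serial_accuracy(se_triage: float, sp_triage: float,
                    se_addon: float, sp_addon: float) -> tuple[float, float]:
    """Combined accuracy when only triage-positive men receive the add-on test and
    a man is classified positive only if both tests are positive.
    Assumes the two tests err independently given true status."""
    sensitivity = se_triage * se_addon
    specificity = 1.0 - (1.0 - sp_triage) * (1.0 - sp_addon)
    return sensitivity, specificity


# Nomogram-based index as reported in the results; the pretest probability is hypothetical.
lr_positive = 0.74 / (1.0 - 0.79)
print(f"post-test probability: {post_test_probability(0.30, lr_positive):.2f}")

# Hypothetical triage (high sensitivity) and add-on (high specificity) tests.
print("triage + add-on:", serial_accuracy(0.95, 0.50, 0.80, 0.90))
```

Under these assumptions, a positive nomogram result raises the probability of BPO from 30% to about 60%. In the series example, the combined sensitivity is 0.76 and the combined specificity is 0.95. Requiring both tests to be positive therefore gains specificity at the cost of sensitivity.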

Future studies on the diagnostic accuracy of noninvasive assessment methods should use the COSMIN taxonomy, which provides helpful definitions and tools for conducting accurate research.85 To make future meta-analyses possible, researchers should not only follow adequate methodology but also report results according to reporting guidelines for diagnostic studies, such as the Standards for the Reporting of Diagnostic Accuracy Studies (STARD).98

Conclusion

Transrectal or transabdominal ultrasound and uroflowmetry are the most adequate methods to evaluate BOO due to BPO in primary care and may support appropriate referral to specialised urology care. Digital rectal examination could be used to evaluate men with BPO, although caution is needed when interpreting the results. Devices and methods related to uroflowmetry, that is, home-uroflowmetry and penile cuff or compression methods, may be inconclusive in the evaluation of BOO. The included questionnaires that measure the severity of LUTS are not recommended to detect BOO or BPO. Overall, this review was hindered by suboptimal methods, unclear definitions and imprecise presentation of results in many of the included studies, often resulting in lower levels of evidence.

Data availability statement

All data relevant to the study are included in the article or uploaded as online supplemental information.

Ethics statements

Patient consent for publication

References

Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

Footnotes

  • Contributors JCFK and JM created the search for the databases. TV, EvB, REPMB, MPK, SPJR, JJMP and ALP-G contributed to the screening of titles, abstracts and full texts and the COSMIN grading of the selected studies. TV extracted the data from the studies. TV, EvB, SPJR, MWC and ALP-G drafted the manuscript, which all authors revised and approved for publication. ALP-G accepts full responsibility for the finished work and/or the conduct of the study as guarantor, had access to the data, and controlled the decision to publish.

  • Funding This work was supported by the Scientific Committee on Physiotherapy of the Dutch Royal Physiotherapy Association (KNGF). Author TV was supported by the Dutch Organisation for Scientific Research (Nederlandse Organisatie voor Wetenschappelijk Onderzoek-NWO), grant number: 023013042.

  • Competing interests None declared.

  • Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Preregistration https://doi.org/10.17605/OSF.IO/JS3R4

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.