
Does exposure to simulated patient cases improve accuracy of clinicians’ predictive value estimates of diagnostic test results? A within-subjects experiment at St Michael’s Hospital, Toronto, Canada
  Bonnie Armstrong (1), Julia Spaniol (1), Nav Persaud (2,3)

  1. Department of Psychology, Ryerson University, Toronto, Ontario, Canada
  2. Department of Family and Community Medicine, Li Ka Shing Knowledge Institute, St Michael’s Hospital, Toronto, Ontario, Canada
  3. Department of Family and Community Medicine, University of Toronto, Toronto, Ontario, Canada

  Correspondence to Bonnie Armstrong; bonnie.armstrong@psych.ryerson.ca

Abstract

Objective Clinicians often overestimate the probability of a disease given a positive test result (positive predictive value; PPV) and the probability of no disease given a negative test result (negative predictive value; NPV). The purpose of this study was to investigate whether experiencing simulated patient cases (ie, an ‘experience format’) would promote more accurate PPV and NPV estimates compared with a numerical format.

Design Participants were presented with information about three diagnostic tests for the same fictitious disease and were asked to estimate the PPV and NPV of each test. Tests varied with respect to sensitivity and specificity. Information about each test was presented once in the numerical format and once in the experience format. The study used a 2 (format: numerical vs experience) × 3 (diagnostic test: gold standard vs low sensitivity vs low specificity) within-subjects design.

Setting The study was completed online, via Qualtrics (Provo, Utah, USA).

Participants 50 physicians (12 practising clinicians and 38 residents) from the Department of Family and Community Medicine at St Michael’s Hospital in Toronto, Canada, completed the study. All participants had completed at least 1 year of residency.

Results Estimation accuracy was quantified by the mean absolute error (MAE; absolute difference between estimate and true predictive value). PPV estimation errors were larger in the numerical format (MAE=32.6%, 95% CI 26.8% to 38.4%) compared with the experience format (MAE=15.9%, 95% CI 11.8% to 20.0%, d=0.697, P<0.001). Likewise, NPV estimation errors were larger in the numerical format (MAE=24.4%, 95% CI 14.5% to 34.3%) than in the experience format (MAE=11.0%, 95% CI 6.5% to 15.5%, d=0.303, P=0.015).

Conclusions Exposure to simulated patient cases promotes accurate estimation of predictive values in clinicians. This finding carries implications for diagnostic training and practice.

  • diagnostic inference
  • experience-based learning
  • PPV
  • NPV
  • estimate accuracy

This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/


Strengths and limitations of this study

  • The use of fictitious diseases and diagnostic tests ensured that performance was not biased by participants’ prior knowledge of real diseases and tests.

  • Three separate diagnostic tests that varied in sensitivity and specificity were presented in each format, within subjects, in order to show the robustness of the format effect.

  • All participants were recruited from the Department of Family and Community Medicine at St Michael’s Hospital in Toronto, Canada. Future studies should replicate this research in other settings and with other populations.

  • The study was conducted online, which may affect the ecological validity of the results.

Probabilistic reasoning is central to medical diagnosis.1–4 Calculating or estimating the probability of a disease given a positive test result (positive predictive value; PPV) or the probability of no disease given a negative test result (negative predictive value; NPV) is notoriously difficult for clinicians, although commonly required for diagnostic inference.5–7 Specifically, clinicians have difficulty understanding and applying test accuracy evidence to pretest odds of disease.5–10 Systematic errors include overestimation of the PPV and the NPV,5–10 which may have negative effects on patient care. Overestimation of the PPV, for example, may increase the risk of overtreatment such as unnecessary surgery or chemotherapy.11 12 
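
Formally, both quantities follow from Bayes’ theorem once disease prevalence, test sensitivity and test specificity are known. As a worked illustration (the 6% prevalence and 83.33% sensitivity echo a test described later in this article; the 90% specificity is an assumed value chosen for round numbers):

\[ \mathrm{PPV} = \frac{\mathrm{sens}\times\mathrm{prev}}{\mathrm{sens}\times\mathrm{prev}+(1-\mathrm{spec})\times(1-\mathrm{prev})} = \frac{0.8333\times0.06}{0.8333\times0.06+0.10\times0.94} \approx 0.35 \]

\[ \mathrm{NPV} = \frac{\mathrm{spec}\times(1-\mathrm{prev})}{\mathrm{spec}\times(1-\mathrm{prev})+(1-\mathrm{sens})\times\mathrm{prev}} = \frac{0.90\times0.94}{0.90\times0.94+0.1667\times0.06} \approx 0.99 \]

In this example, roughly two out of every three positive results are false positives, which is precisely the intuition that is lost when the PPV is overestimated.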

The accuracy of probabilistic inference has been shown to be sensitive to the format in which relevant statistics are presented.13–20 The distinction between numerical and experience formats is most critical in the current context. In numerical formats, PPV and NPV estimates are based on numerical summaries of disease prevalence, test sensitivity (ie, the proportion of patients with the disease who receive a positive test result9) and test specificity (ie, the proportion of patients without the disease who receive a negative test result9) or false-positive rates.5–8 14–20 In so-called experience formats, in contrast, decision-makers accrue information about the prevalence of disease and test reliability through exposure to representative patient cases whose true disease status and test outcome are revealed.21–25 Thus, rather than manipulating statistical information to arrive at PPV and NPV estimates, decision-makers must rely on their memory for previously experienced patient scenarios (ie, true and false, positives and negatives) when estimating predictive values.

A series of studies suggests that experience formats may be superior to numerical formats in non-experts. An experience format led to greater sensitivity to the prevalence of genetic disease in unborn children, as well as a decreased subjective sense of worry about the disease.21 In another study, an experience format increased patients’ knowledge of the risks and benefits of lung cancer screening.22 We recently showed that both younger and older adults, regardless of numeracy skills, were more successful at estimating PPVs and NPVs for fictitious diagnostic tests when information was presented in an experience format, compared with when it was presented in a numerical format.23 Similar findings were reported in a study comparing PPV estimates for a Down syndrome screening.24

In summary, there is strong evidence suggesting an advantage of experience over numerical formats in the context of diagnostic inference. However, no study to date has tested this effect in clinicians. In the current study, we sought to test whether the experience advantage would extend to clinicians. We predicted that, similar to laypeople, clinicians would provide more accurate estimates of the PPV and NPV after being exposed to relevant information in an experience format, compared with a numerical format. To test the robustness of the format effect, participants provided estimates of the PPV and NPV for three different fictitious diagnostic tests that differed in sensitivity and specificity.

Methods

Fifty clinicians affiliated with the Department of Family and Community Medicine at St Michael’s Hospital in Toronto, Canada, provided informed consent before completing a 1-hour online experiment via Qualtrics (Provo, Utah, USA), in which they received information about a fictitious disease and three separate fictitious diagnostic tests.

Information about each of the three tests was provided in a numerical format and an experience format. The numerical format was based on prior literature5–8 14–20 and involved reading a verbal passage describing the prevalence of the disease, as well as the sensitivity and the false-positive rate (ie, 1 − specificity) of the diagnostic test. Numerical information was expressed in normalised frequencies, in which the base rate was normalised to 100 (see figure 1A). In the experience format (see figure 1B), participants were presented with a slideshow of 100 representative patient cases. Each patient was characterised by a combination of disease status (does vs does not have the disease) and test result (positive vs negative). The words ‘Has Disease’ and ‘Positive Test Result’ appeared in red, and the words ‘Does Not Have Disease’ and ‘Negative Test Result’ appeared in blue. Therefore, a same-colour patient case indicated a true test result (eg, Has Disease and Positive Test Result), whereas a different-colour patient case indicated a false test result (eg, Has Disease and Negative Test Result). Each slide presented a single patient case for 3 s. Participants were instructed not to take notes.
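
To make the construction of the experience format concrete, the sketch below (Python; a minimal illustration, not the authors’ Qualtrics implementation) builds a representative set of 100 cases from hypothetical test characteristics and derives the true PPV and NPV from the resulting counts. Only the 6% prevalence and 83.33% sensitivity echo a test described in the Discussion; the 90% specificity is an assumption.

import random

def build_cases(n=100, prevalence=0.06, sensitivity=5/6, specificity=0.90):
    """Build a deterministic, representative set of patient cases."""
    diseased = round(n * prevalence)        # 6 patients with the disease
    healthy = n - diseased                  # 94 patients without the disease
    tp = round(diseased * sensitivity)      # 5 true positives
    fn = diseased - tp                      # 1 false negative
    tn = round(healthy * specificity)       # 85 true negatives (after rounding)
    fp = healthy - tn                       # 9 false positives
    cases = ([("Has Disease", "Positive Test Result")] * tp
             + [("Has Disease", "Negative Test Result")] * fn
             + [("Does Not Have Disease", "Negative Test Result")] * tn
             + [("Does Not Have Disease", "Positive Test Result")] * fp)
    random.shuffle(cases)                   # one case per slide, random order
    return cases, tp, fn, tn, fp

cases, tp, fn, tn, fp = build_cases()
ppv = tp / (tp + fp)   # proportion with the disease among positive results
npv = tn / (tn + fn)   # proportion without the disease among negative results
print(f"{len(cases)} cases: PPV = {ppv:.2f}, NPV = {npv:.2f}")
# 100 cases: PPV = 0.36, NPV = 0.99

Because the cases are representative rather than randomly sampled, the counts reproduce the stated test characteristics exactly, so the true PPV and NPV can in principle be read directly off the experienced cases.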

Figure 1

(A) An example of the numerical format. (B) An example of the experience format. The numerical format provides the prevalence of the disease, as well as the sensitivity and the false-positive rate of the diagnostic test. In the experience format, 100 representative patient cases were viewed in a slideshow for each of the three tests. Each slide was presented for 3 s and described the patient in terms of disease status (ie, has disease or does not have disease) and test result (negative or positive). ‘Has Disease’ and ‘Positive Test Result’ were shown in red font, and ‘Does Not Have Disease’ and ‘Negative Test Result’ were shown in blue font.

In order to test the robustness of the format effect (numerical vs experience) on the accuracy of PPV and NPV estimates, three separate diagnostic tests with varying test characteristics were used. The gold standard test had high sensitivity and high specificity, the low-sensitivity test had low sensitivity but high specificity and the low-specificity test had high sensitivity but low specificity (see table 1 for details). Each participant completed testing for all six combinations of format (numerical vs experience) and test (gold standard vs low sensitivity vs low specificity). Presentation order was counterbalanced: half of the participants completed the scenarios in the numerical format first (with test order counterbalanced across participants), followed by the scenarios in the experience format (with test order again counterbalanced); the other half received the reverse order (experience then numerical). Participants were not told that the three diagnostic tests were identical in both formats.

Table 1

Test characteristics

In both the numerical and experience formats, information for each test was presented for a total of 3 min before participants were prompted for estimates, specifically ‘how many patients had the disease, out of all patients who received a positive test result’ (PPV) and ‘how many patients did not have the disease, out of all patients who received a negative test result’ (NPV).

PPV and NPV estimates were solicited using a frequency response format in which participants had to fill in both the numerator and the denominator (eg, ‘6 out of 98’). PPV and NPV estimate errors, defined as the absolute difference between true and estimated values, were submitted to separate 2 (format: numerical vs experience) × 3 (test: gold standard vs low sensitivity vs low specificity) repeated-measures analyses of variance. Given the sample size (n=50) and the repeated-measures design, the statistical power to detect medium-sized effects,26 with an alpha of 0.05, was 0.93 for the ‘format’ factor and 0.98 for the ‘test’ factor.27 Statistical analysis was performed using SPSS (Version 22), with alpha set to 0.05.
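
The error measure is straightforward to compute. A minimal sketch (Python; the responses and the true value below are invented for illustration, not data from the study): each frequency-format answer is converted to a percentage and compared against the true predictive value, and the MAE for a condition is the mean of those absolute differences.

def absolute_error(numerator, denominator, true_pct):
    """Absolute difference, in percentage points, between a frequency-format
    estimate (eg, '6 out of 98') and the true predictive value."""
    return abs(100 * numerator / denominator - true_pct)

# Hypothetical responses from three participants in one format/test condition
responses = [(6, 98), (35, 100), (5, 14)]
true_ppv = 35.7  # per cent; an illustrative value, not taken from table 1
mae = sum(absolute_error(n, d, true_ppv) for n, d in responses) / len(responses)
print(f"MAE = {mae:.1f} percentage points")  # MAE = 10.1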

Results

Thirty-one female and 19 male clinicians completed the online study. The sample included 38 residents and 12 practising clinicians. On average, residents had completed 1.4 years of residency, and practising clinicians had completed 4.3 years of practice.

As a measure of task performance, the mean absolute estimation error (MAE) is reported. Lower MAE values indicate more accurate estimates.23 We chose the MAE over alternative performance measures (eg, the percentage of participants with responses close to the true value) because it provides fine-grained information about the distance between estimates and true values. Because the MAE does not distinguish between underestimation and overestimation, figure 2 additionally shows the mean raw PPV (panel A) and NPV (panel B) estimates for each experimental condition, as well as the true values. For PPV estimates, errors were larger in the numerical format (MAE=32.6%, 95% CI 26.8% to 38.4%) than in the experience format (MAE=15.9%, 95% CI 11.8% to 20.0%, d=0.697, P<0.001). As seen in figure 2A, the classic overestimation of the PPV was replicated when information was described numerically; in contrast, overestimation was reduced dramatically when information was experienced. For NPV estimates, the numerical format also produced larger errors (MAE=24.4%, 95% CI 14.5% to 34.3%) compared with the experience format (MAE=11.0%, 95% CI 6.5% to 15.5%, d=0.303, P=0.015), with less underestimation and reduced variability when information was experienced (figure 2B). For both PPV and NPV estimates, the effect of format was stable across the three tests (P=0.54). There was also no effect of presentation order of format (P=0.48) and no statistically significant difference between residents’ and qualified clinicians’ accuracy for either the PPV (P=0.35) or the NPV (P=0.80).

Figure 2

Mean PPV (A) and NPV (B) estimates for each format and test type. The X axis displays the experimental factors (format × test) and the Y axis displays mean estimate values. The grey bars represent mean estimates in the experience format. The black bars represent mean estimates in the numerical format. The red lines indicate the true PPVs and NPVs. Error bars represent SEs. PPV, positive predictive value; NPV, negative predictive value.

Discussion

Compared with a numerical format, an experience format in which simulated patient cases were viewed over time produced more accurate PPV and NPV estimates in clinicians. The format effect was replicated across three separate diagnostic tests, demonstrating the robustness of the effect across variations of the problem. Critically, the experience format reduced overestimation of the PPV. Trainees and fully licensed clinicians commonly commit errors when making Bayesian inferences; most notably, overestimation of the PPV5–10 can lead to a variety of negative consequences.11 12 The current study thus adds to a growing literature demonstrating that the format in which decision-relevant information is presented influences predictive value estimates.13–20 More specifically, the current data lend further support to the finding that experience formats boost diagnostic inference relative to numerical formats,21–25 and extend this finding to a clinician population.

Why does the ‘experience advantage’ occur? While the current study was not designed to address this question, there are several possible explanations. First, the experience format promotes an intuitive estimation strategy, requiring little in the way of statistical knowledge or active manipulation of numerical information. Second, the experience format presented participants with naturally occurring frequencies of the four possible diagnostic scenarios (ie, the absolute numbers of true positives, false positives, true negatives and false negatives). This is in contrast to the ‘normalised frequencies’ presented in the numerical format. For example, in the numerical format, participants learnt that the sensitivity of one of the tests was 83.33%. This number represents the relative frequency of true positive findings among those with the disease. In contrast, in the experience format, participants encountered five true positives and one false negative in the slideshow of 100 patients, and could subsequently derive subjective natural frequency values based on memory of the patient cases. While both formats convey the same statistical information, the experience format may produce superior predictive value estimates because of its use of naturally occurring frequencies.5 13 16–20 28–30 To what extent the strength of the experience format stems from the ‘slideshow’ method that encourages intuitive responses, or from the use of natural as opposed to normalised frequencies, remains to be addressed in future work.
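
The contrast between the two representations can be made explicit (the nine false positives below are an assumed value; the text above gives only the prevalence and sensitivity for this test). A normalised frequency states a conditional probability, \( \mathrm{sens} = P(\text{positive}\mid\text{disease}) = 83.33\% \), which must be combined with the prevalence and the false-positive rate through Bayes’ theorem. Natural frequencies instead follow the 100 patients through the outcome tree: 6 of 100 have the disease, 5 of those 6 test positive, and, say, 9 of the 94 without the disease also test positive, so

\[ \mathrm{PPV} = \frac{5}{5+9} \approx 36\%. \]

The natural-frequency route requires only counting and a single division, which may be one reason experienced cases are easier to integrate than normalised summaries.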

The current study has both strengths and limitations. A main strength is that we controlled for the potential confound of prior knowledge through the use of fictitious information. Previous research has investigated clinicians’ probability estimates for real diseases and tests.5 7 10 However, knowledge of medical statistics, such as disease prevalence or test sensitivity and specificity, may have influenced clinicians’ estimates in those studies. The results presented here therefore demonstrate the effect of format on clinicians’ estimate accuracy more cleanly. Another important strength is that participants were shown information for three separate diagnostic tests, varying in sensitivity and specificity, presented in both formats within subjects. The purpose of this design was to demonstrate the stability of the format effect across individuals, as well as across different versions of the problem (ie, for reliable and unreliable diagnostic tests that are subject to different types of errors, such as false alarms or misses). The findings illustrate the robustness of the format effect. An important limitation is that the sample comprised clinicians from a single discipline (family and community medicine) at a single hospital, restricting the generalisability of the results. A second limitation is that the study was conducted online, which may affect the ecological validity of the findings because the experimental setting cannot be fully controlled by the experimenters. For example, participants may have had different browser experiences or distractions in their physical environment. Future studies should test the effect of format on medical experts’ probability estimates in more controlled settings (eg, an in-lab environment).

The current study shows that exposure to simulated patient cases is an effective technique for enhancing experts’ predictive probability estimates without the need for statistical training. Importantly, the experience format significantly reduced the common error of overestimating the PPV relative to the numerical format; of note, the latter format is the one commonly used in medical education and in real patient cases.1–4 As discussed, more research is needed to shed light on the mechanisms underlying the experience advantage. In particular, it would be important to contrast the experience format with a numerical format in which decision-relevant information is presented in natural, rather than normalised, frequencies.28–30 Additional avenues for future research include studying the impact of experience formats on clinicians’ treatment decisions and other clinical outcomes across a variety of medical disciplines, and examining the viability of these formats for communicating test results to patients.

Acknowledgments

We thank Ryan Marinacci for his help with programming the online study, as well as Taehoon Lee and Anjli Bali for their help recruiting participants.


Footnotes

  • Contributors BA (study guarantor): study design, study programming, participant recruitment, data collection, data analysis and manuscript writing. JS: study design and manuscript writing. NP: study design, study funder, participant recruitment and manuscript writing. All authors had full access to all of the data (including statistical reports and tables) in the study and can take responsibility for the integrity of the data and the accuracy of the data analysis.

  • Funding This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors. NP funded the study and is supported by the Department of Family and Community Medicine of St Michael’s Hospital, the Department of Family and Community Medicine of the University of Toronto, an Early Researcher Award from the Ministry of Research and Innovation and the Physicians Services Incorporated Graham Farquharson Knowledge Translation Fellowship.

  • Disclaimer The funders had no role in the study.

  • Competing interests None declared.

  • Patient consent Not required.

  • Ethics approval Ethics approval to conduct the current study was obtained from both the St Michael’s Hospital Research Ethics Board (REB number: 16-282) and the Ryerson Ethics Board (REB number: 2014-129).

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement No additional data available.