

Factors predicting a change in diagnosis in patients hospitalised through the emergency room: a prospective observational study
  1. Stefanie C Hautz1,
  2. Luca Schuler2,
  3. Juliane E Kämmer3,4,
  4. Stefan K Schauber5,
  5. Meret E Ricklin2,
  6. Thomas C Sauter2,
  7. Volker Maier6,
  8. Tanja Birrenbach2,6,
  9. Aristomenis Exadaktylos2,
  10. Wolf E Hautz2
  1. Medical Faculty, Department of Evaluation and Assessment, Institute of Medical Education, University of Bern, Bern, Switzerland
  2. Department of Emergency Medicine, Inselspital, University Hospital Bern, Bern, Switzerland
  3. Center for Adaptive Rationality, Max Planck Institute for Human Development, Berlin, Germany
  4. AG Progress Test Medizin, Charité Medical School, Berlin, Germany
  5. Faculty of Educational Sciences, Centre for Educational Measurement, University of Oslo, Oslo, Norway
  6. Department of General Internal Medicine, Inselspital, Bern, Switzerland
  1. Correspondence to Dr Wolf E Hautz; wolf.hautz{at}


Introduction Emergency rooms (ERs) generally assign a preliminary diagnosis to patients, who are then hospitalised and may subsequently experience a change in their lead diagnosis (cDx). In ERs, the cDx rate varies from around 15% to more than 50%. Among the most frequent reasons for diagnostic errors are cognitive slips, which mostly result from faulty data synthesis. Furthermore, physicians have been repeatedly found to be poor self-assessors and to be overconfident in the quality of their diagnosis, which limits their ability to improve. Therefore, some of the clinically most relevant research questions concern how diagnostic decisions are made, what determines their quality and what can be done to improve them. Research that addresses these questions is, however, still rare. In particular, field studies that allow for generalising findings from controlled experimental settings are lacking. The ER, with its high throughput and its many simultaneous visits, is perfectly suited for the study of factors contributing to diagnostic error. With this study, we aim to identify factors that allow prediction of an ER's diagnostic performance. Knowledge of these factors as well as of their relative importance allows for the development of organisational, medical and educational strategies to improve the diagnostic performance of ERs.

Methods and analysis We will conduct a field study by collecting diagnostic decision data, physician confidence and a number of influencing factors in a real-world setting to model real-world diagnostic decisions and investigate the adequacy, validity and informativeness of physician confidence in these decisions. We will specifically collect data on patient, physician and encounter factors as predictors of the dependent variables. Statistical methods will include analysis of variance and a linear mixed-effects model.

Ethics and dissemination The Bern ethics committee approved the study under KEK Number 197/15. Results will be published in peer-reviewed scientific medical journals. Authorship will be determined according to ICMJE guidelines.

Trial registration number The study protocol Version 1.0 from 17 May 2015 is registered in the Inselspital Research Database Information System (IRDIS) and with the IRB (‘Kantonale Ethikkommission’) Bern under KEK Number 197/15.


Strengths and limitations of this study

  • This is a prospective observational study of all patients admitted to internal medicine through the emergency room.

  • It includes the collection of patient, physician and encounter factors to predict diagnostic quality.

  • It uses linear mixed-effects modelling of dependent variables.

  • The study is limited to a single centre.


Diagnostic errors contribute substantially to preventable medical errors.1 Emergency rooms (ERs) usually assign a preliminary diagnosis to patients who are subsequently hospitalised and whose lead diagnosis may change (cDx) between admission and discharge. This cDx often results from diagnostic errors made in the ER in the first place. Berner and Graber2 estimated the rate of diagnostic errors in contexts such as the ER to be around 15–30%, comparable to what others have found.3 A previous Swiss study even found that the diagnosis of patients presenting to the ER with non-specific symptoms changed in 64% of cases within the following 90 days.4

Among the most frequent reasons for diagnostic errors are cognitive slips, which mostly result from faulty data synthesis.5 Furthermore, physicians have been repeatedly found to be poor self-assessors and generally overconfident in the quality of their diagnosis,6 which limits their ability to improve. Therefore, some of the clinically most relevant research questions concern how diagnostic decisions are made, what determines their quality and what can be done to improve them.2 ,5 Studies addressing these research questions—especially field studies—are, however, still rare. We thus aim to conduct a single-centre observational field study by collecting diagnostic decision data, physician confidence and a number of influencing factors (see below) in an ER to model real-world diagnostic decisions and investigate the adequacy, validity and informativeness of physician confidence in these decisions.

Specifically, we will address the following research questions:

  1. What is the rate of change in diagnosis (cDx) in patients hospitalised in an internal medicine (IM) ward through the ER?

  2. How are diagnostic quality and physician confidence in a real-world setting affected by physician, patient and context factors?

  3. How are diagnostic quality and physician confidence related?

Because diagnostic error is hard to define, challenging to measure7 and perceived as highly judgemental, we chose to assess change in diagnosis instead. This change is a purely descriptive variable, but a necessary (although not sufficient) prerequisite for diagnostic error.

To further guide the study conception, instrument selection and interpretation of results, the project draws on three conceptual frameworks, namely, (1) a procedural model of clinical reasoning,8 (2) situated cognition theory9 and (3) a contemporary model of physician confidence.10 ,11 They are described in the following.

Clinical reasoning

Of the multiple reasons for diagnostic error, cognitive errors are among the most frequent.5 These cognitive errors can be distinguished by the point in the diagnostic process at which they occur:5 ,8

  1. Faulty data acquisition (such as the failure to obtain relevant information);

  2. Faulty data synthesis (as the physician integrates findings on a patient with his or her own knowledge).

Most cognitive errors occur during data synthesis,2 ,8 ,12 although most models of clinical reasoning13 regard data acquisition and synthesis as an iterative and circular rather than a linear process of reasoning.14

Situated cognition theory

It is currently largely unknown how different types of cognitive errors are triggered,15 and it is generally acknowledged that a general problem-solving skill cannot be taught or learnt.16 Rather than being rooted in the physician's abilities alone, however, cognitive errors seem to result from an interaction between a physician and (1) his or her patient and (2) the context. In fact, context specificity has been termed ‘the one truth in medical education’.17 Situated cognition theory has been used successfully to explain the recurrent finding of context specificity in experimentally controlled settings.9 According to situated cognition theory, decision quality is determined by three different factors: physician factors (eg, fatigue, experience), patient factors (eg, urgency) and context factors (eg, overall workload, time of day). It also incorporates interactions between these three categories, such as the interaction of physician and patient characteristics that results in language barriers. Situated cognition theory was thus selected for this study because it suggests a classification of measurable variables into the latent physician, patient and context factors and establishes a relationship between them.

Physician confidence

A current model of physician confidence has been proposed by Eva and Regehr:10 ,11 they distinguish between (1) global self-assessment, which stems more from self-conception than from an accurate integration of past performance, and (2) self-monitoring, the context-dependent, moment-by-moment reflection in action that amounts to situational confidence-regulating behaviour.11 As self-assessment has repeatedly been shown to match objective measures of performance poorly,6 we will focus on self-monitoring in order to assess the relationship between situational physician confidence and decision quality, physician, patient and encounter factors. Confidence (or lack thereof) prompts surgeons ‘to slow down when they should’,18 leads physicians to order further diagnostic tests in difficult cases19 and helps students to ‘know when to look it [sic] up’.20 Situational confidence thus seems to be a strong modulator of physicians' actions and therefore has the potential to limit the need for changes in a lead diagnosis. Previous experimental studies have found good evidence of adequate self-monitoring in high-performing individuals but not in low-performing individuals.10 ,21 However, little is known about individual differences in self-monitoring capabilities in medical personnel and the factors influencing them.11 ,22 We thus chose to use situational confidence as a second dependent variable of this study and aim to model it through the collected independent variables.


We will collect data in the ER of the Bern University Hospital for all patients aged 18 years and older admitted to any IM ward. Patient inclusion started on 15 August 2015 and is planned to last until 15 May 2016. The ER at Bern University Hospital is a self-contained, interdisciplinary unit that employs around 45 physicians and 120 nurses and sees more than 40 000 patients per year, of whom around 30% are admitted to the hospital.23

Sample size

A change in lead diagnosis from admission to discharge and the admitting physician's confidence at the time of admission are the primary dependent variables of this study. We will correlate them and aim to model them through all other collected data, using a multivariate analysis of variance and a linear mixed-effects model. Assuming α=0.05, a power of 85%, 8 independent predictor variables for the primary outcome (cDx), R=0.2 and a 15% dropout rate, the required sample size was estimated at 500 patients.
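The protocol does not state which procedure produced the figure of 500. A minimal sketch of one common choice, the F test of R² in a fixed-effects multiple regression as offered by tools such as G*Power, is shown below. It assumes that R=0.2 is the multiple correlation coefficient (so Cohen's f² = R²/(1 − R²)), that the noncentrality parameter is λ = f²·N, and that dropout is handled by inflating N to N/(1 − 0.15); all three are assumptions, and the helper names are hypothetical. Only the standard library is used, so the regularised incomplete beta function is computed with a textbook continued fraction.

```python
import math


def _betacf(a, b, x, max_iter=1000, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # the last (odd) step barely changed h
            break
    return h


def betainc_reg(a, b, x):
    """Regularised incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_cdf(f, d1, d2, nc=0.0):
    """CDF of the (noncentral) F distribution as a Poisson mixture of beta CDFs."""
    if f <= 0.0:
        return 0.0
    x = d1 * f / (d1 * f + d2)
    if nc == 0.0:
        return betainc_reg(d1 / 2, d2 / 2, x)
    half, total, j = nc / 2.0, 0.0, 0
    while True:
        weight = math.exp(-half + j * math.log(half) - math.lgamma(j + 1))
        total += weight * betainc_reg(d1 / 2 + j, d2 / 2, x)
        if j > half and weight < 1e-12:
            break
        j += 1
    return total


def f_critical(alpha, d1, d2):
    """Upper-alpha critical value of the central F distribution, by bisection."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, d1, d2) < 1.0 - alpha:
        hi *= 2.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        lo, hi = (mid, hi) if f_cdf(mid, d1, d2) < 1.0 - alpha else (lo, mid)
    return (lo + hi) / 2.0


def regression_power(n, predictors, r, alpha=0.05):
    """Power of the overall F test for R^2 with `predictors` predictors."""
    f2 = r ** 2 / (1.0 - r ** 2)  # Cohen's f^2 from the multiple correlation R
    d1, d2 = predictors, n - predictors - 1
    crit = f_critical(alpha, d1, d2)
    return 1.0 - f_cdf(crit, d1, d2, nc=f2 * n)


def required_sample(power=0.85, predictors=8, r=0.2, alpha=0.05, dropout=0.15):
    """Smallest N reaching the target power, and N inflated for dropout."""
    n = predictors + 2  # smallest N with a positive denominator df
    while regression_power(n, predictors, r, alpha) < power:
        n += 1
    return n, math.ceil(n / (1.0 - dropout))


n, n_with_dropout = required_sample()
print(f"N without dropout: {n}; N allowing for 15% dropout: {n_with_dropout}")
```

Because the result depends on these assumptions (reading 0.2 as R² instead of R, for example, changes the required N considerably), the sketch is best treated as a plausibility check rather than a reconstruction of the authors' calculation.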


Included are patients who are ≥18 years old, received an IM lead diagnosis at the ER and are hospitalised in an IM ward. Excluded are patients who do not fulfil the inclusion criteria, are hospitalised for palliative care or have IM physicians involved as consultants only. Physicians included are all ER physicians who voluntarily participate in the study. Physicians receive 10 CHF (Swiss francs) for each questionnaire (see below) they complete.


In preparation for the main study, a researcher (LS) retrospectively assessed the cDx rate for all patients admitted to the IM via the ER between 1 January and 31 March 2015. This prestudy aimed to assess the quality of the available data from the hospital's electronic patient documentation systems and the number of patients hospitalised in the IM via the ER. Based on these data, the power calculations for the main study were conducted and the expected duration of the data collection phase was estimated. Also based on the prestudy data, four independent medical experts developed and pretested a schema to assess the cDx rate, that is, whether or not the lead diagnosis made at the ER had changed (see table 1). To reduce bias, two raters then independently applied the schema to the diagnoses of 90 randomly selected patients. Whenever one rater coded a diagnosis as ‘not classifiable’ or the two ratings disagreed, the final classification was reached through discussion among all four expert raters.

Table 1

Schema to classify a pair of diagnoses from ER (admission) and IM (discharge)

We chose this framework to classify the primary dependent variable (change in diagnosis) over existing classification schemes for diagnostic error for several reasons:

  1. By classifying change in diagnosis, we decided on a descriptive and non-judgemental framework, because there is no single established definition of diagnostic error and the IM's discharge diagnosis is not necessarily more accurate than the ER's admission diagnosis.4 Labelling a differing admission diagnosis as erroneous may thus not be justified. Furthermore, previous studies have demonstrated only a small overlap between errors during the diagnostic process and patient harm.24 We would argue that a change in lead diagnosis is a clinically more relevant outcome parameter than any definition of diagnostic error: when a patient who initially requires hospital admission improves and is discharged with an identical diagnosis, treatment was most likely adequate. When, however, a change in diagnosis occurs during hospitalisation, the patient is at risk of missed, delayed or inadequate initial treatment. Thus, studying the factors that predict a change in diagnosis should identify clinically relevant factors.

  2. Assessing diagnostic error may be influenced by factors such as the evolving nature of a disease, hindsight bias in the rating and the diagnosing clinician's desire to balance the risks of underdiagnosis and overdiagnosis (see ref 7 for a detailed discussion of the challenges in defining and measuring diagnostic error). The schema used here to classify a change in diagnosis (see table 1) accounts for these factors.

Main study procedure

For the main study, we will collect data in the ER of the Bern University Hospital for all patients admitted to the IM within a period of approximately 9 months. According to preliminary data from the prestudy, which are consistent with the literature,2 about 10–15% of lead diagnoses are expected to change within the week after admission. To gain deeper insight into the reasons for these changes, we will collect data on patient, physician and encounter factors as predictors of the dependent variables at three time points (see figure 1):

  1. All physicians working at the ER and participating in the study will be asked once, at the time of inclusion, to answer demographic questions (physician factors that remain stable across all admission decisions) comprising age, gender, and years and type of training. We will further record each physician's performance in a cognitive reflection test25 to assess their tendency to follow intuitive judgements (all questionnaires available on request).

  2. At admittance, the lead diagnosis and physician's confidence will be recorded. Further, the following patient, physician and encounter factors will be measured: patient age and gender, triage category, date and time, language barrier, subjective and objective (see below) physician workload, and physician confidence in his or her diagnosis together with the diagnosis made. Data will be collected from (1) physician questionnaires and (2) the hospital's electronic patient documentation systems.

  3. Twenty-eight days after admission, each admitted patient's current lead diagnosis from the IM or, if the patient has already been discharged, his or her primary discharge diagnosis from the IM will be recorded from the hospital's electronic patient documentation systems. A follow-up period of 28 days was chosen in line with the periods commonly used to assess morbidity and mortality in studies in emergency and intensive care medicine.26

Figure 1

Study design. ER, emergency room; eCRF, electronic case report form; IM, internal medicine; IRB, Institutional Review Board.

Furthermore, we will continuously collect data on the noise level in the ER with a noise level logger and on overall workload, by collecting the number of patients and the triage category, waiting time, total time in the ER and total time to hospitalisation for each patient treated in the ER during the study period. For a detailed description of all data collected, their respective source and scaling, and how these variables might be aggregated to factors, please see online supplementary additional file 1.

Data sets to be analysed

Data will be entered into a web-based database that fulfils the requirements of the Swiss Human Research Act. The databases REDCap and SharePoint are provided by CTU Bern (clinical trials unit). For statistical analyses, coded data will be exported. The code book that links each patient to his or her code will be kept strictly under lock and key by the ER's study administrator (MER).

Data processing

In total, we will collect two dependent variables (cDx and admitting physician’s confidence) and 23 predictor variables during the study (see online supplementary additional file 1). The cDx rate will be determined using the schema developed in the prestudy (see table 1).

First, we will examine correlations among the predictor variables and aggregate highly correlating variables that are linkable on a theoretical basis. For example, we expect a high correlation between (1) a physician's age, (2) their hierarchical position, (3) their total medical experience and (4) their years of experience in emergency medicine (constituting the factor ‘experience’; see last column of online supplementary additional file 1). We further expect a high correlation between (1) a physician's subjective workload, (2) their fatigue, (3) their perceived difficulty with a diagnosis and (4) their self-reported experience with this diagnosis (constituting the factor ‘working impairment’). We will calculate indices based on Z-standardised raw data for these variables.
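A minimal sketch of the index construction described above, assuming that each index is the unweighted mean of the Z-scores of its constituent variables (the protocol does not say whether scores are averaged, summed or weighted) and using the sample standard deviation. The physician data and the ordinal coding of hierarchical position are hypothetical. Any variable that points in the opposite direction to its factor would need to be reverse-coded before averaging.

```python
from statistics import mean, stdev


def z_scores(values):
    """Z-standardise raw values using the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def composite_index(*variables):
    """Unweighted mean of the Z-scores of several variables, per physician.

    Each argument is a list with one raw value per physician, in the same order.
    """
    standardised = [z_scores(v) for v in variables]
    return [mean(per_physician) for per_physician in zip(*standardised)]


# Hypothetical raw data for four physicians (illustration only).
age = [29, 34, 41, 52]
years_medical = [2, 7, 14, 25]
years_emergency = [1, 3, 8, 15]
hierarchy = [1, 2, 3, 4]  # assumed ordinal coding, eg resident = 1 ... head = 4

experience = composite_index(age, years_medical, years_emergency, hierarchy)
print([round(e, 2) for e in experience])
```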

The number of patients per triage category, the number of attending physicians, the number of treatment bays in the ER and the number of admitted patients in the ER at any one time will be used to calculate an ER overcrowding score according to the EDWIN model.27 For the noise level, which is recorded continuously, we will calculate the frequency and duration of peaks (defined as all noise above 65 dB(A)) within 2-hour blocks and use these aggregated data as a context factor in subsequent analyses.
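Both aggregations in the paragraph above can be sketched in a few lines. The EDWIN score is commonly written as Σ n_i·t_i / (N_a·(B_T − B_A)), with n_i the number of patients in triage category i, t_i the weight of that category, N_a the number of attending physicians, B_T the number of treatment bays and B_A the number of admitted patients still in the ER; the mapping from the local triage scale to t_i is left to the caller and should be checked against the original EDWIN publication.27 For noise, the protocol defines peaks only as noise above 65 dB(A), so the sketch assumes a logger with a fixed sampling interval and treats each unbroken run of samples above the threshold as one peak. Function names are hypothetical.

```python
from collections import defaultdict


def edwin(patients_per_weight, attending_physicians, treatment_bays, admitted_in_er):
    """EDWIN-style overcrowding score: sum(n_i * t_i) / (N_a * (B_T - B_A)).

    patients_per_weight maps a triage weight t_i to the number n_i of patients
    currently in the ER with that weight. How local triage categories map onto
    t_i is an assumption the caller has to make explicit.
    """
    free_bays = treatment_bays - admitted_in_er
    if attending_physicians <= 0 or free_bays <= 0:
        raise ValueError("score undefined: no attending physician or no free bays")
    weighted_load = sum(t * n for t, n in patients_per_weight.items())
    return weighted_load / (attending_physicians * free_bays)


def noise_peaks(samples, threshold_db=65.0, block_seconds=2 * 3600, interval_s=1.0):
    """Number of peaks above threshold_db and seconds above it, per 2-h block.

    samples: time-ordered (seconds_since_start, level_dBA) pairs taken every
    interval_s seconds (an assumed logger setting). A peak is an unbroken run
    of samples above the threshold; a run crossing a block boundary is counted
    once in each block. Blocks without any sample above the threshold are
    omitted from the result.
    """
    stats = defaultdict(lambda: {"peaks": 0, "seconds_above": 0.0})
    prev_above, prev_block = False, None
    for t, level in samples:
        block = int(t // block_seconds)
        above = level > threshold_db
        if above:
            stats[block]["seconds_above"] += interval_s
            if not prev_above or block != prev_block:
                stats[block]["peaks"] += 1
        prev_above, prev_block = above, block
    return dict(stats)


# Hypothetical snapshot: 2 patients with weight 1 and 3 with weight 2,
# 2 attending physicians, 10 bays, 4 admitted patients still in the ER.
print(edwin({1: 2, 2: 3}, attending_physicians=2, treatment_bays=10, admitted_in_er=4))
print(noise_peaks([(0, 60), (1, 70), (2, 71), (3, 60), (4, 66), (7200, 70)]))
```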

For all remaining variables (marked as ‘?’ in the online appendix SI_variables_and_sources.xls), we will conduct a principal component analysis and use the resulting factor scores to calculate summary indices. This procedure will be used to combine variables into additional physician, context and patient factors.
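The principal component step could look roughly like the following sketch, which extracts only the first component of the correlation matrix by power iteration and returns each observation's score on it. It is written in plain Python for illustration only; the protocol names SPSS, MATLAB and R for the actual analyses and does not specify how many components will be retained or whether a rotation will be applied. The sign of a principal component is arbitrary, so scores may need to be flipped to point in an interpretable direction. The example variables are hypothetical.

```python
import math
from statistics import mean, stdev


def standardise(column):
    """Z-standardise one variable (sample standard deviation)."""
    m, s = mean(column), stdev(column)
    return [(v - m) / s for v in column]


def first_component_scores(columns, iterations=500):
    """Eigenvalue, weights and scores of the first principal component.

    columns: list of variables, each a list with one value per observation.
    Power iteration on the correlation matrix extracts the leading component only.
    """
    z = [standardise(c) for c in columns]
    k, n = len(z), len(z[0])
    # Correlation matrix of the standardised variables (divisor n - 1).
    corr = [[sum(a * b for a, b in zip(z[i], z[j])) / (n - 1) for j in range(k)]
            for i in range(k)]
    vec = [1.0 / (i + 1) for i in range(k)]  # uneven start avoids symmetric traps
    for _ in range(iterations):
        new = [sum(corr[i][j] * vec[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in new))
        vec = [x / norm for x in new]
    eigenvalue = sum(vec[i] * sum(corr[i][j] * vec[j] for j in range(k))
                     for i in range(k))
    scores = [sum(vec[i] * z[i][obs] for i in range(k)) for obs in range(n)]
    return eigenvalue, vec, scores


# Hypothetical context variables over five 2-h blocks (illustration only).
waiting_time = [12, 25, 31, 18, 40]
time_in_er = [95, 140, 160, 120, 190]
time_to_ward = [150, 210, 260, 180, 300]
eigenvalue, weights, scores = first_component_scores([waiting_time, time_in_er, time_to_ward])
print(round(eigenvalue, 2), [round(s, 2) for s in scores])
```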

Last, we will compute the cognitive reflection test-score (CRT-score) for each physician by counting the number of correct answers to the questions from the CRT.25

Handling of missing data

Data will be screened for missing values. This will include exploration of patterns of missing values dependent on design components such as physicians and time of day. Missing data will be handled either through multiple imputation or through full information maximum likelihood estimation, depending on the characteristics of the variable and the pattern of missing values. Drop-out rate and missing data rate will be reported. Furthermore, the duration of data collection will be kept flexible and will last until the target number of included patients is reached.
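The protocol leaves the exact screening procedure open. A first descriptive step could be to tabulate the proportion of missing values for a variable by design component, as in the sketch below; the record layout (one dict per encounter, with keys such as ‘physician’, ‘shift’ and ‘confidence’) is hypothetical.

```python
from collections import defaultdict


def missing_rate_by_group(records, variable, group_key):
    """Proportion of records with a missing value for `variable`, per group.

    records: one dict per patient encounter. A value of None or an absent key
    counts as missing.
    """
    totals, missing = defaultdict(int), defaultdict(int)
    for rec in records:
        group = rec.get(group_key)
        totals[group] += 1
        if rec.get(variable) is None:
            missing[group] += 1
    return {group: missing[group] / totals[group] for group in totals}


# Hypothetical encounters (illustration only).
encounters = [
    {"physician": "A", "shift": "day", "confidence": 0.8},
    {"physician": "A", "shift": "night", "confidence": None},
    {"physician": "B", "shift": "night"},
    {"physician": "B", "shift": "day", "confidence": 0.6},
]
print(missing_rate_by_group(encounters, "confidence", "physician"))
print(missing_rate_by_group(encounters, "confidence", "shift"))
```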

Planned analyses

Data will be analysed in SPSS, MATLAB and R statistical software. Rater agreement in the classification of IM and ER diagnoses will be reported as the κ coefficient.
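For two raters, the usual choice is Cohen's κ, which compares the observed agreement p_o with the agreement p_e expected from each rater's marginal category frequencies: κ = (p_o − p_e)/(1 − p_e). The protocol only says ‘κ coefficient’, so Cohen's version is an assumption here, and the category labels in the example are hypothetical stand-ins for the classes of table 1.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each assign one category per item."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same, non-empty set of items")
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from the raters' marginal category frequencies.
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical ratings of four ER/IM diagnosis pairs (illustration only).
rater_1 = ["no change", "no change", "change", "change"]
rater_2 = ["no change", "change", "change", "change"]
print(cohens_kappa(rater_1, rater_2))
```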

To assess selection bias, we will report how participating physicians differ from all physicians in the department regarding publicly available variables. These variables include gender, board certification, discipline of board certification and current position. We will further report how included patients differ from all patients of the IM in age, gender and triage category, although selection bias for patients is unlikely because we will include all patients admitted to IM with only minimal exclusion criteria.

For research question 1, descriptive statistics (frequency, frequency per patient subgroup based on gender and age, frequency by diagnostic subgroup) will be reported.

For research question 2, we will fit a generalised linear mixed-effects model in R. In this approach, we will model the outcome (change in diagnosis/no change) as a Bernoulli-distributed variable. The linear component of the model comprises the covariates, which will be linked to the outcome variable using a logit link function. The random part of the model will include the logistic error term and a random effect for the physician identifier with a mean of 0 and unknown variance.
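Written out, the model described above could take the following form. This is a sketch under assumptions: i indexes patients, j indexes admitting physicians, x_ij is the vector of patient, physician and encounter covariates, and the physician effect is taken to be a normally distributed random intercept, a common default that the protocol does not state explicitly.

```latex
\begin{aligned}
\mathrm{cDx}_{ij} \mid u_j &\sim \operatorname{Bernoulli}(p_{ij}),\\
\operatorname{logit}(p_{ij}) = \log\frac{p_{ij}}{1 - p_{ij}} &= \beta_0 + \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_j,\\
u_j &\sim \mathcal{N}\left(0, \sigma_u^2\right).
\end{aligned}
```

In the latent-variable reading of this model, the logistic error term has a fixed variance of π²/3, so the share of variation attributable to physicians can be summarised as σ_u²/(σ_u² + π²/3).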

Research question 3 concerns how well physician confidence is calibrated. We will use methods from cognitive psychology28 to calculate the level and direction of (mis)calibration. Specifically, we will assess how well accuracy (ie, change in diagnosis/no change) is aligned with confidence, using a calibration index that ranges from 0 (best possible calibration) to 1 (worst possible calibration). Calibration will be calculated as the weighted mean of the squared difference between confidence and proportion correct for each confidence level.28 To capture the direction and magnitude of miscalibration, we will calculate the over–under index (O–U index), ranging from −1 (highest possible level of underconfidence) to +1 (highest possible level of overconfidence), as the difference between confidence and accuracy.28
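The two calibration measures can be sketched as follows, assuming that confidence is recorded on (or rescaled to) a 0–1 scale with a small number of discrete levels and that an unchanged lead diagnosis counts as ‘correct’; both are assumptions about the coding, and the function name is hypothetical. The calibration index is Σ n_t (c_t − o_t)² / N, where n_t is the number of diagnoses made at confidence level c_t and o_t the proportion correct at that level; the O–U index is mean confidence minus proportion correct.

```python
from collections import defaultdict


def calibration_indices(confidence, correct):
    """Calibration index and over-under (O-U) index.

    confidence: ratings on a 0-1 scale, one per admission (assumed coding).
    correct: 1 if the ER lead diagnosis was unchanged at follow-up, else 0.
    """
    if len(confidence) != len(correct) or not confidence:
        raise ValueError("need one outcome per confidence rating")
    n = len(confidence)
    by_level = defaultdict(list)
    for level, outcome in zip(confidence, correct):
        by_level[level].append(outcome)
    # Weighted mean squared gap between each confidence level and the
    # proportion correct at that level: 0 = perfect calibration.
    calibration = sum(
        len(outs) * (level - sum(outs) / len(outs)) ** 2
        for level, outs in by_level.items()
    ) / n
    # Positive values = overconfidence, negative values = underconfidence.
    over_under = sum(confidence) / n - sum(correct) / n
    return calibration, over_under


# Hypothetical data for four admissions (illustration only).
print(calibration_indices([0.9, 0.9, 0.5, 0.5], [1, 0, 1, 1]))
```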

Results of the prestudy

During the prestudy period, 186 patients were admitted from the ER to the IM. The prestudy results thus indicate that we should be able to include, on average, 2 patients admitted to the IM through the ER per day, which corresponds to a study period of 8.3 months to include 500 patients.
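The prestudy figures can be checked with a few lines of arithmetic. The sketch assumes that the count of 186 covers the full 1 January to 31 March 2015 window and that a month is taken as 30 days, which reproduces the 8.3-month estimate; using the unrounded daily rate gives a slightly shorter period.

```python
import math
from datetime import date

# Prestudy window: 1 January to 31 March 2015, both days included.
prestudy_days = (date(2015, 3, 31) - date(2015, 1, 1)).days + 1
admissions = 186
daily_rate = admissions / prestudy_days  # unrounded admissions per day

target = 500
rounded_rate = 2  # the protocol works with about 2 patients per day
days_needed = math.ceil(target / rounded_rate)
months_needed = days_needed / 30  # assumes 30-day months

# The same estimate with the unrounded daily rate, for comparison.
months_unrounded = target / daily_rate / 30

print(f"{daily_rate:.2f} patients/day; {days_needed} days = {months_needed:.1f} months")
print(f"with the unrounded rate: {months_unrounded:.1f} months")
```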

Of the 90 patients randomly selected from the prestudy to assess the cDx rate, four did not fulfil the main study's inclusion criteria and were excluded. One patient's diagnoses (1.2%) were rated as not classifiable by at least one rater; following discussion with the other raters, these diagnoses were classified as ‘change in diagnosis’. Rater agreement was good (κ=0.61). Table 2 lists how often pairs of ER/IM diagnoses were assigned to each category after rater agreement.

Table 2

Classification of the ER and IM diagnoses of 90 randomly selected patients from the prestudy

Ethics and dissemination

This is a prospective observational study with no interventions performed. Patient data are collected during usual care at the ER and the IM. No additional patient data are recorded for this study. Physicians participate on a voluntary basis. Participant anonymity for both patients and physicians will be respected at all times through anonymisation of physician and patient data. According to Swiss research and data privacy laws, no informed consent is required for such studies. Any protocol amendments will be submitted to the Bern ethics committee for approval before implementation. Termination of the study will be reported to the same committee.

Publication and dissemination of results

Results will be presented to participating physicians within the Departments of Emergency Medicine and Internal Medicine at the University Hospital Bern, and at scientific meetings. Results will be published in peer-reviewed scientific medical journals and authorship will be determined according to ICMJE guidelines.


The authors would like to acknowledge Dr Laura Zwaan, Institute of Medical Education Research Rotterdam, The Netherlands, and Dr Robert El-Kareh, University of California at San Diego, La Jolla, USA, for their valuable and constructive critique of an earlier version of the manuscript.




  • Contributors LS collected the data for the prestudy. JEK and SKS analysed the data. SCH wrote the manuscript. All the authors designed the study, revised the manuscript critically and gave their approval of the final version of the manuscript, and agreed to be accountable for all aspects of the published work.

  • Funding The study is funded through internal research funds of the department of emergency medicine, Inselspital University Hospital Bern.

  • Competing interests None declared.

  • Ethics approval The Bern ethics committee registered the study as a quality evaluation study under Kantonale Ethikkommission Bern number 197/15 and waived the need for informed consent.

  • Provenance and peer review Not commissioned; externally peer reviewed.
