BACKGROUND

As graduate medical education shifts to a competency-based medical education model, meaningful assessment becomes of critical importance.1 The “Next Accreditation System” of the Accreditation Council for Graduate Medical Education (ACGME) relies on frequent, criterion-based, authentic assessment of residents to inform judgments about resident progress. Ensuring meaningful assessment requires surveying for threats to the integrity of resident assessment.

One concern garnering attention is unconscious, or implicit, gender bias. Implicit gender bias refers to the way that culturally established gender roles and beliefs affect our perceptions and actions without conscious intention.2 Manifestations of gender bias among practicing physicians include differences in patient referral patterns, compensation, and career advancement.3,4,5 Evidence suggests that gender bias affects faculty assessment of medical student learners. Studies of Medical Student Performance Evaluations (MSPEs) found gender-based differences in the traits ascribed to students, with female students more frequently described using communal traits such as compassionate, caring, or empathetic.6, 7

Of concern in graduate medical education is whether and how gender bias affects assessment. Valid and meaningful assessment has important implications for both training programs and resident learners. Notably, training programs utilize assessments to determine resident progress, advancement, and competency.

To explore this, we reviewed the evidence for gender bias in resident assessment within graduate medical education.

METHODS

A comprehensive literature review was performed to capture relevant primary studies for inclusion in this review. Figure 1 details the search strategy, which followed PRISMA guidelines. First, independent scoping searches were performed by two reviewers and a medical librarian to explore the types of available evidence, identify gaps in the literature, and inform our search strategy.

Figure 1 Search strategy for literature review.

A comprehensive search of the Medline and Embase databases was conducted in March 2018 and September 2018. From our scoping search, we found that focusing search terms on gender, as opposed to assessment, was more effective in capturing relevant articles. The search incorporated three main themes, Gender, Gender Bias, and Graduate Medical Education, using established MeSH terms. Gender was captured using the terms 'female', 'male', and 'gender.' Gender bias was captured using the terms 'gender bias', 'sexism', and 'prejudice.' Graduate medical education was captured using the terms 'internship and residency', 'graduate medical education', 'postgraduate medical education', and 'residency.' The Boolean operator 'AND' was used to combine themes, and search fields included title, abstract, and keywords. Limits included journal articles, publication dates between 1998 and 2018, and English language. Finally, a manual search of the references and citations of captured articles was performed, and potentially relevant articles were included in the review.
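For illustration, the combined query can be represented schematically as follows (the grouping of terms within each theme with 'OR' and the exact field tags are assumptions for illustration; precise syntax differed between Medline and Embase):

('female' OR 'male' OR 'gender') AND ('gender bias' OR 'sexism' OR 'prejudice') AND ('internship and residency' OR 'graduate medical education' OR 'postgraduate medical education' OR 'residency'), searched in title, abstract, and keyword fields and limited to English-language journal articles published 1998 to 2018.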

A tiered review process was devised. After initial capture, two reviewers independently and blindly reviewed titles and abstracts to identify articles for in-depth review using the cloud-based platform Rayyan QCRI (http://rayyan.qcri.org). From this initial screen, selected articles were retrieved and their full text reviewed to determine inclusion. Interrater agreement was 92.0% for title and abstract review and 95.8% for full text review.

We included studies that examined the impact of gender on resident assessment in graduate medical education as a primary outcome. For the purposes of this study, we defined assessment to include measures of resident performance used to inform determinations of resident competency, progress, and advancement. This included formal assessments of resident performance using structured assessment tools as well as qualitative comments and feedback. To this end, we excluded studies that presented gender data while assessing the validity of tests or indices and studies of gender-based differences in the capacity to learn or perform tasks such as surgical procedures. We excluded articles that did not represent original research, such as reviews or commentaries; articles for which full text in English was not available, such as conference abstract reports; and studies that occurred outside of the USA or Canada.

Data were extracted from full-text articles by one author and verified by the review team. Data included participants, training setting, outcome measures, and findings. Strengths and limitations of studies were assessed individually and in aggregate. Quality characteristics of studies included sampling, assessment tools, study design, and analytic methods. Disagreements were resolved by discussion and consensus of the team.

RESULTS

The structured search strategy yielded nine unique studies meeting inclusion criteria.8,9,10,11,12,13,14,15,16 Table 1 details extracted data and Table 2 details quality characteristics of included studies based on guidelines for qualitative and quantitative studies.23, 24 Heterogeneity in methods and outcome measures across studies precluded meta-analysis and comparison of study quality via established indices.24

Table 1 Studies of Gender Bias in Resident Assessment in Graduate Medical Education
Table 2 Quality Characteristics of Studies of Gender Bias in Resident Assessment in Graduate Medical Education

Data included 38,342 resident performance rating scores and 10,394 instances of narrative feedback for 1209 residents by 1287 faculty. Settings included family practice (FP),15, 16 emergency medicine (EM),8, 14 obstetrics and gynecology (OB/GYN),13 and internal medicine (IM) training programs.9 Methodologies included quantitative,8,9,10,11,12 qualitative,14,15,16 and mixed methods approaches.13 Eight of nine studies utilized resident assessments derived from direct observation in real-world practice,8,9,10,11, 13,14,15,16 while one study examined gender bias in a controlled setting using standardized encounters.12 Assessment tools included Milestone-based frameworks,8, 14 standardized tools developed by specialty groups,9, 10, 12, 13 and institutional assessment tools.11, 16

RESIDENT PERFORMANCE METRICS

Four studies investigated gender bias utilizing faculty rating scores of resident performance. The largest study examined the influence of gender on resident assessment using data from 33,456 resident evaluations from eight EM training programs using an ACGME Milestone-based assessment framework.8 Data were gathered using a real-time online collection tool, and faculty elected which residents to assess and when. Dayal et al. found that while there was no difference in Milestone level ascribed to male and female residents at the start of training, by postgraduate year (PGY) 3, faculty ascribed higher levels in all 23 Milestones to male residents compared with female residents. Male residents had a significantly higher rate of Milestone attainment than female residents (12.7% higher, or 0.07 Milestone levels per year), so that by the end of training, the discrepancy in Milestone level attained between male and female residents was equivalent to 3 to 4 months of additional training. There was no statistically significant difference in Milestone level by faculty gender or by gender of the resident-faculty dyad.

Two smaller studies examined the impact of gender on resident assessment in IM training programs using an American Board of Internal Medicine (ABIM) assessment tool.9, 10 A study of 974 inpatient evaluations of IM residents found that scores of male residents were significantly higher in six of nine domains compared with female residents and that male faculty rated male residents significantly higher than female faculty did in all nine assessed domains.10 A later study of 262 IM resident evaluations from inpatient medicine rotations assessed the influence of faculty and resident gender pairings. While the female resident-male faculty dyad trended toward lower clinical performance scores (7% lower, P = 0.07), there was no significant influence attributable to gender pairing of resident and faculty.9

In a subspecialty setting, an analysis of 1100 evaluations of residents rotating on a digestive disease service at one institution found no significant difference in resident performance rating scores attributable to gender.11 Interestingly, gender pairing was a significant factor in residents' assessments of faculty.

STANDARDIZED ENCOUNTERS

Holmboe et al. evaluated the impact of gender bias using standardized encounters as part of a larger faculty development intervention.12 Forty IM faculty viewed and scored scripted encounters representing residents' history taking, physical exam, and counseling skills at variable competency levels. In these encounters, male residents depicted the history taking and physical exam skills, and a female resident depicted the counseling skill. While ratings of the vignette featuring the female resident were lower than scores for the other vignettes, there was no statistically significant difference in ratings attributable to faculty gender.

MULTISOURCE FEEDBACK

Galvin et al. examined the impact of gender on resident assessments by nursing staff collected as part of a 360° feedback system for a community-based OB/GYN residency training program.13 This mixed methods study analyzed 2202 rating scores and 420 narrative comments and found that female residents received significantly lower scores than male residents (1.5 vs 1.7, scale 0 to 2, P = 0.001) in the PGY2 year, which involved more opportunities to interact with nursing staff. Female interns received fewer positive comments and more negative agentic comments than male interns (17.3% vs 40% and 17.3% vs 3.3%, respectively, P = 0.04).

QUALITATIVE COMMENTS

Following the work of Dayal et al., a qualitative study examined 1317 narrative comments included in PGY3 EM resident evaluations.14 Subgroup analysis of 35 residents with multiple evaluations found that female residents received more discordant feedback across faculty than male residents, particularly regarding autonomy, assertiveness, and receptiveness to oversight.

Two studies analyzed the influence of gender on resident narrative comments in FP training programs.15, 16 Loeppky et al. analyzed the feedback faculty provide to residents in a Canadian FP training program using FieldNotes, real-time written feedback based on direct observation.15 Analysis of 7316 FieldNote comments found that female faculty provided more feedback and female residents received more feedback than their male counterparts. The female resident-male faculty dyad had the highest proportion of communal adjectives and the lowest proportion of agentic adjectives. Ringdahl et al. analyzed 1341 narrative comments from inpatient evaluations of FP interns by faculty and senior residents and found no significant difference in valence or content by faculty gender.16

DISCUSSION

Our review examined the potential for and impact of gender bias on resident assessment in graduate medical education. From this, we draw three key points. First, gender bias poses a potential threat to the integrity of resident assessment in graduate medical education. Five of nine reviewed studies reported a difference in outcomes attributed to gender, including significant differences in resident performance metrics8, 10, 13 and in narrative assessment,13,14,15 such as the consistency and valence of feedback and the traits ascribed to residents.

The strength of the evidence demonstrating an impact argues in favor of a potential for gender bias in resident assessment. Studies that showed significant differences in resident performance metrics analyzed larger numbers of evaluations and utilized established assessment tools.8, 10, 13 The most robust and persuasive evidence comes from the study by Dayal et al., which demonstrated a significant difference in Milestone attainment by resident gender.8 The Milestone framework is the strongest measure of resident assessment across studies, as the development and validation of the 23 EM Milestones are robust and well defined.18, 19 The large number of data points, the multi-institutional design, and the use of the Milestone framework add to the robustness of the findings.

The complicated nature of implicit gender bias makes it challenging to capture. Studies assessing for gender bias must consider a variety of potential manifestations and employ appropriate methodologies and analytic models. Studies that found no difference in resident assessment attributable to gender were limited by fewer participants,9, 12, 16 a low proportion of female participants,9, 11 or limited qualitative analysis.16 Failing to detect a difference in outcomes may reflect a failure to capture gender bias rather than confirmation of no bias. In short, negative results are insufficient to rule out gender bias. Additionally, the critical importance of valid resident assessment in competency-based medical education raises the stakes on the assessment process. The argument may be made that any credible evidence of implicit gender bias in resident assessment is sufficient to warrant concern and vigilance.

The second key point drawn from our review is the complexity of how gender bias manifests in resident assessment. Manifestations and patterns of gender bias were not uniform across the studies reviewed. While differences were noted by gender of resident,8, 10, 13,14,15 evidence of an influence of faculty gender or of resident-faculty gender pairings was limited.10, 15

Gender bias is multifaceted and may arise when gender-based normative behaviors and expectations misalign with professional roles and behaviors.25 It may emerge in specific contexts, such as when performing professional roles as a leader or manager or when working with others within a team.26,27,28,29 Providing some context, qualitative studies suggest that expectations of interpersonal dynamics and issues of power may be at play. Female residents were more often assessed using communal or warmth-based descriptors and less often using agentic or competency-related descriptors.13, 15 A study of qualitative comments by Mueller et al. found that senior female residents more often received inconsistent feedback across faculty regarding traits of autonomy, assertiveness, and receptiveness to oversight.14 Analysis of nursing narrative assessments indicated that female interns were more susceptible to bias in assessment, suggesting that female interns must contend with both gender roles and issues of power during training and may be expected to be "overly communal" to be effective.13

The third key point drawn from the existing evidence is the need for further exploration and study focused on gender bias in graduate medical education, including the factors that underlie gender bias, its impact on learners, and interventions to address it. While the evidence indicates that gender bias is a factor in resident assessment in graduate medical education, a key question remains: where does the bias originate? Does the gender difference in resident assessment arise from an issue with the assessment tool, the learners, or the faculty evaluators? Evidence of gender bias was reported using assessment tools of variable strength, including the robust EM Milestone framework,18, 19 suggesting that gender-based differences in outcomes were not a product of a particular assessment tool.

As some have suggested, the gender-based difference in outcomes may be due to differences in performance between male and female residents. Female residents may be operating under strain when their professional role requires them to act counter to gender-based normative behaviors. A study of IM residents' experiences with cardiopulmonary resuscitation found that female residents reported that the role of code leader required them to violate gender behavioral norms and created tension related to competing expectations.30 Conflict between professional role and gender-normative behaviors may explain a disparity in performance in contexts where the clash between gender role and role as resident is heightened, such as directing care in the emergency department, in the labor and delivery unit, or on an inpatient ward team.8, 10, 13, 14

Alternatively, gender-based differences in outcomes may be due to disparities in how faculty assess resident performance. Faculty operate within the same gender climate as their learners, and their experiences navigating this climate may influence their assessment of learners. A survey of clinician educators at a Swedish academic center reported that female faculty more frequently cited the gender of both faculty and learner as an important factor in their teaching.31 A faculty development intervention aimed at addressing gender bias led to a significant increase in faculty awareness of personal bias and self-efficacy to promote gender equity.32

Another important topic warranting further study is the impact of gender bias on learners, as evidence of long-term impact on trainees is lacking. One study of the impact of gender bias on learners in primary and secondary education found that early exposure to gender bias influenced learners' later achievements and had implications for career and earning potential.33 We postulate that discordant and nonspecific feedback may represent a lost opportunity to assess skills and deficiencies and may undercut female residents' training experience. Because resident assessments are used to inform progress through training,1 gender bias in assessment may affect advancement and the duration of residency training. Assessments also inform programmatic letters of recommendation for employment and fellowship, and bias may result in disadvantages in terms of professional opportunity and growth.

Lastly, interventions to address gender bias in graduate medical education are needed. These should include promotion of open dialog about gender and bias among trainees and faculty, educational innovations that enable residents to manage tension arising from competing roles and expectations, and faculty development to mitigate bias in assessment and feedback.

Our review highlights the importance of the issue of implicit gender bias in resident assessment. Limitations of our review include heterogeneity in the methodologies, assessment instruments, and outcome measures utilized, which made direct comparison between studies difficult. Relatively small sample sizes and low proportions of female participants limited results in some studies. Studies examined gender bias within different environments and specialties, and the continuity of contact between learner and evaluator is an unknown yet potentially relevant variable. Studies employed different assessment tools with variable evidence of tool validity. Our review did not address the intersection of gender bias and racial bias, a topic that warrants dedicated study. Lastly, the reviewed studies relied on a gender binary construct, which does not account for those who identify along the gender continuum.

CONCLUSION

Review of the evidence indicates that gender bias poses a potential threat to the integrity of resident assessment in graduate medical education. Despite differences across studies, the majority of reviewed studies found gender-based differences in outcomes, including significant differences in resident performance metrics and disparities in narrative comments. Manifestations of gender bias in resident assessment are complex and challenging to study. Given the importance of resident assessment in competency-based medical education, future study is needed to better understand how gender bias manifests in assessment, its impact on learners, and interventions to address gender bias in graduate medical education.