
Original research
Feasibility, quality and validity of narrative multisource feedback in postgraduate training: a mixed-method study
  1. Ellen Astrid Holm (1, 2),
  2. Shaymaa Jaafar Lafta Al-Bayati (3),
  3. Toke Seierøe Barfod (4),
  4. Maurice A Lembeck (5),
  5. Hanne Pedersen (6),
  6. Emilie Ramberg (5),
  7. Åse Kathrine Klemmensen (7),
  8. Jette Led Sorensen (8, 9)
  1. Department of Internal Medicine, Zealand University Hospital Koge, Koge, Denmark
  2. Institute of Clinical Medicine, University of Copenhagen, Kobenhavns, Denmark
  3. Department of Clinical Physiology and Nuclear Medicine, Zealand University Hospital Roskilde, Roskilde, Denmark
  4. Department of Internal Medicine, Zealand University Hospital Roskilde, Roskilde, Denmark
  5. Department of Internal Medicine, Nykobing F Sygehus, Nykobing Falster, Denmark
  6. Department of Internal Medicine, Glostrup, Rigshospitalet, Kobenhavn, Denmark
  7. Department of Obstetrics and Gynecology, Rigshospitalet, Kobenhavn, Denmark
  8. Juliane Marie Centre for Children, Women and Reproduction Section 4074, Rigshospitalet, Kobenhavn, Denmark
  9. Children Hospital Copenhagen, Rigshospitalet, Kobenhavn, Denmark
  1. Correspondence to Dr Ellen Astrid Holm; ellh{at}


Objectives To examine a narrative multisource feedback (MSF) instrument concerning feasibility, quality of narrative comments, perceptions of users (face validity), consequential validity, discriminating capacity and number of assessors needed.

Design Qualitative text analysis supplemented by quantitative descriptive analysis.

Setting Internal Medicine Departments in Zealand, Denmark.

Participants 48 postgraduate trainees in internal medicine specialties, 1 clinical supervisor for each trainee and 376 feedback givers (respondents).

Intervention This study examines the use of an electronic, purely narrative MSF instrument. After the MSF process, the trainee and the supervisor answered a postquestionnaire concerning their perception of the process. The authors coded the comments in the MSF reports for valence (positive or negative), specificity, relation to behaviour and whether the comment suggested a strategy for improvement. Four of the authors independently classified the MSF reports as either ‘no reasons for concern’ or ‘possibly some concern’, thereby examining discriminating capacity. Through iterative readings, the authors furthermore tried to identify how many respondents were needed in order to get a reliable impression of a trainee.

Results Out of all comments coded for valence (n=1935), 89% were positive and 11% negative. Out of all coded comments (n=4684), 3.8% were suggesting ways to improve. 92% of trainees and supervisors preferred a narrative MSF to a numerical MSF, and 82% of the trainees discovered performance in need of development, but only 53% had made a specific plan for development. Kappa coefficients for inter-rater correlations between four authors were 0.7–1. There was a significant association (p<0.001) between the number of negative comments and the qualitative judgement by the four authors. It was not possible to define a specific number of respondents needed.

Conclusions A purely narrative MSF contributes with educational value and experienced supervisors can discriminate between trainees’ performances based on the MSF reports.

  • education & training (see medical education & training)
  • qualitative research
  • internal medicine

Data availability statement

Data sharing not applicable as no datasets generated and/or analysed for this study. No data are available. All data relevant to the study are included in the article or uploaded as online supplemental information.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:


Strengths and limitations of this study

  • This is to our knowledge the first study reporting details of a purely narrative multisource feedback (MSF) instrument used in postgraduate training in internal medicine.

  • Participants were drawn from a convenience sample.

  • Trainees and their supervisors compared the narrative MSF with a scale-based MSF on the basis of their previous experience or knowledge of MSF.


Multisource feedback (MSF), also termed 360 degrees feedback, is a process in which feedback from multiple assessors is collected. The assessment method was developed and has been used extensively in private business and industrial settings for personal development as well as for appraisal purposes.1 2

MSF in the medical environment

In the healthcare system, the increasing demand for accountability to health authorities, funding agencies and patients, as well as the concerns about physician performance and patients’ safety has required new methods for assessment. Physicians must be competent in domains such as interpersonal and communication skills, professionalism, safety and quality, partnership and teamwork.3–5

Competences in these domains have required new assessment methods. MSF was introduced based on the empirical findings from use in the industry. The first studies on use of the method to assess physicians were published in the late 1980s and beginning of the 1990s.6–8 Since then a large number of studies as well as several systematic reviews9–13 on MSF have been published.

MSF questionnaires

An MSF questionnaire typically consists of several questions, and the assessors mark their answers on a numerical scale.14–16 The questionnaires often also include a possibility to write a free-text comment. However, perceptions about the usefulness of the free-text comments as an add-on are mixed. A recent systematic review included two studies17 18 examining the effect of narrative comments as part of MSF and concluded that the amount of narrative comments is critical in order to improve usability and acceptance of MSF.12 Two studies not included in the systematic review suggest that written comments as an add-on in a scale-based MSF instrument provide little specific information that would make them useful for learners.19 20

Overeem et al18 reported, to our knowledge, the only study that has explored the use of an MSF instrument solely containing narrative statements. Their study showed that among physicians there was a significantly higher satisfaction with the narrative method compared with the numerical scale-based questionnaires. However, this study compared three different MSF models and did not go into detail describing the narrative MSF (an instrument used in the Netherlands).

Validity and reliability

Validity and reliability of scale-based MSF questionnaires have been extensively examined and are generally reported to be acceptable,9 10 13 although there have been critical voices.17 21–23 A recent systematic review concluded that there is a lack of research results to demonstrate content validity (do the questionnaires measure what they are supposed to measure?), consequential validity (does the feedback lead to any changes?) and validity concerning the process.11 Validity may vary depending on the purpose of MSF. A large validation and reliability study including approximately 1000 physicians and 16 000 assessors (colleagues/coworkers) found that 15 colleagues needed to answer the MSF questionnaire in order to reach reliability.24 This study also concluded that the method was acceptable for formative feedback, but that due to possible biases it should not be used in isolation to inform decisions about a doctor’s fitness to practice medicine.

Very few studies have reported objective measurement of the consequential validity of MSF. The most recent systematic review included 16 studies of which only one included a measured change in behaviour.12 25 Some studies have shown that physicians feel that MSF has educational value26 and will lead to changes in attitudes and/or behaviour.27 Other studies found the perceived effectiveness low.17 28 Several studies found that narrative comments and/or mentoring and feedback conversations are important in order to strengthen the educational value.17 18 29–32

During the last decade, several researchers in medical education have raised concerns about the extensive use of psychometric measurements in the assessment of medical competencies.33–37 In an analysis of assessment discourses, Hodges named this heavy reliance on psychometric measurements the discourse of ‘Cronbach’s alpha and competence-as-reliable test score’.34 Others have pointed to the ‘gaming culture’ arising when focus on tick-boxes and numbers replaces focus on learning.32 Schuwirth and van der Vleuten made ‘a plea for a major revision of the statistical concepts and approaches to assessment’.33 Eva and Hodges noted that ‘Perhaps the translation of behaviours into numbers and then numbers back into statements is an unnecessary detour.’38

Aim of this study

In this article, we will report findings from the use of a purely narrative MSF-questionnaire. We will examine feasibility, quality of narrative comments, perceptions of the users (face validity), consequential validity, discriminating capacity and discuss the number of assessors needed.



In 2004, Denmark adopted the CanMEDS-based framework including workplace-based assessments of competences. MSF is mandatory in almost all specialties including the specialties of internal medicine. All trainees are appointed a clinical supervisor responsible for holding regular feedback conversations and securing progression. The aim of MSF in this context is to support and develop the competences of trainees in domains such as interpersonal and communication skills, professionalism, safety and quality, partnership and teamwork.

Participants and questionnaires

Trainees in postgraduate training in internal medicine or in one of the specialties of internal medicine were invited to use an electronic MSF if the trainees and their clinical supervisors agreed to participate in the study. The trainees could be in any postgraduate training year, that is, years 1–6, but could only be included once. The trainees chose their own assessors, hereafter called respondents. The trainees were informed that MSF was meant to collect feedback from different categories of collaborators and were advised to choose respondents from various groups of staff such as nurses, secretaries, senior colleagues and peers.

In order to become a clinical supervisor in Denmark, you have to attend a 3-day ‘train the trainers course’. This course includes training in general feedback giving but does not specifically include feedback giving related to MSF. Trainee–supervisor pairs who agreed to take part in the study received a one-page document explaining the aim of MSF and especially stressing that a feedback conversation was an important aspect. During the feedback session, specific strengths as well as possible need for development should be discussed and planned.

The respondents were not trained in giving feedback. However, they received a mail containing a link to the questionnaire and this mail included a short instruction. The respondents were told that they should only make comments based on their own observations, that is, they did not have to give answers to all questions in the questionnaire. They were asked to make comments as specific as possible and provide positive as well as negative comments when appropriate. Negative comments should be constructive and preferably include advice for change.

The MSF questionnaire had been developed by a group consisting of representatives appointed by the national scientific societies of the internal medicine specialties (the societies for internal medicine, cardiology, infectious diseases, pulmonary medicine, gastroenterology, geriatric medicine, rheumatology, nephrology, haematology and endocrinology). The questionnaire was designed to assess core competencies within the domains of communication, collaboration, management and professionalism. These domains were chosen because they were part of the core curriculum in the internal medicine specialties. The questionnaire contained two open-ended questions within each domain. Additionally, respondents were asked to write down advice on how the physician might further improve. The questionnaire was pilot tested for feasibility but not further validated before the present study. The questionnaire is shown in the online supplemental material 1 (translated from the Danish version). One of the authors (EAH) transferred the questionnaire into an electronic form using the computer programme SurveyXact.

The option of using an electronic version for the mandatory MSF was publicised through educational key persons in the scientific societies and by word of mouth. Trainees knew the MSF procedure since MSF had been mandatory in specialist training in Denmark for several years before this study. The MSF form previously used was scale based with an option to add narrative comments. However, in 2013, the curriculum for the internal medicine specialties was revised, and a purely narrative MSF form was developed and recommended by the internal medicine society. Trainees who wished to use the electronic model were advised to choose approximately eight respondents. When a trainee asked to use the electronic model, one of the authors (EAH) would mail a link to the questionnaire to the respondents appointed by the trainee. After completion of an assessment, EAH would summarise the results in a report, which was mailed to the supervisor of the trainee. The report was a standard computer-generated summary and contained no interpretations. The supervisor would then arrange a feedback conversation. After the feedback conversation, the supervisor and the trainee answered an electronic postquestionnaire containing questions concerning perception of usability and consequences of the process; questions were answered on a 5-point Likert scale. Data were collected from 1 May 2014 to 1 May 2016.

Data analysis

Quality of narrative comments

The content and quality of the narrative comments were examined using a directed content analysis in order to identify feedback characteristics expected to have a beneficial impact on a learner’s performance.39 Three of the authors (SJLA-B, TSB and EAH) developed the initial coding scheme, consulting the literature on effective feedback and leaning on similar work by Canavan et al.19 Coding was done using the computer program NVivo.

The initial scheme was tested and improved through discussions during several iterative rounds using different samples. After agreement on the coding scheme, two of the authors (EAH and SJLA-B) coded 10 reports and discussed incongruences. However, the coding results were now so similar that one author (EAH) coded the remaining documents. If EAH had doubt concerning the interpretation of comments SJLA-B was consulted. The coding scheme included the following codes:

  • Valence: comments were coded according to whether they were positive or negative.

  • Specificity: comments were coded as specific if they contained information that was more specific than the question. For example, a question concerning collaboration could be answered ‘very good at collaborating’ (unspecific) or ‘good at collaborating with the nurses’ (specific).

  • Behaviour related: a comment was coded as behaviour related if it described behaviour that could be changed. If, for instance, a physician was described as ‘a calm and friendly person’, the comment would not be coded as behaviour related. However, if the comment said ‘in acute situations she keeps calm and friendly and continues working efficiently’, it would be coded as behaviour related.

  • Constructive: a comment was coded as constructive if it suggested possible ways for change/development.
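
The coding scheme above can be pictured as a small data structure in which each comment carries at most one valence and any combination of the other three codes. The following is only an illustrative sketch: the study's coding was done in NVivo, and the class name, field names and example codings below are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CodedComment:
    """One narrative comment with the four codes of the scheme (hypothetical layout)."""
    text: str
    valence: Optional[str]   # "positive", "negative", or None if no valence was coded
    specific: bool           # more specific than the question that was asked
    behaviour_related: bool  # describes behaviour that could be changed
    constructive: bool       # suggests a possible way to change or develop


def tally(comments):
    """Count how many comments carry each code; one comment may carry several codes."""
    counts = Counter()
    for c in comments:
        if c.valence is not None:
            counts[c.valence] += 1
        counts["specific"] += c.specific
        counts["behaviour_related"] += c.behaviour_related
        counts["constructive"] += c.constructive
    return counts


# Invented comments and codings, loosely modelled on the examples in the text.
comments = [
    CodedComment("very good at collaborating", "positive", False, False, False),
    CodedComment("good at collaborating with the nurses", "positive", True, False, False),
    CodedComment("in acute situations she keeps calm and friendly", "positive", True, True, False),
    CodedComment("could involve the nurses earlier when planning", "negative", True, True, True),
]
counts = tally(comments)
print(dict(counts))
```

Because the codes are not mutually exclusive, the per-code counts can sum to more than the number of comments.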

Feasibility and validity

The trainees and their supervisors answered a survey containing information on their perception of the process, consequences and time spent in order to examine feasibility, face validity and consequential validity. Data collected from this survey were used to examine feasibility, satisfaction and consequences.

Discriminating value

Four authors, all experienced supervisors (ÅKK, TSB, HP and MAL) studied all reports independently and divided them into two groups based on their performance in the assessed domains: (1) probably very competent, no reason for concern or (2) some concern due to possibly lacking competences, need for further assessment.

Number of respondents needed

Four of the authors (EAH, MAL, HP and ER) performed iterative readings of all assessments in an attempt to decide if criteria of saturation could be met at a certain number of respondents.


Danish law exempts this type of survey study from ethical approval. However, when a trainee asked to use the electronic model for MSF, the author EAH would mail a description of the method. This mail included the following information:

  • All data collected would be anonymised and data extracted from the electronic MSF would be used as part of a research project.

  • The trainee and the supervisor should be willing to report their experiences in a second questionnaire.

The information mail to participants stressed that participation was voluntary and participating in the study did not affect the training or work of the participants.

Patient and public involvement

There were no patients participating in this study. The public was not involved.


Participation and feasibility

Overall, 48 trainee–supervisor pairs and 376 respondents participated. The mean number per trainee of respondents invited was 10.9 (SD 2.3) and the mean number of respondents was 8.0 (SD 2.0). Mean time spent by respondents was 12.6 min (SD 3.7) and mean time spent by the trainees performing the self-assessment was 20.5 min (SD 10.4). Respondents were senior colleagues (consultants), peers, nurses, secretaries and others (see figure 1).

Figure 1

Categories of respondents. The trainees chose their own respondents for MSF. They were advised to choose respondents from different categories of staff. Each column represents a trainee. For each trainee, the figure shows the number of assessors in different categories of staff. Other trainees were categorised as ‘Peers’. The category ‘nurses’ includes auxiliary nurses and nurse students. MSF, multisource feedback.

Quality of the narrative comments

In total, 4684 comments were coded. Each comment could be coded for several characteristics; for example, a comment could be positive, constructive and specific. However, if a comment was coded for valence, it would be either positive or negative. Out of 1935 comments containing a positive or negative statement, 89% had positive valence and 11% had negative valence. Only 185 comments were coded as being constructive (giving suggestions for change); 1289 comments described behaviour and 1275 comments were specific, that is, giving an answer that was more specific than the question. Table 1 shows the percentage of comments covered by each code category and figure 2 illustrates the number of comments within different coding categories for each trainee.

Table 1

Coded comments in the qualitative text analysis

Figure 2

Distribution of comments within different coding categories. Each column represents a trainee. For each trainee, the figure shows the number of comments within the different coding categories.

Face validity and consequential validity

Out of the 48 trainee–supervisor pairs in the study 34 trainees and 38 supervisors completed a postquestionnaire on perception and consequences of MSF after having had the feedback conversation. A large majority of trainees and supervisors preferred a narrative MSF to the more conventional numeric scale based MSF (see table 2). We found no significant associations between the amount of negative/positive or constructive comments and the perceptions of the trainees on whether they had made a plan for improvement or not.

Table 2

Trainees’ and supervisors’ perceptions

Table 2 shows results from the postquestionnaire answered by trainees and supervisors after the MSF procedure, which included a feedback conversation between supervisor and trainee. The questionnaire was answered on a 5-point Likert scale (strongly disagree, disagree, uncertain, agree, strongly agree). Significance was tested using χ2.

Discriminating capacity

Negative comments were phrased in very cautious and respectful language. Four authors independently classified the MSF reports as either ‘no reason for concern’ or ‘some concern’. There was very good inter-rater agreement between the judgements of the four authors, with kappa values of 0.7–1 (see table 3).

Table 3

Inter-rater correlation

Four of the authors (ÅKK, TSB, HP and MAL) judged the MSF reports for each physician and decided whether the report indicated ‘no concern’ or ‘some concern’ regarding performance within the domains of communication, collaboration, management or professionalism.
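
The article reports kappa values of 0.7–1 without stating which kappa variant was used. As a minimal sketch, assuming pairwise Cohen's kappa between two raters on a binary classification, the calculation could look as follows. The function name and the ratings are hypothetical and are not the study data.

```python
from itertools import combinations


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who each assign one label per report."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must rate the same, non-empty set of reports")
    n = len(ratings_a)
    labels = set(ratings_a) | set(ratings_b)
    # Proportion of reports on which the two raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's marginal label frequencies.
    expected = sum(
        (ratings_a.count(label) / n) * (ratings_b.count(label) / n)
        for label in labels
    )
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Invented ratings for 10 reports: "no" = no concern, "some" = some concern.
raters = {
    "rater_1": ["no"] * 7 + ["some"] * 3,
    "rater_2": ["no"] * 6 + ["some"] * 4,  # disagrees with rater_1 on one report
    "rater_3": ["no"] * 7 + ["some"] * 3,
}

# One kappa per pair of raters, which yields a range of values across pairs.
pairwise = {
    (a, b): cohens_kappa(raters[a], raters[b])
    for a, b in combinations(raters, 2)
}
for pair, kappa in pairwise.items():
    print(pair, round(kappa, 2))
```

With four raters there are six such pairs, so a reported range of kappa values is what a pairwise approach would produce.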

There was a significant association between the judgements of the four authors and the number of negative comments found in the text analysis of the MSF reports (see table 4).

Table 4

Comparison of trainees categorised as ‘no concern’ or ‘some concern’
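
The text does not name the statistical test behind this association. Purely as an illustration, one distribution-free option for small groups is an exact permutation test on the difference in the mean number of negative comments between the two groups. This is a swapped-in technique, not the authors' analysis; the function name and the counts below are hypothetical.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_p(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    total = sum(pooled)
    extreme = 0
    splits = 0
    # Enumerate every way of relabelling n_a of the pooled values as group A.
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        if diff >= observed - 1e-12:  # small tolerance for floating-point ties
            extreme += 1
        splits += 1
    return extreme / splits


# Invented numbers of negative comments per trainee (not the study data).
no_concern = [0, 0, 1, 1, 2]
some_concern = [4, 5, 7]
p_value = exact_permutation_p(no_concern, some_concern)
print(round(p_value, 4))
```

Full enumeration is only practical for small groups; with 48 trainees, a random sample of relabellings or a rank-based test would be the more realistic choice.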

Number of respondents needed

The number of respondents per trainee varied from 3 to 13, with a mean of 8 (see figure 1). The level of detail in the comments given by the respondents varied substantially. Some respondents used many words and others used very few. Some of the respondents only gave comments such as ‘good’, ‘super!’ or ‘average level’, whereas others gave very detailed feedback including examples. When respondents provided detailed information, we found that very few respondents were needed to give us a picture of the trainee. To demonstrate this, we will now look in more detail into the MSF reports of three trainees, representing physicians who had many, few or a mean number of respondents.

Dr Y: 13 respondents. Altogether 1427 words. One assessor contributes 24% of these words. In total, there are seven negative and seven constructive remarks, and they all come from three assessors. The remaining 10 assessors provide very little information.

Dr X: Three respondents. Altogether 565 words. One of the assessors contributes only 10% of these words and the other two contribute 44% and 46%, respectively. There are three negative comments (all from one assessor) and five constructive comments (all but one from the same assessor who gave the negative comments).

Dr Z: Seven respondents. Altogether 689 words. Three assessors contribute 70% of the words, one assessor 16% and the remaining three assessors 14%. There are five constructive remarks and seven negative remarks. One assessor is responsible for three negative and three constructive remarks. The remaining six constructive or negative comments are distributed among five assessors. One assessor responds ‘good’ to all questions.

Based on these findings, we cannot draw a conclusion on how many assessors are needed in order to reach saturation or to secure meaningful feedback.
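
The uneven contribution of respondents described for the three trainees can be summarised as the smallest number of respondents that together account for a given share of the words in a report. The sketch below assumes per-respondent word counts are available; the function name and the individual counts are hypothetical, chosen only so that they total 689 words across seven respondents as in the report of Dr Z.

```python
def respondents_for_share(words_per_respondent, threshold):
    """Smallest number of respondents, largest contributors first,
    whose combined word count reaches `threshold` (0-1) of all words."""
    total = sum(words_per_respondent)
    if total == 0:
        raise ValueError("the report contains no words")
    cumulative = 0
    for k, words in enumerate(sorted(words_per_respondent, reverse=True), start=1):
        cumulative += words
        if cumulative >= threshold * total:
            return k
    return len(words_per_respondent)


# Invented word counts for seven respondents (sum = 689).
words = [200, 160, 123, 110, 40, 35, 21]

print(respondents_for_share(words, 0.7))  # respondents needed for 70% of the words
print(respondents_for_share(words, 0.5))  # respondents needed for 50% of the words
```

A word count is only a rough proxy for informational value, so a summary of this kind describes how feedback is distributed rather than how many assessors are needed.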


Face validity

The study demonstrates that a purely narrative version of MSF can provide feedback that is valued by physicians in postgraduate training as well as by their supervisors. Both among recipients of MSF and among their supervisors, an overwhelming proportion (>90%) preferred a narrative questionnaire to a scale-based questionnaire. The mean time spent by respondents was 12.6 min (SD 3.7), which seems reasonable.

Quality of feedback

The amount of negative and constructive comments was low. This finding is similar to previous studies on MSF. Many respondents obviously did not invest much time when for example answering all questions with ‘good’ or ‘average’. However, some respondents gave detailed descriptions of their experience with the trainee and advice on how to develop further competence. All respondents used a very polite language, indeed so polite that a hint of criticism could easily remain unnoticed. Others have described this lack of negative or constructive feedback.19 32 40 Lockyer et al found that 18% of comments were negative and 76% positive.41 In a qualitative study, Ingram et al demonstrated that raters were reluctant to give negative feedback.32

Consequential validity

A large proportion of the MSF-recipients (82%) perceived that they discovered performance that they needed to develop. However, only about half (52%) of these had actually made plans on how to train for performance change. The effect of MSF feedback has been reviewed in several studies.10 12 42 43 As discussed in the background section the evidence is conflicting. However, with this large proportion of MSF-recipients acknowledging detection of performance in need of development, we find that our study contributes to the evidence for a positive educational value of MSF.

Discriminating capacity

We found that a narrative MSF was able to discriminate between trainees. However, we consider the strength of a narrative MSF to be the much more detailed information in comparison to a score marked on a scale. We suggest that this strength is the reason that the majority of trainees reported having identified areas where they performed better than they thought or realised a need to improve.

Number of respondents needed

We were not able to make firm conclusions on the number of assessors needed for narrative MSF. It all depends on the quality of the assessments and on the purpose of the assessment. If the purpose is purely formative with an intention to collect meaningful feedback, few respondents may be enough. If the purpose of MSF is to discriminate between trainees who may be in trouble and trainees with acceptable performance, a larger representative sample of colleagues may be needed to ensure that problematic behaviour will be identified.

Strengths and limitations

This study is, to our knowledge, the first internationally reported study describing details of a purely narrative MSF instrument. Participation in this study was optional for those who heard about the study and preferred an electronic version to the standard paper version of the mandatory MSF. Thus, the participants comprise a convenience sample of trainees in the internal medicine specialties and may not be representative of all trainees. However, MSF is mandatory and the questionnaire used was identical to the questionnaires used by all trainees, differing only in being in electronic form. We, therefore, do not expect the sampling to bias our results.

A majority of the participants responded that they preferred a narrative MSF to a scale-based MSF. However, we do not know exactly to what extent participants based this response on experience. Furthermore, it was new to most of the participants to use an electronically distributed MSF and this may have influenced their preference. In conclusion, the present study cannot be used to directly compare a narrative MSF with a scale-based MSF.

The results may be influenced by the fact that the respondents were chosen by the trainees. Early studies on MSF suggested that scores from assessors chosen by the trainee were not significantly different from scores given by assessors chosen by a supervisor.8 However, this has been challenged in some later studies showing significant differences in scores depending on choice of assessors.21 22 The consequential validity is based only on information from the participants and we cannot draw conclusions about the actual consequences.

Future directions

A very clear finding in our study was that the respondents gave very little negative and constructive feedback and used extremely polite language. Some respondents contributed detailed feedback and suggestions for development, while others used very few words, such as ‘super’, ‘good’ or ‘average’. This might be influenced by choice of respondents. The respondents in our study were chosen by the trainees. This procedure has advantages such as feasibility (time saving for the supervisor, the trainee can choose respondents who know them) and credibility (the trainee is probably more prone to accept the assessment from colleagues chosen by himself/herself). However, in a supportive learning environment where it is stressed that MSF has only formative purposes, it might be possible to encourage trainees to choose their respondents wisely, not choosing only those whom they consider to be positive but specifically seeking respondents who may be critical and are willing to give honest feedback. The fact that the MSF is purely narrative in itself stresses the formative character of MSF. Furthermore, it would promote a good learning environment and feedback culture if respondents received some amount of training in MSF. This could be part of a more general training in feedback giving and receiving for both trainees and supervisors.

In this study, trainees as well as supervisors preferred a narrative MSF to the conventional numeric scale-based questionnaire. As discussed in the background section, this finding is in harmony with other voices asking for more qualitative assessments.33 35 37 44 45

Using narrative feedback instead of numbers is supported by recent trends in the discourse of feedback.46–49 We recommend further studies to develop narrative MSF. We suggest that future studies include experimenting with assessor choice, assessor education and studies on effect.






  • Contributors EAH and JLS planned the study. EAH collected the data and produced the first draft of the article. EAH, SJLA-B and TSB performed the qualitative text analysis. ÅKK, TSB, HP and MAL studied all reports in the analysis of discriminating value. EAH, MAL, HP and ER performed iterative readings in an attempt to decide on the necessary number of assessors. All authors contributed to analysis and interpretation of data and approved of the final version of the article.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.