Original research
Role of formative assessment in predicting academic success among GP registrars: a retrospective longitudinal study
  1. Paula Heggarty1,
  2. Peta-Ann Teague1,
  3. Faith Alele2,
  4. Mary Adu1,
  5. Bunmi S Malau-Aduli1
  1. 1College of Medicine and Dentistry, James Cook University, Townsville, Queensland, Australia
  2. 2College of Public Health, Medical and Veterinary Sciences, James Cook University, Townsville, Queensland, Australia
  1. Correspondence to A/Professor Bunmi S Malau-Aduli; bunmi.malauaduli{at}


Objectives The James Cook University General Practice Training (JCU GPT) programme’s internal formative exams were compared with the Royal Australian College of General Practitioners (RACGP) pre-entry exams to determine ability to predict final performance in the RACGP fellowship exams.

Design A retrospective longitudinal study.

Setting General Practice (GP) trainees enrolled between 2016 and 2019 at a Registered Training Organisation in regional Queensland, Australia.

Participants 376 GP trainees enrolled in the training programme.

Exposure measures The pre-entry exams were Multiple-Mini Interviews (MMI), Situational Judgement Test (SJT) and Candidate Assessment and Applied Knowledge Test. The internal formative exams comprised multiple choice questions (MCQ1 and MCQ2), short answer questions, clinical skills and clinical reasoning.

Primary outcome measure The college exams were Applied Knowledge Test (AKT), Key Feature Problems (KFP) and Objective Structured Clinical Examination (OSCE).

Results Correlations (r), coefficients of determination (R²) and OR were used as parameters for estimating strength of relationship and precision of predictive accuracy. SJT and MMI were moderately (r=0.13 to 0.31) and MCQ1 and MCQ2 highly (r=0.37 to 0.53) correlated with all college exams (p<0.05 to p<0.01), with R² ranging from 0.070 to 0.376. MCQ1 was predictive of failure in all college exams (AKT: OR=2.32; KFP: OR=3.99; OSCE: OR=3.46), while MCQ2 predicted failure in AKT (OR=2.83) and KFP (OR=3.15).

Conclusion We conclude that the internal MCQ formative exams predict performance in the RACGP fellowship exams. We propose that our formative assessment tools could be used as academic markers for early identification of potentially struggling trainees.

  • medical education & training
  • education & training (see medical education & training)
  • primary care

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:

Strengths and limitations of this study

  • This is the first study investigating the predictive roles of both pre-entry exams and in-training assessment on General Practice (GP) trainees’ performance in the college exams at a GP regional training organisation (RTO) that uses a distributed training model.

  • This is also the first study to investigate the predictive ability of Candidate Assessment and Applied Knowledge Test in an Australian setting.

  • This study was conducted at only one RTO.

  • Generalisation of findings to other settings may be limited by the retrospective nature of the study.


Introduction

Speciality training programmes can be very challenging for some junior doctors. Despite meeting the high academic standards required for postgraduate medical training, a proportion of trainees struggle to perform well in their fellowship exams.1 2 A recent public report noted that 22.5% of General Practice (GP) trainees were likely to fail their first attempt at the fellowship exams.3 Failure to achieve fellowship is a concern for the candidate, who may suffer considerable personal distress, possible financial hardship from the need to pay for resit exams, and perhaps stigma or shame after failing one or more components of the exams.4 Therefore, timely identification of struggling trainees is an important step towards early remediation plans that prevent failure.5 6

Studies on early identification of students’ academic underperformance have been commonly reported in undergraduate medical education.7–11 However, there is a paucity of literature on the subject in postgraduate training. Similar to undergraduate medical education, a significant proportion of trainees encounter academic failure during their postgraduate training, hence the need for early support. Studies suggest that remediation should be an explicit part of the medical education programme rather than an afterthought activity put in place after trainees fail.12 13 Without proactive engagement in academic support programmes that foster early identification of GP trainees’ educational difficulties, regional training organisation (RTO) remediation programmes will remain a ‘band-aid fixing tool’ for failed registrars. Such an approach subsequently results in reduced effectiveness of the training programme and comes at an increased cost to individual trainees and the training organisation.14 15

In response to the need for early diagnosis of trainees’ educational difficulties and associated limiting factors, it is important to use the concept of ‘assessment for learning’ to address learners’ learning needs and strategically change the ways in which they learn (Isaacs, 2001).16 Assessment has the potential to predict performance in practice and aid identification of learning deficiencies, especially with the use of valid and reliable tools.2 17 Pre-entry assessment tools used by the Royal Australian College of General Practitioners (RACGP) are the Situational Judgement Test (SJT), Multiple-Mini Interviews (MMI) and the Candidate Assessment and Applied Knowledge Test (CAAKT), which was introduced recently to replace the SJT. The SJT and the CAAKT aim to assess clinical reasoning and problem-solving attributes in trainees, while fundamental clinical competencies across the Domains of General Practice are assessed by behavioural style clinical questions using the MMI.18 19 Although the predictive ability of SJT and MMI on performance in end-of-training assessment has been previously established,20 the predictive ability of CAAKT is currently unknown. In addition, there is a need for comparative studies with internally developed assessment tools, particularly in the context of their use in RTOs. Shulruf et al21 reported the inability of interview/admission test scores to predict subsequent student failure or drop-out and concluded that it might be more useful to focus on post-enrolment factors. Evaluating the effect of internally developed formative assessment schedules could provide important data for adopting a programmatic approach to assessment.22

Therefore, the following research questions were considered:

  1. What is the predictive ability of the RACGP SJT, MMI and CAAKT exams on James Cook University (JCU) GP registrars’ performance in RACGP exams (end-of-training assessment)?

  2. What is the predictive ability of internally developed JCU GP training formative exams on registrars’ performance in RACGP exams (end-of-training assessment)?

The findings of this study will enhance the development of a systematic assessment programme that fosters early identification of registrars potentially at risk of academic difficulty, thus enabling effective and timely remediation strategies.

Materials and methods

Study setting

James Cook University’s (JCU) postgraduate general practice training (GPT) programme was established in 2016 to deliver the Australian General Practice Training (AGPT) programme in ‘North West Queensland’, an area that encompasses all of Queensland apart from the South East corner of the state. JCU provides training pathways to the award of Fellowship of the Royal Australian College of General Practitioners (FRACGP), Fellowship of the Australian College of Rural and Remote Medicine and Fellowship in Advanced Rural General Practice. JCU, as an RTO, has a mandate to recruit, train and retain a fit-for-purpose general practice workforce in remote, rural and regional Queensland. As such, JCU GPT aims to lessen attrition of trainees (both voluntary and involuntary), lessen trainees’ underperformance (decreasing reliance on remediation) and increase rates of successful completion of training. This mandate resulted in the design of internal formative assessment tools that allow for assessment of trainees’ academic ability right from the first day of orientation to the programme. The internal formative assessment schedule entails educational diagnosis for identification of trainees’ learning needs, hence providing the opportunity for early remediation plans to foster successful completion of the training.

Study procedure and population

Using a retrospective longitudinal study design, we collated academic pre-entry data (SJT, CAAKT and MMI) for all GP trainees who were enrolled with JCU GPT at the time of the study (2016 to 2019). Demographic variables, which were obtained from the registrars at the point of entry into the JCU GP programme and stored in the University Record System database, were included in this study. These variables included gender, training pathway, origin and fellowship status. Data on the registrars’ internal formative assessment scores, comprising the orientation assessment (multiple choice questions (MCQ1), short answer questions (SAQ), clinical skills (CS) and clinical reasoning (CR)) and the subsequent assessment (MCQ2) taken 8 months after MCQ1, were also collated. Data on college exam scores (Applied Knowledge Test (AKT), Key Feature Problems (KFP) and Objective Structured Clinical Examination (OSCE) exams) for trainees who had completed at least two of the three college exams were obtained. Registrars need to pass AKT and KFP before they can attempt the OSCE exam. All registrars in the study had attempted at least both AKT and KFP. Only the first attempt scores in the college exams were included in the analysis.


Exposure measure

Pre-entry assessment

Prior to 2018, applicants to the RACGP training programme were required to undertake SJT and MMI. The SJT items assess non-academic (professional) abilities such as empathy, integrity and coping with pressure.23 The questions test awareness of effective lines of action in each situation as well as procedural knowledge.20 If successful in this assessment, candidates are invited to undertake an MMI, which is a structured interview style assessment of between five and eight stations. The MMI assesses clinical competencies across the RACGP Domains of General Practice, including interpersonal communication, through behavioural style scenarios and questions.20 From 2018, applicants are required to undertake a CAAKT, which is a written test that comprises both applied knowledge and SJT questions. The CAAKT replaced the SJT. Based on their successful performance in these assessments, applicants are offered a training position on the AGPT programme training towards a FRACGP.

Internal formative exams

JCU GPT offers all trainees commencing in general practice training posts a number of formative assessments. The registrars are required to take an introductory formative exam at the start of their training which includes four different assessments: a 65-question MCQ paper, a short answer paper (SAQ), observed simulated consultations (CS) and a written clinical reasoning paper (CR). Another 65-question MCQ paper is administered to the trainees 8 months after commencing the training programme. These are in-training formative assessments conducted by JCU to evaluate the registrars’ knowledge and learning needs and are different from the summative assessments (college examinations) which are organised independently by the RACGP.

Outcome measures

RACGP summative exams

RACGP pathway trainees are required to pass an AKT, which comprises 150 questions that assess applied clinical knowledge; a written key feature paper (KFP), which assesses applied clinical reasoning and clinical decision-making skills; and an OSCE, which assesses applied knowledge, clinical reasoning, communication skills and professional behaviours.

Data analysis

Given that the national pass scores for the college examinations vary per year/sitting, we standardised the AKT, KFP and OSCE scores before conducting the analyses. Missing data were identified and excluded pairwise to maximise the effective sample size. After confirming that the assumptions of linearity, homoscedasticity, independence of errors, absence of multicollinearity, absence of unusual points and normality of residuals were met, descriptive statistics comprising frequencies and mean scores were computed in a preliminary analysis. IBM SPSS (V.23) was used to run the correlations and the multiple linear and logistic regressions.
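The standardisation step can be illustrated with a minimal sketch. The paper does not state the exact method, so this assumes a z-score computed within each exam sitting; the record layout, the sitting labels and the scores are all invented for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardise_by_sitting(records):
    """Return z-scores computed within each exam sitting.

    records: list of (sitting, raw_score) tuples (hypothetical layout).
    """
    by_sitting = defaultdict(list)
    for sitting, score in records:
        by_sitting[sitting].append(score)

    # Mean and SD per sitting; using the population SD is an assumption.
    stats = {s: (mean(v), pstdev(v)) for s, v in by_sitting.items()}

    z_scores = []
    for sitting, score in records:
        m, sd = stats[sitting]
        # Guard against a sitting in which every score is identical.
        z_scores.append(0.0 if sd == 0 else (score - m) / sd)
    return z_scores


# Invented scores for two sittings with different score levels.
records = [("2018.1", 60.0), ("2018.1", 70.0), ("2018.1", 80.0),
           ("2018.2", 50.0), ("2018.2", 60.0), ("2018.2", 70.0)]
z = standardise_by_sitting(records)
```

After this transformation, a trainee who scored at the mean of their own sitting gets 0 regardless of how hard that sitting was, which is what makes scores from different years comparable.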

Correlation analysis: Pearson’s pairwise correlation coefficients between all pre-entry, internal formative and college exams were used to estimate the strength of association between these variables and significance thresholds were set at p<0.05 and p<0.01.
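As a rough illustration of a pairwise correlation that uses every complete pair of scores, the sketch below computes Pearson's r in plain Python. The function name, the use of `None` for a missing score and the data are assumptions for illustration; the study itself ran the analysis in SPSS.

```python
from math import sqrt


def pearson_pairwise(x, y):
    """Pearson's r over pairs where both scores are present (None = missing).

    Returns (r, number_of_pairs_used).
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 3:
        raise ValueError("need at least three complete pairs")
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / sqrt(sxx * syy), n


# Invented data: formative MCQ percentages and standardised college scores.
mcq1 = [52.0, 61.0, None, 70.0, 48.0, 66.0]
akt = [-0.8, 0.1, 0.4, 1.2, None, 0.5]
r, n_used = pearson_pairwise(mcq1, akt)
```

Because each pair of exams keeps its own set of complete cases, the sample size can differ from one correlation to the next, which is worth reporting alongside each coefficient.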

Multiple linear regression analysis: Multiple linear regression analyses to estimate the ability of pre-entry (SJT, CAAKT and MMI) and in-training formative (MCQ1, CR, CS, SAQ and MCQ2) exams to predict performance in the final college exams were conducted. For the purpose of these analyses, CAAKT was not included in the multiple linear regression model with SJT and MMI, because SJT and CAAKT are similar, hence such an inclusion could escalate the Variance Inflation Factor and trigger multicollinearity. Therefore, a separate multiple linear regression model that included only CAAKT and MMI to assess their ability to predict RACGP Fellowship exams was run.
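The multicollinearity concern can be made concrete with a small sketch. For a model with only two predictors, the Variance Inflation Factor of each is 1/(1 − r²), where r is the correlation between them. The models in the study also included gender and origin, so this two-predictor case is a simplification, and the scores below are invented.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor model: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


# Invented scores: test_b closely tracks test_a, test_c is only weakly related.
test_a = [60.0, 65.0, 70.0, 75.0, 80.0, 85.0]
test_b = [58.0, 66.0, 69.0, 77.0, 79.0, 88.0]
test_c = [31.0, 28.0, 35.0, 30.0, 33.0, 32.0]

vif_high = vif_two_predictors(test_a, test_b)
vif_low = vif_two_predictors(test_a, test_c)
```

A VIF close to 1 indicates little shared variance, whereas values above commonly used cut-offs (often quoted as 5 or 10) suggest that the two predictors should not sit in the same model.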

Logistic regression analysis: The probability of failing each college examination as a result of failure in the internal formative exams was estimated using separate logistic regression analyses. Only significant internal formative assessment variables that were predictive of all college exams in the linear regression were included in the logistic regression model. The formative assessment scores were converted to categorical pass or fail variables. The pass mark for the internal assessment was 55% and scores below 55% were considered a fail. Gender and origin were adjusted for and fitted as covariates in the multiple linear and logistic regression models for continuous and categorical variables, respectively.
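The dichotomisation at the 55% pass mark and the resulting odds of failure can be sketched as follows. This is an unadjusted odds ratio from a 2×2 table with a Wald 95% CI, not the covariate-adjusted logistic regression used in the study; the function names and the data are hypothetical.

```python
from math import exp, log, sqrt

PASS_MARK = 55.0  # internal pass mark stated in the text


def failed_formative(score):
    """A formative score below the pass mark counts as a fail."""
    return score < PASS_MARK


def unadjusted_odds_ratio(formative_scores, college_failed):
    """Odds ratio (with Wald 95% CI) of failing the college exam
    for trainees who failed versus passed the formative exam."""
    # a: fail/fail, b: fail/pass, c: pass/fail, d: pass/pass
    a = b = c = d = 0
    for score, coll_fail in zip(formative_scores, college_failed):
        if failed_formative(score):
            if coll_fail:
                a += 1
            else:
                b += 1
        else:
            if coll_fail:
                c += 1
            else:
                d += 1
    if 0 in (a, b, c, d):
        raise ValueError("an empty cell makes the odds ratio undefined")
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci_low = exp(log(odds_ratio) - 1.96 * se_log_or)
    ci_high = exp(log(odds_ratio) + 1.96 * se_log_or)
    return odds_ratio, ci_low, ci_high


# Invented data: formative percentages and whether the college exam was failed.
scores = [40.0, 50.0, 54.0, 45.0, 52.0, 60.0, 70.0, 65.0, 58.0, 80.0, 75.0, 62.0]
college_failed = [True, True, False, True, False, False,
                  False, True, False, False, False, False]
or_value, ci_low, ci_high = unadjusted_odds_ratio(scores, college_failed)
```

With so few invented cases the interval is very wide; the point of the sketch is only to show how a pass/fail cut-off turns continuous scores into the categorical predictor used in a logistic model.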

Patient and public involvement

Patients or the public were not involved in this study.


Results

Assessment data of 376 trainees were included in the study. Demographic profiles showed that 58.2% were women, 1.9% of Aboriginal and Torres Strait Islander descent, 64.1% Australian medical graduates (AMG), 96.5% enrolled in the FRACGP Only training pathway and 61.7% had become fellows (online supplemental table 1).

All score categories in the study showed normal distribution, with the exception of SJT and CAAKT which were slightly negatively skewed (−1.58 and −1.5, respectively), as commonly observed in a typical SJT score distribution.20 24 Nevertheless, parametric analysis was conducted for all score categories, due to the large data set. Previous research had demonstrated the robustness of using parametric analyses for large data sets, except in instances of extreme skewness.25
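For readers who want to check skewness on their own score distributions, a minimal sketch is given below. It uses the adjusted Fisher-Pearson sample skewness; the paper does not say which estimator produced the reported values, so the choice of formula is an assumption, and the scores are invented.

```python
from math import sqrt


def sample_skewness(values):
    """Adjusted Fisher-Pearson sample skewness (negative = long left tail)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three values")
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - m) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("zero variance")
    g1 = m3 / m2 ** 1.5
    # Small-sample adjustment of the moment-based coefficient.
    return g1 * sqrt(n * (n - 1)) / (n - 2)


# Invented scores: most candidates score high, one scores much lower.
scores = [90.0, 88.0, 87.0, 86.0, 85.0, 84.0, 83.0, 60.0, 89.0, 86.0]
skew = sample_skewness(scores)
symmetric = sample_skewness([1.0, 2.0, 3.0, 4.0, 5.0])
```

A ceiling effect of this kind, where most candidates cluster near the top of the scale with a few low scorers, is what produces a negative value.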

Descriptive statistics revealed that in the pre-entry exams, mean scores for SJT, MMI and CAAKT were 84.09%, 31.78% and 68.05%, respectively. Mean scores for the internal formative assessment ranged from 49.24% in CR to 62.4% in MCQ2. The mean scores for AKT, KFP and OSCE were 72.09%, 60.85% and 71.79%, respectively (online supplemental table 2). The first attempt pass rates for the RACGP exams were 86.9% for AKT, 76.6% for KFP and 92.6% for OSCE (data not shown). In addition, univariate analysis revealed that performance was not influenced by gender in the AKT (χ²=1.940, df=1, p=0.167) and OSCE (χ²=0.391, df=1, p=0.532) exams. However, females obtained significantly higher scores than their male counterparts in the KFP exam (χ²=5.225, df=1, p=0.026). A similar trend was observed with origin (AMG/international medical graduates (IMG)), whereby performance in the AKT (χ²=0.391, df=1, p=0.530) and OSCE (χ²=0.071, df=1, p=0.817) exams was not influenced, but AMGs obtained significantly higher scores than IMGs in the KFP exam (χ²=11.990, df=1, p=0.001).

Table 1 presents the correlation coefficients between the pre-entry exams (SJT, MMI and CAAKT) and the college exams. Results showed that performance on SJT and MMI was significantly and positively correlated with performance on all college assessments (r values ranging from 0.13 to 0.31; p<0.05 to p<0.01). However, these correlations were weak, with MMI having the lowest correlation with KFP (r=0.13, p<0.05). On the other hand, CAAKT was only significantly correlated with performance on OSCE (r=0.47, p<0.01). Overall, SJT showed a stronger correlation with the college exams compared with the MMI. It is important to note that CAAKT and SJT did not yield any correlation value because they are similar.

Table 1

Pearson correlation coefficients between pre-entry and college exams

Table 2 presents the correlation coefficients between the internal formative examinations and the college exams. Overall, all internal exams except clinical skills were significantly correlated with the college exams. Of all assessments, MCQ1 and MCQ2 showed stronger correlations with all the college exams (r values ranging from 0.37 to 0.53; p<0.01) compared with SAQ and CR (r values ranging from 0.15 to 0.37; p<0.05 to p<0.01).

Table 2

Pearson correlation coefficients between internal formative and college exams

Multiple regression analysis was conducted to ascertain the extent to which pre-entry exams and internal formative exams significantly predict scores in each college exam. Gender and origin were included in the multiple linear regression analyses and were not significant predictors of performance in the college examinations. Table 3 shows the predictive relationship between the pre-entry examination scores and the college examinations. Pre-entry SJT was predictive of college AKT (p<0.001), KFP (p<0.001) and OSCE (p<0.001). MMI was predictive of college AKT (p=0.002) and OSCE (p=0.001) but was not predictive of KFP. It was found that 0.7% to 12.5% of the variation in the college exams could be explained by the pre-entry exams (SJT and MMI). Pre-entry CAAKT was predictive of OSCE (p=0.001) but not of AKT and KFP. When the predictive role of both CAAKT and MMI (in one regression model) was assessed, MMI was predictive of AKT (p=0.027) and KFP (p<0.001) but lost its predictive power with OSCE (data not shown), which differs from the model with SJT.

Table 3

Comparison of predictive abilities of pre-entry exam scores on college exam scores

The results presented in table 4 show the predictive relationship between the internal formative examination scores and the college examination. All independent variables (internal formative exam scores) were entered into the model at once to indicate how well they are able to predict each dependent variable (RACGP exam scores) and how much variance each of these independent variables explains in the dependent variable over and above the other independent variables adjusting for gender and origin. MCQ1 was predictive of the college AKT examination (p<0.001), KFP (p<0.001) and OSCE (p=0.003). Similarly, MCQ2 was also predictive of the college AKT (p<0.001), KFP (p<0.001) and OSCE (p=0.014). CR was predictive of KFP (p=0.003), but not AKT and OSCE. There was no significant relationship between SAQ and CS and the RACGP exams. Overall, the internal formative assessments accounted for 20.5% to 37.6% of the variance in performance in the RACGP exams.

Table 4

Comparison of predictive abilities of internal formative exam scores on college exam scores

Table 5 shows that failing MCQ1 was predictive of failing all RACGP exams, while failing MCQ2 was predictive of failing only AKT and KFP. Candidates who failed the MCQ1 exam were 2.3 times more likely to fail the AKT exam (p=0.033, OR=2.316, 95% CI 1.071 to 5.006). Participants who failed MCQ2 were 2.8 times more likely to fail the AKT exam (p=0.005, OR=2.831, 95% CI 1.369 to 5.856). Similarly, candidates who failed MCQ1 and MCQ2 were 4.0 times and 3.2 times more likely, respectively, to fail the KFP exam (p<0.001 for both). Participants who failed MCQ1 were 3.46 (95% CI 1.259 to 9.545) times more likely to fail the OSCE component (p=0.016); however, failure in MCQ2 was not predictive of failing the OSCE component.

Table 5

Failing internal formative assessment as predictors of failing each college examination
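The odds ratios, confidence intervals and p values reported for the logistic models are linked on the log scale, and a short worked example shows how. The sketch assumes the reported intervals are Wald intervals (log OR ± 1.96 × SE), which the paper does not state explicitly; the function name is hypothetical, and the inputs are the MCQ1/AKT figures quoted in the text.

```python
from math import erfc, exp, log, sqrt

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def wald_summary(odds_ratio, ci_low, ci_high):
    """Recover the log odds ratio, its standard error, the geometric midpoint
    of the CI and a two-sided Wald p value from a reported OR and 95% CI."""
    beta = log(odds_ratio)
    se = (log(ci_high) - log(ci_low)) / (2 * Z_95)
    # For a Wald interval the geometric midpoint should reproduce the OR.
    midpoint_or = exp((log(ci_low) + log(ci_high)) / 2)
    z = beta / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided normal tail probability
    return beta, se, midpoint_or, p_value


# Reported values for failing MCQ1 as a predictor of failing the AKT.
beta, se, midpoint_or, p_value = wald_summary(2.316, 1.071, 5.006)
```

Under that assumption, the geometric midpoint of the interval and the recomputed p value should land close to the reported OR of 2.316 and p=0.033, which is a quick consistency check that can be applied to any row of such a table.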


Discussion

This study aimed to assess the predictive role of the pre-entry SJT, MMI, CAAKT and the internally developed JCU GPT formative exams on trainees’ performance in the RACGP examinations (end-of-training assessment). The findings of the study showed that SJT and MMI were significantly correlated with performance in the end-of-training exams. However, only SJT was predictive of performance in all end-of-training exams, while MMI was predictive of AKT and OSCE but not KFP. CAAKT was correlated with and predictive of only the OSCE. Evaluation of the predictive abilities of the internally developed formative assessments showed that only the scores from the orientation MCQ1 and the end-of-semester MCQ2 were predictive of performance in all the RACGP exams. Obtaining a high score on the MCQ1 and MCQ2 predicts a high score and an increased probability of passing the RACGP exams. Failing the orientation MCQ1 was predictive of failing all RACGP examinations, while failing the MCQ2 was only predictive of failing the AKT and KFP components of the examinations.

Although SJT and MMI pre-entry assessments were correlated with the end of training examination, the correlations were comparatively lower than the internal formative assessment.20 In addition, the current pre-entry assessment (CAAKT) that replaced SJT was found to be less correlated with the AKT and KFP components of the RACGP college exams compared with the previous assessment (SJT). The pre-entry exams were also less predictive of College exam performance in comparison to the internal formative MCQ and CR exams. The significant correlation between the pre-entry exams and college exams may in part be related to similarities in construct.20 Patterson et al (2016) reported a significant correlation between clinical skills assessment and performance in OSCE.20 However, interviews have been shown to be inconsistent predictors of future performance.20 The poor predictive ability of the newly introduced CAAKT buttresses the fact that pre-entry assessments may not be the best predictors of performance in subsequent exams during training.11 Previous studies in undergraduate medical education have reported conflicting findings on the ability of interview/admission test scores to predict student academic performance during the undergraduate preclinical years and clinical years.26–28 While some authors reported that the admission tests were predictive of academic performance during the preclinical years,26 others stated that the test was not predictive of performance in preclinical and clinical years.27 28 There are suggestions that it might be more useful to focus on post-enrolment factors.21

The evidence from this study indicates that the in-training formative assessments may be useful programmatic ‘assessment for learning’ tools2 and also better predictors of academic performance in the summative assessment of the RACGP exams. While there are limited studies on the predictive role of in-training assessment in GP training, numerous studies in other specialities have identified the predictive role of in-training examinations in fellowship exams.29–31 The internal MCQs were designed as formative feedback tools to assist the trainees and trainers to assess mastery of content and competence while serving as a guidepost for the fellowship examination.32 Evidence from our study has enabled the JCU GPT programme to intervene early in providing additional support for trainees at risk of failing to progress through the training programme. Performance in the internal formative assessment has provided the most useful information and guidance for identifying trainees who may experience academic difficulty early in the programme.29 The earlier an individual at risk of failing the fellowship exam is identified, the more time the training programme has to intervene and provide additional support.29 Although the details are not included in this current study, those who failed the orientation MCQ1 had additional educational support programmes constructed for them, which enabled them to identify and remediate knowledge gaps before sitting the end-of-semester MCQ2 and subsequent fellowship examinations.

Furthermore, timing of in-training assessments is very important as they play a significant role in early identification of poorly performing trainees who are at risk of failing RACGP assessment. It is generally stated that the earlier the issue is identified and clarified, the earlier a process of assistance/academic support can be provided.29 Current evidence suggests that in-training assessments that are properly structured foster active learning.33 For these assessments to be effective, it is important for them to be ongoing activities rather than an end-of-training process. These in-training assessments can be used as academic markers to identify and provide support for struggling trainees, thereby providing the opportunity for identified gaps to be addressed through early support programmes.34

This study reflects the benefit of ongoing in-training formative assessments that assess both clinical and theoretical knowledge35 at one GP RTO; the findings may be applicable to other GP training organisations in Australia. In addition, other speciality training organisations may want to consider investigating the role of in-training formative assessment using MCQs, which have proven to be predictive of performance in fellowship examinations. As healthcare systems increasingly depend on service provided by junior or trainee doctors, early detection (and remediation) of poor performance is vital for both patient care and career progression of the individual doctor.36

Strengths and limitations

This is the first study to investigate the predictive role of both the pre-entry exams and in-training assessment on GP trainees’ performance in the end-of-training assessments at a GP RTO that uses a distributed training model. However, our study has some limitations that need to be taken into consideration. First, the study was conducted at only one RTO. Second, generalisation of the study findings to other settings may be limited by the retrospective nature of the study. Future studies could consider investigating the predictive value of the in-training exams over a longer period of time. Additionally, the effectiveness of the academic support provided to underperforming registrars could be further explored.


Conclusion

The findings of this study suggest that the internally developed formative MCQs are valuable assessment instruments which could assist in the early identification of GP trainees who are at risk of underperformance in their RACGP fellowship exams and subsequently foster early planning for appropriate supervision and additional support for this group of trainees.


The authors acknowledge all the James Cook University General Practice registrars who participated in the study. Ms Allison Bonner, Ms Kimberley Martinsen and Ms Kath Paton are appreciated for facilitating the data collection process.


Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.


  • Twitter @D_Faith, @bmalauaduli

  • Contributors BSMA, PH and PAT conceived and designed the study. BSMA, PH and PAT supervised the data collection. BSMA and FA conducted the data analysis and PH and PAT contributed to the interpretation of the data. BSMA, MA and FA wrote the first draft of the manuscript. All authors contributed to the writing and review of the manuscript and approved the final version.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests None declared.

  • Patient consent for publication Not required.

  • Ethics approval Ethics approval (H6771) for this research was granted by the James Cook University Research Ethics Committee.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data availability statement No data available.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.
