Chinese Obstetrics & Gynecology journal club: a randomised controlled trial
Ilene K Tsui (1,2), William C Dodson (1), Allen R Kunselman (3), Hongying Kuang (2), Feng-Juan Han (2), Richard S Legro (1), Xiao-Ke Wu (2)

  1. Department of Obstetrics and Gynecology, Penn State College of Medicine, Hershey, Pennsylvania, USA
  2. Department of Obstetrics and Gynecology, First Affiliated Hospital, Heilongjiang University of Chinese Medicine, Harbin, China
  3. Department of Public Health Sciences, Penn State College of Medicine, Hershey, Pennsylvania, USA

Correspondence to Professor Xiao-Ke Wu; xiaokewu2002{at}


Objectives To assess whether a journal club model could improve comprehension and written and spoken medical English in a population of Chinese medical professionals.

Setting and participants The study population consisted of 52 medical professionals who were residents or postgraduate master or PhD students in the Department of Obstetrics and Gynecology, Heilongjiang University of Chinese Medicine, China.

Intervention After a three-part baseline examination to assess medical English comprehension, participants were randomised to either (1) an intensive journal club treatment arm or (2) a self-study group. At the conclusion of the 8-week intervention participants (n=52) were re-tested with new questions.

Outcome measures The primary outcome was the change in score on a multiple choice examination. Secondary outcomes included change in scores on written and oral examinations which were modelled on the Test of English as a Foreign Language (TOEFL).

Results Both groups had improved scores on the multiple choice examination without a statistically significant difference between them (90% power). However, there was a statistically significant difference between the groups in mean improvement in scores for both written (95% CI 1.1 to 5.0; p=0.003) and spoken English (95% CI 0.06 to 3.7; p=0.04) favouring the journal club intervention.

Conclusions Interacting with colleagues and an English-speaking facilitator in a journal club improved both written and spoken medical English in Chinese medical professionals. Journal clubs may be suitable for use as a self-sustainable teaching model to improve fluency in medical English in foreign medical professionals.

Trial registration number NCT01844609.

Strengths and limitations of this study

  • The number of participants was limited, so the sample size was small.

  • The baseline questionnaire enquired about but did not quantify previous formal English language instruction.

  • The compliance rate of the self-study group was poor when providing written answers to questions.

  • Pre- and post-test examinations were modelled on standardised Test of English as a Foreign Language (TOEFL) examinations.

  • The appropriateness of using multiple choice tests modelled on US-based examinations to evaluate the medical knowledge of Chinese medical professionals is unclear.

Introduction

English is increasingly becoming the lingua franca of medicine. Most international medical conferences are held in English and the journals with the highest impact are published in English. However, many international research institutions have driven growth in participation in international meetings and publication output1 without necessarily offering sustainable solutions for academics with limited English language capabilities who compete to present at international meetings and publish in elite international journals, thus limiting global scholarship and exchange with non-native speakers.2 In Chinese higher education, for example, there is significant pressure on doctoral science students to publish in English language academic journals.3,4 However, despite the rapid growth in the number of articles by Chinese scientists in international publications,5 instruction on writing within specialist disciplines is still lacking and language remains a barrier for many students who wish to convey their discipline-specific concepts in English while avoiding plagiarism and the need for language editing.6

Consequently, there is an acute need for non-English-speaking medical professionals to develop their written and oral English communication skills so they can participate in these academic endeavours. Previous studies have suggested that it is easier to learn English if it is taught with a focus on a particular discipline rather than on overall language fluency.5 Therefore, the purpose of this randomised educational trial was to determine whether participation in a journal club based on articles and specifically designed materials freely accessible through the website of the journal Obstetrics & Gynecology improved comprehension and written and spoken medical English in a sample of Chinese medical professionals. Positive findings would suggest that foreign colleagues should engage with native English speakers and that academic collaboration and innovative methods for teaching English for a specific purpose (ESP) should be encouraged.

Materials and methods

The study population consisted of 52 medical professionals who were residents or postgraduate masters or PhD students at the Department of Obstetrics and Gynecology, Heilongjiang University of Chinese Medicine in Harbin, China, and who consented to participate in an 8-week educational intervention. Participants had limited experience with Western medicine. This randomised controlled trial with a parallel design was exempt from approval by the Institutional Review Board at the Pennsylvania State University College of Medicine (45 CFR 46.101(b)(1)) or by the review board of the host institution in China at The First Affiliated Hospital, Heilongjiang University of Chinese Medicine based on its classification as educational instruction and strategy research. All participants gave written informed consent with potential harms cited as possible stress from taking examinations or participating in a journal club. Test results were anonymised and performance was kept strictly confidential so as not to affect the students' professional reputations.

Participants were eligible if they were Chinese medical professionals specialising in gynaecology; in China the practice of obstetrics and gynaecology is split and we focused on gynaecology specialists in this study. The sole exclusion criterion was self-reported fluency in English. Consenting participants completed a baseline demographic questionnaire and were randomised to either (1) an intensive treatment arm with 24 journal club sessions led by a bilingual (English and Mandarin) medical student (IKT) from the USA over the course of 8 weeks or (2) a self-study arm with self-directed learning. One of the authors (ARK) developed the randomisation scheme to randomly assign participants to the intervention groups in a ratio of 1:1 which was unknown to the other authors or participants. Another author (IKT) matched an alphabetical student roster to this randomisation list 3 days before the first meeting of the journal club. No other characteristics about the students were known apart from their name. Randomisation was concealed from study participants until all had been assigned to an intervention group.
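The allocation step described above can be illustrated with a short sketch. The source states only that a 1:1 scheme was generated and then matched to an alphabetical roster; the permuted-block method, the block size of 4, the seed and the roster names below are all assumptions made for illustration, not the procedure the authors used.

```python
import random

ARMS = ("journal club", "self-study")

def permuted_block_allocation(n_participants, block_size=4, seed=2013):
    """Build a 1:1 allocation list from shuffled blocks.

    Block size and seed are hypothetical; the trial does not report them.
    """
    if block_size % 2:
        raise ValueError("block_size must be even for a 1:1 ratio")
    rng = random.Random(seed)  # fixed seed keeps the list reproducible
    allocation = []
    while len(allocation) < n_participants:
        block = list(ARMS) * (block_size // 2)
        rng.shuffle(block)
        allocation.extend(block)
    return allocation[:n_participants]

# Hypothetical roster; in the trial only names were known at this stage.
roster = sorted(["Zhang", "Li", "Wang", "Chen"])
assignments = dict(zip(roster, permuted_block_allocation(len(roster))))
```

Keeping the list generation (one person) separate from the roster matching (another person) is what allows the allocation sequence to stay concealed until assignment.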

Participants took a three-part baseline examination (multiple choice, written and oral) to assess medical English comprehension and expression; the test was modelled on standardised language examinations such as the Test of English as a Foreign Language (TOEFL). A second examination with a similar format but different content was conducted after the intervention. The first multiple choice test consisted of 15 questions with five possible answers each, adapted from the Association of Professors of Gynecology and Obstetrics (APGO) Undergraduate Web-Based Interactive Self-Evaluation (uWISE) practice examinations and was read aloud to all participants during a 1 h group session. Sample test questions are provided in the online supplementary file. Participants then selected one of five multiple choice answers and recorded their responses. Participants did not have access to the questions in a written format. Two additional open-ended questions were selected from the study guide of the Obstetrics & Gynecology journal club: one was read aloud to the whole group and students had 10 min to provide written responses, while the other was read privately to each student whose oral response was recorded. At the time of baseline testing, no articles had been discussed in the journal club, although the article selected for the oral examination was one of the first listed on the class syllabus that had been distributed in advance. Similarly, the article selected for the examination after the intervention was one of the last journal club articles listed, although students did not know in advance which one would be chosen. Test questions addressed vocabulary, grammatical competence, comprehension and verbal fluency.

Following the baseline examinations, both groups received a class syllabus with 24 selected gynaecological articles and sample questions from the Obstetrics & Gynecology journal club website. Articles covered 15 different gynaecological topics as identified by the APGO Medical Student Educational Objectives.7 Articles were selected based on website availability and student interest as perceived by the journal club facilitator (IKT). Students accessed all material independently from the Obstetrics & Gynecology website. The journal club participants attended intensive 2 h sessions every other day, which consisted of reading selections from the assigned article aloud and discussing questions from the website's study guide. The self-study group followed the same syllabus but did not attend classes. There were no restrictions on the use of translation software, nor was there an accurate way to monitor its use. As a measure of compliance, the self-study students were asked to submit written answers (which were not graded) to two questions from the study guide for each article by the day it was to be presented at the journal club. All data were collected at the host institution in Harbin, China. The journal club ran for 8 weeks from May through July 2013.

The primary outcome was the change in score from baseline to after the intervention on the multiple choice examination. Pre-specified secondary outcomes included change in score for the written and oral examinations. At study conclusion, two independent, masked evaluators (WCD and RSL) graded the written and oral examinations using a rubric adapted from the respective TOEFL examinations; consequently, feedback was not provided to the participants before the end of the study. The evaluators were blinded to the identity of the subject, group assignment, and whether the test they were grading was the baseline or end-of-study examination.

Each masked evaluator independently graded the written responses from 0 to 5 on language use and topic development for a total maximum score of 10. Written responses were presented and evaluated in random order and scores were assigned to students based on their student ID number only. Spoken responses were graded from 0 to 4 on delivery, language use and topic development for a total maximum score of 12. Masked evaluators assigned scores to students based on their spoken student ID number only; recorded responses were presented in random order and not segregated by treatment group. A higher score indicated better comprehension and fluency of written or spoken English.

Before study initiation, a difference in the means of three points between the two groups was judged an educationally meaningful difference based on a 15-point examination. Further, we assumed the SD would be three points. Based on these assumptions, a sample size of 23 participants per group provided 90% power to detect a difference of three points between the two groups using a two-sided test having a significance level of 0.05. However, we anticipated a 10% participant attrition rate and so the total sample size was increased to 52 participants.
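The arithmetic behind this sample size can be sketched as follows. This is a normal-approximation calculation, not necessarily the procedure the authors ran; the function names are hypothetical. With a three-point difference, an SD of three, two-sided alpha of 0.05 and 90% power, the approximation gives 22 per group, one fewer than the 23 reported, which would be consistent with the reported figure coming from a t-distribution-based calculation (an assumption, since the method is not stated). Inflating 46 participants for 10% attrition gives the reported total of 52.

```python
from math import ceil
from statistics import NormalDist

def n_per_group_normal(delta, sd, alpha=0.05, power=0.90):
    """Per-group n for a two-sided comparison of two means.

    Uses the normal approximation n = 2 * ((z_alpha + z_beta) * sd / delta)^2,
    which slightly understates the n required by a t-test.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for two-sided alpha
    z_beta = z.inv_cdf(power)           # quantile matching the target power
    return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)

def inflate_for_attrition(total_n, attrition):
    """Enrolment needed so that total_n remain after the expected attrition."""
    return ceil(total_n / (1 - attrition))

approx_n = n_per_group_normal(delta=3, sd=3)
total_enrolled = inflate_for_attrition(2 * 23, attrition=0.10)  # 23 per group as reported
```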

Linear mixed-effects models were used to assess differences between and within groups regarding the primary outcome (change in multiple choice scores) and secondary outcomes (change in writing and speaking scores). Linear mixed-effects models are an extension of regression models that account for the within-subject correlation inherent in longitudinal studies. Inter-rater reliability between the two independent evaluators for the writing and speaking examinations was assessed using the weighted kappa statistic. All hypothesis tests were two-sided and all analyses were performed using SAS software V.9.3 (SAS Institute, Cary, North Carolina, USA).
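For readers unfamiliar with the weighted kappa statistic, a minimal sketch in plain Python is given here. The analyses in the study were run in SAS; the weighting scheme (linear versus quadratic) is not stated in the text, so linear weights are an assumption, and the ratings below are invented purely for illustration.

```python
def weighted_kappa(rater_a, rater_b, categories, weights="linear"):
    """Cohen's weighted kappa for two raters scoring the same items.

    Disagreement weights grow with the distance between ordered categories:
    linear uses |i - j| / (k - 1), quadratic uses its square.
    Assumes at least two categories and some variation in the ratings.
    """
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Observed joint proportions
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1.0 / n

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def w(i, j):
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    pairs = [(i, j) for i in range(k) for j in range(k)]
    disagreement_observed = sum(w(i, j) * observed[i][j] for i, j in pairs)
    disagreement_expected = sum(w(i, j) * row[i] * col[j] for i, j in pairs)
    return 1.0 - disagreement_observed / disagreement_expected

# Hypothetical ratings on a 0-5 scale from two masked evaluators
scores_a = [3, 4, 2, 5, 1, 3, 4, 2]
scores_b = [3, 3, 2, 5, 2, 3, 4, 1]
kappa = weighted_kappa(scores_a, scores_b, categories=range(6))
```

Unlike simple percent agreement, the weighted form gives partial credit when the two evaluators differ by only one point on the rubric, which suits ordered scores of this kind.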

Results

As shown in figure 1, 52 Chinese medical professionals participated in the study with 46 completing all sections. Participants were recruited from March 2013 to May 2013 from a pool of 60 students at the host institution. Six participants failed to complete the study (four in the journal club group and two in the self-study group) for an 11.5% attrition rate. Participants were lost to follow-up or were unable to complete the course and attend the final day of testing due to conflicting professional duties. Compliance for the self-study group, as measured by the submission of answers to two study guide questions per article, dropped from 100% in the first week, to 60% (15 out of 25) by mid-study and to 20% (5 out of 25) by study conclusion at 8 weeks. In comparison, attendance for the journal club group dropped from 100% in the first week, to 96% (26 out of 27) by mid-study and to 77% (20 out of 27) by study conclusion. The facilitator (IKT) sent email reminders directly to students and also asked attending physicians to encourage student participation. All students completed the final test regardless of compliance level and their results were included in the final analyses.

The baseline characteristics of the two cohorts show similar levels of self-reported English proficiency, as well as other demographic characteristics including age, highest degree conferred and years of formal English instruction (table 1). Of note, the vast majority of study participants were women, reflecting the fact that culturally, practising obstetricians and gynaecologists in China are predominantly female. The mean number of correct multiple choice responses increased in both groups, but there was no statistically significant difference between them (table 2). However, there was a statistically significant difference between groups regarding the mean written and speaking scores (table 2). For the self-study and journal club groups, respectively, the mean written scores (SD) were 5.52 (2.36) and 4.72 (3.10) at baseline, and 4.72 (2.32) and 6.98 (2.20) after the intervention, while mean speaking scores (SD) were 5.33 (2.37) and 5.63 (2.36) at baseline, and 4.89 (2.70) and 7.11 (2.16) after the intervention.
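As a rough check on how the between-group contrast arises from these means, the sketch below subtracts the self-study change from the journal club change. It uses only the means reported in the paragraph above and is not the linear mixed-effects analysis; because some participants did not complete the final test, the result is only an approximation of the modelled estimate. The crude differences come to about 3.06 points for writing and 1.92 points for speaking, both within the confidence intervals given in the abstract (1.1 to 5.0 and 0.06 to 3.7).

```python
# (baseline mean, post-intervention mean) as reported in the text
reported_means = {
    "written": {"self-study": (5.52, 4.72), "journal club": (4.72, 6.98)},
    "spoken": {"self-study": (5.33, 4.89), "journal club": (5.63, 7.11)},
}

def difference_in_mean_change(skill):
    """Journal club change minus self-study change for one skill."""
    changes = {
        arm: post - pre
        for arm, (pre, post) in reported_means[skill].items()
    }
    return changes["journal club"] - changes["self-study"]

written_contrast = difference_in_mean_change("written")
spoken_contrast = difference_in_mean_change("spoken")
```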

Table 1

Baseline demographic characteristics of both cohorts self-reported before study randomisation

Table 2

Mean differences in scores between the pre- and post-intervention examinations in the self-study and journal club groups

There was also a statistically significant improvement in the journal club group across all three language competencies (table 3). There was a statistically significant improvement in the self-directed group on the multiple choice examination, but not for the writing or speaking components.

Table 3

Mean differences in scores between the pre- and post-intervention examinations in the self-study and journal club groups

Table 4 lists the articles selected from the journal Obstetrics & Gynecology. Articles covered 15 different gynaecological topics as identified by the APGO Medical Student Educational Objectives.

Table 4

List of articles selected from the journal Obstetrics & Gynecology

For this study, the inter-rater reliability of the two independent raters for evaluating pre- and post-examination written scores had weighted kappa values of 0.67 (95% CI 0.55 to 0.79) and 0.71 (95% CI 0.62 to 0.81), respectively. Weighted kappa scores for pre- and post-examination speaking scores were slightly lower at 0.58 (95% CI 0.45 to 0.72) and 0.57 (95% CI 0.42 to 0.72), respectively.

Discussion

Our study results indicate that participation in a journal club significantly and selectively improves the written and spoken medical English of Chinese obstetrics and gynaecology health professionals, as shown by a significant improvement in scores on examinations modelled on the TOEFL. This suggests that holding frequent journal clubs may be one method to increase English comprehension and speaking skills in foreign medical professionals. However, other factors such as the students’ concurrent clinical training may also play a role in individual content-specific test performance.

One objective of our study was to determine if participating in a journal club would improve an individual's knowledge base and comprehension over independently reading journal articles. A large study by the Royal College of Physicians and Surgeons of Canada demonstrated that reading the medical literature best stimulated self-directed learning activity and likely resulted in changed practice patterns, despite available educational seminars and opportunities for group discussion with peers, as in a journal club setting.32 On the other hand, a randomised controlled trial suggested that surgeons who participated in an internet-based journal club improved their critical appraisal skills more than a control group who only read clinical articles, possibly due to the lack of accountability in self-directed learning.33 As regards writing skills, a few studies have identified strategies employed by non-native speakers writing for English publications including using mentoring services provided by journals, attending writers’ workshops provided by professional societies, recruiting visiting scholars or commissioning paid editors.2,5 Although these solutions are helpful, there is still a need for students themselves to develop transferable and sustainable writing skills that are adapted to the local context and academic requirements. Some of the continuing difficulties in language acquisition in Chinese higher education are due to the fact that historically there has been a division between science/technology teaching and English language teaching, further limiting opportunities for collaboration. A journal club as a vehicle for language acquisition seeks to combine both disciplines.

Another aim was to quantify differences in comprehension and in written and spoken English between the two groups as assessed by multiple choice and modified TOEFL tests. While studies have described the journal club's effectiveness in teaching critical appraisal as measured by subjective self-assessments or self-created pre- and post-tests,34–37 little research has evaluated the journal club method for specifically improving written and spoken comprehension of medical English. Further, a literature review found no randomised trials quantifying the impact of journal clubs used as a tool for teaching medical English and improving oral and written comprehension in non-English speakers, although a commentary has explored the benefits of and barriers to organising journal clubs in developing countries.38

The strengths of this study include its randomised design, its reproducible model, the use of objective article study guides from the Obstetrics & Gynecology website, and the standardised TOEFL grading rubric. The breadth of articles provided an appropriate and broad academic context for health professionals to learn both medical vocabulary and grammar. Additionally, both pre- and post-intervention examinations were adapted from uWISE, a professional question bank used by some medical students to prepare for the National Board of Medical Examiners (NBME) examination in obstetrics and gynaecology. The grading rubrics for both the written and speaking portions were adapted from the respective TOEFL rubrics with comparable score reliability estimates. Weighted kappa scores for the pre- and post-intervention written scores were 0.67 and 0.71, respectively, and 0.58 and 0.57, respectively, for the pre- and post-intervention speaking scores. Score reliability estimates for the TOEFL writing and speaking examinations are comparable at 0.74 (SEM 2.76) and 0.88 (SEM 1.62), respectively.39 The preferred TOEFL kappa value between automated and human scoring is 0.70, which represents the threshold at which the signal outweighs the noise in prediction.40

One limitation of our study is that it is unclear whether a multiple choice test is appropriate for evaluating medical knowledge acquisition and language comprehension. Since the clinical question stems are modelled on US-based examinations that test knowledge of guidelines and treatment, they may not have been an appropriate test vehicle for a population of Chinese medical professionals with limited education in Western medicine. These participants have an undergraduate background in Traditional Chinese Medicine (TCM), with a curriculum that is 40% based on Western medicine. Although this strengthens the integration of Eastern and Western medicine, it may have limited the efficacy of our examinations. An additional limitation is the lower compliance rate of the self-directed control group compared to attendance at the journal club.

The Obstetrics & Gynecology journal club may provide an efficient vehicle for learning both written and spoken English and acquiring content-specific medical knowledge. Further research should assess the effect of native English-speaking journal club facilitators on medical English improvement, as this may be a more sustainable model with potentially greater reproducibility than utilising bilingual US professionals. Future research could also focus on using the journal club model to teach manuscript preparation for obstetrics and gynaecology articles for English language medical journals and more broadly, to also evaluate the effect of interactive educational activities on learning outcomes in professional contexts.



  • Based on a poster presentation at the 62nd Annual Meeting of the American Congress of Obstetricians and Gynecologists, Chicago, Illinois, USA, 24–28 April 2014.

  • Contributors IKT designed the study, prepared data collection tools and surveys, conducted the journal club, collected data, and drafted and revised the paper. WCD designed the study, served as a blind evaluator of data, and drafted and revised the paper. ARK wrote the statistical analysis plan, cleaned and analysed the data, and revised the paper. HK and FJH were international contacts in the setting where the research was conducted and helped to enrol participants and implement the journal club. RSL designed the study, served as a blind evaluator of data, and drafted and revised the paper. He is guarantor. XW designed the study, served as an international contact at the research setting and helped to enrol participants in the journal club.

  • Funding (1) 2011 Annual Project (No. 2011TD007) of Excellent Innovation Talents by Heilongjiang Province Universities; (2) 2013 Annual Funding (No. 19, 25) for Postgraduate Academic Exchange (Summer School and Innovative Knowledge Competition) in Heilongjiang Province Universities; and (3) Thousand Talents Program, University of Chinese Academy of Sciences.

  • Competing interests None declared.

  • Patient consent Obtained.

  • Ethics approval The First Affiliated Hospital, Heilongjiang University of Chinese Medicine approved this study.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement Additional data can be accessed via the Dryad data repository with the doi:10.5061/dryad.75qh4
