Original research
Gap between real-world data and clinical research within hospitals in China: a qualitative study
Feifei Jin (1), Chen Yao (1,2), Xiaoyan Yan (2), Chongya Dong (1), Junkai Lai (1), Li Li (3), Bin Wang (1), Yao Tan (1), Sainan Zhu (1)

  1. Department of Biostatistics, Peking University First Hospital, Beijing, Beijing, China
  2. Peking University Clinical Research Institute, Beijing, Beijing, China
  3. First Teaching Hospital of Tianjin University of Traditional Chinese Medicine, Tianjin, Tianjin, China

Correspondence to Dr Chen Yao; yaochen{at}

Abstract

Objective To investigate the gap between real-world data and clinical research initiated by doctors in China, explore the potential reasons for this gap and collect different stakeholders’ suggestions.

Design This qualitative study involved three types of hospital personnel based on three interview outlines. The data analysis was performed using the constructivist grounded theory analysis process.

Setting Six tertiary hospitals (three general hospitals and three specialised hospitals) in Beijing, China, were included.

Participants In total, 42 doctors from 12 departments, 5 information technology managers and 4 clinical managers were interviewed through stratified purposive sampling.

Results Electronic medical record data cannot be directly downloaded into clinical research files, which is a major problem in China. The lack of data interoperability, unstructured electronic medical record data and concerns regarding data security create a gap between real-world data and research data. Updating hospital information systems, promoting data standards and establishing an independent clinical research platform may be feasible suggestions for solving the current problems.

Conclusions Determining the causes of gaps and targeted solutions could contribute to the development of clinical research in China. This research suggests that updating the hospital information system, promoting data standards and establishing a clinical research platform could promote the use of real-world data in the future.

  • qualitative research
  • health informatics
  • health policy

Data availability statement

Data are available upon reasonable request. Study protocol and original data are available on request by emailing the corresponding author.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:

Strengths and limitations of this study

  • A qualitative approach and stratified purposive sampling method are used in this study to explore the perceptions of people in different roles regarding the gaps between real-world data and clinical research initiated by doctors in China and ensure diversity among the participants.

  • The findings of this exploratory analysis should be viewed as preliminary given the limitations of the methodology.

  • The participants were recruited from six hospitals affiliated with a university, and the sample size was limited.

  • The cultural background and experience of the authors may influence the interpretation of the data.

Introduction

Real-world data (RWD), which are data related to patient health status and the delivery of healthcare, are routinely collected from various sources, such as electronic health records, and can be used to support many types of clinical research.1–4 In China, electronic medical records (EMRs) are the most commonly available RWD sources for doctors. EMR data could be used to evaluate clinical practice and discover which care protocols are the most successful, allowing such information to be disseminated to practitioners, who would then have evidence concerning how to quickly improve care and care outcomes.5 6 Given the value of EMR, many clinical studies using routinely collected data have been conducted worldwide.7–9 However, a gap remains between raw EMR data and qualified data that can be used for clinical research in China and other countries.10–12

In recent years, the global information technology industry has rapidly developed. The Chinese government actively promotes the development of information technology infrastructure in clinical research.13 14 Currently, over 90% of hospitals in China have advanced EMRs,10 15 and electronic data collection (EDC) is commonly used to improve the quality and speed of clinical research. The potential of EMR data is that these data can be easily accessed within hospitals and directly downloaded into clinical research files. However, in most countries, including China, the current problem is that EMRs generally constitute a separate system that is not directly connected to EDC systems,11 12 leading to the repeated manual transcription of EMR data into the EDC system during clinical research. This redundant process leads to lower efficiency due to manual transcription, poor data quality due to human error and difficult source data verification due to unconnected systems.16

The causes of this gap may involve complicated factors that are not commonly explored in qualitative studies. Some scholars report barriers to EMR data or big data usage by primary doctors or in specific disease areas, including inconvenient systems, poor data quality and technical limitations.17–19 Many countries are actively taking measures to improve the use of EMR data, such as promoting the use of data standards, increasing clinicians' awareness and developing new information technologies.11 12 20 21 Due to cultural differences, the factors that hinder the use of EMR data in hospitals in China are worth exploring. Therefore, our study investigates the gap between RWD and clinical research initiated by doctors in China, explores the potential reasons for this gap and collects different stakeholders’ suggestions.


Methods

Qualitative research allows us to understand the participants’ experiences.22 Constructivist grounded theory (CGT) provides an in-depth and holistic understanding of experience for theory construction by considering sociocultural contexts. This theory contends that researchers are intrinsically a part of a study and do not discover theory but rather construct it through interaction and interpretation with the participants.23 24 Exploring why EMR data cannot be directly downloaded into clinical research files is the concern of this study. Therefore, a qualitative research strategy guided by CGT was employed.

The research team conducted in-depth interviews and focus groups in six tertiary hospitals affiliated with Peking University Health Science Center, which is representative of high-tier research hospitals in China. Among the six hospitals, three were general hospitals and three were specialised hospitals. The interviews were conducted between July and August 2019. The study is reported in accordance with the Consolidated Criteria for Reporting Qualitative Research (COREQ) guidelines.25

Participants

We intended to interview the following three types of hospital personnel: doctors, information technology managers and clinical managers. Doctors who do not conduct clinical research may not be able to provide useful information. Therefore, the doctors included in this study had clinical research experience and could express their views regarding the subject of this study. The doctors were further divided into senior, intermediate and junior doctors according to their professional titles. The reasons for interviewing people with different roles within the hospital were to examine the issue from different perspectives and develop a comprehensive understanding.26 27 YC and ZSN contacted the hospitals to arrange the interviews. A stratified purposive sampling method was used to select representatives of different roles in each hospital.28 29 The recruitment of interviewees ended when the study reached information saturation, which occurred when the interviews with the participants in each role no longer generated new coding information.30

The inclusion and exclusion criteria for the interviewees were as follows:

Inclusion criteria

  1. Formal staff of the hospital,

  2. Consent to record interviews and voluntary signing of informed consent, and

  3. Clinical research experience (doctor-specific).

Exclusion criteria

  1. Inability to provide at least 15 min of interview time.

Data collection

Face-to-face, semi-structured interviews were conducted.31–33 Depending on the doctors’ time and willingness, a focus group interview was arranged instead of one-on-one interviews to promote discussion and communication.34 In each interview, the interview time, location and basic information of the interviewees were collected. Three interview guides were designed for the different roles of the interviewees. We engaged in conversations with the interviewees regarding their issues to establish a broad description of the problem. During the interviews, we asked different questions based on the interviewee’s answers.

1) The interview guide for the interviews with the doctors was as follows:

  • ‘What type of clinical research do you frequently perform (for example, randomised trials, pragmatic clinical trials or observational studies)?’

  • ‘What ethical considerations do you usually have when you conduct research?’

  • ‘Please describe in detail how you collect data when conducting research.’

    • Why do you need to perform manual transcription? (If the respondents transcribe data manually)

    • How do you ensure the accuracy of the data?

    • How do you store and manage your source data?

  • ‘What, in your opinion, are the greatest barriers when you conduct clinical research?’

  • ‘What are your suggestions to better the process of conducting research?’

2) The interview guide for the interviews with the information technology staff was as follows:

  • ‘Please describe the process of data management in your hospital.’

  • ‘What are common requests to your department that doctors make with regard to clinical research?’

    • Why don’t you allow doctors to export data? (If the respondent mentions that the doctor wants data)

  • ‘What are your thoughts about doctors needing to manually transcribe EMR data to the clinical research database?’

  • ‘Do you have any suggestions to better doctors’ accessibility to data?’

3) The interview guide for the interviews with the clinical managers was as follows:

  • ‘Please describe the current state of clinical research in your department.’

  • ‘What are the main current obstacles to conducting clinical research?’

  • ‘Has your department employed external cooperation with information technology companies to accommodate your research needs?’

    • Do you have such a plan in the future? (If not)

  • ‘What are your suggestions for solving the problems preventing the efficient completion of clinical research, and what does the hospital need to do to help you?’

The interviewers were four postgraduate students. JFF and DCY were mainly responsible for the interviews, and TY and LHQ played supportive roles and were mainly responsible for the recordings. The interviewers were trained in a qualitative research course and had experience conducting interviews. The structured process of the interviews is presented in the online supplemental file 1.

Data analysis

All interviews and discussions were audio-recorded and transcribed verbatim by the two interviewers (TY and LHQ), and no private details of the respondents were recorded. Coding and memoing began in the first interview. The iterative process continued with data collection, coding and analysis, followed by further data collection and analysis until saturation was reached, which occurred when the last few interviews fit existing patterns and did not generate new ideas. A team of two trained qualitative researchers (JFF and WB) drew on the techniques of constructivist grounded theory while analysing the data.35–37 QSR NVivo V.12 software was used for coding. The team read all interviews and developed a structured coding tree that started with inductive open coding. Once the core categories emerged, deductive selective coding was performed. Open coding was performed independently by the two researchers, and the derived core categories were compared in multiple rounds of discussions until all members agreed.
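The saturation criterion described above can be made concrete with a small sketch. The Python below is illustrative only and is not the authors' procedure (the coding itself was done in NVivo). The function name `saturation_point`, the `window` parameter and the example codes are assumptions introduced here. It returns the last interview that contributed a new code once a run of `window` consecutive later interviews has contributed none.

```python
from __future__ import annotations

from typing import Iterable, Optional


def saturation_point(coded_interviews: Iterable[set[str]], window: int = 3) -> Optional[int]:
    """Return the 1-based index of the last interview that added a new code,
    once `window` consecutive later interviews have added none.

    Returns None if that condition is never met (saturation not reached).
    """
    seen: set[str] = set()
    last_new = 0  # index of the most recent interview that produced a new code
    for index, codes in enumerate(coded_interviews, start=1):
        if codes - seen:
            # At least one code has not been seen before.
            seen |= codes
            last_new = index
        elif last_new and index - last_new >= window:
            return last_new
    return None


# Hypothetical codes assigned to five consecutive interviews.
interviews = [
    {"manual transcription", "data security"},
    {"data standards"},
    {"manual transcription"},
    {"data security", "data standards"},
    {"manual transcription"},
]
print(saturation_point(interviews))
```

In practice the choice of `window` is a judgement call; the study describes saturation qualitatively ("the last few interviews") rather than with a fixed number.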

Patient and public involvement

There was no patient or public involvement in this research.

Results

In total, 52 respondents participated, and 51 respondents were included in the analysis because one record was lost due to a malfunction of the recording equipment (table 1). Thirty-nine in-depth interviews and four focus group interviews were conducted. In total, 42 doctors, 4 clinical managers and 5 staff from the information technology department were interviewed. In total, the following 12 speciality departments were included: trauma surgery, paediatrics, radiology, obstetrics and gynaecology, lymphoma, anaesthesiology, dermatology, general surgery, cardiology, stomatology, ophthalmology and oncology. The average age of the respondents was 36.55±8.23 years (range 24 to 58), and their average years of work experience was 11.84±10.5 years (range 1 to 46).

Table 1

Demographics of the participants

Gap between RWD and clinical research

The CGT framework generated from the three stages of coding and the 51 participants’ responses is summarised in the flow chart (figure 1). The potential advantage of EMR data is that they could be easily accessed within hospitals and directly downloaded into clinical research files. However, we found three problems that hinder this goal: a lack of data interoperability, unstructured EMR data and concerns regarding data security. Updating hospital information systems, promoting data standards and establishing an independent clinical research platform may be feasible suggestions for solving these problems.

Figure 1

Gap between real-world data and clinical research and its potential causes and suggestions. EMR, electronic medical record.

EMR data cannot be directly downloaded into research files. Most respondents noted that the most time-consuming and laborious part of the clinical research process is manual transcription. It is very difficult to query or export data from a hospital’s electronic information system, so documentation becomes a time-intensive activity that the respondents may not have time to perform. Doctors may need to first manually transcribe data from the EMRs into an Excel file and then enter the data again into the research files.

In fact, these materials still need a manual entry process. Because most materials are stored in PDF or picture format, some data must be manually exported. - Doctor 310/311/312


Lack of data interoperability: According to the respondents, different platforms or hospitals may use different data standards, and it is difficult to share data across information systems in general practice. The data stored in the EMR uses data coding and data storage schemes that are incompatible with the coding and storage formats of EDC systems. The lack of interoperability between different terminologies or coding schema used in information systems causes a great challenge in EMR data usage.

Different platforms or hospitals may use different data standards, so data cannot be used across settings. For me, the value of data from other systems is limited. - Intermediate doctor 114

Data security concerns: The information department staff reported that private patient-related data should not be exported to clinicians unless the clinicians adhere to the full protocol for obtaining access to data. These respondents believe that they will be held responsible for data leaks, and they do not know how doctors will use the data. Thus, EMRs and EDC systems cannot communicate. Strong security concerns generate complicated approval procedures for data access, leading speciality departments to use an external company platform to manage their data.

EMR contains the private information of patients, and we will not allow doctors to download the data because the hospital does not have policies. Therefore, clinicians can only collect and transcribe the data. - IT (information technology) 313

Unstructured EMR data: EMRs are designed for clinical practice rather than for clinical research. The doctors reported many problems when using hospital EMRs for clinical research data collection. Many laboratory and imaging test results are stored in the EMR as PDF or image files and must be manually transcribed.

The information recorded in the EMR includes pictures, such as ECG or laboratory examination. The results in these pictures cannot be directly presented in EMR. - Senior doctor 601


Updating hospital information systems: The EMR and other information systems in the hospital should be updated such that most data are entered as numeric data in clinical reports and other hospital reports. A senior doctor said,

The EMR system has defects and needs to be improved; otherwise, a large amount of data stored in the system is meaningless. - Senior doctor 401

Establishment of an independent clinical research platform: EMRs and EDC systems cannot communicate in many countries. Thus, doctors and information department staff both believe that an independent clinical research platform is necessary, and some hospitals have already begun to attempt to implement such a platform. The platform needs to fully consider data security, which is a top consideration for information technology department staff and all parties in the hospital:

A very good thing is that our hospital plans to build a platform for clinical research. Data from electronic medical records can be directly converted to the platform. Clinicians do not need to collect data repeatedly, and the data in the platform are deprived of private information to ensure data security. - IT 313

Promote data standards: The respondents reported that the source data should be stored in the EMR with unified data standards and should be transferable to external clinical research platforms. Standardisation should begin with implementing technology, such as optical character recognition software, that can both store paper records and transform them into EMR data.

The country continues to introduce new versions, and everyone uses different standards. It is very important to develop and promote data standards. - IT 116

Discussion

This study investigated the gap between RWD and clinical research and explored some of the underlying causes of this gap and possible recommendations. Qualitative interviews were conducted in six high-tier research hospitals. These hospitals have excellent doctors with a strong sense of research and high-quality research conditions.

Through interviews, we found that manual transcription is still a common phenomenon and that EMR data cannot be directly downloaded into research files, which is the most significant obstacle for doctors and is experienced in many other countries. Some researchers have found that the lack of awareness or usability of EMR capability, poor-quality EMR data, technical limitations and data security may be the reasons preventing EMR data from being applied in clinical research.17–19

In this study, the first reason identified was the lack of data interoperability. The different platforms used in different hospitals present a barrier to integrating the data needed for research because it is difficult to match variables across settings and link patient-level data. Thus, it is important to promote data standards. Many countries have made efforts to improve the interoperability of EMRs in different hospitals20 21 and to adopt the standards from the Clinical Data Interchange Standards Consortium.11 38 EDC systems and EMR systems should be revised so that clinical systems record data primarily in coded formats.
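To illustrate what "matching variables across settings" involves, the following Python sketch translates site-specific field names into one shared vocabulary. It is a minimal illustration, not a description of any system used in the study: the hospital names, local field codes and shared variable names are all hypothetical, and a real deployment would map to a published standard rather than to an ad hoc dictionary.

```python
from __future__ import annotations

# Hypothetical per-hospital dictionaries: local field name -> shared variable name.
# None of these codes come from the study or from a real data standard.
MAPPINGS = {
    "hospital_a": {"xhdb": "haemoglobin_g_per_l", "ssy": "systolic_bp_mmhg"},
    "hospital_b": {"HGB": "haemoglobin_g_per_l", "SBP": "systolic_bp_mmhg"},
}


def harmonise(site: str, record: dict) -> tuple[dict, list]:
    """Translate one local record into the shared variable names.

    Returns the harmonised record together with the local fields that had no
    mapping, so that unmapped data are reported rather than silently dropped.
    """
    mapping = MAPPINGS[site]
    harmonised, unmapped = {}, []
    for field, value in record.items():
        if field in mapping:
            harmonised[mapping[field]] = value
        else:
            unmapped.append(field)
    return harmonised, unmapped


record_a, missing_a = harmonise("hospital_a", {"xhdb": 132, "ssy": 118})
record_b, missing_b = harmonise("hospital_b", {"HGB": 140, "SBP": 125, "WBC": 6.1})
print(record_a, missing_a)
print(record_b, missing_b)
```

Once both sites emit the same variable names, their records can be pooled; the list of unmapped fields shows where the shared vocabulary still has gaps.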

The second reason is that EMR data are unstructured. EMRs are mainly used for clinical practice.39 40 EMR data are only stored in the system for a period, and then the data are exported from the EMR system to PDF files. Therefore, when a doctor wants to search for the information of a past patient, he or she can only view the data in PDF format. These file formats lead to extraneous efforts involving manual transcription and high lag times for data retrieval for any purpose other than patient care. Thus, an updated hospital information system is necessary: the EMR should be converted from narrative to codified data formats, and the EDC should be designed to use the same coding and storage formats as the EMR. Clinician notes should be entered as checklists or numbers in a clinical flow sheet. Artificial intelligence should also be used for unstructured text analysis. Many countries, such as South Korea,41 Japan38 and the USA,42 have attempted to upgrade their overall EMRs. For example, redesigning the EMR interface has rendered data extraction easier.19 Researchers have used image processing technology to change the storage of image data in EMR and obtained plain numerical data from files stored in PDF format.42 43 Text processing technology has been used to extract structured information from medical text.17 44 EMRs have also been combined with patient-generated health data to capture more information on patient health.41 45
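As a rough illustration of turning narrative notes into codified values, the Python sketch below pulls two vital signs out of free text with regular expressions. This is a deliberately simple rule-based example and should not be read as the text processing methods cited above. The note wording, the patterns and the output field names are assumptions made for the example.

```python
import re

# Hypothetical patterns for measurements written in free text, e.g. "BP 128/82, HR 76".
PATTERNS = {
    "systolic_bp_mmhg": re.compile(r"\bBP\s*(\d{2,3})\s*/\s*\d{2,3}\b", re.IGNORECASE),
    "diastolic_bp_mmhg": re.compile(r"\bBP\s*\d{2,3}\s*/\s*(\d{2,3})\b", re.IGNORECASE),
    "heart_rate_bpm": re.compile(r"\b(?:HR|heart rate)\s*(\d{2,3})\b", re.IGNORECASE),
}


def extract_vitals(note: str) -> dict:
    """Return one numeric value per pattern, or None when the note lacks it."""
    values = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(note)
        values[name] = int(match.group(1)) if match else None
    return values


print(extract_vitals("Patient stable. BP 128/82, HR 76, no chest pain."))
```

Rules like these break as soon as clinicians phrase a value differently, which is one reason the paragraph above argues for entering data as structured fields in the first place.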

Finally, regarding data security concerns, information technology departments are overly cautious regarding data. In 2017, the Chinese government issued ‘The Cybersecurity Law of the People’s Republic of China’ and ‘Regulations for the Application of Electronic Medical Records’.46 According to these documents, data sharing can only be performed when the safety of patients’ electronic data is ensured. An independent platform through which data can be securely accessed and managed by different key players in a hospital may be a feasible suggestion to drive communication and more efficient requests among the players.47–49 Such a platform could be a good way to integrate research-specific data and routinely collected data and de-identify the data to be used for statistical analyses to lower the risk of a data leak.50 51 According to the identity of the visitor, access authority to the platform should be granted at different levels.52 Implementing process management for project establishment, data security review, source data tracking and the publication of data results could greatly improve the efficiency, quality and traceability of research data.53
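The two safeguards mentioned here, de-identification and access levels tied to the visitor's identity, can be sketched in a few lines of Python. This is an assumption-laden illustration rather than the design of any platform described by the respondents: the field names, the roles and the fields each role may see are hypothetical. Note also that replacing an identifier with a keyed hash is pseudonymisation, which on its own does not guarantee that records cannot be re-identified.

```python
import hashlib
import hmac

# Assumed names of fields that identify a patient directly.
DIRECT_IDENTIFIERS = {"name", "national_id", "phone"}

# Hypothetical access levels: role -> fields that role may see.
ROLE_FIELDS = {
    "statistician": {"pseudo_id", "age", "diagnosis"},
    "data_manager": {"pseudo_id", "age", "diagnosis", "admission_date"},
}


def deidentify(record: dict, secret_key: bytes) -> dict:
    """Drop direct identifiers and replace patient_id with a keyed hash."""
    digest = hmac.new(secret_key, str(record["patient_id"]).encode(), hashlib.sha256)
    cleaned = {
        key: value
        for key, value in record.items()
        if key not in DIRECT_IDENTIFIERS and key != "patient_id"
    }
    cleaned["pseudo_id"] = digest.hexdigest()[:16]
    return cleaned


def view_for(role: str, record: dict) -> dict:
    """Return only the fields the given role may see; unknown roles see nothing."""
    allowed = ROLE_FIELDS.get(role, set())
    return {key: value for key, value in record.items() if key in allowed}


raw = {
    "patient_id": 1001,
    "name": "Example Patient",
    "phone": "000-0000",
    "age": 54,
    "diagnosis": "hypertension",
    "admission_date": "2019-07-15",
}
clean = deidentify(raw, secret_key=b"demo-key-not-for-production")
print(view_for("statistician", clean))
```

Because the hash is keyed, the same patient receives the same `pseudo_id` across exports, so records can still be linked for analysis while the key stays with the information department.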

Several limitations of this study warrant attention. The participants were recruited from six hospitals. Unselected departments and doctors may have different views, which could result in selection bias. To minimise selection bias, stratified purposive sampling methods were used, many clinical departments were included and information saturation was assumed to be achieved. The cultural background and experience of the authors may have influenced the interpretation of the data, although the interviewers had experience and training in conducting qualitative research.

Conclusion

This qualitative study investigated the gap between RWD and clinical research based on constructivist grounded theory. Doctors, information technology managers and clinical managers were interviewed. The potential advantage of EMR data is that they could be directly downloaded into clinical research files. However, the lack of data interoperability, unstructured EMR data and concerns regarding data security cause a gap between RWD and research data. These problems suggest that the benefits of EMR data storage for research are lost. Updating hospital information systems, promoting data standards and establishing an independent clinical research platform may be feasible suggestions for solving the current problems. In future research, we aim to explore and verify effective methods to solve these problems.

Ethics statements

Patient consent for publication

Ethics approval

Ethical approval was obtained from the Peking University Institutional Review Board (No. IRB00001052-19052).


We thank all individuals who took the time to participate in our interviews. We also thank Xueying Li and Meixia Shang for their help with recruitment and Xueyan Han for comments regarding the early drafts of this work.



  • Correction notice This article has been corrected since it first published. The provenance and peer review statement has been included.

  • Contributors JFF, DCY, YXY and YC designed the study. JFF, DCY and TY collected the data. ZSN and YC contacted the respondents. JFF and WB analysed the data. JFF and LJK wrote the first draft of the manuscript. LL revised the manuscript. All authors contributed to the interpretation of the data and editing of the manuscript and approved the final manuscript. YC had full access to all data in the study and had final responsibility for the decision to submit for publication.

  • Funding This study was supported by the National Science and Technology Major Project of China (grant no. 2017ZX09304028-002).

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.
