Barriers and facilitators to data quality of electronic health records used for clinical research in China: a qualitative study
  1. Kaiwen Ni,
  2. Hongling Chu,
  3. Lin Zeng,
  4. Nan Li,
  5. Yiming Zhao
  1. Research Center of Clinical Epidemiology, Peking University Third Hospital, Beijing, China
  1. Correspondence to Dr Yiming Zhao; yimingzhao115{at}


Objectives There is an increasing trend in the use of electronic health records (EHRs) for clinical research. However, more knowledge is needed on how to assure and improve data quality. This study aimed to explore healthcare professionals’ experiences and perceptions of barriers and facilitators of data quality of EHR-based studies in the Chinese context.

Setting Four tertiary hospitals in Beijing, China.

Participants Nineteen healthcare professionals with experience in using EHR data for clinical research participated in the study.

Methods A qualitative study based on face-to-face semistructured interviews was conducted from March to July 2018. The interviews were audiorecorded and transcribed verbatim. Data analysis was performed using the inductive thematic analysis approach.

Results The main themes included factors related to healthcare systems, clinical documentation, EHR systems and researchers. The perceived barriers to data quality included heavy workload, staff rotations, lack of detailed information for specific research, variations in terminology, limited retrieval capabilities, large amounts of unstructured data, challenges with patient identification and matching, problems with data extraction and unfamiliarity with data quality assessment. To improve data quality, participants suggested better staff training, providing monetary incentives, performing daily data verification, improving software functionality and coding structures, and enhancing multidisciplinary cooperation.

Conclusions These results provide a basis to begin to address current barriers and ultimately to improve validity and generalisability of research findings in China.

  • electronic health records
  • data quality
  • clinical research
  • qualitative study

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:

Strengths and limitations of this study

  • The qualitative approach used in this study allowed for in-depth exploration of healthcare professionals’ experiences and perceptions of barriers and facilitators to data quality of electronic health records used for clinical research in China.

  • Purposive sampling was used to select potential participants in our study. Age, gender, education background and medical specialties were taken into account in the recruitment process in order to ensure diversity of participants.

  • The participants were recruited from four hospitals in Beijing, and the generalisability of the findings may be limited because of the small sample size.

  • The cultural background and experience of the authors may influence the interpretation of the data.


China has lagged behind in the adoption of electronic health records (EHRs) in healthcare institutions but the use of EHR systems is now rapidly growing as a result of the New Medical Reform.1 2 The implementation of health information technology, especially EHRs, has been identified as a primary focus for this New Medical Reform, with a large potential impact on modern hospital management and care quality.3 4 The widespread adoption of EHRs has made it easier for researchers to access and aggregate longitudinal patient data. There are significant opportunities to use EHR data for research purposes, including estimation of disease incidence or prevalence, comparative effectiveness studies and safety surveillance.5–8

Although randomised controlled trials (RCTs) are recognised as the ‘gold standard’ to assess the efficacy of interventions and treatments, they are costly, time-consuming and do not reflect real clinical practice.9 The data from EHR-based research are collected under real-world circumstances, which improves representativeness and reduces cost and effort.10 Evidence generated from EHR-based studies can complement that of RCTs and may provide important information in real-world settings where RCTs are not feasible.10 11 However, a major concern about EHR-based studies is the quality of data. As EHRs are used for patient care rather than specifically for clinical research, data quality may not meet research standards, and this may have a negative impact on the validity of research findings.12 13

A key concept in the field of data quality is ‘fitness for use’. Specifically, the same database may be considered poor quality for one purpose but high quality for another, which means data quality should be assessed within the context of the intended use.14 15 As this study concerns clinical research, data quality in this paper refers to the fitness for use of EHR data for a specific research purpose. Poor data quality may have various origins and can arise at any stage of the data flow, including recording, migration, extraction and translation.16 Previous studies have identified some factors that influence data quality, particularly concerning clinical documentation and coding in EHRs.17 18 Although the quality of EHR data has been assessed in other countries, few Chinese studies exist.19

The use of EHRs for clinical research should take into account the contextual features that might influence data quality in China. Hundreds of vendors, including software companies and research institutions, provide EHR systems for hospitals, resulting in only partial data standardisation.20 21 Additionally, the difference between the Chinese and English languages makes some existing extraction methods for dealing with unstructured free text inapplicable to Chinese EHRs.22 In light of the increased use of EHRs for clinical research, more knowledge is needed on how to assure and improve data quality. The aim of this study was to explore healthcare professionals’ experiences and perceptions of barriers and facilitators to data quality of EHRs used for clinical research in China.


Study design

The research is reported according to the Consolidated Criteria for Reporting Qualitative Research guidelines.23 A qualitative study was conducted at four tertiary hospitals in Beijing with between 970 and 1500 beds from March to July 2018. These four hospitals provide medical services across all medical specialisations. EHR systems in these hospitals are provided by four different vendors and do not have research-specific functionality. This study was based on interviews with healthcare professionals at the four hospitals in order to achieve varying perspectives of data quality of EHRs used for clinical research. Each participant was informed about the project and was free to withdraw from the study at any time.

Participant recruitment

Purposive sampling was used in this study. Healthcare professionals meeting the following criteria were eligible for the study: having at least 3 years of work experience; participating in the use of EHRs for clinical research, including recording, extraction and translation; having experience in publishing clinical research based on EHR data or in reporting relevant research at conferences; and being willing to participate in this study. Age, gender, education background and medical specialties were taken into account in the recruitment process in order to ensure diversity of participants.24 Potential participants were identified by the researchers through meetings and expert recommendations. The individuals targeted for enrolment were contacted by email or WeChat, a popular social network application in China that is similar in function to Twitter.

Data collection

Face-to-face semistructured interviews were conducted by two research assistants (KN and HC), a male and a female, each with a PhD. Both interviewers received training on interview methods and have experience in conducting qualitative research. The interviewers had no relationship with the participants prior to recruitment. Participants did not know the interviewers’ contact details or the study purpose before the study. The semistructured interview guide was constructed to begin with a general question regarding participants’ experience of using EHR data for clinical research. Participants were asked to briefly describe one or two studies using data extracted from EHRs in which they had been involved. They were then asked how they extracted and managed data from EHR systems. Finally, participants were asked two key questions: ‘In your opinion, what were barriers to data quality of EHR data used for your studies?’ and ‘Please provide some suggestions to improve data quality of EHRs used for clinical research.’ Each interview lasted between 25 and 65 min and was conducted in the participant’s office or a meeting room. Field notes were also taken. No repeat interviews were conducted and transcripts were not returned to participants for comments.

Data analysis

The anonymised audio recordings were transcribed verbatim by an external transcription company. QSR NVivo V.10 was used to assist data management and analysis. The transcripts were analysed using inductive thematic analysis.25 The analysis consisted of six phases: familiarisation, generating initial codes, searching for themes, reviewing themes, defining and naming themes and producing the final results. To ensure reliability and validity, KN and HC independently coded the transcripts and met regularly to discuss the identification of themes. Interviews and the review of transcripts continued until no new themes emerged, at which point we determined theoretical saturation had been achieved. Final themes were reviewed by all research team members in a discussion meeting to confirm the interpretation of data.

Patient and public involvement

No patients or members of the public were involved in the design and implementation of this study. The study protocol and results are available on request.


Characteristics of study participants

Nineteen healthcare professionals were interviewed in the study. Two potential participants refused to take part due to heavy workload. The sample characteristics of the study are shown in table 1. There were seven respondents in one of the hospitals, and four respondents in each of the other three hospitals. Respondents were from a broad range of medical specialties, including urology, radiology, haematology, paediatrics, orthopaedics, neurology, pneumology, geriatrics, pharmacy and general surgery. The range of clinical research undertaken by respondents included quantifying the value of certain diagnostic tests, identification of disease risk factors and assessment of outcomes of interventions.

Table 1

Demographics of the participants

Barriers to data quality

Factors that influenced the data quality of EHR-based clinical research were identified and organised into four themes: healthcare systems, clinical documentation, EHR systems and researchers. Table 2 shows a detailed list of themes and subthemes of the perceived barriers. Illustrative quotes are in table 3.

Table 2

Overview of key themes and subthemes

Table 3

Quotes related to barriers and facilitators of data quality of EHR-based clinical research

Healthcare systems

Heavy workload

Most participants mentioned that heavy workload would affect the quality of clinical documentation in EHR systems. They pointed out that it takes too long to input each item carefully into EHRs, which seems unlikely to ensure consistent data quality in a high-intensity work environment. For example, one orthopaedic doctor noted, ‘One day, you have to take charge of three or four new inpatients, you have to go to surgery, and then you have to do some of your own things, so the quality of EHR data can be affected.’ One haematologist said, ‘There are about 70 patients in a morning. It is time-consuming for us to record patient information into the EHR system.’

Staff rotations

Some participants indicated that staff rotations were an important barrier to the data quality of EHR-based studies. Teaching hospitals are responsible for training medical students and clinicians from other hospitals in rotations in China. One orthopaedist mentioned, ‘Many healthcare workers like me are just becoming familiar with the tasks of one department and then are transferred to the next department.’ One participant noted, ‘There are some routine behavior questionnaires for each new inpatient and are usually completed by medical students or visiting clinicians, but these healthcare workers are often changed. Staff rotations have a negative impact on quality of EHR data.’

Clinical documentation

Lack of detailed information for specific research

EHRs are optimised for the storage of information needed for patient care but may not be sufficient for specific clinical research. One general surgeon said, ‘Given the study requires laboratory values of one day, three days and five days after surgery, these data are not routinely recorded in EHRs.’ Some participants noted that EHR data do not contain as much detail about a patient’s family history or medical history as data collected prospectively for studies.

Variations in terminology

Almost all participants pointed out that the lack of data standardisation is a key barrier to data quality. For efficiency, healthcare workers may use shortened terminologies or acronyms to refer to diagnoses and interventions for clinical uses. One pharmacist said, ‘This drug we studied is very common in hospitals, but different physicians are accustomed to using different terms and formats to indicate the drug name in EHRs.’ One gerontologist said, ‘The use of International Classification of Diseases is non-standard and field definitions are often modified for clinical purposes by clinicians. This meets the needs of routine clinical records, but partially affects data standardization and has challenges for research purposes.’
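The pharmacist's point about inconsistent drug names can be illustrated with a minimal sketch of a synonym lookup that maps recorded variants to one canonical term before analysis. This is not a method reported by the participants; the synonym table, the function name `normalise_drug_name` and the sample entries are all hypothetical and chosen only for illustration.

```python
from typing import Optional

# Hypothetical synonym table: every known variant maps to one canonical name.
# The entries are illustrative and are not taken from the study hospitals.
SYNONYMS = {
    "aspirin": "aspirin",
    "asa": "aspirin",
    "acetylsalicylic acid": "aspirin",
}


def normalise_drug_name(raw: str) -> Optional[str]:
    """Return the canonical name, or None if the term needs manual review."""
    # Lower-case, drop full stops and collapse repeated whitespace.
    key = " ".join(raw.lower().replace(".", "").split())
    return SYNONYMS.get(key)


# Example free-text entries as different clinicians might record them.
records = ["Aspirin", "ASA", "acetylsalicylic  acid", "A.S.A.", "asprin"]
mapped = [normalise_drug_name(r) for r in records]

# Terms that could not be mapped are kept for manual review, not guessed.
unmapped = [r for r, m in zip(records, mapped) if m is None]
```

In this sketch, unmapped terms such as misspellings are returned for manual review rather than being matched automatically, since a wrong automatic match would silently reduce data accuracy.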

EHR systems

Limited retrieval capabilities

The usability of EHRs for clinical research relies on the ability to retrieve complete and accurate information. Hospitals choose EHR systems from different vendors; these systems have different levels of retrieval capability, and ease of data extraction for research is unlikely to have been a factor in their selection. One ultrasonologist noted, ‘Our EHRs cannot use pathological diagnosis results as search terms to obtain desired patient populations.’

Large amounts of unstructured data

EHR systems incorporate many free-text fields with unstructured data, including physician notes, images, medications and so on. Participants mentioned, ‘Some important information for clinical research is in the form of unstructured data in EHR systems. The data has to be extracted from each individual health record manually by researchers.’ Data quality of EHR-based studies may be affected due to human errors in these processes.

Challenges with patient identification and matching

Only one of the four hospitals could link outpatient data and hospitalisation data in our study. Outpatient data can provide significant patient follow-up information after interventions or treatments. However, patients receive a new patient identification number each time when they receive outpatient services. As data from different sources cannot be linked by matching patient identification numbers, it is difficult for researchers to optimally use this important combined information. One ultrasonologist said, ‘Unfortunately the patient identification numbers for the same patient from the inpatient and outpatient settings are not the same, which makes it more difficult to combine information from these settings for clinical research.’
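Where inpatient and outpatient identification numbers differ, one possible workaround is deterministic linkage on other stable fields. The sketch below assumes, purely for illustration, that name and date of birth are recorded in both settings; the field names, identifiers and sample records are hypothetical and do not describe any of the study hospitals' systems.

```python
from collections import defaultdict

# Hypothetical records: patient numbers differ between settings, so linkage
# relies on an assumed stable key (name plus date of birth in this sketch).
inpatient = [
    {"inpatient_id": "IP001", "name": "Patient A", "dob": "1960-02-11"},
    {"inpatient_id": "IP002", "name": "Patient B", "dob": "1975-08-30"},
]
outpatient = [
    {"outpatient_id": "OP9001", "name": "patient a ", "dob": "1960-02-11"},
    {"outpatient_id": "OP9002", "name": "Patient A", "dob": "1960-02-11"},
    {"outpatient_id": "OP9003", "name": "Patient C", "dob": "1982-01-05"},
]


def link_key(record):
    """Build a normalised matching key from name and date of birth."""
    return (" ".join(record["name"].lower().split()), record["dob"])


# Index outpatient visits by matching key.
visits_by_key = defaultdict(list)
for visit in outpatient:
    visits_by_key[link_key(visit)].append(visit["outpatient_id"])

# For each inpatient record, collect the outpatient visits sharing its key.
linked = {
    p["inpatient_id"]: visits_by_key.get(link_key(p), [])
    for p in inpatient
}
```

Matching on name and date of birth alone can produce false links when two patients share both values, so any real linkage would need a more reliable identifier and manual checks of ambiguous matches.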


Researchers

Problems with data extraction

Many participants extracted data from EHRs and entered it into the Excel software by themselves. A gerontologist explained, ‘It always takes one or two months for the staff in the hospital information department to assist with the extraction of data, due to time constraints concerning the operation and project development of the hospital information system, and data extraction is not their main task. I therefore prefer to extract data from EHR system by myself.’ One pharmacist noted, ‘The data extracted by staff in the hospital information department often have various problems, you need to communicate with them again and again.’ Additionally, researchers are required to manually extract information from free-text fields and scanned documentation that is stored as images in EHR systems, which is a time-consuming process and may cause errors that affect data completeness and consistency.

Unfamiliarity with data quality assessment

Most participants agreed with the importance of data quality, but they were not familiar with how to systematically assess data quality of EHR-based studies. The lack of understanding of features of EHR data may have impacts on data quality improvement and result interpretation. One urologist said, ‘I am not quite sure how to assess data quality. I usually mainly consider data completeness.’
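Completeness, the dimension the urologist mentions, is one of the simpler data quality measures to compute. As a minimal sketch, the share of records with a non-empty value can be calculated per field; the field names and sample records below are hypothetical, and completeness is only one of several dimensions a systematic assessment would cover.

```python
def completeness(records, fields):
    """Return, for each field, the share of records with a non-empty value."""
    total = len(records)
    result = {}
    for field in fields:
        # A value counts as missing if it is absent, None or an empty string.
        filled = sum(1 for r in records if r.get(field) not in (None, ""))
        result[field] = filled / total if total else 0.0
    return result


# Hypothetical extract of four patient records.
records = [
    {"age": 64, "diagnosis_code": "I10", "family_history": ""},
    {"age": 58, "diagnosis_code": None, "family_history": "diabetes"},
    {"age": None, "diagnosis_code": "E11", "family_history": ""},
    {"age": 71, "diagnosis_code": "I10", "family_history": None},
]

scores = completeness(records, ["age", "diagnosis_code", "family_history"])
```

Reporting such per-field scores alongside study results would help readers judge whether the data were fit for the specific research question.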

Facilitators to improving data quality

All of the interviews included discussions on how to improve the data quality of EHR-based clinical research. Table 2 also presents themes and subthemes of the perceived facilitators.

Training of staff

Training of staff was a perceived facilitator to data quality of EHR-based studies. Participants mentioned that it is important for healthcare workers to receive education and training regarding the utilisation of EHR system functions and coding schemes, as systematic documentation of clinical data for regular healthcare delivery is the basic prerequisite before further enhancing data quality. One pharmacist said, ‘Targeted training is often seen as a way to improve data standardization for research purposes.’

Need for monetary incentives

Some participants suggested that there was a potential for monetary incentives to improve data quality. Under the premise of meeting clinical purposes, monetary incentives can stimulate healthcare workers to improve standardisation and accuracy of EHR data for research purposes. One orthopaedist said, ‘Giving a reward to resident physicians and medical students may improve the data quality of routine behavior questionnaires, so that other healthcare professionals can use these baseline data for clinical research.’

Performing daily data verification

Performing data verification by research nurses was stated as a way to improve the quality of EHR data. Several participants pointed out that it may be possible for a research nurse to check the EHR data of the ward on the day it is entered. If there are obvious problems, they can perform follow-up with clinicians or patients.
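A same-day check of this kind could be supported by a simple script that lists the day's entries with obvious problems for the research nurse to follow up. The sketch below is only an illustration under stated assumptions: the required fields, the plausible range for systolic blood pressure and the record layout are hypothetical and were not described by participants.

```python
from datetime import date

REQUIRED = ("diagnosis_code", "systolic_bp")  # hypothetical required fields
PLAUSIBLE_SBP = (50, 300)  # assumed plausible range in mmHg


def flag_entries(records, day):
    """Return (record_id, problem) pairs for records entered on `day`."""
    flags = []
    for r in records:
        if r["entered_on"] != day:
            continue  # only verify records entered on the given day
        for field in REQUIRED:
            if r.get(field) in (None, ""):
                flags.append((r["record_id"], f"missing {field}"))
        sbp = r.get("systolic_bp")
        low, high = PLAUSIBLE_SBP
        if isinstance(sbp, (int, float)) and not low <= sbp <= high:
            flags.append((r["record_id"], "implausible systolic_bp"))
    return flags


# Hypothetical ward records; R3 was entered on an earlier day.
today = date(2018, 5, 14)
ward_records = [
    {"record_id": "R1", "entered_on": today,
     "diagnosis_code": "I10", "systolic_bp": 132},
    {"record_id": "R2", "entered_on": today,
     "diagnosis_code": "", "systolic_bp": 1320},
    {"record_id": "R3", "entered_on": date(2018, 5, 13),
     "diagnosis_code": None, "systolic_bp": None},
]

flags = flag_entries(ward_records, today)
```

The script only surfaces candidate problems; deciding whether a flagged value is an error still requires follow-up with the clinician or patient, as the participants suggested.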

Improving software functionality and coding structures

The EHR system needs to be developed to make coding options more convenient and effective for clinicians than free-text fields. Compared with unstructured data, structured data can be easily extracted electronically, which avoids human errors. One urologist mentioned, ‘(I) suggest that data in the departments of radiology and pathology can be modularized in EHR systems, so that data screening and extraction can be facilitated.’

Enhancing multidisciplinary cooperation

Many participants mentioned that enhancing multidisciplinary involvement is an important factor for conducting clinical research based on EHRs. The research team should include the following: clinician, nurse, IT staff, clinical epidemiologist and data professional. One neurologist said, ‘Conducting a real-world study (like EHR-based clinical research) requires teamwork. The overall quality of data will improve if the team can communicate and cooperate to ensure the quality of each stage.’


In this study, we sought to identify barriers and facilitators to data quality of EHR-based clinical research as reported by healthcare professionals in China. The main themes included factors related to healthcare systems, clinical documentation, EHR systems and researchers. Considering the increasing use of EHR data for clinical research, understanding the factors that influence the quality of data may alleviate current barriers and ultimately improve validity and generalisability of research findings.

Heavy workload was identified as a key barrier to the quality of EHR data. Over 1.56 billion patient visits were made to hospitals in the first half of 2016 in China.26 It is not surprising that some participants considered that the quality of outpatient data might not meet the needs of clinical research. This is consistent with the report by Pourasghar et al27 that high workload was one of the main barriers to the quality of documentation in EHR systems, which was improved in areas where nurses were involved. In our study, staff rotations were considered to hinder adequate recording of clinical information for patients, which may further affect the quality of secondary use of EHR data for research purposes. Other studies have also reported that documentation habits had an impact on data quality, and have suggested that training and feedback can be adopted to mitigate the variability of clinical documentation.28 These findings were similar to the views expressed by participants in our study. Additionally, participants mentioned that providing monetary incentives may be an effective way to improve data quality, encouraging clinicians and medical students to be more responsible for recording in EHR systems. Nevertheless, incentives could influence clinical practices differently depending on the amount and business model.29 Providing monetary incentives may be an effective way to ensure the quality of some EHR data, but may also lead to selective recording behaviours that attract higher incentives.30

Participants perceived that the software functionality and coding schemes used in hospitals may not be sufficiently comprehensive, which affects the data quality of EHR-based clinical research. EHR data are difficult to standardise for data integration and interoperability in China and this is likely to be the case in other countries.20 31 32 This is mainly because hospitals choose their own vendors to develop different EHR systems. The available data entry functions and coding systems built into EHR systems determine what is recorded. Previous studies demonstrated that EHR systems developed by different vendors can result in considerable differences in the quality of prescription records and episodes of care.33 34 It also leads to problems of interoperability between different medical terminologies, affecting the potential for data sharing and the assurance of consistency and quality in large-scale EHR-based studies that need to link EHR data.

Healthcare professionals usually conduct EHR-based clinical research independently in China.26 However, using EHR data for clinical research requires multidisciplinary involvement. Clinicians should acknowledge the limitations of their expertise and cooperate with other professionals.35 For example, where clinicians extract data from EHRs manually, collaborating with IT staff and using electronic data extraction methods can avoid potential human errors. Additionally, although participants agreed with the importance of data quality of EHR-based clinical research, they did not know how to assess the quality of data. Evidence showed that studies based on EHR data of unknown quality have negative impacts on care quality and patient safety.36 37 It is important for researchers to achieve competence in the methods of assessing and reporting data quality findings, which helps EHR data users and consumers understand the limitations of data and results based on these data. However, there are still inconsistencies in the dimensions of data quality and the methods used to assess these dimensions both in China and in other parts of the world.13 38 The lack of consistency of methods and processes of data quality assessment (DQA) makes it difficult to compare the results of DQAs.13 39 Thus, DQA should be further studied to standardise the terminologies of data quality concepts and to improve the feasibility of task-dependent DQA by researchers.

The limitations of this study include that the participants were recruited from four hospitals in Beijing and that the generalisability of the findings may be limited because of the small sample size. It is also possible that other themes might have emerged had we continued to interview. However, all of the research team felt that no new information was forthcoming and that saturation had been achieved with the nineteen participants. All the interviews were conducted by the first and second authors of our study. Their cultural background and experience may have influenced data collection and analysis, but they were experienced in conducting qualitative research and a meeting was held to discuss and ensure the rigour of interpretation. We believe that these findings can contribute to an increased understanding of the barriers and facilitators that influence data quality of EHR-based clinical research in China.


We have identified various barriers to data quality of EHR-based clinical research in the Chinese context, which may affect the validity and generalisability of research findings. Staff training regarding systematic and complete documentation of clinical data for routine medical services was emphasised as a key factor, which is the basic prerequisite before further enhancing data quality of EHR-based clinical research. Providing monetary incentives and performing daily data verification were perceived as two important ways to overcome time-related problems. In terms of EHR systems, the improvement of software functionality and coding structures may help to increase data standardisation and interoperability for research purposes. Multidisciplinary cooperation can also improve the quality of clinical research based on EHRs.


We would like to thank all participants for participating in the face-to-face interviews.



  • Contributors KN and HC conducted the interviews and analysis and wrote the manuscript. YZ designed the research project. LZ and NL made critical revisions to the paper. All authors reviewed and gave final approval of the version to be submitted.

  • Funding This study was supported by the National Natural Science Foundation of China (grant no.81701067).

  • Disclaimer The funders had no role in the study design, data collection, analysis and decision to publish.

  • Competing interests None declared.

  • Ethics approval The study was approved by the Ethics Committee of Peking University Third Hospital (No. M2018095).

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement Study protocol and original data are available on request by emailing the corresponding author.

  • Patient consent for publication Not required.
