Objective We aimed to evaluate the validity of an algorithm to classify diagnoses according to the appropriateness of outpatient antibiotic use in the context of Chinese free text.
Setting and participants A random sample of 10 000 outpatient visits was selected between January and April 2018 from a national database for monitoring rational use of drugs, which included data from 194 secondary and tertiary hospitals in China.
Research design Diagnoses for outpatient visits were classified as tier 1 if associated with at least one condition that ‘always’ justified antibiotic use; as tier 2 if associated with at least one condition that only ‘sometimes’ justified antibiotic use but no conditions that ‘always’ justified antibiotic use; or as tier 3 if associated with only conditions that never justified antibiotic use, using a tier-fashion method and regular expression (RE)-based algorithm.
Measures Sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) of the classification algorithm, using classification made by chart review as the standard reference, were calculated.
Results The sensitivities of the algorithm for classifying tier 1, tier 2 and tier 3 diagnoses were 98.2% (95% CI 96.4% to 99.3%), 98.4% (95% CI 97.6% to 99.1%) and 100.0% (95% CI 100.0% to 100.0%), respectively. The specificities were 100.0% (95% CI 100.0% to 100.0%), 100.0% (95% CI 99.9% to 100.0%) and 98.6% (95% CI 97.9% to 99.1%), respectively. The PPVs for classifying tier 1, tier 2 and tier 3 diagnoses were 100.0% (95% CI 99.1% to 100.0%), 99.7% (95% CI 99.2% to 99.9%) and 99.7% (95% CI 99.6% to 99.8%), respectively. The NPVs were 99.9% (95% CI 99.8% to 100.0%), 99.8% (95% CI 99.7% to 99.9%) and 100.0% (95% CI 99.8% to 100.0%), respectively.
Conclusions The RE-based classification algorithm in the context of Chinese free text had sufficiently high validity for further evaluating the appropriateness of outpatient antibiotic prescribing.
- electronic health records
- drug utilisation
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
Strengths and limitations of this study
This is the first study to establish rules for evaluating appropriateness of outpatient antibiotic prescriptions by using Chinese diagnosis text.
The rule-based and tier-fashion algorithm for classifying diagnoses according to whether antibiotic was indicated or not had high validity.
The algorithm provides a feasible method to use big electronic medical records or claim data to evaluate the appropriateness of antibiotic use in China.
Certain synonyms, abbreviations or acronyms of medical terms may have been omitted and thus cannot be detected by the algorithm.
Overuse of antibiotics and the consequent antimicrobial resistance (AMR) are serious threats to public health worldwide.1 China is one of the countries with the highest antibiotic consumption and, consequently, a high prevalence of AMR.2 3 Reducing unnecessary and inappropriate use of antibiotics is essential to reduce both AMR and adverse drug reactions,4 5 which cause a large number of deaths and economic losses every year.4 6 7 In China, the appropriateness of antibiotic use has been evaluated mainly through manual prescription review, which is time consuming and therefore not feasible for large-scale prescription data. The rapid implementation of electronic medical records (EMRs) across the whole country has enabled the routine collection of medical data for research purposes.8 9 However, a huge amount of these data, especially diagnosis information, is in the form of Chinese free text, which makes it difficult to evaluate the appropriateness of antibiotic use.
Previous studies using big EMR or claim data to evaluate the appropriateness of antibiotic prescribing applied an innovative tier-fashion method to classify diagnoses and assign them to outpatient visits according to whether antibiotics are indicated for treatment.4 10 11 The key aspect of this method is to classify all diseases into three tiers corresponding to whether the disease always, sometimes or never justifies antibiotic use4 5 11: tier 1 diagnoses are diseases for which antibiotics are almost always indicated, such as pneumonia and specific bacterial infections; tier 2 diagnoses are diseases for which antibiotics are only sometimes indicated, such as sinusitis and pharyngitis; and tier 3 diagnoses are all other diseases for which antibiotics are not indicated or the indication is unclear, such as influenza and cancer. However, most of these studies were conducted in the USA and the UK, where 20%–30% of outpatient antibiotics are estimated to be used inappropriately,4 10 11 with little reliable evidence from China and other developing countries where antibiotic use is high.2 12
The aim of this study was to establish and validate a regular expression (RE)-based algorithm for extracting and classifying outpatient diagnoses from Chinese free text using the tier-fashion method mentioned above.
We used a national database for monitoring the rational use of drugs. Online supplementary appendix 1 describes the recruitment process of the hospitals and the representativeness of the database. In total, there were 194 hospitals from 128 cities of 31 provinces, autonomous regions or municipalities in China. The database contained demographic characteristics, prescriptions, item costs and diagnoses for both outpatients and inpatients, drawn from the EMRs of the sample hospitals between October 2014 and April 2018. The prescription for each outpatient visit consisted of three parts, which were recorded in different tables in the database. The preface mainly included basic information about the patient, such as gender and age, and the diagnosis; the main body consisted of a list of drug information, including drug name, dosage and usage; and the postscript included other information such as the doctor’s signature. All information generated during the same visit could be linked by a unique identifier consisting of the hospital code, patient identification number and the date of visit. Chemical drugs including antibiotics were coded according to the Anatomical Therapeutic Chemical classification system.13 Outpatient diagnosis in the database was in the form of Chinese narrative free text. Several diagnoses could be written together and separated by punctuation marks. We treated multiple prescriptions and diagnoses from the same patient on the same day in the same hospital as one visit. Thus, multiple diagnoses and drugs could be linked to the same visit. At the time of the study, the database contained 239 million outpatient visits and 170 million diagnosis records. Fifty-five hospitals did not submit diagnosis records, and 88% of prescriptions from the remaining 139 hospitals could be linked to at least one valid diagnosis. This proportion was comparable to previous studies.5 11 In this study, antibiotics for systemic use were evaluated (see online supplementary appendix 2 for the list of antibiotics in the database).
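The visit linkage described above can be illustrated with a minimal Python sketch. It is not the authors' implementation (theirs is a PL/SQL package), and the field names `hospital_code`, `patient_id`, `visit_date`, `diagnosis` and `atc_code` are hypothetical stand-ins for the database columns.

```python
from collections import defaultdict


def group_into_visits(records):
    """Group prescription-level records into visits.

    A visit is keyed by (hospital code, patient ID, visit date), so that
    multiple prescriptions written for the same patient on the same day
    in the same hospital collapse into a single visit.
    Field names are hypothetical, chosen only for this illustration.
    """
    visits = defaultdict(lambda: {"diagnoses": [], "drugs": []})
    for rec in records:
        key = (rec["hospital_code"], rec["patient_id"], rec["visit_date"])
        if rec.get("diagnosis"):
            visits[key]["diagnoses"].append(rec["diagnosis"])
        if rec.get("atc_code"):
            visits[key]["drugs"].append(rec["atc_code"])
    return dict(visits)


# Two prescriptions for the same patient on the same day form one visit.
sample = [
    {"hospital_code": "H001", "patient_id": "P1", "visit_date": "2018-01-05",
     "diagnosis": "肺炎", "atc_code": "J01FA10"},
    {"hospital_code": "H001", "patient_id": "P1", "visit_date": "2018-01-05",
     "diagnosis": "高血压", "atc_code": None},
    {"hospital_code": "H001", "patient_id": "P2", "visit_date": "2018-01-05",
     "diagnosis": "咽炎", "atc_code": "J01CA04"},
]
visits = group_into_visits(sample)
```

Keying on the three-part identifier is what allows several diagnoses and drugs to be attached to the same visit.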
Outpatient diagnoses were processed in three steps (see figures 1 and 2 in online supplementary appendix 3): standard tiers classifying, REs establishing, and dictionary and pattern mapping.
Standard tiers classifying
In the first step, based on the standard description and classification of the International Statistical Classification of Diseases and Related Health Problems 10th Revision (ICD-10) in the 2016 Chinese version,14 we classified all diseases into three tiers according to whether antibiotics are indicated or not: tier 1 if the disease almost always justifies use of antibiotics, such as pneumonia; tier 2 if the disease only sometimes justifies use of antibiotics, such as sinusitis; or tier 3 if the disease almost never justifies use of antibiotics, such as cancer. Two groups of researchers worked on this independently and in parallel. The first group classified the standard diagnoses according to the list of categories established in previously published research.4 5 11 The other group, consisting of one clinician and one pharmacist, performed the same classification based on their clinical knowledge and experience. If there were any conflicts between the results of the two groups, recommendations for antibiotic use from the Guidelines for Clinical Application of Antibiotics15 and UpToDate16 were referenced and the final tiers of the diagnoses were determined. Online supplementary appendix 4 gives the basic rules that we used to classify the diagnoses. Then, the primary list of the tiers of diagnoses was reviewed by an expert in respiratory disease, an expert in infectious disease and an experienced pharmacist from three of the top hospitals in Beijing. Conflicts among the reviewers were discussed and the primary list was modified, if necessary. Finally, we classified 34 337 ICD codes, among which 2465 (7.2%) were classified as tier 1, 2608 (7.6%) were classified as tier 2 and 29 264 (85.2%) were classified as tier 3. The final list of the standard tiers of diagnoses (LiSToD) is available from https://www.researchgate.net/publication/336286590_LiSToD_The_list_of_the_standard_tiers_of_diagnoses_based_on_China-ICD-10. Online supplementary appendix 5 gives a comparison with the classification schemes from previous studies. Furthermore, we set up more diagnostic subcategories according to the ICD-10 chapters and previous studies,4 5 17 18 and finally all diagnoses were classified into 15, 21 and 13 different categories of tier 1, tier 2 and tier 3 diagnoses, respectively (see online supplementary appendix 6 for more details).
In the second step, we first established the dictionary and patterns of the clinical terms in the LiSToD. The clinical terms were derived from the following sources: (1) code string descriptions of the Chinese version of ICD-10 and corresponding synonyms or abbreviations; (2) pilot string searches for the diagnoses in the LiSToD; (3) clinicians’ suggestions about the common abbreviations (in Chinese or in English) of infectious diseases. Similar patterns or key words in different descriptions of the same condition in the raw data were identified and used to establish the REs for information extraction. For example, the standard Chinese name of Helicobacter pylori infection (coded as A49.809) in the ICD-10 system is ‘幽门螺旋杆菌感染’, but it may be written in the raw diagnosis text as ‘幽门螺旋杆菌’, ‘幽门螺杆菌’, ‘幽门螺旋菌’, ‘幽门杆菌’, ‘幽门螺菌’, or even ‘幽门螺旋杆’, ‘幽门螺杆’ or as the English abbreviation ‘HP’, ‘Hp’, ‘hP’ or ‘hp’. Thus, the REs for matching the diagnoses of H. pylori infection after converting all the letters to uppercase would be ‘(幽门螺?旋?杆?菌)|(幽门螺旋?杆)’ and ‘([^A-Z]HP[^A-Z])|([^A-Z]HP$)|(^HP[^A-Z])|(^HP$)’ (the latter expression means that the abbreviation ‘HP’ cannot be prefixed or suffixed by any other letter). Finally, we established the lists of REs for diagnoses (REoD) in the LiSToD. Online supplementary appendix 7 gives more details of the rules we used for constructing REs.
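The two H. pylori expressions quoted above can be tried out directly. The sketch below uses Python's `re` module rather than the authors' PL/SQL package, and it assumes the anchor and negation character in the published pattern is the ASCII caret `^`. The function name `mentions_h_pylori` is hypothetical.

```python
import re

# Patterns transcribed from the text (ASCII '^' assumed).
HP_CHINESE = re.compile(r"(幽门螺?旋?杆?菌)|(幽门螺旋?杆)")
HP_ABBREV = re.compile(r"([^A-Z]HP[^A-Z])|([^A-Z]HP$)|(^HP[^A-Z])|(^HP$)")


def mentions_h_pylori(diagnosis: str) -> bool:
    """Return True if the diagnosis text matches either H. pylori pattern.

    Letters are uppercased first, as described in the text, so that
    'hp', 'Hp' and 'hP' are all handled by the single 'HP' pattern.
    """
    text = diagnosis.upper()
    return bool(HP_CHINESE.search(text) or HP_ABBREV.search(text))


variants = ["幽门螺旋杆菌感染", "幽门螺杆菌", "幽门杆菌", "幽门螺旋杆", "hp", "胃炎Hp阳性"]
non_matches = ["慢性胃炎", "PHP", "HPV感染"]
```

The abbreviation pattern rejects strings such as `PHP` or `HPV`, where `HP` is embedded in a longer run of letters. This is the behaviour the text describes as "cannot be prefixed or suffixed by any other letter".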
Dictionary and pattern mapping
In the third step (figure 2 in online supplementary appendix 3), we first preprocessed the diagnosis text. All punctuation marks (except question marks and short dashes, which may indicate uncertainty and negation) and blanks were converted to semicolons, and all English letters were converted to single-byte, uppercase letters. The raw diagnosis text was then cut into segments, each containing a single disease. Finally, the first five diagnoses were extracted and mapped to the clinical terms in the LiSToD based on the REoD.
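The preprocessing steps can be sketched as follows. This is an illustration under stated assumptions, not the authors' code. Unicode NFKC normalisation is used here as one way to convert full-width letters and punctuation to single-byte forms, and "punctuation" is taken to mean any character in a Unicode `P*` category. The function name `preprocess` is hypothetical.

```python
import unicodedata

# Question marks and short dashes are kept because they may signal
# uncertainty and negation, respectively.
KEEP = {"?", "-"}


def preprocess(raw: str, max_diagnoses: int = 5) -> list:
    """Normalise a raw diagnosis string and split it into segments.

    1. NFKC normalisation maps full-width letters and punctuation to
       their single-byte equivalents (an assumption of this sketch).
    2. Letters are uppercased.
    3. Punctuation (except '?' and '-') and whitespace become ';'.
    4. The text is split on ';' and the first five segments are kept.
    """
    text = unicodedata.normalize("NFKC", raw).upper()
    chars = []
    for ch in text:
        is_punct = unicodedata.category(ch).startswith("P")
        if (is_punct or ch.isspace()) and ch not in KEEP:
            chars.append(";")
        else:
            chars.append(ch)
    segments = [seg for seg in "".join(chars).split(";") if seg]
    return segments[:max_diagnoses]


example = preprocess("肺炎，ｈｐ感染？ 高血压")
```

Here `example` holds three segments, with the full-width `ｈｐ` converted to `HP` and the question mark preserved for the later modifier check.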
Initially, we tried to identify the tier 1 diagnoses from each of the first five diagnoses. For accurate extraction of the clinical information, we then searched each identified tier 1 diagnosis for modifiers, as the underlying meaning of free diagnosis text is significantly affected by co-occurring concepts.19 We detected three kinds of modifiers: (1) negation modifiers, such as ‘排除’ (except or exclude), ‘除外’ (except or exclude), ‘阴性’ (negative), ‘非’ (non or not) and the short dash ‘-’; (2) temporal information, such as ‘复查’ (re-examination), ‘复诊’ (revisit); (3) uncertainty modifiers, which indicated that the event may not have actually occurred, such as ‘待排除’ (to be excluded), ‘待查’ (unknown origin or to be examined), ‘可能’ (maybe or likely), ‘不能排除’ (cannot be excluded), ‘咨询’ (for consultation or consulting) and the question mark ‘?’. If negation modifiers were detected, the tier 1 diagnosis was changed to tier 3; if temporal or uncertainty modifiers were detected, the tier of the diagnosis remained unchanged and a marker indicating uncertainty was added to the diagnosis. Further, if no tier 1 diagnosis or only negated tier 1 diagnoses were identified, we tried to identify tier 2 diagnoses. Modifiers were detected for classified tier 2 diagnoses in the same way: a negated tier 2 diagnosis was changed to tier 3, and an uncertainty marker was added when uncertainty information was detected. If no tier 1 or tier 2 diagnosis, or only negated tier 1 or tier 2 diagnoses were identified, then the diagnosis was classified as tier 3.
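The modifier rules can be sketched in Python as below. The term lists are the examples quoted in the text and are not the full lists used by the authors. One detail is an assumption of this sketch: uncertainty phrases are stripped before the negation check, so that ‘待排除’ and ‘不能排除’ are not misread as the negation term ‘排除’ that they contain. The source does not say how the original package resolves this overlap.

```python
NEGATION = ("排除", "除外", "阴性", "非", "-")
TEMPORAL = ("复查", "复诊")
UNCERTAINTY = ("待排除", "待查", "可能", "不能排除", "咨询", "?")


def apply_modifiers(segment: str, base_tier: int):
    """Adjust a matched tier 1 or tier 2 diagnosis for modifiers.

    Returns (tier, uncertain). Negation moves the diagnosis to tier 3;
    temporal or uncertainty modifiers keep the tier but set the flag.
    """
    if base_tier == 3:
        return 3, False
    uncertain = any(term in segment for term in TEMPORAL + UNCERTAINTY)
    # Assumption: remove uncertainty phrases first so that the embedded
    # '排除' in '待排除' / '不能排除' does not trigger negation.
    remainder = segment
    for term in UNCERTAINTY:
        remainder = remainder.replace(term, "")
    if any(term in remainder for term in NEGATION):
        return 3, False
    return base_tier, uncertain


plain = apply_modifiers("肺炎", 1)
questioned = apply_modifiers("肺炎?", 1)
to_exclude = apply_modifiers("肺炎待排除", 1)
negated = apply_modifiers("排除肺炎", 1)
revisit = apply_modifiers("鼻窦炎复查", 2)
```

Simple substring rules of this kind will also flag disease names that legitimately contain a negation character such as ‘非’. A production rule set would need exceptions for those names.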
After all the first five diagnoses were classified as tier 1, tier 2 or tier 3, with or without uncertainty, the tier-fashion method was applied to assign a single diagnosis to each visit. This means that, for multiple diagnoses in the same visit, priority was given to tier 1 diagnosis without uncertainty marker (tier 1A), followed by tier 1 diagnosis with uncertainty marker (tier 1B), then tier 2 diagnosis without uncertainty marker (tier 2A), then tier 2 diagnosis with uncertainty marker (tier 2B), and finally, tier 3 diagnosis. If multiple diagnoses from a single tier exist in the visit, the first-listed certain diagnosis was assigned.
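The priority order described above (tier 1A, then 1B, 2A, 2B and finally tier 3) can be expressed as a small ranking function. This is a sketch, and the tuple layout `(segment, tier, uncertain)` is a hypothetical representation of the per-diagnosis output of the earlier steps.

```python
PRIORITY = {"1A": 0, "1B": 1, "2A": 2, "2B": 3, "3": 4}


def tier_label(tier: int, uncertain: bool) -> str:
    """Map (tier, uncertainty flag) to the labels 1A, 1B, 2A, 2B or 3."""
    if tier == 3:
        return "3"
    return f"{tier}{'B' if uncertain else 'A'}"


def assign_visit_diagnosis(classified):
    """Pick one diagnosis per visit using the tier-fashion priority.

    `classified` is a non-empty list of (segment, tier, uncertain)
    tuples in the order the diagnoses were written. Python's min()
    returns the first minimal element, so ties within the same label
    resolve to the first-listed diagnosis.
    """
    best = min(classified, key=lambda item: PRIORITY[tier_label(item[1], item[2])])
    return best[0], tier_label(best[1], best[2])


visit_a = assign_visit_diagnosis(
    [("高血压", 3, False), ("鼻窦炎", 2, False), ("肺炎", 1, True)]
)
visit_b = assign_visit_diagnosis(
    [("鼻窦炎", 2, False), ("咽炎", 2, False)]
)
```

In `visit_a` the uncertain pneumonia (tier 1B) outranks the certain sinusitis (tier 2A). In `visit_b` both diagnoses are tier 2A, so the first-listed one is assigned.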
All the procedures above have been encapsulated into a PL/SQL package, which is accessible in online supplementary appendix 8.
Reference standard and validation
We selected a random sample of 10 000 outpatient visits that could be linked with diagnosis records during the 4-month period from 1 January 2018 to 30 April 2018 in the database. The sample data in this study were representative of the entire database (online supplementary appendix 9). Prescription review was used as the reference standard for classification validation, against which the sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) for classifying different tiers of diagnosis were calculated. Two researchers who were trained in the classification scheme and blinded to drug exposure status independently reviewed the raw prescriptions and classified the diagnoses into tier 1, tier 2, or tier 3 according to the LiSToD, with or without uncertainty. Conflicting results of the two researchers were further discussed with a third clinician and the final tier of the diagnosis was determined.
We calculated the sensitivity, specificity, PPV and NPV for each tier of diagnosis and the Clopper-Pearson Exact method was used for calculating the 95% CIs. For tier X diagnosis (X was 1, 2, 3, or 1A, 1B, 2A, 2B), the sensitivity was the proportion of true tier X diagnosis that was classified as tier X by the RE algorithm; the specificity was the proportion of true non-tier X that was classified as non-tier X by the RE algorithm; the PPV was the proportion of tier X classified by the RE algorithm that was true tier X diagnosis; the NPV was the proportion of non-tier X classified by the RE algorithm that was true non-tier X diagnosis. The information extraction, and mapping the diagnosis text to the LiSToD using REoD were performed by using Oracle 11gR2 and PL/SQL developer V.11.0 (Oracle Corp., Redwood Shores, California, USA). Statistical analyses were performed by using SAS V.9.4 (SAS Institute).
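The one-versus-rest definitions above, together with a Clopper-Pearson exact interval, can be sketched in pure Python. The authors used SAS, and this sketch is only an illustration. The interval bounds are found by bisection on the binomial tail probability instead of the beta quantile function. The function names are hypothetical.

```python
import math


def one_vs_rest_counts(truth, predicted, tier):
    """Return (successes, trials) pairs for each metric of one tier."""
    tp = fp = fn = tn = 0
    for t, p in zip(truth, predicted):
        if p == tier and t == tier:
            tp += 1
        elif p == tier:
            fp += 1
        elif t == tier:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": (tp, tp + fn),
        "specificity": (tn, tn + fp),
        "ppv": (tp, tp + fp),
        "npv": (tn, tn + fn),
    }


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with terms computed in log space."""
    if k < 0:
        return 0.0
    if k >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1)
                   - math.lgamma(n - i + 1) + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


def _bisect_increasing(f, target):
    """Solve f(p) = target on [0, 1] for an increasing function f."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for k successes in n."""
    lower = 0.0 if k == 0 else _bisect_increasing(
        lambda p: 1.0 - binom_cdf(k - 1, n, p), alpha / 2)
    upper = 1.0 if k == n else _bisect_increasing(
        lambda p: 1.0 - binom_cdf(k, n, p), 1.0 - alpha / 2)
    return lower, upper


# Toy data for illustration only; these are not the study data.
truth = ["1", "1", "2", "3", "3", "3"]
predicted = ["1", "3", "2", "3", "3", "2"]
counts_tier1 = one_vs_rest_counts(truth, predicted, "1")
```

When every case is classified correctly (k = n), the lower bound reduces to the closed form (alpha/2)^(1/n). This gives a quick sanity check on the bisection.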
Patient and public involvement
Patients and the public were not involved in the design or conduct of the study.
Out of the 10 000 outpatient visits, 91.1% occurred in outpatient clinics, and 92.9% occurred in tertiary-level hospitals. Further, 90.5% of the patients were adults and 51.0% were women. Most outpatient visits were from the Eastern and Western regions, which accounted for 54.8% and 30.2% of all the visits, respectively (table 1). One thousand and ninety-two visits (10.9%) ended with an antibiotic prescription, and 1417 individual antibiotics were prescribed. The most commonly used antibiotics in the dataset were cefdinir, azithromycin, levofloxacin, cefixime and moxifloxacin. In total, 9984 of the 10 000 visits could be linked to valid diagnoses; the diagnoses of the remaining 16 visits were pure numbers or punctuation that did not contain useful information. Among the 10 000 visits, 66% had just one diagnosis, over 85% had less than 2 diagnoses, and only 2.2% had more than 5 diagnoses. Among all diagnosis records, the median length of the diagnosis text was 6 (IQR 4–10) characters; 73.1% contained fewer than 10 characters.
Among the 9984 visits with valid diagnoses, 3.9% (n=390), 11.5% (n=1144) and 84.6% (n=8450) were classified as tier 1, tier 2 and tier 3 diagnoses, respectively (table 2). The sensitivities of the RE algorithm were 98.2% (95% CI 96.4% to 99.3%), 98.4% (95% CI 97.6% to 99.1%) and 100.0% (95% CI 100.0% to 100.0%) for classifying tier 1, tier 2 and tier 3 diagnoses, respectively. The specificities were 100.0% (95% CI 100.0% to 100.0%), 100.0% (95% CI 99.9% to 100.0%) and 98.6% (95% CI 97.9% to 99.1%). The PPVs for classifying tier 1, tier 2 and tier 3 diagnoses were 100.0% (95% CI 99.1% to 100.0%), 99.7% (95% CI 99.2% to 99.9%) and 99.7% (95% CI 99.6% to 99.8%), respectively. The NPVs were 99.9% (95% CI 99.8% to 100.0%), 99.8% (95% CI 99.7% to 99.9%) and 100.0% (95% CI 99.8% to 100.0%), respectively, for tier 1, tier 2 and tier 3 diagnoses (table 3). In addition, the RE-based algorithm had sufficiently high accuracy to detect the diagnosis modifiers, with all sensitivities, specificities, PPVs and NPVs for classifying diagnoses with uncertainty (tier 1B and tier 2B diagnoses) approaching or exceeding 95%. Online supplementary appendix 10 gives more detailed results of all the 49 subcategories under different diagnosis tiers.
An analysis of the errors in precision was performed (table 4). In total, diagnoses from 30 visits were inaccurately classified. Of these, 17 occurred because the infectious diseases were written after the fifth diagnosis, and 12 were due to inappropriate writing of the diagnoses (where a single diagnosis was improperly split or multiple diagnoses were incorrectly concatenated); the last inaccuracy was due to the use of traditional Chinese, which was not considered when constructing REs.
We developed a rule-based approach with high validity which could be used for identifying whether the use of antibiotics in outpatient prescriptions is appropriate in the Chinese context. Our findings indicated that the sensitivities, specificities, PPVs and NPVs of the algorithm were all over 98% for classifying tier 1, tier 2 and tier 3 diagnosis according to whether antibiotics were indicated by using the previously proposed method.4 5 11 To our knowledge, there is no current estimate of the appropriateness of outpatient antibiotic prescribing at the national level in China. This paper presents an approach to extract structured diagnosis information that represents the antibiotic usage to support subsequent pharmacoepidemiology studies, rather than depending on the manual review of prescriptions, which is very time consuming and involves high cost of labour. In addition, this approach can provide a method to evaluate antibiotic use for different categories of diseases by using large EMRs and administrative data.
The study is the first to use rule-based natural language processing (NLP) to establish the classification system for evaluating inappropriate prescribing of antibiotics using Chinese diagnosis text. NLP has been used for over 30 years to identify key clinical information from unstructured and semistructured text.19–21 As the amount of EMRs and administrative data is increasing rapidly and most of the information is stored in unprocessed and heterogeneous textual formats, NLP plays a crucial role in transforming narrative medical text into structured data.22 23 The main approaches to NLP are rule based,24 machine learning25 or hybrid approaches.8 26 Machine learning can offer easier portability, while rule-based methods tend to provide reliable results.24 NLP has been used for extracting and standardising information on drugs,9 27 for automatic detection of diseases,28 29 and for deidentification of protected health information.8 However, as most NLP research has been performed in English, similar research in Chinese is relatively limited.30–32
The primary reason for incorrect classification using our algorithm was that diagnoses of infectious diseases were not written or inputted in a priority order. In this study, 3 tier 1 and 11 tier 2 diagnoses were misclassified as tier 3 because the diagnoses of the infections were written after the fifth one. The Prescription Administrative Policy33 and the Regulations on Prescription Review Management34 issued by the Ministry of Health of China (now the National Health Commission) stipulate that no more than five kinds of drugs can be prescribed in the same prescription, and thus logically, if a clinician wants to prescribe antibiotics, he/she should mention the infectious diseases in the first five diagnoses; otherwise, the prescription will be considered as irrational. In addition, clinicians tend to write the primary diagnosis in a priority order and other diagnoses were written as complications or comorbidities. This was reflected in our research as nearly 90% of tier 1 and tier 2 diagnoses were in the first two diagnoses. Thus, these 14 misclassifications due to lack of prioritisation of infectious diagnosis writing may actually not be misclassified and if we assign single diagnosis to a visit based on more than 5 diagnoses, the risk of overdiagnosing may arise.
Another kind of error was due to structural ambiguity in diagnosis writing. There were cases where the diagnosis text could be interpreted in contrary ways, since a single diagnosis was improperly split, or multiple diagnoses belonging to different tiers were incorrectly concatenated together. This resulted in incorrect word segmentation. In the first case, part of the key information was cut into another independent diagnosis, which may contain the opposite meaning, resulting in mapping of incomplete information and misclassification. In the other case, since we performed the classification using a step-by-step process, the tier 1 or tier 2 diagnosis was detected in the first mapping process, while in the second process, involving detection of modifiers or lower tier diagnoses, negation was detected or another tier was mapped due to the coexistence of contrary information.
Typos and traditional Chinese characters created another category of errors that may lead to misclassification, but this was very rare in this study. The Chinese language has many homophonic and homomorphic words, especially homophones; for example, ‘幽门’ (Chinese pronunciation ‘youmen’), which means pylorus in the diagnosis of H. pylori, may be inputted as ‘油门’ (Chinese pronunciation ‘youmen’), which means accelerator and has nothing to do with disease. Thus, typos and traditional Chinese characters are likely to occur in a larger amount of diagnosis text, and their frequency may depend on which type of input method clinicians use, since Chinese characters can be inputted through pronunciation-based (the Chinese Pinyin) and glyph-based (the five stroke) input methods, making this kind of error difficult for the rule-based method to detect.
Strengths and limitations
Our study had some strengths. The rule-based and tier-fashion algorithm that we used provided a feasible and validated method to evaluate the appropriateness of antibiotic prescribing by using big EMRs. It can process diagnoses extremely rapidly compared with manual prescription review, with less than 30 s needed to classify all the sample diagnoses. The algorithm was effective and had good extensibility and portability, as it is easy to add new REs to, or remove old ones from, the REoD we developed. Since REs are well supported by most common database management systems and statistical software, it is easy for other researchers to reuse our algorithm for conducting similar research using other kinds of data written in Chinese text.
However, there were limitations in our study. Building rule-based systems is often time consuming.20 In this study, the first two steps for establishing the LiSToD and REoD took us several months. The validity of our method depended heavily on whether the LiSToD included all the bacterial infections, and whether the REoD contained all possible patterns of diseases in the LiSToD. Medicine is a large and complex domain with rich synonyms and semantically similar and related concepts.23 In addition, medical abbreviations and acronyms are common and can also be ambiguous, making it difficult to identify them.20 Although we applied multiple strategies to make the list of synonyms, abbreviations and similar patterns of infectious disease as complete as possible, some variations of standard concepts in the ICD-10 may still have been missed. Since we can only extract features through the manually constructed word list, some omissions may occur in much larger diagnosis datasets.
In conclusion, to our knowledge, our study is the first to use the rule-based algorithm to establish the classification system for evaluating inappropriate prescribing of antibiotics using Chinese diagnosis text. Further studies focusing on antibiotics in China can apply this validated algorithm to evaluate the appropriateness of antibiotic use by using big EMRs or administrative data.
Contributors HZ and SZ developed the research question. HZ conducted the data analyses and drafted the manuscript. JB and MZ collected, cleaned and managed the data. SZ, JB and MZ are responsible for the integrity of the data. SZ, JB, LW and MZ reviewed and edited the manuscript. HZ, XY, YY and ZZ established the primary list of the standard tiers of diagnoses. JB, LL and BC reviewed the list of the standard tiers of diagnoses. HZ, XY and LZ constructed the regular expressions for mapping the diagnosis text. HZ, XY and MZ reviewed and evaluated the raw prescriptions for appropriateness of antibiotic prescribing. All authors read and approved the final manuscript.
Funding This study was supported by the National Natural Science Foundation of China (Grant numbers 81973146, 81473067 and 91646107).
Competing interests None declared.
Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.
Patient consent for publication Not required.
Ethics approval This study was approved by the Ethical Review Board of Peking University Health Science Centre (approval number: IRB00001052-18013-Exempt).
Provenance and peer review Not commissioned; externally peer reviewed.
Data availability statement Data of prescriptions are available upon reasonable request. All other data relevant to the study are included in the article or uploaded as supplementary information.