Original research
Systematic review of global clinical practice guidelines for neonatal hyperbilirubinemia
  1. Meng Zhang1,2,
  2. Jun Tang1,2,
  3. Yang He1,2,
  4. Wenxing Li1,2,
  5. Zhong Chen1,2,
  6. Tao Xiong1,2,
  7. Yi Qu1,2,
  8. Youping Li3,
  9. Dezhi Mu1,2
  1. 1Department of Pediatrics, Sichuan University West China Second University Hospital, Chengdu, China
  2. 2Key Laboratory of Obstetrics & Gynecologic and Pediatric Diseases and Birth Defects of the Ministry of Education, Sichuan University, Chengdu, China
  3. 3Chinese Evidence-Based Medicine Center, Sichuan University West China Hospital, Chengdu, China
  1. Correspondence to Professor Jun Tang; tj1234753{at}


Objective Hyperbilirubinemia is one of the most common clinical symptoms in newborns. To improve patient outcomes, evidence-based and implementable guidelines are required. However, clinical guidelines may vary in quality, criteria and recommendations among regions and countries. In this study, we aimed to systematically assess the quality of guidelines using the Appraisal of Guidelines for Research & Evaluation (AGREE)-II instrument and summarise the specific recommendations for neonatal hyperbilirubinemia in order to provide suggestions for future guideline development.

Design Systematic review.

Interventions We searched the PubMed, Embase, Medline and guideline databases for relevant articles on 10 April 2020. The studies were screened by two independent reviewers according to our inclusion criteria. Two reviewers independently extracted the descriptive data. Four appraisers assessed the guidelines using the AGREE-II instrument.

Results Our systematic review appraised 12 clinical practice guidelines for the diagnosis and management of neonatal hyperbilirubinemia. Average scores across the 12 guidelines ranged from 36% to 89%. The guidelines received the highest scores for clarity of presentation and the lowest scores for rigour of development. Most recommendations for diagnosis were relatively consistent, but recommendations regarding risk factors, the initiating threshold of treatment and pharmacotherapy varied.

Conclusions Our study revealed that current guidelines vary in the quality of the development process and are inconsistent with regard to recommendations. Future guidelines should pay more attention to the quality of methodologies in guideline development, and higher-quality evidence is needed to standardise the initiating threshold of treatment for neonatal hyperbilirubinemia.

  • neonatology
  • developmental neurology & neurodisability
  • protocols & guidelines

Data availability statement

All data relevant to the study are included in the article or uploaded as online supplemental information.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:

Strengths and limitations of this study

  • This study is the first English systematic appraisal of guidelines targeted to neonates with hyperbilirubinemia.

  • The strengths also included the use of the validated Appraisal of Guidelines for Research & Evaluation (AGREE) II instrument and four independent reviewers to minimise subjective bias.

  • A Chinese-language guideline by the Chinese Pediatric Society was appraised.

  • The AGREE-II was used to evaluate guidelines with less attention on detailed recommendations.

  • We only assessed guidelines through the reported literature without the use of additional methods such as contacting guideline developers.


Neonatal hyperbilirubinemia, characterised by the elevation of total serum bilirubin (TSB), is one of the most common clinical conditions affecting newborns, particularly preterm infants. Hyperbilirubinemia affects approximately 60% of full-term and 80% of preterm neonates.1 Approximately 10% of newborns are likely to develop clinically significant hyperbilirubinemia requiring close monitoring and treatment.2 In the early period (0–6 days), neonatal hyperbilirubinemia accounted for 1309.3 deaths per 100 000 livebirths and was the seventh most common cause of neonatal deaths.3 Effective and timely treatment with phototherapy or exchange transfusion can reduce the occurrence of neurological dysfunction in neonates with hyperbilirubinemia.

Clinical practice guidelines are in place to aid clinical, policy-related and system-related decisions.4 Guidelines have also been developed to bridge the gap between research and clinical practice.5 Therefore, guidelines have become increasingly popular in recent years.6 Although several organisations from different regions have developed clinical practice guidelines, these guidelines may vary widely in quality.7 8 Moreover, the criteria for diagnosis and treatment in published guidelines vary among regions and countries.9

The Appraisal of Guidelines for Research & Evaluation (AGREE) instrument is used to assess methodological rigour and transparency of a guideline.10 In this study, we aimed to systematically review and assess the quality of guidelines on neonatal hyperbilirubinemia using the AGREE-II instrument in order to provide suggestions for future guideline development.


Selection criteria

We included clinical practice guidelines produced by local, regional, national or international groups or affiliated governmental organisations for the diagnosis and management of hyperbilirubinemia in newborn infants. The guidelines were included if they met the following criteria: (1) published in English or Chinese language, (2) based on systematic evidence synthesis and containing specific statements to guide decisions regarding hyperbilirubinemia, (3) include recommendations for the diagnosis and/or treatment of neonatal hyperbilirubinemia and (4) published between 2000 and 2020, and only the most recent editions of updated guidelines were considered.

Search strategy

A systematic literature search was performed on 10 April 2020. We searched for relevant studies in the PubMed, Embase and Medline databases. In addition, we searched the Guidelines International Network, National Health Service Evidence website, National Institute for Health and Care Excellence (NICE) website, Scottish Intercollegiate Guidelines Network website, Turning Research Into Practice Database and Wan fang Database. The titles and abstracts of the searched citations were screened by two independent reviewers (MZ and YH). Any discrepancies between the reviewers were resolved by discussion. The detailed search strategy for PubMed is shown in the online supplemental material.

Guideline characteristics

Two independent reviewers (MZ and YH) extracted the general characteristics of the included guidelines: country, founding organisation, year of publication or updating status, method of evidence identification and funding.

Appraisal of guideline quality

Four appraisers (MZ, YH, WL and ZC) independently assessed the selected guidelines using the AGREE-II instrument. The AGREE II is an international, validated and rigorously developed tool to evaluate the quality of clinical practice guidelines and consensus statements.11 The AGREE II consists of 23 key items organised within six domains (scope and purpose, stakeholder involvement, rigour of development, clarity of presentation, applicability and editorial independence) followed by two global rating items (overall assessment). Each domain points to a unique dimension of guideline quality.12 Each of the AGREE II items is rated on a 7-point scale (1=strongly disagree to 7=strongly agree). Domain scores are calculated by summing the scores of the individual items in a domain and by scaling the total as a percentage of the maximum possible score for that domain.12 The score for each domain of each document is calculated as follows: (obtained score−minimal possible score)/(maximal possible score−minimal possible score).10 All reviewers were trained online using the AGREE training tools. Discrepancies of >3 points were discussed in a consensus meeting.
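
The scaled domain score described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names and the example ratings are hypothetical, and treating a "discrepancy of >3 points" as the gap between the highest and lowest rating on a single item is an assumption, since the text does not define it further.

```python
from typing import List, Sequence

MIN_RATING = 1  # AGREE II item scale: 1 = strongly disagree
MAX_RATING = 7  # AGREE II item scale: 7 = strongly agree


def scaled_domain_score(ratings: Sequence[Sequence[int]]) -> float:
    """Return the scaled score for one domain as a percentage.

    ratings[a][i] is the rating given by appraiser a to item i of the domain.
    """
    if not ratings or not ratings[0]:
        raise ValueError("need at least one appraiser and one item")
    n_items = len(ratings[0])
    for row in ratings:
        if len(row) != n_items:
            raise ValueError("every appraiser must rate every item")
        if any(not MIN_RATING <= r <= MAX_RATING for r in row):
            raise ValueError("ratings must lie between 1 and 7")
    n_appraisers = len(ratings)
    obtained = sum(sum(row) for row in ratings)
    minimum = MIN_RATING * n_items * n_appraisers
    maximum = MAX_RATING * n_items * n_appraisers
    return 100.0 * (obtained - minimum) / (maximum - minimum)


def items_needing_consensus(
    ratings: Sequence[Sequence[int]], threshold: int = 3
) -> List[int]:
    """Indices of items whose highest and lowest ratings differ by more than
    `threshold` points (assumed reading of the >3-point discrepancy rule)."""
    flagged = []
    for i in range(len(ratings[0])):
        column = [row[i] for row in ratings]
        if max(column) - min(column) > threshold:
            flagged.append(i)
    return flagged


# Hypothetical ratings: four appraisers (rows) on a three-item domain (columns).
example = [
    [5, 6, 6],
    [6, 6, 7],
    [2, 4, 3],
    [3, 3, 2],
]
print(round(scaled_domain_score(example), 1))  # 56.9
print(items_needing_consensus(example))        # [0, 2]
```

In this example the obtained total is 53, the minimum possible total is 12 (1 × 3 items × 4 appraisers) and the maximum is 84 (7 × 3 × 4), so the scaled score is (53 − 12)/(84 − 12), or about 57%.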


We extracted descriptive data from the guideline recommendations to identify consistencies and discrepancies. The recommendations were then summarised according to different items related to the diagnosis and treatment strategies of neonatal hyperbilirubinemia, such as the test used for early prediction and diagnosis, time to start phototherapy and exchange transfusion, recommendations for drug use, criteria for discharge and timing or frequency of follow-up. Intraclass correlation coefficients for the six domains were calculated to assess the inter-rater reliability of the scores. The reliability analysis was performed using SPSS V.24.0.
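
For readers unfamiliar with the intraclass correlation coefficient, the sketch below computes one common form from a subjects-by-raters table. The text does not state which ICC model was selected in SPSS, so the two-way random-effects, absolute-agreement, single-rater form used here is an assumption, and the function name and example scores are purely illustrative rather than study data.

```python
from typing import Sequence


def icc_2_1(scores: Sequence[Sequence[float]]) -> float:
    """Two-way random-effects, absolute-agreement, single-rater ICC.

    scores[s][r] is the score given to subject s (e.g. a guideline) by rater r.
    """
    n = len(scores)      # number of subjects (rows)
    k = len(scores[0])   # number of raters (columns)
    if n < 2 or k < 2:
        raise ValueError("need at least two subjects and two raters")
    if any(len(row) != k for row in scores):
        raise ValueError("every rater must score every subject")

    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Partition the total sum of squares as in a two-way ANOVA without replication.
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Illustrative scores only: six subjects (rows) rated by four raters (columns).
illustrative = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(illustrative), 2))  # 0.29
```

Values close to 1 indicate that raters assign nearly identical scores; the low value in this illustrative table arises because the raters differ systematically in how high they score, which an absolute-agreement ICC penalises.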

Patient and public involvement

No patients were involved.


Search results

Figure 1 illustrates the search and guideline selection process. The systematic search retrieved 725 records, of which 701 were excluded after removing duplicates and articles that did not meet the eligibility criteria. Subsequently, after full-text evaluation of the remaining 24 records, a further 12 were excluded for the following reasons: not written in English or Chinese, not original guidelines, or not clinical practice guidelines or consensus statements. Ultimately, we included 12 clinical practice guidelines from 12 different national or regional organisations.

Figure 1

Study selection diagram. GIN, Guidelines International Network; NHS, National Health Service; NICE, National Institute for Health and Care Excellence; TRIP, Turning Research Into Practice Database.

General characteristics of the guidelines

Tables 1 and 2 summarise the general characteristics of the included clinical practice guidelines. Twelve clinical practice guideline documents were published by national or regional organisations, including the American Academy of Pediatrics (AAP) Subcommittee on Hyperbilirubinemia,13 Canadian Pediatric Society (CPS) Fetus and Newborn Committee,14 Chinese Pediatric Society (ChPS) of the Chinese Medical Association,15 Israel Neonatal Society (INS),16 Italian Society of Neonatology (ISN),17 Malaysia Health Technology Assessment Section (MaHTAS),18 NICE in the UK,19 Norwegian Pediatric Association,20 Queensland Clinical Guidelines (QCG) in Australia,21 Spanish Association of Pediatrics (SAP),22 Swiss Society of Neonatology (SSN)23 and Turkish Pediatric Association (TPA).24 Five of these guidelines are new and the others have been updated or reaffirmed. Four guidelines from the USA,13 Canada,14 Italy17 and Switzerland23 were targeted towards neonates born at >35 weeks of gestation, while the other guidelines covered all preterm and term babies. Six organisations (QCG,21 CPS,14 SAP,22 NICE,19 INS16 and MaHTAS18) reported performing a systematic review and appraisal of the evidence and were explicit about the level of evidence that underpinned their recommendations. Three groups were funded by governmental institutions (QCG,21 NICE19 and MaHTAS),18 one declared no financial support (TPA),24 and the remainder did not disclose a funding source.

Table 1

General characteristics

Table 2

General characteristics

Appraisal of guidelines

Table 3 shows the scores for each guideline for the six domains of the AGREE II instrument. The overall quality of the guideline development process varied widely both among guidance documents and, within guidance documents, among different domains. Average scores ranged from 36.3% to 89.3%. Across guidelines, mean scores were <50% in four of the six domains, and only two domains received a mean score of >50%. The highest scores were achieved in the clarity of presentation domain and the lowest in the rigour of development domain.

Table 3

Domain scores of the 12 guidelines assessed by using the AGREE-II instrument (%)

Domain 1: the mean score for scope and purpose was 88.8%±6.5% and the MaHTAS18 guideline achieved the highest score at 98.6%. Domain 2: the mean stakeholder involvement score was 47.6%±22.4% and ChPS15 received the lowest score at 9.7%. Domain 3: the mean score for rigour of development was 31.9%±22.6%. NICE19 scored the highest for this domain at 85.9% with the most extensive development process, while TPA24 received the lowest at only 9.9%. Domain 4: the mean score for clarity of presentation was 91.7%±5.7%. For this domain, most of the guidelines obtained a score of >90%. Domain 5: the mean score for applicability was 43.0%±18.9%, with five guidelines scoring <30%. Domain 6: the mean score for editorial independence was 36.8%±36.1%, and four guidelines obtained scores of 0% for this domain. In terms of overall quality, 50% of the guidelines received an average score of >50%. The NICE19 guidelines received the highest score at 89.3%±5.7%.

Table 4 shows the intraclass correlation coefficients, 95% CIs, and p values for each domain between the four evaluators. The intraclass correlation coefficients ranged from 0.818 to 0.995.

Table 4

Inter-rater reliability study results

Clinical guideline recommendations

Nine guidelines covered risk factors for severe neonatal hyperbilirubinemia, including maternal and neonatal risk factors. All guidance documents provided recommendations for diagnosis. Tables 5 and 6 show the main risk factors and some example diagnostic strategies for neonatal hyperbilirubinemia. The guidelines differed somewhat in their report of risk factors. Nearly all guidelines reported prematurity, exclusive breastfeeding and glucose-6-phosphate dehydrogenase (G6PD) deficiency as neonatal risk factors. Cephalohematoma or bruises and male sex were also defined as neonatal risk factors in some guidelines, while NICE25 stated that the evidence was inconclusive and that the results of most studies revealed no significant association between these factors and hyperbilirubinemia.

Table 5

Summary of risk factors of severe neonatal jaundice

Table 6

Summary of recommendations for approaches to diagnosis of neonatal hyperbilirubinemia

Visual assessment was recommended as a first step in diagnosis by most organisations, and the Malaysian guideline18 specifically mentioned that Kramer’s rule could be widely practiced. All guidelines advocated TSB measurement as the gold standard for detecting and determining the level of hyperbilirubinemia. Non-invasive methods such as transcutaneous bilirubinometry were accepted by all guidelines. Other methods of detection, such as icterometers, were not recommended by NICE19 and MaHTAS18 because there was no good-quality evidence to indicate their reliability. In addition, nearly all guidelines recommended additional laboratory tests for babies with prolonged hyperbilirubinemia that could be of value in evaluating and identifying the underlying disease. These tests included complete blood counts, blood group compatibility, a direct antiglobulin test, septic workup, urinalysis, urine culture, thyroid function, G6PD, reticulocyte count and the conjugated component of bilirubin.

Table 7 shows the recommendations for the management of neonatal hyperbilirubinemia. The key areas included the initiating threshold and details of different types of therapies and care for babies during therapy. The guidelines distinguished treatment scenarios based on the level of hyperbilirubinemia, including phototherapy, exchange transfusion and pharmacotherapy.

Table 7

Summary of recommendations for approaches to treatment of neonatal hyperbilirubinemia

All guidelines discussed the threshold of phototherapy and exchange transfusion, and most of the organisations divided patients into groups according to gestational age and risk factors. As an example, we reported the detailed initiation TSB levels for full-term neonates according to the presence and absence of risk factors in table 7, finding that there were few differences among the guidelines regarding the initiation TSB levels. The majority of the guidelines proposed a number of general care strategies during phototherapy, such as temperature measurement, eye protection and continued breastfeeding. Among other forms of phototherapy, home phototherapy was recommended by AAP13 and MaHTAS,18 while sunlight exposure was not supported by four organisations (AAP,13 NICE,19 QCG,21 SAP).22 Moreover, seven guidelines mentioned the complications of phototherapy.

The threshold for initiating exchange transfusion was higher than that for phototherapy in all risk groups. Potential signs of acute bilirubin encephalopathy were highlighted as important in all guidelines. Most guidelines reported the details of performing exchange transfusion such as the blood product and blood volume. Double-volume exchange transfusion was advocated by the majority of guidelines. Furthermore, observations during exchange transfusion including heart rate, blood pressure, respiratory rate, oxygen saturation and skin temperature were only proposed by three organisations (MaHTAS,18 ChPS15 and ISN).17 After the exchange transfusion, seven guidelines recommended maintaining intensive phototherapy and six suggested monitoring the TSB at varied time points. Pharmacotherapy was also mentioned by 10 guidelines. However, the recommendation of medication varied greatly.

Most of the guidelines discussed follow-up after discharge, and some provided different follow-up time recommendations according to the time of discharge and risk factors. In addition, some guidelines focused on the follow-up of children with severe hyperbilirubinemia. The CPS14 guidelines recommend that the hearing screen of patients with severe hyperbilirubinemia should include brainstem auditory evoked potentials. The MaHTAS18 guideline reported that term and late preterm babies with TSB of >20 mg/dL or exchange transfusions should have auditory brainstem response (ABR) testing performed within the first 3 months of life. If the ABR is abnormal, neurodevelopmental follow-up should be continued. The ABR test was also recommended by the Turkish guidelines for babies with hyperbilirubinemia requiring treatment. Moreover, two of the guidelines (SSN23 and ISN)17 mentioned the national institute for monitoring the incidence of kernicterus and severe hyperbilirubinemia.


This systematic review appraised 12 clinical practice guidelines for the diagnosis and management of neonatal hyperbilirubinemia. The quality of the guidelines was highly variable. The included guidelines received acceptable AGREE II scores in the domains of clarity of presentation and scope and purpose, but the mean scores were moderate or low in the stakeholder involvement, rigour of development, applicability and editorial independence domains. This finding was similar to that of the 2010 review by Alonso-Coello et al.26 In recent years, although the number of guidelines has increased, the quality of guidelines still needs to be improved.

As evaluated by the AGREE II instrument, most guidelines had good clarity regarding their objective, clinical questions and scope. Further, as the AGREE II revealed in the stakeholder involvement domain, many guideline development groups represented a variety of relevant professional areas.12 It is valuable to explore the views of the target population, that is, healthcare providers or the parents of neonates with hyperbilirubinemia. However, although some guidelines targeted healthcare providers and parents, almost all development groups ignored the preferences of the parents of neonates with hyperbilirubinemia.

The mean score of the rigour of development domain, which was considered the indicator of quality in all domains,27 varied significantly among different guidelines. Guidelines typically received low scores in this domain because of poor reporting of systematic methods for searching for evidence and formulating recommendations and a lack of external review and updating mechanisms. Some guidelines, such as NICE,19 provided detailed search strategies, evidence tables and reasons for excluding studies to confirm their systematic methods, while others did not provide complete information regarding methods of searching for and selecting evidence. Muka et al28 provided a 24-step guide on how to perform a systematic review and meta-analysis in 2020. The guide described the 24 most important steps, such as defining the search strategy, designing the data collection form and checking reporting bias. We suggest that such methodologically sound tools should be used to help future guideline designers conduct or appraise systematic reviews. Guidelines need to reflect current research, but most of the guidelines did not provide a statement about the procedure for updating. Alonso-Coello et al29 conducted an international survey of the updating practices of guidelines in 2011 and concluded that there was an urgent need to develop rigorous international standards for the updating process.

The recommendations were presented specifically and unambiguously in most guidelines. The scores of the applicability domain were highly reflective of the implementation of guidelines. Additional materials, including summary documents and educational tools, could be beneficial in this respect. However, >50% of the included guidelines did not discuss facilitators of and barriers to their application or tools for putting them into practice; thus, the guidelines might have a limited effect.30 Therefore, future guideline developers should give greater consideration to the potential resource implications and facilitators of application, particularly for guidelines published in developing regions. Regarding the editorial independence domain, the views of the funding body and interests of the developers should be reported as part of the standard practice of guideline development.

In this study, we also summarised and compared the specific recommendations for the diagnosis and treatment of neonatal hyperbilirubinemia. All guidelines covered the threshold of phototherapy and exchange transfusion, and most of the guidelines stated that the threshold graph was reproduced and adapted with permission from the AAP.13 However, the AAP noted that the suggested levels represented a consensus of the committee but were based on limited evidence, and that the levels shown were approximations.13 Therefore, more high-quality studies of different populations are needed to standardise treatment methods. In terms of pharmacotherapy, variations also existed among different guidelines. The discrepancies were mainly due to varying evidence quality, limitations in generalisability and lack of approval by a national administration.

The burden of hyperbilirubinemia is highest in South Asia and sub-Saharan Africa.2 Hyperbilirubinemia is the 7th leading cause of neonatal mortality in South Asia, 8th in sub-Saharan Africa, 9th in western Europe and 13th in North America.2 In our review, we appraised five guidelines from Europe with a mean score of 55.9%, four guidelines from Asian countries with a mean score of 55.2% and two guidelines from North America with a mean score of 50.6%. In 2015, Olusanya et al31 provided a practical framework for the management of late-preterm and term infants (≥35 weeks of gestation) with clinically significant hyperbilirubinemia in low-income and middle-income countries lacking local practice guidelines. They provided recommendations for comprehensive management, including primary prevention, early detection, diagnosis, monitoring, treatment and follow-up.31

To our knowledge, our study is the first systematic critical appraisal of guidelines with diagnostic and treatment recommendations targeted to neonates with hyperbilirubinemia. The strengths of our review include the integration of comprehensive search strategies, use of the validated AGREE II instrument and use of four independent reviewers to minimise subjective bias. Further, in addition to guidelines written in English, a Chinese-language guideline by the Chinese Pediatric Society was appraised in our study. Because this guideline represents a developing country, its inclusion may reduce the overestimation of guideline quality to some degree.

However, there were several possible limitations to our study. First, guidelines written entirely in languages other than English and Chinese might have been overlooked. Second, the AGREE-II was used to evaluate guidelines with less attention paid to detailed recommendations. Although it is thought that a global appraisal of a guideline’s development process may reflect the strength of recommendations,9 the quality of specific recommendations has a direct influence on practice. Finally, we only assessed guidelines through the reported literature without the use of additional methods such as contacting guideline developers to obtain further clarification. As a result, we may have underestimated how systematic the organisations’ guideline development methods were.


Our study evaluated the quality of methodologies and rigorous strategies in the guideline development process and summarised the recommendations on the diagnosis and treatment of neonatal hyperbilirubinemia. The results revealed that current guidelines varied in the quality of the development process and were inconsistent in their recommendations, despite some similarities. Therefore, future guidelines should pay greater attention to the quality of methodologies in the guideline development process, and higher-quality evidence is needed to standardise the initiating threshold of treatment for neonatal hyperbilirubinemia.

Ethics statements

Patient consent for publication


  • Correction notice This article has been corrected since it was first published. The provenance and peer review statement has been included.

  • Contributors MZ conceptualised and designed the study, screened the titles and abstracts of searched citations, extracted general characteristics and descriptive data from guideline recommendations, assessed the selected guidelines using the AGREE-II instrument and drafted the initial manuscript. JT conceptualised and designed the study, coordinated and supervised guideline assessment, and critically reviewed the manuscript for important intellectual content. YH screened the titles and abstracts of searched citations, extracted general characteristics and descriptive data from guideline recommendations, assessed the selected guidelines using the AGREE-II instrument and revised the manuscript. WL and ZC assessed the selected guidelines using the AGREE-II instrument, reviewed and revised the manuscript. TX, YQ, YL and DM coordinated and supervised guideline assessment, and critically reviewed the manuscript for important intellectual content. All authors approved the final manuscript as submitted and agree to be accountable for all aspects of the work.

  • Funding This work was supported by the National Science Foundation of China (Numbers 81630038, 81971433), the grant from Ministry of Education of China (IRT0935), the grant of clinical discipline program (Neonatology) from the Ministry of Health of China (1311200003303) and the grants from the Science and Technology Bureau of Sichuan Province (2020YJ0236, 2020YFS0041).

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.