Objectives To appraise the quality of guidelines developed by the International Federation of Red Cross and Red Crescent Societies (IFRC) between 2001 and 2015.
Study design Cross-sectional.
Methods Two authors independently assessed the quality of IFRC guidelines using the Appraisal of Guidelines for Research and Evaluation (AGREE II) instrument. Average domain scores were calculated, and overall quality scores and recommendations for use were determined.
Results Out of 77 identified guidelines, 27 met the inclusion criteria and were assessed. The domains with the highest average scores across guidelines were ‘scope and purpose’, ‘clarity of presentation’ and ‘applicability’. The lowest scoring domains were ‘rigour of development’ and ‘editorial independence’. No guideline can be ‘recommended for immediate use’, 23 guidelines are ‘recommended with modifications’ and 4 guidelines are ‘not recommended’.
Conclusions The IFRC produces guidelines that should be adhered to by millions of staff and volunteers in 190 countries. These guidelines should therefore be of high quality. Up until now, the IFRC had no uniform guideline development process. The results of the AGREE II appraisal indicate that the quality of the guidelines needs to be improved.
- Quality assessment
- AGREE II
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
Strengths and limitations of this study
This study is the first to assess the quality of the guidelines of the International Federation of Red Cross and Red Crescent Societies (IFRC) using a widely used instrument for guideline appraisal.
The quality assessment was performed by two assessors. More assessors could in theory increase the reliability of the assessment. However, since the inter-rater reliability between both assessors was good to very good, the view of an extra assessor is unlikely to influence the overall scores.
Appraisal of Guidelines for Research and Evaluation (AGREE II) is designed for clinical practice guidelines, and most of the guidelines developed by the IFRC are not intended for clinical practice. Some items were therefore not applicable; since it only concerns 2 out of 23 items, this has not influenced the overall score.
The International Federation of Red Cross and Red Crescent Societies (IFRC) is the world's largest humanitarian organisation, providing assistance without discrimination as to nationality, race, religious beliefs, class or political opinions. Founded in 1919, the IFRC comprises 190 member Red Cross and Red Crescent National Societies, a secretariat in Geneva and more than 60 delegations around the world. It is an organisation with more than 17 million active volunteers and 427 000 paid staff, serving 182 million people annually.1
The IFRC carries out relief operations to assist victims of disasters, and combines this with development work to strengthen the capacities of its member National Societies. The role of the secretariat in Geneva is to coordinate and mobilise relief assistance for international emergencies, to promote cooperation between National Societies and to represent these National Societies in the international field. Over the years, the IFRC has published dozens of guidelines, guidance series, etc, to assist and guide the millions of volunteers and staff in their work.
In the past, guidelines developed by humanitarian and development agencies did not usually adhere to rigorous quality assessment standards.2–4 In recent times, however, international organisations such as the WHO and some non-governmental organisations such as Médecins Sans Frontières have adopted a more rigorous and well-structured guideline development process, whereby the guidelines are based on the latest available scientific evidence.5 ,6
Given the importance of guidelines in an organisation such as the IFRC, the objective of this study is to appraise the quality of the guidelines developed by the IFRC. The present study was initiated and performed by the Belgian Red Cross, one of the 190 members of the IFRC. About 10 years ago, the Belgian Red Cross established the ‘Centre for Evidence-Based Practice’, which specialises in guideline development according to the ‘Evidence-Based Practice’ principles and later became a reference centre of the IFRC. On that basis, the Belgian Red Cross took the initiative to make this analysis.
A list of the IFRC publications is available in the IFRC online library catalogue (http://weblis.ifrc.org/Libcat/index.html). The authors identified guidelines from this database using the following search strategy (‘advanced search’ function): keyword from title: guideline*; keyword from record: guideline*; tick box: IFRC publication. The search was performed on 27 November 2015.
The objective was to analyse all guidelines aimed at guiding the work of staff and volunteers of all National Red Cross and Red Crescent Societies in the field of emergency relief, development cooperation and organisational development.
Therefore, (internal) administrative guidelines for the staff of the IFRC (eg, ‘human resource or finance guidelines’ or ‘social media guidelines for IFRC staff’) were excluded, as well as managerial guidelines (eg, ‘strategic planning guidelines for African National Societies’ or ‘guidelines for National Societies to organise local youth actions and celebrate the international year of the youth’). We only included IFRC guidelines and guidelines published jointly with the International Committee of the Red Cross (ICRC). One guideline developed with another (non-Red Cross) organisation (‘guidelines for drug donations’ developed with the WHO) was excluded.
Only guidelines in English were included. Guidelines in other Federation languages such as Arabic or Spanish were excluded, given that all IFRC guidelines are also available in English.
We included guidelines published between 1 January 2001 and 27 November 2015. Since there is no established process within the IFRC to review the guidelines after a certain number of years, or to abrogate guidelines, we decided to perform the analysis on the guidelines of the past 15 years.
The quality of each guideline was assessed using the Appraisal of Guidelines for Research and Evaluation (AGREE II) instrument.7 AGREE II is an international instrument for assessing the quality of guidelines in any disease area targeting any step in the healthcare continuum, including those for health promotion, public health, screening, diagnosis, treatment or interventions. AGREE II is one of only a few tools, specifically developed for quality assessment of guidelines, that uses a numeric rating scale to quantify guideline quality. It contains 23 items grouped into six quality domains. Furthermore, two global rating items are used to assess the overall quality of the guideline (box 1).8 AGREE II was chosen since it is the most comprehensively validated appraisal tool.9 AGREE II was also used for the appraisal of the WHO guidelines.5
Box 1 Appraisal of Guidelines for Research and Evaluation II domains and items
Domain 1: scope and purpose
1. The overall objective(s) of the guideline is (are) specifically described.
2. The health question(s) covered by the guideline is (are) specifically described.
3. The population (patients, public, etc) to whom the guideline is meant to apply is specifically described.
Domain 2: stakeholder involvement
4. The guideline development group includes individuals from all relevant professional groups.
5. The views and preferences of the target population (patients, public, etc) have been sought.
6. The target users of the guideline are clearly defined.
Domain 3: rigour of development
7. Systematic methods were used to search for evidence.
8. The criteria for selecting the evidence are clearly described.
9. The strengths and limitations of the body of evidence are clearly described.
10. The methods for formulating the recommendations are clearly described.
11. The health benefits, side effects and risks have been considered in formulating the recommendations.
12. There is an explicit link between the recommendations and the supporting evidence.
13. The guideline has been externally reviewed by experts prior to this publication.
14. A procedure for updating the guideline is provided.
Domain 4: clarity of presentation
15. The recommendations are specific and unambiguous.
16. The different options for management of the condition or health issues are clearly presented.
17. Key recommendations are easily identifiable.
Domain 5: applicability
18. The guideline describes facilitators and barriers to its application.
19. The guideline provides advice and/or tools on how the recommendations can be put into practice.
20. The potential resource implications of applying the recommendations have been considered.
21. The guideline presents monitoring and/or auditing criteria.
Domain 6: editorial independence
22. The views of the funding body have not influenced the content of the guideline.
23. Competing interests of guideline development group members have been recorded and addressed.
Overall guideline assessment (two global rating items)
Judgement as to the quality of the guideline, taking into account the criteria considered in the assessment process.
Provide a recommendation for use of the guideline.
The 23 items are rated on a seven-point Likert scale that measures the extent to which the specific criterion is fulfilled, ranging from 1 (strongly disagree) to 7 (strongly agree). The overall quality of the guideline is also rated with a seven-point Likert scale with 1 as the lowest possible quality and 7 the highest possible quality. Furthermore, a recommendation is made for the use of this guideline (yes; yes, with modifications; no).
Two authors (AVV and VB) independently assessed the quality of each guideline. The two assessors had complementary profiles; the first assessor (VB) is a methodologist with a PhD working full time on the development of systematic reviews and evidence-based guidelines; the second (AVV) has a 35-year career in international disaster and development assistance. The two assessors were not involved in the drafting of any of the guidelines assessed.
Both assessors worked according to the online tutorial (http://agree2.machealth.ca/players/open/index.html) and used the AGREE II user's manual instructions.8
In accordance with the AGREE II guidelines, it was decided in advance how to rate items that are not applicable (eg, items 2 (health question) and 11 (health benefits) for governance guidelines). The assessors decided not to skip such items, as this would entail modifications in calculating the domain scores (which is discouraged by AGREE II7), but to give them an average score of 3. Furthermore, it was important to rate as consistently as possible throughout the exercise, giving similar ratings when the level of information was comparable. For example, guidelines providing detailed descriptions of the contributors, such as name, occupation, affiliation, role, etc (compared with limited information such as ‘nutrition experts’ or no information at all), should receive a similarly high score (5–7). To ensure this consistency, the assessors drafted a table detailing, per AGREE II item, ‘how to rate’ in three categories (1–2; 3–4; 5–7), with details about the level of information per category. This table was drafted after the first discussion of discrepancies and was used by both assessors for further evaluation of the guidelines.
Major scoring discrepancies between the two assessors (difference of more than two points on the Likert scale) were discussed and scores were changed in case of misinterpretation. No attempt was made to reach a consensus. A first comparison of scores was made after scoring five guidelines to clarify discrepancies. A second discussion round was held when all guidelines were scored.
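The discrepancy rule described above (a difference of more than two points on the seven-point scale) can be sketched in a few lines of Python. This is only an illustration: the function name and the example ratings are hypothetical and are not taken from the study data.

```python
def major_discrepancies(scores_a, scores_b, threshold=2):
    """Return the 1-based item numbers whose two ratings differ by more than `threshold`.

    scores_a and scores_b hold one Likert rating (1-7) per item, in the same order,
    for assessor A and assessor B respectively.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both assessors must rate the same items")
    return [
        item
        for item, (a, b) in enumerate(zip(scores_a, scores_b), start=1)
        if abs(a - b) > threshold
    ]


# Hypothetical ratings for three items (not study data).
assessor_1 = [5, 2, 6]
assessor_2 = [4, 5, 3]
print(major_discrepancies(assessor_1, assessor_2))
```

A difference of exactly two points is not flagged, since the text defines a major discrepancy as a difference of more than two points.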
The scores were entered into a Microsoft Excel (2013) spreadsheet. A domain score was calculated for each domain by scaling the sum of the scores of the individual items in a domain as a percentage of the maximum possible score for that domain, according to the AGREE II manual:8
scaled domain score = (obtained score − minimum possible score) / (maximum possible score − minimum possible score) × 100%
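As a worked illustration of the scaled domain score, the AGREE II manual scales the summed ratings of all assessors between the minimum possible score (every item rated 1) and the maximum possible score (every item rated 7). The sketch below assumes ratings are supplied as one list per assessor; the function name and the example ratings are hypothetical.

```python
def scaled_domain_score(ratings_per_assessor):
    """Scaled AGREE II domain score as a percentage (0-100).

    ratings_per_assessor: one list of item ratings (1-7) per assessor,
    all covering the same items of a single domain.
    """
    n_assessors = len(ratings_per_assessor)
    n_items = len(ratings_per_assessor[0])
    if any(len(r) != n_items for r in ratings_per_assessor):
        raise ValueError("every assessor must rate every item in the domain")

    obtained = sum(sum(r) for r in ratings_per_assessor)
    minimum = 1 * n_items * n_assessors  # all items rated 'strongly disagree'
    maximum = 7 * n_items * n_assessors  # all items rated 'strongly agree'
    return 100.0 * (obtained - minimum) / (maximum - minimum)


# Hypothetical ratings by two assessors for a three-item domain.
example = [[4, 3, 4], [4, 3, 4]]
print(round(scaled_domain_score(example), 1))  # (22 - 6) / (42 - 6) * 100 = 44.4
```

Because the minimum is subtracted, a domain in which every item is rated 1 scores 0%, not 1/7 of the maximum.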
The intraclass correlation coefficient (ICC) was calculated, before and after discussion, as an indicator of agreement between both assessors, using StatsDirect V.2.8.0 (StatsDirect statistical software. 2013. England, StatsDirect). For the classification of the degree of agreement, the scale proposed by Altman was used: ICC<0.20: poor; ICC 0.21–0.40: fair; ICC 0.41–0.60: moderate; ICC 0.61–0.80: good, ICC 0.81–1.00: very good.10
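The Altman classification quoted above can be written as a small lookup. This is a minimal sketch: the function name is hypothetical, and because the quoted bands do not say where values such as exactly 0.20 belong, the code assumes that each upper bound is inclusive.

```python
def altman_agreement(icc):
    """Map an ICC value to the agreement label of the Altman scale quoted in the text.

    Assumption: upper bounds are inclusive (eg, 0.20 is 'poor', 0.80 is 'good').
    """
    if icc > 1.0:
        raise ValueError("an ICC cannot exceed 1")
    if icc <= 0.20:
        return "poor"
    if icc <= 0.40:
        return "fair"
    if icc <= 0.60:
        return "moderate"
    if icc <= 0.80:
        return "good"
    return "very good"


# ICC values reported in this study, before and after discussion.
print(altman_agreement(0.79))  # good
print(altman_agreement(0.90))  # very good
```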
As in the AGREE II checklist no clear cut-off was provided on the relationship between the ICC value and the overall recommendation for use in practice, the authors decided to take the average of the scores for each item to rate the overall quality, and to base the overall recommendation on the ranking described in a previous study.11 This ranking states that if more than half of the domains have an overall domain score of more than 60%, the guideline is ‘recommended for immediate use’. If most domain scores are higher than 30%, the guideline is ‘recommended with modifications’, and if most domain scores are below 30%, the guideline is ‘not recommended for use’.
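The ranking described above can be sketched as a simple decision rule. Two points are assumptions made here and not statements from the source: ‘most’ is read as ‘more than half of the domains’, and any guideline that meets neither of the first two conditions (including ties) falls into ‘not recommended’. The function name is hypothetical.

```python
def recommendation(domain_scores):
    """Overall recommendation from the scaled AGREE II domain scores (percentages).

    Assumptions: 'most' means more than half of the domains, and a guideline
    that satisfies neither threshold is treated as 'not recommended'.
    """
    half = len(domain_scores) / 2
    if sum(score > 60 for score in domain_scores) > half:
        return "recommended for immediate use"
    if sum(score > 30 for score in domain_scores) > half:
        return "recommended with modifications"
    return "not recommended"


# Illustration using the mean domain scores reported in the Results
# (domains 1 to 6); individual guidelines were classified on their own scores.
mean_scores = [45.1, 31.9, 8.9, 42.5, 38.7, 5.2]
print(recommendation(mean_scores))  # recommended with modifications
```

With these mean scores, no domain exceeds 60% and four of the six domains exceed 30%, which matches the finding that most guidelines were ‘recommended with modifications’.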
Guideline identification and characterisation
Seventy-seven records were identified with the aforementioned search strategy. Of these, 49 records were excluded based on the inclusion and exclusion criteria (figure 1): 13 records were duplicates, 18 were in a language other than English (eg, Arabic or Russian), 7 were internal managerial guidelines and 11 were internal administrative guidelines. One further guideline was excluded because its full text could not be obtained, leaving 27 guidelines for assessment.
Of the 27 guidelines that met the inclusion criteria, 13 are designed for a humanitarian aid context, dealing with a variety of subjects such as disaster preparedness, reconstruction, disaster law, food security or how to deal with a nuclear emergency.12–24 Only eight guidelines are related to a health issue (two of them in an emergency context), dealing with first aid, HIV/AIDS, nutrition, tuberculosis (TBC) or drug abuse.25–32 Six guidelines dealt with other topics such as cash transfer programming, humanitarian diplomacy, income-generating projects or strategic planning.33–38 Four guidelines were produced together with the ICRC (‘guideline for cash transfer programming’,34 ‘rapid assessment for markets—guidelines for an initial emergency market assessment’,22 ‘guidelines for assessment in emergencies’15 and ‘guidelines on first aid and HIV/AIDS’26). More than half of the guidelines analysed (15 out of 27) were published before 2011, which suggests that many IFRC guidelines may be outdated.
The AGREE II domain scores across the guidelines are shown in figure 2 and table 1. The error bars represent the range of scores within each domain. The top and bottom of the boxes are the 75th and 25th centiles, respectively; the middle line of the boxes represents the median. The diamond shows the mean domain score.
The domain ‘scope and purpose’ had the highest average score (mean score 45.1; median 44.4; range 22.2–63.9). All guidelines described the overall objectives and most described the target population, although not always very specifically. For most guidelines, the health question was not applicable and this item was therefore rated with an average score (score of 3). ‘Clarity of presentation’ also had a relatively high average domain score (mean score 42.5; median 44.4; range 11.1–80.6). Most guidelines provided relatively specific recommendations which varied in presentation (bullet points or generally phrased descriptions). The item on the different options for management of the condition or health issue was also not applicable for most guidelines, but where applicable, different options were mentioned. In the domain ‘applicability’ (mean score 38.7; median 37.5; range 18.8–77.1), 40% of the guidelines regularly mentioned facilitators and barriers to their application and almost all guidelines presented some advice and/or tools on how the recommendations could be put into practice.
‘Stakeholder involvement’ (mean score 31.9; median 27.8; range 11.1–72.2) was highly variable. Half of the guidelines did not mention the contributors of the guideline development group, and in more than half it was not clear whether the views and preferences of the target population were sought. However, all the guidelines gave some description of the target users.
The domains with the lowest scores were ‘rigour of development’ (mean score 8.9; median 6.3; range 0.0–41.7) and ‘editorial independence’ (mean score 5.2; median 0.0; range 0.0–29.2). There was only one guideline17 that had an average to fairly good score on most items within the rigour of development domain. All other guidelines scored low to very low on all items of this domain. As for ‘editorial independence’, none of the guidelines provided a disclosure of interests and only seven guidelines reported the funding body, although they did not state their role in the development process.
The initial level of agreement between assessors was good (ICC 0.79, 95% CI (−1.870971 to 2.083532)). After discussion, some scores were adjusted and the level of agreement increased to very good (ICC 0.90, 95% CI (−1.240409 to 1.443307)).
Overall quality and recommendation
The overall quality of the guidelines is moderate to low and quite variable; the mean overall quality score is 3 (median 3; range 2–4). No guideline can be ‘recommended for immediate use’, 23 guidelines are ‘recommended with modifications’ (mean score 3; median 3; range 2–4) and 4 guidelines are ‘not recommended’ (mean score 2; median 2; range 2–2; table 1). This means that all guidelines fail to meet several aspects of good guideline development, which can have serious consequences for practice. For example, guidelines with a low score in the domains ‘stakeholder involvement’, ‘rigour of development’ and ‘editorial independence’ may contain recommendations that are ineffective or even harmful, which is a waste of time and effort. Guidelines with a low score in the domains ‘scope and purpose’, ‘clarity of presentation’ and ‘applicability’ may be guidelines that are not implementable and thus have no impact on practice. When we compare the scores of the individual domains, there is no consistent difference between the guidelines that are ‘recommended with modifications’ and guidelines that are ‘not recommended’, except that the latter guidelines all have the lowest applicability score. It can only be concluded that more effort needs to be made to revise guidelines with a lower score. Concrete recommendations concerning the revision of these guidelines are given in the Discussion section.
Discussion and conclusions
The IFRC regularly drafts guidelines for the different fields of activities of the Red Cross & Red Crescent (RC&RC). These guidelines can be commissioned by a variety of Federation structures, from governing bodies to thematic RC&RC working groups, but there is no formal process for ‘officially’ initiating, drafting and approving a guideline. IFRC guidelines are written and published by the department responsible for the subject matter (eg, the Health Department wrote and published the nutrition guideline). Our study is the first to evaluate guidelines of the IFRC for quality, and the assessment was performed independently of the IFRC.
The quality assessment with AGREE II of the guidelines developed by the IFRC showed great variability in the quality of the guidelines, both between guidelines overall and within each AGREE II domain (figure 2). The domains that scored best were scope and purpose, clarity of presentation and applicability. The lowest scoring domains were rigour of development and editorial independence. The very low score on the rigour of development domain for all guidelines but one (all guidelines scored an average of <2, except the ‘guidelines for the domestic facilitation and regulation of international disaster relief and initial recovery assistance’17) indicates that the recommendations are usually not linked to supporting evidence systematically collected from scientific literature searches. Furthermore, the low average score for taking into account target group experiences and expectations (1.98) indicates that in most cases the process relies heavily on expert opinion. High-quality guidelines should be developed according to the method of ‘Evidence-Based Practice’. This means that the best available evidence is collected through systematic literature searches, which are important since guidelines that are mainly based on expert knowledge may be biased due to undeclared conflicts of interest and missing or outdated knowledge.39 Evidence-based guidelines are generally considered to produce more valid recommendations.40 However, it is not always possible to support guidelines with evidence, since evidence is often lacking.
For example, in an emergency setting, the priority is to help people in need, and it is usually perceived as unethical to perform research and test the effectiveness of interventions in such circumstances.41 However, the strength of the Evidence-Based Practice methodology is that, in addition to collecting the best available evidence, expert opinion and practice experience are also taken into account, as well as the preferences of the target group.42–44 Guideline recommendations are based on finding a balance between the quality of the evidence, benefits and harms, costs and preferences, and this applies to each of the three guideline categories that we defined for the purpose of this project (humanitarian aid, health issues, other). For guideline subjects for which existing evidence is scarce, as in the case of disaster aid, for example, more weight will be given to the aspect of collecting ‘expert opinion and practice experience’ and of formulating good practice points. It is important to do this in as unbiased a way as possible. It might therefore be interesting for those developing guidelines with a limited evidence base to use formal consensus methods, such as the Delphi method, nominal group technique and the consensus development conference.45 These methods provide an objective way to come to a decision, and are being used by other guideline development groups as well, for example, in the development of postdisaster psychosocial care guidelines, where a Delphi method was used to collect expert consensus from 106 experts.46 In an overview paper of guideline development methods, it was reported that ‘making group decisions and reaching consensus’ is mentioned in 17 different guideline methodology handbooks, and 13 handbooks provide a clear explanation about it.47
The low score on the editorial independence domain does not necessarily mean that the majority of the guidelines could not guarantee editorial independence, but rather that no information on this subject was transparently reported. Nevertheless, it is well known that conflicts of interest can negatively influence the quality of guidelines and could result in disadvantages for the target group.48–50 Therefore, it is recommended to include enough methodologists (who are often unbiased about the content of a specific guideline), in addition to content experts (whose content expertise is needed, but who may have subjective opinions on certain topics). Content experts should be prevented from making decisions on recommendations where there is a conflict of interest.51 An additional approach to having an independent review by external experts is publication in an open-access peer-reviewed journal.
This study, however, has some limitations. First, the quality assessment was performed by two assessors, the minimum recommended by AGREE II.7 More assessors could in theory increase the reliability of the assessment. However, since the inter-rater reliability between both assessors was good to very good (meaning there was a high agreement between both assessors), the view of an extra assessor is unlikely to influence the overall scores. Second, some challenges were encountered during the process. AGREE II is designed for guidelines in any disease area targeting any step in the healthcare continuum,7 and most of the guidelines developed by the IFRC are guidelines outside healthcare. AGREE II was used since this is the only tool specifically developed for quality assessment of guidelines, with use of a numeric scale to quantify guideline quality.
We acknowledge that AGREE II was not developed for guidelines outside healthcare; however, while appraising the included guidelines for this project, we did not identify any problems in using this checklist for non-healthcare guidelines. Only 2 of the 23 items (items 2 and 11, box 1) dealt specifically with ‘health questions’ or ‘health benefits’, and the guideline assessors agreed on a procedure to deal with this. By giving a score of 3 for those items (instead of excluding the item, which is discouraged by AGREE II8), the overall score of the guideline will not be affected.
The IFRC produces guidelines that should be adhered to by millions of staff and volunteers in 190 countries. These guidelines therefore should be of high quality. Since none of the guidelines included in our analysis can be recommended for immediate use, all guidelines should ideally be revised (meaning that a ‘de novo guideline’ is developed), so that they comply with set quality standards before they are published and promoted, and are developed following an agreed guideline development process covering all stages from initiation, through development, to approval. Once this is the case, the guidelines should be updated regularly (ie, every 5 years) in order to take into account the changes in the operational environment, the role and mandates of the different humanitarian actors and the newly available scientific evidence. When setting priorities for guideline revision, several aspects can be taken into account: first, the four guidelines that are ‘not recommended’ for practice date from 2001, 2003, 2007 and 2010, and their further use is therefore probably not advisable. Second, based on the reach and impact of the guidelines, it could be decided which guidelines should be revised first.
A similar evaluation was performed for the WHO guidelines. In 2003, the WHO performed an in-house analysis, using the AGREE checklist. This analysis was followed by interviews conducted with Department Directors at the WHO headquarters, leading to the conclusion that evidence was rarely used in the development of guideline recommendations.5 As a consequence of these analyses, a Guideline Review Committee (GRC) was established in 2007 and a WHO Handbook of Guideline Development was published. Following this improvement in guideline development, a second quality appraisal analysis of a set of guidelines was made in 2013, making use of AGREE II. The latter analysis included 124 guidelines intended for a global audience, all concerning health interventions on a wide variety of topics (34% of all guidelines addressed HIV/AIDS and/or tuberculosis). It was concluded that the quality of the WHO guidelines is better, but further improvement is still necessary.5 From this experience, we know that documenting the quality of the WHO guidelines resulted in fewer, albeit higher quality, guidelines being developed.
In addition to guideline quality, guideline implementation is at least as important, since there may be barriers that hinder implementation of guidelines among a certain target group. Implementation strategies include strategies to promote guideline use.52 The AGREE II tool already takes into account some aspects that should facilitate implementation (eg, clarity of presentation, applicability). In addition, it is considered relevant to identify implementation barriers and facilitators, so that implementation strategies can take these into account in order to maximise impact.53 Various efforts have been made to facilitate guideline implementation, such as the development of a checklist for guideline implementation planning.52
Considering that the IFRC has no uniform guideline development process and that the results of AGREE II indicate that the quality of the guidelines needs to be improved, the IFRC could improve its guidelines by:
Setting up a formal procedure for guideline development and/or a revision process;
Setting up a GRC to ensure that IFRC guidelines are of high quality and are developed according to a transparent, evidence-based decision-making process;
Drafting a handbook to provide guidance on the development of guidelines and other documents detailing the procedures to be followed when submitting a guideline or document with recommendations to the GRC;
Setting up a formal procedure to monitor the implementation of the guidelines by the membership.
Some RC&RC National Societies already make efforts to produce evidence-based guidelines. In addition to the American Red Cross, which takes the initiative to follow the principles of Evidence-Based Practice, the Belgian Red Cross with its Centre for Evidence-Based Practice also works according to a methodological charter for the development of evidence-based guidelines and systematic reviews.42
This paper thus forms a baseline measurement with regard to IFRC guidelines, which will help to monitor progress in the future, and can also be a first step in motivating all other components of the RC&RC Movement to develop evidence-based guidelines.
Contributors AVV contributed to the study design, literature search, data collection, data analysis, data interpretation and writing of the manuscript. VB contributed to the data collection, data analysis, data interpretation and writing of the manuscript. EDB and PV contributed to the study design and writing of the manuscript. All authors are employees at the Belgian Red Cross-Flanders and were involved in the development of this manuscript. They gave final approval of the version to be submitted.
Funding This work was made possible through funding from the Foundation for Scientific Research of the Belgian Red Cross-Flanders.
Competing interests The Belgian Red Cross is a member National Society of the International Federation of Red Cross and Red Crescent Societies (IFRC).
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement The scoring details of the guideline appraisal of each assessor are available on request from the corresponding author at firstname.lastname@example.org.