Do guidelines offer implementation advice to target users? A systematic review of guideline applicability
  1. Anna R Gagliardi1,
  2. Melissa C Brouwers2
  1. 1University Health Network, Toronto, Ontario, Canada
  2. 2McMaster University, Hamilton, Ontario, Canada
  1. Correspondence to Anna R Gagliardi; anna.gagliardi{at}


Objective Providers and patients are most likely to use and benefit from guidelines accompanied by implementation support. Guidelines published in 2007 and earlier assessed with the Appraisal of Guidelines, Research and Evaluation (AGREE) instrument scored poorly for applicability, which reflects the inclusion of implementation instructions or tools. The purpose of this study was to examine the applicability of guidelines published in 2008 or later and identify factors associated with applicability.

Design Systematic review of studies that used AGREE to assess guidelines published in 2008 or later.

Data sources MEDLINE and EMBASE were searched from 2008 to July 2014, and the reference lists of eligible items were also screened. Two individuals independently screened results for English language studies that reviewed guidelines using AGREE and reported all domain scores, and extracted data. Descriptive statistics were calculated across all domains. Multilevel regression analysis with a mixed effects model identified factors associated with applicability.

Results Of 245 search results, 53 were retrieved as potentially relevant and 20 studies were eligible for review. The mean and median domain scores for applicability across 137 guidelines published in 2008 or later were 43.6% and 42.0% (IQR 21.8–63.0%), respectively. Applicability scored lower than all other domains, and did not markedly improve compared with guidelines published in 2007 or earlier. Country (UK) and type of developer (disease-specific foundation, non-profit healthcare system) appeared to be associated with applicability when assessed with AGREE II (not original AGREE).

Conclusions Despite increasing recognition of the need for implementation tools, guidelines continue to lack such resources. To improve healthcare delivery and associated outcomes, further research is needed to establish the type of implementation tools needed and desired by healthcare providers and consumers, and methods for developing high-quality tools.

This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See:

Strengths and limitations of this study

  • This study found that, among 137 guidelines published from 2008 to 2013 described in systematic reviews published from 2010 to 2014, the mean and median domain scores for applicability were 43.6% and 42.0%, respectively, and applicability scored lower than the other five Appraisal of Guidelines, Research and Evaluation (AGREE) domains.

  • Applicability of guidelines, which refers to the inclusion of implementation instructions and tools, did not improve subsequent to the publication of two similar meta-reviews in 2010 and 2012, respectively, which examined a total of 654 guidelines published from 1980 to 2007.

  • Country (UK) and type of developer (disease-specific foundation, non-profit healthcare system) appeared to be associated with applicability when assessed using AGREE II (not original AGREE) though these findings should be interpreted with caution.

  • Our literature search may not have identified all relevant studies, the AGREE instrument may not objectively appraise guidelines, or high AGREE scores may not be a determinant of guideline use; therefore, further research is needed to identify strategies that promote and support the development of guideline implementation tools.


Guidelines play a fundamental role in healthcare planning, delivery, evaluation and quality improvement. However, they are not consistently translated into policy or practice.1–3 Interviews with users found they were frustrated with the vast number of guidelines and uncertain about how to implement them given numerous interacting contextual challenges.4–6 Greenhalgh et al7 described this as an evidence-based medicine ‘crisis’ and called for guideline-based tools that could be used by providers and patients to clarify the goals of care, quality and completeness of evidence, and relevance of potential benefits and harms. Pronovost8 also advocated for the development of implementation tools such as instructions for assessing barriers and choosing corresponding implementation strategies, and point-of-care checklists that integrate recommendations for patients with comorbid conditions.

Considerable evidence supports the assertion that guidelines featuring implementation instructions or tools such as those recommended by Greenhalgh et al7 and Pronovost8 are more likely to be used.9–11 For example, a systematic review of 68 studies of provider adherence to asthma guidelines found that decision support tools (electronic or paper-based guideline summaries, algorithms, history-taking templates, asthma status reminders) increased prescribing and provision of patient self-education or action plans, and were the only intervention studied that reduced emergency department visits.9 A Cochrane systematic review of eight studies found that mailing of print summaries improved compliance with care delivery recommendations.10 A systematic review of 100 randomised/non-randomised studies involving 3826 practitioners/practices caring for more than 92 895 patients found that nearly two-thirds of studies resulted in improved guideline adherence for diagnosis, prevention, disease management and prescribing.11

The Appraisal of Guidelines, Research and Evaluation (AGREE) instrument and its revised version, AGREE II, can be used to develop or appraise guidelines and related material in separate documents that may be published or publicly available on websites.12 ,13 AGREE II consists of 23 items grouped in six domains: scope and purpose, stakeholder involvement, rigour of development, clarity and presentation, applicability and editorial independence.13 The domain of applicability includes four items related to planning, undertaking and evaluating implementation: facilitators and barriers of guideline implementation, resource considerations, monitoring or audit criteria, and implementation instructions or tools similar to those recommended by Greenhalgh et al7 and Pronovost,8 and for which there is evidence of association with guideline use.9–11 A meta-review by Alonso-Coello et al14 of 42 studies in which 626 guidelines on a range of topics published in various countries from 1980 to 2007 were assessed with AGREE found that most guidelines scored low for applicability (mean 22%, 95% CI 20.4% to 23.9%) relative to all other domains. Another meta-review by Knai et al15 of 28 European guidelines on a range of topics published from 2000 to 2007 similarly found that most guidelines scored low for applicability (mean 44%, range 0–100%) relative to all domains but editorial independence. Although scoring reflects all domain items, not only the presence of implementation tools, the finding that applicability consistently scored lower than other domains across multiple years and types of guidelines is striking.
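
For readers unfamiliar with how AGREE II domain percentages arise, the sketch below follows the scaled-score approach described in the AGREE II user manual: every appraiser rates each item on a 7-point scale, and the domain score is the obtained total rescaled between the minimum and maximum possible totals. The function name and the example ratings are hypothetical and are not data from any study reviewed here. The original AGREE instrument used a different rating scale, so the scale bounds are left as parameters.

```python
def scaled_domain_score(ratings, min_rating=1, max_rating=7):
    """Return a scaled domain score as a percentage (0-100).

    ratings: one list per appraiser, each holding that appraiser's
    rating for every item in the domain.
    """
    n_items = len(ratings[0])
    if any(len(r) != n_items for r in ratings):
        raise ValueError("every appraiser must rate the same items")
    n_appraisers = len(ratings)
    obtained = sum(sum(r) for r in ratings)
    # Lowest and highest totals achievable across all items and appraisers.
    minimum = min_rating * n_items * n_appraisers
    maximum = max_rating * n_items * n_appraisers
    return 100.0 * (obtained - minimum) / (maximum - minimum)


# Hypothetical ratings from two appraisers for the four applicability items.
example = [[3, 4, 2, 5], [4, 4, 3, 5]]
print(round(scaled_domain_score(example), 1))
```

Because the score is rescaled, a guideline rated at the lowest point on every item by every appraiser scores 0%, not 1/7 of the maximum.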

Limited use of guidelines contributes to omission of beneficial therapies, preventable harm, suboptimal patient outcomes or experiences, or waste of resources.7 ,8 Alonso-Coello et al14 and Knai et al15 showed that few guidelines featured implementation tools, which improve guideline use, but both studies were based on guidelines published in 2007 or earlier. This study reviewed the applicability of guidelines published in 2008 or later given emerging views and evidence regarding the need for implementation tools. A secondary purpose was to identify factors associated with applicability. The findings may reveal whether additional guidance is needed to promote the development of guideline implementation tools, thereby enabling guideline use, and improved care delivery and associated outcomes.



We conducted a meta-review of studies that used the original AGREE instrument or AGREE II (henceforth referred to collectively as AGREE) to evaluate the quality of guidelines. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) criteria guided reporting of the methods and findings (etable 1).16 A protocol was not registered and ethics review was not required.

Searching and screening

MEDLINE and EMBASE were searched from 2008 to July 2014 for English language studies that assessed guidelines using AGREE. The search strategy (box 1) was based on terms used to index previous meta-reviews.14 ,15 The references of all eligible studies were also screened. Titles and abstracts of search results were reviewed independently by the principal investigator and a trained research assistant. All items selected by at least one reviewer were retrieved for further assessment. Studies were eligible if they were systematic reviews, one or more of the guidelines they evaluated were published in 2008 or later, guidelines were assessed by at least two reviewers, scores for all AGREE domains were reported and either domain score, or scores for individual items such that domain score could be calculated, were reported for each guideline. Eligible studies reviewed all guidelines in a particular country, or all guidelines on a particular topic, clinical condition or type of patient management. Studies were not eligible if they compared guideline content only (eg, underlying evidence, development methods or recommendations across guidelines) and did not report domain scores; evaluations served as a baseline needs assessment in a country new to guideline development since they had not yet developed capacity for generating guidelines; or guidelines were sampled from and assessed by the same organisation since this would not reflect a range of factors of interest that might influence applicability, and potentially bias the assessment. Studies in the form of abstracts, letters, commentaries or editorials were not eligible.

Box 1

Literature search strategy. Database: Ovid MEDLINE(R) without Revisions <1996 to June Week 2 2014> Search Strategy:

1. Practice Guidelines as Topic/st [Standards] (5129)

2. Quality control/ (27721)

3. (14714)

4. 2 or 3 (42328)

5. 1 and 4 (204)

6. Limit 5 to (English language and yr=“2008-Current”) (117)

7. Limit 6 to (comment or editorial or interview or lectures or letter or news) (15)

8. 6 not 7 (102)
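
The numbered steps combine earlier result sets with Boolean operators, so the counts in parentheses can be cross-checked with simple set arithmetic. A minimal sketch using only the counts reported in box 1 follows; the variable names are illustrative, and because the term for line 3 is not shown in the source it is treated as an opaque set.

```python
# Result counts reported in box 1 (Ovid MEDLINE).
line2 = 27721  # Quality control/
line3 = 14714  # search term not shown in the source
line4 = 42328  # 2 or 3
line6 = 117    # line 5 limited to English language, 2008 onwards
line7 = 15     # comments, editorials, letters and similar items
line8 = 102    # 6 not 7

# Inclusion-exclusion: |2 or 3| = |2| + |3| - |2 and 3|,
# so the overlap between lines 2 and 3 can be recovered.
overlap_2_3 = line2 + line3 - line4

# Line 7 is a subset of line 6, so "6 not 7" is a plain subtraction.
remaining = line6 - line7

print(overlap_2_3)
print(remaining == line8)
```

The second check confirms that the final MEDLINE count is internally consistent with the two preceding lines.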

Data collection and analysis

There are frameworks that identify multiple, often interacting factors that influence guideline use,4–6 but there are no frameworks that identify why some guidelines and not others feature implementation tools. We postulated that type of developer, nature or complexity of guideline topic, year produced or AGREE version may have influenced decisions about whether to develop implementation tools. For eligible reviews, data were collected on year published, clinical topic, version of AGREE, range of years during which guidelines were published, number of guidelines appraised and number of guidelines appraised that were published in 2008 or later. For individual guidelines published in 2008 or later included in eligible reviews, data were collected on date published, country, type of developer (professional organisation, disease-specific organisation, government agency, non-profit agency, healthcare system, academic organisation), AGREE version and domain scores. Data were extracted and tabulated by the principal investigator, then independently reviewed by the research assistant. Descriptive statistics were calculated for all domains (mean, median, range, IQR). We tested the association between applicability score and guideline publication date, country and type of developer using mixed effect models accounting for the review source as a nested variable. A secondary analysis was conducted testing the association between applicability and the covariates using the same statistical procedure stratified by AGREE. We used SAS V.9.4 (SAS Institute, Cary, North Carolina, USA) to conduct the analysis. All p values were two-sided and reported as being statistically significant on the basis of a significance level of 0.05. 
The methodological quality of eligible studies was not scored since AMSTAR is appropriate for assessing systematic reviews of randomised controlled trials.17 However, most of its 11 items (items 1, 2, 5–9) were screening criteria and therefore present across all studies.
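
The descriptive statistics named above (mean, median, range, IQR) can be sketched with Python's standard library. This is an illustration only: the authors used SAS, quartile definitions differ between packages, and the scores below are hypothetical rather than taken from the reviewed guidelines.

```python
import statistics


def describe(scores):
    """Summarise domain scores: mean, median, range and IQR bounds."""
    # "inclusive" treats the data as the whole population of interest;
    # other quartile methods can give slightly different bounds.
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "range": (min(scores), max(scores)),
        "iqr": (q1, q3),
    }


# Hypothetical applicability scores (%) for five guidelines.
summary = describe([20.0, 35.0, 42.0, 60.0, 78.0])
print(summary)
```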


Study characteristics

The search resulted in 245 articles; 53 were retrieved and 20 were eligible for review (figure 1). Studies were published from 2010 to 2014 and reviewed 254 guidelines (range 5–24) published from 1992 to 2013 on numerous topics (table 1).18–37 Guidelines were appraised with the original AGREE instrument in 9 studies and AGREE II in 11 studies. Of the guidelines included in eligible studies, 137 were published in 2008 or later.

Table 1

Characteristics of eligible studies

Figure 1

Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagram.

Guideline characteristics

Of 137 guidelines, 33 (24.1%) were published in 2008, 37 (27.0%) in 2009, 28 (20.4%) in 2010, 22 (16.1%) in 2011, 14 (10.2%) in 2012 and 3 (2.2%) in 2013. Almost half were published by professional associations or societies (67, 48.9%). The remaining guidelines were published by government agencies (36, 26.3%), disease-specific organisations (16, 11.7%), non-profit health delivery systems (10, 7.3%), academic organisations (7, 5.1%) and one by the WHO. Most guidelines were developed in the USA (46, 33.6%), UK (25, 18.2%) and Canada (20, 14.6%), and 13 (9.5%) by international groups. Several countries produced one or more guidelines included in the sample, including Argentina, Australia, Brazil, Finland, France, Italy, Japan, Malaysia, Mexico, the Netherlands, New Zealand, Saudi Arabia, Singapore, South Africa, Spain, Sweden and Turkey. Most guidelines were appraised using AGREE II (103, 75.2%).
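
As a quick consistency check, the yearly percentages above can be reproduced from the reported counts. The sketch uses only figures stated in the text; the variable names are illustrative.

```python
# Guidelines per publication year, as reported in the text.
by_year = {2008: 33, 2009: 37, 2010: 28, 2011: 22, 2012: 14, 2013: 3}

total = sum(by_year.values())
# Share of the sample published in each year, to one decimal place.
shares = {year: round(100 * n / total, 1) for year, n in by_year.items()}

print(total)
print(shares)
```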

Applicability scores

Table 2 summarises scores for all AGREE domains. The mean and median domain scores for applicability across all guidelines were 43.6% and 42.0%, respectively. These were lower than the mean and median of all other domains for guidelines in the sample. These results are higher than those reported by Alonso-Coello et al14 (mean 22%, 95% CI 20.4% to 23.9%) and similar to the findings of Knai et al15 (mean 44%, range 0–100%). The spread across range and IQR for each domain shows that scores for all domains were inconsistent across guidelines but more so for editorial independence, rigour of development, then applicability, followed by remaining domains.

Table 2

Domain scores for guidelines published in 2008 or later

Factors influencing applicability

An analysis of factors associated with applicability appears in table 3. The estimated intraclass correlation was 0.47. Applicability mean score differed by year of guideline publication. Guidelines published in 2010 and 2012 were associated with higher applicability score than those published in 2013. The differences in mean applicability score for 2010 and 2012 were 26.5 (p<0.03) and 28.3 (p<0.02), respectively. With respect to country, the highest mean applicability score was for guidelines developed in the UK. Guidelines developed by international groups, in Canada or the USA had significantly lower applicability scores compared with the UK. As for type of developer, disease-specific foundations and non-profit healthcare systems were associated with higher applicability scores than professional guideline developers. Mean applicability score differences were 16.2 (p=0.01) and 14.9 (p<0.04), respectively. When stratified by version of the AGREE instrument, 34 guidelines were included in the analysis for AGREE and 103 for AGREE II. The association between applicability score, year, country and type of guideline developer remained significant for AGREE II only and not for AGREE (data not shown).
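
The intraclass correlation reported above can be read against a generic random-intercept model. The notation below is an assumption for illustration, since the source gives neither the model equation nor the variance components: guideline $i$ is nested in review $j$, $\mathbf{x}_{ij}$ holds the covariates (year, country, type of developer), $u_j$ is the review-level random intercept and $\varepsilon_{ij}$ is the residual.

```latex
y_{ij} = \beta_0 + \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_j + \varepsilon_{ij},
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2),
\quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^2)

\mathrm{ICC} = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_\varepsilon^2}
```

Under this reading, an intraclass correlation of 0.47 would mean that roughly 47% of the variance in applicability scores lies between source reviews rather than between guidelines within a review, which is the reason for modelling the review source as a nested variable.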

Table 3

Observed, adjusted and mean applicability score difference using mixed effect model controlling for publication year, country and type of guideline developer nested within the review source, 2008–2013


Providers and patients are most likely to benefit from guidelines featuring implementation tools.5 ,6 ,9–11 The applicability of guidelines published in 2008 or later has not markedly improved compared with guidelines published in 2007 or earlier, did not increase over time from 2008 to 2013 and remains low compared with other AGREE domains.14 ,15 Guidelines published in the UK, or by disease-specific foundations or non-profit healthcare systems, appeared to be associated with higher applicability scores, though only when assessed using AGREE II. These findings are of concern given the intensity and cost of efforts to generate an ever-increasing body of guidelines that are not used. Although multiple factors other than implementation tools influence guideline use, including patient, provider, institutional and system-level issues, implementation tools are meant to overcome many of these barriers.4–6 Furthermore, guideline developers, implementers and researchers said that, in comparison with other approaches for implementing guidelines, developing implementation tools was more feasible, could be widely applied and was therefore more likely to impact guideline use.38

Several issues may limit the interpretation and use of these findings. The literature search may not have identified all relevant studies; however, we searched the two most relevant medical databases, and screening and data extraction were undertaken independently by two individuals to improve reliability. We relied on published meta-reviews, so the sample of guidelines was non-random. However, eligible studies included 137 guidelines published in 2008 or later with a variety of characteristics, so the findings may be generalisable across other guidelines. Others have noted several limitations of the AGREE instrument that was used to score guidelines in eligible studies.17 For example, scoring of domain items can be subjective, and domain or overall score has not been definitively associated with guideline use. With respect to applicability, this information may be more likely found outside the guideline document compared with content reflecting other domains, rendering an assessment of applicability more challenging. However, AGREE remains the internationally accepted gold standard for appraising guidelines. It is notable that associations between applicability scores and other factors were revealed by AGREE II13 in which the definition and instructions for scoring of applicability were elaborated on compared with the original AGREE instrument.12 While this study found that country (UK) and type of developer (disease-specific foundation, non-profit healthcare system) were associated with applicability score, the finding may not be meaningful, in part because all non-profit healthcare systems were located in the US and not the UK, and because fewer guidelines were produced by non-professional organisations. However, this study was exploratory in nature and examined preliminary hypotheses, so ongoing research is needed to investigate the influence of these, and other factors, perhaps by repeating the same analysis once more meta-reviews are published.
Alternatively, further investigation of other factors, for example, the characteristics and workflow of the intended users of these guidelines, may provide some insight on why implementation tools were created for these guidelines. Despite these potential limitations, this study underscores the urgent need to create impetus and guidance that would support the development of guideline implementation tools.

AGREE and other initiatives such as Grading of Recommendations Assessment, Development and Evaluation (GRADE) system and the GuideLine Implementability Appraisal (GLIA) instrument have improved the description of guideline methods, evidence and recommendations.39 ,40 However, there has been far less scrutiny of accompanying implementation tools. We interviewed international guideline developers who said there was a demand for such resources among their constituents but they required guidance for developing implementation tools.38 We and others analysed guideline development and implementation instructional manuals and found that they lacked guidance for developing implementation tools.41 ,42 Therefore, we consulted with members of the international guideline community to generate a 12-item framework that can serve as the basis for evaluating and endorsing or adapting existing guideline implementation tools, or developing new tools.43

Additional research is needed to examine the type of tools that are most needed and preferred by different types of guideline users, the types of implementation tools best suited for different guidelines and the features of implementation tools that are associated with guideline use. Pronovost8 noted that developers may lack relevant expertise to develop implementation tools and encouraged them to partner with others such as implementation or social scientists. Coordinating complex, protracted partnerships involving numerous stakeholders with differing interests can be challenging.44 ,45 However, the Choosing Wisely initiative, in which numerous specialty societies and consumer groups partnered to develop shared decision-making tools, demonstrates that partnership is indeed possible when there is a widely recognised need for improvement.46 Still, further research is needed to identify the capacity, including skills and resources needed to develop implementation tools.



  • Contributors ARG and MCB conceived the study, interpreted data, drafted the manuscript and approved the final version of this manuscript. ARG was responsible for collecting and analysing data. Both authors agree to be accountable for the work, and to provide data on which the manuscript was based. This research was conducted without peer-reviewed funding. All authors, external and internal, had full access to all of the data (including statistical reports and tables) in the study and can take responsibility for the integrity of the data and the accuracy of the data analysis.

  • Funding This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests None.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement All relevant data are available in the manuscript and online supplementary files.