Introduction Using the best current evidence to inform clinical decisions remains a challenge for clinicians. Given the scarcity of trustworthy clinical practice guidelines providing recommendations to answer clinicians’ daily questions, clinical decision support systems (ie, assistance in question identification and answering) emerge as an attractive alternative. The trustworthiness of the recommendations achieved by such systems is unknown.
Objective To evaluate the trustworthiness of a question identification and answering system that delivers timely recommendations.
Design Cross-sectional study.
Methods We compared the responses to 100 clinical questions related to inpatient management provided by two rapid response methods with ‘Gold Standard’ recommendations. One of the rapid methods was based on PubMed and the other on Epistemonikos database. We defined our ‘Gold Standard’ as trustworthy published evidence-based recommendations or, when unavailable, recommendations developed locally by a panel of six clinicians following the Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach. Recommendations provided by the rapid strategies were classified as potentially misleading or reasonable. We also determined if the potentially misleading recommendations could have been avoided with the appropriate implementation of searching and evidence summary tools.
Results We were able to answer all of the 100 questions with both rapid methods. Of the 200 recommendations obtained, 6.5% (95% CI 3% to 9.9%) were classified as potentially misleading and 93.5% (95% CI 90% to 96.9%) as reasonable. 6 of the 13 potentially misleading recommendations could have been avoided by the appropriate usage of the Epistemonikos matrix tool or by constructing summary of findings tables. No significant differences were observed between the evaluated rapid response methods.
Conclusion A question answering service based on the GRADE approach proved feasible to implement and provided appropriate guidance for most identified questions. Our approach could help stakeholders in charge of managing resources and defining policies for patient care to improve evidence-based decision-making in an efficient and feasible manner.
- evidence based practice
- decision support
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
Strengths and limitations of this study
The study was carried out in a real-world scenario (questions related to patients being treated in a clinical ward).
Three different clinicians were randomly assigned to apply the different answering strategies.
We developed a transparent framework to categorise the recommendations obtained by the rapid strategies.
We sought to provide trustworthy ‘Gold Standard’ recommendations; nevertheless, it is not possible to guarantee that they were optimal.
It is unclear if the observed results can be replicated in other settings, for example, with participants less trained in evidence-based decision-making.
Research consistently shows that there is an important gap between evidence and practice,1 2 and clinicians seldom use the best available evidence to guide their decisions.3–5 Limited time, lack of training in critical appraisal and low expectations for finding relevant answers are among the most commonly identified obstacles.6 7 These practices are problematic as the benefits of using the best current evidence to inform clinical decisions are widely accepted, to such an extent that evidence-based decision-making is frequently considered a measure of healthcare quality.8 In particular, hospital executive boards, insurance companies and consumers recognise that evidence-based practice may help prevent unsafe or inefficient practices.9–11
One of the potential solutions for bringing evidence to bedside decisions is the use of trustworthy and transparent clinical practice guidelines. Although the last decade has seen significant advances in guideline methodology (http://www.gradeworkinggroup.org/), important limitations still remain: (1) only a small number of guidelines have been tailored to clinicians’ needs;12 (2) finding relevant guidelines can be laborious and time consuming; and (3) typically, only a few guidelines are kept up to date.13
Another alternative for bridging the gap between evidence and clinical practice is the use of clinical decision support systems designed to assist clinicians in the question identification and resolution process by finding the answer for them and presenting the information in a user-friendly way.14–18 Unlike products that passively provide preappraised evidence at the point of care (eg, UpToDate), these systems involve trained practitioners who search for and deliver tailored answers to identified questions. However, the trustworthiness of the recommendations achieved by such systems is unknown.
The objective of this study was to evaluate the trustworthiness of a question identification and answering system that delivers timely recommendations to clinicians providing care to inpatients by comparing the imparted guidance with ‘Gold Standard’ recommendations. Additionally, we propose how the process could be replicated.
We conducted the study on the Internal Medicine Service of the German Hospital of Buenos Aires, Argentina, from March 2014 to March 2016. The context in which this study was carried out has been described in another publication.17
We compared two rapid response methods with trustworthy published evidence-based recommendations or, when not available, recommendations developed locally by a panel of six clinicians which, for the purpose of this study, we considered as our ‘Gold Standard’. One of the rapid response methods was based on PubMed using clinical queries, which are a series of filters designed to improve the retrieval of scientifically strong and clinically relevant articles from PubMed database.19 The other was based on Epistemonikos, which is a relational, collaborative, multilingual database of health evidence that includes systematic reviews from multiple sources (Cochrane database of systematic reviews and PubMed, among others).20
Three clinicians trained in evidence-based decision-making (informationists) attempted to answer all the identified questions following three different strategies. The informationists differ from clinical librarians in that they are trained in clinical epidemiology methods rather than simply information acquisition and have clinical expertise relevant to the questions that allows contextual interpretation of research findings. Each question had its own randomisation schedule drawn from a computer pregenerated random number list in which each informationist was assigned to one of the three strategies defined below. We describe the question identification process and the strategies to address the questions in the following sections.
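The per-question allocation described above can be sketched as follows. This is a minimal illustration, not the authors' actual procedure: the function name `build_schedule`, the informationist labels and the seed are hypothetical, and the source states only that each question had its own schedule drawn from a computer pregenerated random number list.

```python
import random

# Strategy labels follow the three strategies named in the text.
STRATEGIES = (
    "strategy 1 (PubMed)",
    "strategy 2 (Epistemonikos)",
    "strategy 3 (Gold Standard)",
)


def build_schedule(question_ids, informationists, seed=2014):
    """Assign each informationist to exactly one strategy, per question.

    The seed and the shuffle-based allocation are assumptions made for
    illustration; the source does not describe how the list was generated.
    """
    if len(informationists) != len(STRATEGIES):
        raise ValueError("need exactly one informationist per strategy")
    rng = random.Random(seed)  # seeded so the schedule is reproducible
    schedule = {}
    for qid in question_ids:
        order = list(informationists)
        rng.shuffle(order)  # independent random permutation per question
        schedule[qid] = dict(zip(STRATEGIES, order))
    return schedule


# Hypothetical labels for the three informationists.
schedule = build_schedule(range(1, 101), ["A", "B", "C"])
```

Shuffling the three informationists independently for each question guarantees that every strategy is applied to every question by a different person, which matches the design described in the text.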
Identification and selection of clinical questions
One of the informationists (AI), otherwise uninvolved in the patients’ care, identified questions relevant to the staff and residents of the Internal Medicine Service. Either the staff or residents explicitly formulated the questions, or AI inferred them from the discussion of the clinical cases. He recorded the relevant clinical questions using the Population/Problem, Intervention, Comparison, Outcome (PICO) framework.
In order to focus on questions that could potentially impact clinicians’ course of action, we excluded questions that (1) were answered immediately by someone who was present in the session, other than the informationists, typically using electronic resources such as UpToDate; (2) were not related to therapeutic or diagnostic interventions and (3) addressed interventions already implemented in the patient's care.
All the identified questions that did not meet any of the exclusion criteria were included and registered. The described question identification process was repeated until the study was finished.
Rapid strategy based on PubMed (strategy 1)
The informationist assigned to this strategy performed a literature search on MEDLINE using the PubMed clinical queries feature (online supplementary figure 1). First, he tried to identify relevant systematic reviews;21 when these were unavailable or when he considered that additional relevant information could exist, he also searched for primary studies. Once the informationist identified the most relevant systematic review and/or primary studies, he followed the GRADE approach to interpret the results and judge the certainty of the evidence (for a detailed description, see the Grading of Recommendations Assessment, Development and Evaluation (GRADE) handbook, available at http://gdt.guidelinedevelopment.org/app/handbook/handbook.html). Following the GRADE guidance, the informationist also considered additional relevant information related to patients’ values and preferences, costs, applicability and feasibility,22 23 and made a clinical decision simulating what clinicians could do in the optimal scenario. To capture the decision, the informationist formulated a recommendation that included the direction (in favour or against the intervention) and the strength (strong or weak). The process took no more than 2 hours.
Supplementary figure 1
Rapid strategy based on Epistemonikos (strategy 2)
The informationist assigned to this strategy searched the Epistemonikos database using the ‘matrices of evidence’ tool, which is a tabular way of displaying the cluster of systematic reviews that share at least one included study,24 and followed the same process described for strategy 1. He also searched PubMed for randomised controlled trials (RCTs) in cases where systematic reviews were not available or when he considered that additional relevant information could exist (online supplementary figure 2).
Supplementary figure 2
Strategy based on trustworthy recommendations (‘Gold Standard’) (strategy 3)
The informationist assigned to this strategy searched for recommendations developed with the GRADE approach on the following databases: Tripdatabase (http://www.tripdatabase.com); National Guideline Clearinghouse (http://www.guidelines.gov); Canadian Medical Association (https://www.cma.ca); National Institute for Health and Care Excellence (http://www.nice.org.uk/); SIGN (http://www.sign.ac.uk); GuíaSalud (http://portal.guiasalud.es/web/guest/buscar-gpc); Australian clinical practice guidelines (http://www.clinicalguidelines.gov.au); New Zealand Guidelines Group (http://www.nzgg.org.nz/); US Preventive Services Task Force (http://www.uspreventiveservicestaskforce.org/); eGuidelines (https://www.guidelines.co.uk/) and GIN (http://www.g-i-n.net/about-g-i-n/introduction).
He critically assessed the identified recommendations using the criteria proposed for evaluating GRADE recommendations25 and qualitatively categorised their trustworthiness as high, moderate or low based on the answers to the following questions: Was the question clearly formulated? Were all the critical outcomes considered? Was the recommendation based on the best current evidence? Was the evidence clearly presented? Was the recommendation coherent with the supporting evidence? Were the values and preferences considered?
Additionally, for every question, the same informationist searched for systematic reviews, RCTs and observational studies on the following databases without time restriction: PubMed, Epistemonikos and the Cochrane database of systematic reviews. He used the information extracted from the relevant systematic reviews and/or primary studies to construct a Summary of Findings (SoF) table following the GRADE principles (SoF example available in online supplementary table 1).26 27 The tables were then sent via email to six clinicians (‘local panel’) with experience in applying the GRADE approach. Each clinician used the information included in the SoF tables and considered issues related to patients’ values and preferences, costs, applicability and feasibility to individually construct a recommendation.22 23 When >66% of the clinicians who answered agreed on the strength and direction of the recommendation, we considered that recommendation final. Disagreement in the direction or the strength of the recommendation was recorded and resolved by a seventh clinician (IN) with experience in developing GRADE recommendations. Although we intended to answer every question with the described ‘local panel’ approach, we only used the resultant recommendations when published GRADE recommendations developed by guideline panels rated as ‘high’ for trustworthiness were unavailable. Figure 1 provides a description of the ‘Gold Standard’ recommendation construction process.
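The panel agreement rule described above (more than 66% of responding clinicians agreeing on both direction and strength) could be expressed as in the sketch below. The function name `panel_consensus` and the encoding of votes as `(direction, strength)` tuples are assumptions for illustration; the threshold is read literally as a fraction strictly greater than 0.66, so four concordant votes out of six responders count as consensus.

```python
from collections import Counter


def panel_consensus(votes, threshold=0.66):
    """Return the agreed (direction, strength) pair, or None.

    votes: one (direction, strength) tuple per clinician who answered.
    None means no consensus, in which case the text says the
    disagreement was resolved by a seventh clinician.
    """
    if not votes:
        return None
    top_vote, count = Counter(votes).most_common(1)[0]
    # Strictly greater than the threshold, following the '>66%' wording.
    return top_vote if count / len(votes) > threshold else None


# Hypothetical votes from six responders: four agree, two differ.
example_votes = [("for", "weak")] * 4 + [("for", "strong"), ("against", "weak")]
agreed = panel_consensus(example_votes)
```

Because the denominator is the number of clinicians who answered rather than the full panel, a question with fewer responders needs fewer concordant votes to reach consensus.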
Supplementary table 1
We compared the recommendations, quality of evidence judgements and information used by rapid strategies and the ‘Gold Standard’ strategy to define the following outcomes:
Inappropriate recommendations: when the ‘Gold Standard’ was a strong recommendation and the rapid strategies yielded a decision in the opposite direction of any strength or when the ‘Gold Standard’ was a weak recommendation and the rapid strategies yielded a strong recommendation in the opposite direction.
Overconfident recommendations: when the ‘Gold Standard’ was a weak recommendation and the rapid strategies yielded a decision concordant with a strong recommendation in the same direction.
Potentially misleading recommendations: composite of inappropriate or overconfident recommendations.
Concordant recommendations: when the ‘Gold Standard’ and the rapid strategies yielded a recommendation of the same direction and strength.
Reasonable disagreement: when the ‘Gold Standard’ was a weak recommendation in favour and the rapid strategies yielded a weak recommendation against or vice versa, or when the ‘Gold Standard’ was a strong recommendation and the rapid strategies yielded a weak recommendation in the same direction.
Reasonable recommendations: composite of concordant recommendations and reasonable disagreement.
Same direction recommendations: when the ‘Gold Standard’ and the rapid strategies yielded a recommendation of the same direction regardless of its strength.
Inappropriate quality of evidence judgement: proportion of recommendations in which the quality of evidence (1) was judged as low or very low by the rapid strategies and high or moderate by the ‘Gold Standard’ or (2) was judged as high or moderate by the rapid strategies and low or very low by the ‘Gold Standard’.
Coincidence in information usage: proportion of recommendations in which the publications used by the rapid methods were the same as the ones used by the ‘Gold Standard’.
Table 1 describes the framework for rapid recommendation categorisation based on their comparison with ‘Gold Standard’ recommendations.
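The categorisation rules defined above can be written as a small decision function. This is a sketch of the definitions as worded in the text, not the authors' own tooling; the `Recommendation` class, the function names and the string labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recommendation:
    direction: str  # "for" or "against" the intervention
    strength: str   # "strong" or "weak"


def classify(gold, rapid):
    """Categorise a rapid recommendation against the 'Gold Standard'."""
    same_direction = gold.direction == rapid.direction
    if gold.strength == "strong":
        if not same_direction:
            return "inappropriate"  # opposite direction, any strength
        if rapid.strength == "strong":
            return "concordant"
        return "reasonable disagreement"  # weak, same direction
    # From here on the 'Gold Standard' is a weak recommendation.
    if rapid.strength == "strong":
        return "overconfident" if same_direction else "inappropriate"
    return "concordant" if same_direction else "reasonable disagreement"


def composite(category):
    """Collapse the four categories into the two composite outcomes."""
    if category in {"inappropriate", "overconfident"}:
        return "potentially misleading"
    return "reasonable"
```

For example, `classify(Recommendation("for", "weak"), Recommendation("for", "strong"))` yields `"overconfident"`, which `composite` maps to `"potentially misleading"`. The function covers all 16 combinations of direction and strength, so every pair of recommendations falls into exactly one category.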
We also performed a post hoc qualitative analysis of the recommendations classified as potentially misleading. We analysed the reasons for the disagreement between the rapid strategies and the ‘Gold Standard’ and we considered potential solutions. For this purpose, in cases in which the potentially misleading recommendations were judged to be a consequence of inadequate evidence selection, we determined if the appropriate use of the Epistemonikos matrices tool could have prevented that problem (ie, identification of a systematic review containing primary studies that were not considered for the development of the original recommendation). In cases in which potentially misleading recommendations were judged to be a consequence of inappropriate evidence interpretation, we determined if the correct presentation of the evidence could have prevented the problem. To assess this, we sent the SoF table constructed in response to the same question for the ‘Gold Standard’ strategy (strategy 3) to the investigator who originally constructed the potentially misleading recommendation. We asked the investigator to provide a new recommendation based on the SoF table. We judged that the correct use of the SoF table could have prevented the problem when the investigator provided a reasonable recommendation in response (compared with the ‘Gold Standard’ (GS) recommendation).
For the comparisons between the rapid strategies and the ‘Gold Standard’, we calculated proportions and 95% CI for all the outcomes. We also calculated inter-rater agreement with the kappa statistic using the VassarStats calculator (http://vassarstats.net/kappa.html). For the kappa calculation related to recommendation concordance (strong in favour, weak in favour, weak against or strong against), we imputed a distance between strong in favour and weak in favour, and between strong against and weak against, that was double the distance between weak in favour and weak against. For the kappa calculation related to quality of evidence agreement (high, moderate, low or very low), we imputed a distance between moderate and low that was double the distance between very low and low and between moderate and high. For the comparison between strategies 1 and 2, we calculated relative risks and 95% CI when possible.
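The two calculations described above can be illustrated with the sketch below. Several points are assumptions rather than details given in the source: the CI uses the normal-approximation (Wald) formula, which reproduces the interval reported in the abstract for 13 of 200 but is not named by the authors; the kappa uses linear disagreement weights equal to the imputed distance between categories; and the function names and category positions are hypothetical encodings of the distances described.

```python
from math import sqrt


def wald_ci(events, total, z=1.96):
    """Proportion with a normal-approximation (Wald) 95% CI."""
    p = events / total
    half_width = z * sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def weighted_kappa(table, positions):
    """Weighted kappa with linear weights on imputed category positions.

    table[i][j]: number of questions placed in category i by one
    strategy and in category j by the other.
    positions[i]: location of category i on the ordinal scale; unequal
    gaps encode the imputed distances.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(table[i]) for i in range(k)]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    observed = 0.0
    expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(positions[i] - positions[j])
            observed += weight * table[i][j] / n
            expected += weight * row_totals[i] * col_totals[j] / (n * n)
    if expected == 0:
        raise ValueError("kappa is undefined when only one category is used")
    return 1.0 - observed / expected


# Strong in favour, weak in favour, weak against, strong against:
# gaps of 2, 1 and 2 make strong-to-weak twice the weak-to-weak distance.
RECOMMENDATION_POSITIONS = [0, 2, 3, 5]
# High, moderate, low, very low: the moderate-to-low gap is doubled.
QUALITY_POSITIONS = [0, 1, 3, 4]

proportion, lower, upper = wald_ci(13, 200)
```

With 13 potentially misleading recommendations out of 200, `wald_ci` gives a proportion of 0.065 with bounds of roughly 0.031 and 0.099, in line with the interval reported in the abstract.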
During the study period, we identified 100 questions, all of which were answered with strategies 1 and 2 (200 recommendations). With strategy 3, we found recommendations in clinical practice guidelines (CPG) for 80 of the 100 questions, all of which could be answered by the ‘local panel’ approach. The process of answering each question with strategy 3 (‘Gold Standard’, ‘local panel’ approach) took, on average, 1 week per question. Table 2 presents the characteristics of the recommendations delivered by each strategy. A list of the PICOs is available in the online supplementary table 2.
Supplementary table 2
Following the process described in figure 1, we obtained 100 ‘Gold Standard’ recommendations. These recommendations were composed of 16 high confidence CPG recommendations, 55 panel recommendations and 29 expert recommendations. The results of the comparison between the rapid strategies and the ‘Gold Standard’ are described in table 3.
The comparison between strategies 1 and 2 is described in online supplementary table 3.
Supplementary table 3
There were 13 recommendations that were judged as potentially misleading; the causes and possible solutions are summarised in online supplementary table 4.
Supplementary table 4
The results of the present study suggest that a rapid question answering system based on the GRADE approach provided appropriate guidance in response to most questions. Although the proportion of concordant recommendations (same strength and direction between rapid strategies and GS) was 62.5%, most of the remainder (31% of the total) were classified as ‘reasonable disagreements’. Only 13 of the 200 recommendations were judged as potentially misleading and approximately half of those could possibly have been avoided with an appropriate use of the available tools (Epistemonikos matrix of evidence or SoF tables). We also analysed the results considering exclusively the direction of the recommendations. The results showed that almost all strong recommendations constructed with the rapid strategies shared the direction of the ‘Gold Standard’, whereas 70% of the weak recommendations did. This finding is not surprising given that weak recommendations are frequently based on low or very low quality of evidence, or are warranted in situations where benefits and risks are closely balanced; hence, their direction is subjectively defined by weighing those aspects (eg, in a setting in which benefits and harms are balanced, some guideline panel members can interpret the results as favouring the intervention while others as favouring the comparison).22 23 25 Although 30% of weak recommendations had a different direction from that of the ‘Gold Standard’, we consider it unlikely that they would have resulted in misleading guidance, as those willing to use them should carefully analyse the fundamentals of the recommendation before deciding their course of action.22–24 An exception would be the situation in which the ‘Gold Standard’ recommendations were strong in the opposite direction, but this was captured in the primary analysis as those recommendations were classified as inappropriate.
A third analysis, in which we calculated the agreement beyond chance between the rapid strategies and the ‘Gold Standard’ on recommendation strength and direction using weighted kappa, indicated moderate to substantial agreement.28 As described for the former analysis (considering only the direction of recommendations), this approach also does not acknowledge the possibility of reasonable disagreement. Hence, it only reflects the capability of the rapid strategies to provide recommendations concordant with the ‘Gold Standard’ (same direction and strength), which we believe is an overdemanding approach that underestimates the ability of the rapid strategies to provide adequate guidance.
The comparison between the different rapid answering strategies (PubMed vs Epistemonikos) showed that although the proportion of potentially misleading recommendations was small in both strategies, there was a slight (3%) absolute difference in favour of the PubMed strategy. One possible explanation for the difference is that the investigators involved in the study were less familiar with the Epistemonikos database and search engine than with PubMed's.
The main limitation of our study is that it is not possible to define a ‘Gold Standard’ recommendation for a medical question. We sought to provide trustworthy ‘Gold Standard’ recommendations by performing rigorous evidence searches, constructing detailed evidence summaries and including multiple clinicians trained in evidence-based decision-making; nevertheless, this approach does not guarantee optimal recommendations. In addition, the system was applied to a specific subgroup of questions (intervention-related questions that were not immediately answered). We consider that addressing questions that do not meet these criteria is less likely to change clinicians’ behaviour. Also, this study was carried out in a particular context (clinicians trained in evidence-based decision-making with advanced understanding of the GRADE system). It is unknown to what extent the observed results can be replicated in different situations where clinicians are less familiar with evidence-based medicine concepts.
Although investigators have previously evaluated the implementation of question answering services,29–34 these studies focused on clinicians’ attitudes and decisions in response to the answers provided. As long as it remains uncertain that the answers the services provide are based on the best available evidence, and that clinicians interpret and use the provided information appropriately to make coherent decisions, the benefits of the implementation of these services to improve patient outcomes cannot be assumed.35 Another approach would be to directly measure the impact of answering clinicians’ questions on patients’ clinically important outcomes (eg, mortality or length of hospital stay). However, for these kinds of interventions that are designed to improve quality of care through affecting physicians’ behaviour, demonstrating such an effect could be very difficult (huge sample sizes needed, low signal-to-noise ratio).36–38 Attempts have been made in this direction and the results suggest possible benefits with the implementation of the evaluated interventions, but the quality of evidence provided was low, either because of imprecision (underpowered studies)16–18 or because of risk of bias (non-randomised comparisons).39–41
We found only one study that considered the trustworthiness of the answers provided.42 In that study, the investigators inserted study evidence statements related to the management of clinical conditions for which high-quality RCTs or meta-analyses had unequivocally established benefits greater than risks, costs and inconvenience into hospital discharge letters. The study results showed a significant increase in general practitioner adherence to discharge medications, demonstrating that, in optimal conditions (no time restrictions to perform evidence searches, high quality of evidence available), providing information to clinicians improves patient care. However, that optimal scenario is probably the exception, as for most clinical questions high-quality evidence remains unavailable16 43 44 and clinicians usually need very prompt answers to their questions. Hence, ours is the first study to use a structured and objective approach to measure the quality of the information provided in a timely way to clinician-generated questions.
To achieve a medical practice consistent with what Ubbink et al described as evidence-based practice,45 clinicians need to be able to quickly obtain and accurately assess the best available evidence to answer their questions. Clinical practice guidelines endeavour to provide these answers at the point of care and, when rigorously developed and up to date, constitute optimal guidance. However, most of the available guidelines have methodological flaws and do not provide trustworthy recommendations.12 13 In the present study, 80% of the identified questions could be answered with recommendations included in CPG but only 20% of them were judged to be trustworthy.
Given current guideline limitations, if feasible and properly implemented, a question answering system could provide a solution. This study adds to our previous study in which we evaluated the impact of implementing a response system, similar to the one evaluated in the present study, on clinicians’ decisions.16 The results of that trial suggested that these kinds of interventions can influence clinicians’ courses of action and therefore patient care.
The present study was developed in a real-life scenario with limited resources, which suggests that the proposed intervention can possibly be implemented in a variety of settings, including a busy clinical ward. We were able to efficiently implement the proposed system with (1) one clinician trained in evidence-based decision-making exclusively dedicated to this task for at least 2 hours a day and (2) a computer with internet connection. We used a systematic and transparent method to arrive at decisions. Finally, we have developed a framework to compare different recommendations developed with the GRADE approach, acknowledging that not every discrepancy should be considered inappropriate as different values and preferences may lead to reasonable disagreement between recommendations.
Implication for practice
Those interested in improving evidence use in healthcare decision-making should consider the implementation of systems such as the one proposed in the present study. This would require at least one trained healthcare provider (informationist) who would (1) search for trustworthy published recommendations or, when not available, systematic reviews in Epistemonikos and/or PubMed; (2) use the Epistemonikos matrices of evidence tool and/or PubMed to identify additional information (not included in the selected systematic review); (3) construct a summary of findings table including all critical outcomes and (4) define a recommendation based on the identified trustworthy recommendations or the summary of findings tables (figure 2). We think that the cornerstone of successfully replicating the described process is practitioners’ training in evidence search, critical appraisal, summary and evidence to decision translation.
Implication for research
Investigators who addressed the clinical questions using the proposed strategies in the present study were highly trained in evidence-based decision-making and could possibly be classified as experts. Whether similar results could be obtained when those responsible for solving the identified questions are not experts remains uncertain.
A question answering service based on the GRADE approach proved feasible to implement and provided appropriate guidance for most identified questions. Our approach could help stakeholders in charge of managing resources and defining policies for patient care to improve evidence-based decision-making in an efficient and feasible manner.
Contributors AI significantly contributed to the conception and design of the work, and the acquisition, analysis and interpretation of data. JMC, FP, IN and GG significantly contributed to the design of the work, and the acquisition, analysis and interpretation of data. MAR, CC, CGM, MC, MM, MD and HNC significantly contributed to the acquisition and interpretation of the data.
Competing interests None declared.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement No available additional unpublished data.