Objectives Healthcare is a complex system, so quality improvement will commonly lead to unintended consequences which are rarely evaluated. In previous qualitative work, we proposed a framework for considering the range of these potential consequences, in terms of their desirability and the extent to which they were predictable or expected during planning. This paper elaborates on the previous findings, using consensus methods to examine what consequences should be identified, why and how to prioritise, evaluate and interpret all identified consequences, and what stakeholders should be involved throughout this process.
Design Two-round modified Delphi consensus study.
Setting and participants Both rounds were completed by 60 panellists with an academic, clinical or management background and experience in designing, implementing or evaluating quality improvement programmes.
Results Panellists agreed that trade-offs (expected undesirable consequences) and unpleasant surprises (unexpected undesirable consequences) should be actively considered. Measurement of consequences that were harmful to patients, or that had a high workload or financial impact, was prioritised, and their evaluation could also involve the use of qualitative methods. Clinical teams were agreed to be important to involve at all stages, from identifying potential consequences and prioritising which of those to evaluate systematically, to undertaking appropriate evaluation and interpreting the findings. Patients were considered necessary in identifying consequences, managers in identifying and prioritising them, and improvement advisors in interpreting the data.
Conclusion There was consensus that a balanced approach to considering all the consequences of improvement can be achieved by carefully considering predictable trade-offs from the outset and deliberately pausing after implementation to identify any unexpected surprises and make an informed decision as to whether quantitative or qualitative evaluation is needed and feasible. Stakeholders’ roles in the process of identifying, prioritising, evaluating and interpreting potential consequences should be explicitly addressed within planning and revisited during and after implementation.
- quality improvement
- consensus Study
- unintended consequences
- stakeholder engagement
- measurement of quality
- balanced approach
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
Strengths and limitations of this study
To the best of our knowledge, this is the first study to generate Delphi-based expert consensus on the identification, prioritisation, evaluation and interpretation of a wide range of quality improvement consequences, an area that has been largely overlooked in the existing literature.
This study provides insights into how a balanced approach to determining all consequences of quality improvement projects can be achieved, the specific factors that need to be considered and the stage at which relevant stakeholders can be actively involved.
The Delphi panel was purposively selected with the majority of participants identifying themselves as academics, quality improvement advisors and providers of healthcare services across the UK, with experience of designing, implementing or evaluating quality improvement interventions.
Although the selection of experts was appropriate for the purpose of this study, their answers may not apply to all practice settings, which might limit the generalisability of our findings beyond the UK healthcare system.
The complexity of the healthcare system, along with the multiple pressures it faces, means that efforts to improve quality and safety often achieve only limited benefits and can have unintended consequences (Manojlovich, 2016; Merton, 1936),1 which may impact positively or negatively on care processes and outcomes. However, several systematic reviews have shown that most papers evaluating quality improvement programmes mainly report impact on targeted goals, with minimal reporting of other unintended consequences.2 3 For example, only 1 of 121 interventions aiming to reduce falls and catheter-related infections,4 none of 34 studies of improvement interventions to improve surgical care,5 only 6 of 94 (6.4%) studies examining the application of Plan Do Study Act improvement methods6 and only 1 of 100 perioperative care improvement interventions reported any impact on unintended consequences.7
Furthermore, improvement projects rarely evaluate consequences identified after full implementation.3 A recent Cochrane review of interventions to improve antibiotic prescribing practices for hospital inpatients showed that only 11 (10%) of 110 studies reporting interrupted time series data of improvement interventions (which typically evaluated healthcare rather than research interventions) reported any data about unintended negative consequences.8 Overall, while there are a number of recommendations for systematically designing quality improvement interventions,9–11 there is a lack of evidence that improvement programmes routinely evaluate the presence of unintended consequences either preimplementation or postimplementation,12–15 with little specific guidance on how to best account for improvement consequences other than goals and what potential stakeholders should be consulted in planning, conducting and interpreting evaluations.
We have previously conducted a qualitative analysis of data from 15 semistructured interviews and 2 focus groups with 24 experts to explore the current understanding of unintended consequences of quality improvement.2 Based on the findings of this analysis, we proposed a structured framework for considering the range of potential consequences of improvement interventions, in terms of their desirability and the extent to which they were expected during the initial planning. As described in figure 1, the framework proposes that a balanced approach should consider goals (expected desirable consequences) and predictable trade-offs (expected undesirable consequences) early in the design of a quality improvement programme and pause to identify and take stock of pleasant (unexpected desirable consequences) and unpleasant surprises (unexpected undesirable consequences) after a period of implementation.
Framed by our previous qualitative work,2 3 this paper aims to:
Validate the previously developed framework and, through expert consensus, establish what potential quality improvement consequences should be identified.
Extend the framework’s wider applicability within the quality improvement measurement context by exploring and developing consensus on why and how to prioritise, evaluate and interpret all identified consequences, and on which stakeholders should be involved throughout this process.
The well-established consensus process incorporated a two-round modified Delphi method,16 which consisted of rating and ranking the importance of various propositions whose focus and scope were determined through the framework developed in our previous qualitative study.2 The modified Delphi process was chosen as it is recommended for use in the field of quality improvement and patient safety as a reliable means of determining consensus for a defined multifactorial and complex problem. It is also useful in minimising the impact of group interaction and influence, as well as in making use of valuable expert knowledge where understanding is only partial or incomplete.17
The development of the Delphi survey
Four key survey sections were defined as follows:
Section 1: Identifying potential consequences of improvement: Delphi panellists were asked what types of improvement consequences should be identified and who should be involved in identifying them.
Section 2: Prioritising which identified consequences to systematically evaluate: Panellists were asked under what circumstances evaluation should be conducted to assess and/or explore any identified potential consequences, and who should be involved in this decision.
Section 3: Undertaking appropriate evaluation for any identified consequences: Panellists were asked to rate statements about how to appropriately evaluate consequences of improvement, and who should be involved in this.
Section 4: Interpreting the emerging data: Panellists were asked to rate statements about who should be involved in understanding and interpreting findings to inform potential action.
Delphi survey piloting
Draft statements were initially pretested by a group of 10 clinical academics who commented on clarity and appropriateness, followed by two rounds of piloting by 14 additional participants with similar academic, clinical or management backgrounds to the targeted sample, using the same open access web platform (Bristol Online Surveys) as the main study. Pretesting and piloting led to some additional statements being proposed and added, and to the survey being refined in terms of wording and sequencing.
Panel selection and recruitment
The main study panel members were purposively selected to be individuals with experience of designing, implementing or evaluating quality improvement interventions with an academic, clinical or management background. We generated a list of experts by including all the stakeholders approached for the original qualitative study,2 plus additional improvement advisors, clinical academics, providers of health and social care services, policy-makers and patient representatives identified from online searches of articles with the highest number of citations in two leading quality improvement journals, authors of policy documents setting the general direction of quality improvement, keynote speakers at relevant conferences in the field, members of established quality improvement partnerships, service users attending community groups to advise local health boards and the research team’s own networks. Additional Delphi invitees were identified through a snowballing technique,18 whereby contacted panellists proposed suitable others with similar experience and knowledge.
Consistent with consensus development procedures,17 we used sequential, iterative stages as follows.
Delphi round 1
Participants were invited to take part by a personalised email which included a weblink to the round 1 online questionnaire, a letter of invitation to participate, the study information sheet and a briefing paper explaining the development process and theoretical underpinnings (online supplementary appendix 1). Participants were asked to score statements using a five-point Likert scale with a neutral option, ranging from ‘not at all important’ to ‘extremely important’. The type of Likert scale used varies from study to study; however, the five-point scale has been the most consistently used, as an acceptable compromise between the conflicting goals of offering enough choice to measure strength of opinion and designing items that are readily comprehensible to respondents.19 Furthermore, five-point Likert scales represent a valid and reliable means of measuring different levels of item agreement or assigned importance across similar modified Delphi studies.20
Space for free-text comments about existing statements was also given in the questionnaire, including justification for response and/or any other important areas which were not addressed. The survey also included open-ended questions for structured elicitation and demographic information relating to participants’ role and experience.
Delphi round 2
Participants from round 1 were sent a revised briefing paper (online supplementary appendix 2) along with feedback on their scores on each statement compared with the distribution of all scores (online supplementary appendix 3). Given that there were no significant differences in scoring associated with any of the participant characteristics examined in round 1 (eg, geographical location, roles or experience), feedback was presented in a combined form. This is consistent with previous literature which shows that if agreement is already satisfactory between stakeholder groups, then the type of feedback given may not make any difference in terms of the number of outcomes retained or reducing the variability of opinion.21
Using the same voting method as round 1, participants were subsequently asked to complete a revised questionnaire (online supplementary appendix 4) which included notes to indicate what and where changes had been made in response to all round 1 comments. Examples of changes included removing ambiguous examples and clarifying and amending the wording of some statements (online supplementary appendix 5). Participants were given 1 month to complete each round of the survey and a reminder letter was sent via email to everyone who had not replied within 14 days.
Data analysis and definition of consensus
There is no accepted, set standard for the target percentage of agreement, with thresholds and definitions of consensus ranging between 51% and 80%.22 23 We chose to take a conservative approach to defining consensus, deeming it to be present for a statement if ≥80% of participants rated that statement as very important or extremely important on the five-point Likert scale. All data entered via the web platform were downloaded and analysed using SPSS V.21.0 to calculate frequencies and mean ratings. The synthesis and thematic analysis of free-text responses was undertaken in NVivo V.11 following each round. After coding the initial sets of responses, MT and BG met to compare the labels attached and agreed on a set of codes that MT applied to all subsequent data. The wider team also convened regularly to discuss the summaries of any emerging findings. The focus of the analysis presented in the paper is the round 2 quantitative responses and supporting free-text findings.
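The consensus rule above reduces to a simple count per statement. The Python sketch below is illustrative only: it assumes responses are coded 1 (‘not at all important’) to 5 (‘extremely important’), the function name `consensus` is hypothetical, and the example ratings are invented rather than taken from the study. It also shows how applying a different cut-off, within the 51%–80% range reported in the literature, can change the verdict for the same ratings.

```python
# Minimal sketch of the >=80% top-two-box consensus rule. Assumptions (not from
# the study materials): Likert responses are coded 1-5, where 4 = "very
# important" and 5 = "extremely important"; the example ratings are invented.

TOP_TWO = {4, 5}  # "very important" and "extremely important"


def consensus(ratings: list[int], threshold_pct: int = 80) -> dict:
    """Summarise one statement's ratings and flag whether consensus is reached.

    Consensus is present when at least `threshold_pct` percent of ratings fall
    in the top two categories. Integer arithmetic avoids floating-point edge
    cases exactly at the cut-off (e.g. 48 of 60 = 80%).
    """
    if not ratings:
        raise ValueError("no ratings supplied")
    if any(r not in range(1, 6) for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")
    n = len(ratings)
    top = sum(1 for r in ratings if r in TOP_TWO)
    return {
        "n": n,
        "top_two_pct": round(100 * top / n, 1),
        "mean": round(sum(ratings) / n, 2),
        "consensus": 100 * top >= threshold_pct * n,
    }


# Hypothetical ratings from 60 panellists for two statements.
statement_a = [5] * 30 + [4] * 18 + [3] * 8 + [2] * 4   # 48 of 60 in top two
statement_b = [5] * 20 + [4] * 22 + [3] * 12 + [2] * 6  # 42 of 60 in top two

print(consensus(statement_a))  # meets the 80% rule exactly
print(consensus(statement_b))  # falls short at 80%
print(consensus(statement_b, threshold_pct=70))  # a more lenient cut-off
```

Returning the top-two percentage and mean alongside the binary verdict mirrors the idea of reporting all results in detail, so that readers can re-apply a cut-off suited to their own purposes.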
We attempted to email 180 individuals, with 170 emails delivered successfully. Seventy-two (42.4%) individuals completed round 1, and of these, 60 (83.3%) completed round 2. Table 1 shows that 50 (83.3%) of the 60 participants completing both rounds worked in the UK, 8 (13.3%) worked in another European country and 2 (3.3%) worked in the USA. Participants had a variety of (often multiple) roles, with 28 (46.6%) having an academic background and 26 (43.3%) currently working as improvement advisors. Despite our best efforts to optimise recruitment and retention, only two panellists were providers of social care services (3.3%), two were patients or carers (3.3%) and one identified as a service user representative (1.6%).
Furthermore, more than half of the sample (53.4%) had 6 or more years’ experience of working in healthcare quality improvement, while 31 (51.6%) reported having undertaken formal training in this area, most commonly (16.6%) to become Institute for Healthcare Improvement accredited improvement advisors. In contrast, 29 (48.3%) reported little or no experience of systematically measuring unintended consequences.
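The recruitment percentages reported above rest on different denominators: round 1 completion is calculated over the 170 delivered invitations rather than the 180 attempted, and round 2 completion over the 72 round 1 completers. A small Python sketch makes those denominators explicit; the helper name `pct` is hypothetical and is not part of any study software.

```python
# Recomputes the recruitment percentages reported above from the raw counts.
# The helper name `pct` is illustrative only.


def pct(part: int, whole: int) -> float:
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


delivered = 170        # of the 180 invitations attempted
round1_completed = 72
round2_completed = 60  # panellists who completed both rounds

print(pct(round1_completed, delivered))         # 42.4: round 1 response, per delivered email
print(pct(round2_completed, round1_completed))  # 83.3: retention from round 1 to round 2
print(pct(50, round2_completed))                # 83.3: UK-based share of the final panel
```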
Section 1: identifying potential consequences of improvement
Table 2 shows that all participants rated measurement of predefined improvement goals as very important or extremely important. There was also consensus that measures are important for identifying trade-offs expected before implementation (95% rated this as very important or extremely important) or unpleasant surprises emerging after a period of implementation (90%). Although some participants considered pleasant surprises important, ratings of this statement (70%) did not achieve the prespecified consensus level, with some participants describing pleasant surprises as being less critical than other consequences in reaching a more balanced approach.
Improvement needs to be judged on its merits alone and an unpleasant surprise detracts from those merits, but I am not sure whether a more pleasant surprise necessarily augments them. Simply measuring everything in sight just in case it had a positive influence is neither desirable nor feasible and probably not the best use of resources. (Improvement advisor)
There was consensus that clinicians and non-clinicians who directly engage with patients in the targeted area (100%), patients (83%) and managerial staff (80%) involved in organising the targeted care should be involved in identifying all potential consequences of improvement activity (table 3).
Section 2: prioritising which identified consequences to systematically evaluate
There was consensus that potential consequences should be measured if there was a likelihood of high (100%) or moderate (95%) harm to patients, followed by high (98%) or moderate (90%) negative workload implications for the service doing the improvement, and high (95%) or moderate (85%) negative workload implications for other health and social care services. Consensus was also achieved for high (95%) and moderate (85%) negative financial implications for services within the area targeted for improvement, and for high (88%) financial implications for services outside healthcare.
There was consensus about the importance of potential high benefits to patients (90%), to the service doing the improvement (95%) and to other health or social care services (81%), but not for the corresponding moderate benefits (61%–75%), reinforcing the view that trade-offs and unpleasant surprises are probably the current focus when making an informed decision as to whether systematic evaluation is needed. No ‘low severity’ consequences reached the agreed consensus level although, as explained by one participant, the perception of severity can sometimes be subjective:
What might appear trivial to an outsider like a one min increase in the time taken for clinical staff to do something might be perceived by the clinical staff as considerably longer and possibly with knock on consequences for scheduling of other tasks. Similarly, what might appear to an outsider to be minor inconvenience for patients might be the last straw. (Policymaker and regulator)
Beyond effects on the quality of the service or the patient care, increasing staff engagement with the improvement activity (86%) and reducing staff resistance to change (85%) were additional reasons to evaluate outcomes, because doing so demonstrated that improvers were taking staff concerns seriously (table 4).
Consensus on who should be involved in prioritising whether the identified consequences are important enough to be evaluated systematically was reached only for clinical teams (96%) and managerial staff (83%) involved in delivering and organising the targeted care (table 3).
Section 3: undertaking appropriate evaluation for any identified consequences
There was consensus that, irrespective of whether data are collected bespoke or for another purpose, both quantitative (90%) and qualitative data (86%) could be used to evaluate trade-offs, pleasant and/or unpleasant surprises with the same rigour as evaluating the predefined improvement goals (table 5).
As one participant described, qualitative data have much to offer, both for identifying trade-offs before implementation and for supporting postimplementation reflection on surprises, especially when retrospective measurement is not feasible:
Numerical measures will be important for pre-identified consequences while qualitative data will be particularly important for identifying consequences that fall at the right-hand end of the expected-unexpected continuum. It is important to be curious and find a reliable mechanism for galvanising insights and stories that ultimately bring those conventional metrics alive (Improvement science academic)
Clinical teams delivering the targeted care (91%) were the only stakeholder group to reach consensus for involvement in undertaking appropriate evaluation of any identified consequences (table 3).
Section 4: interpreting the emerging data
There was consensus that both clinical teams delivering the targeted care (86%) and improvement advisors (91%) should be involved in interpreting the data about unintended consequences (table 3). Panellists explained in their free-text comments how making more use of external expertise in interpreting the data could help make findings more meaningful and readily useful.
There is a need for the methodological expertise and critical distance of improvement advisors who can see data with fresh eyes. It may be more robust to have them interpret the data initially and then discuss the findings with other groups (Provider of healthcare services)
Overview of the main findings
Overall, there was consensus in the Delphi panel about the importance of the majority of the propositions. All participants rated measurement of predefined improvement goals as important, and there was agreement that trade-offs and unpleasant surprises should be actively considered, but no consensus about pleasant surprises. Participants prioritised the evaluation of seriously harmful consequences for patients, and of those with high workload or financial impact, both in the local implementation context and in other health and social care services. There was also consensus that evaluation of a wider range of consequences could have additional value in terms of increasing staff engagement with the improvement activity and reducing resistance to change, irrespective of whether measurement led to any change in implementation. Participants agreed that both quantitative and qualitative data were helpful to evaluate trade-offs and surprises, with free-text comments highlighting that qualitative data are often useful either to contextualise quantitative data or to understand impact when formal measurement is not feasible. Agreement about the importance of involving various internal and external stakeholders varied depending on the stage of the improvement work. Clinical teams delivering the targeted care were agreed to be necessary at all stages: identifying potential consequences, prioritising which consequences to evaluate, undertaking appropriate evaluation and interpreting the data. Patients were considered necessary in identifying potential consequences of improvement activity, managers in identifying consequences and prioritising which of those should be systematically evaluated, and improvement advisors in interpreting the emerging data.
Strengths and limitations of the study
Strengths of the study are that it built on our previous qualitative work on this topic,2 3 that we recruited and retained a substantial expert panel, with 83.3% of round 1 participants completing round 2, thereby reducing attrition bias, and that it used a predefined criterion to define agreement on the importance of a proposition. A potential weakness is that the round 1 questionnaire was structured by our qualitative work, meaning that there was less contribution from the Delphi panel in defining the scope of the propositions, although participants did have the opportunity to add, alter or comment on each section. Additionally, there are no generally accepted rules for how the presence of consensus should be defined, with several factors, such as the aim of the research, the number of respondents and the sequence of rounds, influencing the cut-off chosen.22 23 Given the exploratory nature of the study, we deliberately chose to take a conservative approach to defining consensus, requiring ≥80% of participants to agree that a proposition was very important or extremely important. An implication is that lack of consensus does not necessarily mean lack of importance, and such propositions may be relevant under some but not all circumstances. We therefore report all results in detail, and others may choose to apply different cut-offs suitable for their purposes.
Lastly, more than four-fifths (83.3%) of participants were UK based and two-thirds Scotland based. The UK has a well-developed quality improvement infrastructure, which varies somewhat between the UK countries, so panel composition may limit the generalisability of the findings. However, participants came from a variety of quality improvement, health service and academic backgrounds, and we believe that the problems, findings and recommendations described in this paper are likely to have general application.
Patient and public involvement
The priorities, experience and preferences of the people who used the local services were represented through their participation in the individual and group interviews that informed this study,2 and their willingness to take part in piloting draft versions of the instrument and completing both rounds of the Delphi. However, we recognise that the final expert panel predominantly consisted of academics, quality improvement advisors and providers of healthcare services. There is a need to engage a larger number of participants from outside the immediate world of frontline-led quality improvement, particularly service users, the public, third sector partners and social care providers. This does not mean that everyone will choose to be involved to the same extent, or indeed will be responsible for planning, monitoring or evaluating care. Instead, we suggest moving beyond this narrow and exclusive approach24 and engaging in a critical appraisal of the focus, methods and benefits of involvement, regardless of whether participants are using or providing services.
Implications for quality improvement programmes
The importance of balanced measurement systems is well established, with Drucker making the case for this 50 years ago and encouraging improvers and managers to think broadly about what constitutes success for their organisation and hence what should actually be evaluated.25 Furthermore, many of the practical guides to healthcare quality improvement emphasise the importance of developing a balanced set of measures during the planning of an improvement programme,10 26–28 but the focus of such guides is generally on the measurement of goals,29–31 with a smaller number of measures for expected undesirable consequences (trade-offs) which are easily predictable from the outset. The evidence from multiple systematic reviews of quality improvement evaluations is that few report any measures of unintended consequences,4–7 consistent with almost half of our participants having little or no experience of using them, despite considerable quality improvement experience overall.
The findings of this consensus study reiterate and confirm the results of our previous work,2 3 and suggest that those involved in improvement programmes should first articulate clear assumptions and formulate explicit predictions for both goals and trade-offs before implementation, and seek to identify relevant process and outcome measures for both. Second, an ‘improvement pause’ should be planned after implementation to deliberately step back from goal delivery, take stock and reflect on unexpected consequences of improvement activity. Unpleasant surprises in particular need to be carefully evaluated to determine whether any harm being caused is serious enough to warrant stopping the intervention or adapting it to reduce the likelihood of further unpleasant surprises, both within and outwith the area targeted for improvement.
Improvers and managers could anticipate these vulnerabilities by making careful, continuous and planned efforts to explore all possible process and/or outcome failures before and after implementation, and by maintaining this exploration as an ongoing surveillance mechanism. However, all improvement programmes are resource constrained and there will always be more risks than can feasibly be measured. Moving beyond simply identifying potential consequences, improvers need to reflect on whether measurement is truly meaningful,32 make choices as to which identified consequences should be systematically evaluated and rationally account for the relative balance of risks and benefits. Based on the findings of this study, we suggest that this decision should be made by assessing whether the depth, seriousness and severity of any trade-offs and unpleasant surprises, in relation to patient care and to other widespread workload and financial implications, are likely to be significant enough to warrant particular attention, so that these undesirable consequences are identified correctly, evaluated thoroughly and, if necessary, mitigated.
Lack of data to measure unexpected consequences remains a significant problem. This is particularly common in healthcare systems where electronic health records are in limited use or where the usable routine data available have limited scope for use in evaluation. Furthermore, even where relevant quantitative data are available retrospectively, they will rarely provide a full explanation for what happened within or outwith the healthcare system. Consistent with other literature,33 34 our participants agreed that qualitative data have an important role in evaluating surprises directly, as well as contextualising quantitative data where these are available. In the (often misquoted) words of Deming, ‘It is wrong to suppose that if you can’t measure it, you can’t manage it—a costly myth’.35
Finally, improvers should think more broadly about the stakeholders they involve, as different levels of engagement can be appropriate for different stages in the evaluation process. There was clear agreement in our study that the clinical teams delivering care should be involved throughout the whole process, but that other stakeholders’ importance varied with the stage of improvement. For instance, patients might have a unique perspective on care which is often invisible to most professionals but can usefully inform the identification of wider unintended consequences.32 36 37
Managerial staff directly involved in organising the targeted care, who also understand the implications of changes for other parts of the system, can play a significant role in identifying both intended and unintended consequences and in deciding whether measurement is needed, by aligning the focus on short-term external demands with internal priorities and a long-term focus on quality improvement. However, without interpretation, measurement has little meaning and can be misleading, particularly as unpleasant surprises tend to be under-reported. Improvement advisors can, therefore, actively contribute to summarising and distilling the data, bringing a body of expertise in explicit change theories which is different from, but complementary to, the expertise of managers and clinicians.
The active involvement of other stakeholders (eg, academics, clinical teams outside the targeted area, third sector representatives and policy-makers) was perceived as relatively important but failed to reach the consensus standard, potentially being judged as ‘nice to have, but not always crucial’.38 This was particularly surprising in relation to academics’ involvement, whose perceived importance was only marginal despite the majority of our expert panel having academic links. This lack of consensus might reflect the recognised pros and cons of using a more rigorous and generalisable, but time-consuming, research approach as opposed to small-scale, rapid and locally responsive quality improvement methods.30 It is important to reiterate that, in practice, improvers will have to make decisions appropriate to their own context, but we recommend that they actively consider these findings when making situational judgements. Notably, developing the research skills of local teams might help ensure that academic input is viewed more favourably.39
Based on evidence and the consensus opinions of a diverse stakeholder community, we conclude that a balanced approach should consider goals and predictable trade-offs early in the design of a quality improvement programme, and subsequently pause to take stock of unpleasant surprises after a period of implementation. Evaluation should be done iteratively during the improvement journey and simultaneously with implementation, using both qualitative and quantitative methods. Vigilance for unexpected consequences should be an ongoing, active pursuit for all relevant stakeholders, whose roles in the process of identifying, prioritising, evaluating and interpreting potential consequences should be explicitly addressed within planning and, if required, revisited during and after implementation.
This work was undertaken by and on behalf of The Scottish Improvement Science Collaborating Centre (SISCC).
Contributors MT and BG were responsible for planning the study and led the data collection and analysis. TD and NMG contributed to data analysis. MT drafted and led the writing of the manuscript. BG, TD and NMG participated in critically appraising and revising the intellectual content of the manuscript. All authors read and approved the final manuscript.
Funding The Scottish Improvement Science Collaborating Centre (SISCC) is funded by the Scottish Funding Council (SFC), Chief Scientist’s Office, NHS Education for Scotland and The Health Foundation, with in-kind contributions from participating partner universities and health boards. The grant (reference number 242343290) was received from SFC on behalf of all funders.
Competing interests None declared.
Ethics approval The ethical approval for the study was granted by the University of Dundee Research Ethics Committee (UREC 15069).
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement Relevant data supporting the results reported in this paper have been included in the submission as online supplementary material. Other data will be available from the corresponding author on reasonable request.
Patient consent for publication Not required.