Abstract
Objectives To identify machine learning (ML) tools used in hospital settings and how they were implemented to inform decision-making for patient care, through a scoping review. We investigated the following research questions: What ML interventions have been used to inform decision-making for patient care in hospital settings? What strategies have been used to implement these ML interventions?
Design A scoping review was undertaken. MEDLINE, Embase, Cochrane Central Register of Controlled Trials (CENTRAL) and the Cochrane Database of Systematic Reviews (CDSR) were searched from 2009 until June 2021. Two reviewers screened titles and abstracts, full-text articles, and charted data independently. Conflicts were resolved by another reviewer. Data were summarised descriptively using simple content analysis.
Setting Hospital setting.
Participants Any type of clinician caring for any type of patient.
Intervention Machine learning tools used by clinicians to inform decision-making for patient care, such as AI-based computerised decision support systems or ‘model-based’ decision support systems.
Primary and secondary outcome measures Patient and study characteristics, as well as intervention characteristics including the type of machine learning tool, implementation strategies and target population. Equity issues were examined with PROGRESS-PLUS criteria.
Results After screening 17 386 citations and 3474 full-text articles, 20 unique studies and 1 companion report were included. The included articles totalled 82 656 patients and 915 clinicians. Seven studies reported gender and four studies reported PROGRESS-PLUS criteria (race, health insurance, rural/urban). Common implementation strategies for the tools were clinician reminders that integrated ML predictions (44.4%), facilitated relay of clinical information (17.8%) and staff education (15.6%). Common barriers to successful implementation of ML tools were time (11.1%) and reliability (11.1%), and common facilitators were time/efficiency (13.6%) and perceived usefulness (13.6%).
Conclusions We found limited evidence related to the implementation of ML tools to assist clinicians with patient healthcare decisions in hospital settings. Future research should examine other approaches to integrating ML into hospital clinician decisions related to patient care, and report on PROGRESS-PLUS items.
Funding Canadian Institutes of Health Research (CIHR) Foundation grant awarded to SES and the CIHR Strategy for Patient-Oriented Research Initiative (GSR-154442).
Scoping review registration https://osf.io/e2mna.
- BIOTECHNOLOGY & BIOINFORMATICS
- Information technology
- PUBLIC HEALTH
- PRIMARY CARE
Data availability statement
All data relevant to the study are included in the article or uploaded as supplemental information.
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
STRENGTHS AND LIMITATIONS OF THIS STUDY
To our knowledge, this is the first scoping review on the implementation of machine learning tools to inform decision-making for patient care in hospital settings.
Our search was limited to 2009 onwards; however, this allowed us to capture more recent/relevant machine learning tools.
A comprehensive search in multiple databases along with evidence found in published and grey literature sources allowed us to extensively map the evidence on the implementation of machine learning tools to inform decision-making for patient care in hospital settings.
Due to the large scope of the original scoping review question, we had to limit it to machine learning versus all artificial intelligence tools and inpatient hospital settings versus all settings.
In the coding of the interventions, we excluded any implementation strategies that were employed before the start of the study (not part of the machine learning intervention).
Background
Artificial intelligence (AI) techniques have gained popularity within healthcare in recent years.1–4 AI techniques consist of automated systems requiring ‘intelligence’ to perform tasks. Machine learning is an AI method that ‘refers to the process of developing systems with the ability to learn from and make predictions using data’.5 6 The use of AI in healthcare can transform clinical practice by providing aid to clinicians when interpreting data that are complex and diverse, allowing for support in clinical decision-making.
There are various ways that machine learning can be used to support clinical decision-making. One way is to assist with clinical tasks related to assessing, managing, and evaluating clinical issues and procedures.7 Another way is to assist with epidemiological tasks, such as predicting the health needs and outcomes of specific people.7 Machine learning can also be used for clinical administrative tasks.
Several systematic reviews have examined the use of machine learning for clinical decision-making focused on specific tasks, such as stroke and risk stratification,1 preterm birth prediction,2 and predicting radiation-induced neurocognitive decline.8 However, much less attention has been paid to how machine learning methods are implemented in hospital settings. For example, a recent scoping review found that very few clinical decision support systems using machine learning had been implemented in the hospital setting.9 This gap is important, as the successful translation of machine learning into hospital practice may help to improve the performance of clinical decisions related to diagnosis, prognostics and management, while saving time and improving patient outcomes.
A recent scoping review examined prognostic machine learning algorithms in paediatric chronic respiratory conditions.10 Of the 25 included studies, only two were implemented in a clinical setting. Furthermore, none of the included studies explicitly reported results pertaining to implementation of the machine learning algorithms.
Implementation science and practice ensures that research results are transferred and used by key knowledge users11 and has the potential to reduce research waste.12 An examination of the effectiveness of implemented machine learning tools is required to understand which (if any) machine learning tools have been successfully implemented in hospital settings to support decisions related to patient care and how (if at all) implementation science strategies were used to implement those machine learning tools. This examination will facilitate appropriate AI use, as well as enhance return on investment for machine learning tools. We aimed to determine strategies that have been used to implement machine learning tools to inform decision-making for patient care in hospital settings through a scoping review.
Methods
Patient and public involvement
There was no patient or public involvement in this research.
Protocol and registration
The protocol for this scoping review was registered with Open Science Framework13 and developed in accordance with the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) Statement for protocols.14 The JBI (formerly Joanna Briggs Institute) guidance for scoping reviews15 informed the conduct of this scoping review. The knowledge users on the team included the Physician-in-Chief at St Michael’s Hospital (SES) and former Vice President responsible for health at the Vector Institute (PAP). The knowledge users were engaged in all aspects of the review conduct. The results are reported using the PRISMA extension for scoping reviews,16 supplemented by the updated PRISMA 2020 statement.17
Search strategy and selection criteria
Search strategy
An experienced librarian (JM) developed a comprehensive literature search strategy, which was peer reviewed by a second information specialist using the Peer Review of Electronic Search Strategies checklist.18
The following databases were searched: MEDLINE (2009–7 June 2021), Embase (2009–7 June 2021), CENTRAL (2009–7 June 2019) and the Cochrane Database of Systematic Reviews (2005–7 June 2019). A broad approach to the search question was taken to include AI terms broader than machine learning. The search was translated from the primary database MEDLINE to the other databases. All search strategies can be found in online supplemental appendix 1. Grey literature (i.e., unpublished or difficult-to-locate studies) was searched using guidance from the Canadian Agency for Drugs and Technologies in Health’s Grey Matters checklist.19 All sources for grey literature searching are available in online supplemental appendix 2. The references of all included studies and relevant reviews were scanned to identify additional potentially relevant studies for inclusion.
Eligibility criteria
The eligibility criteria are outlined according to the Population, Concept, Context mnemonic,15 as follows:
Population
Any type of clinician caring for any type of patient.
Concept
Machine learning tools used by hospital clinicians to inform decision-making for patient care, such as AI-based computerised decision support systems or ‘model-based’ decision support systems, were included.20 Machine learning was defined as methods using mathematical operations to process input data, resulting in a prediction.6 Machine learning used for decision support was defined as algorithms used to provide some form of input into human decision-making.8 Machine learning tools used for automation without any input from the clinician were excluded. For example, machine learning tools used to predict patients at higher risk of a particular outcome without any decisions or interventions required by the clinician were excluded. Machine learning tools used for robotics (e.g., robotic surgery), interpretation of imaging such as CT scans (if not being used to inform decision-making; for example, improving the accuracy of an algorithm), medical devices (e.g., a device that monitors glucose levels and administers insulin automatically) and automatic transcribing of a clinical note for medical records were excluded. For studies that reported the use of a clinical decision support tool but did not explicitly report the use of machine learning, additional citations were identified by scanning the references to verify that machine learning was indeed used. Studies that developed, tested or validated a machine learning model with hospital data were excluded if the model was not implemented for patient care decision-making.
Only decision support tools using machine learning for clinical tasks defined as ‘tasks generally performed by qualified healthcare providers related to the assessment, intervention and evaluation of health-related issues and procedures’ or epidemiological tasks specified as ‘tasks related to more accurately identifying the health needs and outcomes of people within a given population’ were included.8 Machine learning used for operational tasks defined as ‘tasks related to activities that are ancillary to clinical tasks but necessary or valuable in the delivery of services (generally more administrative)’ was excluded.8 Studies that did not report implementation strategies for the machine learning interventions that were used for patient care were excluded.
Context
Only hospital settings were included. If a study was conducted outside of the hospital but used inpatient hospital data, then it was included.
Other criteria
Eligible study designs were primary research studies of experimental (e.g., randomised controlled trials, non-randomised controlled trials), quasi-experimental (e.g., controlled before and after studies, interrupted time series), observational (e.g., cohort studies, case-control studies, cross-sectional studies), qualitative (e.g., phenomenological, ethnography, qualitative interview) and mixed-methods (e.g., convergent parallel, embedded, explanatory sequential) design, with or without a comparator group. Studies published before 2009 were excluded to focus on the most recent evidence, along with non-English studies to increase feasibility for this project. No restrictions based on study duration were applied.
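The eligibility criteria above can be read as a sequence of include/exclude checks. The sketch below encodes them as a simple function for illustration only. The `Record` fields and the `is_eligible` name are hypothetical, and in the review these judgements were made by human reviewers rather than software.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical summary of one screened study (field names are illustrative)."""
    year: int
    english: bool
    primary_research: bool
    hospital_setting_or_inpatient_data: bool
    uses_machine_learning: bool
    informs_clinician_decision: bool   # False for fully automated tools
    operational_task_only: bool        # administrative use, excluded
    reports_implementation_strategy: bool


def is_eligible(r: Record) -> bool:
    """Apply the Population, Concept, Context and other criteria together."""
    return (
        r.year >= 2009
        and r.english
        and r.primary_research
        and r.hospital_setting_or_inpatient_data
        and r.uses_machine_learning
        and r.informs_clinician_decision
        and not r.operational_task_only
        and r.reports_implementation_strategy
    )


# Two made-up records: a decision support tool and a fully automated tool.
example = Record(2019, True, True, True, True, True, False, True)
automated = Record(2019, True, True, True, True, False, False, True)
print(is_eligible(example), is_eligible(automated))
```

A flat conjunction is used because every criterion is a hard requirement; a single failed check is enough to exclude a record.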
Study selection
The eligibility criteria were pilot tested by the team using 50 unique citations until >60% agreement was achieved (four training exercises with 18% agreement, 58% agreement, 30% agreement and 64% agreement). Agreement was calculated by taking the percentage of the responses of all team members to the 50 unique citations. Subsequently, the remaining titles and abstracts were screened independently by reviewers (AH, AP, VN, CH, OF, SMT, MG) working in pairs. For full-text screening, pilot testing on 20 studies was completed with 20% agreement observed across the team. The criteria were clarified, and the full-text articles were screened independently by two reviewers (AH, AP, VN, CH, OF, SMT, MG) working in pairs. All discrepancies were resolved by a third reviewer to ensure inter-rater reliability and quality checking of the screening that was completed. A clinician (SES) and methodologist (ACT) confirmed the final eligibility of all included studies. The full screening criteria for level 1 and level 2 screening are provided in online supplemental appendices 3 and 4.
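The text does not spell out the agreement formula beyond "the percentage of the responses of all team members". As one possible reading, the sketch below counts a citation as agreed when every reviewer gave the same decision and reports the share of such citations. The function name and the toy decisions are hypothetical and are not taken from the review.

```python
from typing import Sequence


def percent_agreement(decisions: Sequence[Sequence[str]]) -> float:
    """Share of citations (in %) on which all reviewers gave the same decision.

    `decisions[i]` holds every reviewer's decision for citation i.
    This is an assumed reading of the pilot agreement metric, not the
    authors' documented formula.
    """
    if not decisions:
        raise ValueError("at least one citation is required")
    agreed = sum(1 for row in decisions if len(set(row)) == 1)
    return 100.0 * agreed / len(decisions)


# Hypothetical pilot round: 5 citations screened by 3 reviewers.
pilot = [
    ("include", "include", "include"),
    ("exclude", "exclude", "exclude"),
    ("include", "exclude", "include"),
    ("exclude", "exclude", "exclude"),
    ("include", "include", "exclude"),
]
rate = percent_agreement(pilot)
print(f"{rate:.0f}% agreement; threshold met: {rate > 60}")
```

Under this reading, agreement becomes harder to reach as more reviewers take part, which may help explain the low values in the early training exercises.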
Data charting
A standardised charting form was developed to chart study characteristics, population characteristics, intervention characteristics and type of outcome measure from each included article. Equity issues were abstracted using the PROGRESS-PLUS criteria.21 The specific outcome results were not abstracted, as recommended in the JBI guide.15 A pilot study was conducted with the team prior to charting until sufficient agreement (>60%) was achieved. All data were charted independently by two reviewers (AP, VN, OF, MG) working in pairs. All discrepancies were resolved by a third reviewer to ensure inter-rater reliability and quality checking of the charting that was completed. The data charting form is provided in online supplemental appendix 5.
Risk of bias appraisal
As recommended in the JBI guide,15 a risk of bias appraisal was not conducted.
Analysis and presentation of results
All findings are summarised descriptively using summary tables, figures and appendices. To code the implementation strategies, the modified Effective Practice and Organisation of Care (EPOC) classification was used22; descriptions of categories can be found in online supplemental appendix 6. A clinician (SES) and methodologist (ACT) coded all included studies using the EPOC classification independently. Implementation barriers and facilitators were coded by one reviewer (VN) using a pre-existing framework.23 For patient, clinician and system-level outcomes, a pre-existing framework24 25 was also used by one reviewer (VN). All coding was conducted using simple content analysis.
Role of the funding source
The study sponsor had no role in the study design; in the collection, analysis and interpretation of data; in the writing of the report; and in the decision to submit the paper for publication.
Results
Literature search results
After screening 15 306 citations from the database search and 2080 citations from the grey literature search, 3474 full-text articles were obtained and screened for inclusion. Subsequently, 20 unique studies and 1 companion report26 were included (figure 1). One included study was only available as a conference abstract.27 A list of the studies that were very close to fulfilling the eligibility criteria but were eventually excluded is provided in online supplemental appendix 7.
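As a quick consistency check on the counts above, the sketch below adds the two citation sources and derives the numbers excluded at each screening level. Variable names are illustrative. The excluded figures are obtained here by subtraction and are not stated in the text, so they rest on the assumptions noted in the comments.

```python
# Counts reported in the text.
database_citations = 15_306
grey_literature_citations = 2_080
full_texts_screened = 3_474
included_articles = 20 + 1  # 20 unique studies plus 1 companion report

total_screened = database_citations + grey_literature_citations

# Derived values (not stated in the text): assumes every full-text article
# came from the screened citations and each included report is one article.
excluded_at_title_abstract = total_screened - full_texts_screened
excluded_at_full_text = full_texts_screened - included_articles

print(f"Citations screened: {total_screened}")
print(f"Excluded at title/abstract: {excluded_at_title_abstract}")
print(f"Excluded at full text: {excluded_at_full_text}")
```

The total matches the 17 386 citations given in the abstract.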
Patient characteristics
The 20 studies included a total of 82 656 patients, with an average of 5514 patients per study (table 1 and online supplemental appendix 8). The studies included a total of 915 clinicians, with an average of 153 clinicians per study. Almost half of the patients were female (48.8%), whereas most clinicians were female (64.4%). Among studies that reported age, patients were mostly 62–72 years of age (n=3, 15.0%); this information was not reported in 60.0% of the studies. Most studies included adults at risk of infection (n=4, 20.0%; online supplemental appendix 8). Most studies did not report comorbidities (n=18, 47.4%) and when they did, the most frequent were infection (n=5, 13.2%) and congestive heart disease (n=3, 7.9%). The type of clinician involved in implementing machine learning tools was most commonly a physician (n=3, 30.0%) or nurse (n=3, 30.0%). Regarding the PROGRESS-PLUS criteria (online supplemental appendix 9), seven studies reported patient or clinician sex27–33 and four studies reported additional PROGRESS-PLUS criteria (race, type of health insurance, rural/urban).27 28 34 35 The demographic variables were only calculated for studies that reported them.
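Because the demographic variables were calculated only for studies that reported them, the per-study averages do not necessarily use all 20 studies as the denominator. The sketch below back-calculates the approximate number of reporting studies implied by the totals and averages. These implied counts are an inference from the reported figures, not values given in the text.

```python
def implied_reporting_studies(total: int, mean_per_study: float) -> float:
    """Number of studies implied by a reported total and per-study mean."""
    return total / mean_per_study


# Totals and averages as reported in the text.
patients = implied_reporting_studies(82_656, 5_514)
clinicians = implied_reporting_studies(915, 153)

print(f"Studies implied by patient figures: {patients:.1f}")
print(f"Studies implied by clinician figures: {clinicians:.1f}")
```

The results are close to whole numbers (about 15 and 6), which is consistent with averages taken over reporting studies only and rounded.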
Study characteristics
The 20 included studies were published between 2009 and 2021, with 75.0% published from 2017 onwards (table 2 and online supplemental appendix 10). Of the 20 included studies, 16 applied previously trained and validated algorithms as an intervention, and 4 trained, validated and applied algorithm(s) within the same paper. North America (n=10, 50.0%) followed by Europe (n=6, 30.0%) were the most common continents. Most studies were cohort studies (n=8, 40.0%) followed by randomised trials (n=6, 30.0%). Most studies were conducted within 1 year (n=10, 50.0%). All studies were based on inpatient hospital data with the majority based out of a single hospital (n=14, 70.0%).
Intervention characteristics
The machine learning tools used were supervised learning (n=14, 70.0%), unsupervised learning (n=2, 10.0%) and deep learning (n=4, 20.0%) (table 3 and online supplemental appendix 11). All studies reported implementation strategies with the most common being clinician reminders (n=20, 44.4%), facilitated relay of clinical information (n=8, 17.8%) and staff education (n=7, 15.6%) (table 3 and online supplemental appendix 12). The target population of implementation strategies was most commonly healthcare providers (n=40, 88.9%).
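The percentages for implementation strategies are consistent with a denominator of 45 coded strategy instances rather than 20 studies, since one study can use several strategies. The sketch below checks this. The denominator of 45 is inferred from the reported n and % pairs and is an assumption, not a figure stated in the text.

```python
# (n, reported %) pairs taken from the text.
reported = {
    "clinician reminders": (20, 44.4),
    "facilitated relay of clinical information": (8, 17.8),
    "staff education": (7, 15.6),
    "target: healthcare providers": (40, 88.9),
}

ASSUMED_DENOMINATOR = 45  # inferred: 20 / 0.444 is roughly 45

for label, (n, reported_pct) in reported.items():
    pct = round(100 * n / ASSUMED_DENOMINATOR, 1)
    status = "matches" if pct == reported_pct else "differs from"
    print(f"{label}: {n}/{ASSUMED_DENOMINATOR} = {pct}% ({status} reported {reported_pct}%)")
```

Reading the figures this way explains how clinician reminders can have n=20 (every study) yet account for only 44.4% of coded strategies.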
Outcome characteristics
The outcomes reported were at the clinician level in 12 studies (table 4 and online supplemental appendix 13), patient level in 10 studies (online supplemental appendix 14) and health system level in 10 studies (online supplemental appendix 15). At the clinician level, the majority of the outcomes were focused on perception and satisfaction of clinicians (n=44, 71.0%), whereas at the patient level, they were mostly focused on physiological or clinical (n=46, 71.9%) outcomes. At the health system level, the most common outcomes focused on delivery of care (n=18, 45.0%). The most commonly reported implementation barriers were that the machine learning tool was perceived as being time consuming (n=3, 11.1%) and unreliable (n=3, 11.1%). In contrast, the most commonly reported implementation facilitators were that it improved time/efficiency (n=3, 13.6%) and was perceived as being useful (n=3, 13.6%).
Discussion
We conducted a comprehensive scoping review on implementation strategies for machine learning tools within hospital settings. We identified only 20 studies that fulfilled our eligibility criteria. All examined implementation strategies. However, only 10 studies reported on the barriers and 14 studies reported on the facilitators to implementing machine learning tools. Across the studies, the most common implementation strategies were clinician reminders, facilitated relay of clinical information and staff education. The most common barriers were that the machine learning tool took time to use and was perceived as unreliable, whereas the most common facilitators were that the tool improved time or efficiency and was perceived as useful.
Our results identified several gaps in the literature. Most of the studies were conducted in high-income countries. This is understandable as machine learning tools are expensive to develop. The majority of the studies were focused on adults at risk of infection. Most of the studies reported the use of supervised machine learning tools. Few studies examined intervention strategies that have been found to be effective, such as audit and feedback.36 A recent scoping review is consistent with our results, as none of its included studies reported implementation strategies for machine learning algorithms in paediatric chronic respiratory conditions.10
Only nine27–35 of the included studies provided data on any of the PROGRESS-PLUS criteria; most of the studies examined outcomes at the patient level. Examining equity in the use of AI tools is important, as some evidence suggests that machine learning algorithms can increase bias within health37–39 through propagation of existing racial discrepancies and inequalities in socioeconomic status, gender, religion, sexual orientation or disability. This in turn further exacerbates health inequities. Future research should examine equity in relation to machine learning tools.
AI is a multibillion-dollar industry, with substantial investment within health.40 Our results suggest that very few studies are examining the best strategies to implement these AI tools. Future primary research on the implementation of machine learning tools in hospital settings should be conducted to broaden the evidence base, including the effect of implementation interventions on acceptability, appropriateness and feasibility. A future systematic review could examine the effectiveness of the various implementation strategies to optimise the use of AI in health. This would help ensure that this enormous investment is not wasted and would facilitate appropriate AI use within health. It is important to note that the European Union has proposed an Artificial Intelligence Act,41 which will likely regulate sociotechnical systems and may have implications for AI implementation in healthcare settings.
Limitations
Our scoping review has limitations worth noting. Due to the large scope of the original scoping review question, we had to limit it to machine learning rather than all AI tools and to inpatient hospital settings rather than all settings (online supplemental appendix 16). Additionally, we limited our search to 2009 onwards; however, this allowed us to capture more recent and relevant machine learning tools, as the technology has advanced over the last two decades. We also limited our search to the English language, so studies conducted in countries where English is not the first language could have been excluded. Another potential limitation is that any implementation strategies that were employed before the start of the study (not part of the machine learning intervention) were not included in the coding of interventions.
Conclusions
In conclusion, there is a lack of evidence on implementation strategies used for machine learning tools in hospital settings. This is an urgent area of research prioritisation, given the millions of dollars invested in AI technologies within health. Future studies can report on inequities using the PROGRESS-PLUS framework. A systematic review identifying effective implementation strategies of machine learning tools to inform decision-making for patient care within hospitals would be very useful for future implementation efforts.
Ethics statements
Patient consent for publication
Ethics approval
Ethics approval is not applicable, as this is a scoping review of pre-existing studies.
Acknowledgments
We thank Dr Muhammad Mamdani for providing feedback on our protocol and screening criteria, Raman Brar, Chantal Williams, Elizabeth McCarron, Jane Pearson Sharpe, and Naveeta Ramkissoon for screening citations or full-text articles, Nazia Darvesh for assisting with the protocol and screening questions, Alissa Epworth for executing the literature searches, Raymond Daniel for searching for errata and retractions, and Faryal Khan for formatting the manuscript and creating the EndNote library.
References
Footnotes
Contributors ACT interpreted the results, drafted the manuscript, and provided methodological and technical expertise. AH coordinated the review, screened citations and full-text articles, cleaned and prepared the results, revised and edited the manuscript. AP supported coordination of the review, screened citations and full-text articles, and charted data. VN, CH, OF and MG screened citations and full-text articles, and charted data. VN coded data on implementation and outcomes. SMT worked on the protocol and screened studies. JM created the literature and grey literature search strategies. PAP and SES helped conceive the study, provided methodological and content expertise throughout the project. All authors confirm that they had full access to all the data in the study and accept responsibility to submit for publication. ACT is responsible for the overall content as the guarantor and accepts full responsibility for the work and/or the conduct of the study, had access to the data, and controlled the decision to publish.
Funding This project was funded by the CIHR Foundation grant awarded to SES (grant number not available) and the CIHR SPOR initiative (GSR-154442). ACT is funded by a Tier 2 Canada Research Chair in Knowledge Synthesis (17-0126-AWA). OF was in part supported by the Health Research Board (Ireland) and the HSC Public Health Agency (CBES-2018-001) through Evidence Synthesis Ireland and Cochrane Ireland. SES is funded by a Tier 1 Canada Research Chair in Knowledge Translation (17-0245-SUB).
Disclaimer The funder of the study had no role in study design, data collection, data analysis, data interpretation or writing of the report.
Competing interests None declared.
Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.
Provenance and peer review Not commissioned; externally peer reviewed.