Objective To carry out a systematic review of the psychometric properties of the Hospital Survey on Patient Safety Culture (HSPSC) as reported in international studies that have used the instrument.
Design Literature review and an analysis framework to review studies.
Setting Hospitals and other healthcare settings in North and South America, Europe, the Near East, the Middle East and the Far East.
Data sources A total of 62 studies and 67 datasets made up of journal papers, book chapters and PhD theses were included in the review.
Primary and secondary outcome measures Psychometric properties (eg, internal consistency) and sample characteristics (eg, country of use, participant job roles and changes made to the original version of the HSPSC).
Results Just over half (52%) of the studies in our sample reported internal reliabilities lower than 0.7 for at least six HSPSC dimensions. The dimensions ‘staffing’, ‘communication openness’, ‘non-punitive response to error’, ‘organisational learning’ and ‘overall perceptions of safety’ resulted in low internal consistencies in a majority of studies. The outcomes from assessing construct validity were reported in 60% of the studies. Most studies took place in a hospital setting (84%); the majority of survey participants (62%) were drawn from nursing and technical staff. Forty-two per cent of the studies did not state what modifications, if any, were made to the original US version of the instrument.
Conclusions While there is evidence of a growing worldwide trend in the use of the HSPSC, particularly within Europe and the Near/Middle East, our review underlines the need for caution in using the instrument. Future use of the HSPSC needs to be sensitive to the demands of care settings, the target population and other aspects of the national and local healthcare contexts. There is a need to develop guidelines covering procedures for using, adapting and translating the HSPSC, as well as reporting findings based on its use.
- patient safety culture
- international studies
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
Strengths and limitations of this study
- This is the first systematic review to compare the psychometric properties of a patient safety culture (PSC) instrument across a range of international studies.
- Our findings cast some doubt over the value of using the Hospital Survey on Patient Safety Culture (HSPSC) without first considering whether it should be adapted to fit national and local contexts.
- Our findings relate to only one PSC instrument (the HSPSC) and do not cover other instruments (eg, Safety Attitudes Questionnaire and Patient Safety Climate in Healthcare Organisations (PSCHO)).
- The diversity in methodology and reporting among studies using the HSPSC means that firm conclusions about the reliability and validity of the instrument are difficult to draw.
Over the course of the last few decades, the field of patient safety has expanded and evolved in a number of directions. Early work concentrated on understanding the extent and types of human error that contributed to patient harm.1 2 More recently, the focus has shifted towards understanding the role played by human and organisational issues such as leadership, teamwork and communication in contributing to, as well as preventing, adverse incidents.3 4 One prominent organisational factor, safety culture, has in particular generated much interest and discussion.5–8 Use of the term ‘safety culture’ first came about as a result of the Chernobyl nuclear accident9 and has since been used as a way of understanding accidents in a wide variety of industries including aviation, oil and gas and most recently healthcare.10 11 Safety culture is typically taken to refer to ‘the attitudes, beliefs, perceptions and values that employees share in relation to safety in the workplace’.12 Safety culture as a construct is related to ‘safety climate’ and is typically associated with ‘the underlying assumptions and values that guide behaviour in organisations’, whereas safety climate focuses on ‘the direct perceptions of individuals’ of the underlying culture.13
Patient safety culture
Within healthcare, the design of survey instruments and other methods for measuring patient safety culture (PSC) has expanded considerably in the past few years.6 10 14 In Europe, for example, the European Network for Patient Safety Project15 identified 19 different survey instruments and other methods in use throughout the European Union (EU) member states until 2010. At that time, the most frequently used were the Hospital Survey on Patient Safety Culture (HSPSC,16 used in 12 EU member states), the Manchester Patient Safety Framework17 (used in 3 EU member states) and the Safety Attitudes Questionnaire18 (used in 4 EU member states). A similar review identified 26 studies within Europe, which had used the HSPSC instrument between January 2003 and February 2012.19 In the USA, the Middle East and the Far East, there is likewise evidence of a widening interest in measuring safety culture.20–24
Psychometric properties of PSC instruments
While increased efforts to measure PSC have been welcomed,4 6 others have suggested the need for a degree of caution. More than a decade ago, Flin et al argued in a series of articles25 26 that the psychometric properties of PSC instruments demonstrated poor levels of reliability and validity. More recently, Pronovost and Sexton27 warned that ‘the enthusiasm for measuring culture may be outpacing the science’. One explanation for low levels of reliability and validity may be the different interpretations of what constitutes a positive PSC among the diverse occupational groupings and subcultures involved in healthcare delivery.28–30 Previous work, for example, has shown that physicians report higher levels of PSC when there is good teamwork across units and a unit-level management that promotes PSC.31 Nursing staff, by contrast, are more likely to report higher levels of PSC when they feel that there are enough staff in the unit.31–33 In addition to these concerns, the psychometric properties of PSC instruments may be affected by other factors such as national culture.34 The extent to which healthcare workers are likely to report incidents and errors, for example, has been shown to be related to cultural norms such as fear of ‘losing face’ and the desire to avoid uncertainty and ambiguity,35 characteristics that are often associated with Eastern as compared with Western national cultures.36–38
Motivation for the review and aims
The use of a PSC instrument requires significant investment of time and other resources by healthcare providers. In addition, the outcomes from using the instrument often form the basis for identifying areas for improvement and intervention within hospitals and other healthcare settings. Healthcare organisations also sometimes use the findings to benchmark levels of PSC across units, departments and hospitals.18 It is therefore very important that these instruments demonstrate acceptable levels of reliability and validity when they are used.
Currently, there are no examples of systematic reviews of the psychometrics of PSC instruments across international contexts. Hammer et al 19 published a review of the psychometrics of the HSPSC, but this was limited to Europe. The aim of this review was therefore to fill this gap in knowledge by reviewing the psychometric properties reported in international use of one of the most well-known and widely used surveys on PSC, the HSPSC instrument.19 39 In addition, the review examined how the psychometric properties of the HSPSC varied across a range of sample characteristics (eg, country of use, healthcare setting and participant job roles) in order to identify any trends or patterns in the data.
A literature search for publications that had used the HSPSC was conducted using the Scopus, Web of Science, PubMed and PsycINFO electronic databases. The literature search was initially carried out by the lead author and then checked and updated by the second author. The outcomes from subsequent updates of the literature search were checked by the other authors. The search terms were the acronyms and phrases that have been used to describe the survey, that is, ‘HSPSC’, ‘HSOPSC’, ‘HSOPS’, ‘Survey on Patient Safety Culture’ and ‘SOPS’. These search terms were combined using the Boolean operator ‘OR’. As the original report on the psychometric properties of the HSPSC was published in 2004, the search was limited to studies published between 2004 and July 2018. In addition, citations and references to other publications describing use of the HSPSC were consulted, as well as a list of national contacts provided by the statistical services company (Westat–Agency for Healthcare Research and Quality (AHRQ)40), which provides support to the AHRQ and their work with the HSPSC. Online supplementary file 1 provides details of the full search strategy for the Scopus database.
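As a rough illustration of how the search terms listed above might be combined, the following Python sketch joins them with the Boolean operator OR and applies the 2004 to 2018 publication window. It is not the authors' actual strategy (which is given in online supplementary file 1); the function names are hypothetical, the query string uses no database-specific syntax, and filtering by year alone is a simplifying assumption, since the text gives July 2018 as the cut-off.

```python
# Illustrative sketch only: not the authors' actual search strategy
# (their full Scopus strategy is in online supplementary file 1).

# Search terms as listed in the text.
SEARCH_TERMS = [
    "HSPSC",
    "HSOPSC",
    "HSOPS",
    "Survey on Patient Safety Culture",
    "SOPS",
]

# Publication window from the text (2004 to July 2018); filtering by
# year alone is a simplifying assumption.
FIRST_YEAR, LAST_YEAR = 2004, 2018


def build_or_query(terms):
    """Quote each distinct term and join the terms with the Boolean operator OR."""
    distinct = list(dict.fromkeys(terms))  # drop repeats, keep original order
    return " OR ".join(f'"{term}"' for term in distinct)


def in_search_window(year):
    """Return True if a publication year falls inside the search window."""
    return FIRST_YEAR <= year <= LAST_YEAR


if __name__ == "__main__":
    print(build_or_query(SEARCH_TERMS))
```

Each database has its own field codes and date-limit syntax, so a string built this way would still need to be adapted before being submitted to Scopus, Web of Science, PubMed or PsycINFO.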
Inclusion and exclusion criteria
We included all international studies that reported psychometric properties of the HSPSC as a measure of healthcare professionals’ perceptions of PSC. To be eligible for inclusion in the review, studies had to be published as journal papers, book chapters or reports that are in the public domain (eg, PhD theses) and written in English. A wide variety of study designs were included in the review (eg, studies comparing datasets from two or more countries or regions, studies conducted in one country). The original report16 outlining the details of the HSPSC and the process used to compile the survey was also included in the review.
Framework for analysis
An analysis framework (figure 1) was developed in order to systematically compare (1) psychometric properties of the HSPSC, including internal reliability and the use of factor analysis to assess construct validity; and (2) characteristics of use, including country of use, sample characteristics (setting, participant job roles, sample size and response rate) and adaptations to the original US version of the HSPSC16 (language, procedure for translation and other changes to the original version of the survey). The framework was guided by previous research examining psychometric properties of PSC surveys.19 26 41
Patient and public involvement
No patients or members of the general public were involved in this study.
The initial search yielded a total of 8604 studies of potential relevance from the four databases. Sixty-two studies met the inclusion criteria and reported psychometric properties on the HSPSC for a total of 67 datasets (one study used the survey in two different countries,35 two studies used the survey in two different languages in the same country42 43 and two studies used the same survey over two different time periods44 45). The screening and filtering process of the database results is shown in figure 2.
Table 1 shows the values of internal reliabilities using Cronbach’s α coefficient for 67 datasets from the 62 studies. An acceptable value of Cronbach’s α is typically taken as 0.70–0.90 (refs 46–48); however, the first psychometric testing of the HSPSC16 considered levels of α≥0.60 as acceptable. Eleven studies did not report values of Cronbach’s α for one or more of the dimensions in the HSPSC. Of the 769 values of Cronbach’s α that were reported and are given in table 1, there were 355 instances of HSPSC subscales (46.16% of all values) where the internal reliability reported was lower than a Cronbach’s α value of 0.70. The internal reliability for the dimension ‘staffing’ reached an acceptable level (ie, Cronbach’s α >0.70) for only 8 of the 62 datasets that reported this dimension. The dimension ‘frequency of events’ was above the level of acceptability for 61 of the 65 datasets that reported the internal reliability for this dimension. Thirty-five datasets reported Cronbach’s α values of <0.70 for 6 or more of the 12 dimensions in the HSPSC. Two datasets, from Lebanon21 and Turkey,49 reported values of α<0.70 for 10 dimensions, and another, from Oman,50 reported values of α<0.70 for 11 dimensions.
Five dimensions demonstrated consistently weak internal reliability (ie, mean α<0.70) across the 67 HSPSC datasets. The dimension ‘staffing’ was the weakest (mean α=0.56). This was followed by ‘overall perceptions of safety’ (43 datasets where α<0.70, mean α=0.60), ‘organisational learning’ (47 datasets where α<0.70, mean α=0.63), ‘communication openness’ (46 datasets where α<0.70, mean α=0.64) and ‘non-punitive response to error’ (37 datasets where α<0.70, mean α=0.65).
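For readers less familiar with the statistic, the Python sketch below shows the standard formula for Cronbach’s α, α = k/(k−1) × (1 − Σ item variances / variance of total scores), and how a value might be compared with the two thresholds mentioned above (0.70 and the more lenient 0.60). It is offered only as a minimal illustration: the function names and category labels are hypothetical, it is not the procedure used by any of the reviewed studies, and the only figures taken from this review are the five mean α values.

```python
from statistics import variance

# Mean Cronbach's alpha values for the five weakest dimensions,
# as reported in the text of this review.
REPORTED_MEAN_ALPHA = {
    "staffing": 0.56,
    "overall perceptions of safety": 0.60,
    "organisational learning": 0.63,
    "communication openness": 0.64,
    "non-punitive response to error": 0.65,
}


def cronbach_alpha(responses):
    """Cronbach's alpha for one dimension.

    responses: list of respondents, each a list of k item scores (k >= 2).
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("at least two items are needed")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def classify_alpha(alpha):
    """Compare alpha with the 0.70 and 0.60 thresholds (labels are hypothetical)."""
    if alpha >= 0.70:
        return "meets 0.70"
    if alpha >= 0.60:
        return "meets 0.60 only"
    return "below both thresholds"


if __name__ == "__main__":
    for dimension, alpha in REPORTED_MEAN_ALPHA.items():
        print(f"{dimension}: {alpha:.2f} ({classify_alpha(alpha)})")
```

The choice of threshold matters for interpretation: four of the five mean values listed above would count as acceptable under the 0.60 criterion used in the original psychometric testing, but none would under the more usual 0.70 criterion.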
Construct validity using factor analysis
Of the 62 articles, 37 (59.68%) tested construct validity using factor analysis. In 16 articles, both exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) had been conducted. In another 3 datasets, only EFA had been carried out, and in 18, only CFA had been conducted. EFA provides information about the optimal number of factors, whereas CFA determines how well the original 12-factor structure developed by Sorra and Nieva16 or an alternative structure fits the dataset. The appropriateness of the CFA model was assessed by measures of global and local fit.46 47 51 The report by Sorra and Nieva16 is one of the 16 articles where both EFA and CFA had been conducted, resulting in a factorial structure of 12 factors. In 11 datasets, the original 12 HSPSC dimensions were confirmed. In six datasets, two factors were combined to create one. One dataset52 based on the CFA resulted in a good overall fit. Online supplementary file 2 provides further details of the use of factor analysis to assess the construct validity for our sample.
Country of use
The study sample covered a total of 30 different countries. Eleven studies reported the use of the survey in North America (n=8) and South America (n=3), 30 in Europe, 15 in the Near East and the Middle East, and 7 in the Far East. The growth of the use of the HSPSC from 2004 to 2018 across the world is shown in figure 3.
Setting and participant job roles
The setting within which the survey was used was predominantly hospitals, with 47 of the 62 published studies (75.81%) using the survey in acute general, public, private, psychiatric, community, and secondary and tertiary hospitals. Of the remaining 15 studies, 3 used the survey in long-term care settings. Four studies sampled specific job roles (eg, nursing staff and medical directors of hospitals). The remaining eight studies used the survey within different specialist hospital units or areas (eg, emergency, critical care transport and neonatal intensive care). The job roles and clinical backgrounds of participants across the studies (table 2) indicated that the majority of participants worked as nursing staff or healthcare assistants (mean percentage=50.77%±13.05% across the studies), followed by medical and technical staff (mean percentage=11.67%±7.73%), physicians (mean percentage=9.56%±6.44%), and managers and administrative staff (mean percentage=7.47%±6.37%).
Sample size, response rate and adaptations to the original version of the HSPSC
Sample sizes varied across the studies. In total, data were reported from 408 563 respondents from 1119 healthcare organisations.16 The number of sites per study ranged from a single site to 331 hospitals. Most studies (85.48%) did not identify the number of hospital units in which respondents worked. Sample sizes ranged from 51 to 111 478 individuals, with a median of 1026. Response rates ranged from 17% to 98% (mean=38.94%±29.54%). Across the 62 studies, the HSPSC instrument was applied in 20 different languages. Vlayen et al 42 43 used a French and a Dutch translation. One study35 used the HSPSC in Japan and Taiwan, translating the survey into Japanese and Mandarin. Of the 67 datasets, 16 obtained results through the use of an English version of the survey. Another 27 datasets required the HSPSC to be translated into a language other than English using several different translation procedures, and 24 datasets used a version of the survey translated in previous studies. Two datasets used the original AHRQ translation guidelines. Fourteen datasets reported using forward–backward translation. Twelve datasets reviewed and verified the translation using an independent expert group. Five datasets made no modifications to the original version of the instrument, and 28 datasets (41.79%) did not state whether modifications were made. Online supplementary files 3 and 4 provide details of the sample sizes, response rates and adaptations to the HSPSC in the 67 datasets.
Implications of the review for users of the HSPSC
A clear implication of our review is that researchers and healthcare practitioners should adopt a degree of caution when using the original, unmodified version of the HSPSC16 to measure PSC. Nearly half (46%) of the Cronbach’s α values reported for the 12 HSPSC dimensions in our sample fell below 0.70. In some cases, the internal reliabilities of specific dimensions (eg, staffing and overall perceptions of safety) were very weak. By contrast, the internal reliabilities for other dimensions (eg, ‘teamwork across/within units’) were stronger. While most studies (59.68%) that used CFA found a fit with the original 12-factor model,16 inconsistencies in carrying out assessments of construct validity (eg, testing EFA models using CFA and carrying out cross validation) make it very difficult to draw firm conclusions about the validity of the HSPSC. This difficulty is compounded by the fact that only about 60% of the studies in our sample report the outcomes from using factor analysis to assess construct validity. It has been 15 years since the original report describing the HSPSC instrument was released by AHRQ, and one recommendation from our review is that many of the instrument’s dimensions and items need to be revised. Further development and revision of the HSPSC might result in improved psychometric properties and increase its suitability for use across a range of international settings and contexts.
The need for guidance covering adaptations and other changes to the HSPSC
Just under a quarter of the studies in our sample made changes to the original survey items and/or dimensions. Over a third of the sample did not state what modifications, if any, were made to the original US version of the survey. It is well known that changing items and dimensions, alongside other variations in procedure (eg, procedures used to translate the HSPSC and reverse scoring questions), is likely to affect the psychometric properties of the instrument.53 54 Alterations to the HSPSC also make it difficult to obtain valid measurements and to carry out cross-national comparisons. We recommend scientific support and guidance for users of the HSPSC covering the adaptations to survey items that are necessary to meet the language requirements of different national, regional and healthcare contexts.55 56 Some guidelines already exist but are limited in coverage (eg, focusing solely on translation57) and could be expanded to consider, among other things, advice regarding changes to items and dimensions and the reporting of findings. In addition, the development of guidelines covering the reporting of data from PSC surveys might make it easier for healthcare managers and decision makers to benchmark data within their organisations in order to promote organisational learning processes.
Educational support, resources and the development of standards
We also found considerable variation across the studies with regard to the reporting of psychometric data, particularly the outcomes of using factor analysis. We recognise that this is something of a ‘grey area’ (eg, agreement with regard to CFA fit indices) and that the analysis and interpretation of psychometric properties require some skills and knowledge. Nevertheless, there is room for improvement, particularly in terms of the provision of educational resources and the development of standards and guidance covering criteria and levels of acceptability for psychometric data. One way forward may be to work towards a set of networks for sharing, comparing and benchmarking the results from using the HSPSC and other PSC instruments and tools. Some of these networks already exist (eg, the web resources provided for the HSPSC by AHRQ) but may be expanded in the same way that other types of collaborative networks and ‘observatories’ work in other domains of safety (eg, road safety58) to include protocols and standards for data collection, analysis and reporting,55 as well as other educational and training resources.
Implications of the review for researchers
The field of PSC is relatively new in comparison to the range of other scientific disciplines that have examined safety and organisational culture over the last 30 or so years (eg, safety science, human factors and organisational psychology59 60). Likewise, the use of safety culture instruments is well established in a variety of industrial sectors outside of healthcare (eg, nuclear energy, aviation and construction). There is much that could be learnt both in terms of theory and methodology from this large body of knowledge and experience. We would point to three specific areas for the future. First, there is a need to understand the influence of a range of other factors on PSC, including the role played by national culture and professional subculture. Wagner et al,61 for example, found clear differences between the Netherlands, the USA and Taiwan in terms of responses to some HSPSC dimensions (eg, communication openness and non-punitive response to error). The large variability in terms of the reliability and validity of the HSPSC across our sample might be due to the influence of professional groups and other subcultures within healthcare.30 62 Many HSPSC items may be interpreted very differently and may elicit very different opinions among different healthcare groups (eg, nursing staff, physicians and managers31 63).
Second, there is a need to expand the range of theoretical constructs that influence and shape PSC. Research on, for example, the relationship between ‘speaking up’ and ‘voice’, levels of management control and PSC is starting to appear64 65; however, there is potential for expansion. Similarly, research on the extent of agreement on levels of PSC among healthcare units and groups (‘climate strength’56) is starting to appear. A range of other constructs influencing PSC, which are well established in other disciplines (eg, organisational psychology59), including leadership styles and organisational justice, might also form part of an agenda for future research. In addition, we know very little about the levels of change in PSC over time, how these levels may decline, improve or stagnate, and what triggers these changes.
Third, we would point to the value of using mixed methods to measure PSC as a means of taking into account the important role played by diverse social, cultural and subcultural contexts within different healthcare settings. Combining survey data with data drawn from interviews or focus groups, for example, is currently under-represented in the literature on PSC,26 despite being relatively common in studies of safety culture in other safety-critical industries (eg, aviation, where interviews and focus groups are often used as a means to check and validate other data, including those from survey instruments66).
Limitations of the review
Our review is the first to systematically examine the psychometric properties of a PSC instrument across a broad range of international contexts; however, it focuses on only one of several instruments currently in use (eg, refs 18 67). We would point to the need for similar systematic reviews of the psychometric properties of these instruments to be carried out in the future. Finally, the diverse ways in which studies in our sample reported details of the procedure used to administer the instrument, as well as how they reported findings from its use, made it difficult to derive firm conclusions about the reliability and validity of the HSPSC.
We thank Professor Paula Griffiths (School of Sport, Exercise and Health Sciences, Loughborough University) for her comments on an earlier draft of the paper. We also thank the reviewers for BMJ Open for their helpful comments on an earlier version of this paper.
Contributors PW was responsible for the conception and design of the study. The paper was primarily written by PW. Data collection and analysis were initially carried out by PW and subsequently completed by E-MC. TM and AH commented on successive drafts of the paper and suggested areas for revision and improvement.
Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests None declared.
Provenance and peer review Not commissioned; externally peer reviewed.
Data availability statement All data relevant to the study are included in the article or uploaded as supplementary information.