

Measuring a caring culture in hospitals: a systematic review of instruments
G Hesselink (1), E Kuis (2), M Pijnenburg (1), H Wollersheim (1)

(1) Scientific Institute for Quality of Healthcare (IQ healthcare), Radboud University Nijmegen Medical Centre, Nijmegen, The Netherlands
(2) University of Humanistic Studies, Utrecht, The Netherlands

Correspondence to Dr Gijs Hesselink; G.Hesselink{at}


Objective To identify instruments, or components of instruments, that aim to measure aspects of a caring culture (the shared beliefs, norms and values that direct professionals and managers in hospitals to act in a caring way), and to evaluate their psychometric properties.

Design Systematic review.

Data sources PubMed, CINAHL, EMBASE, PsycINFO, Web of Science and the International Bibliography of the Social Sciences.

Study selection Peer-reviewed articles describing instruments (or components of instruments) that measure aspects of a caring culture in a hospital setting. Studies had to report psychometric data on the reliability or validity of the instrument. Potentially useful instruments identified in the title and abstract scan were assessed for relevance by an expert panel (n=12) using the RAND-modified Delphi procedure.

Results Of the 6399 references identified, 75 were examined in detail. Seven studies, each covering a unique instrument, met our inclusion criteria. On average, 24% of an instrument's items were considered relevant for measuring aspects of the hospital's caring culture. Studies showed moderate-to-high validity and reliability scores. Validity was addressed for 6 of the 7 instruments; face, content (90%) and construct (60%) validity were the most frequently reported types of validity. One study (14%) reported discriminant validity of the instrument. Reliability data were available for all of the instruments. Internal consistency was the most frequently reported psychometric property and was demonstrated by a Cronbach's α coefficient (80%), subscale intercorrelations (60%) and item–total correlations (40%).

Conclusions The ultimate standard for measuring a caring culture in hospitals does not exist. Existing instruments provide partial coverage and lack information on discriminant validity, responsiveness and feasibility. Characteristics of the instruments included in this review could provide useful input for the design of a reliable and valid instrument for measuring a caring culture in hospitals.


Article summary

Strengths and limitations of this study

  • We have reviewed an extensive body of literature and consulted a group of experts in multiple rounds (RAND-modified Delphi procedure) to identify relevant instruments.

  • Study selection may have been biased because the underlying research concepts (ie, ‘caring’ and ‘organisational culture’) are diffuse.

  • Most instruments were successfully tested for reliability as well as face and construct validity, but data on discriminant validity, feasibility and responsiveness are lacking.

Introduction

Biomedical research and the concept of evidence-based medicine direct the discourse on quality in healthcare. In the pursuit of better, safer and more cost-effective care, numerous initiatives to measure, evaluate and improve care have been developed.1–3 These initiatives yield a variety of objective quality indicators and ‘good clinical practices’, and contribute to a more rational and standardised way of organising care processes and professional decision-making. However, this focus on a rational, process-oriented view of healthcare improvement leaves undervalued another aspect of quality that is of major importance to patients. This paper refers to this aspect as ‘caring’: the sensitivity of healthcare providers and organisations to what patients have to endure and to how patients experience the care they receive, and the art of attuning care to these experiences.

The concept of ‘caring’ is closely related to the concept of ‘patient-centred care’. In 2001, the Institute of Medicine formulated patient-centeredness as one of six domains of quality of care, to address current deficits in the quality of care provision.1 Patient-centeredness is associated with positive outcomes such as patient satisfaction,4 medication adherence,5–7 more efficient use of health services8 and reduced costs.9 Although there is no consensus on the definition of patient-centeredness,7,10,11 authors agree on the following aspects: (1) eliciting and respecting patients’ preferences and values; (2) informing, involving and engaging patients and family members in the care process and (3) providing physical comfort and emotional support to patients. The concept of caring overlaps with patient-centeredness in that it relates to being compassionate12 and empathic,13 both as a healthcare provider and as an institution. However, caring has a broader meaning: it encompasses careful attention to every single, unique patient, with interest, curiosity, concern and openness for what moves or puzzles that patient, and then responding appropriately.14,15 Baart and Vosman16 describe this as the fit or match between the need or wish of the patient and the care provided.

Although many studies focus on measuring aspects of care at the micro level,6,17,18 and various studies have found associations between caring professionals and improved clinical outcomes,12,13 research on the extent to which these aspects are supported at a higher, ‘meso’ level is limited.19 More specifically, hospitals require a caring culture, that is, beliefs, norms and values shared by professionals and management throughout the organisation20,21 that motivate, facilitate and direct these professionals to consistently act in a caring way towards patients and family. Appropriate instruments can help diagnose a hospital's caring culture and thereby provide important input for evaluating the quality of care, both within a single hospital and between hospitals, over time. They may also provide insight for further in-depth research and reveal opportunities for improvement. Whether appropriate instruments to measure a caring culture in hospitals exist is unknown. Therefore, the aim of this study is to identify existing instruments, or components of instruments, that aim to measure the extent to which hospitals are caring, and to systematically review these instruments for their psychometric properties, feasibility and responsiveness.

Methods

We planned and reported this systematic review in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement.22

Data sources

We searched for peer-reviewed, English-language studies published between 1990 and 1 May 2012 in the following databases: PubMed, CINAHL, EMBASE, PsycINFO, Web of Science and the International Bibliography of the Social Sciences (IBSS). A specific search strategy was developed for each database; online supplementary appendix 1 lists the search terms in detail. The reference lists of the selected studies were checked manually (snowballing) to identify relevant studies that the database searches had missed.

Study selection

Two reviewers (GH and EK) independently assessed the eligibility of the studies retrieved by the search strategy. The initial selection was based on the title and abstract of each study. When the title and/or abstract provided insufficient information to determine relevance, the full paper was retrieved and reviewed. For the final selection, GH and EK examined full-text copies of the studies to determine whether they fulfilled the inclusion criteria. Disagreements about inclusion were resolved by discussion; when no consensus could be reached, a third and fourth reviewer (HW/MP) decided. Studies included in this review had to meet all of the following criteria:

  1. Peer-reviewed studies published in full text between January 1990 and 1 May 2012 with an abstract in English, so that the studied instruments could be compared in the same language and misinterpretation of their purpose and content due to language barriers could be avoided.

  2. Describing instruments or components (ie, items or domains) that measure aspects of a caring culture in a hospital setting, that is, beliefs, norms and values shared by professionals and management throughout the organisation20,21 that motivate, facilitate and direct these professionals and the management to consistently act in a caring way towards patients and family. No distinction was made between a ‘caring culture’ and a ‘caring climate’: these constructs are so closely interrelated that it is difficult to determine where culture ends and climate begins.23 Studies of instruments tested in primary care settings (including nursing homes and rehabilitation centres) or administered to medical or nursing students were excluded.

  3. Reporting psychometric data (ie, reliability or validity) on the instruments or components to be included.

Systematic reviews, intervention studies and studies measuring patient satisfaction were excluded.

Evaluation of instrument items by a RAND-modified Delphi study

Items of potentially useful instruments included after the title and abstract scan were evaluated for relevance in a RAND-modified Delphi procedure.24 The RAND-modified Delphi method provided a systematic process for evaluating instrument items and reaching consensus on item relevance through expert opinion. A multidisciplinary panel (n=12), comprising experts in medical ethics and the social and organisational sciences, persons with expertise in patient-centred care, caring and organisational culture, and one patient representative, was consulted in three rounds.

In step 1, the members of the expert panel received a tabulated list of the instruments included after the title and abstract scan, together with a number of instruments from studies that were excluded from this review. One instrument was identified by snowballing references after the Delphi study and was therefore not evaluated by the expert panel. Experts were asked to rate each item individually on a 5-point scale (1=lowest, 5=highest) in response to the question: “Please rate to what extent the item is a good measure for assessing the caring culture in hospitals.” To support their choice, panel members were provided with the source and the available psychometric properties of the instrument. The results of step 1 were processed into a summary report to facilitate step 2 (the panel consensus meeting). In this report, items were ranked by their mean expert rating and grouped into three categories according to their potential to measure aspects of a caring culture in hospitals: high potential, low potential and uncertain potential (for discussion). Items were considered to have high potential if the mean score was 4.2 or higher. This cut-off was chosen to limit the number of selected items and to ensure face validity and good reproducibility. Items with a mean score below 4.0 were considered to have low potential. For items of uncertain potential (a mean score between these cut-offs) or with dubious results (ie, ratings that conflicted strongly between panel members), the level of agreement between panel members was assessed in the consensus meeting.
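The step 1 categorisation can be sketched in Python. The cut-offs (a mean of 4.2 or higher for high potential, below 4.0 for low potential) come from the text; how "highly conflicting" ratings were identified is not specified, so `is_conflicting` and its `min_share` parameter are hypothetical. This illustrates the logic under those assumptions and is not the procedure the panel actually ran.

```python
from statistics import mean

HIGH_CUTOFF = 4.2  # mean >= 4.2: high potential (from the text)
LOW_CUTOFF = 4.0   # mean < 4.0: low potential (from the text)


def is_conflicting(ratings: list[int], min_share: float = 0.25) -> bool:
    """Hypothetical rule: ratings conflict when at least `min_share` of the
    panel scored the item 1-2 and at least `min_share` scored it 4-5."""
    n = len(ratings)
    low = sum(r <= 2 for r in ratings) / n
    high = sum(r >= 4 for r in ratings) / n
    return low >= min_share and high >= min_share


def categorise_item(ratings: list[int]) -> str:
    """Place one item in the high, low or uncertain category."""
    if not ratings or any(r not in range(1, 6) for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")
    if is_conflicting(ratings):
        return "uncertain"  # dubious result: discussed at the consensus meeting
    m = mean(ratings)
    if m >= HIGH_CUTOFF:
        return "high"
    if m < LOW_CUTOFF:
        return "low"
    return "uncertain"  # 4.0 <= mean < 4.2


def rank_items(panel: dict[str, list[int]]) -> list[tuple[str, float, str]]:
    """Rank items by mean rating (highest first) and attach their category."""
    rows = [(item, round(mean(r), 2), categorise_item(r)) for item, r in panel.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical ratings from a 12-member panel.
panel = {
    "item A": [5, 5, 4, 4, 5, 4, 5, 4, 4, 5, 4, 4],
    "item B": [4] * 12,
    "item C": [5] * 9 + [2] * 3,
}
for row in rank_items(panel):
    print(row)
```

In this hypothetical example, item C has a mean above 4.2 but is still routed to the uncertain category because a quarter of the panel rated it low; whether the real panel treated such items that way is not stated.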

In step 2, panel members were invited to a consensus meeting to discuss the results of step 1 and to critique instruments and specific items face-to-face. A personalised summary report gave panel members the opportunity to compare their individual scores with the overall distribution of scores and to discuss reasons for disagreement or conflict. The goal of the meeting was not to force consensus, but to distinguish well-founded disagreement from disagreement based on misunderstanding or irrational motives.25 The following options were explained to the panel members: acceptance, rejection or adjustment of an item, or the formulation of a new item.

In step 3, the set of items that passed both the first round of individual rating and the second-round discussion was sent to the expert panel by email. All panel members, including those not present at the meeting, were asked to rate the adjusted or newly formulated items once more, were given a last opportunity to make remarks and were asked to approve the final set. The authors discussed the comments and made final revisions.

Quality assessment of studies

The seven-criterion appraisal framework of Yu and Kirk,26 based on the work of Greenhalgh et al,27 Russel et al28 and Grange et al,29 was adapted to six quality criteria and applied to each included study. The total possible score for each instrument ranged from 0 to 12 (see online supplementary appendix 2). Two reviewers (GH and EK) independently assessed each study on validity (eg, face, content, construct and criterion), reliability (eg, internal consistency, stability and equivalence), responsiveness, user-centeredness, sample size and feasibility. Discrepancies were resolved through discussion; if no consensus was reached, a third reviewer (HW) was consulted.
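The scoring arithmetic can be made explicit with a short Python sketch. The text names the six criteria and the 0 to 12 range, but not the per-criterion scale; the code assumes each criterion is scored 0, 1 or 2, which is consistent with that range but remains an assumption (the actual rubric is in online supplementary appendix 2). The helper `summarise_totals` reports the range, mean and sample SD of study totals; the example scores are hypothetical.

```python
from statistics import mean, stdev

# The six criteria named in the text; the 0-2 scale per criterion is assumed.
CRITERIA = ("validity", "reliability", "responsiveness",
            "user_centeredness", "sample_size", "feasibility")


def total_quality_score(scores: dict[str, int]) -> int:
    """Sum the six criterion scores into a 0-12 total."""
    missing = set(CRITERIA) - set(scores)
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    for name in CRITERIA:
        if scores[name] not in (0, 1, 2):
            raise ValueError(f"{name} must be scored 0, 1 or 2")
    return sum(scores[name] for name in CRITERIA)


def summarise_totals(totals: list[int]) -> tuple[int, int, float, float]:
    """Return (min, max, mean, sample SD) of per-study totals."""
    return min(totals), max(totals), round(mean(totals), 1), round(stdev(totals), 1)


# Hypothetical appraisal of one study.
example = {"validity": 2, "reliability": 2, "responsiveness": 0,
           "user_centeredness": 1, "sample_size": 2, "feasibility": 0}
print(total_quality_score(example))
print(summarise_totals([3, 5, 8]))
```

Validating the criterion names up front means a study with a missing appraisal fails loudly instead of silently scoring lower.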

Data extraction

Data were extracted into a standard data abstraction form covering general information about the instrument, such as its name and source, the study setting (country, type of hospital and population), purpose, mode of administration, items, and the scoring of items and subscales. Psychometric properties regarding the validity and reliability of measurement, the response rate, feasibility (time and cost investment, ease of use) and information on the responsiveness of the instrument were extracted as well. Data extraction was performed independently by two reviewers (GH and EK). Disagreements were resolved by discussion between the reviewers, with the third reviewer (HW) making the final decision.

Results

Search results

Our initial search identified 6399 records (figure 1): 1935 in PubMed, 1127 in CINAHL, 1900 in EMBASE, 324 in PsycINFO, 764 in Web of Science and 349 in IBSS. The title and abstract scan yielded 72 papers that, at first sight, met the inclusion criteria or raised doubt. Sixty-seven papers were excluded after the full-text scan and on the basis of the outcome of the Delphi study. Two additional articles were identified by manual review of the reference lists of the original 72 articles and were included after the full-text scan and the Delphi study. Thus, the final set consisted of seven unique studies that underwent full-text data extraction.
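The counts in the search flow can be checked with a few lines of Python: the per-database totals should sum to the 6399 records identified, and the 72 papers retained after screening, minus 67 exclusions, plus 2 snowballed articles should give the seven included studies. The function name `flow_counts` is illustrative only, and the sketch assumes, as the text implies, that database totals are simply added without duplicate removal.

```python
# Records retrieved per database, as reported in the text.
RECORDS = {
    "PubMed": 1935, "CINAHL": 1127, "EMBASE": 1900,
    "PsycINFO": 324, "Web of Science": 764, "IBSS": 349,
}


def flow_counts(records: dict[str, int], screened_in: int,
                excluded: int, snowballed: int) -> tuple[int, int]:
    """Return (records identified, studies included) for a simple flow diagram.

    Database totals are summed directly; no duplicate removal is modelled.
    """
    identified = sum(records.values())
    included = screened_in - excluded + snowballed
    if included < 0:
        raise ValueError("more exclusions than screened records")
    return identified, included


identified, included = flow_counts(RECORDS, screened_in=72, excluded=67, snowballed=2)
print(identified, included)  # 6399 7
```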

Figure 1

Flow diagram of the search process.

General description of the instruments

In total, seven instruments were included in the review (table 1). Two instruments, the Person-centered Climate Questionnaire-staff version (PCQ-S)32,33 and the Person-centered Climate Questionnaire-patient version (PCQ-P),30,31 were each studied separately for their psychometric properties in a Swedish and an English version. Instruments comprised between 7 and 76 items.

Table 1

General information of instruments included in the review

Of the seven instruments, two were studied in Australia,31,33 two in the UK,34,35 two in Sweden30,32 and one in the USA.36 Instruments were studied in one to three hospitals of various types: local, district, acute, teaching and tertiary. One instrument was tested in 86 hospital trusts.35 Of the seven instruments, two were tested with only patients or relatives30,31 and five with only hospital (nursing, medical or support) staff.32–36 The sample size ranged from 52 to 17 949 for hospital staff and from 108 to 544 for patients or relatives.

Relevance of items

On average, 24% of an instrument's items were considered relevant for measuring aspects of the hospital's caring culture. The percentage of relevant items per instrument ranged from 4% to 47% (see table 2).
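The text does not say whether the 24% average is an unweighted mean of per-instrument percentages or a pooled proportion across all items. Because the instruments vary in length (7 to 76 items), the two figures can differ noticeably. The Python sketch below computes both from hypothetical item counts; `relevance_summary` is an illustrative name, not part of the review.

```python
def relevance_summary(counts: list[tuple[int, int]]) -> tuple[int, int, float, float]:
    """counts holds (relevant_items, total_items) for each instrument.

    Returns the lowest and highest per-instrument percentage (rounded to whole
    numbers), the unweighted mean of those percentages and the pooled
    percentage over all items.
    """
    if not counts or any(total <= 0 or not 0 <= rel <= total for rel, total in counts):
        raise ValueError("each instrument needs 0 <= relevant <= total and total > 0")
    pcts = [100 * rel / total for rel, total in counts]
    unweighted = sum(pcts) / len(pcts)
    pooled = 100 * sum(rel for rel, _ in counts) / sum(total for _, total in counts)
    return round(min(pcts)), round(max(pcts)), round(unweighted, 1), round(pooled, 1)


# Hypothetical instruments: a long one with few relevant items, a short one with many.
print(relevance_summary([(2, 40), (6, 12)]))
```

Here the unweighted mean (27.5%) is almost twice the pooled figure (15.4%), because the short instrument counts as much as the long one.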

Table 2

Relevant items of instruments to measure aspects of a caring culture within hospitals

Quality assessment of studies

Studies scored between 3 and 8 of a possible 12 points on the quality criteria (mean (SD) 5.7 (2.3); see online supplementary appendix 3).

Validity

Validity was addressed in some way (eg, by tests in the study or by reference to previous tests) for all instruments except one35 (table 3). For five instruments,30–33,36 more than one type of validity was reported. Face or content validity was described for six of the seven instruments30–34,36 and was evaluated by a panel of experts, clinicians or patients. Construct validity was established by principal component analysis for five30–33,36 of the seven instruments; the extracted factors accounted for between 60% and 72% of the instrument's total variance. For two studies, construct validity was also evaluated by confirmatory factor analysis.30,32 One study reported the ability of the instrument to detect true differences between hospital units by examining the dispersion of mean scores.32
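The text does not specify how the dispersion of unit mean scores was examined. One common way to ask whether units differ more than their internal variation would explain is a one-way analysis of variance. The Python sketch below computes the F ratio from scratch for hypothetical unit scores; it illustrates the general idea rather than the cited study's method, and it omits the p-value, which would require an F-distribution function from a statistics library.

```python
from statistics import mean


def one_way_anova_f(groups: list[list[float]]) -> float:
    """F ratio = between-unit mean square / within-unit mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k or any(len(g) == 0 for g in groups):
        raise ValueError("need >= 2 non-empty groups and more scores than groups")
    grand = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    if ss_within == 0:
        raise ValueError("no within-unit variation; F is undefined")
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical climate scores from respondents in three hospital units.
units = [[3.1, 3.4, 3.0, 3.3], [4.0, 4.2, 3.9, 4.1], [3.5, 3.6, 3.8, 3.4]]
print(round(one_way_anova_f(units), 1))
```

A large F indicates that unit means are spread out relative to the scatter within units, which is the property an instrument needs if it is to discriminate between units or hospitals.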

Table 3

Psychometric properties of the included instruments

Reliability

Reliability data were available for all of the instruments (table 3). Internal consistency was the most frequently reported psychometric property for the instruments and was demonstrated by:

  • a Cronbach's α coefficient;

  • subscale intercorrelations;

  • item–total correlations.

For most instruments, Cronbach's α ranged between 0.87 and 0.93; α for the subscales varied between 0.64 and 0.96. For three instruments, the correlations between items and the total scale ranged from 0.24 to 0.71, from 0.37 to 0.80 and from 0.56 to 0.64, respectively. Stability was addressed for four instruments through test–retest reliability with a 1-week interval between tests;30–33 the correlation coefficients varied between 0.51 and 0.75.
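The reliability statistics reported above follow standard formulas. The Python sketch below computes Cronbach's α as k/(k−1) × (1 − sum of item variances / variance of the total score), a corrected item–total correlation (each item against the sum of the remaining items) and a test–retest Pearson correlation. Whether the included studies used corrected or uncorrected item–total correlations is not stated, so that choice is an assumption, and all data shown are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cronbach_alpha(rows: list[list[float]]) -> float:
    """rows: one list of item scores per respondent, same items for everyone."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    item_vars = sum(variance(col) for col in zip(*rows))
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - item_vars / total_var)


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError if either list is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def corrected_item_total(rows: list[list[float]], item: int) -> float:
    """Correlate one item with the sum of the other items (item removed)."""
    scores = [r[item] for r in rows]
    rest = [sum(r) - r[item] for r in rows]
    return pearson(scores, rest)


# Hypothetical responses: 5 respondents x 3 items on a 1-5 scale.
responses = [[4, 5, 4], [3, 3, 2], [5, 5, 5], [2, 3, 2], [4, 4, 3]]
print(round(cronbach_alpha(responses), 2))
print([round(corrected_item_total(responses, i), 2) for i in range(3)])

# Test-retest stability: total scores of the same respondents 1 week apart.
time1 = [13, 8, 15, 7, 11]
time2 = [12, 9, 14, 8, 12]
print(round(pearson(time1, time2), 2))
```

Removing the item from the total before correlating avoids inflating item–total correlations, which matters most for short scales.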

User-centeredness

Healthcare providers were involved in testing the face and content validity of four instruments,32–34,36 and patients in testing that of two instruments.30,31 User views were taken into account in initial item generation for five instruments.30–33,36 An initial pool of items was usually generated from literature reviews and empirical research, guided by theoretical constructs.30–33,35

Sample size

Six instruments were tested with a sample size suitable for factor analysis according to Kass and Tinsley's37 guideline of 5–10 participants per item, up to about 300 participants. Once the sample reaches about 300 participants, test parameters tend to be stable regardless of the subject-to-variable ratio.37 The sample size was high (ie, above 300) for four instruments30,32,35,36 and sufficient (ie, 5–10 participants per item) for one instrument.31
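Kass and Tinsley's guideline, as summarised above, can be expressed as a small Python function: the required sample is 5 participants per item, capped at about 300, and samples above 300 are labelled "high" as in the text. Using the lower bound of 5 per item and treating 300 as a hard cap are simplifying assumptions; `sample_size_adequacy` is a hypothetical helper, and the example instruments are invented.

```python
def sample_size_adequacy(n_participants: int, n_items: int,
                         per_item: int = 5, stable_n: int = 300) -> str:
    """Classify a factor-analysis sample as 'high', 'sufficient' or 'insufficient'."""
    if n_participants < 0 or n_items <= 0:
        raise ValueError("need a non-negative sample and at least one item")
    if n_participants > stable_n:
        return "high"  # above ~300, parameters tend to be stable regardless of ratio
    required = min(per_item * n_items, stable_n)  # 5 per item, capped at ~300
    return "sufficient" if n_participants >= required else "insufficient"


# Hypothetical instruments: (participants, items).
for n, items in [(450, 20), (120, 14), (60, 20), (300, 76)]:
    print(n, items, sample_size_adequacy(n, items))
```

The cap explains why a 76-item instrument need not reach 380 participants: at about 300 the guideline treats the sample as adequate regardless of the item count.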

Feasibility and responsiveness

All instruments were self-administered (or peer-administered). Information on the time needed for completion, costs, perceived difficulties and training needs or instructions (eg, how to complete the questionnaire) was not reported for any of the instruments. Non-response was reported for all instruments except one,36 but the reasons for non-participation were not evaluated in any of the studies. Responsiveness was not assessed for any of the instruments.

Discussion

This is the first systematic review of instruments evaluating aspects of a caring culture in hospital settings. Various instruments (ie, questionnaires) were found that measure aspects of a caring culture in hospitals, and moderate-to-high reliability and validity were reported for most of them. However, the usefulness of these instruments is limited. They contain a low percentage of relevant items, covering one or a few aspects of a caring culture in hospitals and leaving other important aspects unaddressed. Although most instruments were successfully tested for reliability as well as face and construct validity, the studies lack data on discriminant validity.38 If an instrument is to be useful in discriminating between hospitals in terms of their caring culture, it should demonstrate significant differences across hospitals. Information on feasibility, in terms of rating instructions or training, time investment, costs and non-response evaluation, was lacking for all of the instruments. Several studies reported that reasons for non-participation could not be explored because questionnaires were returned anonymously and consent was implied. Furthermore, the ability of the instruments to detect clinically important changes over time, for example through tests for differences between individuals, factors associated with good outcome and treatment effects from group differences, was generally not examined or reported.

All identified instruments were questionnaires. Questionnaires are useful for providing a first general overview of a hospital's caring culture, based on input from a large sample within a short period of time. However, culture and caring are constructs that are difficult to identify and assess by quantitative research alone.14–16,39,40 Although more time-consuming, in-depth interviews and observations are needed to identify and assess the underlying social constructions, attitudes and patterns of communication between care providers, patients and family members.39

Our study has several limitations. First, this review focused on instruments measuring complex and disputed constructs such as ‘patient-centred culture’ and ‘caring culture’, neither of which has a widely agreed definition in the literature. This hindered us in formulating strict inclusion and exclusion criteria and may have led to subjective selection of studies (and instruments). We tried to reduce subjectivity in study selection by using the RAND-modified Delphi procedure. The cut-off points for selecting relevant instrument items on the 5-point Likert scale were chosen arbitrarily. Second, articles with potentially relevant instruments may have been missed by our search strategy because they did not contain any of our search terms related to a caring culture. For example, our search did not identify the American Hospital-level Consumer Assessment of Health Plans Survey. Third, only instruments measuring aspects of a caring culture in hospital settings were included in this review. This narrow focus may have left out two potentially useful instruments tested in community care,41,42 which we identified in the title and abstract scan. Fourth, the authors did not investigate whether the instruments were sensitive to subcultures (eg, at the department level or among physicians or nurses).38 This may raise questions about the appropriateness of the instruments for measuring a caring culture in hospitals.

In conclusion, an ultimate standard for measuring a caring culture in hospitals does not exist. No single instrument, for either patients or care providers, specifically aims to measure the caring culture in hospitals across a wide range of caring aspects. The items of the instruments included in this review that were appraised as relevant for measuring aspects of a caring culture could assist in the design of a comprehensive instrument. The PCQ-P and PCQ-S are particularly useful in this respect, given their relatively high number of relevant items. The reliability, validity, feasibility and responsiveness of such an instrument will need to be established. A rigorous multimethod approach, in which quantitative findings are further explored qualitatively and in depth, is important for providing an adequate diagnosis of a hospital's caring culture or its change over time.

Acknowledgements

The authors would like to thank the members of the expert panel who participated in the Delphi study: Rianne van den Brink (Dutch Institute for Healthcare Improvement, CBO), Marjan Faber (Scientific Institute for Quality of Healthcare, IQ healthcare), Michel van Slobbe (Utrecht University School of Governance, USG), Guy Widdershoven (VU University Medical Center, VUmc), Evert van Leeuwen (Scientific Institute for Quality of Healthcare, IQ Healthcare), Esther Kuis (University of Humanistic Studies), Hans van Dartel (Leiden University Medical Center, LUMC), Jan den Bakker (Stichting Presentie), Jorke de Witte (Radboud University Nijmegen Medical Centre, RUNMC), Dolf de Boer (Netherlands Institute for Health Services Research, NIVEL), Frans Kingma (The Client Advisory Board for University Hospitals, CRAZ), Harriët Messing (Compassion for Care).


  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.



  • Contributors GH, EK, MP and HW were involved in conception and design of the study. GH and EK were responsible for data acquisition. GH and EK analysed and interpreted the data. GH and EK drafted the manuscript, which was critically revised for important intellectual content by all authors.

  • Funding This review was funded by the Dutch healthcare insurance organisation CZ.

  • Competing interests None.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement No additional data are available.
