Objectives To gain consensus on the items that determine adequacy of shift staffing.
Design This was a three-round Delphi study to establish consensus on what defines adequacy of shift staffing in a general hospital ward. A literature review, focus group and five semistructured expert interviews were used to generate items for the Delphi study.
Setting Multicentre study in The Netherlands.
Participants Nurses, head nurses, nursing managers, and capacity consultants and managers working for Dutch hospitals.
Results Twenty-six items were included in the Delphi study. One hundred and sixty-eight, 123 and 93 participants were included in the first, second and third round, respectively. After three rounds, six items were included (mostly related to direct patient care) and nine items were excluded. No consensus was reached on 12 items, including one item that was added after the first round.
Conclusions This is the first study to specify items that determine adequacy of staffing. These items can be used to measure adequacy of staffing, which is crucial for enhancing nurse staffing methods. Further research is needed to refine the items of staffing adequacy and to further develop and psychometrically test an instrument for measuring staffing adequacy.
- human resource management
- health economics
- organisation of health services
Data availability statement
No data are available.
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
Strengths and limitations of this study
The Delphi technique applied in this study includes many experts on nurse staffing with varying backgrounds.
The repeated rounds of the Delphi technique offer the participants the opportunity to thoroughly consider and, if necessary, adjust their responses, leading to well-considered answers.
The questionnaire format does not allow the researchers to question participants further about their responses on unsure items.
Adequate nurse staffing is important for the well-being of patients and nurses.1–3 Without sufficient time to complete their work, nurses are forced to prioritise activities and ration essential nursing care.4 This reduces the quality of patient care and leads to dissatisfaction among nurses.5–7 Inadequate nurse staffing also reduces financial health of healthcare organisations and creates a poor work environment.8 This may lead to burnout and resignations, leading to costly turnovers.8 On the other hand, overstaffing wastes the already limited nursing capacity. Balancing good-quality care and nurse well-being with financial and capacity constraints relies on adequate nurse staffing, which is why this topic is receiving ongoing attention in scientific research.8
Adequacy of nurse staffing is a complex construct. Nurse staffing is adequate when demand for nursing work matches nurse supply.9 The demand for nursing work is affected by patient factors such as age, length of stay, comorbidities and complications10 and by organisational factors such as patient turnover, interruptions, mandatory registrations for monitoring quality and safety, supervision of students and quality improvement projects.11–13 Nurse supply is affected by the number, skill mix, education and experience of registered nurses and other professionals who provide nursing care, that is, nurse assistants and nurse students.11 14 This complicated interplay of factors makes it difficult to balance nursing work with nurse supply.3 15
To be able to assess adequacy of staffing, we need to know what it is. There are very few definitions for adequate and optimal staffing3 or for inadequate and suboptimal staffing16 and the definitions that do exist are specific to distinct time frames of staff decision making, such as setting the nursing staff establishment, rostering or staff-to-ward allocation.17 (Uncertain) fluctuations in demand for nursing work make the staffing process extremely challenging. Insufficient anticipation of those fluctuations causes (moments of) inadequate staffing. Mark et al18 previously pointed to measurement issues with the longitudinal measure of patient days per full-time equivalent: ‘the measure does not take into account the uncertainty and unpredictability of patient care on the unit level, nor does it address differences in perceptions across shifts’ (p.240). Shift-to-shift staffing exposes this ‘capacity killer’ variation.19 20 Therefore, to better understand adequate staffing at all time frames, a definition of adequate shift staffing is needed.
Resource planning tools can be used to determine the demand for nursing work and nurse supply and to identify understaffing, adequate staffing and overstaffing.3 This information can be obtained by time-study methods, optimum outcome methods, and expert opinion methods. The time-study method estimates the demand for nursing work by multiplying the observed time per nursing task or patient group by the number of tasks or patients per group.21 However, this measures time spent instead of time needed.22 Moreover, nursing work is more than a sum of tasks—it contains work on relations, culture and climate.15 16 Allen23 claims that organising nursing work is an essential but neglected element of nursing, possibly because the work processes are not easy to control. Much nursing work is done ‘on the fly and woven through the warp and weft of everyday nursing practice’ (p.37).23 Those elements of nursing work are only noticed once they are gone.16 23 Optimum outcome methods attempt to evaluate whether staffing decisions taken in accordance with prescribed staffing levels of resource planning tools generate optimum outcomes, or at least better outcomes than staffing levels below prescribed levels. Such evidence is very scarce.3 24 Juntilla et al25 found that adequate staffing according to such a tool reduced patient mortality. However, overstaffing reduced mortality even further, suggesting that these tools cannot identify optimum staffing. 
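The time-study calculation described above can be sketched as a sum, over patient groups, of observed time per patient multiplied by the number of patients. This is only an illustrative sketch: the group labels and minute values are hypothetical placeholders rather than figures from this study, and the result inherits the limitation noted above, namely that it reflects time spent rather than time needed.

```python
from dataclasses import dataclass


@dataclass
class PatientGroup:
    name: str                   # hypothetical group label
    minutes_per_patient: float  # observed nursing time per patient per shift (illustrative)
    patients: int               # number of patients in the group


def time_study_demand(groups):
    """Estimated nursing minutes: sum of (observed time x patient count) per group."""
    return sum(g.minutes_per_patient * g.patients for g in groups)


# Illustrative numbers only, not data from the study.
ward = [
    PatientGroup("low dependency", 30.0, 10),
    PatientGroup("high dependency", 90.0, 4),
]
print(time_study_demand(ward))  # 30*10 + 90*4 = 660.0
```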
Nurses’ perceptions have been the gold standard measure of staffing adequacy;26 27 however, these measures are not completely reliable or valid.28 Most measures of perceived adequacy of staffing (PAS) are single items that directly ask for adequacy of staffing.28 These measures do not provide an underlying definition, which is insufficient for such a complex construct.29 30 According to Kramer and Schmalenberg, multiple items are needed to completely define the meaning of the construct to the rater and to facilitate the perceptual process.31 However, these items are currently undefined for daily staffing,28 and mainly address other purposes such as measuring the work environment.32
To develop a reliable and valid instrument for measuring adequacy of staffing and to support nurse staffing decision making, we need an unambiguous definition of adequate nurse staffing.3 28 The purpose of this study is to identify which items determine adequacy of shift staffing.
We performed a three-round Delphi study to identify the items that determine adequacy of staffing. The Delphi technique involves structured distribution of questionnaires over several rounds to gain consensus on a specific topic.33 34 In this technique, group opinion is considered more valid than individual opinion34 and many individuals with varying backgrounds from diverse locations can respond to the survey.34 Our first step was to generate items that determine adequacy of staffing, followed by three Delphi study rounds (figure 1). We followed the recommendations for conducting and reporting of Delphi studies to report the study.35
Generation of items
We conducted a literature review to explore available instruments for measuring PAS.28 We searched PubMed, CINAHL, Business Source Complete and Embase for literature on outcomes, influencing factors, and instruments for measuring PAS. The complete search strategy for each database is presented in online supplemental file 1. From this search, 2609 potentially relevant articles were identified and further screened for eligibility. In the end, 63 studies were included. Twenty-one measurement instruments were identified and the psychometric properties of these instruments were assessed using the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) guidelines.36 The results of this review have been published elsewhere.28 Building on these findings, we convened a focus group of nurses and nurse staffing experts to evaluate the instruments and extracted items. Twelve of the 21 instruments were found to be insufficient to measure adequacy of staffing because they were single-item instruments. From the remaining instruments, we were unable to define an unambiguous set of relevant items. Most items did not meet the basic rules of item formulation, such as ‘the item should be specific’ and ‘contain only one question’.30 Moreover, items addressed long-term issues such as budgeted positions rather than more short-term issues of nurse staffing31 and did not necessarily change as the construct changed. For example, the item ‘a satisfactory salary’ of the subscale staffing and resources adequacy37 could be rated adequate while staffing is still inadequate. Instead, the existing items provided input for a topic list for further generation of items of adequate shift staffing. For example, ‘give quality patient care’ was input for a proper definition of adequate shift-to-shift staffing in terms of ‘being able to complete care activities’. The topic list is a summary of reflective elements of the items from existing PAS instruments and is presented in online supplemental file 2.
To generate a set of potentially relevant items for adequate shift staffing from this list, one researcher (CJEMvdM) conducted five semi-structured expert interviews. Interviewees were one nurse manager, two researchers studying nurse staffing and two capacity management consultants whose daily task is to align patient demand and hospital resources on the strategic, tactical or operational level. These highly expert participants were recruited by purposive sampling. Interviews were conducted in person (n=2) or by video call (n=3). The interviews began with an open defining question: what items do you believe reflect adequacy of shift staffing on a general hospital ward? A topic list based on the literature review and focus group discussion was used to guide subsequent questions. We were aiming for a reflective model, so this list consisted exclusively of reflective topics, such as ‘quality patient care’, ‘patient safety’ and ‘job satisfaction’. Reflective items are effect indicators of the construct.30 As opposed to formative items, which are causal and together form the construct, reflective items are interchangeable, and the set of items need not be complete for an adequate measure.30 The interviews were recorded and analysed by one researcher (CJEMvdM), who created a list of potential items to define adequacy of staffing. Three researchers (CJvO, CJEMvdM and JK) reviewed and discussed this list until consensus was reached on the final 26 items for the Delphi study.
Selection of participants
To thoroughly understand these items, the Delphi study employed a panel consisting of nurses, head nurses, nursing managers and internal and external capacity consultants and managers working for Dutch hospitals. These participants were selected for their expertise by purposive sampling and were approached via social media channels or invited directly to complete the first survey (table 1). Participation was quasi-anonymous as only an email address was used to reach participants for subsequent rounds and only the researchers had access to this contact information.
We used the survey tool Enalyzer Pro version, which processes European Union (EU) residents’ data according to EU standards.38
We applied a three-round e-Delphi, which is similar to a classical Delphi but is administered via an online web survey (Enalyzer).34 Three rounds allow participants to reconsider their judgements without respondent fatigue.39 The questionnaires were administered in Dutch according to the preferences of the target population. The content, flow and clarity of the first draft of the questionnaire were pilot tested by four independent capacity consultants and nurses, and an adjusted draft was then reviewed by four other test persons with comparable backgrounds. In the final first version, the participants were asked to rate the relevance of the items to judging adequacy of staffing using a 10-point scale ranging from one (not at all) to ten (totally). A score of 7–10 was considered agreement. Two additional open questions asked for feedback on the items and for additional items relevant to judging adequacy of staffing.
In the second round, all unsure items (ie, items for which no consensus could be reached on inclusion or exclusion in the first round; see the ‘Data analysis and consensus’ section for details on how consensus was reached) were presented again to the participants. The item list was updated by reformulating or adding items from the qualitative analysis. One open question asked for feedback on the items. After being invited to complete the survey by e-mail, participants were given 2 weeks to respond and reminder emails were sent to non-respondents 3–5 and 7 days after the first invitation. The same procedure, questions and rating scale were used for the third and final round.
Data were collected between March and May 2021.
Data analysis and consensus
Quantitative data on demographic variables and consensus were analysed by two researchers (JK, CJEMvdM) independently using Microsoft Excel and the statistical analysis software SPSS Statistics V.26. Descriptive statistics (mean, SD and proportions) were used to present demographic variables and responses for each round. Consensus for inclusion was reached if ≥80% of participants rated an item 7 or higher. This follows the most common definition of consensus and applies a slightly higher threshold than the median threshold of 75% that is commonly used.40 Items that >50% of participants rated lower than 7 were excluded, and items in between were labelled ‘unsure’. These thresholds were specified a priori. One researcher (CJEMvdM) performed further descriptive analysis of mean scores to analyse similarities and differences between experts with different backgrounds.
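The decision rule above can be sketched in a few lines. This is an illustrative sketch rather than the analysis actually run in Excel and SPSS: the function name, parameter names and example ratings are hypothetical, and only the cut-offs (a score of 7 or higher counts as agreement, ≥80% for inclusion, >50% for exclusion) come from the text.

```python
def classify_item(scores, agree_from=7, include_pct=80, exclude_pct=50):
    """Label one Delphi item from its 1-10 relevance ratings.

    Hypothetical helper; thresholds follow the rule described in the text.
    """
    if not scores:
        raise ValueError("an item needs at least one rating")
    n = len(scores)
    agree = sum(1 for s in scores if s >= agree_from)  # ratings of 7-10
    disagree = n - agree                               # ratings below 7
    # Compare integer counts to avoid floating-point edge cases at the cut-offs.
    if agree * 100 >= include_pct * n:
        return "included"
    if disagree * 100 > exclude_pct * n:
        return "excluded"
    return "unsure"


# Made-up ratings for illustration.
print(classify_item([8] * 8 + [5] * 2))  # 80% rated >= 7 -> included
print(classify_item([8] * 4 + [5] * 6))  # 60% rated < 7  -> excluded
print(classify_item([8] * 5 + [5] * 5))  # 50% / 50%      -> unsure
```

Note that an item rated below 7 by exactly 50% of participants stays ‘unsure’, because the exclusion rule is a strict inequality.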
The reformulation and additional items were qualitatively analysed by one researcher (JK) using thematic analysis of the open comments. This was cross-checked by a second researcher (CJEMvdM). Three researchers discussed these results until consensus was reached on adjustments to the item list (JK, CJEMvdM and CJvO).
Patient and public involvement
No patient was involved.
In total, 170 experts completed the first Delphi questionnaire. Two participants were excluded as their occupation and organisation did not meet the inclusion criteria, resulting in 168 experts included in the first round. Three experts did not receive the email invitation to participate in the second round. The response rate was 75% (123/165) in the second round and 76% (93/123) in the third round, which exceeded the minimum response rate of 70% for strong research.33 This response rate showed that the Delphi study participants were interested in identifying items that define adequacy of staffing. Participant characteristics were similar over the three rounds (table 1) and most participants were nurses (≥70%), working for diverse types of hospitals (≥97%) and with on average 15–16 years of working experience (table 1).
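The response rates reported above can be reproduced from the counts in the text, as the short check below shows. The function and variable names are illustrative. The round 2 denominator is 165 because three of the 168 included experts did not receive the invitation.

```python
def response_rate(responders, invited):
    """Response rate as a percentage of those invited."""
    return 100 * responders / invited


included_round1 = 168
not_invited = 3  # experts who did not receive the round 2 invitation
invited_round2 = included_round1 - not_invited  # 165

rate_round2 = response_rate(123, invited_round2)
rate_round3 = response_rate(93, 123)
print(round(rate_round2), round(rate_round3))  # 75 76
```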
After the first round, five items were eligible for inclusion with the percentage of scores ≥7 ranging from 80% to 91% (table 2). The item ‘being able to deliver care according to protocol or guideline’ had the highest consensus rate (91%), followed by ‘being able to complete all care activities’ and ‘being able to prepare discharge with patient and family members’ (84%). Five items were excluded and the remaining 16 items were labelled ‘unsure’. Some participants provided feedback on the research question. For example, one expert noted:
The question was not immediately clear to me in relation to the answering options.
To ensure that participants fully understood the research question, we further explained the main question and illustrated it with two examples. We also sharpened the focus of the main question by replacing ‘relevant’ with ‘definable’. After this, all items were provisionally assumed unsure and presented in the second round. Based on participant feedback, we modified item 16 to ‘being able to perform non-professional tasks, such as cleaning, serving meals and filling stocks’ and added item 27 ‘being able to have a joint start of the day and/or evaluate the shift’. Other suggested items were rejected because (1) they were formative (eg, nurses’ working experience or patient acuity), (2) they reflected different timeframes than a shift such as absenteeism and (3) they were related to other aspects of the work environment such as autonomy, teamwork and atmosphere.
The results of the second round did not differ significantly from those of the first round, which gave the research team sufficient confidence to continue with the obtained results. Therefore, we followed the regular Delphi procedure by including and excluding items during each round based on predefined thresholds.34 After the second round, one additional item was included: ‘being able to have breaks’. The item ‘being able to complete all care activities’ had the highest consensus for inclusion (88%). The item ‘being able to chat with coworkers about non-professional subjects’ exceeded the exclusion threshold, and the item ‘being able to perform non-professional tasks, such as cleaning, serving meals and filling stocks’ exceeded the exclusion threshold even further after it was reformulated (64%–84%). Both items were therefore excluded. No consensus was reached on 15 items, so they remained unsure. Many participants found the item ‘being able to complete all care activities’ too strong, so the word ‘all’ was deleted.
In the third round, the 15 unsure items were presented to the experts again, but none were included. Three additional items exceeded the exclusion threshold and no further adjustments were made to any items. The final list consisted of six included items:
Being able to complete care activities.
Being able to deliver care according to protocol or guideline.
Being able to prepare discharge with patient and family members.
Being able to educate the patient.
Being able to have breaks.
Being able to guide nursing students.
Nine items were excluded and 12 items were unsure. The mean scores of expert groups per occupation are provided in table 3. There was only one capacity manager in round 1, so this participant was not included in the group comparisons.
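As a consistency check on the item counts reported across the rounds, the tallies can be summed per round. The dictionary layout and names below are illustrative. The number of excluded items after round 2 is not stated in the text and is inferred here as 27 − 6 − 15 = 6, so it should be read as an assumption.

```python
# Cumulative status of the item list after each round, taken from the text.
tallies = {
    1: {"included": 5, "excluded": 5, "unsure": 16},
    2: {"included": 6, "excluded": 6, "unsure": 15},  # 'excluded' inferred, not reported
    3: {"included": 6, "excluded": 9, "unsure": 12},
}
# 26 items initially; item 27 was added after the first round.
items_on_list = {1: 26, 2: 27, 3: 27}


def totals_match(tallies, items_on_list):
    """True if included + excluded + unsure equals the list size in every round."""
    return all(sum(tallies[r].values()) == items_on_list[r] for r in tallies)


print(totals_match(tallies, items_on_list))  # True
```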
All expert groups considered items on direct patient care as the best items for defining and judging adequacy of staffing. The item ‘being able to have breaks’ was also considered defining by all expert groups. Capacity consultants scored the item ‘being able to guide nursing students’ lower than the other groups did (by ≥1.2 points). They also scored five items lower than nurses did in at least one round (by ≥1.8 points); these items were ‘training and development’, ‘quality improvements’, ‘work pace’, ‘care coordination with other healthcare organisations’ and ‘chatting with coworkers about non-professional subjects’. Capacity consultants considered the items on training and development, quality improvement and work pace less defining than nurses, head nurses and managers did, although managers gave higher scores in the last round. Capacity consultants also considered ‘coordination of care with other healthcare organisations’ less defining than other experts did. In the second round, head nurses scored the item ‘being able to chat with co-workers about non-professional subjects’ lower than nurses did, while capacity consultants gave this item higher scores in the third round.
The scores given by nurses were less variable than those given by capacity consultants (SD of mean item scores 0.5–0.9 for nurses and 1.0–1.4 for capacity consultants). The lower scores given by capacity consultants for some items did not affect the final inclusion and exclusion of items because capacity consultants made up only a small share of the participants (5%–9%).
In this three-round Delphi study, six items were identified that define adequacy of shift staffing. No consensus was reached for 12 items, including the ‘day start and/or evaluation’ item, which was added after the first round. These results help to fill the current gap in the literature41 and provide an initial outline of a description of adequacy of staffing.3 These defining items for adequacy of shift staffing will be highly relevant in the development of a valid and reliable instrument to measure adequacy of staffing.30 Experts from multiple professions reached consensus. This contributes to a base of support and acceptance for a future instrument to support nurse staffing decision making.
Most of the defining items were related to direct patient care. This reflects the current values and beliefs of nurses in their profession (ie, their ‘professional identity’)42 and shows that direct patient care is the essence of nursing work.43 44 Additionally, breaks are often sacrificed to care for patients,45 so they are crucial for judging the adequacy of staffing. Guiding nursing students was also found to be a defining item, but most items not related to direct patient care were labelled unsure, such as the items on quality improvement and training and development. This contradicts the International Council of Nurses’ definition of nursing, which describes such activities as part of nursing work (https://www.icn.ch/nursing-policy/nursing-definitions). This contradiction is due to nurses setting priorities in a limited time.4 The nursing shortage46 and understaffing6 47 make such work less essential than direct patient care. Direct patient care is also more visible and more valued by nurses themselves and by others.23 This inhibits changes in nurses’ professional identity.48
No consensus was reached on the item ‘being able to chat with a patient for emotional and physical guidance’. Scott et al43 found that direct psychological care such as ‘comfort talk to patients’ is not provided when time is limited, suggesting this item is relevant to adequacy of staffing. However, nurses often feel insufficiently skilled to meet their patients’ emotional and psychological needs and rely on mental health liaison nurses to do this.43 This also reflects a preference to deal with physical needs over basic human needs, possibly because the impact of these biomedical aspects of care is more apparent.49 Consensus was also not reached on the level of work pace. In contrast, van den Oetelaar et al50 found that an increase in workload had a strong impact on work pace; however, their objective measure of workload was validated against a subjective workload measure that also consists of other aspects, such as physical, mental or emotional workload. Adequate staffing can contribute to acceptable workload, but this relation differs between workload aspects. In this sense, workload can be excessive while staffing is adequate, or staffing can be inadequate while workload is adequate. This emphasises the need for clear definitions in measurement. The large number of unsure items in our study may be explained by the conservative scores given by nurses. This is in agreement with earlier findings that nurses show conservative response bias, which may lead to fewer ‘correct’ responses.51 This demonstrates a lack of vision on what will be needed to cope with demands in the future. Therefore, unsure items should be discussed further as part of the development of instruments to measure adequacy of staffing.
Judgements from experts who were not nurses did not lead to the inclusion or exclusion of extra items, but judgements differed between expert groups. For example, capacity consultants rated indirect care activities as less defining than nurses did. This could reflect an incomplete understanding of nursing work.52 Capacity consultants support decision making on nurse staffing on a tactical and strategic level, using data analytics and predictive modelling.53 Items that reflect adequacy of shift staffing relate to real-time capacity management,54 so an adequate understanding of daily operations is needed. Moreover, capacity consultants may have had premature assumptions about measuring PAS while judging the items, and may have excluded non-universal items as a result. This is in agreement with the findings of Schmalenberg and Kramer, who excluded the item ‘guiding nursing students’29 because many hospital wards do not employ nursing students. Hence, it is likely that nurses judged items from the perspective of their own work, whereas capacity consultants judged items based on their suitability for a hospital-wide instrument.53 These varying perspectives will have to be considered when developing an instrument to measure adequacy of staffing, particularly the essential expertise of nurses on their daily operations and the understanding of capacity consultants of what is needed for a healthy business operation.30
The participants’ qualitative responses emphasised the relevance of a positive work environment. Those suggestions do not directly reflect adequacy of staffing. Staffing adequacy is one of multiple elements of a positive work environment and is essential for good patient care and nurse satisfaction.55 The work environment also influences how well nurses deal with understaffing,31 which has an indirect effect on nurses’ opinions on staffing adequacy.29
This study has some limitations. First, the sample size decreased from 168 participants in round 1 to 93 participants in round 3 despite two reminder emails being sent during each round. Attrition during the Delphi process is a potential threat to the validity of the results.56 Nevertheless, the response rate was 75% in round 2 and 76% in round 3, which exceeds the minimum response rate of 70% for strong research.33 Moreover, samples were similar during the three rounds. Second, this study had a higher proportion of nurses than other nurse-staffing experts, which may have affected the heterogeneity of the responses.34 Nevertheless, defining items need to represent nursing work, so nurses’ judgements were more meaningful in this study. Third, the predefined number of three rounds could have led to early inclusion or exclusion of certain items. The number of rounds varies from 2 to 4 in Delphi studies.39 In our study, three rounds seemed sufficient because similar results were obtained in the second and third rounds and there was no inclusion consensus in the third round. Besides, another round could have affected the response rate and validity of the results. Fourth, the inclusion of nurses was limited to nurses working on general hospital wards. This excludes nurses working on other hospital wards such as paediatrics or outpatient clinics, and nurses working in organisations other than hospitals. Although the selected and unsure items are generic items reflecting nursing work, it remains to be explored whether this set of items is relevant in those settings.
Future research directions
Combining the opinions of nurses on adequacy of staffing with operational research techniques used by capacity management could provide new solutions to the nurse staffing problem. Nowadays, data science techniques can generate information on the staffing adequacy of current and upcoming shifts using objective data-based factors extracted from hospital information systems.28 This information is far more informative than occupied beds and fixed nurse-to-patient ratios.57 While the search for objective measurement instruments continues,26 the distinction between objective and subjective measurements is not as clear as it seems.30 A completely objective measurement would be a laboratory test;30 however, nursing is more subjective, depending on the ‘context-dependent interaction between nurse and patient’.15 Therefore, adequacy of staffing is subjective but can be objectified by clearly defining the construct. Measuring nurses’ PAS in this manner relies on nurses as the one and only experts on daily operations. Research on the relation between perceived adequacy of staffing and outcomes for patient, nurse and organisation could further objectify staffing decisions based on PAS and is a step towards more evidence-based decision making on nurse staffing.41 Also, benchmark information on PAS measurements of different wards and hospitals could be helpful to discuss improvements in nurse staffing policies. Nevertheless, a first step in developing and validating the PAS instrument is to rethink unsure items and decide on the list of included and excluded items. Because nurses decide on items for PAS based on their experience of nursing rather than what nursing should be, it could be helpful to involve experts on nursing theory and patients and family members/informal caregivers in defining adequacy of staffing.
This research has refined the definition of adequacy of nurse staffing beyond the balance between demand for nursing work and supply. We found 6 relevant and 12 potentially relevant items for defining adequacy of staffing. This is the first step towards developing an instrument to measure staffing adequacy, which will enhance nurse staffing by predicting staffing adequacy based on existing data. This approach goes beyond traditional staffing methods and opens a new era of staffing and staffing adequacy. Further research is needed to refine these items and to further develop and psychometrically test the instrument.
Patient consent for publication
The Dutch Law on Medical Research Involving Human Subjects (WMO) did not require us to seek ethical approval as the research would not contribute to clinical medical knowledge and no participation by patients or use of patients’ data was involved. Participants gave informed consent to participate in the study before taking part.
The authors thank all participants of the focus groups, interviews, pilot test and Delphi rounds for their willingness to contribute and share their views. We especially thank Bernd van den Akker for his critical review of the Delphi rounds and discussion of the results.
Contributors All authors (CJEMvdM, JK, HV, PHJH and CJvO) were involved in planning the scoping review. CJEMvdM, JK and CJvO were involved in the data collection and analysis and wrote the first draft of the manuscript. All authors revised the manuscript drafts and approved the final manuscript. CJvO acted as a guarantor.
Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests None declared.
Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.
Provenance and peer review Not commissioned; externally peer reviewed.
Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.