Objectives The aim of this study was to test the reliability and validity of a new questionnaire for measuring patient experiences with general practitioners (PEQ-GP) following a national survey.
Setting Postal survey among patients on any of 500 GPs' patient lists in Norway. GPs were stratified by practice size and geographical criteria.
Participants 4964 patients who had at least one consultation with their regular GP in the preceding 12 months were included in the study. The patients were randomly selected after the selection of GPs. 2377 patients (49%) responded to the survey.
Primary and secondary outcome measures The items were assessed for missing data and ceiling effects. Factor structure was assessed using exploratory factor analyses. Reliability was tested with item–total correlation, Cronbach’s alpha and test–retest correlations. Item discriminant validity was tested by correlating items with all scales. Construct validity was assessed through associations of scale scores with health status, the patients’ general satisfaction with the services, whether the patient had been incorrectly treated by the GP and whether the patient would recommend the GP to others.
Results Item missing varied from 1.0% to 3.1%, while ceiling effects varied from 16.1% to 45.9%. The factor analyses identified three factors. Reliability statistics for scales based on these three factors, and two theoretically derived scales, showed item–total correlations ranging from 0.63 to 0.85 and Cronbach’s alpha values from 0.77 to 0.93. Test–retest correlation for the five scales varied from 0.72 to 0.88. All scales had the expected association with other variables.
Conclusions The PEQ-GP has good evidence for data quality, internal consistency and construct validity. The PEQ-GP is recommended for use in local, regional and national surveys in Norway, but further studies are needed to assess the instrument’s ability to detect differences over time and between different GPs.
- general practitioners
- patient satisfaction
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
Strengths and limitations of this study
The PEQ-GP was developed and validated according to the standard scientific procedures of the national patient-reported experience programme in Norway
Tests of the validity and reliability of the PEQ-GP were conducted in a large national survey in Norway
The lack of information about non-respondents precluded the possibility of assessing non-response bias
The ability of the PEQ-GP to identify important differences between providers or over time was not assessed
Patients are increasingly being involved in the planning, implementation and evaluation of healthcare services. The latter involves measurements and instruments related to patient-reported quality, including patient-reported experiences, patient-reported outcomes and patient-reported safety. The importance of patient-reported experiences is acknowledged by international organisations like the Organisation for Economic Co-operation and Development (OECD) and WHO,1 2 and supported by research showing a positive correlation between patient experiences and clinical outcomes and patient safety.3 4 Numerous instruments for the measurement of patient-reported experiences have been validated, and standardised measurements are increasingly being used for high-stakes purposes like external quality indicators and pay for performance.
In Norway, the Norwegian Institute of Public Health is responsible for national surveys of patient-reported experiences with healthcare services. The purpose of the surveys is systematic measurement of patient experiences, as a basis for accountability, hospital management, quality improvement and patients’ choice of healthcare provider. Hence, the surveys are of interest to both patients and decision makers on all levels, including the national quality indicator system. Following Donabedian's perspective on quality,5 patient experience surveys largely inform the structural and process aspects of quality, while outcome measures have received less attention. The institute has developed, validated and published a large number of questionnaires for a wide range of patient groups and healthcare services.6–11 However, most instruments have been developed and validated for patient groups in specialist healthcare, and so far, only one instrument has been developed for primary healthcare services.8 General practitioners (GPs) are a cornerstone of the Norwegian healthcare system, with almost all inhabitants having a regular GP as part of the primary healthcare system. GPs diagnose and treat patients for a number of conditions, refer patients to specialist healthcare when needed and are expected to coordinate and cooperate with other services in both primary and specialist healthcare. Consequently, patient experiences with GPs are of major interest, and a literature review was conducted to identify potential instruments.12 This review identified several relevant instruments, two of which were validated in Norway: the Patient Experience Questionnaire and the EUROPEP.13 14 Both questionnaires differed from the standard format of Norwegian national patient experience questionnaires, but their topics and questions were included in the development project.
The PEQ instrument is narrow in scope and is designed for evaluating a specific consultation, which is not a feasible approach in a national survey. Results from a survey using EUROPEP in Norway indicate that this instrument yielded high proportions of item non-response, large ceiling effects and low GP-level reliability for several items,15 and similar criticism was raised in a Danish study.16 In addition, the EUROPEP instrument does not include questions about patient safety or coordinated care across providers, which are important issues in the Norwegian healthcare debate. Against this background, it was decided to establish a development project based on the standard development and validation methodology used in the national patient experience programme.6–11 Activities in the development project included the review, meetings in a reference group of health personnel and other relevant stakeholders, cognitive interviews with patients and a pilot survey,17 producing a test version of the PEQ-GP for inclusion in a national validation survey in 2014.
The aim of this study was to test the reliability and construct validity of the PEQ-GP based on the national survey comprising 4964 patients across Norway.
All residents in Norway are entitled to a regular GP. Each municipality establishes agreements with GPs to serve their population. The GPs have a list of patients for whom they have medical responsibility. Available data indicate that more than 99% of the Norwegian population are on a regular GP’s patient list.18 Norwegian GPs are gatekeepers for the national insurance scheme, and patients are referred from a GP to specialised medical care when needed. Norwegian GPs’ practices are, in general, small units, employing on average approximately three GPs. Normally, there are also one or more receptionists, as well as staff for taking samples and analysing simple tests at the GP practice. Other healthcare workers, such as nurses or physiotherapists, are rare in Norwegian GP offices.
Development of the questionnaire
To identify important topics, we assessed reviews of the literature12 19 and conducted a thorough assessment of the content of several questionnaires13 14 20–22 as well as questionnaires used in national programme in Sweden and the USA. Consultations with a reference group comprising GPs, researchers and representatives from health authorities and patient organisations also added to the content of the questionnaire. Several drafts of the questionnaire were discussed with this group.
The questionnaire was tested through cognitive interviews with patients at different stages in the process. Cognitive interviewing is a standard procedure in the development process of questionnaires. The purpose is to find out how the questionnaire functions cognitively in the target group, for instance, how patients interpret items and whether the response categories are adequate. First, we conducted eight face-to-face interviews and nine telephone interviews. The questionnaire was revised on the basis of these interviews and discussions with the reference group. The revision was extensive; we therefore conducted further face-to-face interviews with 11 patients based on the new draft. The second round showed that the questionnaire functioned well, with some remaining issues to solve. Some patients found it difficult to know who to evaluate when their regular GP was on long-term leave from the practice. Furthermore, questions about cooperation were difficult to answer because only a few patients perceived any need for coordinated care. This led to minor revisions of items and leading texts in the questionnaire. Before the validation survey, the revised version was tested in a pilot study. The pilot sample was 150 patients from each of five randomly selected GPs’ lists. The results of the pilot study were discussed with the expert group, and the questionnaire was revised before the validation survey in 2014.
The PEQ-GP used in the national validation study contained 30 items, covering topics like accessibility, the GP’s medical skills and relationship with the patient, organisation of the GP’s office, coordination of care, general satisfaction, patient enablement and incorrect treatment. Most of the items had a 5-point scale ranging from ‘Not at all’ to ‘To a large extent’. The questionnaire also included background variables like the presence of a long-term medical condition, self-reported health status and whether the patient had received help answering the questionnaire. Finally, we included an open-ended item for comments about the survey (see online supplementary file, English version of the questionnaire).
Only regular GPs and their patients were included in the study. Private primary care physicians and patients not on a regular GP list were excluded. The sampling plan aimed to give a nationally representative sample and had a three-stage design. First, practices were stratified by number of GPs at the practice and municipality type and randomly selected. Second, we selected up to four GPs from each of the selected practices: if the practice had five or more GPs, four of them were randomly selected, while in smaller practices, all GPs were included. Third, we randomly selected 10 patients from each of the GPs’ lists. All patients had to have at least one consultation with their GP between May 2013 and May 2014.
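The second and third sampling stages described above can be sketched as follows. This is a minimal illustration, not the authors' actual procedure: the function name and the data layout (a list of practices, each mapping GP identifiers to eligible patients) are hypothetical, and the stratified selection of practices in stage one is assumed to have happened upstream.

```python
import random

def three_stage_sample(practices, patients_per_gp=10, max_gps=4, seed=2014):
    """Draw GPs and patients for already-selected practices.

    `practices` is a hypothetical structure: a list of dicts, each
    {'gps': {gp_id: [eligible_patient_ids, ...]}}, where eligible means
    at least one consultation with that GP in the survey period.
    """
    rng = random.Random(seed)
    sample = []
    for practice in practices:
        gp_ids = list(practice['gps'])
        # Stage 2: with five or more GPs, draw four at random; otherwise take all.
        chosen = rng.sample(gp_ids, max_gps) if len(gp_ids) >= 5 else gp_ids
        for gp in chosen:
            patients = practice['gps'][gp]
            # Stage 3: up to 10 patients per GP list.
            sample.extend(rng.sample(patients, min(patients_per_gp, len(patients))))
    return sample
```

A practice with six GPs thus contributes four randomly chosen GPs with 10 patients each, while a two-GP practice contributes both GPs.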
A total of 4964 persons were mailed a questionnaire with a cover letter describing the purpose of the survey, a prepaid return envelope and an option to answer electronically. To ensure that the patients evaluated the intended GP, the cover letter included the name of the patient’s regular GP. For persons below 16 years (age of consent) and other persons who had difficulties filling in the questionnaire, the caregivers/next of kin were asked to respond on their behalf. A reminder including a new questionnaire was sent to those who had not responded within 3 weeks.
Background information about the respondents was collected from public registries. These data included the selected patients’ age, gender, how long the patient had been on the GP’s patient list, number of consultations in the last 12 months, diagnoses in the last 12 months, level of education and country of birth.
Items were assessed for item-missing and ceiling effects, the latter defined as the percentage of patients selecting the most positive response to each item. Patient experience items with less than 20% missing values (item missing and not applicable) were entered into an exploratory factor analysis with promax rotation to assess the underlying structure of the questionnaire.23 Two additional scales were constructed based on theoretical assumptions. The first consisted of three items inspired by the Patient Enablement Instrument.24 These were considered outcome measures and not patient experiences, and were therefore excluded from the initial factor analysis. Two items evaluating coordination of care and cooperation with other health services were excluded from the factor analyses because they were relevant only to some of the participants and therefore had high levels of non-valid responses. However, they were included as a separate scale because this is a core topic in Norwegian health policy. Three items related to accessibility were excluded from psychometric analysis because they were conceptually distinct and had different response categories from the other items (for instance, waiting time in days/weeks). These are not reported here.
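The initial data-quality screening can be sketched as below. The function name is hypothetical, and the choice of denominator for the ceiling effect (valid responses rather than all mailed questionnaires) is an assumption, since the paper does not state it explicitly.

```python
import numpy as np

def item_screening(responses, top=5):
    """Per-item missingness and ceiling effects.

    `responses` is a (patients x items) matrix on a 1-5 scale, with NaN
    for missing or not-applicable answers. Ceiling is computed as the
    share of *valid* answers at the most positive category (`top`) --
    an assumed denominator.
    """
    responses = np.asarray(responses, dtype=float)
    n_patients = responses.shape[0]
    valid = ~np.isnan(responses)
    missing_pct = 100 * (n_patients - valid.sum(axis=0)) / n_patients
    ceiling_pct = 100 * (responses == top).sum(axis=0) / valid.sum(axis=0)
    return missing_pct, ceiling_pct
```

Items clearing the 20% missing-value threshold would then be passed on to the exploratory factor analysis.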
Internal consistency was assessed by item–total correlation and Cronbach’s alpha. The former measures the strength of association between an item and its scale, and levels above 0.4 are considered acceptable.25 The latter assesses the overall correlation between items within a scale. For a scale to be considered reliable, an alpha value of 0.7 is considered acceptable.25 Two hundred and seventy consenting responders were mailed a second questionnaire, enabling us to assess test–retest reliability.
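The two internal-consistency statistics described above can be illustrated as follows (a minimal sketch with hypothetical function names, assuming a complete-case patients-by-items matrix; the paper's handling of missing data within scales is not specified here).

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for a complete-case (patients x items) matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

def corrected_item_total(items):
    """Each item correlated with the sum of the *remaining* scale items,
    so that an item does not inflate its own item-total correlation."""
    items = np.asarray(items, dtype=float)
    total = items.sum(axis=1)
    return np.array([np.corrcoef(items[:, j], total - items[:, j])[0, 1]
                     for j in range(items.shape[1])])
```

Against the thresholds cited in the text, a scale would be retained when alpha exceeds 0.7 and every corrected item-total correlation exceeds 0.4.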
Construct validity was assessed through associations of scale scores with variables known to correlate with patient-reported experiences: self-reported health and patients’ overall satisfaction with the services.6–11 We hypothesised that higher scores on the scales would be associated with higher levels of general satisfaction and patient safety, and with better health, and tested these associations by bivariate correlations. Item discriminant validity was tested by correlating items with all scales, and we expected each item to have a significantly higher correlation with its hypothesised scale than with scales measuring other concepts.26
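The item discriminant validity check can be sketched as a matrix of item-scale Pearson correlations. This is an illustration under stated assumptions: the function name is hypothetical, scale scores are taken as simple item means, and the common correction of removing an item from its own scale score before correlating is omitted for brevity.

```python
import numpy as np

def discriminant_matrix(items, scale_of_item):
    """(items x scales) matrix of Pearson correlations between each item
    and each scale score (here, the mean of that scale's items).
    Discriminant validity holds when every row peaks at the item's
    own hypothesised scale.
    """
    items = np.asarray(items, dtype=float)
    scales = sorted(set(scale_of_item))
    scores = np.column_stack([
        items[:, [j for j, s in enumerate(scale_of_item) if s == sc]].mean(axis=1)
        for sc in scales])
    return np.array([[np.corrcoef(items[:, j], scores[:, s])[0, 1]
                      for s in range(scores.shape[1])]
                     for j in range(items.shape[1])])
```

Scanning each row for its maximum then operationalises the expectation that an item correlates most strongly with its own scale.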
One hundred and seven persons were excluded because they withdrew from the survey, had an unknown address or had died by the time of the survey. We received 2377 responses to the survey (49%), of which 439 were electronic responses. Mean age of the patients was 40 years and 76% were women (table 1). Persons who answered electronically were younger, had higher education and rated their own health as better than patients filling in the paper form (results not shown). Twelve per cent of the responses were made by caregivers or next of kin. Most items had a low level of item-missing, and ceiling effects ranged from 16% to 46% (table 2).
The initial factor analysis showed that three items (1, 18 and 19) had factor loadings below 0.4. These were removed, and the final factor analysis produced a three-factor solution, explaining 66% of the variance in the included variables (table 3). The first factor consisted of eight items about the GP’s medical and relational skills. The second factor consisted of three items about the organisation of the practice and assessments of auxiliary staff, and the third factor comprised two items assessing the acceptability of waiting time for acute and non-acute consultations, respectively. Reliability statistics for these three scales, together with the Enablement and the Cooperation scales, showed acceptable Cronbach’s alphas and item–total correlations: the former ranged from 0.77 for accessibility to 0.93 for the GP scale (table 2). Test–retest correlations for the five scales were high, ranging from 0.72 (accessibility) to 0.88 (GP).
All items had a stronger correlation with their own scale than with any of the other scales (table 4). All correlations were significant (p<0.001). High correlations were also observed between the GP, Cooperation and Enablement scales and the items belonging to the other two of these scales (table 4).
The scales were expected to correlate with general satisfaction with the GP, patient safety, recommendation of the GP and the patients’ health status. The item about general satisfaction with the GP had moderate correlations (0.388–0.438) with the Accessibility and Auxiliary staff scales and high correlations (0.693–0.826) with the rest of the scales (table 5). The same pattern applied to recommendation of the GP. Self-reported health status had low but significant correlations with all five scales (0.041–0.212). Overall, all correlations were significant and in the expected direction.
The development of the PEQ-GP followed a standard procedure including a literature review of existing questionnaires, input from an expert panel, cognitive interviews with patients and a pilot study. The national validation study identified five scales with excellent psychometric properties, covering important aspects of the GP service relating to accessibility, evaluation of the GP and auxiliary staff, cooperation between the GP and other services and patient enablement.
Patient-reported experiences usually relate to structures and processes of healthcare, but the PEQ-GP also offers an intermediate outcome indicator through the Enablement scale. Thus, the PEQ-GP includes all aspects of Donabedian’s classical structure-process-outcome framework, offering a broad measurement approach to GP services.5 We are not aware of other patient-reported experience instruments for GPs with the same breadth in scale content. Together with single items not included in the scales, some of which are important in the political discourse, the PEQ-GP covers the most important topics for patients and decision makers on all levels in Norway.
High correlations were found between the GP, Cooperation and Enablement scales. This may be because they all evaluate the work of a particular GP, and thus are closely related. Cooperation was left out of the factor analysis because of high missing values, and Enablement was left out because the items were considered outcome and not experience measures. Although they correlate with the GP scale, they both are conceptually different from it, because they do not assess the relational and patient-oriented work of the GP. Furthermore, patient enablement is a major goal in most western healthcare systems, and thus a useful supplement to the patient experience items. A recent study from England showed the importance of empowerment for patients in a general practice setting.27 However, more research is needed to evaluate the predictive validity of the Enablement scale, particularly the association between the scale and other outcomes like compliance and health outcome.
The focus on cooperation in Norwegian healthcare is strong, with increasing resources invested in projects and initiatives to improve cooperation and integration of care. Internationally, cooperation and integration of care are also important topics, and specific instruments have been developed and validated.28–32 Thus, the Cooperation scale should be a part of the PEQ-GP in Norway. However, the Cooperation scale presents two challenges: (1) the items constituting the scale were relevant only to about 75% of the patients; (2) the national scale score was surprisingly high, given that poor cooperation was the diagnosis before the implementation of the national Cooperation Reform. Regarding the first challenge, one approach would be to include this scale only when a particular interest lies in the cooperation topic, which was the reason for including cooperation in the first place. A statistical recommendation is to increase the sample size by 25% at the GP or practice level to compensate for the high levels of item missing. The second problem is probably related to the fact that the cooperation items are formulated rather generally and are substantially quite closely related to the GP items, as documented by the large correlations with the GP scale. Future development and research work is needed to test more specific questions about cooperation, with formulations more focused on the joint responsibility for cooperation between GPs and other services.
The PEQ-GP can be used to inform individual doctors about their patients’ view of their services, providing a basis for quality improvement for practices and individual GPs. Follow-up studies indicate that feedback through patient evaluations may be used for quality improvement in healthcare.33–35 Low missing values indicate that the items are acceptable to the patients, and moderate levels of ceiling effects indicate that the instrument could be useful for detecting changes over time and differences between GPs. However, repeated measurements are required to detect changes; thus, we were not able to test the sensitivity of the instrument. Another limitation is the low number of respondents for each GP. This makes it difficult to test the appropriateness of using the PEQ-GP scales as a basis for external quality indicators, including the ability to discriminate between GPs and/or practices. Results from the pilot study indicate that there are substantial differences between GPs, but further studies are needed to assess the PEQ-GP’s benchmarking abilities at both the GP and practice levels.
The total number of respondents is high. This is a strength of the study, but the response rate of 49% is not optimal. Earlier studies indicate small differences between non-responders and responders. Furthermore, response rate is in itself a poor indicator of non-response bias.36–38 The lack of background information about non-respondents in the present study is obviously a weakness, and means that we are not able to rule out the possibility of non-response bias. One possible indicator of bias is the high proportion of women among the responders (76%). As we do not have any information about the non-responders, we cannot relate this result to response propensity. Other studies of patient experiences in primary care have reported that around two-thirds of the responders were women.39 40 In a Danish study, Heje et al stated that this proportion reflects the consultation pattern between the genders.41 In Norway, there is also evidence that a higher proportion of women than men see their GP in any given year, and that women have more consultations than men.42 However, there is no clear evidence in the GP field that women evaluate health services differently than men. We therefore assume that the high proportion of women does not cause any substantial problems for the assessment of the properties of the PEQ-GP.
The retest population differs slightly from the rest of the responders on several variables, implying that test–retest results should not automatically be generalised to the total respondent sample. The purpose of the retest was to measure the reproducibility of the results within the same individuals, as a test of the questionnaire's reliability. The results supported the test–retest reliability of the PEQ-GP in this sample, but further studies should try to replicate the findings to achieve a more robust knowledge base.
Another weakness of this study is that some patients may have had little contact with their GP. The reasons may be numerous, such as no need for medical services, contact with other GPs or staff changes in the GP office. Limited contact with their GP may affect the patients’ ability to evaluate the GP, and furthermore, a long period of time since visiting the GP may cause recall bias or a blurred impression. To reduce this weakness, only patients with at least one consultation in the last 12 months were included in the study.
The PEQ-GP includes important aspects of patient experiences with GPs. The questionnaire has evidence for data quality, reliability and construct validity. The PEQ-GP is recommended for future studies designed to assess patient experiences with GPs in Norway, but further research is needed to test the appropriateness of using the PEQ-GP as a basis for external quality indicators and in other countries.
The authors thank Tomislav Dimoski for managing the data collection, developing the software used in data collection and management (The FS system) and carrying out the technical aspects of the data collection, and Marit Skarpaas and Inger Opedal Paulsrud for administrative help in data collection. We also thank Jon Helgeland for calculating the sample model and Andrew Garratt for advice in the translation process.
Contributors All authors participated in the planning process. OH performed the statistical analyses and drafted the manuscript. HHI, OAB and KD critically revised the manuscript draft and approved the final version of the manuscript. OH was the project manager for the development of the questionnaire. All authors participated in the development process.
Funding The study was financed by the Norwegian Knowledge Centre for the Health Services (merged with the Norwegian Institute of Public Health in 2016).
Competing interests None declared.
Ethics approval The Regional Committee for Medical and Health Research Ethics, South East Norway.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement The datasets generated and/or analysed during the current study are not publicly available due to protection of personal information but selected data are available from the corresponding author on reasonable request.