Abstract
Objectives Gain an overview of expected response rates (RRs) to patient-reported outcome measures (PROMs) in clinical quality registry-based studies and long-term cohorts in order to better evaluate the validity of registries and registry-based studies. Examine the trends of RRs over time and how they vary with study type, questionnaire format, and the use of reminders.
Design Literature review with systematic search.
Data sources PubMed, MEDLINE, EMBASE, kvalitetsregistre.no, kvalitetsregister.se and sundhed.dk.
Eligibility criteria Articles in all areas of medical research using registry-based data or cohort design with at least two follow-up time points collecting PROMs and reporting RRs. Annual reports of registries including PROMs that report RRs for at least two time points.
Primary outcome measure RRs to PROMs.
Results A total of 10 articles, 12 registry reports and 6 registry articles were included in the review. The overall RR at baseline was 75%±22.1 but decreased over time. Cohort studies had a markedly better RR (baseline 97%±4.7) compared with registry-based data at all time points (baseline 72%±21.8). For questionnaire formats, paper had the highest RR at 86%±19.4, a mix of electronic and paper had the second highest at 71%±15.1 and the electronic-only format had a substantially lower RR at 42%±8.7. Sending one reminder (82%±16.5) or more than one reminder (76%±20.9) to non-responders resulted in a higher RR than sending no reminders (39%±6.7).
Conclusions The large variation and downward trend of RRs to PROMs in cohort and registry-based studies are of concern and should be assessed and addressed when using registry data in both research and clinical practice.
- registries
- patient reported outcome measure
- PROM
- response rate
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
Strengths and limitations of this study
The first review with a systematic search for published response rates of patient-reported outcome measures in registries and prospective cohorts.
Inclusion of non-peer-reviewed sources such as annual reports from national clinical quality registries.
Lack of standardisation in this area of research makes it difficult to generate a comprehensive search string to identify all relevant articles. Searches were conducted in multiple databases; however, not all relevant articles were identified in these searches, and some were instead retrieved from publication lists of registries.
Including data from both cohorts and registries makes the data heterogeneous and therefore difficult to analyse.
Introduction
Clinical quality registries (CQRs) have played an important role in monitoring and benchmarking patient outcomes, and have thus improved treatment for patients.1–4 A CQR systematically collects data about specific patient groups for a predetermined objective.4 Many health authorities have also increased focus on incorporating patient-reported outcome measures (PROMs) when developing patient-centred healthcare systems.5 PROMs help to quantitatively assess a patient’s health condition and quality of life and are included in many national and international CQRs to follow up patients over time. Data from properly designed and well-executed registries can be analysed to provide a realistic view of clinical practice and patient outcomes, and compare effectiveness of treatments.2 4 6 Applying PROM-data in clinical practice brings many benefits for both patients and physicians, such as more objective and quantifiable measure of symptoms and complaints, more in-depth understanding of the patient’s perspective and better informed decision making through discussions about health-related quality of life.6–8 Furthermore, with increased government spending on healthcare in many countries, CQRs can improve clinical practice at a relatively low cost and have a significant net-positive return on investment.8 9
However, for registries to provide an accurate representation of reality, it is important that there is both a high reporting completeness and a high response rate (RR) to PROM questionnaires, distributed evenly across the patient demographic. Reporting completeness refers to the percentage of all patients meeting inclusion criteria that are reported to, and included in, the registry. RR is the percentage of patients who fill in and submit PROMs at baseline and subsequent follow-ups. High reporting completeness and RR provide a large data sample and increased statistical power, and are therefore essential for society to benefit from registries and for registries to maintain a positive cost–benefit ratio.6 However, there is no standard definition of RR and no uniform formula by which RRs are calculated. The American Association for Public Opinion Research reports six different methods for calculating RR.10 One of the main differences among methods for calculating RR is whether deceased patients or patients who did not receive the questionnaire from the sample are excluded.11
Many registries aim for a reporting completeness of over 90% and an RR of at least 80% but this is difficult to achieve. RRs to PROMs tend to decrease over time, making conclusions less representative in the long term. Previous studies have demonstrated that age, gender and other factors such as socioeconomic status affect RRs.12–14 Furthermore, studies have indicated that patients with higher health-related quality of life scores and satisfaction may be over-represented in PROM studies, even in studies with relatively high RRs.15 16 These potential selection and attrition biases can affect the resulting estimate of the treatment effect, especially in registries that collect data over multiple time points.5 17 There have been several studies on how to increase RR to questionnaires and a number of methods have been identified,18 yet many registries do not employ these methods.
With an increasing number of registries being established and invested in, it is important to understand what kind of RR to PROMs is realistic and achievable, how RRs change over time and what can be done to improve them. The nature of registry data and PROM follow-ups make them especially useful for comparing methods of treatment where randomised controlled trials (RCTs) are not feasible and long-term prognosis is unknown. Registries are often not limited by the strict inclusion or exclusion criteria used in clinical studies; these criteria may result in selection bias and thereby reduce external validity.19 Registry manuscripts are increasingly important for clarifying crucial questions debated among healthcare workers, especially in the field of orthopaedic surgery where pain and health-related quality of life are often the deciding factors for surgical treatment.20
One important issue, therefore, is to define the level of evidence these registry-based studies reveal and identify what a realistic RR may be in manuscripts using registry data to test a hypothesis. RR is not the only indicator of the quality of a registry, but its easily measurable and comparable qualities make it suitable as a primary outcome. There have been numerous studies demonstrating the beneficial effects of CQRs but none, to our knowledge, that review RRs to PROMs in registries. This article aims to review the available literature and gain an overview of the expected RR over time in a registry-based study, including PROMs, in order to better evaluate the validity of future registries and registry-based studies.
Methods
We obtained the data for this study through two systematic searches of peer-reviewed papers, as well as from annual reports from CQRs. The search protocol was adapted from Øglund et al.21 Search 1 was performed on 11 April 2017 in PubMed (www.ncbi.nlm.nih.gov), and search 2 was performed on 12 December 2017 in MEDLINE and EMBASE through Ovid (www.ovid.com). Search terms can be found in table 1.
Terms used in the systematic searches
Inclusion criteria
We included journal articles for studies conducted in humans after 1 January 1990 published in English and Scandinavian languages. Articles in all areas of medical research that used PROMs and reported RR for at least two time points were included in order to understand change in RR over time. We were mainly interested in articles using registry-based data, but also included prospective cohort studies with a follow-up of at least 1 year, as we believe long-term cohorts have a similar logistical administration as registries. For registry-based articles, we accepted any follow-up time. We excluded RCTs as the nature and logistics of an RCT do not reflect the logistics of a registry. We also excluded articles where children under 16 years were respondents, articles with patients unable to answer the PROMs themselves or where the PROMs were filled in at control appointments as these factors can affect the RR. In the case of studies with multiple articles for follow-up time points, all related articles reporting RR were included.
Registry search
As we did not identify a large number of journal articles meeting inclusion criteria in the systematic searches, we supplemented them with annual reports by CQRs. We searched through all CQRs in Scandinavia listed on each country’s national registry websites kvalitetsregistre.no, kvalitetsregister.se and sundhed.dk, as well as all national knee ligament or joint registries known to the authors. We also searched any registries mentioned in articles from the previous systematic searches, as well as the list of published articles from each registry to identify articles with RRs that were not identified in our previous searches. If the RRs to PROMs were described in the annual report but an article reported a different RR, both were included.
Review
The articles were assessed by two independent researchers to ensure inclusion of all relevant articles. We first performed a title review, followed by an abstract review, and lastly, a full-text review. In search 1, these phases were performed in EndNote by both primary researchers. The results were compared after completion and any differences were resolved by agreement. In search 2, these phases were conducted in Covidence (www.covidence.org). The annual reports from CQRs were assessed by only one of the researchers. Because annual reports describe the activity of a registry, we considered it unnecessary for two researchers to assess them. Risk of bias was not assessed as this was deemed not relevant for the main outcome measure.
Data extraction
Relevant data from the included articles were extracted by the same two researchers to ensure accuracy. Data from annual reports were extracted by one researcher as they were presented more clearly. Data extracted from the articles and reports include year of publication, lead author, format of questionnaires, number of reminders sent, and RRs and their time points. We also collected data on which questionnaires were sent at each follow-up and whether this changed between follow-ups, but these data were not readily available or clearly specified in all of the articles and were therefore not included in this paper. Methods used to calculate RR were not explicitly stated in most of the articles and were therefore not collected. When subgroups within articles or reports had separate RRs reported, these were treated as separate subgroups instead of averaging the RR. For example, some studies reported separate RRs for different types of PROMs or for different patient groups. Where indicated, the RR was calculated by a researcher using the available numbers. The main outcome measure was RRs to PROMs. If the relevant data were not described in the article or report, an email was sent to one of the authors or the registry with a request for the missing information.
Statistics
The results are presented with descriptive statistics (average ± SD) and figures using IBM SPSS Statistics V.25. Microsoft Excel was used to visualise the data. No statistical tests were performed to compare the values due to the heterogeneity in the reporting of RRs. No meta-analysis was conducted.
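As an illustration of the descriptive summary used here, the following minimal Python sketch computes an average ± SD per subgroup. The RR values and subgroup labels are made up for illustration and are not data from this review. The function name `summarise` is hypothetical, and the use of the sample SD (n−1 denominator) is an assumption, as the SD variant is not stated in the text.

```python
from statistics import mean, stdev


def summarise(rrs):
    """Return (average, sample SD) for a list of response rates in percent.

    stdev() uses the n-1 denominator and needs at least two values.
    """
    return mean(rrs), stdev(rrs)


# Hypothetical baseline RRs (percent), grouped by questionnaire format.
baseline_rrs = {
    "paper": [90.0, 80.0, 85.0],
    "electronic": [40.0, 50.0],
}

for fmt, values in baseline_rrs.items():
    avg, sd = summarise(values)
    # Same presentation as in the text: average%±SD
    print(f"{fmt}: {avg:.0f}%±{sd:.1f}")
```

Each reported RR is treated as one data point regardless of study size, which matches an unweighted average; weighting by the number of patients would be a different design choice.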
Patient and Public Involvement
Patients or the public were not involved in the designing, conducting, reporting or dissemination of our research.
Results
A total of 5379 articles and 219 registries were identified, and 10 articles from the systematic searches, 12 registry reports and 6 registry articles were included in the final review (figures 1 and 2). Five of the 10 (50%) articles identified in the systematic searches were registry based, and 21 of the 28 (75%) articles and reports included were in the field of orthopaedic surgery. The included articles and annual reports are outlined in table 2. We also looked at the effect of reminders and the format of the questionnaires, with more than one reminder and paper-only format being the most common. Additionally, four studies changed the number of PROMs sent at subsequent follow-ups. With regard to RR calculation, eight articles did not report how RR was calculated, while seven excluded deceased patients or those lost to follow-up and only one study included them. We did not find any mention of how RR was calculated in any of the registry annual reports.
Flow chart of article selection from systematic searches. PROMs, patient-reported outcome measures; RCT, randomised controlled trial.
Flow chart of registry annual reports and article selection. PROMs, patient-reported outcome measures.
Included articles and annual registry reports with extracted data
Overall RR
The overall RR at baseline starts at 75%±22.1 but decreases over time. This average includes all RRs defined by the author to be baseline RRs and RRs to PROMs collected preintervention. The average RRs at different times are shown in table 3. They demonstrate a wide variation in RRs over time and the trend is visualised in figure 3.
All response rates over time.
Average response rates with SD for time periods
Subgroup RR
Table 3 also shows average RRs with SD for all subgroups. When RR in cohort studies was compared with that in registry-based data, the scatter plot indicates a higher average RR for cohorts (figure 4), and the average for cohort studies at baseline of 97%±4.7 is far above both the goal RR of 80% and the registry average of 72%±21.8. One reminder led to a higher RR at 82%±16.5 and a slower decrease compared with no reminders (39%±6.7) or more than one reminder (76%±20.9). Questionnaires on paper had an RR of 86%±19.4 at baseline and showed better results than electronic at 42%±8.7, or a mix of paper and electronic formats at 71%±15.1.
Comparing response rate over time for cohort studies versus registry-based data.
Change in RR per year
Figure 5 shows the amount of change in RR per year. This was calculated by subtracting the final RR from the baseline or first reported RR and dividing the difference by the total length of follow-up time. The data points appear to approach zero with time, indicating that RR decreases less the longer the follow-up time.
Average change in response rate per year over total follow-up time for each article and report.
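The per-year change described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name `rr_change_per_year` and the example numbers are hypothetical, and the sign convention simply follows the wording of the text (final RR subtracted from the first reported RR).

```python
def rr_change_per_year(first_rr, final_rr, follow_up_years):
    """Average yearly change in RR, in percentage points per year.

    Follows the description in the text: the final RR is subtracted from
    the baseline (or first reported) RR and the difference is divided by
    the total follow-up time. A positive value therefore means RR fell.
    """
    if follow_up_years <= 0:
        raise ValueError("follow_up_years must be positive")
    return (first_rr - final_rr) / follow_up_years


# Hypothetical study: RR of 80% at baseline and 60% at the 5-year follow-up.
print(rr_change_per_year(80.0, 60.0, 5.0))  # 4.0 percentage points per year
```

Because the same absolute drop is spread over more years in longer studies, this measure will tend towards zero as follow-up time grows even if most of the loss happens early.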
Discussion
To our knowledge, this is the first review with systematic searches examining RRs in registry-based articles and cohorts including at least two time points, to gain a better understanding of long-term trends in RRs to PROMs. There is an ongoing debate as to which level of evidence registry-based studies belong in, as there are inherent issues with registries such as a non-homogeneous definition of outcomes and lack of control of confounding factors.22 We believe the current study will be helpful for reviewers assessing registry manuscripts to determine what RR can be expected. Hopefully, this will contribute to an improvement in quality in this field of research.
Considering the goal that many registries have of reaching and maintaining an 80% RR, the average baseline RR for all included articles and reports of 75% is close but with a large SD. As expected, the results depicted a downward trend in RRs over time, but with an unexpected slight increase after the 5-year follow-up time. This was reflected in the 2–5-year RR being the lowest (50%±16.1), and a subsequent increase in the 5–10-year period, with the average reaching 61%±23.0. Most likely, this is due to the large number of studies with only up to 2 years of follow-up, resulting in data that are negatively skewed before the 5-year time point. There are seven studies where RR increased more than 1% at subsequent follow-ups. Of these, two studies only sent questionnaires to responders. Another four are orthopaedic studies, where negative symptoms often increase at later stages post-intervention, which could have motivated patients to fill in the PROMs to report their symptoms. The last study, by Porchet et al, appears to have only sent questionnaires to responders at subsequent follow-ups and contacted some patients to fill in the questionnaire over the phone.
A good CQR should have a high reporting completeness and therefore track all patients that meet inclusion criteria, while a long-term cohort study follows only a selection of patients meeting inclusion criteria over a specific time period. The higher RR in cohort studies compared with that in registry data raises the question of whether long-term PROM follow-ups of all patients included in registries are worth the monetary investment as compared with performing single long-term cohort studies with PROMs. The latter can be nested within registries by predetermining specific cohorts (eg, a selection of patients annually or a full yearly cohort every 5 years) and investing more in improving long-term RRs to PROMs of these cohorts. This is similar to the model used by the New Zealand Joint Registry for their hip and knee arthroplasty patients, where a random selection of patients is sent the PROMs in order to achieve a 20% RR, which the registry deemed sufficient to provide powerful statistical analyses.23 The average RR for cohorts is well over 80% at both the baseline and the 1-year follow-up, and just below 80% at the 2-year follow-up. However, many registries have only recently started collecting PROMs and few have had multiple 10-year follow-up collections, so it may be too early to suggest nested cohorts. Furthermore, Elkan et al showed that RR did not affect PROM results after lumbar discectomy.24 This indicates that a goal of 80% may not be necessary, but further studies are needed to determine an acceptable limit. The limit may also vary depending on the field, as previous studies have demonstrated significant differences between responders and non-responders in both demographics and outcomes in various fields.12 15
Another result worthy of further investigation is the higher RR to paper-based questionnaires compared with electronic or mixed formats. With advancements in technology resulting in almost everyone in western populations having easily accessible internet, more and more registries are moving towards web-based PROMs. An online questionnaire requires less time and money to administrate, making it an attractive alternative to collecting paper surveys. However, this approach has been met with problems such as difficulties registering emails or reminders and notifications ending up as spam in already crowded inboxes. Some registries are looking into other alternatives such as social media and mobile applications to better reach patients.25 A recent study found that a mix of electronic and paper surveys achieved a higher RR than only electronic surveys, raising the baseline RR to 100% and the subsequent follow-ups at 3, 6 or 12 months to 83%, from ca. 55%.26 However, this came at an increased cost, which raises the question of whether the additional paper survey costs are justifiable,26 and further highlights the need for identifying an RR that validates PROM data.
With regard to reminders, our results reflect conclusions from previous studies that at least one reminder improves RR.18 However, the lack of studies with a follow-up longer than 3 years with more than one reminder means we cannot suggest with any certainty how many reminders to send for longer follow-ups. We also did not examine the effect of the number of follow-ups on RRs and this could also affect the number of reminders that should be sent. If the follow-ups are more frequent, more than one reminder may negatively affect the RR, while more than one reminder may be necessary to maintain or improve the RR for more spread-out follow-ups.
Figure 5 shows that the change in RR decreases over time and appears to approach zero. This suggests that if a registry starts with a low overall RR that does not decrease significantly at subsequent follow-ups and is consistent within a certain subgroup, those results can potentially still be representative for that subgroup. Thus, the results would still provide important information regarding outcomes and treatment decisions for that specific subgroup, but would not support any conclusions for the general patient population in the registry.
Limitations
There are a number of limitations to this review. RRs were not uniformly calculated and reported across all sources. Fifteen of the studies and registries had insufficient information regarding processes for sending out questionnaires and reminding patients, and eight articles did not report how RRs were calculated. Four studies and registries only sent follow-up questionnaires to previous responders and three changed the number of PROMs at subsequent follow-ups. The heterogeneity of the RR calculation methods makes direct statistical comparison impossible; however, we chose to visualise the data points to better understand the trends in RR over time. In order to improve this research area and create better grounds for future research, we made some general comparisons such as calculating averages despite weak statistical grounds.
Although we wanted to focus on the RRs to PROMs in registries, a large portion of registries did not report the RRs even though they are a key factor in the validity of the data and results in their annual reports. The registries we examined were mainly Scandinavian with the inclusion of only a few other registries from the UK and New Zealand to help supplement the data collected from articles. Further studies are needed to systematically evaluate trends in RRs across Europe, USA and other countries. We did not examine how the number of questions or length of questionnaires affected the RRs, although longer questionnaires are known to decrease RR.18 Lastly, we did not consider the effect of the number of follow-ups on RRs, although this may also affect how willing a patient is to respond in the long term. This is especially important for some orthopaedic interventions that may give promising results in the short term but lead to decreased quality of life in the long term. We recognise that these limitations restrict the generalisability of the results, but the intention of this review was mainly to gain an overview of this area of research and make recommendations based on the limitations we found to improve the foundation for future research.
Standard terminology and recommendations for registries
Based on the data collected, the literature we have reviewed, and the authors’ experience with registries, we have created a set of recommendations to improve standards for future research in this area. We also noticed a lack of consistent terminology across articles and registries, and have therefore created a suggested standard terminology as depicted in figure 6.
If a study involves PROMs, RRs should always be reported. If there are grounds for not reporting RR, they should be explained.
All CQRs should clearly report RRs to PROMs according to the recommendations specified in this article.
All studies involving PROMs should specify the number of questionnaires, which (standardised) questionnaires are included and if they have been modified or if only parts or specific questions are used.
The method for calculating RR should always be stated. We also recommend including a flow chart of patient inclusion and response similar to a Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow chart for systematic reviews. Examples can be found in the studies by Gjeilo et al and Olsson et al.
When possible and applicable, RR should always be reported as the percentage of patients able to fill in a questionnaire who fill it in and return it. The denominator should not include patients who have passed away or patients who are not reachable (eg, the wrong address), and the RR should not be affected by whether questionnaires were filled in completely and/or correctly. These factors are excluded so that the RR accurately represents the percentage of all patients able to fill in the questionnaire who actively constitute the study sample. However, if the study protocol or relevant statistical analysis does not allow for this formula, we maintain that the method of calculating RR should still be clearly stated.
We recommend sending follow-up questionnaires to all patients able to fill in a questionnaire, and not only to the patients who responded to previous questionnaires. This may help decrease non-response bias, and previous non-responders can still contribute to future follow-up time points.
We recommend all registries send at least one reminder as that has been shown to improve RR.
Although paper format produced a higher RR in this study, we recognise this may not be sustainable as more registries move to electronic questionnaires. Instead, we recommend that baseline PROMs be filled in pre-intervention at the hospital as this has been shown to increase post-intervention participation.14
Suggested standardised terminology for registries and articles reporting RRs to PROMs. PROMs, patient-reported outcome measures.
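The recommended RR calculation can be written out as a short Python sketch. It is an illustration only: the function name `response_rate`, its parameters and the patient counts are hypothetical and do not come from any registry. It assumes that deceased and unreachable patients are counted among those sampled and can be identified, so that they can be removed from the denominator.

```python
def response_rate(returned, sampled, deceased=0, unreachable=0):
    """RR in percent, using patients able to respond as the denominator.

    Deceased and unreachable patients (eg, wrong address) are removed from
    the denominator. Any returned questionnaire counts as a response,
    whether or not it was filled in completely.
    """
    eligible = sampled - deceased - unreachable
    if eligible <= 0:
        raise ValueError("no patients able to fill in a questionnaire")
    if not 0 <= returned <= eligible:
        raise ValueError("returned must be between 0 and the eligible count")
    return 100.0 * returned / eligible


# Hypothetical follow-up: 200 patients sampled, 10 deceased, 15 unreachable,
# 140 questionnaires returned.
recommended = response_rate(140, 200, deceased=10, unreachable=15)
unadjusted = response_rate(140, 200)  # denominator keeps all 200 patients
print(f"recommended: {recommended:.1f}%, unadjusted: {unadjusted:.1f}%")
```

In this made-up example the same 140 returned questionnaires give different RRs depending on the denominator, which is why the calculation method should be stated alongside the RR.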
Conclusions
The large variation and downward trend of RRs to PROMs over time in cohorts, registry-based studies and registries are of concern. There are actions CQRs can take in order to improve their RRs, such as sending at least one reminder or implementing a mixed electronic and paper approach. However, further research is warranted to clarify an acceptable RR in order for PROM data in registry-based studies to be used to inform treatment decisions.
References
Footnotes
Contributors CNE, RBJ and AÅ devised the project and the main conceptual ideas. CNE and KW performed the searches and data extraction. KW analysed the data and wrote the manuscript with support from CNE and consultation with RBJ and AÅ.
Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests None declared.
Patient consent for publication Not required.
Provenance and peer review Not commissioned; externally peer reviewed.
Data availability statement Data are available in a public, open access repository. Extra data can be accessed via the Dryad data repository at http://datadryad.org/ with the doi: 10.5061/dryad.hhmgqnkcp.