
Original research
Quality and reporting of patient-reported outcomes in elderly patients with hip fracture: a systematic review
  1. Puck van der Vet1,
  2. Sandra Wilson2,
  3. R Marijn Houwert1,
  4. Egbert-Jan Verleisdonk3,
  5. Marilyn Heng2
  1. 1Department of Trauma Surgery, University Medical Centre Utrecht, Utrecht, Netherlands
  2. 2Department of Orthopaedic Surgery, Massachusetts General Hospital, Boston, Massachusetts, USA
  3. 3Department of Surgery, Diakonessenhuis, Zeist, Netherlands
  1. Correspondence to Puck van der Vet; puck_vandervet94{at}


Abstract

Objective To assess how patient-reported outcomes (PROs) are reported and to assess the quality of reporting PROs for elderly patients with a hip fracture in both randomised controlled trials (RCTs) and observational studies.

Design Systematic review.

Data sources Medline, Embase and CENTRAL were searched for studies published from 1 March 2013 to 25 May 2021.

Eligibility criteria RCTs and observational studies on geriatric patients (≥65 years of age) with one or more PROs as an outcome were included.

Data extraction and synthesis Primary outcome was type of PRO; secondary outcome and quality assessment was measured by adherence to the Consolidated Standards of Reporting Trials (CONSORT) extension for patient-reported outcomes (CONSORT-PRO). Because of heterogeneity in study population and outcomes, data pooling was not possible.

Results 3659 studies were found in the initial search. Of those, 67 were included in the final analysis. 83.6% of studies did not adequately mention missing data, 52.3% did not correctly report how PROs were collected and 61.2% did not report an adequate effect size. PRO limitations were adequately reported in 20.9% of studies and interpretation of PROs was adequately reported in 19.4% of studies. Most Quality of Life (QoL) outcomes were measured by the EuroQol 5-Dimension 3-Levels (EQ-5D-3L), and pain as well as patient satisfaction by a Visual Analogue Scale (VAS).

Conclusion This study found that a wide variety of PRO measures are used to evaluate geriatric hip fracture care. In addition, 47.8% of studies examining PROs in elderly patients with hip fracture do not satisfy at least 50% of the CONSORT-PRO criteria. This enables poorly conducted research to be published and used in evidence-based medicine and, consequently, shared decision-making. More efforts should be undertaken to improve adequate reporting. We believe extending the CONSORT-PRO extension to the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement for observational studies would be a valuable addition to current guidelines.

  • Hip
  • Orthopaedic & trauma surgery

Data availability statement

All data relevant to the study are included in the article or uploaded as supplementary information.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:


Strengths and limitations of this study

  • To our knowledge, this is the first systematic review on the reporting quality of patient-reported outcomes in geriatric hip fracture care.

  • This systematic review followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses statement.

  • A limitation is that we did not perform a meta-analysis because we could not pool data.

  • Another limitation is that we did not include study protocols or non-English studies, which may have caused bias.


Introduction

Hip fractures are a major health problem, affecting approximately 18% of women and 6% of men.1 Mortality and morbidity rates after hip fractures are high, and hip fractures often cause loss of independence in elderly patients. The estimated cost of hip fractures in the USA was 17 billion in 2002.2 In addition, due to the ageing population, the incidence of hip fractures continues to increase and is estimated to be 4.5 million in 2050.3 Consequently, hip fractures are considered one of the biggest challenges of future healthcare.

Traditionally, research on geriatric patients with hip fracture tended to focus on objective, clinician-reported outcome measures, such as mortality, complication rate and hospital length of stay. However, recent literature stresses the importance of patient-reported outcomes (PROs) to evaluate treatment, estimate cost-effectiveness and improve clinical decision-making and patient-centred care.4

A PRO is generally defined as an outcome of a patient’s health condition or health behaviour that is reported directly by the patient, without interpretation of the patient’s response by a clinician or anyone else.5 6 Patient-reported outcome measures (PROMs) are the instruments used to measure these outcomes. Earlier literature on PROs in elderly patients with a hip fracture shows that a hip fracture negatively affects PROs such as Quality of Life (QoL), and that Fear of Falling is increased after a hip fracture, thereby increasing the risk of a secondary fracture.7 PROs are frequently used in recent hip fracture studies comparing different types of treatment and in studies investigating the potential benefit of comanaged care pathways (ie, orthogeriatric care pathways).8 9

While there are already many instruments and questionnaires to measure PROs, and more measures are currently being developed, there is little to no guidance as to which measure should be used and how PROs should be analysed and interpreted.10 Earlier studies showed that the quality of reporting PROs in randomised controlled trials (RCTs) was low and that PROs are prone to risk of bias.11 Therefore, in 2013, the Consolidated Standards of Reporting Trials (CONSORT) extension for patient-reported outcomes (CONSORT-PRO) was published for RCTs to improve quality of reporting.12 So far, no such guideline exists for safeguarding the quality of reporting PROs in observational studies. The objective of this study was to assess how PROs are reported and to assess the quality of reporting PROs for elderly patients with a hip fracture in RCTs and observational studies. We hypothesised that most RCTs would not adhere to the CONSORT-PRO extension and that observational studies would have low quality of reporting PROs for elderly patients with a hip fracture.


Methods

This systematic review was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses statement.13 There is no published protocol for this review.

Search strategy and study selection

We conducted a literature search in PubMed/Medline, CENTRAL and Embase for PRO(M) studies on elderly patients (above 65 years of age) with a hip fracture, published after 27 February 2013 (the publication date of the CONSORT-PRO extension). The last search was conducted on 25 May 2021. Online supplemental table A shows the search syntax. After removing duplicates, two independent reviewers (SW and PV) screened titles and abstracts for eligible studies. Eligible studies were RCTs or observational studies on elderly patients with a hip fracture, with at least one PRO as an outcome measure. Outcome measures such as the Charnley score, which can be measured by both clinician and patient, were included only if they were reported by the patient and described in the article as a PRO.

Thereafter, full texts were screened by the same two reviewers. Studies were included if they concerned patients over 65 years of age who were treated surgically or non-surgically for a hip fracture and if they reported one or more PROs. Exclusion criteria were non-English studies, no available full text, letters, case reports, reviews, conference papers and study protocols. Disagreements on eligibility and inclusion were resolved through discussion and, if needed, consultation of a third researcher. References of the included studies were screened for studies that were not found in the original literature search.

Data extraction

Data were extracted by two independent reviewers (SW and PV). Methodological quality was assessed using the Methodological Index for Non-Randomized Studies (MINORS) criteria for observational studies and the CONSORT 2010 PRO extension tool for randomised trials. MINORS is a 12-item list that corresponds with a scale ranging from 0 to 24 for comparative studies and 0–16 for non-comparative studies (the last 4 criteria are not applicable to non-comparative studies).14 Disagreements were resolved through discussion and consensus. Online supplemental table B shows details of the critical appraisal.

Patient and public involvement

No patients and/or public were involved in the design and conduct of this research.

Primary and secondary outcomes

Primary outcomes included the type of PROM reported and whether it was a primary or a secondary endpoint in the study. The secondary outcome was adherence to the CONSORT-PRO extension. The CONSORT-PRO extension adds five PRO extension items to the CONSORT-2010 Statement and provides PRO elaborations to nine CONSORT criteria, which results in 14 criteria. We based this scoring list on an earlier study on the uptake of CONSORT-PRO in all types of studies.15 Two independent reviewers (PV and SW) scored the adherence to the CONSORT-PRO criteria. For the purpose of this analysis, we scored each criterion ‘2’ if it was adequately described (meaning that the criterion (for instance, missing data) was mentioned in the article together with an explanation of how it was assessed), ‘1’ if it was inadequately described (meaning that the criterion was mentioned, but an explanation of how it was handled was missing), and ‘0’ if it was not reported, resulting in a total score ranging from 0 to 28. Disagreements were resolved through discussion and consensus with a third reviewer. In accordance with the previously mentioned article, adherence was deemed ‘good’ when >80% of criteria were met, ‘moderate’ when 50%–79% of criteria were met, and ‘poor’ when <49% of criteria were met. Before scoring, the reviewers discussed the CONSORT-PRO extension to ensure uniformity. A predesigned data sheet was used for data extraction, including basic study characteristics: first author, date of publication, journal of publication, country of study, intervention, sample size and follow-up period. Journals’ websites and ‘instructions for authors’ pages were screened for CONSORT/STROBE endorsement. Journals’ impact factors were recorded from the 2020 Web of Science Journal Citation Report.
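The scoring scheme described above can be sketched in a few lines of Python. This is only an illustration, not the authors' data sheet: the function names are hypothetical, and it assumes that the adherence percentage is the total score divided by the maximum of 28. The stated cut-offs leave exactly 80% and the range between 49% and 50% unassigned; the sketch treats ≥80% as ‘good’ and <50% as ‘poor’, which is also an assumption.

```python
from typing import Sequence

N_CRITERIA = 14              # 5 PRO extension items + 9 PRO elaborations
MAX_SCORE = 2 * N_CRITERIA   # each criterion scored 0, 1 or 2, so 28 in total


def consort_pro_total(scores: Sequence[int]) -> int:
    """Sum the per-criterion scores (0 = not reported, 1 = inadequate, 2 = adequate)."""
    if len(scores) != N_CRITERIA:
        raise ValueError(f"expected {N_CRITERIA} criterion scores, got {len(scores)}")
    if any(s not in (0, 1, 2) for s in scores):
        raise ValueError("each criterion must be scored 0, 1 or 2")
    return sum(scores)


def adherence_category(total: int) -> str:
    """Map a total score to an adherence label.

    Assumption: percentage = total / 28; boundaries handled as >=80 and >=50.
    """
    if not 0 <= total <= MAX_SCORE:
        raise ValueError(f"total must lie between 0 and {MAX_SCORE}")
    pct = 100 * total / MAX_SCORE
    if pct >= 80:
        return "good"
    if pct >= 50:
        return "moderate"
    return "poor"
```

Under these assumptions a study scoring ‘1’ on every criterion totals 14 points (50%) and is labelled ‘moderate’, while a study needs at least 23 points to be labelled ‘good’.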

Statistical analysis

Statistical analyses were carried out using SPSS Statistics V.27 (IBM Corporation). Results were evaluated descriptively, using total numbers and percentages for categorical variables. Mean values (M) along with the standard deviation (SD) were used for numeric, normally distributed data. The Shapiro-Wilk test was used to assess normality. We used the χ2 test to assess differences in categorical variables. Since this systematic review contained little missing data (<5%), and the data were missing completely at random, we performed complete case analysis. Multivariable logistic regression analysis was used to examine the relation between (in)adequate reporting of CONSORT-PRO criteria and the study characteristics. To limit possible overfitting, we included only those variables that were significant at an alpha level of 0.10 in the bivariate analyses. The models included study design and journal by default. P values below 0.05 were considered statistically significant. A subgroup analysis was performed according to study design to assess differences between experimental and observational research.
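To make the bivariate step concrete, the following Python sketch computes a Pearson χ2 test for a 2×2 table, for example adequate versus inadequate reporting of one criterion in RCTs versus observational studies. It is an illustration only: the analyses in this review were run in SPSS, the function name is hypothetical, no continuity correction is applied (whether one was used is not stated), and any counts passed to it here are invented for demonstration.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test (1 degree of freedom) for the table [[a, b], [c, d]].

    Returns (chi2 statistic, p value). No continuity correction is applied.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        raise ValueError("every row and column total must be non-zero")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the upper tail of the chi-square
    # distribution equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts, not taken from the review:
# 20 of 25 studies in one group report a criterion adequately vs 10 of 25 in the other.
chi2, p = chi_square_2x2(20, 5, 10, 15)
keep_for_multivariable_model = p < 0.10  # screening threshold named in the text
```

With small expected cell counts an exact test would usually be preferred over this approximation.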



Results

A flow chart of the literature search and study inclusion is shown in figure 1. After title and abstract screening, 93 studies were assessed for eligibility, of which 67 studies were included in the final analysis. Forty-two were RCTs and 25 were observational cohort studies. A total of 44 journals were included, with impact factors ranging from 0.471 to 10.668 (table 1).

Table 1

Characteristics of journals with the highest number of included studies and highest impact factors

Figure 1

Preferred Reporting Items for Systematic Reviews and Meta-Analyses flow diagram presenting the search and selection of studies evaluating PRO(M) in elderly patients with hip fracture.

Study characteristics

The 67 included studies analysed a total of 53 931 patients, of whom 15 124 were men. Weighted mean age was 83.90±1.98 years (range 71.0–99). Follow-up ranged from 0 to 144 months. Study characteristics are displayed in table 2 and detailed study characteristics are shown in online supplemental table C. Most studies originated from Scandinavia, Europe and China. Figure 2 displays demographic distribution.

Table 2

Characteristics of included studies

Figure 2

Demographic distribution of the included studies. Count: number of studies from each country. Light blue=1, orange=4, 5, dark blue=8.

PRO reporting

In total, 48 studies reported QoL. Of those, 32 (66.7%) used the EQ-5D-3L as the measurement instrument. Twenty-one studies included pain as an outcome measurement. Of those 21 studies, 13 (61.9%) used a Visual Analogue Scale (VAS) and only one study used a validated pain scale (the PAINAD). Patient-reported functional outcomes were measured using different outcome measurements, most of which were not PROs (eg, Harris Hip score). Activities of daily living (ADLs) were most commonly measured using the Barthel index (45.8%). Details on PROM reporting are shown in table 3.

Table 3

PRO(M) outcomes

Consolidated Standards of Reporting Trials extension for Patient-Reported Outcomes

A total of 10 articles (15.4%) mentioned adherence to the CONSORT/STROBE statement. There were no studies that mentioned the CONSORT-PRO, or any other PRO-reporting guideline. Figure 3, table 4 and online supplemental table D show the adherence to the CONSORT-PRO criteria. Overall reporting mean was 16.6±5.8 (range 1–26).

Figure 3

Number of studies adhering to the Consolidated Standards of Reporting Trials extension for Patient-Reported Outcomes (CONSORT-PRO) criteria. Red: no description, yellow: inadequate description, green: adequate description.

Table 4

Modified Consolidated Standards of Reporting Trials extension for Patient-Reported Outcomes (CONSORT-PRO) extension

Thirty (44.8%) studies adequately described the PRO in their abstract and validity and reliability were cited in 76.1% of studies. Adequate reporting was also noted on criteria C15 (baseline outcomes, 76.1%) and C16 (denominator, 83.6%).

Inadequate reporting on PRO hypothesis was noted in 85.1% of studies (n=57). Statistical approaches to missing data were not adequately reported in 83.6% of studies (n=56). 61.2% of studies did not adequately report an effect size or a precision measure, 80.6% did not adequately report interpretation of the PRO results and 79.1% did not adequately report on any PRO-related limitations.

Subgroup analyses

RCT versus observational

Overall, CONSORT-PRO criteria were reported more adequately in RCTs than in observational cohort studies (p=0.026). A significant difference between RCTs and observational research was found in adequate reporting of sample size calculations (p=0.002).

Predictors of higher CONSORT-PRO score

A significant predictor of higher adherence to the CONSORT-PRO score was the study design (coefficient: 1.240, 95% CI 0.165 to 2.333, p=0.026, adjusted R2=0.279). The second significant predictor for higher CONSORT-PRO scores was if CONSORT-adherence was stated in the methods of an article (coefficient: 1.925, 95% CI 0.422 to 3.429, p=0.012, adjusted R2=0.279). Journal, country and whether PRO was reported as a primary or secondary outcome did not significantly predict CONSORT-PRO adherence.


Discussion

To our knowledge, this is the first systematic review that analyses (the quality of) reporting PROs in studies on geriatric patients with a hip fracture. Though PROs are often reported in hip fracture studies, the quality of reporting PROs is not monitored. First, we found high variability in outcome measures. EQ-5D-3L was most commonly used for measuring QoL, VAS for pain and patient satisfaction, KATZ for ADL. Second, we found that while the overall quality of reporting seems to have improved compared with previous literature, the reporting of missing data, PRO limitations and PRO interpretation continues to be suboptimal.

Earlier studies have also reported high variation in outcome measures. A recent scoping review from the National Trauma Research Action Plan group also found high heterogeneity in outcome measures.16 Standardisation of PROMs should be pursued to adequately analyse and fully comprehend health outcomes in the trauma population. A promising initiative to make PRO reporting more uniform is the Patient-Reported Outcomes Measurement Information System (PROMIS), a reporting system that was developed in 2007 with the support of the National Institutes of Health. This study, however, finds that PROMIS is still infrequently used in research on elderly patients with a hip fracture.

Compared with previous literature, this study generally found higher CONSORT-PRO adherence. In 2015, Bylicki et al investigated CONSORT-PRO adherence in randomised clinical trials evaluating cancer therapy. They used a modified version of the CONSORT-PRO questionnaire and found lower adherence scores in 8 out of 13 criteria.17 Stevens et al conducted a systematic review on PRO reporting in RCTs of unplanned general surgeries and also found lower adherence scores in 7 of the 11 CONSORT-PRO criteria that they reported.18 In 2017, Mercieca-Bebber et al studied uptake of the CONSORT-PRO in studies published in high-impact journals for the International Society for Quality of Life Research (ISOQOL) and found similar reporting rates: 67.6% in their study vs 67.3% in this study for studies that did not report CONSORT adherence, and 77.7% vs 80.0% for studies that did report CONSORT adherence.15

A possible explanation for the difference in reporting rates could be that our study was conducted at least 4 years later than the earlier systematic reviews on PRO reporting. Knowledge of the CONSORT-PRO extension may be more prominent now, eight years after publication of the CONSORT-PRO extension, which may also improve reporting of PROs in general over time. It could also mean that reporting guidelines (CONSORT, CONSORT-PRO, STROBE) are effective in improving reporting quality. Still, it remains notable since we also include observational studies, for which no PRO guideline exists.

Nonetheless, adherence to several important CONSORT-PRO criteria remained poor. Most included articles in this study did not adequately state their statistical approaches for dealing with missing data. Missing data are common, and sometimes unavoidable, in PRO-related research.19 Yet, it is commonly known that missing data can lead to biased estimates and loss of information, study power and interpretability.19 A number of approaches for dealing with missing data are known, and they have been published by ISOQOL to assist researchers. It is worrying that so few studies reported their (methods for dealing with) missing data and that so few journals, regardless of their impact factors, corrected this during peer review. Another finding was that 61.2% of studies did not report an appropriate effect size and precision measure. Most studies only reported p values. Though a p value expresses statistical significance, it provides no information about the magnitude of a possible effect. Reporting p values by themselves can easily lead to misinterpretation of the results and should therefore be avoided.20 21 An effect size (eg, OR, relative risk, Pearson correlation) shows the magnitude of the difference between groups and is therefore more helpful in interpreting results. Effect sizes, along with a precision measure, are now required in most reporting standards.
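The difference between a bare p value and an effect size with a precision measure can be illustrated with a short Python sketch that computes an OR and its 95% CI from a 2×2 table. This is a generic textbook calculation (the log-based Woolf interval), not a method taken from any included study; the function name and the example counts are hypothetical.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> tuple[float, float, float]:
    """Odds ratio and Wald (Woolf) confidence interval for the table [[a, b], [c, d]].

    a, b: events and non-events in group 1; c, d: events and non-events in group 2.
    z = 1.96 gives an approximate 95% confidence interval.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this approximation")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only.
or_, lo, hi = odds_ratio_ci(20, 5, 10, 15)
print(f"OR {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

Reported this way, a reader sees both the size of the difference and how uncertain it is; a wide interval signals an imprecise estimate even when the p value falls below 0.05.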

Other important PRO criteria that were rarely met were ‘interpretation of PRO findings’ and ‘PRO limitations’ (both were met in only ~20% of cases). Failure to report limitations and failure to adequately interpret results can lead to unwarranted conclusions. Since outcomes of PRO studies are used as important arguments in shared decision-making, high reporting quality is important. Nonetheless, this study shows that not only authors but also journals still fall short in ensuring adequate reporting. This limits the objectivity of medical research, and more efforts should be made to improve quality of reporting.

Observational studies in this systematic review had significantly lower reporting rates than RCTs, as hypothesised. Moreover, the study design (RCT vs observational study) was a significant predictor for CONSORT adherence. No reporting guidelines for PROs currently exist for observational studies. Hence, when interpreting the results, it should be kept in mind that the observational studies were scored with a scoring system that was not specifically designed for that study design. This could be a reason for the lower adherence scores. RCTs are often favoured over observational studies due to the possible unmeasured bias or confounding that can occur in observational research. However, surgical RCTs are difficult to conduct and, as shown in this article, can lead to misleading results in case of inadequate reporting. Recent literature thus stresses the usefulness of observational studies in trauma surgery.22 We therefore believe that extending the CONSORT-PRO extension to STROBE (an example of which is found in online supplemental table E) would be valuable for future research.

This study has several limitations. Though we developed a thorough search strategy together with an experienced librarian, publication bias could have occurred, along with potential bias due to the exclusion of non-English studies and studies with patients <65 years of age. However, since we included a relatively large number of studies from different databases, we do not think additional studies would have significantly changed our conclusion. Another limitation is that we included studies from April 2013 and the protocols of these studies may have been written before CONSORT-PRO publication. This may have caused adherence to be low. Third, we reviewed published articles and online supplemental material of included studies but not study protocols. Some of the CONSORT-PRO criteria may have been addressed in the study protocol. However, CONSORT-PRO advises that the information should be noted in the final report as well. Despite these limitations, this study adds to the existing literature as it is the first to provide a broad overview of the quality of PRO reporting in this study population and shows current shortcomings of PRO research. It also serves as a benchmark to monitor quality of PRO reporting in future orthopaedic trauma studies.


Conclusion

This systematic review shows that a wide variety of PRO measures are used to evaluate care in geriatric patients with a hip fracture. To better understand health outcomes, we encourage attempts at standardisation of PRO measures, linking of PRO measures, and creation of standard outcome sets. In addition, while we found slightly higher CONSORT-PRO adherence than previous literature, the reporting of missing data, PRO limitations and PRO interpretation remains poor, which impedes adequate interpretation by clinicians and the use of PROs in clinical practice. We therefore advise researchers, reviewers and journal editors to ensure proper PRO reporting. To improve quality of PRO reporting in observational studies, we recommend extending the CONSORT-PRO extension to the STROBE guidelines for observational studies.


Ethics statements

Patient consent for publication

Ethics approval

Not applicable.


Acknowledgements

We thank Lisa Philpotts, Treadwell Library, Massachusetts General Hospital, Boston, MA, for her guidance in the development of this systematic review’s search strategy.




  • Twitter @marilyn_heng

  • Contributors Conceptualisation was performed by: PvdV, SW, E-JV and RMH. Data curation was performed by: PvdV, SW. Formal analysis, investigation, project administration, visualisation, and writing - original draft were performed by: PvdV, SW, MH. Funding acquisition is not applicable. Methodology was performed by: PvdV, MH. Resources and supervision were performed by: MH, RMH, E-JV. Software is not applicable. Validation was performed by: PvdV, SW, MH, RMH. Writing - review and editing was performed by: all authors. PvdV takes full responsibility for the finished work and/or the conduct of the study, had access to the data, and controlled the decision to publish.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Map disclaimer The inclusion of any map (including the depiction of any boundaries therein), or of any geographic or locational reference, does not imply the expression of any opinion whatsoever on the part of BMJ concerning the legal status of any country, territory, jurisdiction or area or of its authorities. Any such expression remains solely that of the relevant source and is not endorsed by BMJ. Maps are provided without any warranty of any kind, either express or implied.

  • Competing interests None declared.

  • Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.