Comprehensive review of statistical methods for analysing patient-reported outcomes (PROs) used as primary outcomes in randomised controlled trials (RCTs) published by the UK’s Health Technology Assessment (HTA) journal (1997–2020)

Objectives To identify how frequently patient-reported outcomes (PROs) are used as primary and/or secondary outcomes in randomised controlled trials (RCTs) and to summarise what statistical methods are used for the analysis of PROs. Design Comprehensive review. Setting RCTs funded and published by the United Kingdom’s (UK) National Institute for Health Research (NIHR) Health Technology Assessment (HTA) Programme. Data sources and eligibility HTA reports of RCTs published between January 1997 and December 2020 were reviewed. Data extraction Information relating to PRO use and analysis methods was extracted. Primary and secondary outcome measures The frequency of using PROs as primary and/or secondary outcomes; statistical methods that were used for the analysis of PROs as primary outcomes. Results In this review, 37.6% (114/303) of trials used PROs as primary outcomes, and 82.8% (251/303) of trials used PROs as secondary outcomes from 303 NIHR HTA reports of RCTs. In the 114 RCTs where the PRO was the primary outcome, the most used PRO was the Short-Form 36 (8/114); the most popular methods for multivariable analysis were linear mixed model (45/114), linear regression (29/114) and analysis of covariance (13/114); logistic regression was applied for binary and ordinal outcomes in 14/114 trials; and the repeated measures analysis was used in 39/114 trials. Conclusion The majority of trials used PROs as primary and/or secondary outcomes. Conventional methods such as linear regression are widely used, despite the potential violation of their assumptions. In recent years, there is an increasing trend of using complex models (eg, with mixed effects). Statistical methods developed to address these violations when analysing PROs, such as beta-binomial regression, are not routinely used in practice. Future research will focus on evaluating available statistical methods for the analysis of PROs.

Among the PROs accepted for the study, the authors also report non-validated "self-developed measures by researchers alongside trials". I believe that the use of non-validated instruments should be discouraged, and I wonder whether the inclusion of such instruments affects the quality of the study.
The very wide time window (1997-2020) is another double-edged characteristic of this study: on the one hand, it introduces variability and confusion; on the other, it allows time trends in the type of statistical methods used in the trials to be tested. Overall, I find the time-trend analysis very interesting. However, a comment on the potential heterogeneity arising from the wide time window could be added to the limitations. I would move the paragraph on the handling of missing data to the top of the description of results, because this is a critical point to be addressed before any statistical analysis (from lines 27-33 on page 12 to around line 23 on page 11).

GENERAL COMMENTS
Manuscript identifier: bmjopen-2021-051673
Title: A comprehensive review of statistical methods for analysing patient-reported outcomes (PROs) used as primary outcomes in randomised controlled trials (RCTs) published by the United Kingdom's Health Technology Assessment (HTA) Journal (1997-2020)
The topic of the article is timely and important. I have only a few suggestions for further discussion in the article.
Introduction, section 2: In addition to the three sentences, I would remind readers that RCT methodology does not eliminate possible systematic error from sources such as invalid measurements (e.g. poorly or mistakenly completed PRO forms, ceiling or floor effects of the PRO scales), publication bias or selective reporting of statistical analyses.
Introduction, section 3: In addition to the reminder about the Normality assumption, more important assumptions for linear regression are that the measurements are valid (see above), that the relationship between the measurement (PRO scores) and one's objective (how well patients are doing) is truly linear, and that the effects of the independent variables on the dependent variable are additive. Also, you could provide a reference to an article that corrects some of the common misunderstandings about the Normality assumption, e.g. [1].
Methods: Nothing to add, clearly written.
Results, section 3: Most trials measured and reported the baseline assessment of the PRO scale, but you haven't reported how many of the RCTs adjusted for this baseline score in the statistical analyses. Baseline and post-baseline assessments are often correlated within individuals, and not taking this into account (in my experience, quite common among small RCTs) increases the SEs of the estimates. See e.g. [2-13]. I'd suggest you report how often the baseline score of the PRO was adjusted for in the sample of 114 trials. The presence of covariate adjustment would also be interesting to see in the sample, but it could make the article too long, so make your own choice about that.
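The point about baseline adjustment can be illustrated with a small simulation. The sketch below is not drawn from any of the reviewed trials: the sample size, treatment effect, score scale and the baseline/follow-up correlation `rho` are all hypothetical values chosen for illustration, and the function names are invented here. It compares the standard error of an unadjusted difference in follow-up means with that of an ANCOVA-style estimate (ordinary least squares of follow-up on arm and baseline), using only the Python standard library.

```python
import math
import random
import statistics


def simulate_trial(n_per_arm=200, effect=5.0, rho=0.6, sd=10.0, seed=1):
    """Simulate (arm, baseline, follow_up) rows for a two-arm trial.

    All parameter values are hypothetical; rho is the within-patient
    correlation between baseline and follow-up PRO scores.
    """
    rng = random.Random(seed)
    rows = []
    for arm in (0, 1):
        for _ in range(n_per_arm):
            baseline = rng.gauss(50.0, sd)
            noise = rng.gauss(0.0, sd * math.sqrt(1.0 - rho ** 2))
            follow_up = 50.0 + rho * (baseline - 50.0) + effect * arm + noise
            rows.append((arm, baseline, follow_up))
    return rows


def unadjusted(rows):
    """Difference in follow-up means and its unequal-variance standard error."""
    y0 = [y for arm, _, y in rows if arm == 0]
    y1 = [y for arm, _, y in rows if arm == 1]
    estimate = statistics.mean(y1) - statistics.mean(y0)
    se = math.sqrt(statistics.variance(y1) / len(y1)
                   + statistics.variance(y0) / len(y0))
    return estimate, se


def ancova(rows):
    """OLS fit of follow_up ~ arm + baseline; returns the arm effect and its SE."""
    n = len(rows)
    t = [float(arm) for arm, _, _ in rows]
    b = [base for _, base, _ in rows]
    y = [out for _, _, out in rows]
    mt, mb, my = statistics.mean(t), statistics.mean(b), statistics.mean(y)
    # Centred sums of squares and cross-products.
    s_tt = sum((ti - mt) ** 2 for ti in t)
    s_bb = sum((bi - mb) ** 2 for bi in b)
    s_yy = sum((yi - my) ** 2 for yi in y)
    s_tb = sum((ti - mt) * (bi - mb) for ti, bi in zip(t, b))
    s_ty = sum((ti - mt) * (yi - my) for ti, yi in zip(t, y))
    s_by = sum((bi - mb) * (yi - my) for bi, yi in zip(b, y))
    det = s_tt * s_bb - s_tb ** 2
    effect = (s_ty * s_bb - s_by * s_tb) / det
    slope = (s_by * s_tt - s_ty * s_tb) / det
    rss = s_yy - effect * s_ty - slope * s_by
    sigma2 = rss / (n - 3)  # three fitted parameters: intercept, arm, baseline
    se = math.sqrt(sigma2 * s_bb / det)
    return effect, se


if __name__ == "__main__":
    trial = simulate_trial()
    print("unadjusted (estimate, SE):", unadjusted(trial))
    print("ANCOVA     (estimate, SE):", ancova(trial))
```

Under these assumptions the adjusted standard error should be roughly sqrt(1 - rho^2) times the unadjusted one, so the gain grows with the strength of the baseline/follow-up correlation. A real analysis would normally use an established statistics package rather than hand-written least squares.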
Discussion: Overall the discussion section is balanced, although I'd prefer it to be more educational or more elaborate in scope. The current situation among medical RCTs regarding statistical analyses is suboptimal in many respects: non-publication bias [14,15], selective reporting [16], secret HARKing [17], poor conduct and analysis [18], overconfidence about responder analyses [19][20][21][22], poor comparability of results due to the endless number of PRO scales [23][24][25], and lastly, inadequate understanding of basic statistical concepts among medical researchers and other readers (clinicians) [26][27][28][29][30][31], to name just a few examples whose themes overlap with this study. Briefly, I would add to the discussion section some content on the obstacles surrounding PROs and statistical analyses in the current atmosphere of medical research.

VERSION 1 - AUTHOR RESPONSE
Comments from Reviewer 1
Comment 1: The aims of this study are clearly presented and the background is solid. The paper is very well written and for a clinical reviewer like me (not formally a statistician) it was easy to read and interesting. Data interpretation and discussion are honest and sound.
Response: Thanks for your appreciation.
'We used a broad definition for a PRO and a small number of trials (seven) used PROs that were specifically developed for the trial and were not validated in another external study. The inclusion of such non-validated instruments as primary outcomes should be discouraged, and may have affected the results, although the characteristics of these PROs (Likert or VAS) are similar to those of the PROs that have been formally validated. We believe that it is not unreasonable to assume that the statistical analysis of such outcomes would be similar to the analysis of validated PROs.'
'Second, PRO data is likely to be discrete, skewed, and bounded (i.e. with ceiling and floor effects). [5] When analysing PRO data using a general linear model (including t-test, ANOVA, ANCOVA and linear regression), there are a number of assumptions [6]:

1. The values of the outcome variable should have a Normal distribution for each value of the explanatory variable. This assumption means that the residuals are Normally distributed and should have a mean of zero;
2. Constant variance or homoscedasticity of the outcome variable at each value of the explanatory variable;
3. The relationship between the outcome variable and the explanatory variable should be linear;
4. Independent observations in the sample.

Assumptions such as Normality of residuals and a linear relationship between the outcome (dependent) variable and the explanatory (independent) variables are likely to be violated. [7,8] Also, the application of statistical methods might vary according to the PRO data and the aim of the statistical analysis, but these features (multidimensional, discrete, skewed, and bounded) of PRO data may obscure the decision on what statistical methods need to be applied for the data analysis.'

Methods: Nothing to add, clearly written.

Comment 8: Results, section 3: Most trials measured and reported the baseline assessment of the PRO scale, but you haven't reported how many of the RCTs adjusted for this baseline score in the statistical analyses. Baseline and post-baseline assessments are often correlated within individuals, and not taking this into account (in my experience, quite common among small RCTs) increases the SEs of the estimates. See e.g. [2-13]. I'd suggest you report how often the baseline score of the PRO was adjusted for in the sample of 114 trials. The presence of covariate adjustment would also be interesting to see in the sample, but it could make the article too long, so make your own choice about that.
Response: The number of trials that adjusted for the baseline score in the statistical analysis is given on page 13, lines 5-9 (Results - Statistical methods for the primary analysis of PROs: paragraph 6). Please note that the sentence is rephrased in the revised copy. 'Baseline values of the PRO were used to adjust analyses in 85 trials. Among them, 85 trials adjusted for baseline score of the PRO'. We decided not to extend the article by adding more details of the covariates in the sample, and we have kept the initial description of the covariate adjustment, which is shown on page 13, line 35 (Results - Statistical methods for the primary analysis of PROs: paragraph 6). 'There were 106 trials that used multivariable methods, of which (98/106) clearly reported the covariates adjusted in the primary analysis.'

Comment 9: Discussion: Overall the discussion section is balanced, although I'd prefer it to be more educational or more elaborate in scope. The current situation among medical RCTs regarding statistical analyses is suboptimal in many respects: non-publication bias [14,15], selective reporting [16], secret HARKing [17], poor conduct and analysis [18], overconfidence about responder analyses [19][20][21][22], poor comparability of results due to the endless number of PRO scales [23][24][25], and lastly, inadequate understanding of basic statistical concepts among medical researchers and other readers (clinicians) [26][27][28][29][30][31], to name just a few examples whose themes overlap with this study. Briefly, I would add to the discussion section some content on the obstacles surrounding PROs and statistical analyses in the current atmosphere of medical research.
Response: Thank you for providing these points to enrich the discussion.
We have covered publication bias and selective reporting in the study limitations [page 17, lines 7-23 (Discussion: paragraph 8)], and the problem of poor reporting of PRO outcomes is covered on page 18, lines 3-10 (Discussion: paragraph 12). We have added a paragraph to the discussion to stress some of these obstacles [page 18, lines 26-38 (Discussion: paragraph 14)].