Analysis and reporting of adverse events in randomised controlled trials: a review

Objective
To ascertain contemporary approaches to the collection, reporting and analysis of adverse events (AEs) in randomised controlled trials (RCTs) with a primary efficacy outcome.

Design
A review of clinical trials of drug interventions from four high impact medical journals.

Data sources
Electronic contents tables of the BMJ, the Journal of the American Medical Association (JAMA), the Lancet and the New England Journal of Medicine (NEJM) were searched for reports of original RCTs published between September 2015 and September 2016.

Methods
A prepiloted checklist was used and single data extraction was performed by three reviewers with independent check of a randomly sampled subset to verify quality. We extracted data on collection methods, assessment of severity and causality, reporting criteria, analysis methods and presentation of AE data.

Results
We identified 184 eligible reports (BMJ n=3, JAMA n=38, Lancet n=62 and NEJM n=81). Sixty-two per cent reported some form of spontaneous AE collection but only 29% included details of specific prompts used to ascertain AE data. The number of participants who withdrew from the trial was well reported (80%); however, only 35% of these reports stated whether withdrawals were due to AEs. Results presented and analyses performed were predominantly based on 'patients with at least one event', with 84% of studies ignoring repeated events. Despite a lack of power to undertake formal hypothesis testing, 47% performed such tests for binary outcomes.

Conclusions
This review highlighted that the collection, reporting and analysis of AE data in clinical trials is inconsistent and RCTs as a source of safety data are underused. Areas to improve include reducing information loss when analysing at patient level and the inappropriate practice of underpowered multiple hypothesis testing. Implementation of standard reporting practices could enable a more accurate synthesis of safety data, and development of guidance for statistical methodology to assess causality of AEs could facilitate better statistical practice.

Keywords
Randomised controlled trials; adverse events; harm data; adverse drug reactions; systematic review; investigational drug.

Strengths and Limitations of this study
1. This is the first systematic review to examine and quantify analysis practices for AEs in RCTs.
2. This review characterises what those leading the field in clinical trials are doing and provides some examples of good practice that could be adopted.
3. Articles included in this review were published in four of the top ranked medical journals; therefore results are likely to be biased towards better findings than we would expect if we included all RCTs.
4. At present there is no guidance as to the best statistical methodology to assess causality of AEs in RCTs.

INTRODUCTION
The methods to analyse and report beneficial effects from randomised controlled trials (RCTs) are well developed but this progress has not been matched for adverse event (AE) outcomes. An adverse event is defined as 'any untoward medical occurrence that may present during treatment with a pharmaceutical product but which does not necessarily have a causal relationship with this treatment'. 1 An adverse drug reaction (ADR) is defined as 'a response to a drug which is noxious and unintended …' where a causal relationship is 'at least a reasonable possibility'. 1,2 RCTs provide an opportunity to compare rates of AEs between arms allowing causality to be evaluated. However current analysis and reporting practices are inadequate.
Previous studies have examined the methods for AE collection and presentation, and highlighted the inadequacies in AE reporting in journal articles. [3][4][5][6][7][8][9][10][11][12] In 2004 the Consolidated Standards of Reporting Trials (CONSORT) Group produced an extension to their guidelines for reporting trial results to cover the reporting of harms, however implementation of these guidelines has been shown to be poor. 6 on what should be reported in journal articles and how it should be displayed to ensure transparency and aid clinical interpretation. They promote the use of clinical judgement in reporting rather than mandatory guidance. 14 However there remains uncertainty about best practice for reporting, analysing and presenting AE data.
There are many challenges associated with analysing and reporting AEs in clinical trials. RCTs are typically designed to determine the efficacy of an intervention but are often underpowered to detect important differences in AEs between arms which may suggest an ADR. Often large numbers of AEs are reported during a study, sometimes exceeding the number of patients in the clinical trial. Performing hypothesis tests on these AEs would lead to issues of multiplicity, however any adjustment for multiplicity would make a 'finding untenable'. 15,16 The use of hypothesis testing may result in the medicinal product being deemed unsafe and a trial being halted too early due to a chance imbalance, or conversely deemed safe and not stopped early enough resulting in more patients than necessary suffering an ADR. 15,17,18 Unlike efficacy outcomes which are well defined and restricted in number at the planning stage of a RCT, we collect numerous, undefined AEs in RCTs. Furthermore, AE collection requires additional information to be obtained on factors such as severity, timing and duration, number of occurrences and outcome, which for our efficacy outcomes would have all been predefined.
The aim of this review was to evaluate current best practice for collection, reporting and analysis of AEs in RCTs, in order to identify and promote any areas of good practice whilst highlighting any areas for improvement.

Search strategy
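Four high impact medical journals that publish clinical trials of drug interventions were selected: the BMJ, the Journal of the American Medical Association (JAMA), the Lancet and the New England Journal of Medicine (NEJM).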

Selection criteria
The inclusion criteria were phase II-IV RCTs of drug interventions where the primary outcome was efficacy of the intervention. We did not restrict according to number of treatment arms and included both parallel and cluster RCTs. We excluded cross-over RCTs, RCTs with adaptive randomisation, observational studies, case reports, editorials and letters. We also excluded RCTs where the intervention was not a drug product (i.e. not classified as a clinical trial of an investigational medicinal product (CTIMP)). As the study aimed to assess how authors report and analyse AEs in efficacy trials, trials that were specifically designed to investigate safety as a primary outcome were not included.

Data extraction
Potentially eligible articles were identified based on titles and abstracts and the full text of these studies was retrieved. Supplementary material was also reviewed if readers were referred to it from the main article for further results. Supplementary Table A1 lists all data items captured with guidance given to the reviewers for extraction. We focused on the following areas: how AE data were collected (mode of collection, timing) and defined (coding, attribution); how AEs were assessed in terms of severity of the event or relatedness to the medical intervention; whether there was any planned AE analysis (final and interim monitoring plans and analysis populations); how events were selected for inclusion in the journal article; how summary event information was presented in the journal article; and how AEs were analysed. 7 The data extraction sheet was piloted and then single data extraction was performed by three reviewers (RP, VC and LH) with a 10% independent check of a randomly sampled subset to verify quality. Where specific items were flagged for poor agreement these were re-extracted. Agreement between authors was over 80%. Any queries during data extraction were shared and disagreements between reviewers were resolved through discussion.

Data analysis
The proportion of trials reporting each of items 3-4 and 8-34 in supplementary Table A1 was calculated, and summary statistics (medians and ranges) were calculated for items 5-7. All analyses were performed in Stata version 15. 19
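The summaries described above can be illustrated with a minimal sketch. The review itself used Stata; the Python below is only an illustration, and the records, field names (`n_randomised`, `reports_withdrawals`) and helper functions are hypothetical rather than taken from the extraction sheet.

```python
from statistics import median, quantiles

# Hypothetical extraction records, one per trial report (illustrative values only).
reports = [
    {"n_randomised": 30, "reports_withdrawals": True},
    {"n_randomised": 300, "reports_withdrawals": True},
    {"n_randomised": 800, "reports_withdrawals": True},
    {"n_randomised": 2000, "reports_withdrawals": False},
]


def proportion(records, item):
    """Share of records for which a yes/no checklist item is True."""
    return sum(1 for r in records if r[item]) / len(records)


def summarise(values):
    """Median, range and interquartile range for a numeric checklist item."""
    ordered = sorted(values)
    q1, _, q3 = quantiles(ordered, n=4, method="inclusive")
    return {
        "median": median(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "iqr": (q1, q3),
    }


withdrawal_share = proportion(reports, "reports_withdrawals")
size_summary = summarise([r["n_randomised"] for r in reports])
print(withdrawal_share, size_summary)
```

Quartile definitions differ between software packages, so the IQR from this sketch need not match a Stata summary of the same data exactly.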

Study characteristics
The search identified 184 eligible trial reports (BMJ n=3, JAMA n=38, Lancet n=62 and NEJM n=81) in which a total of 496911 participants were randomised, with a median of 556 participants per trial (range 30 to 205513; interquartile range (IQR) 281 to 1704). The median trial follow-up was 52 weeks (range 48 hours to 10 years; IQR 24 to 104 weeks) and 93% were multi-centre trials. Fifty percent of studies had an active comparator and over 50% of trials received some element of industry funding (Table 1).
examinations and/or laboratory results presented) included details on the timing of such assessments) it was often clear from the results presented that participants had undergone these assessments (83% and 79% of studies reported clinical and laboratory results respectively) (Table 3).

Prespecified analysis
Thirty-one percent of reports provided information on the planned analysis for AEs in the statistical analysis section of the paper and 45% pre-specified a safety population (Table 2, examples 3-4 and Table 3). 22,23 A quarter of trials reported planned interim analysis with stopping criteria (Table 3), five (2.7%) of which included specific criteria on stopping for a harmful event (Supplementary Table A2).

Selection of AEs and reporting practices
Five trials did not report any information on AEs. Two of these reports made the following statements: "there were no significant adverse events related to the procedure" and "no excess in mortality or major adverse events were found…"; the other three made no mention of AEs. [24][25][26][27][28] Twenty-four (13%) trials only provided a summary of the number of AEs or serious AEs rather than listing the actual AEs that occurred, for example "Six serious adverse events occurred in the acetaminophen group and 12 in the ibuprofen group." 29 Of these 24 trials, 10 did provide specific details of the types of events in an appendix. This means 8% of trials either did not report AEs or only included a summary (Table 4).
Table 4 footnotes: … 2 reports that provide generic statements regarding AE data and 1 report that only reported continuous outcomes. f This includes 3 reports with no AE data and 2 reports that provide generic statements regarding AE data (as per footnote e). g 6 papers specifically state that no serious adverse events occurred.
Eighty-nine percent of trials reported a subset of all the AEs they collected. How AEs were 'selected' for inclusion in the article was neither consistent nor clear, and in 3% of studies it was impossible to discern how the authors had selected the AEs they presented. Twenty-six percent of reports selected events based on a frequency threshold e.g. events experienced by greater than x% in any group; 9% of reports used a measure of severity to select events e.g. AEs of grade 3 or higher; 23% of reports included events based on seriousness; and 8% included AEs based on relatedness to treatment (percentages are not independent as the majority of reports used several different criteria for selection). Supplementary Table A3 provides full details of the selection criteria used.
16 We found that 41% of trials analysed AEs in participants that received at least one dose, 29% of trials used all randomised participants and 9% did not specify the analysis population (Table 4). Further details on analysis populations used are given in supplementary Table A5.
Nearly 80% of trials reported the number of participants who withdrew from the trial; of these, 35% (51 of 146 reports) reported whether the withdrawals were due to AEs and, of these, 24% (12 of 51 reports) reported the actual events that caused withdrawals. Results presented and analyses performed were predominantly based on 'patients with at least 1 event', with 84% of reports providing no information on the number of events occurring. An example of how to incorporate information on the number of events is presented in reference 30. Forty-one percent of trials reported information on the severity of AEs. Five percent of trials reported the duration of at least one event, but such data were rarely presented in the main report. The trials that did present this information did so in a variety of ways, for example by incorporating it into the AE table with summary statistics such as the mean duration of certain events, or by presenting it for a subgroup of events in the footnotes of AE tables e.g. "One event of non-serious squamous cell carcinoma (day 210, resolved on day 215; adalimumab treatment was not interrupted)." [31][32][33] Twenty-eight percent of reports included information on the timing of AEs (Table 4).
Serious adverse events were typically well documented (73%) and six reports (3%) explicitly stated that no serious events had occurred. However for forty-four reports (24%) it was not possible to discern if no serious events had occurred or whether they were simply omitted from the report.
Forty-two percent of reports (57 of 134) included details on whether the events had been classified as related to the intervention (Table 4).
Despite a lack of power to undertake formal hypothesis testing, 47% reported p-values for binary outcomes. For example "There were no between-group differences in the rate of patients with at least 1 adverse event (16.7% [14 patients] in the clopidogrel group vs 21.8% [19 patients] in the placebo group; difference, −5.2% [95% CI, −17% to 6.6%]; P = .44)." However, with a total safety population of 171, such a test would have had only 13% power to detect such a difference and was therefore substantially underpowered. The conclusion that "No significant increase in adverse events was observed" makes no reference to the 95% confidence interval presented, which indicates that the findings were in fact compatible with a 17% decrease in the risk of experiencing at least one AE as well as a near 7% increase. 34
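The figures in this example can be approximately reproduced with a short sketch. It is an illustration only and assumes a normal approximation with an unpooled standard error, which may not be the method used by the trial authors or in this review. The group sizes of 84 and 87 are inferred from the reported counts and percentages (they sum to the stated safety population of 171), and the function names are hypothetical.

```python
from math import erf, sqrt

Z_CRIT = 1.959964  # two-sided 5% critical value of the standard normal


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def unpooled_se(x1, n1, x2, n2):
    """Standard error of a difference in proportions (unpooled)."""
    p1, p2 = x1 / n1, x2 / n2
    return sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


def risk_difference_ci(x1, n1, x2, n2):
    """Risk difference (group 1 minus group 2) with a Wald 95% CI."""
    diff = x1 / n1 - x2 / n2
    se = unpooled_se(x1, n1, x2, n2)
    return diff, diff - Z_CRIT * se, diff + Z_CRIT * se


def approx_power(x1, n1, x2, n2):
    """Approximate two-sided power to detect the observed difference."""
    shift = abs(x1 / n1 - x2 / n2) / unpooled_se(x1, n1, x2, n2)
    return normal_cdf(shift - Z_CRIT) + normal_cdf(-shift - Z_CRIT)


# 14 of an assumed 84 participants (clopidogrel) vs 19 of an assumed 87 (placebo).
diff, lower, upper = risk_difference_ci(14, 84, 19, 87)
power = approx_power(14, 84, 19, 87)
print(f"difference {diff:.3f}, 95% CI {lower:.3f} to {upper:.3f}, power {power:.2f}")
```

Under these assumptions the interval is close to the reported −17% to 6.6%, and the approximate power is in the region of the 13% quoted above; other power formulae would give slightly different values.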

Continuous
There was a pervasive practice (59%) of categorising continuous clinical and laboratory outcomes.
Of the trials that did not dichotomise continuous AE data, nearly 70% performed some form of statistical significance testing (Table 4). Whilst continuous outcomes do not suffer from a lack of power to the same degree, multiple testing remains a problem; however, no multiplicity corrections were performed for continuous outcomes.
Of the trials that performed statistical significance testing on AE data, only three made an adjustment for multiplicity of tests (all three on dichotomised outcomes). 31
Twelve percent of reports used graphs to illustrate AE data (Table 4). The CONSORT extension highlighted the value of graphs for summarising such data, especially for conveying information on time-to-event outcomes. 37 An example of such a plot is included in the supplement of reference 38 (eFigure 2).
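To illustrate why unadjusted multiple testing is a concern, the sketch below computes the chance of at least one false positive across several independent tests and applies two standard adjustments (Bonferroni and Holm). These are generic textbook procedures chosen for illustration; the review does not state which adjustments the three trials used, and the p-values shown are hypothetical.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one false positive across independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni(p_values, alpha=0.05):
    """Reject only p-values at or below alpha divided by the number of tests."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


def holm(p_values, alpha=0.05):
    """Holm step-down procedure; returns rejections in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values are retained
    return reject


# Hypothetical p-values for four AE comparisons.
p_values = [0.001, 0.015, 0.04, 0.30]
fwer_20 = familywise_error_rate(20)
bonf = bonferroni(p_values)
holm_result = holm(p_values)
print(round(fwer_20, 3), bonf, holm_result)
```

With 20 independent tests at the 5% level the chance of at least one spurious 'significant' AE difference is roughly 64%, while adjustment reduces the already limited power further, which is the tension described in the introduction.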

DISCUSSION
The safety profile of a medicinal product is established through evidence collected from several sources including clinical trials, observational studies and spontaneous reports. 39 The advantage of clinical trial data is that these provide a controlled comparison of the rate of AEs allowing causality to be evaluated but have the disadvantage that the sample size is often not large enough to detect rare ADRs.
To ensure that a useful and comprehensive picture of the safety profile is provided to all relevant parties clear reporting of AEs from clinical trials is required. Current research has shown the quality of reporting is substandard. [3][4][5][6][7][8][9][10][11][12] The aim of this study was to review best practice across four leading medical journals for AE collection, analysis and reporting practices, highlighting any areas for improvement and examples of good practice.

Principal Findings
Collection and assessment methods
The CONSORT extension for harm was developed with the aim of improving reporting of safety data in RCTs. 37 Many of its ten recommendations were not well reported. 13 This suggests that the CONSORT extension is not being routinely adopted by authors to aid their reporting. Most journals now request that authors include a completed CONSORT checklist when they submit their article, but we are not aware of any journal that requests that the CONSORT harm extension also be submitted. Of the four journals in this review, the Lancet is the only one that makes specific reference to the harm extension in its guidelines to authors. The CONSORT statement contains a single item related to safety, item 19: 'all important harms or unintended effects in each group' should be reported. 37 This may explain why some items listed on the CONSORT extension for harm were reported by so few trials. The adoption of CONSORT harms by journals may support better reporting.
We found that the method of AE collection was poorly reported. This has important implications for the type and frequency of AEs reported with "passive collection resulting in fewer recorded AEs". 40,41 Where the method was given the timing of collection was typically also reported and we would recommend continuation of this practice. The frequency of AE collection has further important implications on the number of events reported. More frequent assessment and longer follow-up will result in more AEs reported. 13 It is important to consider these factors when making conclusions about the safety profile.
The method of attribution between drug and AE was another area where reporting practice was inadequate. However the joint pharmaceutical/journal collaboration indicates that such attribution has 'limited value' given the 'inherent subjectivity in such attribution'. 14
We found that formal assessment of AEs using statistical rules for stopping for emerging ADRs was rare. Subjective assessments of overwhelming amounts of data could easily lead to potential signals of harm being missed. There could be benefits to incorporating more objective methods alongside clinical review to monitor AE information, both for interim analysis by data monitoring committees (DMCs) and for final trial analysis, to help better identify drug-harm relationships. Graphical displays have gone some way towards aiding interpretation. [42][43][44][45][46]

Selection of AEs and reporting practices
Due to space constraints in journal reports, AE information is often included in the appendix. Whilst we encourage use of appendices and supplementary material for including additional detail on AEs, we caution authors against depositing all AE data into such documents without attempting to present a summary of the AE profile in the main article. It is important that the main report strikes a balance between efficacy and harm, therefore allowing a risk-benefit assessment to be made solely from the article.
The failure to report any information on AEs restricts interpretation and prevents a risk-benefit assessment. We identified two reports that made generic summaries of the overall safety profile and it was clear in both that there had been harmful effects however the authors did not include any further information. Three reports contained no information leaving readers uninformed as to any additional information these studies may provide on the safety profile. Ambiguous reporting prevents building an accurate picture of the safety profile. As such profiles are developed on accumulating evidence, it is important that each study report to the same standard and information is not wasted.
We found that the selection criteria used by authors to decide what AEs to include in the report were arbitrary and inconsistent. This will have important implications when synthesising data across studies to construct safety profiles. Authors would benefit from guidance to facilitate consistency but currently research in this area is lacking. Lineberry et al. recommended clinically relevant events that should always be reported (deaths, SAEs and events leading to discontinuation of intervention) and criteria that should be considered when deciding what other AEs to report e.g. interest based on the disease(s) under investigation, comorbidities of the study population, intervention mechanism, trial duration. 14 Standard outcomes for a drug class, as suggested by Cornelius et al., would be one potential solution to avoid issues of inconsistency. 7
CONSORT recommends that AE analyses should be performed on the intention-to-treat (ITT) population to maintain the random assignment. 13 However it is clear from our review that this population label is not always appropriately and consistently applied. There is a tendency for studies to make modifications to the ITT population. Using the ITT or modified-ITT population is likely to underestimate the risk by inflating the denominator with participants who may have never received the study drug. 47 Such estimates are appropriate for health economic evaluations where estimates of the cost-effectiveness will inform policy level decisions regarding how to treat the population.
Proxy outcomes can be used as a measure of the impact of AEs on patients. Examples include the number of withdrawals due to any reason, withdrawals due to AEs, the number of events an individual experiences, the severity of the AE and the duration. A high proportion of trials reported withdrawal for any reason and this is likely to be a result of the CONSORT recommendations. 37 The other outcomes were not frequently reported and increasing this could facilitate interpretation. 13 This information would permit better evaluation of the impact of AEs and the tolerability of the intervention to inform patients' and clinicians' treatment decisions. Reporting only the number of participants who experience at least one event, without information on repeated events, masks valuable information that may be important to the patient and the cost-effectiveness evaluation. For example, chronic, repeated headaches over an extended duration will have an important impact for patients compared to a single headache or headaches over a short duration, but it is not possible to distinguish between these two scenarios when reported as 'at least one event'. 14 Severity of events was also an important aspect that was often not differentiated. For example, mild nausea would have a different impact on patients' quality of life than severe nausea, which could lead to changes in dosing regimens. Displaying such information for all AEs in tables would soon become overwhelming and make interpretation difficult. Graphical approaches have been suggested as a solution to aid review. Examples of such plots can be found in reference 48. Online appendices and supplementary material provide more opportunity to include this important information.
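The information lost by summarising only 'patients with at least one event' can be shown with a small sketch. The AE log, participant identifiers and arm sizes below are entirely hypothetical and are used only to contrast the two summaries.

```python
from collections import Counter

# Hypothetical AE log: (participant id, arm, event term). Illustrative only.
ae_log = [
    ("P01", "drug", "headache"),
    ("P01", "drug", "headache"),
    ("P01", "drug", "headache"),
    ("P02", "drug", "nausea"),
    ("P03", "placebo", "headache"),
    ("P04", "placebo", "headache"),
]
arm_sizes = {"drug": 10, "placebo": 10}  # assumed participants per arm


def summarise_by_arm(log, sizes):
    """Contrast a participant-level summary with an event-level summary."""
    events = Counter(arm for _, arm, _ in log)
    affected = {arm: set() for arm in sizes}
    for participant, arm, _ in log:
        affected[arm].add(participant)
    return {
        arm: {
            "patients_with_event": len(affected[arm]),
            "proportion_with_event": len(affected[arm]) / n,
            "total_events": events[arm],
            "events_per_participant": events[arm] / n,
        }
        for arm, n in sizes.items()
    }


summary = summarise_by_arm(ae_log, arm_sizes)
for arm, row in summary.items():
    print(arm, row)
```

In this made-up example both arms have the same proportion of participants with at least one event, yet the drug arm has twice as many events, a difference that a participant-level table alone would not reveal.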
For serious adverse events, information on the time of likely onset can be useful to inform patient monitoring plans. For example, the documented risk of suicide and suicidal ideation within the first few weeks of starting an antidepressant allows patients and prescribers to remain alert and monitor closely for this period. Nearly a third of reports included such information and we would encourage authors to adopt this practice.
The majority of trials in this review included a balanced report of AEs alongside benefit. However many included generic statements regarding the safety profile such as 'the intervention was well tolerated' or 'the intervention exhibited a good safety profile' and these were frequently based on post-hoc statistical tests. Guidelines caution against such tests, 14 the results of which are difficult to interpret: a lack of significance does not indicate that the intervention is safe and, conversely, multiple testing without adjustment will increase the number of significant differences due to chance. 49,50
Graphs are an efficient method to convey and interpret large amounts of data and can make it easier to flag potential safety signals. 45,46,48 Twenty-two studies included in the review used graphs to present AE data and an example of one such report is given in the supplementary eTable of reference 51.

Limitations of trials
Trials are a valuable source of high quality adverse event data but, compared with observational studies, have smaller sample sizes, shorter follow-up periods and more limited generalisability, which restricts the ability to detect rare ADRs, ADRs with long latency and drug interactions in complex populations. The typical duration of a trial means there is often insufficient follow-up to fully characterise the safety profile as it provides limited information on long-term exposure. Stringent inclusion criteria restrict the

Limitations of this study
Articles included in this review were published in four of the top ranked medical journals therefore results are likely to be biased towards better findings than we would expect if we included all RCTs.
However this review characterises what those leading the field are doing and provides some examples of good practice that could be adopted.

Conclusions and recommendations for future work
RCTs are a valuable source of information for establishing the safety profile of medicinal products. Our review has demonstrated that these data are not currently being fully utilised. Analysis of AE data is frequently inappropriate and reports often provide insufficient and inconsistent information to allow a comprehensive summary of the safety profile to be established. This research has identified two areas that would benefit from future research. i) Improving the consistency of reporting important AE outcomes across trials to facilitate comparison and synthesis. This is in line with work from the COMET Initiative group (http://www.comet-initiative.org/). The development of CORE safety outcomes by drug class could be considered. 7 ii) Evaluation of methods to analyse AEs in RCTs.

FUNDING
This research was supported by the NIHR grant number DRF-2017-10-131.

DATA SHARING STATEMENT
No additional data are available.

AUTHOR STATEMENT
RP conceived the idea for this review, conducted the search, carried out data extraction and analysis, and wrote the manuscript. VC conceived the idea for the review, performed data extraction and critical revision of the manuscript. LH performed data extraction and critical revision of the manuscript. OS performed critical revision of the manuscript.

Describe who undertook the assessment of attribution to study drug: blinded assessor, unblinded assessor or not specified.

Planned analysis
Details of any plans for analysing AE outcomes
12. Describe analysis for AE outcomes in the statistical methods. Reference must be made to harmful events e.g. AEs or a specific harm event; this cannot be simply how binary events will be analysed.
13. Define a 'safety' population for analysis.
• Increased treatment toxicity in either treatment group deemed excessive. Toxicity is defined as moderate or severe myalgias.
• Increased severity of adverse events deemed "Probably Related" or "Possibly Related" to study intervention in either treatment group. Itemized adverse event reports separated by treatment will be provided.
• Increased AKI incidence in either treatment group deemed excessive.
• Increased incidence of stroke or hemodialysis requirement in either group (secondary endpoints) deemed excessive."

Beardsley et al. 55 "An independent data and safety monitoring committee oversaw trial safety and analyzed unblinded data after every 50 deaths, according to its charter ..." "The Haybittle-Peto boundary, requiring p<0.001 at interim analysis to consider stopping for efficacy, will be used as guidance. A level of significance of 1% will be used as a guide for stopping the trial early because of a detected harm of dexamethasone. In addition, the DMEC will receive conditional power curves to assess whether it remains realistic that the trial will demonstrate superiority of dexamethasone conditional on the data accrued up to the point of the interim analysis. Importantly, the DMEC recommendations will not be based purely on statistical tables but will also use clinical judgment."

Kor et al. 56 "In addition to statistical criteria for significance, the study included a priori "go-no-go" definitions for recommending continuation to phase 3 study ... Briefly, continuation to phase 3 would occur with a positive primary outcome finding along with an acceptable safety profile. An acceptable safety profile was defined as a serious adverse event profile for aspirin that was not statistically worse than placebo (95% CI for the relative risk of any serious adverse event covers the null value of relative risk = 1.0). The "no-go decision" was defined as early termination by the data and safety monitoring board for safety or unfavorable risk/benefit ratio. An indeterminate case in which there was a non-statistically significant effect but this effect was in a clinically meaningful direction was also defined."
Nichol et al. 57 We used a group sequential statistical approach to do two equally spaced pre-planned interim analyses (at 33% and 67% of total recruitment) to assess accumulated safety data (differential proportions of deep venous thrombosis and total mortality). This approach was chosen to provide for early stopping for probable harm or strong evidence of benefit. We applied the Haybittle-Peto criterion (|Zk|≥3) for early stopping at these analyses.
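A stopping rule of the kind quoted above can be sketched as follows. This is a minimal illustration that assumes a pooled two-proportion Z statistic at a single interim look; the trial's actual test statistic is not described in the quotation, and the interim counts and function names are hypothetical.

```python
from math import sqrt


def two_proportion_z(x_treat, n_treat, x_ctrl, n_ctrl):
    """Z statistic for a difference in event proportions (pooled variance)."""
    pooled = (x_treat + x_ctrl) / (n_treat + n_ctrl)
    if pooled in (0.0, 1.0):
        raise ValueError("Z is undefined when no or all participants have the event")
    se = sqrt(pooled * (1 - pooled) * (1 / n_treat + 1 / n_ctrl))
    return (x_treat / n_treat - x_ctrl / n_ctrl) / se


def crosses_haybittle_peto(z, boundary=3.0):
    """True if the interim statistic meets the |Z| >= 3 criterion."""
    return abs(z) >= boundary


# Hypothetical interim safety data: events / participants in each arm.
z_large = two_proportion_z(30, 100, 10, 100)
z_small = two_proportion_z(15, 100, 10, 100)
print(round(z_large, 2), crosses_haybittle_peto(z_large))
print(round(z_small, 2), crosses_haybittle_peto(z_small))
```

As the Beardsley et al. example stresses, a boundary such as this would normally guide, rather than replace, the clinical judgement of a data monitoring committee.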

Objective
To ascertain current approaches to the collection, reporting and analysis of adverse events (AEs) in randomised controlled trials (RCTs) with a primary efficacy outcome.

Design
A review of clinical trials of drug interventions from four high impact medical journals.

Data sources
Electronic contents

Methods
A pre-piloted checklist was used and single data extraction was performed by three reviewers with independent check of a randomly sampled subset to verify quality. We extracted data on collection methods, assessment of severity and causality, reporting criteria, analysis methods and presentation of AE data.

Conclusions
This review highlighted that the collection, reporting and analysis of AE data in clinical trials are inconsistent and that RCTs are underutilised as a source of safety data. Areas to improve include the information loss that arises when analysing at patient level and the inappropriate practice of underpowered multiple hypothesis testing. Implementation of standard reporting practices could enable a more accurate synthesis of safety data, and development of guidance for statistical methodology to assess causality of AEs could facilitate better statistical practice.

Keywords
Randomised controlled trials; adverse events; harm data; adverse drug reactions; review; investigational drug.

Strengths and Limitations of this study
1. This is the first review to examine and quantify AE analysis practice in RCTs published in high impact journals.
2. This review identifies weaknesses that need to be addressed as well as good practice that could be adopted.
3. Articles included in this review were published in four of the top ranked general medical journals; results are therefore likely to be biased towards better practice.

INTRODUCTION
The methods to analyse and report outcomes to measure benefit from randomised controlled trials (RCTs) are well developed, but this progress has not been matched for adverse event (AE) outcomes. An adverse event is defined as 'any untoward medical occurrence that may present during treatment with a pharmaceutical product but which does not necessarily have a causal relationship with this treatment'. 1 An adverse drug reaction (ADR) is defined as 'a response to a drug which is noxious and unintended …' where a causal relationship is 'at least a reasonable possibility'. 1,2 RCTs provide an opportunity to compare rates of AEs between arms, allowing causality to be evaluated. However, current analysis and reporting practices are inadequate.
There are many challenges associated with analysing and reporting AEs in clinical trials. RCTs are typically designed to determine the efficacy of an intervention but are often underpowered to detect important differences in AEs between arms which may suggest an ADR. Often large numbers of AEs are reported during a study, sometimes exceeding the number of patients in the clinical trial. Performing hypothesis tests on these AEs would lead to issues of multiplicity; however, any adjustment for multiplicity would make a 'finding untenable'. 3,4 The use of hypothesis testing may result in the medicinal product being deemed unsafe and a trial being halted too early due to a chance imbalance, or conversely deemed safe and not stopped early enough, resulting in more patients than necessary suffering an ADR. 3,5,6 Unlike efficacy outcomes, which are well defined and restricted in number at the planning stage of an RCT, the AEs collected in RCTs are numerous and undefined. Furthermore, AE collection requires additional information to be obtained on factors such as severity, timing and duration, number of occurrences and outcome, which for our efficacy outcomes would have all been predefined.
18 Whilst this work has been undertaken there remains uncertainty about practice for reporting and presenting AE data, and in addition the analysis practice for AEs remains a neglected area for review.
The aim of this review was to evaluate current practice for the collection, reporting and analysis of AEs in RCTs where the primary outcome was efficacy, identifying and promoting areas of good practice whilst highlighting areas for improvement.

Selection criteria
The inclusion criteria were phase II-IV RCTs of drug interventions where the primary outcome was efficacy of the intervention. We did not restrict according to number of treatment arms and included both parallel and cluster RCTs. We excluded cross-over RCTs, RCTs with adaptive randomisation, observational studies, case reports, editorials and letters. We also excluded RCTs where the intervention was not a drug product (i.e. not classified as a clinical trial of an investigational medicinal product (CTIMP)). As the study aimed to assess how authors report and analyse AEs in studies where the primary outcome was efficacy, trials that were specifically designed to investigate safety as a primary outcome were not included.

how summary event information was presented in the journal article and how AEs were analysed. 11 The items to be extracted were based on the work by Cornelius et al. and the CONSORT Harms extension with new items added to capture more specific information on analysis practices. 11,17

A data extraction sheet was piloted and then single data extraction was performed by three reviewers (RP, VC and LH) with a 10% independent check of a randomly sampled subset to verify quality. Queries were also informally discussed between reviewers on an ongoing basis. Where specific items were flagged for poor agreement these were re-extracted. Any queries during data extraction were shared and disagreements between reviewers were resolved through discussion.

Data analysis
The proportions of trials reporting each of items 3-4 and 8-34 in supplementary Table A1 were calculated, and summary statistics (medians and ranges) were calculated for items 5-7. All analyses were performed in Stata version 15. 19 A risk of bias assessment was not undertaken as this study aimed to describe best practice and not evaluate outcomes.
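As an illustration of the summaries described above, the following Python sketch computes the proportion of reports meeting a yes/no item and the median and range of a numeric item. It is not the authors' Stata code: the field names and the three example records are hypothetical and chosen only to show the calculation.

```python
from statistics import median

# Hypothetical extraction records; field names are illustrative only
# and do not correspond to the item labels in supplementary Table A1.
reports = [
    {"prespecified_ae_analysis": True, "n_randomised": 120},
    {"prespecified_ae_analysis": False, "n_randomised": 450},
    {"prespecified_ae_analysis": False, "n_randomised": 2000},
]


def proportion_reporting(records, item):
    """Proportion of reports for which a yes/no item was recorded as yes."""
    return sum(bool(r[item]) for r in records) / len(records)


def median_and_range(records, item):
    """Median, minimum and maximum of a numeric item."""
    values = [r[item] for r in records]
    return median(values), min(values), max(values)


prop = proportion_reporting(reports, "prespecified_ae_analysis")
size_summary = median_and_range(reports, "n_randomised")
print(f"Proportion reporting item: {prop:.0%}")
print(f"Median (range): {size_summary[0]} ({size_summary[1]} to {size_summary[2]})")
```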

Patient and public involvement
This review forms part of a wider research project that was developed with input from a range of patient representatives. There were no study participants directly involved in this review but the original proposal and patient and public involvement (PPI) strategy were reviewed by service user representatives (with experience as clinical trial participants and PPI advisors) who provided advice specifically with regard to communication and dissemination to patient and public groups.

Data extraction
A total of 585 items were extracted twice across all three reviewers to check the quality of the data extraction. A total of 95 discrepancies were identified. This gave agreement of 84%. During this independent check several items were flagged for potential poor agreement. These items were 100% independently extracted by one author and verified. The items were: study duration; the AE collection method; timing of collection; how binary harm outcomes were summarised; whether continuous outcomes were dichotomised; if continuous outcomes were left as continuous how they were analysed.

reports with clinical examinations and/or laboratory results presented) included details on the timing of such assessments) it was often clear from the results presented that participants had undergone these assessments (83% and 79% of studies reported clinical and laboratory results respectively) (Table 3).
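The 84% agreement reported for the double-extracted items follows directly from the two counts given in the text. A minimal Python check of that arithmetic is shown below; note that this is raw percentage agreement, not a chance-corrected measure.

```python
# Counts taken from the text: 585 items extracted twice, 95 discrepancies.
items_checked = 585
discrepancies = 95

# Raw agreement is the share of double-extracted items with no discrepancy.
agreement = (items_checked - discrepancies) / items_checked
print(f"Agreement: {agreement:.1%}")
```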

Prespecified analysis
Thirty-one percent of reports provided information on the planned analysis for AEs in the statistical analysis section of the paper and 45% pre-specified a safety population (Table 2, examples 3-4 and Table 3). 22,23 A quarter of trials reported planned interim analysis with stopping criteria (Table 3), five (2.7%) of which included specific criteria on stopping for a harmful event (Supplement Table A2 24-28).

Selection of AEs and reporting practices
Two reports only made generic statements regarding AE data: "there were no significant adverse events related to the procedure" and "no excess in mortality or major adverse events were found…". Three reports made no mention of AEs throughout the manuscript. 29-33 Twenty-four (13%) trials only provided a summary of the number of AEs or serious AEs rather than listing the actual AEs that occurred. For example "Six serious adverse events occurred in the acetaminophen group and 12 in the ibuprofen group." 34 Of these 24 trials, 10 did provide specific details of the types of events in an appendix. This means 8% of trials either did not report AEs or only included a summary (Table 4).

a Make no reference to the appendix
b 3 reports made no reference to AE data throughout the article
c 5 reports indicate no withdrawals
d 6 reports specify the number of withdrawals and reasons but none of the reasons are related to AEs
e This includes 3 reports with no AE data (as per footnote b), 2 reports that provide generic statements regarding AE data and 1 report that only reported continuous outcomes
f This includes 3 reports with no AE data and 2 reports that provide generic statements regarding AE data (as per footnote e)
g 6 papers specifically state that no serious adverse events occurred

Eighty-nine percent of trials reported a subset of all the AEs they collected. How AEs were 'selected' for inclusion in the article was not consistent or clear, and in 3% of studies it was impossible to discern how the authors had selected the AEs they presented. Twenty-six percent of reports selected events based on a frequency threshold e.g. events experienced by greater than x% in any group; 9% of reports used a measure of severity to select events e.g. AEs of grade 3 or higher; 23% of reports included events based on seriousness; and 8% included AEs based on relatedness to treatment (percentages are Tables A3 and A4 provide full details of selection criteria used. We found that 41% of trials analysed AEs in participants that received at least one dose, 29% of trials used all randomised participants and 9% did not specify the analysis population (Table 4). Further details on analysis populations used are given in supplementary Table A5.
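To make the frequency-threshold selection rule described above concrete, the sketch below applies a rule of the form "events experienced by greater than x% in any group" to made-up data. The event names, counts, arm sizes and the 5% threshold are all hypothetical; the point is only that an event can fall below the threshold in every arm and so disappear from the report, even when it occurs in one arm alone.

```python
def select_by_frequency(ae_counts, n_per_arm, threshold):
    """Keep AEs experienced by more than `threshold` of participants in any arm.

    ae_counts maps each AE to the number of participants with that AE per arm.
    """
    selected = {}
    for ae, counts in ae_counts.items():
        if any(counts[arm] / n_per_arm[arm] > threshold for arm in n_per_arm):
            selected[ae] = counts
    return selected


# Hypothetical two-arm trial with 100 participants per arm.
n_per_arm = {"drug": 100, "placebo": 100}
ae_counts = {
    "headache": {"drug": 12, "placebo": 10},
    "nausea": {"drug": 6, "placebo": 1},
    "syncope": {"drug": 4, "placebo": 0},  # below 5% in both arms
}

reported = select_by_frequency(ae_counts, n_per_arm, threshold=0.05)
omitted = sorted(set(ae_counts) - set(reported))
print("Reported:", sorted(reported))
print("Omitted:", omitted)
```

In this invented example syncope occurs only in the drug arm yet is omitted, which shows how the choice of threshold shapes the safety profile a reader sees.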
Nearly 80% of trials reported the number of participants who withdrew from the trial; of these, 35% (51 of 146 reports) reported whether the withdrawals were due to AEs, and of these 24% (12 of 51 reports) reported the actual events that caused withdrawals. Results presented and analyses performed were predominantly on 'patients with at least 1 event', with 84% of reports providing no information on the number of events occurring. An example of how to incorporate information on number of events is presented in reference 35. Forty-one percent of trials reported information on the severity of AEs. Five percent of trials reported the duration of at least one event, but presentation of such data in the main report was limited. The trials that did present this information did so in a variety of ways, for example by incorporating it into the AE table with summary statistics such as the mean duration of certain events, or by presenting it for a subgroup of events in the footnotes of AE tables e.g. "One event of non-serious squamous cell carcinoma (day 210, resolved on day 215; adalimumab treatment was not interrupted)." 36-38 Twenty-eight percent of reports included information on the timing of AEs (Table 4).
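The information lost by summarising only 'patients with at least 1 event' can be shown with a small invented example. In the Python sketch below, the participant identifiers, arm labels and arm sizes are hypothetical; both arms have the same proportion of participants with at least one event, yet the number of events differs substantially.

```python
from collections import Counter

# Hypothetical AE records, one row per event: (participant_id, arm).
# A participant with repeated events appears more than once.
events = [
    (1, "drug"), (1, "drug"), (1, "drug"), (2, "drug"), (2, "drug"),
    (3, "placebo"), (4, "placebo"),
]
n_per_arm = {"drug": 10, "placebo": 10}  # hypothetical arm sizes


def summarise(events, n_per_arm):
    """Patient-level and event-level summaries of the same AE records."""
    total_events = Counter(arm for _, arm in events)
    patients = {arm: set() for arm in n_per_arm}
    for participant, arm in events:
        patients[arm].add(participant)
    return {
        arm: {
            "patients_with_event": len(patients[arm]),
            "proportion_with_event": len(patients[arm]) / n,
            "total_events": total_events[arm],
            "events_per_participant": total_events[arm] / n,
        }
        for arm, n in n_per_arm.items()
    }


summary = summarise(events, n_per_arm)
for arm, stats in summary.items():
    print(arm, stats)
```

Reporting both rows of such a summary, or an event rate per unit of follow-up where exposure time differs between arms, retains information that the patient-level proportion alone discards.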
Serious adverse events were typically well documented (73%) and six reports (3%) explicitly stated that no serious events had occurred. However for forty-four reports (24%) it was not possible to discern if no  (Table 4).
Despite a lack of power to undertake formal hypothesis testing, 47% reported p-values for binary outcomes. For example "There were no between-group differences in the rate of patients with at least 1 adverse event ( Of the trials that performed statistical significance testing on AE data, only three made an adjustment for multiplicity of tests (all three on dichotomised outcomes). 36,40,41 Two of these used a Bonferroni correction but adjusted only for the number of pairwise comparisons between the treatment groups for each individual event rather than for the total number of significance tests performed. As such, both analyses would still have been affected by issues of multiple testing.
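The consequence of correcting only for pairwise comparisons within each event, rather than for all tests performed, can be sketched numerically. The example below assumes independent tests (a simplification, since AE outcomes are usually correlated) and uses hypothetical numbers: 20 AE types in a three-arm trial, giving 3 pairwise comparisons per event and 60 tests in total.

```python
def familywise_error_rate(per_test_alpha, n_tests):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - per_test_alpha) ** n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


alpha = 0.05
n_events = 20    # hypothetical number of AE types tested
n_pairwise = 3   # pairwise comparisons among three arms
n_total = n_events * n_pairwise

# Correction for the pairwise comparisons within each event only.
partial = bonferroni_threshold(alpha, n_pairwise)
fwer_partial = familywise_error_rate(partial, n_total)

# Correction for every test performed.
full = bonferroni_threshold(alpha, n_total)
fwer_full = familywise_error_rate(full, n_total)

print(f"Partial correction: threshold {partial:.4f}, familywise error {fwer_partial:.2f}")
print(f"Full correction:    threshold {full:.5f}, familywise error {fwer_full:.3f}")
```

Under these assumptions the partial correction leaves a familywise error rate well above the nominal 5%, while the full correction controls it at the cost of a very stringent per-test threshold, which is the tension the text describes for underpowered AE comparisons.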
Twelve percent of reports used graphs to illustrate AE data (Table 4). The CONSORT extension highlighted the value of graphs for summarising such data, especially for conveying information on time-to-event outcomes. 42 An example of such a plot is included in the supplement of reference 43 (eFigure 2).
We assessed any reference to the CONSORT Harm extension and found that none of the included studies mentioned it. Of the four journals included in the review, the Lancet was the only journal that made specific reference to the harm extension in their guidelines to authors.

DISCUSSION
The safety profile of a medicinal product is established through evidence collected from several sources including clinical trials, observational studies and spontaneous reports. 44 The advantage of clinical trial data is that these provide a controlled comparison of the rate of AEs allowing causality to be evaluated but have the disadvantage that the sample size is often not large enough to detect rare ADRs.
To ensure that a useful and comprehensive picture of the safety profile is provided to all relevant parties, clear reporting of AEs from clinical trials is required. Current research has shown that the quality of reporting is substandard. 7-16 The aim of this study was to review current practice across four leading medical journals for AE collection, analysis and reporting, highlighting any areas for improvement and examples of good practice.

Collection and assessment methods
The CONSORT extension to harm was developed with the aim of improving reporting of safety data in RCTs. 42 None of the included studies referenced the CONSORT harm extension and, of the items in our review that are covered in CONSORT, many were not well reported. 17 This suggests that the CONSORT extension is not being routinely adopted by authors to aid their reporting. Most journals now request that authors include a completed CONSORT checklist when they submit their article but we are not aware of any journal that requests that the CONSORT harm extension also be submitted. Of the four journals in this review the Lancet is the only journal that makes specific reference to the harm extension in its guidelines to authors. The CONSORT statement contains a single item related to safety, item 19: 'all important harms or unintended effects in each group' should be reported. 42 This may explain why some items listed on the CONSORT extension for harm were reported by so few trials. The adoption of CONSORT harms by journals may support better reporting.  22 We found that the method of AE collection was poorly reported. This has important implications for the type and frequency of AEs reported with "passive collection resulting in fewer recorded AEs". 45,46 Where the method was given, the timing of collection was typically also reported and we would recommend continuation of this practice. The frequency of AE collection has further important implications for the number of events reported. More frequent assessment and longer follow-up will result in more AEs reported. 17 It is important to consider these factors when making conclusions about the safety profile.
The method of attribution between drug and AE was another area where reporting practice was inadequate. However, the joint pharmaceutical/journal collaboration indicates that such attribution has 'limited value' given the 'inherent subjectivity in such attribution'. 18

Prespecified analysis
We found that formal assessments of AEs regarding stopping for emerging ADRs utilising statistical rules were rare. Subjective assessments of overwhelming amounts of data could easily lead to potential signals of harm being missed. There could be benefits to incorporating more objective methods alongside clinical review to monitor AE information, both for interim analysis by data monitoring committees (DMCs) and final trial analysis, to help better identify drug-harm relationships. Graphical displays have gone some way towards aiding interpretation. 47-51

Selection of AEs and reporting practices
Due to space constraints in journal reports, AE information is often included in the appendix. Whilst we encourage use of appendices and supplementary material for including additional detail on AEs, we caution authors against depositing all AE data into such documents without attempting to present a summary of the AE profile in the main article. It is important that the main report strikes a balance between efficacy and harm, thereby allowing a risk-benefit assessment to be made solely from the article.
The failure to report any information on AEs restricts interpretation and prevents a risk-benefit assessment. We identified two reports that made generic summaries of the overall safety profile and it was clear in both that there had been harmful effects; however, the authors did not include any further information. Three reports contained no information, leaving readers uninformed as to any additional information these studies may provide on the safety profile. Ambiguous reporting prevents building an accurate picture of the safety profile. As such profiles are developed on accumulating evidence, it is important that each study report to the same standard and that information is not wasted.
We found that the selection criteria used by authors to decide what AEs to include in the report were arbitrary and inconsistent. This will have important implications when synthesising data across studies to construct safety profiles. Authors would benefit from guidance to facilitate consistency but currently research in this area is lacking. Lineberry

CONSORT recommends that AE analyses should be performed on the intention-to-treat (ITT) population to maintain the random assignment. 17 However, it is clear from our review that this population label is not always appropriately and consistently applied. There is a tendency for studies to make modifications to the ITT population. Using the ITT or modified-ITT population is likely to underestimate the risk by inflating the denominator with participants who may have never received the study drug. 52

headaches over a short duration but it is not possible to distinguish between these two scenarios when reported as 'at least one event'. 18 Severity of events was also an important aspect that was often not differentiated. For example, mild nausea would have a different impact on patients' quality of life than severe nausea, which could lead to changes in dosing regimens. Displaying such information for all AEs in tables would soon become overwhelming and make interpretation difficult.
Graphical approaches have been suggested as a solution to aid review. Examples of such plots can be found in reference 53. Online appendices and supplementary material provide more opportunity to include this important information.
For serious adverse events, information on the likely time of onset can usefully inform patient monitoring plans. For example, the documented risk of suicide and suicidal ideation within the first few weeks of starting antidepressants allows patients and prescribers to remain alert and monitor closely for this period. Nearly a third of reports included such information and we would encourage authors to adopt this practice.

Analysis of AE outcomes
The majority of trials in this review included a balanced report of AEs alongside benefit. However, many included generic statements regarding the safety profile such as 'the intervention was well tolerated' or 'the intervention exhibited a good safety profile' and these were frequently based on post-hoc statistical tests. Guidelines caution against such tests. 18 The results of which are difficult to interpret as a lack of

Graphs are an efficient method to convey and interpret large amounts of data and can make it easier to flag potential safety signals. 50,51,53 Twenty-two studies included in the review used graphs to present AE data and an example of one such report is given in the supplementary eTable of reference 56.

Limitations of trials
Trials are a valuable source of high quality adverse event data but, compared to observational studies, have smaller sample sizes, shorter follow-up periods and more limited generalisability, which restrict the ability to detect rare ADRs, ADRs with long latency and drug interactions in complex populations. The typical duration of a trial means there is often insufficient follow-up to fully characterise the safety profile as it provides limited information on long-term exposure. Stringent inclusion criteria restrict the population the intervention is assessed in and so limited information on drug interactions is obtained. 5
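The difficulty of detecting rare ADRs in a trial-sized sample can be quantified with a simple binomial calculation: if an ADR occurs independently in each treated participant with probability p, the chance of observing at least one case among n participants is 1 - (1 - p)^n. The sketch below uses hypothetical values (an ADR affecting 1 in 1000 participants and a treatment arm of 500) and assumes independence between participants.

```python
import math


def prob_at_least_one(p, n):
    """Probability of observing one or more events among n participants."""
    return 1 - (1 - p) ** n


def participants_needed(p, target_probability):
    """Smallest n giving at least the target probability of seeing one event."""
    return math.ceil(math.log(1 - target_probability) / math.log(1 - p))


p = 0.001   # hypothetical ADR risk of 1 in 1000
n_arm = 500  # hypothetical number of participants receiving the drug

chance = prob_at_least_one(p, n_arm)
needed = participants_needed(p, 0.95)
print(f"Chance of seeing at least one case in {n_arm} participants: {chance:.2f}")
print(f"Participants needed for a 95% chance of seeing one case: {needed}")
```

Under these assumptions an arm of 500 would more often than not record no cases at all, and roughly 3000 treated participants would be needed for a 95% chance of seeing even one, before any comparison with the control arm is attempted.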

Limitations of this study
Articles included in this review were published in four of the top ranked medical journals, so results are likely to be biased towards better findings than we would expect if we included all RCTs, and cover only 2015-2016. We also acknowledge that only completing 10% independent check of

FUNDING
This research was supported by the NIHR grant number DRF-2017-10-131.

DATA SHARING STATEMENT
No additional data are available.

Planned analysis
Details of any plans for analysing AE outcomes

12 Describe analysis for AE outcomes in the statistical methods. Reference must be made to harmful events e.g. AEs or a specific harm event; this cannot be simply how binary events will be analysed.

13 Define a 'safety' population for analysis.

14 Specify a planned interim analysis with stopping criteria: based on efficacy, based on safety, based on both efficacy and safety, yes but no other details given, no planned interim analysis or unclear. Criteria for stopping must be set out; it is not enough to say that the DMC reviewed the data.

"O'Brien-Fleming stopping boundaries were used to assess efficacy, and a less stringent boundary was used to assess harm."

Billings et al. 25 "The data and safety monitoring board (DSMB) reviewed patient recruitment practices, safety reporting, and data quality after 30 patients completed the study; performed an interim analysis after 277 patients … had completed the study to assess safety of the intervention; and performed a second interim analysis after 546 patients … had completed the study to assess the safety, efficacy, and futility of the intervention. The DSMB made recommendations based on qualitative assessments of the safety, efficacy, and futility of the intervention…" "Suspend enrolment in any study arm … due to safety concerns based on study intervention. Safety concerns include:
• Increase in in-hospital all-cause mortality in subjects randomized to A or B such that the DSMB deems the increase is excessive compared to A or B.
• Increased treatment toxicity in either treatment group deemed excessive. Toxicity is defined as moderate or severe myalgias.
• Increased severity of adverse events deemed "Probably Related" or "Possibly Related" to study intervention in either treatment group. Itemized adverse event reports separated by treatment will be provided.
• Increased AKI incidence in either treatment group deemed excessive.
• Increased incidence of stroke or hemodialysis requirement in either group (secondary endpoints) deemed excessive."

Beardsley et al. 26 "An independent data and safety monitoring committee oversaw trial safety and analyzed unblinded data after every 50 deaths, according to its charter ..." "The Haybittle-Peto boundary, requiring p<0.001 at interim analysis to consider stopping for efficacy, will be used as guidance. A level of significance of 1% will be used as a guide for stopping the trial early because of a detected harm of dexamethasone. In addition, the DMEC will receive conditional power curves to assess whether it remains realistic that the trial will demonstrate superiority of dexamethasone conditional on the data accrued up to the point of the interim analysis. Importantly, the DMEC recommendations will not be based purely on statistical tables but will also use clinical judgment."

Kor et al. 27 "In addition to statistical criteria for significance, the study included a priori "go-no-go" definitions for recommending continuation to phase 3 study ... Briefly, continuation to phase 3 would occur with a positive primary outcome finding along with an acceptable safety profile. An acceptable safety profile was defined as a serious adverse event profile for aspirin that was not statistically worse than placebo (95% CI for the relative risk of any serious adverse event covers the null value of relative risk = 1.0). The "no-go decision" was defined as early termination by the data and safety monitoring board for safety or unfavorable risk/benefit ratio. An indeterminate case in which there was a non-statistically significant effect but this effect was in a clinically meaningful direction was also defined."

28 We used a group sequential statistical approach to do two equally spaced pre-planned interim analyses (at 33% and 67% of total recruitment) to assess accumulated safety data (differential proportions of deep venous thrombosis and total mortality). This approach was chosen to provide for early stopping for probable harm or strong evidence of benefit. We applied the Haybittle-Peto criterion (|Zk|≥3) for early stopping at these analyses.
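As a rough illustration of how a Haybittle-Peto style rule such as |Zk| ≥ 3 operates at an interim look, the sketch below computes a pooled two-proportion z statistic for a safety outcome and compares it with the boundary. The quoted trial does not state which test statistic was used, so the pooled z statistic is an assumption here, and the interim counts are invented.

```python
import math


def two_proportion_z(events_a, n_a, events_b, n_b):
    """Pooled z statistic for the difference in event proportions (arm A minus arm B)."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / standard_error


def crosses_boundary(z, boundary=3.0):
    """True when the interim statistic meets the |Z| >= boundary stopping criterion."""
    return abs(z) >= boundary


# Hypothetical interim data: events / participants in treatment and control arms.
z_large_imbalance = two_proportion_z(30, 200, 10, 200)
z_small_imbalance = two_proportion_z(20, 200, 10, 200)

print(f"z = {z_large_imbalance:.2f}, stop: {crosses_boundary(z_large_imbalance)}")
print(f"z = {z_small_imbalance:.2f}, stop: {crosses_boundary(z_small_imbalance)}")
```

A boundary of 3 is deliberately conservative: in the second invented scenario the event rate in the treatment arm is double that of control, yet the rule would not trigger, which is why such rules are generally used alongside clinical judgement rather than in place of it.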


Results
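We identified 184 eligible reports (BMJ n=3; JAMA n=38; Lancet n=62; and NEJM n=81). Sixty-two percent reported some form of spontaneous AE collection but only 29% included details of specific prompts used to ascertain AE data. Numbers that withdrew from the trial were well reported (80%); however, only 35% of these reported whether withdrawals were due to AEs. Results presented and analysis performed were predominantly on 'patients with at least one event', with 84% of studies ignoring repeated events. Despite a lack of power to undertake formal hypothesis testing, 47% performed such tests for binary outcomes.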



Search strategy
The top four general medical journals as ranked by impact factors that publish clinical trials of drug


Data extraction
Potentially eligible articles were identified based on titles and abstracts and the full texts of these studies were retrieved. Supplementary material was also reviewed if readers were referred to it from the main article for further results. Supplementary Table A1 lists all data items captured with guidance given to the reviewers for extraction. The items to be extracted were based on the work by were selected for inclusion in the journal article; how summary event information was presented in the journal article and how AEs were analysed. 11 A more detailed rationale for the choice of items extracted is provided in the supplementary material (Table A2).
A data extraction sheet was piloted and then single data extraction was performed by three reviewers (RP, VC and LH) with a 10% independent check of a randomly sampled subset to verify quality. Queries were also informally discussed between reviewers on an ongoing basis. Where specific items were flagged for poor agreement these were re-extracted. Any queries during data extraction were shared and disagreements between reviewers were resolved through discussion.

Data analysis
The proportion of trials reporting each item, 3-4 and 8-34 in supplementary Table A1 were calculated and summary statistics (median and ranges) were calculated for items 5-7. All analyses were performed in Stata version 15. 19 A risk of bias assessment was not undertaken as this study aimed to describe best practice and not evaluate outcomes.

Patient and public involvement
This review forms part of a wider research project that was developed with input from a range of patient representatives. There were no study participants directly involved in this review but the original proposal and patient and public involvement (PPI) strategy were reviewed by service user representatives (with experience as clinical trial participants and PPI advisors) who provided advice specifically with regard to communication and dissemination to patient and public groups.

Data extraction
A total of 585 items were extracted twice across all three reviewers to check the quality of the data extraction. A total of 95 discrepancies were identified, giving agreement of 84%. During this independent check several items were flagged for potential poor agreement. All of these items were independently re-extracted by one author and verified. The items were: study duration; the AE collection method; timing of collection; how binary harm outcomes were summarised; whether continuous outcomes were dichotomised; if continuous outcomes were left as continuous how they were analysed.
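The agreement figure above follows directly from the two counts reported. As a minimal sketch of that arithmetic (the function name `percent_agreement` is ours, purely for illustration, and is not taken from the review's Stata code):

```python
def percent_agreement(items_checked: int, discrepancies: int) -> float:
    """Percentage of double-extracted items on which the reviewers agreed."""
    if items_checked <= 0:
        raise ValueError("items_checked must be positive")
    if not 0 <= discrepancies <= items_checked:
        raise ValueError("discrepancies must lie between 0 and items_checked")
    return 100.0 * (items_checked - discrepancies) / items_checked


# Counts reported in the text: 585 items extracted twice, 95 discrepancies.
agreement = percent_agreement(585, 95)
print(f"Agreement: {agreement:.0f}%")
```

This is simple percentage agreement; it does not correct for agreement expected by chance.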

Study characteristics
The 10 studies had an active comparator and over 50% of trials received some element of industry funding (Table 1).

Prespecified analysis
Thirty-one percent of reports provided information on the planned analysis for AEs in the statistical analysis section of the paper and 45% pre-specified a safety population (Supplementary Table A3, examples 3-4 and Table 2). 22,23 A quarter of trials reported planned interim analysis with stopping criteria (Table 2), five (2.7%) of which included specific criteria on stopping for a harmful event (Supplementary Table A4 24-28).

Selection of AEs and reporting practices
Two reports only made generic statements regarding AE data: "there were no significant adverse events related to the procedure" and "no excess in mortality or major adverse events were found…".
Three reports made no mention of AEs throughout the manuscript. 29-33 Twenty-four (13%) trials only provided a summary of the number of AEs or serious AEs rather than listing the actual AEs that occurred, for example "Six serious adverse events occurred in the acetaminophen group and 12 in the ibuprofen group." 34 Of these 24 trials, 10 did provide specific details of the types of events in an appendix. This means 8% of trials either did not report AEs or only included a summary (Table 3). 17 We found that 41% of trials analysed AEs in participants that received at least one dose, 29% of trials used all randomised participants and 9% did not specify the analysis population (Table 3). Further details on analysis populations used are given in supplementary Table A7.
Nearly 80% of trials reported the number of participants who withdrew from the trial; of these 35% (51 of 146 reports) reported whether the withdrawals were due to AEs and of these 24% (12 of 51 reports) reported the actual events that caused withdrawals. Results presented and analyses performed were predominantly on 'patients with at least 1 event', with 84% of reports providing no information on the number of events occurring. An example of how to incorporate information on number of events is presented in reference 35. Forty-one percent of trials reported information on the severity of AEs. Five percent of trials included a report of at least one event with duration, but presentation of such data in the main report was limited. The trials that did present this information did so in a variety of ways, for example by incorporating the information into the AE table with summary statistics such as the mean duration of certain events, or by presenting it for a subgroup of events in the footnotes of AE tables e.g. "One event of non-serious squamous cell carcinoma (day 210, resolved on day 215; adalimumab treatment was not interrupted)." 36-38 Twenty-eight percent of reports included information on the timing of AEs (Table 3).
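To illustrate what is lost when only 'patients with at least 1 event' is reported, the sketch below summarises a small AE log both ways. The data, arm sizes and the function name `summarise` are entirely hypothetical and are not drawn from any trial in this review.

```python
from collections import Counter

# Hypothetical AE log: one (participant_id, arm, event_term) tuple per event.
ae_log = [
    ("P01", "drug", "headache"),
    ("P01", "drug", "headache"),
    ("P01", "drug", "headache"),
    ("P02", "drug", "headache"),
    ("P03", "placebo", "headache"),
    ("P04", "placebo", "headache"),
]
arm_sizes = {"drug": 10, "placebo": 10}  # hypothetical numbers per arm


def summarise(log, sizes):
    """Return patient-level and event-level AE summaries for each arm."""
    total_events = Counter(arm for _, arm, _ in log)
    # A set of (participant, arm) pairs counts each participant once.
    unique_patients = {(pid, arm) for pid, arm, _ in log}
    patients_with_event = Counter(arm for _, arm in unique_patients)
    return {
        arm: {
            "patients_with_event": patients_with_event[arm],
            "total_events": total_events[arm],
            "events_per_participant": total_events[arm] / n,
        }
        for arm, n in sizes.items()
    }


summary = summarise(ae_log, arm_sizes)
for arm, row in summary.items():
    print(arm, row)
```

In this made-up example both arms have two participants with at least one headache, so a patient-level table would look balanced, while the event-level counts (four versus two) show the repeated events in the drug arm.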
Serious adverse events were typically well documented (73%) and six reports (3%) explicitly stated that no serious events had occurred. However, for forty-four reports (24%) it was not possible to discern if no serious events had occurred or whether they were simply omitted from the report.
Forty-two percent of reports (57 of 134) included details on whether the events had been classified as related to the intervention (Table 3).
Despite a lack of power to undertake formal hypothesis testing, 47% reported p-values for binary outcomes. For example "There were no between-group differences in the rate of patients with at

Continuous
There was a pervasive practice (59%) of categorising continuous clinical and laboratory outcomes.
Of the trials that did not dichotomise continuous AE data, nearly 70% performed some form of statistical significance testing (Table 3). Whilst continuous outcomes do not suffer from a lack of power to the same degree, multiple testing is still a problem; however, no multiplicity corrections for continuous outcomes were performed.
Of the trials that performed statistical significance testing on AE data, only three made an adjustment for multiplicity of tests (all three on dichotomised outcomes). 36,40,41 Two of these used a Bonferroni correction and adjusted for the number of pairwise comparisons between each of the
Twelve percent of reports used graphs to illustrate AE data (Table 3). The CONSORT extension highlighted the value of graphs for summarising such data, especially for conveying information on time-to-event outcomes. 42 An example of such a plot is included in the supplement of reference 43 (eFigure2).
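The multiplicity problem and the Bonferroni adjustment mentioned above can be sketched in a few lines. The familywise error formula assumes independent tests, which AE outcomes rarely are, so it is only a rough guide; the function names and the p-values are illustrative assumptions, not results from any reviewed trial.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def bonferroni_significant(p_values, alpha: float = 0.05):
    """Flag which p-values remain significant after Bonferroni correction."""
    threshold = bonferroni_threshold(len(p_values), alpha)
    return [p <= threshold for p in p_values]


# Hypothetical scenario: 20 AE terms each tested at the 5% level.
fwer_20 = familywise_error_rate(20)
print(f"Familywise error rate for 20 unadjusted tests: {fwer_20:.2f}")

# Hypothetical p-values for three pairwise comparisons.
flags = bonferroni_significant([0.001, 0.02, 0.04])
print(flags)
```

The correction controls false positives at the cost of power, which is already low for most AE comparisons.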
We assessed any reference to the CONSORT harm extension and found that none of the included studies mentioned it. Of the four journals included in the review, the Lancet was the only journal that made specific reference to the harm extension in their guidelines to authors.

DISCUSSION
The safety profile of a medicinal product is established through evidence collected from several sources including clinical trials, observational studies and spontaneous reports. 44 The advantage of clinical trial data is that they provide a controlled comparison of the rate of AEs, allowing causality to be evaluated; the disadvantage is that the sample size is often not large enough to detect rare ADRs.
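The point about sample size can be made concrete with an approximate power calculation for comparing two proportions. This is a sketch using the standard normal approximation for a two-sided test; the event rates and arm sizes are hypothetical, and the function name is ours.

```python
from math import sqrt
from statistics import NormalDist


def approx_power_two_proportions(p_control, p_treat, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided test comparing two proportions.

    Uses the normal approximation and ignores the negligible probability
    of rejecting in the wrong direction.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    p_bar = (p_control + p_treat) / 2
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_arm)
    se_alt = sqrt(
        (p_control * (1 - p_control) + p_treat * (1 - p_treat)) / n_per_arm
    )
    diff = abs(p_treat - p_control)
    return std_normal.cdf((diff - z_crit * se_null) / se_alt)


# Hypothetical rare AE: 1% on control versus 3% on treatment.
power_small = approx_power_two_proportions(0.01, 0.03, n_per_arm=200)
power_large = approx_power_two_proportions(0.01, 0.03, n_per_arm=2000)
print(f"Power with 200 per arm:  {power_small:.2f}")
print(f"Power with 2000 per arm: {power_large:.2f}")
```

Under these assumed rates, a trial sized for efficacy with 200 participants per arm has well under 50% power to detect a tripling of a rare event, whereas a tenfold larger trial would detect it with high probability.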
To ensure that a useful and comprehensive picture of the safety profile is provided to all relevant parties, clear reporting of AEs from clinical trials is required. Current research has shown the quality of reporting is substandard. 7-16 The aim of this study was to review current practice across four

Collection and assessment methods
The CONSORT extension to harm was developed with the aim of improving reporting of safety data in RCTs. 42 None of the included studies referenced the CONSORT harm extension and, of the items in our review that are covered in CONSORT, many were not well reported. 17 This suggests that the CONSORT extension is not being routinely adopted by authors to aid their reporting. Most journals now request that authors include a completed CONSORT checklist when they submit their article but we are not aware of any journal that requests that the CONSORT harm extension also be submitted. Of the four journals in this review the Lancet is the only journal that makes specific reference to the harm extension in their guidelines to authors. The CONSORT statement contains a single item related to safety, item 19: 'all important harms or unintended effects in each group' should be reported. 42 This may explain why some items listed on the CONSORT extension for harm were reported by so few trials. The mandatory submission of CONSORT harms by journals may support better reporting.
We found that the method of AE collection was poorly reported. This has important implications for the type and frequency of AEs reported, with "passive collection resulting in fewer recorded AEs". 45,46 Where the method was given the timing of collection was typically also reported and we would recommend continuation of this practice. The frequency of AE collection has further important implications for the number of events reported. More frequent assessment and longer follow-up will result in more AEs reported. 17 It is important to consider these factors when making conclusions about the safety profile. The method of attribution between drug and AE was another area where reporting practice was inadequate. However, the joint pharmaceutical/journal collaboration indicates that such attribution has 'limited value' given the 'inherent subjectivity in such attribution'. 18

Prespecified analysis
We found that formal assessments of AEs using statistical rules for stopping for emerging ADRs were rare. Subjective assessments of overwhelming amounts of data could easily lead to potential signals of harm being missed. There could be benefits to incorporating more objective statistical methods alongside clinical review to assist the evaluation of AE information and help better identify drug-harm relationships. Graphical displays have gone some way towards aiding interpretation. 47-51

Selection of AEs and reporting practices
Due to space constraints in journal reports AE information is often included in the appendix. Whilst we encourage use of appendices and supplementary material for including additional detail on AEs, we caution authors against depositing all AE data into such documents without attempting to present a summary of the AE profile in the main article. It is important that the main report strikes a balance between efficacy and harm, thereby allowing a risk-benefit assessment to be made solely from the article.
The failure to report any information on AEs restricts interpretation and prevents a risk-benefit assessment. We identified two reports that made generic summaries of the overall safety profile and it was clear in both that there had been harmful effects. However, the authors did not include any further information. Three reports contained no information leaving readers uninformed as to any additional information these studies may provide on the safety profile. Ambiguous reporting prevents building an accurate picture of the safety profile. As such profiles are developed on accumulating evidence, it is important that each study report to the same standard and information is not wasted.
We found that the selection criteria used by authors to decide what AEs to include in the report were arbitrary and inconsistent. This will have important implications when synthesising data across studies to construct safety profiles. Authors would benefit from guidance to facilitate consistency but currently research in this area is lacking. Lineberry et al. recommended clinically relevant events that should always be reported (deaths, SAEs and events leading to discontinuation of intervention) and criteria that should be considered when deciding what other AEs to report e.g. interest based on the disease(s) under investigation, comorbidities of the study population, intervention mechanism, trial duration. 18 Standard outcomes for a drug class, as suggested by Cornelius et al., would be one potential solution to avoid issues of inconsistency. 11
CONSORT recommends that AE analyses should be performed on the intention-to-treat (ITT) population to maintain the random assignment. 17 However it is clear from our review that this population label is not always appropriately and consistently applied. There is a tendency for studies to make modifications to the ITT population. Using the ITT or modified-ITT population is likely to underestimate the risk by inflating the denominator with participants who may have never received the study drug. 52 Such estimates are appropriate for health economic evaluations where estimates However, a more appropriate population for AE analysis to inform prescriber and patient decisions may be those that receive at least one dose. It is important that authors clearly define and specify a suitable safety analysis population and consider how this affects their conclusions.
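The effect of the analysis population on the estimated risk is a matter of denominators. The sketch below uses hypothetical counts for a single treatment arm (100 randomised, 90 dosed, 9 with an AE); none of these figures come from the reviewed trials.

```python
def ae_risk(n_with_event: int, denominator: int) -> float:
    """Proportion of participants in the analysis population with an AE."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    if not 0 <= n_with_event <= denominator:
        raise ValueError("n_with_event must lie between 0 and denominator")
    return n_with_event / denominator


# Hypothetical treatment arm.
n_randomised = 100  # ITT population: everyone randomised
n_dosed = 90        # safety population: received at least one dose
n_with_ae = 9       # participants with the AE (all among those dosed)

risk_itt = ae_risk(n_with_ae, n_randomised)
risk_safety = ae_risk(n_with_ae, n_dosed)
print(f"ITT denominator:    {risk_itt:.3f}")
print(f"Safety denominator: {risk_safety:.3f}")
```

Because the ten undosed participants cannot experience a drug-related event, including them in the denominator lowers the estimated risk, which is the underestimation described above.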
Proxy outcomes can be used as a measure of the impact of AEs on patients. Examples include the number of withdrawals due to any reason, withdrawals due to AEs, the number of events an individual experiences, the severity of the AE and the duration. A high proportion of trials reported withdrawal for any reason and this is likely to be a result of the CONSORT recommendations. 42 The other outcomes were not frequently reported and increasing this could facilitate interpretation. 17 This information would permit better evaluation of the impact of AEs and the tolerability of the intervention to inform patients' and clinicians' treatment decisions. Reporting only the numbers that experience at least one event, without information on repeated events, masks valuable information that may be important to the patient and the cost-effectiveness evaluation. For example, chronic, repeated headaches over an extended duration will have a greater impact on patients than a single headache or headaches over a short duration, but it is not possible to distinguish between these two scenarios when reported as 'at least one event'. 18 Severity of events was also an important aspect that was often not differentiated. within the first few weeks of starting anti-depressant allows patients and prescribers to remain alert and monitor closely for this period. Nearly a third of reports included such information and we would encourage authors to adopt this practice.

Analysis of AE outcomes
The majority of trials in this review included a balanced report of AEs alongside benefit. However, many included generic statements regarding the safety profile such as 'the intervention was well tolerated' or 'the intervention exhibited a good safety profile' and these were frequently based on post-hoc statistical tests. Guidelines caution against such tests. 18 The results of such tests are difficult to interpret, as a lack of significance does not indicate that the intervention is safe and, conversely, multiple testing without adjustment will increase the number of significant differences due to chance. 54,55 Graphs are an efficient method to convey and interpret large amounts of data and can make it easier to flag potential safety signals. 50,51,53 Twelve percent of studies included in the review used graphs to present AE data and an example of one such report is given in the supplementary eTable of reference 56.
Recommendations for consideration for immediate adoption by the clinical trial community are summarised in Table 4.

Analysis
Use graphical approaches to help summarise large amounts of data.
Report adverse event data according to the CONSORT harm checklist.
Increase the uptake of mandatory submission of CONSORT harm by journals.

Reporting
Include a relevant summary of the adverse event profile in the main article. Resist depositing all adverse event data into appendices without summarising.

Limitations of trials
Trials are a valuable source of high quality adverse event data but, compared to observational studies, have smaller sample sizes, shorter follow-up periods and more limited generalisability, which restrict the ability to detect rare ADRs, ADRs with long latency and drug interactions in complex populations. The typical duration of a trial means there is often insufficient follow-up to fully characterise the safety profile as it provides limited information on long-term exposure. Stringent inclusion criteria restrict the population the intervention is assessed in and so limited information on drug interactions is obtained. 5

FUNDING
This research was supported by the NIHR grant number DRF-2017-10-131.

DATA SHARING STATEMENT
No additional data are available.

Planned analysis
Details of any plans for analysing AE outcomes
12 Describe analysis for AE outcomes in the statistical methods. Reference must be made to harmful events e.g. AEs or a specific harm event, this cannot be simply how binary events will be analysed.
13 Define a 'safety' population for analysis.
14 Specify a planned interim analysis with stopping criteria: based on efficacy, based on safety, based on both efficacy and safety, yes but no other details given, no planned interim analysis or unclear. Criteria for stopping must be set out, it is not enough to say that the DMC reviewed the data.

Details of how AEs were summarised and presented - continuous outcomes
24 Were continuous outcomes dichotomised: Yes for all, yes for some, no or not applicable? This includes measures that will have been captured as continuous and then dichotomised for example blood levels, blood pressure etc.

25 If continuous outcomes were analysed as continuous what analysis was performed: differences in measures of central tendency, significance tests, other? (Select all that apply)

Details of how AEs were summarised and presented
26 Were signal detection methods used?
27 Were any graphical summaries of AEs presented?
28 Were severity ratings given: Yes for all, yes for some, no or not applicable?
29 Were numbers of serious events presented: Yes by treatment arm, yes overall, no or not applicable? If death is reported as part of the efficacy outcome it is not enough to constitute reporting serious events.

30 Were serious events coded as treatment related: Yes for all, yes for some, no or not applicable?

Variation in the collection and definition of events could explain differences in the incidence of observed events. 13,14 For example specifically asking participants about an event of interest in one treatment group whilst relying on patient report in another is likely to lead to a disparity in incidence of events unlikely to be related to the medicinal product.

"O'Brien-Fleming stopping boundaries were used to assess efficacy, and a less stringent boundary was used to assess harm."

Billings et al. 25
"The data and safety monitoring board (DSMB) reviewed patient recruitment practices, safety reporting, and data quality after 30 patients completed the study; performed an interim analysis after 277 patients … had completed the study to assess safety of the intervention; and performed a second interim analysis after 546 patients … had completed the study to assess the safety, efficacy, and futility of the intervention. The DSMB made recommendations based on qualitative assessments of the safety, efficacy, and futility of the intervention…"
"Suspend enrolment in any study arm … due to safety concerns based on study intervention. Safety concerns include:
• Increase in in-hospital all-cause mortality in subjects randomized to A or B such that the DSMB deems the increase is excessive compared to A or B.
• Increased treatment toxicity in either treatment group deemed excessive. Toxicity is defined as moderate or severe myalgias.
• Increased severity of adverse events deemed "Probably Related" or "Possibly Related" to study intervention in either treatment group. Itemized adverse event reports separated by treatment will be provided.
• Increased AKI incidence in either treatment group deemed excessive.
• Increased incidence of stroke or hemodialysis requirement in either group (secondary endpoints) deemed excessive."

Beardsley et al. 26
"An independent data and safety monitoring committee oversaw trial safety and analyzed unblinded data after every 50 deaths, according to its charter ..."
"The Haybittle-Peto boundary, requiring p<0.001 at interim analysis to consider stopping for efficacy, will be used as guidance. A level of significance of 1% will be used as a guide for stopping the trial early because of a detected harm of dexamethasone. In addition, the DMEC will receive conditional power curves to assess whether it remains realistic that the trial will demonstrate superiority of dexamethasone conditional on the data accrued up to the point of the interim analysis. Importantly, the DMEC recommendations will not be based purely on statistical tables but will also use clinical judgment."

Kor et al. 27
"In addition to statistical criteria for significance, the study included a priori "go-no-go" definitions for recommending continuation to phase 3 study ... Briefly, continuation to phase 3 would occur with a positive primary outcome finding along with an acceptable safety profile. An acceptable safety profile was defined as a serious adverse event profile for aspirin that was not statistically worse than placebo (95% CI for the relative risk of any serious adverse event covers the null value of relative risk = 1.0). The "no-go decision" was defined as early termination by the data and safety monitoring board for safety or unfavorable risk/benefit ratio. An indeterminate case in which there was a non-statistically significant effect but this effect was in a clinically meaningful direction was also defined."
Initiate Phase III Study: Demonstrated efficacy signal in addition to adequate safety profile
Criteria: Early termination for benefit at interim analysis or p<0.08885 at final analysis (alpha=0.10 for study). Serious adverse event profile of ASA not statistically worse than placebo (95% confidence interval for the relative risk of any SAE covers the null value of RR=1.0).
Further Development Potentially Required: Weak efficacy signal
Criteria: Primary endpoint did not achieve a priori level of significance but there were at least a general consistency of secondary endpoints indicating propensity for efficacy with a larger sample size and/or more specific primary endpoint.

28
We used a group sequential statistical approach to do two equally spaced pre-planned interim analyses (at 33% and 67% of total recruitment) to assess accumulated safety data (differential proportions of deep venous thrombosis and total mortality). This approach was chosen to provide for early stopping for probable harm or strong evidence of benefit. We applied the Haybittle-Peto criterion (|Zk|≥3) for early stopping at these analyses.
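As a rough illustration of how a Haybittle-Peto style criterion (|Z| ≥ 3) might be checked at an interim look, the sketch below computes a pooled two-proportion z-statistic for a binary harm outcome. The counts are hypothetical and the choice of test statistic is our assumption; the quoted trials do not specify the statistic in the text reproduced here.

```python
from math import sqrt


def two_proportion_z(events_a: int, n_a: int, events_b: int, n_b: int) -> float:
    """Pooled z-statistic for the difference between two event proportions."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("z-statistic undefined when no or all participants have events")
    return (p_a - p_b) / se


def crosses_haybittle_peto(z: float, boundary: float = 3.0) -> bool:
    """True if the interim z-statistic meets or exceeds the boundary."""
    return abs(z) >= boundary


# Hypothetical interim data: events / participants, treatment versus control.
z_strong = two_proportion_z(30, 200, 10, 200)
z_weak = two_proportion_z(20, 200, 10, 200)
print(f"z = {z_strong:.2f}, stop: {crosses_haybittle_peto(z_strong)}")
print(f"z = {z_weak:.2f}, stop: {crosses_haybittle_peto(z_weak)}")
```

A boundary this stringent means only a large imbalance triggers the rule, so in practice it would sit alongside, not replace, clinical review by the monitoring committee.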
AEs in greater than x% in treatment group & SAEs in greater than y% in any group: 1 (0.54%)
AEs and SAEs occurring more often in treatment group than control: 1 (0.54%)
AEs in greater than x% in treatment group & occurred more often in treatment group than control & predefined/special interest AEs: 1 (0.54%)
AEs in greater than x% in any group & frequency between groups differed by more than y%, SAEs in greater than z% in any group & all grade >=3 AEs: 1 (0.54%)
AEs in greater than x% patients & more than y% difference between treatment groups & AEs leading to treatment discontinuation/interruption & most common SAEs (no criteria specified


Keywords
Randomised controlled trials; adverse events; harm data; adverse drug reactions; review; investigational drug.

Strengths and Limitations of this study
1. This is the first review to examine and quantify the methods used for AE analysis in RCTs published in high impact general medical journals.
2. This review identifies methodological weaknesses that need to be addressed as well as good practice that could be adopted.
3. Articles included in this review were published in four of the top ranked general medical journals, therefore results are likely to be biased towards better practice.
4. Included articles are only from 2015-2016 and as such may not reflect current practice.

INTRODUCTION
The methods to analyse and report outcomes to measure benefit from randomised controlled trials (RCTs) are well developed but this progress has not been matched for adverse event (AE) outcomes.
An adverse event is defined as 'any untoward medical occurrence that may present during treatment with a pharmaceutical product but which does not necessarily have a causal relationship with this treatment'. 1 An adverse drug reaction (ADR) is defined as 'a response to a drug which is noxious and unintended …' where a causal relationship is 'at least a reasonable possibility'. 1, 2 RCTs provide an opportunity to compare rates of AEs between arms allowing causality to be evaluated.
However, contemporary analysis and reporting practices are inadequate.
There are many challenges associated with analysing and reporting AEs in clinical trials. RCTs are typically designed to determine the efficacy of an intervention but are often underpowered to detect important differences in AEs between arms which may suggest an ADR. Often large numbers of AEs are reported during a study, sometimes exceeding the number of patients in the clinical trial.
Performing hypothesis tests on these AEs would lead to issues of multiplicity; however, any adjustment for multiplicity would make a 'finding untenable'. 3,4 The use of hypothesis testing may result in the medicinal product being deemed unsafe and a trial being halted too early due to a chance imbalance, or conversely deemed safe and not stopped early enough, resulting in more patients than necessary suffering an ADR. 3,5,6 Unlike efficacy outcomes, which are well defined and restricted in number at the planning stage of an RCT, the AEs collected in RCTs are numerous and undefined.
Furthermore, AE collection requires additional information to be obtained on factors such as severity, timing and duration, number of occurrences and outcome, which for our efficacy outcomes would have all been predefined. there remains uncertainty about practice for reporting and presenting AE data, and in addition the analysis practice for AEs remains a neglected area for review.
The aim of this review was to evaluate contemporary practice for collection, reporting and analysis of AEs in RCTs where the primary outcome was efficacy, in order to identify and promote areas of good practice whilst highlighting areas for improvement.

Search strategy
The

Selection criteria
The inclusion criteria were phase II-IV RCTs of drug interventions where the primary outcome was efficacy of the intervention. We did not restrict according to number of treatment arms and included both parallel and cluster RCTs. We excluded cross-over RCTs, RCTs with adaptive randomisation, observational studies, case reports, editorials and letters. We also excluded RCTs where the intervention was not a drug product (i.e. not classified as a clinical trial of an investigational medicinal product (CTIMP)). As the study aimed to assess how authors report and analyse AEs in studies where the primary outcome was efficacy, trials that were specifically designed to investigate safety as a primary outcome were not included.

Data extraction
Potentially eligible articles were identified based on titles and abstracts and the full text of these studies were retrieved. Supplementary material was also reviewed if readers were referred here from the main article for further results. Supplementary Table A1 lists all data items captured with guidance given to the reviewers for extraction. The items to be extracted were based on the work by were selected for inclusion in the journal article; how summary event information was presented in the journal article and how AEs were analysed. 11 A more detailed rationale for the choice of items extracted is provided in the supplementary material (Table A2).
A data extraction sheet was piloted and then single data extraction was performed by three reviewers (RP, VC and LH) with 10% independent check of a randomly sampled subset to verify quality. Queries were also informally discussed between reviewers on an ongoing basis. Where specific items were flagged for poor agreement these were re-extracted. Any queries during data extraction were shared and disagreements between reviewers were resolved through discussion.

Data analysis
The proportion of trials reporting each item, 3-4 and 8-34 in supplementary Table A1 were calculated and summary statistics (median and ranges) were calculated for items 5-7. All analyses were performed in Stata version 15. 19 A risk of bias assessment was not undertaken as this study aimed to describe best practice and not evaluate outcomes.

Patient and public involvement
This review forms part of a wider research project that was developed with input from a range of patient representatives. There were no study participants directly involved in this review but the original proposal and patient and public involvement (PPI) strategy were reviewed by service user representatives (with experience as clinical trial participants and PPI advisors) who provided advice specifically with regard to communication and dissemination to patient and public groups.

Data extraction
A total of 585 items were extracted twice across all three reviewers to check the quality of the data extraction. A total of 95 discrepancies were identified. This gave agreement of 84%. During this independent check several items were flagged for potential poor agreement. These items were 100% independently extracted by one author and verified. The items were: study duration; the AE collection method; timing of collection; how binary harm outcomes were summarised; whether continuous outcomes were dichotomised; if continuous outcomes were left as continuous how they were analysed.

Study characteristics
The 10 studies had an active comparator and over 50% of trials received some element of industry funding (Table 1).

Prespecified analysis
Thirty-one percent of reports provided information on the planned analysis for AEs in the statistical analysis section of the paper and 45% pre-specified a safety population (Supplementary Table A3, examples 3-4, and Table 2). 22,23 A quarter of trials reported planned interim analysis with stopping criteria (Table 2), five (2.7%) of which included specific criteria on stopping for a harmful event (Supplementary Table A4). [24][25][26][27][28]

Selection of AEs and reporting practices
Two reports only made generic statements regarding AE data: "there were no significant adverse events related to the procedure" and "no excess in mortality or major adverse events were found…".
Three reports made no mention of AEs throughout the manuscript. [29][30][31][32][33] Twenty-four (13%) trials only provided a summary of the number of AEs or serious AEs rather than listing the actual AEs that occurred. For example, "Six serious adverse events occurred in the acetaminophen group and 12 in the ibuprofen group." 34 Of these 24 trials, 10 did provide specific details of the types of events in an appendix. This means 8% of trials either did not report AEs or only included a summary (Table 3).
We found that 41% of trials analysed AEs in participants that received at least one dose, 29% of trials used all randomised participants and 9% did not specify the analysis population (Table 3). Further details on analysis populations used are given in supplementary Table A7.
Nearly 80% of trials reported the number of participants who withdrew from the trial; of these, 35% (51 of 146 reports) reported whether the withdrawals were due to AEs and, of these, 24% (12 of 51 reports) reported the actual events that caused withdrawals. Results were presented and analysed predominantly as 'patients with at least one event', with 84% of reports providing no information on the number of events occurring. An example of how to incorporate information on number of events is presented in reference 35. Forty-one percent of trials reported information on the severity of AEs. Five percent of trials reported the duration of at least one event, but presentation of such data in the main report was limited. The trials that did present this information did so in a variety of ways, for example by incorporating it into the AE table with summary statistics such as the mean duration of certain events, or by presenting it for a subgroup of events in the footnotes of AE tables, e.g. "One event of non-serious squamous cell carcinoma (day 210, resolved on day 215; adalimumab treatment was not interrupted)." [36][37][38] Twenty-eight percent of reports included information on the timing of AEs (Table 3).
Serious adverse events were typically well documented (73%) and six reports (3%) explicitly stated that no serious events had occurred. However, for forty-four reports (24%) it was not possible to discern if no serious events had occurred or whether they were simply omitted from the report.

Continuous
There was a pervasive practice (59%) of categorising continuous clinical and laboratory outcomes.
Of the trials that did not dichotomise continuous AE data, nearly 70% performed some form of statistical significance testing (Table 3). Whilst continuous outcomes do not suffer to the same degree from a lack of power, multiple testing is still a problem; however, no multiplicity corrections for continuous outcomes were performed.
Of the trials that performed statistical significance testing on AE data, only three made an adjustment for multiplicity of tests (all three on dichotomised outcomes). 36,40,41 Two of these used a Bonferroni correction and adjusted for the number of pairwise comparisons between each of the treatment groups for each individual event rather than the total number of significance tests performed. As such, both analyses would still have been affected by issues of multiple testing.
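To illustrate why a Bonferroni correction applied only within each event leaves the overall analysis exposed to multiple testing, the sketch below compares the family-wise error rate under the two approaches. It assumes independent tests with every null hypothesis true, and the number of treatment-group comparisons and AE types are hypothetical values chosen for illustration, not figures from the reviewed trials.

```python
def familywise_error(alpha_per_test: float, n_tests: int) -> float:
    """P(at least one false positive) over n independent tests, all nulls true."""
    return 1.0 - (1.0 - alpha_per_test) ** n_tests


ALPHA = 0.05
PAIRWISE_PER_EVENT = 3  # hypothetical: three treatment groups compared pairwise
N_EVENT_TYPES = 20      # hypothetical: number of AE types tested
total_tests = PAIRWISE_PER_EVENT * N_EVENT_TYPES

# Correction for the pairwise comparisons within each event only.
fwer_within_event = familywise_error(ALPHA / PAIRWISE_PER_EVENT, total_tests)

# Correction for the total number of significance tests performed.
fwer_all_tests = familywise_error(ALPHA / total_tests, total_tests)

print(f"Within-event correction: FWER = {fwer_within_event:.2f}")
print(f"Correction over all tests: FWER = {fwer_all_tests:.3f}")
```

Under these assumptions the within-event correction still leaves a family-wise error rate well above the nominal 5%, whereas dividing by the total number of tests keeps it below 5%.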
Twelve percent of reports used graphs to illustrate AE data (Table 3). The CONSORT extension highlighted the value of graphs for summarising such data, especially for conveying information on time-to-event outcomes. 42 An example of such a plot is included in the supplement of reference 43 (eFigure 2).
We assessed any reference to the CONSORT harm extension and found that none of the included studies mentioned it. Of the four journals included in the review, the Lancet was the only journal that made specific reference to the harm extension in their guidelines to authors.

DISCUSSION
The safety profile of a medicinal product is established through evidence collected from several sources including clinical trials, observational studies and spontaneous reports. 44 The advantage of clinical trial data is that these provide a controlled comparison of the rate of AEs allowing causality to be evaluated but have the disadvantage that the sample size is often not large enough to detect rare ADRs.
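The difficulty of detecting rare events can be made concrete with a simple binomial calculation: if each participant independently experiences the event with probability p, the chance of observing at least one event among n participants is 1 - (1 - p)^n. The sketch below is illustrative only; the risk of 1 in 1000 and the arm size of 500 are hypothetical values, not estimates from the review.

```python
import math


def prob_at_least_one(risk: float, n: int) -> float:
    """P(observing >= 1 event) among n independent participants."""
    return 1.0 - (1.0 - risk) ** n


def n_for_detection(risk: float, prob: float = 0.95) -> int:
    """Smallest n giving at least `prob` chance of observing >= 1 event."""
    return math.ceil(math.log(1.0 - prob) / math.log(1.0 - risk))


RISK = 0.001     # hypothetical true risk: 1 in 1000
ARM_SIZE = 500   # hypothetical number of participants in one arm

p_seen = prob_at_least_one(RISK, ARM_SIZE)
n_needed = n_for_detection(RISK)
print(f"P(at least one event in {ARM_SIZE} participants) = {p_seen:.2f}")
print(f"Participants needed for a 95% chance of seeing one event: {n_needed}")
```

The second quantity is close to 3/p, the familiar "rule of three" approximation, and concerns only observing a single event; detecting a difference between arms would require considerably more participants.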
To ensure that a useful and comprehensive picture of the safety profile is provided to all relevant parties, clear reporting of AEs from clinical trials is required. Recent research has shown the quality of reporting is substandard. [7][8][9][10][11][12][13][14][15][16] The aim of this study was to review contemporary practice across four leading medical journals for AE collection, analysis and reporting practices, highlighting any areas for improvement.

Collection and assessment methods
The CONSORT extension to harm was developed with the aim of improving reporting of safety data in RCTs. 42 None of the included studies referenced the CONSORT harm extension and, of the items in our review that are covered in CONSORT, many were not well reported. 17 This suggests that the CONSORT extension is not being routinely adopted by authors to aid their reporting. Most journals now request that authors include a completed CONSORT checklist when they submit their article but we are not aware of any journal that requests the CONSORT harm extension to also be submitted. Of the four journals in this review the Lancet is the only journal that makes specific reference to the harm extension in their guidelines to authors. The CONSORT statement contains a single item related to safety, item 19: 'all important harms or unintended effects in each group' should be reported. 42 This may explain why some items listed on the CONSORT extension for harm were reported by so few trials. The mandatory submission of CONSORT harms by journals may support better reporting.
The failure to report any information on AEs restricts interpretation and prevents a risk-benefit assessment. We identified two reports that made generic summaries of the overall safety profile and it was clear in both that there had been harmful effects. However, the authors did not include any further information. Three reports contained no information leaving readers uninformed as to any additional information these studies may provide on the safety profile. Ambiguous reporting prevents building an accurate picture of the safety profile. As such profiles are developed on accumulating evidence, it is important that each study report to the same standard and information is not wasted.
We found that the selection criteria used by authors to decide what AEs to include in the report were arbitrary and inconsistent. This will have important implications when synthesising data across studies to construct safety profiles. Authors would benefit from guidance to facilitate consistency but research in this area is lacking. Lineberry et al. recommended clinically relevant events that should always be reported (deaths, SAEs and events leading to discontinuation of intervention) and criteria that should be considered when deciding what other AEs to report, e.g. interest based on the disease(s) under investigation, comorbidities of the study population, intervention mechanism and trial duration. 18 Standard outcomes for a drug class, as suggested by Cornelius et al., would be one potential solution to avoid issues of inconsistency. 11
CONSORT recommends that AE analyses should be performed on the intention-to-treat (ITT) population to maintain the random assignment. 17 However, it is clear from our review that this population label is not always appropriately and consistently applied. There is a tendency for studies to make modifications to the ITT population. Using the ITT or modified-ITT population is likely to underestimate the risk by inflating the denominator with participants who may have never received the study drug. 52 Such estimates are appropriate for health economic evaluations where estimates of the cost-effectiveness will inform policy level decisions regarding how to treat the population.
However, a more appropriate population for AE analysis to inform prescriber and patient decisions may be those that receive at least one dose. It is important that authors clearly define and specify a suitable safety analysis population and consider how this affects their conclusions.
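The effect of the analysis population on the estimated AE risk can be shown with a small worked example. The sketch below uses invented numbers for a single hypothetical trial arm and assumes that all observed AEs occurred among participants who received at least one dose.

```python
def risk(n_with_event: int, n_population: int) -> float:
    """Proportion of the analysis population with at least one AE."""
    if n_population <= 0:
        raise ValueError("n_population must be positive")
    return n_with_event / n_population


# Hypothetical trial arm (all numbers invented for illustration).
N_RANDOMISED = 200  # intention-to-treat denominator
N_DOSED = 180       # received at least one dose
N_WITH_AE = 18      # participants with >= 1 AE, all among those dosed

itt_risk = risk(N_WITH_AE, N_RANDOMISED)
safety_risk = risk(N_WITH_AE, N_DOSED)
print(f"ITT population: {itt_risk:.1%}")              # 9.0%
print(f"At-least-one-dose population: {safety_risk:.1%}")  # 10.0%
```

The same numerator gives a lower risk under the ITT denominator, which is why the choice of safety population should be stated explicitly alongside the results.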
Proxy outcomes can be used as a measure of the impact of AEs on patients. Examples include the number of withdrawals due to any reason, withdrawals due to AEs, the number of events an individual experiences, the severity of the AE and the duration. A high proportion of trials reported withdrawal for any reason and this is likely to be a result of the CONSORT recommendations. 42 The other outcomes were not frequently reported and increasing this could facilitate interpretation. 17 This information would permit better evaluation of the impact of AEs and the tolerability of the intervention to inform patients' and clinicians' treatment decisions. Reporting only the numbers that experience at least one event, without information on repeated events, masks valuable information that may be important to the patient and the cost-effectiveness evaluation. For example, chronic, repeated headaches over an extended duration will have a greater impact on patients than a single headache or headaches over a short duration, but it is not possible to distinguish between these two scenarios when reported as 'at least one event'. 18
[…] have been suggested as a solution to aid review. Examples of such plots can be found in reference 53. Online appendices and supplementary material provide more opportunity to include this important information.
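The information lost by the 'at least one event' summary can be seen in a small constructed example. The AE log below is entirely hypothetical: each entry represents one headache episode for a participant in arm A or arm B, and the helper name is illustrative.

```python
from collections import Counter

# Hypothetical AE log: one (participant_id, arm) entry per headache episode.
ae_log = [
    ("P01", "A"), ("P02", "A"), ("P03", "A"),
    ("P11", "B"), ("P11", "B"), ("P11", "B"), ("P11", "B"),
    ("P12", "B"), ("P12", "B"),
    ("P13", "B"),
]


def summarise(log, arm):
    """Patient-level and event-level summaries for one arm."""
    episodes = Counter(pid for pid, a in log if a == arm)
    return {
        "participants_with_event": len(episodes),
        "total_events": sum(episodes.values()),
        "max_events_per_participant": max(episodes.values(), default=0),
    }


summary_a = summarise(ae_log, "A")
summary_b = summarise(ae_log, "B")
print(summary_a)
print(summary_b)
```

Both arms have three participants with at least one headache, so a patient-level table would show no difference, yet arm B has more than twice as many episodes and one participant with four.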
For serious adverse events, information on the likely time of onset can usefully inform patient monitoring plans. For example, the documented risk of suicide and suicidal ideation within the first few weeks of starting antidepressants allows patients and prescribers to remain alert and monitor closely during this period. Nearly a third of reports included such information and we would encourage authors to adopt this practice.

Analysis of AE outcomes
The majority of trials in this review included a balanced report of AEs alongside benefit. However, many included generic statements regarding the safety profile such as 'the intervention was well tolerated' or 'the intervention exhibited a good safety profile' and these were frequently based on post-hoc statistical tests. Guidelines caution against such tests, 18 the results of which are difficult to interpret: a lack of significance does not indicate that the intervention is safe and, conversely, multiple testing without adjustment will increase the number of significant differences due to chance. 54,55
Graphs are an efficient method to convey and interpret large amounts of data and can make it easier to flag potential safety signals. 50,51,53 Twelve percent of studies included in the review used graphs to present AE data and an example of one such report is given in the supplementary eTable of reference 56.
Recommendations for consideration for immediate adoption by the clinical trial community are summarised in Table 4.

Analysis
Use graphical approaches to help summarise large amounts of data.
Report adverse event data according to the CONSORT harm checklist.
Increase the uptake of mandatory submission of CONSORT harm by journals.

Reporting
Include a relevant summary of the adverse event profile in the main article. Resist depositing all adverse event data into appendices without summarising.

FUNDING
This research was supported by the NIHR grant number DRF-2017-10-131.

DATA SHARING STATEMENT
No additional data are available.

Planned analysis
Details of any plans for analysing AE outcomes
12 Describe analysis for AE outcomes in the statistical methods. Reference must be made to harmful events e.g. AEs or a specific harm event; this cannot be simply how binary events will be analysed.
13 Define a 'safety' population for analysis.
14 Specify a planned interim analysis with stopping criteria: based on efficacy, based on safety, based on both efficacy and safety, yes but no other details given, no planned interim analysis or unclear. Criteria for stopping must be set out; it is not enough to say that the DMC reviewed the data.

Variation in the collection and definition of events could explain differences in the incidence of observed events. 13,14 For example, specifically asking participants about an event of interest in one treatment group whilst relying on patient report in another is likely to lead to a disparity in incidence of events unlikely to be related to the medicinal product.
For example, the intention-to-treat population is likely to underestimate the AE risk by inflating the denominator. Therefore, this needs to be considered when making conclusions about a drug's safety profile.
How events were selected for inclusion in the journal article.
Due to the space constraints in journal articles it is not always feasible to report all AEs experienced by participants. Therefore, articles often only report a subset of AEs and how these are selected for inclusion has important implications for the safety evaluation. Arbitrary selection criteria can lead to inconsistencies in what is presented across trials for the same disease and/or drug. This prevents an accurate overview of the AEs experienced and invalidates any potential systematic review of events.
How and what summary event information was presented in the journal article.
For example, the number of events and duration of events provides insight into the impact of AEs, with repeated or longer events potentially having far wider clinical implications than a single, shorter event for both patients and prescribers.
How AEs were analysed.
There are many challenges to be considered when analysing AEs in clinical trials. For example, inappropriate statistical testing can lead to misleading conclusions, e.g. failure to find a statistically significant result leading authors to conclude that the medicinal product is safe, or chance imbalance leading authors to erroneously stop a trial too early. [3][4][5][6]

"O'Brien-Fleming stopping boundaries were used to assess efficacy, and a less stringent boundary was used to assess harm."
Billings et al. 25
"The data and safety monitoring board (DSMB) reviewed patient recruitment practices, safety reporting, and data quality after 30 patients completed the study; performed an interim analysis after 277 patients … had completed the study to assess safety of the intervention; and performed a second interim analysis after 546 patients … had completed the study to assess the safety, efficacy, and futility of the intervention. The DSMB made recommendations based on qualitative assessments of the safety, efficacy, and futility of the intervention…"
"Suspend enrolment in any study arm … due to safety concerns based on study intervention. Safety concerns include:
• Increase in in-hospital all-cause mortality in subjects randomized to A or B such that the DSMB deems the increase is excessive compared to A or B.
• Increased treatment toxicity in either treatment group deemed excessive. Toxicity is defined as moderate or severe myalgias.
• Increased severity of adverse events deemed "Probably Related" or "Possibly Related" to study intervention in either treatment group. Itemized adverse event reports separated by treatment will be provided.
• Increased AKI incidence in either treatment group deemed excessive.
• Increased incidence of stroke or hemodialysis requirement in either group (secondary endpoints) deemed excessive."
Beardsley et al. 26
"An independent data and safety monitoring committee oversaw trial safety and analyzed unblinded data after every 50 deaths, according to its charter ..."
"The Haybittle-Peto boundary, requiring p<0.001 at interim analysis to consider stopping for efficacy, will be used as guidance. A level of significance of 1% will be used as a guide for stopping the trial early because of a detected harm of dexamethasone. In addition, the DMEC will receive conditional power curves to assess whether it remains realistic that the trial will demonstrate superiority of dexamethasone conditional on the data accrued up to the point of the interim analysis. Importantly, the DMEC recommendations will not be based purely on statistical tables but will also use clinical judgment."
Kor et al. 27
"In addition to statistical criteria for significance, the study included a priori "go-no-go" definitions for recommending continuation to phase 3 study ... Briefly, continuation to phase 3 would occur with a positive primary outcome finding along with an acceptable safety profile. An acceptable safety profile was defined as a serious adverse event profile for aspirin that was not statistically worse than placebo (95% CI for the relative risk of any serious adverse event covers the null value of relative risk = 1.0). The "no-go decision" was defined as early termination by the data and safety monitoring board for safety or unfavorable risk/benefit ratio. An indeterminate case in which there was a non-statistically significant effect but this effect was in a clinically meaningful direction was also defined."
Initiate Phase III Study: Demonstrated efficacy signal in addition to adequate safety profile. Criteria: Early termination for benefit at interim analysis or p<0.08885 at final analysis (alpha=0.10 for study). Serious adverse event profile of ASA not statistically worse than placebo (95% confidence interval for the relative risk of any SAE covers the null value of RR=1.0).
Further Development Potentially Required: Weak efficacy signal. Criteria: Primary endpoint did not achieve a priori level of significance but there were at least a general consistency of secondary endpoints indicating propensity for efficacy with a larger sample size and/or more specific primary endpoint.
Nichol et al. 28
We used a group sequential statistical approach to do two equally spaced pre-planned interim analyses (at 33% and 67% of total recruitment) to assess accumulated safety data (differential proportions of deep venous thrombosis and total mortality). This approach was chosen to provide for early stopping for probable harm or strong evidence of benefit. We applied the Haybittle-Peto criterion (|Zk|≥3) for early stopping at these analyses.
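The stopping thresholds quoted in these examples are expressed on different scales (a |Z| boundary of 3 in one case and p<0.001 or a 1% significance level in another). As an aid to reading them side by side, the sketch below converts between a two-sided p-value and a standard normal Z statistic using Python's standard library. It assumes a normally distributed test statistic and is not taken from any of the quoted protocols.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))


def z_for_two_sided_p(p: float) -> float:
    """Critical |Z| corresponding to a two-sided p-value threshold."""
    return STD_NORMAL.inv_cdf(1.0 - p / 2.0)


p_at_z3 = two_sided_p(3.0)            # boundary stated as |Z| >= 3
z_at_p001 = z_for_two_sided_p(0.001)  # boundary stated as p < 0.001
z_at_p01 = z_for_two_sided_p(0.01)    # 1% level quoted as a guide for harm

print(f"|Z| >= 3 corresponds to two-sided p of about {p_at_z3:.4f}")
print(f"p < 0.001 corresponds to |Z| above about {z_at_p001:.2f}")
print(f"p < 0.01 corresponds to |Z| above about {z_at_p01:.2f}")
```

On this scale |Z|≥3 corresponds to a two-sided p-value of roughly 0.0027, so the thresholds quoted above are similar in spirit but not numerically identical, and the 1% level used as a guide for harm is less stringent than either.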
