Systematic comparison of Mendelian randomisation studies and randomised controlled trials using electronic databases

Objective: To scope the potential for (semi-)automated triangulation of Mendelian randomisation (MR) and randomised controlled trial (RCT) evidence. The two methods rely on distinct assumptions, which makes comparisons between their results invaluable.
Methods: We mined the ClinicalTrials.Gov, PubMed and EpigraphDB databases and carried out a series of 26 manual literature comparisons across 54 MR and 77 RCT publications.
Results: Only 13% of completed RCTs identified in ClinicalTrials.Gov had submitted their results to the database. Coverage was similarly low for Semantic MEDLINE (SemMedDB) semantic triples derived from MR and RCT publications: 36% and 12%, respectively. Among the intervention types that can be mimicked by MR, only trials of pharmaceutical interventions could be automatically matched to MR results, because the other intervention types were insufficiently annotated with the Medical Subject Headings (MeSH) ontology. A manual survey of the literature highlighted the potential for triangulation across a number of exposure/outcome pairs if these challenges can be addressed.
Conclusions: Careful triangulation of MR with RCT evidence should consider the similarity of phenotypes across study designs, intervention intensity and duration, study population demography and health status, comparator group, intervention goal and quality of evidence.


Supplementary Box
It is anticipated that the longer duration of exposure differences seen in MR studies will generate larger effect sizes than the relatively short-term modification of the exposure in RCTs. For example, the genetic variants related to non-HDL cholesterol (henceforth "cholesterol", the target of cholesterol-lowering drugs such as statins) have been shown to relate to relatively stable differences from early childhood to late adulthood, thus generating a lifetime of differential exposure to circulating cholesterol. By contrast, the RCTs of cholesterol-lowering drugs designed to show effects on coronary heart disease (CHD) events last about 5 years. Atherosclerosis is a disease process that develops from childhood onwards, and the CHD it generates would not be expected to be abolished by a few years of cholesterol lowering in middle or older age (the usual ages included in the RCTs). Because several cholesterol-lowering drugs target different genes, MR studies using genetic variants in those genes that are robustly related to cholesterol can be compared with RCTs of drugs that target the same genes. Figure SB1 below combines data from such MR analyses with the RCTs.
As anticipated, the RCTs produce about 40% of the risk reduction seen with a lifetime difference in exposure levels [1]. The same kind of scaling between the effects predicted from MR studies and those seen in the matching trials applies to MR/RCT comparisons for other exposures, although the degree of scaling may differ: the effects may develop more quickly than in the case of cholesterol, or more slowly, or there may be no effect in the RCTs at all if the exposure acts during a critical period in earlier life and sets in train a disease process that later modification of the exposure cannot reverse [2].
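The scaling described above can be expressed as a single factor relating the lifetime (MR) effect to the trial effect. The following is a minimal sketch under loudly stated assumptions: the Box does not say on which scale the 40% is measured, so here the factor is taken to act on ratio estimates (odds or risk ratios) on the log scale, and both estimates are assumed to be expressed per the same exposure difference (e.g. per unit of cholesterol lowering). The function names `scaling_factor` and `predicted_trial_ratio` and all input values are hypothetical and are not taken from Figure SB1.

```python
import math


def scaling_factor(mr_ratio: float, rct_ratio: float) -> float:
    """Fraction of the MR (lifetime) log-effect reproduced by the trial.

    Assumption: both inputs are ratio measures (odds or risk ratios) expressed
    per the same exposure difference, and scaling acts on the log scale.
    """
    if mr_ratio <= 0 or rct_ratio <= 0 or mr_ratio == 1:
        raise ValueError("ratios must be positive and the MR ratio must differ from 1")
    return math.log(rct_ratio) / math.log(mr_ratio)


def predicted_trial_ratio(mr_ratio: float, factor: float) -> float:
    """Scale an MR ratio estimate by `factor` on the log scale."""
    if mr_ratio <= 0:
        raise ValueError("ratio must be positive")
    return math.exp(factor * math.log(mr_ratio))


# Hypothetical numbers for illustration only (not data from the studies).
mr_or = 0.5  # lifetime, genetically proxied effect per unit exposure difference
print(f"{predicted_trial_ratio(mr_or, 0.4):.3f}")  # 0.758: trial reproduces 40% of the log effect
print(predicted_trial_ratio(mr_or, 0.0))           # 1.0: no trial effect (critical-period scenario)
print(round(scaling_factor(mr_or, 0.758), 2))      # 0.4: factor recovered from a paired estimate
```

A factor of 0 corresponds to the critical-period scenario, in which later modification of the exposure produces no effect, while a factor of 1 would mean the trial reproduces the full lifetime effect.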

ClinicalTrials.Gov Data filtering
We only included studies with the most common designs suitable for comparison with published MR studies: the Parallel Assignment and Crossover Assignment intervention models in the Designs table (excluding Single Group, Sequential and Factorial Assignment), with Randomized allocation to restrict the selection to RCTs. As additional filtering criteria, we required study type = Interventional and a minimum of 2 arms in the Studies table, which gave a set of RCT studies. The next stage of filtering concerned background information and study results. We first filtered on the presence of a study description in the Brief Summaries table. The key criterion was then the presence of results in the Outcome Analyses table, from which we selected the variables param_type, param_value, p_value and method. Next, we required all conditions to have at least 1 Medical Subject Headings (MeSH) term assigned in the all_conditions view to facilitate automatic comparison with external data sources. We dropped that requirement for interventions, as behavioural interventions in particular could not be assigned a MeSH term (see Results). Finally, a range of basic reference and eligibility criteria were required: brief_title, study_type, overall_status, phase, number_of_arms and enrolment in the Studies table; gender and criteria in the Eligibilities table; and outcome title and type in the Outcomes table. Additionally, we extracted a subset of studies that did not supply any results in the Outcome Analyses table, and therefore did not contribute to the Main dataset above, but were RCTs with published literature records in the Study References table (reference type = result).

Overview of top SemMed triples for MR and RCT studies
An overview of the top subjects in MR studies (Figure S4a) and RCTs (Figure S4b) revealed a high number of terms related to adiposity (obesity, body mass index, adiponectin) and lipid biology (lipoproteins, hydroxymethylglutaryl-CoA reductase inhibitors, PCSK9, high-density lipoproteins), as well as type 2 diabetes, vitamin D and, more recently, COVID-19. Terms related to type 2 diabetes (insulin, metformin, diabetes) and vitamin D were also found in RCT triples. As expected from the preponderance of drug interventions in ClinicalTrials.Gov, the top 10 subjects in RCT triples showed a noticeable bias towards pharmaceutical preparations, with terms such as methotrexate, aspirin, clomiphene citrate, dexamethasone, prednisolone and cyclosporine. SemMedDB identified "associated with" as the top predicate in MR studies (Figure S4c), followed by "predisposes", "coexists with" and "affects", which underlines the skew of current MR studies towards identifying risk factors for disease. RCT studies, on the other hand, lean towards identifying treatments (Figure S4d), as reflected in the top predicate "treats" (n=7,791; second-ranked "coexists with", n=2,149). Among the top objects in MR studies (Figure S4e), we found cardiovascular diseases (coronary arteriosclerosis, coronary heart disease, myocardial ischemia/infarction, hypertensive disease, ischemic stroke). A smaller number of cardiovascular disease terms also occurred frequently among RCT objects (Figure S4f): hypertensive disease and cardiovascular disease. Both MR and …

Figure SB1. Drug treatment (circles) and genetic proxy (squares) effects on reducing cholesterol levels and the corresponding reduction in risk of coronary heart disease (CHD) from matching drug RCTs and MR analyses. The colours indicate the gene from which variants are taken in the MR studies and the target of the drug used in the named RCTs. Abbreviations: CETP, cholesteryl ester transfer protein; HMGCR, 3-hydroxy-3-methylglutaryl-CoA reductase; NPC1L1, Niemann-Pick C1-like protein 1; PCSK9, proprotein convertase subtilisin/kexin type 9. Figure reproduced with permission from [3].

Figure S1. Overview of RCT general features in the Main dataset by their frequency.
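The ClinicalTrials.Gov filtering steps described earlier in this supplement can be sketched as a cascade of predicates over per-trial records. This is a minimal, standard-library-only sketch under stated assumptions, not the authors' pipeline: the `Trial` record, its field names and the helpers `is_rct`, `select_result_vars` and `split_trials` are hypothetical simplifications (they are not the real ClinicalTrials.Gov/AACT schema); each trial is assumed to be pre-joined across the Designs, Studies, Brief Summaries, Outcome Analyses, Study References and condition MeSH tables; and the final check on basic reference and eligibility fields is omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


# Hypothetical pre-joined record; field names loosely follow the tables named
# in the text but are NOT the real ClinicalTrials.Gov/AACT schema.
@dataclass
class Trial:
    nct_id: str
    allocation: str                   # Designs table
    intervention_model: str           # Designs table
    study_type: str                   # Studies table
    number_of_arms: Optional[int]     # Studies table
    brief_summary: Optional[str]      # Brief Summaries table
    outcome_analyses: List[Dict[str, object]] = field(default_factory=list)
    condition_mesh_terms: List[str] = field(default_factory=list)
    result_references: List[str] = field(default_factory=list)  # reference type = result


KEPT_MODELS = {"Parallel Assignment", "Crossover Assignment"}
RESULT_VARS = ("param_type", "param_value", "p_value", "method")


def is_rct(t: Trial) -> bool:
    """Design filter: randomised, parallel/crossover, interventional, >= 2 arms."""
    return (
        t.allocation == "Randomized"
        and t.intervention_model in KEPT_MODELS
        and t.study_type == "Interventional"
        and (t.number_of_arms or 0) >= 2
    )


def select_result_vars(t: Trial) -> List[Dict[str, object]]:
    """Keep only the four Outcome Analyses variables used downstream."""
    return [{v: row.get(v) for v in RESULT_VARS} for row in t.outcome_analyses]


def split_trials(trials: List[Trial]) -> Tuple[List[str], List[str]]:
    """Return (Main dataset IDs, IDs of RCTs without analyses but with result references)."""
    main, refs_only = [], []
    for t in trials:
        if not is_rct(t):
            continue
        if t.outcome_analyses:
            # Main dataset: description present and >= 1 MeSH term on conditions.
            # No MeSH requirement is placed on interventions.
            if t.brief_summary and t.condition_mesh_terms:
                main.append(t.nct_id)
        elif t.result_references:
            refs_only.append(t.nct_id)
    return main, refs_only


# Toy records with made-up identifiers, for illustration only.
row = {"param_type": "Odds Ratio", "param_value": 0.8, "p_value": 0.03,
       "method": "Logistic regression", "other_column": "ignored"}
trials = [
    Trial("TRIAL-A", "Randomized", "Parallel Assignment", "Interventional", 2,
          "Drug vs placebo", [row], ["Hypertension"]),
    Trial("TRIAL-B", "N/A", "Single Group Assignment", "Interventional", 1,
          "Open label", [row], ["Hypertension"]),
    Trial("TRIAL-C", "Randomized", "Crossover Assignment", "Interventional", 2,
          "Diet crossover", [], ["Obesity"], ["placeholder reference"]),
    Trial("TRIAL-D", "Randomized", "Parallel Assignment", "Interventional", 3,
          "Exercise programme", [row], []),
]
main_ids, refs_only_ids = split_trials(trials)
print(main_ids, refs_only_ids)  # ['TRIAL-A'] ['TRIAL-C']
print(select_result_vars(trials[0]))
```

In this toy run, TRIAL-B fails the design filter, TRIAL-C has no analyses but a result reference and so lands in the reference-only subset, and TRIAL-D is dropped because its condition carries no MeSH term. Keeping each criterion as a separate predicate makes it easy to report how many trials are lost at each stage.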

Table S4
BMJ Publishing Group Limited (BMJ) disclaims all liability and responsibility arising from any reliance placed on this supplemental material which has been supplied by the author(s).

Table S5