Sensitivity analysis for mistakenly adjusting for mediators in estimating total effect in observational studies

ABSTRACT
Objectives In observational studies, epidemiologists often attempt to estimate the total effect of an exposure on an outcome of interest. However, when the underlying causal diagram is unknown and prior knowledge is limited, mediators may be mistakenly adjusted for under logistic regression, so it is essential to understand how the resulting bias in the total effect estimate behaves. Through simulation, we focused on six causal diagrams concerning different roles of mediators. Sensitivity analysis was conducted to assess how the bias changed as the exposure-mediator effects and the mediator-outcome effects were varied when adjusting for the mediator.
Setting Based on the causal relationships in the real world, we compared the biases obtained when varying the exposure-mediator effects with those obtained when varying the mediator-outcome effects when adjusting for the mediator. The magnitude of the bias was defined by the difference between the estimated effect (using logistic regression) and the total effect of the exposure on the outcome.
Results In four scenarios (a single mediator, two series mediators, two independent parallel mediators or two correlated parallel mediators), the biases obtained when varying the exposure-mediator effects were greater than those obtained when varying the mediator-outcome effects when adjusting for the mediator. In contrast, in two other scenarios (a single mediator or two independent parallel mediators in the presence of unobserved confounders), the biases obtained when varying the exposure-mediator effects were less than those obtained when varying the mediator-outcome effects when adjusting for the mediator.
Conclusions The biases were more sensitive to variation in the exposure-mediator effects than in the mediator-outcome effects when adjusting for the mediator in the absence of unobserved confounders, while the biases were more sensitive to variation in the mediator-outcome effects than in the exposure-mediator effects in the presence of an unobserved confounder.


Open Access

INTRODUCTION
Estimating the total effect of the exposure (E) on the outcome (D) is a great challenge in epidemiological studies because confounders are commonly confused with mediators. [1][2][3] If confounders and mediators are misclassified, the ability to control confounding in the estimation of the total effect of the exposure on the outcome is hampered. Various strategies are used to eliminate confounding bias in observational studies. The conventional approaches include multivariate regression, stratification, standardisation and inverse-probability weighting. 4 5 Furthermore, causal diagrams provide a formal conceptual framework for identifying and selecting confounders, 6 7 so that the analysis can avoid analytic pitfalls. 8 In practice, however, the underlying causal diagram and the roles of covariates (mediator, confounder, collider and instrumental variable) are often not completely understood, and investigators usually adjust for the covariates that are associated with the outcome and exposure. 9-12 Therefore, our paper focuses on how the bias changes as the exposure-mediator (E→M) and mediator-outcome (M→D) effects vary when mediators are mistakenly adjusted for under the logistic regression model.
Several causal inference studies have made considerable contributions to mediation analysis by providing definitions for direct and indirect effects that allow for the decomposition of a total effect into a direct and an indirect effect. [13][14][15][16][17][18][19][20][21] Arbitrarily adjusting for a mediator would generally bias the estimate of the total effect of the exposure on the outcome. 8 22 23 Practically, it can mistakenly identify a non-confounding risk factor as a confounder.
From the perspective of causal diagrams, little attention has been paid to the biases that arise when adjusting for mediators under the logistic regression model in estimating the total effect of E on D. Hence, we used sensitivity analysis to assess how the bias changes as the effects of E→M and M→D vary when adjusting for the mediator.
In this paper, six typical causal diagrams are given in figure 1: a single mediator (figure 1A), two series mediators (figure 1B), two independent parallel mediators (figure 1C), two correlated parallel mediators (figure 1D), a single mediator with an unobserved confounder (figure 1E) and two parallel mediators with an unobserved confounder (figure 1F). The paper aimed to explore the sensitivity of the biases to variation in the effects of E→M and M→D when adjusting for the mediator. Hence, both theoretical proofs and quantitative simulations were performed to examine the bias obtained when varying the effect of E→M and when varying the effect of M→D under adjustment for mediators in the logistic regression model.

METHODS
A directed acyclic graph (DAG) is composed of variables (nodes) and arrows (directed edges) between nodes such that the graph is acyclic. Causal diagrams are formalised as DAGs, providing investigators with powerful tools for bias assessment. 24 A DAG provides a device for deducing the statistical associations implied by causal relations. Furthermore, given a set of observed statistical associations, a researcher knowledgeable about causal diagram theory can systematically characterise all causal structures compatible with the observations. 25 26 The total effect of the exposure on the outcome can be calculated based on the do-calculus and back-door criterion proposed by Pearl. 27 28 For exposure X and outcome Y, a set of variables Z satisfies the back-door criterion with respect to (X, Y) if no variable in Z is a descendant of X and Z blocks all back-door paths from X to Y. Then, the effect of X on Y is given by the following formula:

$$P(y \mid do(x)) = \sum_{z} P(y \mid x, z)\, P(z)$$

Note that the expression on the right hand side of the equation is simply a standardised mean. The difference $E(Y \mid do(x')) - E(Y \mid do(x''))$ is taken as the definition of 'causal effect', where x′ and x″ are two distinct realisations of X. 23 The interventional distribution is written $P(y \mid do(x))$. It stands for the probability of Y = y when the exposure X is set to level x. The ignorability assumption Y(x) ⊥ X states that, if we happen to have information on the exposure variable, it does not give us any information about the outcome Y after the intervention do(x) was performed. In addition, it can be shown that if ignorability holds for Y(x) and X (alternatively, if there are no back-door paths from X to Y in the corresponding causal DAG), $P(y \mid do(x)) = P(y \mid x)$. 29 30 Let $D_e$ and $M_e$ denote the values of the outcome and mediator, respectively, that would have been observed had the exposure E been set to level e.
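To make the back-door formula concrete, the following is a minimal Python sketch with a single binary covariate Z that satisfies the back-door criterion. All probabilities are made-up illustrative values and the function names are hypothetical; none of them come from the paper.

```python
# Toy back-door adjustment with one binary confounder Z.
# Every number below is an illustrative assumption, not a value from the study.
p_z = {0: 0.7, 1: 0.3}                        # P(Z = z)
p_x1_given_z = {0: 0.2, 1: 0.8}               # P(X = 1 | Z = z)
p_y1_given_xz = {(0, 0): 0.10, (1, 0): 0.20,  # P(Y = 1 | X = x, Z = z), keyed by (x, z)
                 (0, 1): 0.40, (1, 1): 0.60}


def p_y1_do_x(x):
    """Back-door formula: sum over z of P(Y=1 | x, z) * P(z)."""
    return sum(p_y1_given_xz[(x, z)] * p_z[z] for z in p_z)


def p_y1_given_x(x):
    """Observational P(Y=1 | X=x), which weights by P(z | x) instead of P(z)."""
    p_x_given_z = {z: p_x1_given_z[z] if x == 1 else 1.0 - p_x1_given_z[z]
                   for z in p_z}
    p_x = sum(p_x_given_z[z] * p_z[z] for z in p_z)
    joint = sum(p_y1_given_xz[(x, z)] * p_x_given_z[z] * p_z[z] for z in p_z)
    return joint / p_x


# Causal contrast E(Y | do(x')) - E(Y | do(x'')) with x' = 1 and x'' = 0.
causal_rd = p_y1_do_x(1) - p_y1_do_x(0)
# Unadjusted contrast, confounded by Z.
naive_rd = p_y1_given_x(1) - p_y1_given_x(0)
print(round(causal_rd, 4), round(naive_rd, 4))
```

In this toy setting the unadjusted contrast overstates the standardised one because Z raises both the chance of exposure and the risk of the outcome.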
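On the OR scale, the total effect ($OR^{TE}_{E \to D}$), comparing exposure level e with e*, is given as the following 20 21 :

$$OR^{TE}_{E \to D} = \frac{P(D_e = 1)/\{1 - P(D_e = 1)\}}{P(D_{e^*} = 1)/\{1 - P(D_{e^*} = 1)\}}$$

The effect estimation ($OR_{ED|M}(m)$) of adjusting for mediator M by the logistic regression model can be given as the following:

$$OR_{ED|M}(m) = \frac{P(D = 1 \mid E = e, M = m)/\{1 - P(D = 1 \mid E = e, M = m)\}}{P(D = 1 \mid E = e^*, M = m)/\{1 - P(D = 1 \mid E = e^*, M = m)\}}$$

where $P(D = 1 \mid E = e, M = m)$ denotes the probability of D = 1 when the exposure E and mediator M have been set to level e and m, respectively. Taking figure 1A as an example, the logistic regression is as follows:

$$\operatorname{logit} P(M = 1 \mid E = e) = \alpha_0 + \beta_1 e$$

$$\operatorname{logit} P(D = 1 \mid E = e, M = m) = \alpha_1 + \beta_0 e + \beta_2 m$$

Therefore, the total effect of exposure E on outcome D on the scale of logarithm OR was equal to

$$\beta^{TE}_{E \to D} = \log \frac{P(D_1 = 1)/\{1 - P(D_1 = 1)\}}{P(D_0 = 1)/\{1 - P(D_0 = 1)\}}$$

The effect estimation ($\beta_{ED|M}(m)$) of adjusting for mediator M by the logistic regression model was equal to:

$$\beta_{ED|M}(m) = \log \frac{P(D = 1 \mid E = 1, M = m)/\{1 - P(D = 1 \mid E = 1, M = m)\}}{P(D = 1 \mid E = 0, M = m)/\{1 - P(D = 1 \mid E = 0, M = m)\}}$$

where $P(D = 1 \mid E = 1, M = m)$ denotes the probability of D=1 when the exposure E and mediator M have been set to level e=1 and m, respectively. Additionally, $P(D = 1 \mid E = 0, M = m)$ denotes the probability of D=1 when the exposure E and mediator M have been set to level e*=0 and m, respectively. The theoretical results for the other causal diagrams in figure 1 are given in the online supplementary appendix. Note that the bias was defined as the difference between the effect estimated by adjusting for the mediator using logistic regression and the total effect of exposure E on outcome D, that is, $\text{bias} = E[\beta_{ED|M}(m)] - \beta^{TE}_{E \to D}$. We examined the behaviour of the biases as the effects of E→M and M→D varied when mistakenly adjusting for the mediator under the framework of the logistic regression model.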
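As a worked sketch for figure 1A, assume the parameterisation described in the simulation section, namely logit P(M = 1 | E = e) = α0 + β1e and logit P(D = 1 | E = e, M = m) = α1 + β0e + β2m, and assume that no variable confounds E, M and D, so that the empty set satisfies the back-door criterion. The shorthand expit(x) = e^x/(1 + e^x) is introduced here for brevity and is not notation from the paper. Under these assumptions the counterfactual risk is obtained by averaging over the mediator distribution:

```latex
\begin{aligned}
P(D_e = 1)
  &= \sum_{m \in \{0,1\}} P(D = 1 \mid E = e, M = m)\, P(M = m \mid E = e) \\
  &= \operatorname{expit}(\alpha_1 + \beta_0 e + \beta_2)\,\operatorname{expit}(\alpha_0 + \beta_1 e)
   + \operatorname{expit}(\alpha_1 + \beta_0 e)\,\bigl[1 - \operatorname{expit}(\alpha_0 + \beta_1 e)\bigr], \\
\beta^{TE}_{E \to D}
  &= \operatorname{logit} P(D_1 = 1) - \operatorname{logit} P(D_0 = 1), \\
\text{bias}
  &= \beta_0 - \beta^{TE}_{E \to D}.
\end{aligned}
```

The last line uses the fact that, in a correctly specified model without an E-by-M interaction, the M-adjusted coefficient of E is β0 for every m.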
SIMULATION
Six scenarios were designed to examine the sensitivity of the bias to variation in the exposure-mediator and mediator-outcome effects when adjusting for mediators under the framework of the logistic regression model; these DAGs are shown in figure 1. We made the following assumptions for the simulation: (1) all variables were binary, following a Bernoulli distribution; and (2) the effects from parent nodes to their child node were positive and log-linearly additive. Taking figure 1A as an example, we randomly generated the exposure following a Bernoulli distribution (ie, let P(E = 1) = π). Then, we used $P(M = 1 \mid E) = \exp(\alpha_0 + \beta_1 E)/\{1 + \exp(\alpha_0 + \beta_1 E)\}$ to calculate the distribution probability of child node M from its parent node E. Similarly, $P(D = 1 \mid E, M) = \exp(\alpha_1 + \beta_0 E + \beta_2 M)/\{1 + \exp(\alpha_1 + \beta_0 E + \beta_2 M)\}$ generated the distribution probability of D, where the parameters α0 and α1 denoted the intercepts of M and D, respectively, and the effect parameters β0, β1 and β2 referred to the effects of the parent nodes on their corresponding child node on the log OR scale.
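The data-generating steps for figure 1A can be sketched as follows. This is a minimal illustration in Python rather than the authors' R code; the function names, the parameter values, the sample size and the hand-written Newton-Raphson fit are all assumptions made for the example.

```python
import math
import random


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_fig1a(n, pi, a0, a1, b0, b1, b2, rng):
    """Draw n records (e, m, d) from the figure 1A model on the log OR scale."""
    rows = []
    for _ in range(n):
        e = 1 if rng.random() < pi else 0                           # E ~ Bernoulli(pi)
        m = 1 if rng.random() < expit(a0 + b1 * e) else 0           # M depends on E
        d = 1 if rng.random() < expit(a1 + b0 * e + b2 * m) else 0  # D depends on E and M
        rows.append((e, m, d))
    return rows


def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, 3):
            f = aug[r][c] / aug[c][c]
            for k in range(c, 4):
                aug[r][k] -= f * aug[c][k]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (aug[r][3] - sum(aug[r][k] * x[k] for k in range(r + 1, 3))) / aug[r][r]
    return x


def fit_logistic(rows, iters=25):
    """Newton-Raphson fit of logit P(D=1 | E, M) = c0 + c1*E + c2*M."""
    cells = {}  # (e, m) -> (number of records, number with d = 1)
    for e, m, d in rows:
        n_cell, y_cell = cells.get((e, m), (0, 0))
        cells[(e, m)] = (n_cell + 1, y_cell + d)
    coef = [0.0, 0.0, 0.0]
    for _ in range(iters):
        grad = [0.0] * 3
        info = [[0.0] * 3 for _ in range(3)]
        for (e, m), (n_cell, y_cell) in cells.items():
            x = (1.0, float(e), float(m))
            p = expit(sum(c * xi for c, xi in zip(coef, x)))
            for i in range(3):
                grad[i] += x[i] * (y_cell - n_cell * p)          # score vector
                for j in range(3):
                    info[i][j] += n_cell * p * (1.0 - p) * x[i] * x[j]  # information matrix
        coef = [c + s for c, s in zip(coef, solve3(info, grad))]
    return coef


rng = random.Random(2017)  # fixed seed so the run is repeatable
rows = simulate_fig1a(20000, 0.5, -1.0, -1.0, 0.5, 1.0, 1.0, rng)
c0, c_e, c_m = fit_logistic(rows)
print(c_e, c_m)  # M-adjusted coefficient of E and coefficient of M
```

Repeating the draw many times and averaging the M-adjusted coefficient of E gives the simulated counterpart of E[β_{ED|M}(m)] in the bias definition.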
After generating data, we examined the behaviour of the biases as the effects of E→M and M→D varied when mistakenly adjusting for mediators under the logistic regression model. In scenario 1 (figure 1A), we compared the biases obtained when varying the effects of E→M and M→D. Similarly, in scenario 2 (figure 1B), the effects of E→M1, M1→M2 and M2→D were explored. In scenario 3 (figure 1C), we examined the effects of E→M1 (E→M2) and M1→D (M2→D). The comparison in scenario 4 (figure 1D) was the same as in scenario 3 (figure 1C). In scenario 5 (figure 1E), the effects of E→M and M→D were examined. The comparison in scenario 6 (figure 1F) was identical to that in scenario 3. We explored the biases when adjusting for mediators under the logistic regression model and thus identified the sensitivity of the biases to variation in the exposure-mediator and mediator-outcome effects.
For each of the six simulation scenarios, we observed the biases obtained when varying distinct effects while adjusting for mediators using the logistic regression model, with 1000 simulation repetitions. All simulations were conducted using the R software from CRAN (http://cran.r-project.org/).

RESULTS
Scenario 1: one single mediator
In figure 1A, E has a direct (E→D) effect and an indirect (E→M→D) effect on D. Figure 2A shows that the bias obtained when varying the effect of E→M was clearly greater than the bias obtained when varying the effect of M→D. That is, the bias was more sensitive to variation in the effect of E→M than in the effect of M→D when adjusting for the mediator M using the logistic regression model. In particular, if the effect of E→M was set to zero (figure 2B), M would be associated with D conditional on E but unconditionally independent of E; M would then be an independent risk factor for the outcome, and adjusting for M would yield a positive 'bias'. Such bias is a consequence of the non-collapsibility of the OR: the M-conditional ORs must be farther from 1 than the unconditional ORs. 31 32 In fact, both adjustment and non-adjustment for M should yield unbiased causal effect estimates. In this case, both the marginal OR and the conditional OR obtained from standardisation and inverse-probability weighting were equal to the total effect. 33 Moreover, figure 2A indicated that adjusting for mediator M did indeed give a biased estimate of the total effect of the exposure on the outcome.
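The non-collapsibility argument for the case with no E→M effect can be checked numerically. The sketch below assumes the figure 1A logistic models with β1 = 0; the parameter values and function names are illustrative assumptions, not values from the paper.

```python
import math


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    """Log odds."""
    return math.log(p / (1.0 - p))


def marginal_log_or(a0, a1, b0, b1, b2):
    """Log OR of E on D after averaging the risk of D over the distribution of M."""
    def risk(e):
        p_m = expit(a0 + b1 * e)  # P(M = 1 | E = e)
        return expit(a1 + b0 * e + b2) * p_m + expit(a1 + b0 * e) * (1.0 - p_m)
    return logit(risk(1)) - logit(risk(0))


# E -> M switched off (b1 = 0), so M is only an independent risk factor for D.
b0 = 1.0
conditional = b0  # M-conditional log OR of E under the no-interaction model
marginal = marginal_log_or(a0=0.0, a1=-1.0, b0=b0, b1=0.0, b2=2.0)
print(round(marginal, 3), conditional)
```

With these assumed values the marginal log OR is smaller than the conditional one even though M is neither a confounder nor a mediator, which is the positive 'bias' attributed above to non-collapsibility.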
The total effect was

$$\beta^{TE}_{E \to D} = \log \frac{P(D_1 = 1)/\{1 - P(D_1 = 1)\}}{P(D_0 = 1)/\{1 - P(D_0 = 1)\}}$$

The effect ($\beta_{ED|M}(m)$) of adjusting for mediator M by the logistic regression model can be given as follows:

$$\beta_{ED|M}(m) = \beta_0$$

where β0 denotes the coefficient of E adjusting for M using the logistic regression model. Furthermore, the effect of adjusting for M was equal to the controlled direct effect. 19 Therefore, the bias of adjusting for the mediator using the logistic regression model could be obtained, that is, $\text{bias} = \beta_{ED|M}(m) - \beta^{TE}_{E \to D}$. We added signs to the edges of the DAG to indicate the presence of a particular positive or negative effect in figure 3. We obtained bias<0 under the condition β1*β2>0 (β1 being the effect E→M and β2 the effect M→D), indicating that the total effect of E on D was biased when adjusting for M using the logistic regression model in figure 3A, B, E and F. That is, the bias was less than zero when the effects E→M and M→D shared the same sign (both positive, β1>0 and β2>0, or both negative, β1<0 and β2<0). Furthermore, we obtained bias>0 if β1*β2<0, suggesting that the total effect of E on D was biased when adjusting for M in figure 3C, D, G and H. That is, the bias was greater than zero when the effects E→M (β1) and M→D (β2) had opposite signs. We also illustrated the case of figure 3C, with the effects E→M and E→D greater than zero and the effect M→D less than zero, in online supplementary B. More details of the theoretical derivation can be found in the online supplementary appendix.
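The sign pattern can be illustrated with a closed-form calculation for figure 1A. The sketch below assumes the logistic models logit P(M = 1 | E) = α0 + β1E and logit P(D = 1 | E, M) = α1 + β0E + β2M with no confounding; the parameter values and function names are illustrative assumptions. The E→M effects are chosen to be sizeable, because for values of β1 near zero the non-collapsibility effect described for scenario 1 also contributes to the difference.

```python
import math


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    """Log odds."""
    return math.log(p / (1.0 - p))


def total_effect_fig1a(a0, a1, b0, b1, b2):
    """Total effect of E on D (log OR), averaging the risk of D over M given E."""
    def risk(e):
        p_m = expit(a0 + b1 * e)  # P(M = 1 | E = e)
        return expit(a1 + b0 * e + b2) * p_m + expit(a1 + b0 * e) * (1.0 - p_m)
    return logit(risk(1)) - logit(risk(0))


def bias_fig1a(a0, a1, b0, b1, b2):
    """bias = M-adjusted coefficient of E (b0) minus the total effect."""
    return b0 - total_effect_fig1a(a0, a1, b0, b1, b2)


# E->M and M->D with the same sign (b1 * b2 > 0).
bias_same = bias_fig1a(a0=0.0, a1=-1.0, b0=0.5, b1=1.5, b2=1.5)
# E->M and M->D with opposite signs (b1 * b2 < 0).
bias_opposite = bias_fig1a(a0=0.0, a1=-1.0, b0=0.5, b1=1.5, b2=-1.5)
print(round(bias_same, 3), round(bias_opposite, 3))
```

For these assumed values the adjusted estimate falls below the total effect when the two effects share a sign and above it when they have opposite signs, in line with the pattern described for figure 3.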
Scenario 2: two series mediators
Figure 1B depicts two series mediators, decomposing the total effect into a direct effect (E→D) and an indirect effect (E→M1→M2→D). The bias obtained when varying the effect of E→M1 was greater than that obtained when varying the effect of M2→D under adjustment for M1, for M2, and for M1 and M2 together (figure 4). In this situation, the correlation of the series mediators was strong enough to prevent M2 from becoming an independent cause of the outcome.

Scenario 3: two independent parallel mediators
Figure 1C shows that the exposure E independently causes M1 and M2 and indirectly influences the outcome …

Scenario 4: two correlated parallel mediators
There exist five paths from E to D: E→D, E→M1→D, E→M2→D, E→M1←M2→D and E→M2→M1→D. In particular, the path E→M1←M2→D is blocked because M1 is a collider. Figure 6A indicated that the bias obtained when varying the effect of E→M1 was clearly greater than …

Scenario 5: a single mediator with an unobserved confounder
Figure 1E provides a causal diagram representing the relationship among exposure E, outcome D, mediator M and unobserved confounder U. The analysis revealed that the bias obtained when varying the effect of E→M was lower than that obtained when varying the effect of M→D. In figure 7, an unobserved confounder distorted the association between the exposure and outcome (E←U→D).

Scenario 6: two parallel mediators with an unobserved confounder
As described above, figure 1F depicts two parallel mediators M1 and M2 with an unobserved confounder U. In figure 8A, the bias obtained when varying the effect of E→M1 was clearly less than that obtained when varying the effect of M1→D under adjustment for M1. However, the bias obtained when varying the effect of E→M2 was greater than that obtained when varying the effect of M2→D under the identical model adjusting for M1. A similar result can also be obtained in figure 8B.
In addition, the biases obtained when varying the effects of E→M1 and E→M2 were distinctly less than those obtained when varying the effects of M1→D and M2→D under the common model adjusting for M1 and M2 in figure 8C.

APPLICATION
In this analysis, we evaluated two statistical models (unadjusted and M adjusted) to assess the effect of diabetes on cardiovascular diseases under scenario 1. Information on 22 900 individuals was collected from the Health Management Centre of Shandong Provincial Hospital. All individuals were urban Han Chinese and more than 20 years of age, and they underwent a physical examination in 2013. Many studies have focused on the associations between diabetes and metabolic syndrome 34 and between metabolic syndrome and cardiovascular disease. 35 The … diagnosed with metabolic syndrome and takes a value of 0 otherwise. After adjusting for age and gender, the logistic regression model gave a total effect of diabetes E on cardiovascular diseases D of β=0.598 (95% CI 0.307 to 0.877). The effect after additionally adjusting for metabolic syndrome M was βM=0.429 (95% CI 0.113 to 0.736). Therefore, the bias was βM−β=−0.169<0, suggesting that the effect of E on D was underestimated when adjusting for the mediator M. This bias can have negative implications for the interpretation of the effects of diabetes on cardiovascular diseases. The adjustment for the mediator produced biased estimates, and adjustment was thus inappropriate and should have been avoided. A specific example was the adjustment for time-varying confounders that are also mediators using methods including standardisation, inverse-probability weighting and G-estimation. 36 That is, investigators should remember to consider biological and clinical information when specifying a statistical model.
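The arithmetic behind the reported bias, and the corresponding ORs, can be reproduced in a few lines. The two coefficients are the point estimates quoted above; the variable names are hypothetical, and the exponentiated ORs are derived here for illustration rather than reported in the paper.

```python
import math

beta_total = 0.598     # log OR of diabetes on cardiovascular disease, adjusted for age and gender
beta_adjusted = 0.429  # log OR after additionally adjusting for metabolic syndrome

bias = beta_adjusted - beta_total      # bias on the log OR scale
or_total = math.exp(beta_total)        # OR implied by the total effect estimate
or_adjusted = math.exp(beta_adjusted)  # OR implied by the mediator-adjusted estimate

print(round(bias, 3), round(or_total, 2), round(or_adjusted, 2))
```

Both ORs remain above 1, so adjusting for the mediator attenuates the estimated association without reversing its direction.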

DISCUSSION
In this paper, we examined the sensitivity of the bias to variation in the exposure-mediator and mediator-outcome effects when adjusting for mediators under the framework of the logistic regression model. In four scenarios (a single mediator in figure 1A of scenario 1, two series mediators in figure 1B of scenario 2, two independent parallel mediators in figure 1C of scenario 3 or two correlated parallel mediators in figure 1D of scenario 4), the bias obtained when varying the exposure-mediator effect was greater than that obtained when varying the mediator-outcome effect when adjusting for the mediator (figures 2, 4, 5 and 6). However, in two other scenarios (a single mediator or two independent parallel mediators in the presence of unobserved confounders in figure 1E of scenario 5 and figure 1F of scenario 6), the biases were more sensitive to variation in the mediator-outcome effect than in the exposure-mediator effect when adjusting for the mediator (figures 7 and 8).
Conditioning on a mediator is of concern in all areas of epidemiological research. 13 19 37 It does indeed lead to bias in estimating the total effect of the exposure on the outcome. 8 22 23 Mediators and confounders are indistinguishable in terms of statistical association and conceptual grounds. 3 Most studies focus on mediation analysis, such as the calculation of direct and indirect effects. 20 21 38-41 Recently, some authors have used causal diagrams to describe how to appropriately handle matching variables. In addition, they have proven that matching on mediator M renders M and D independent (by design) in the matched study. Matching on variables that are affected by the exposure and that affect the outcome, that is, mediators between the exposure and the outcome, would ordinarily produce irremediable bias. Furthermore, matching on mediator M blocks the causal path E→M→D and thus produces unfaithfulness in estimating the total effect of E on D. 31 42 Little effort has been made to study the behaviour of the biases when adjusting for a mediator in estimating the total effect of an exposure on an outcome. Our results revealed that the biases were more sensitive to variation in the exposure-mediator effects than in the mediator-outcome effects when adjusting for the mediator in the absence of an unobserved confounder (figure 1A-D). Nevertheless, for the causal diagrams with an unobserved confounder (figure 1E, F), the biases were more sensitive to variation in the mediator-outcome effects than in the exposure-mediator effects when adjusting for a mediator. Therefore, the biases obtained when varying different effects depended on the causal diagram and on whether an unobserved confounder existed.
The causal diagrams depicted in figure 1 are very simple, as they exclude confounders of E and M as well as of M and D. In practical applications, there are confounders in each pair of relationships among E, M and D. In addition, our simulation study was not comprehensive enough to fully evaluate the behaviour of the bias when adjusting for the mediator under logistic regression, because it considered only binary variables, certain scenarios of effect size and common types of models. In medical research, regression modelling is commonly used to adjust for covariates associated with both the outcome and exposure. In this paper, the biases are defined by the difference between M-adjusted and unadjusted ORs, some of which is attributable to the non-collapsibility of the OR. In the field of causal inference, standardisation and inverse-probability weighting may yield a different bias from that of regression modelling, and they may be better alternatives for calculating the bias. 4 5 Therefore, in future research, standardisation and inverse-probability weighting could be used to calculate the biases as defined in this paper. Future research should further reinforce the mechanisms and conceptual frameworks of confounders and mediators from causal diagrams to avoid analytic pitfalls.

CONCLUSION
In conclusion, the sensitivity of biases to the variation of the effects of exposure-mediator and mediator-outcome was related to whether there was an unobserved confounder in causal diagrams. The biases were more sensitive to the variation of the effects of exposure-mediator than the effects of mediator-outcome when adjusting for the mediator in the absence of unobserved confounders, while the biases were more sensitive to the variation of the effects of mediator-outcome than the effects of exposure-mediator in the presence of unobserved confounders.