
Testing for Measured Gene-Environment Interaction: Problems with the use of Cross-Product Terms and a Regression Model Reparameterization Solution

  • Original Research
  • Behavior Genetics

Abstract

The study of gene-environment interaction (G × E) has garnered widespread attention. The most common way to assess interaction effects is in a regression model with a G × E interaction term that is a product of the values specified for the genotypic (G) and environmental (E) variables. In this paper we discuss the circumstances under which interaction can be modeled as a product term and cases in which use of a product term is inappropriate and may lead to erroneous conclusions about the presence and nature of interaction effects. In the case of a binary-coded genetic variant (as used in dominant and recessive models, or where the minor allele occurs so infrequently that it is not observed in the homozygous state), the regression coefficient corresponding to a significant interaction term reflects a slope difference between the two genotype categories and appropriately characterizes the statistical interaction between the genetic and environmental variables. However, when using a three-category polymorphic genotype, as is commonly done when modeling an additive effect, both false positive and false negative results can occur, and the nature of the interaction can be misrepresented. We present a reparameterized regression equation that accurately captures interaction effects without the constraints imposed by modeling interactions using a single cross-product term. In addition, we provide a series of recommendations for making conclusions about the presence of meaningful G × E interactions, which take into account the nature of the observed interactions and whether they map onto sensible genotypic models.
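The binary-variant claim can be checked numerically. The following is a minimal sketch in Python with NumPy, not the authors' R code. It uses simulated data with arbitrary, hypothetical effect sizes, fits the product-term model to a 0/1 genotype code, and compares the interaction coefficient with the difference between slopes estimated separately within each genotype group.

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares coefficients."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(2014)
n = 500
G = rng.integers(0, 2, size=n)   # binary genotype code (e.g. dominant model), simulated
E = rng.normal(size=n)           # environment, simulated
# Hypothetical data-generating values; the noise term makes the fit non-trivial
P = 1.0 + 0.5 * G + 0.3 * E + 0.8 * G * E + rng.normal(size=n)

# Product-term model: P = b0 + b1*G + b2*E + b3*G*E
X = np.column_stack([np.ones(n), G, E, G * E])
b0, b1, b2, b3 = ols(X, P)

def within_group_slope(g):
    """Slope from a simple regression of P on E among individuals with G == g."""
    mask = G == g
    Xg = np.column_stack([np.ones(mask.sum()), E[mask]])
    return ols(Xg, P[mask])[1]

slope_diff = within_group_slope(1) - within_group_slope(0)
print(b3, slope_diff)  # equal up to floating-point rounding
```

With a 0/1 code, the four-parameter model can place an unconstrained line in each of the two groups, so the product-term coefficient equals the slope difference. With three genotype categories there are six group-specific parameters but still only four model coefficients.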


Fig. 1
Fig. 2
Fig. 3


Acknowledgments

This work grows out of a program of research supported by K02AA018755 (to DMD). MCN was supported by R37DA18673.

Conflict of interest

The authors have no conflicts of interest to disclose.

Author information


Corresponding author

Correspondence to Fazil Aliev.

Additional information

Edited by Stacey Cherny.

Appendices

Appendix A

Here we delineate the specific conditions under which the traditional cross-product approach yields accurate estimates for a G variable with three levels.

Suppose that we have groups of individuals with three distinct genotypes (G = 0, 1 and 2, meaning that individuals in each group carry 0, 1 or 2 copies of the reference allele, respectively). Assume that within each group the phenotype follows a linear regression on the environment:

$$ G = 0{\text{ group has the relation }}P|(G = 0) = \delta_{00} + \delta_{0 1} E $$
(A1)
$$ G = 1 {\text{ group}}\,{\text{has the relation }}P|(G = 1) = \delta_{ 10} + \delta_{ 1 1} E $$
(A2)
$$ G = 2{\text{ group has the relation }}P|(G = 2) = \delta_{20} + \delta_{ 2 1} E. $$
(A3)
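The genotype-specific lines in Eqs. (A1)–(A3) can be estimated by fitting a separate simple regression of P on E within each genotype group. Below is a minimal Python/NumPy sketch. The helper name group_lines is hypothetical, and the data are noise-free and illustrative, so the known coefficients are recovered.

```python
import numpy as np

def group_lines(G, E, P):
    """Return {g: (delta_g0, delta_g1)} from a separate OLS fit of P on E in each group."""
    lines = {}
    for g in (0, 1, 2):
        mask = G == g
        Xg = np.column_stack([np.ones(mask.sum()), E[mask]])
        intercept, slope = np.linalg.lstsq(Xg, P[mask], rcond=None)[0]
        lines[g] = (float(intercept), float(slope))
    return lines

# Noise-free illustrative data with hypothetical coefficients for each genotype
E = np.tile(np.linspace(-1.0, 1.0, 5), 3)
G = np.repeat([0, 1, 2], 5)
true = {0: (1.0, 0.2), 1: (1.5, 0.9), 2: (1.2, 0.1)}
P = np.array([true[g][0] + true[g][1] * e for g, e in zip(G, E)])
print(group_lines(G, E, P))  # recovers the coefficients in `true` up to rounding
```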

The aim is to determine the \( \delta_{ij} \) coefficients from the additive G × E regression model of Eq. (1), \( P = \beta_{0} + \beta_{1} G + \beta_{2} E + \beta_{3} G \times E \), fitted to all groups of individuals/genotypes. Given the \( \beta_{i} \) coefficients of Eq. (1), setting G = 0 in Eq. (1) gives the regression line for the subset of individuals with G = 0:

$$ P|(G = 0) = \beta_{0} + \beta_{2} E. $$

Combining this with Eq. (A1), which also represents the G = 0 group, we get:

$$ P|(G = 0) = \beta_{0} + \beta_{2} E = \delta_{00} + \delta_{01} E. $$

Accordingly, \( \delta_{00} = \beta_{0} \) and \( \delta_{01} = \beta_{2} \). That is, \( \beta_{0} \) and \( \beta_{2} \) are fully determined by the G = 0 regression line. Substituting these values into Eq. (1) gives the equivalent equation:

$$ P = \delta_{00} + \beta_{1} G + \delta_{01} E + \beta_{3} G \times E. $$
(A4)

Similarly, setting G = 1 in Eq. (A4) and combining the result with Eq. (A2) gives the regression line specific to the G = 1 group:

$$ P|(G = 1) = \delta_{00} + \beta_{1} \cdot 1 + \delta_{01} E + \beta_{3} E = (\delta_{00} + \beta_{1} ) + (\delta_{01} + \beta_{3} )E = \delta_{10} + \delta_{11} E. $$

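Hence \( \delta_{10} = \delta_{00} + \beta_{1} \) and \( \delta_{11} = \delta_{01} + \beta_{3} \), or equivalently \( \beta_{1} = \delta_{10} - \delta_{00} \) and \( \beta_{3} = \delta_{11} - \delta_{01} \).

Substituting these values into Eq. (A4) gives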

$$ P = \delta_{00} + \left( {\delta_{ 10} - \delta_{00} } \right)G + \delta_{0 1} E + \left( {\delta_{ 1 1} - \delta_{0 1} } \right)G \times E. $$
(A5)

Finally, setting G = 2 in Eq. (A5) and combining the result with Eq. (A3) yields

$$ P|(G = 2) = \delta_{00} + \left( {\delta_{ 10} - \delta_{00} } \right) \cdot 2 + \delta_{0 1} E + \left( {\delta_{ 1 1} - \delta_{0 1} } \right) \cdot 2 \cdot E = \delta_{20} + \delta_{21} E. $$

Accordingly,

$$ \delta_{20} = 2\delta_{ 10} - \delta_{00} , $$
(A6)
$$ \delta_{21} = 2\delta_{ 1 1} - \delta_{01} . $$
(A7)

Thus the six \( \delta_{ij} \) coefficients are not free, because there is a linear dependency among them: Eqs. (A1)–(A3) are consistent with Eq. (1) only when Eqs. (A6) and (A7) are satisfied. Consequently, the traditional cross-product interaction term yields regression lines that match the interactions observed in the data only under these very specific conditions. These conditions, dictated by Eqs. (A6)–(A7), are equivalent to

$$ \delta_{20} - \delta_{00} = 2(\delta_{ 1 0} - \delta_{00} ),\delta_{21} - \delta_{01} = 2(\delta_{ 1 1} - \delta_{01} ). $$
(A8)
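In words, Eq. (A8) requires the G = 2 line to differ from the G = 0 line by exactly twice the G = 1 versus G = 0 difference, in both intercept and slope. Both directions of this result can be checked with noise-free data. The Python/NumPy sketch below uses hypothetical coefficients and identical, centred E values in every group.

- **Scenario 1.** The G = 2 line is built from Eqs. (A6) and (A7), so fitting Eq. (1) recovers \( \beta_{1} = \delta_{10} - \delta_{00} \) and \( \beta_{3} = \delta_{11} - \delta_{01} \) exactly.
- **Scenario 2.** Eq. (A8) is violated: heterozygotes have slope 1 and both homozygote groups have slope 0, so \( \delta_{21} - \delta_{01} = 0 \) while \( 2(\delta_{11} - \delta_{01}) = 2 \). Here the fitted \( \beta_{3} \) is zero and every genotype is assigned the same slope of 1/3, a false negative of the kind described in the abstract.

```python
import numpy as np

E = np.linspace(-2.0, 2.0, 21)          # centred environment grid, identical in each group
G = np.repeat([0, 1, 2], E.size)
Eall = np.tile(E, 3)
X = np.column_stack([np.ones(G.size), G, Eall, G * Eall])  # Eq. (1) design

def fit_eq1(intercepts, slopes):
    """Fit Eq. (1) to noise-free data generated from the given group-specific lines."""
    P = np.asarray(intercepts)[G] + np.asarray(slopes)[G] * Eall
    return np.linalg.lstsq(X, P, rcond=None)[0]

# Scenario 1: hypothetical G = 0 and G = 1 lines, G = 2 line from Eqs. (A6)-(A7)
d00, d01, d10, d11 = 2.0, 0.5, 3.0, 1.25
beta = fit_eq1([d00, d10, 2 * d10 - d00], [d01, d11, 2 * d11 - d01])
print(np.allclose(beta, [d00, d10 - d00, d01, d11 - d01]))  # True

# Scenario 2: slopes 0, 1, 0 violate Eq. (A8); all intercepts are zero
b0, b1, b2, b3 = fit_eq1([0.0, 0.0, 0.0], [0.0, 1.0, 0.0])
implied_slopes = b2 + b3 * np.array([0, 1, 2])  # the only slope pattern Eq. (1) can express
print(b3)              # zero up to rounding: the G x E term is missed
print(implied_slopes)  # about 1/3 for every genotype
```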

Appendix B

In this section we generate a gene (G) variable under Hardy–Weinberg equilibrium and a phenotype (P) variable from a randomly generated environment (E) variable, and then run tests for interaction under the two parameterizations described in the paper (Eqs. 1 and 3).
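A simulation of this kind might be sketched as follows in Python with NumPy. This is not the authors' R code. The reference-allele frequency, sample size and genotype-specific slopes are hypothetical. Because Eq. (3) is not reproduced in this section, the second design matrix is only an assumed stand-in for a reparameterization: indicators for G = 1 and G = 2, each with its own interaction with E, in place of a single cross-product.

```python
import numpy as np

def simulate(n, p, rng):
    """Simulate genotype G under Hardy-Weinberg equilibrium, E ~ N(0, 1), and P.

    p is the (hypothetical) reference-allele frequency, so G ~ Binomial(2, p)
    has genotype frequencies (1 - p)^2, 2p(1 - p) and p^2 for G = 0, 1, 2.
    """
    G = rng.binomial(2, p, size=n)
    E = rng.normal(size=n)
    slopes = np.array([0.0, 0.5, 0.0])  # hypothetical: only G = 1 responds to E
    P = slopes[G] * E + rng.normal(size=n)
    return G, E, P

def fit(X, y):
    """Ordinary least squares coefficients."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(1)
G, E, P = simulate(100_000, 0.3, rng)
ones = np.ones(G.size)

# Eq. (1): a single cross-product term G * E
X1 = np.column_stack([ones, G, E, G * E])
# Assumed stand-in for the reparameterized model: separate terms for G = 1 and G = 2
D1 = (G == 1).astype(float)
D2 = (G == 2).astype(float)
X3 = np.column_stack([ones, D1, D2, E, D1 * E, D2 * E])

coef1 = fit(X1, P)
coef3 = fit(X3, P)
print("Eq. (1) G x E coefficient:", coef1[3])
print("slope differences vs G = 0 (G = 1, G = 2):", coef3[4:])
```

Here the G = 2 group shares the G = 0 slope, so the slopes are not linear in G. The single Eq. (1) coefficient must summarize both slope differences with one number, whereas the indicator coding estimates each slope difference separately.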

Appendix C

Consider the linear regression model \( y = \beta_{0} + \beta_{1} x_{1} + \cdots + \beta_{p} x_{p} + \varepsilon \) with dependent variable y and independent variables \( x_{1}, \ldots, x_{p} \), and assume that the error term ε is normally distributed. Denote \( {\varvec{\upbeta}} = (\beta_{0}, \beta_{1}, \ldots, \beta_{p})^\prime \), where ′ denotes the transpose. Given a sample of size n with values \( y_{i}, x_{1i}, \ldots, x_{pi} \) (\( i = 1, \ldots, n \)), the least squares estimate of β can be expressed as \( \hat{\varvec{\upbeta}} = ({\mathbf{X}}^\prime {\mathbf{X}})^{-1} {\mathbf{X}}^\prime {\mathbf{y}} \), where \( {\mathbf{y}} = (y_{1}, \ldots, y_{n})^\prime \) and X is the n × (p + 1) matrix whose ith row is \( (1, x_{1i}, \ldots, x_{pi}) \).
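The estimator can be written directly in code. This minimal Python/NumPy sketch solves the normal equations \( ({\mathbf{X}}^\prime {\mathbf{X}}) \hat{\varvec{\upbeta}} = {\mathbf{X}}^\prime {\mathbf{y}} \) instead of forming the inverse explicitly. The estimate is the same, but the numerical behaviour is better. The small data set is hypothetical and noise-free, so the known coefficients are recovered.

```python
import numpy as np

def ols_estimate(X, y):
    """Least squares estimate beta_hat = (X'X)^{-1} X'y via the normal equations."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Hypothetical noise-free data: y = 1 + 2*x1 - 0.5*x2 exactly
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0])
y = 1 + 2 * x1 - 0.5 * x2
X = np.column_stack([np.ones_like(x1), x1, x2])  # ith row is (1, x1i, x2i)
print(ols_estimate(X, y))  # approximately [1.0, 2.0, -0.5]
```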

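To test the hypothesis \( H_{0}: {\mathbf{b}}^\prime {\varvec{\upbeta}} = 0 \) vs \( H_{1}: {\mathbf{b}}^\prime {\varvec{\upbeta}} \ne 0 \), where \( {\mathbf{b}} = (b_{0}, b_{1}, \ldots, b_{p})^\prime \) is a contrast vector, note that the estimate \( {\mathbf{b}}^\prime \hat{\varvec{\upbeta}} = {\mathbf{b}}^\prime ({\mathbf{X}}^\prime {\mathbf{X}})^{-1} {\mathbf{X}}^\prime {\mathbf{y}} \) has estimated standard error \( s\sqrt{{\mathbf{b}}^\prime ({\mathbf{X}}^\prime {\mathbf{X}})^{-1} {\mathbf{b}}} \), where s² is the estimated residual variance. Under the null hypothesis, the ratio of the estimate to its standard error follows a t-distribution with n − p − 1 degrees of freedom. So, for the test we can either use this t-distribution or use the fact that the corresponding Wald statistic \( {\mathbf{b}}^\prime \hat{\varvec{\upbeta}} \left( s^{2} {\mathbf{b}}^\prime ({\mathbf{X}}^\prime {\mathbf{X}})^{-1} {\mathbf{b}} \right)^{-1} {\mathbf{b}}^\prime \hat{\varvec{\upbeta}} \) has an asymptotic χ² distribution with one degree of freedom.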
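A direct translation of this contrast test into Python with NumPy might look like the sketch below. It is not the authors' R function. It returns \( {\mathbf{b}}^\prime \hat{\varvec{\upbeta}} \), its standard error, the t statistic (n − p − 1 degrees of freedom), the Wald statistic, and the χ²(1) p-value. The p-value uses the identity \( P(\chi^{2}_{1} > w) = \operatorname{erfc}(\sqrt{w/2}) \), so no statistics library is needed. The example contrast b = (0, 0, 1, 1), applied to Eq. (1), tests whether \( \beta_{2} + \beta_{3} \) (the E slope in the G = 1 group) is zero. The data and effect sizes are simulated and hypothetical.

```python
import math
import numpy as np

def contrast_test(X, y, b):
    """Test H0: b'beta = 0 in an OLS model.

    Returns (estimate, standard error, t statistic, Wald statistic, chi-squared(1) p-value).
    The t statistic has n - k degrees of freedom, where k = p + 1 columns of X.
    """
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    s2 = resid @ resid / (n - k)                 # estimated residual variance s^2
    quad = float(b @ XtX_inv @ b)                # b'(X'X)^{-1} b
    est = float(b @ beta_hat)
    se = math.sqrt(s2 * quad)
    t = est / se
    wald = est ** 2 / (s2 * quad)                # equals t**2 for a single contrast
    p_chisq = math.erfc(math.sqrt(wald / 2.0))   # P(chi2_1 > wald)
    return est, se, t, wald, p_chisq

# Simulated example with hypothetical effect sizes
rng = np.random.default_rng(0)
n = 300
G = rng.binomial(2, 0.4, size=n).astype(float)
E = rng.normal(size=n)
P = 0.2 * E + 0.3 * G * E + rng.normal(size=n)
X = np.column_stack([np.ones(n), G, E, G * E])   # Eq. (1) design
b = np.array([0.0, 0.0, 1.0, 1.0])               # beta_2 + beta_3
est, se, t, wald, p = contrast_test(X, P, b)
print(f"estimate={est:.3f} se={se:.3f} t={t:.2f} Wald={wald:.2f} p={p:.3g}")
```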

The linearHypothesis function from the R car package tests contrasts, i.e., linear combinations of model coefficients. It provides methods for linear models, generalized linear models, multivariate linear models, and linear and generalized linear mixed-effects models. linearHypothesis computes either a finite-sample F statistic or an asymptotic chi-squared statistic for a Wald-test-based comparison between a model and a linearly restricted model. For mixed-effects models, the tests are Wald chi-squared tests for the fixed effects.
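The following Python/NumPy sketch illustrates the kind of computation involved for a linear model: a Wald test of \( H_{0}: {\mathbf{L}}{\varvec{\upbeta}} = {\mathbf{c}} \) with q restrictions. It is an illustrative re-implementation under standard OLS assumptions, not the car package's code, and its output is not guaranteed to match linearHypothesis in every detail. The matrix L, the vector c and the simulated data are hypothetical. The example jointly tests the G main effect and the G × E coefficient of Eq. (1).

```python
import numpy as np

def linear_hypothesis(X, y, L, c=None):
    """Wald test of H0: L beta = c for an OLS fit.

    Returns (F, chisq, q): F has q and n - k degrees of freedom, and chisq = q * F
    is referred to a chi-squared distribution with q degrees of freedom.
    """
    L = np.atleast_2d(np.asarray(L, dtype=float))
    q = L.shape[0]
    c = np.zeros(q) if c is None else np.asarray(c, dtype=float)
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    s2 = resid @ resid / (n - k)                  # estimated residual variance
    diff = L @ beta_hat - c
    chisq = float(diff @ np.linalg.solve(L @ XtX_inv @ L.T, diff)) / s2
    return chisq / q, chisq, q

rng = np.random.default_rng(0)
n = 300
G = rng.binomial(2, 0.4, size=n).astype(float)
E = rng.normal(size=n)
P = 0.2 * E + 0.3 * G * E + rng.normal(size=n)    # hypothetical effect sizes
X = np.column_stack([np.ones(n), G, E, G * E])    # Eq. (1) design
L = [[0, 1, 0, 0],                                # beta_1 = 0 (G main effect)
     [0, 0, 0, 1]]                                # beta_3 = 0 (G x E term)
F, chisq, q = linear_hypothesis(X, P, L)
print(F, chisq, q)
```

For ordinary least squares, this Wald F statistic coincides with the classical F test comparing the residual sums of squares of the full model and the linearly restricted model (here, the model without G and G × E). That is the model-versus-restricted-model comparison described above.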

Assume that the variables are named G (gene), E (environment), G_E = G*E and G2_E = G2*E, and that the fitted linear model is stored as model_result. Then Hypotheses A = 0 (equivalent to \( \gamma_{3} + \gamma_{5} = 0 \)), B = 0 and C = 0 from Eqs. (4) can each be tested with a separate linearHypothesis call. Note that the car package must be installed in order to use the linearHypothesis function.

Alternatively, we provide an R function which is based on the t-distribution for direct calculation of the contrast test statistic and p-value.

Now, assume that the variables are named G_E = gene*env and G2_E = gene2*env and that the fitted linear model is stored as model_result. Then Hypotheses A, B and C from Eqs. (4) can each be tested with this function.

The next function carries out all three tests for the gene × environment effect and returns an overall p-value.

R example with simulation:


About this article

Cite this article

Aliev, F., Latendresse, S.J., Bacanu, SA. et al. Testing for Measured Gene-Environment Interaction: Problems with the use of Cross-Product Terms and a Regression Model Reparameterization Solution. Behav Genet 44, 165–181 (2014). https://doi.org/10.1007/s10519-014-9642-1



  • DOI: https://doi.org/10.1007/s10519-014-9642-1
