In this post, we compare the data-generating processes behind analysis of variance (ANOVA) and linear mixed-effects regression models (lmer) for discrete explanatory variables. Although the scientific community has largely shifted from ANOVA to lmer models, it is still helpful to look first at the data-generating process of ANOVA.
Here we focus on discrete explanatory variables (see this sister blog for continuous variables). The case is quite intuitive when the discrete variable has only two levels. We can use a binary representation (e.g., 1 for the presence of one level and 0 for the other level). Alternatively, we can use sum-to-zero contrast coding, where one level is coded as 1 and the other as -1. Sum-to-zero contrast coding is recommended, since it lets us directly assess the effect of each level of the explanatory variable against the mean estimate.
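To make the difference between the two codings concrete, here is a minimal sketch in plain Python. The data, the level names "a" and "b", and the helper fit_line are hypothetical and chosen only for illustration. The design is balanced and the fit is ordinary least squares with a single predictor.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical balanced data: three observations per level.
# Level "a" has mean 10, level "b" has mean 14, so the grand mean is 12.
levels = ["a", "a", "a", "b", "b", "b"]
y = [9.0, 10.0, 11.0, 13.0, 14.0, 15.0]

# Binary coding: 0 for level "a", 1 for level "b".
binary = [0 if lv == "a" else 1 for lv in levels]

# Sum-to-zero coding: 1 for level "a", -1 for level "b".
sum_to_zero = [1 if lv == "a" else -1 for lv in levels]

b0_bin, b1_bin = fit_line(binary, y)
b0_sum, b1_sum = fit_line(sum_to_zero, y)

print("binary coding:      intercept =", b0_bin, " slope =", b1_bin)
print("sum-to-zero coding: intercept =", b0_sum, " slope =", b1_sum)
```

With binary coding, the intercept is the mean of the level coded 0 (here 10) and the slope is the difference between the two level means (here 4). With sum-to-zero coding, the intercept is the grand mean (here 12) and the slope is the deviation of the level coded 1 from that grand mean (here -2). The level coded -1 deviates by the same amount in the opposite direction.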
But when a discrete predictor has more than two levels, we need a more general way to simulate the data and build the model. Before that, we first need to understand the design matrix (X). A design matrix indicates which effect is present for discrete explanatory variables and how much of an effect is present for continuous explanatory variables (Kéry…