random-effects-within-between-effects-model.Rmd

---
title: "Fixed and Random Effects Models"
author: "Daniel Lüdecke"
date: "22 May 2019"
output: 
  html_document: 
    theme: cerulean
    toc: yes
bibliography: random-effects-within-between-effects-model.bib
---

```{r setup, include=FALSE,echo=FALSE}
library(knitr)
knitr::opts_chunk$set(
  echo = TRUE,
  collapse = TRUE, 
  warning = FALSE,
  message = FALSE,
  comment = "#>", 
  dev = "png"
)
```

![cc](cc-attrib-nc.png)
<!---
(http://i.creativecommons.org/l/by-nc-sa/3.0/88x31.png)
--->

This document is licensed under the 
[Creative Commons attribution-noncommercial license](http://creativecommons.org/licenses/by-nc-sa/2.5/ca/).
Please share \& remix noncommercially, mentioning its origin. Sourcecode and data are available [here](https://github.com/strengejacke/mixed-models-snippets).

## The violation of model-assumptions in RE-models for panel data

This example shows how to address the issue when group factors (random effects) and (time-constant) predictors correlate for mixed models, especially in panel data. Models, where predictors and group factors correlate, may have compromised estimates of uncertainty as well as possible bias. In particular in econometrics, fixed-effects models are considered the gold-standard to address such issues. However, it often makes no sense to consider group-effects as "fixed" over a long time period. Apart from this, there are more shortcomings of FE-models as well, see [@bell_fixed_2018], [@bell_understanding_2018] and [@bafumi_fitting_2006].

The following equations and discussion on FE vs. RE are based on [@bell_fixed_2018]. Further  discussion on FE vs. RE, also at the end of this document, refer to [@gelman_data_2007] and [@bafumi_fitting_2006].

### Adding group meaned predictors to solve this issue

The solution to the critics from "FE-modelers" is simple: If you include a group-mean of your variables in a random effects model (that is, calculating the mean of the predictor at each group-level and including it as a group-level predictor), it will give the same answer as a fixed effects model (see  table 3 very below, and [@bell_understanding_2018] as reference). This is why FE-modelers often call this type of RE-model also a "kind of" FE-model, i.e. they define a RE model as a model where predictors are assumed uncorrelated with the residuals. However,

> "Calling it a FE model is not just inaccurate. It also does down its potential. Eg FE models don’t usually include random slopes, and failing to do so can lead to incorrect SEs as well as being a less interesting and realistic model."

>  "A random effects model is such because it has random effects (that is, higher-level entities treated as a distribution) in it rather than fixed effects (higher-level entities treated as dummy variables) in it."

source: [Twitter-Discussion 1](https://twitter.com/AndrewJDBell/status/1026764338370105344), [Twitter-Discussion 2](https://twitter.com/AndrewJDBell/status/1026764347480178689)

### Problems of ignoring random slopes in Fixed Effects models

[@heisig_costs_2017] demonstrate how ignoring random slopes, i.e. neglecting "cross-cluster differences in the effects of lower-level controls reduces the precision of estimated context effects, resulting in unnecessarily wide confidence intervals and low statistical power". You may refer to this paper to justify a mixed model with random slopes over "simple" FE-models.

## Examples

The following code snippets show how to translate the Equations from [@bell_fixed_2018] into R-code, using `lmer()` from the **lme4**-package.

```{r message=FALSE}
library(lme4)
library(sjPlot)
library(parameters)
library(lfe)

load("example.RData")
```

_Sourcecode and data are available [here](https://github.com/strengejacke/mixed-models-snippets)._

## Description of the data

* Variables:
  * `x_tv`  : time-varying variable
  * `z1_ti` : first time-invariant variable, co-variate
  * `z2_ti` : second time-invariant variable, co-variate
  * `QoL`   : Response (quality of life of patient)
  * `ID`    : patient ID
  * `time`  : time-point of measurement


## "Classical" growth-model for longitudinal data

```{r}
model_re <- lmer(
  QoL ~ time + age + x_tv + z1_ti + z2_ti + (1 + time | ID),
  data = d
)
```

## Computing the de-meaned and group-meaned variables

Next is a model from Eq. 10, which includes the _"de-meaned"_ time-varying variable as well as the _"group-meaned"_ time-varying variable.

```{r}
# compute mean of "x_tv" for each subject (ID) and
# then "de-mean" x_tv
d <- cbind(
  d,
  demean(d, select = c("x_tv", "QoL"), group = "ID") # from package "parameters"
)
```

Now we have:

  * `x_tv_between` : time-varying variable with the mean of `x_tv` accross all time-points, for each patient (ID).
  * `x_tv_within` : the de-meaned time-varying variable `x_tv`

`QoL_between` and `QoL_within` are used to test different FE-models, which are described later. In those models, I also use a "de-meaned" response variable without the group-variable (`ID`) as fixed effect (see Equation 6 in the paper).

## The complex random-effect-within-between model (REWB)

Eq. 10 suggests allowing the "within-effect" (de-meaned) vary across individuals, that's why `x_tv_within` is added as random slope as well.

Here, the estimate of `x_tv_within` indicates the _within-subject_ effect, while the estimate of `x_tv_between` indicates the _between-subject_ effect. This model also allows for heterogenity across level-2 units, that's why `x_tv_within` also appears in the random effects. The estimates of `z1_ti`
 and `z2_ti` also indicate a _between-subject_ effect, as this is a level-2 variable, which cannot have a within-subject effect.


### Model from Equation 10

Here is the equation 10 from Bell et al. 2018:

```{r echo=FALSE}
f <- "y<sub>it</sub> = &beta;<sub>0</sub> + &beta;<sub>1W</sub> (x<sub>it</sub> - &#x035E;x<sub>i</sub>) + &beta;<sub>2B</sub> &#x035E;x<sub>i</sub> + &beta;<sub>3</sub> z<sub>i</sub> + &upsilon;<sub>i0</sub> + &upsilon;<sub>i1</sub> (x<sub>it</sub> - &#x035E;x<sub>i</sub>) + &epsilon;<sub>it</sub>"
knitr::asis_output(f)
```

```{r echo=FALSE}
f <- "<ul><li>x<sub>it</sub> - &#x035E;x<sub>i</sub> is the de-meaned predictor, <em>x_tv_within</em></li><li>&#x035E;x<sub>i</sub> is the group-meaned predictor, <em>x_tv_between</em></li><li>&beta;<sub>1W</sub> is the coefficient for x_tv_within (within-subject)</li><li>&beta;<sub>2B</sub> is the coefficient for x_tv_between (bewteen-subject)</li><li>&beta;<sub>3</sub> is the coefficient for `z1_ti` or `z2_ti` (bewteen-subject)</li></ul>"
knitr::asis_output(f)
```

```{r}
# This model this leads to an error (number of observations <= number of 
# random effects), the check for nobs vs. re is ignored here.
model_rewb <- lmer(
  QoL ~ time + age + x_tv_within + x_tv_between + z1_ti + z2_ti + (1 + time + x_tv_within | ID),
  data = d,
  control = lmerControl(check.nobs.vs.nRE = "ignore")
)

# An alternative could be to model the random effects as not correlated.
# m2b <- lmer(
#   QoL ~ time + age + x_tv_within + x_tv_between + z1_ti + z2_ti + (1 + time + x_tv_within || ID),
#   data = d
# )

# here we get no error message, model runs fine...
model_complex_rewb <- lmer(
  QoL ~ time + age + x_tv_within + x_tv_between + z1_ti + z2_ti + 
    (1 + time | ID) + (1 + x_tv_within | ID),
  data = d
)

# an alternative would be to assume independence between random slopes
# and no covariance...
model_complex_rewb_2 <- lmer(
  QoL ~ time + age + x_tv_within + x_tv_between + z1_ti + z2_ti + 
    (1 + time | ID) + (0 + x_tv_within | ID),
  data = d
)
```


We compare all model fits, but we go on with `model_complex_rewb` (the second model in the table below) for now...

#### Table 1: Comparison of complex REWB-Models

```{r message=FALSE}
tab_model(
  model_rewb, model_complex_rewb, model_complex_rewb_2,
  show.ci = FALSE, 
  show.se = TRUE, 
  auto.label = FALSE, 
  string.se = "SE",
  show.icc = FALSE,
  dv.labels = c("Complex REWB (1)", "Complex REWB (2)", "Complex REWB (3)")
)
```


## The simple random-effect-within-between model (REWB) and Mundlak model

After email correspondance, the paper's authors suggest that, depending on the research interest and necessary complexity of the model, a "simple" random-slope might be suitable as well. As stated in the paper, this is useful if homogenity across level-2 units is assumed. This model usually yields the same results as a FE-model, however, we additionally have information about the random effects - and the model can incorporate time-invariant covariates.

Again, the estimate of `x_tv_within` indicates the within-subject effect, while the estimate of `x_tv_between` indicates the between-subject effect.

### Model from Equation 2

```{r echo=FALSE}
f <- "y<sub>it</sub> = &beta;<sub>0</sub> + &beta;<sub>1W</sub> (x<sub>it</sub> - &#x035E;x<sub>i</sub>) + &beta;<sub>2B</sub> &#x035E;x<sub>i</sub> + &beta;<sub>3</sub> z<sub>i</sub> + (&upsilon;<sub>i</sub> + &epsilon;<sub>it</sub>)"
knitr::asis_output(f)
```

```{r}
model_simple_rewb <- lmer(
  QoL ~ time + age + x_tv_within + x_tv_between + z1_ti + z2_ti + (1 + time | ID),
  data = d
)
```

An alternativ would be the **Mundlak** model. Here, the estimate of `x_tv` indicates the _within-subject_ effect, while the estimate of `x_tv_between` indicates the _contextual_ effect.

### Model from Equation 3

```{r echo=FALSE}
f <- "y<sub>it</sub> = &beta;<sub>0</sub> + &beta;<sub>1W</sub> x<sub>it</sub> + &beta;<sub>2C</sub> &#x035E;x<sub>i</sub> + &beta;<sub>3</sub> z<sub>i</sub> + (&upsilon;<sub>i</sub> + &epsilon;<sub>it</sub>)"
knitr::asis_output(f)
```

```{r}
model_mundlak <- lmer(
  QoL ~ time + age + x_tv + x_tv_between + z1_ti + z2_ti + (1 + time | ID),
  data = d
)
```

The contextual effect, i.e. the coefficient for `x_tv_between`, indicates the effect of an individual (at level 1) that moves from one level-2 group into another one. In the above model, or in general: in case of longitudinal data, the contextual effect is meaningless, as level-2 predictors are individuals (or subjects) themselves, and by definition cannot "move" to another individual. Therefor, the REWB-model is more informative.

## Comparison of models

In table 2, we compare the "classical" RE-model, the complex REWB-model, the "simple" REWB-model and the Mundlak-Model.

#### Table 2: Comparison of RE, REWB and Mundlak Models

```{r message=FALSE}
tab_model(
  model_re, model_complex_rewb, model_simple_rewb, model_mundlak,
  show.ci = FALSE, 
  show.se = TRUE, 
  auto.label = FALSE, 
  string.se = "SE",
  show.icc = FALSE,
  dv.labels = c("Classical RE", "Complex REWB", "Simple REWB", "Mundlak")
)
```


## Check if a REWB- or simple RE-model suitable

If the estimates of the within- and between-effect (`x_tv_within` and `x_tv_between`) are (almost) identical, or if the contextual effect (`x_tv_between`) in the **Mundlak**-model is zero and doesn't give a significant improvement for the model, you can also use a simple RE-model.

A simple way to check this is a likelihood-ratio test between the simple RE-model and the Mundlak-model:

```{r}
anova(model_re, model_mundlak)
```

Here we see a significant improvement of the Mundlak-model over the simple RE-model, indicating that it makes sense to model within- and between-subjects effects, i.e. to apply a REWB-model.

## Comparison FE- and REWB-Model

The function `felm()` from the package **lfe** was use to compute the fixed effects regression models. Base R's `lm()` gives the same result, however, the output is much longer due to the ID-parameters.


```{r echo=FALSE}
f <- "y<sub>it</sub> = &beta;<sub>1</sub> (x<sub>it</sub> - &#x035E;x<sub>i</sub>) + (&upsilon;<sub>i</sub> + &epsilon;<sub>it</sub>)"
knitr::asis_output(f)
```

```{r}
# Model from Equation 5
model_fe_ID <- felm(
  QoL ~ time + x_tv_within | ID,
  data = d
)

# same as this lm-model
# model_fe_ID <- lm(
#   QoL ~ 0 + time + x_tv_within + ID,
#   data = d
# )
```

Equation 6 describes a fixed effects model with de-meaned dependent variable.

```{r echo=FALSE}
f <- "(y<sub>it</sub> - &#x035E;y<sub>i</sub>) = &beta;<sub>1</sub> (x<sub>it</sub> - &#x035E;x<sub>i</sub>) + &epsilon;<sub>it</sub>"
knitr::asis_output(f)
```

```{r}
# Model from Equation 6
model_fe_y_within <- felm(
  QoL_within ~ time + x_tv_within,
  data = d
)

# or ...
# model_fe_y_within <- lm(
#   QoL_within ~ 0 + time + x_tv_within,
#   data = d
# )
```

We compare the results from the FE-models with a simple RE-model and the REWB-model.

```{r}
model_re_2 <- lmer(
  QoL ~ time + x_tv_within + x_tv_between + (1 | ID),
  data = d
)

# Compare with complex REWB-model
model_complex_rewb3 <- lmer(
  QoL ~ time + x_tv_within + x_tv_between +
    (1 + time | ID) + (1 + x_tv_within | ID),
  data = d
)
```

As we can see, the estimates of the FE-models and the RE-model are identical. However, the estimates from the REWB-model differ. This is because the time-varying predictors, the within-subject effect `x_tv_within`, is allowed to vary between subjects as well (i.e. it is modelled as random slope).

#### Table 3: Comparison of FE- and RE-models

```{r message=FALSE}
tab_model(
  model_fe_ID, model_fe_y_within, model_re_2, model_complex_rewb3,
  show.ci = FALSE, 
  show.se = TRUE, 
  auto.label = FALSE, 
  string.se = "SE",
  show.icc = FALSE,
  dv.labels = c("FE-model with ID", "FE, de-meaned Y (with Intercept)", "RE", "Complex REWB")
)
```

## Comparison with the panelr-package

The [panelr-package](https://panelr.jacob-long.com/) provides functions to fit models similar to those suggested by Bell et al. 2018, especially the "simple REWB model" (`model_simple_rewb`) and the Mundlak-model (`model_mundlak`). For the complex REWB model (`model_complex_rewb`), I needed some slight modification when using `panelr::wbm()`, so the following model `model_complex_rewb_panelr` that mimics the complex REWB model is similar to the above model `model_rewb`.

Here we compare the results from `panelr::wbm()` with our previous models `model_complex_rewb` (complex REWB), `model_simple_rewb` (simple REWB) and `model_mundlak` (Mundlak).

```{r}
library(panelr)

# prepare the data for processing with "panelr"
pd <- panel_data(d, id = ID, wave = time)

# the complex REWB-model
model_complex_rewb_panelr <- wbm(QoL ~ x_tv | age + z1_ti + z2_ti + time  | (time + x_tv | ID), 
           data = pd, 
           control = lmerControl(check.nobs.vs.nRE = "ignore"))

# the simple REWB-model
model_rewb_panelr <- wbm(QoL ~ x_tv | age + z1_ti + z2_ti + time | (time | ID), data = pd)

# the Mundlak model
model_mundlak_panelr <- wbm(QoL ~ x_tv | age + z1_ti + z2_ti + time | (time | ID), data = pd, model = "contextual")

tab_model(
  model_complex_rewb_panelr, model_rewb_panelr, model_mundlak_panelr,
  show.ci = FALSE, 
  show.se = TRUE, 
  auto.label = FALSE, 
  string.se = "SE",
  show.icc = FALSE,
  dv.labels = c("Complex REWB", "Simple REWB", "Mundlak")
)
```

```{r}
# compare with other models
tab_model(
  model_complex_rewb, model_simple_rewb, model_mundlak,
  show.ci = FALSE, 
  show.se = TRUE, 
  auto.label = FALSE, 
  string.se = "SE",
  show.icc = FALSE,
  dv.labels = c("Complex REWB", "Simple REWB", "Mundlak")
)
```


As we can see, coefficients, standard errors and p-values of all relevant parameters are identical for the simple REWB and Mundlak models from both packages (`panelr::wbm()` and `lme4::lmer()`). This confirms the correct "translation" of the formulae from Bell at al. 2018 into `lmer()`-syntax.

The complex REWB models are also (almost) identical, the minor variation after the second fractional part is most likely due to the slightly different random effects specification.

## Conclusion

When group factors (random effects) and (time-constant) predictors correlate, it's recommended to fit a complex random-effect-within-between model (REWB) instead of a "simple" mixed effects model. This requires de- and group-meaning the time-varying predictors. Depending on the data structure, random slope and intercept may correlate or not.

The random effects structure, i.e. how to model random slopes and intercepts and allow correlations among them, depends on the nature of the data. The benefits from using mixed effects models over fixed effects models are more precise estimates (in particular when random slopes are included) and the possibility to include between-subjects effects.

In case of convergence problems or singular fits, note that changing the optimizer might help. In this context, some models ran fine in _lme4_, while other models that had problems being fitted in _lme4_ ran without any problems in [**glmmTMB**](random-effects-within-between-effects-model-glmmtmb.html)

```{r eval=FALSE}
# compute group-mean of "x_tv" for each subject (ID) and
# then "de-mean" x_tv
d <- cbind(
  d,
  demean(d, select = c("x_tv", "QoL"), group = "ID") # from package "parameters"
)

# fit complex REWB-model
m <- lmer(
  QoL ~ time + age + x_tv_within + x_tv_between + z1_ti + z2_ti + 
    (1 + time | ID) + (1 + x_tv_within | ID),
  data = d
)

# an alternative would be to assume independence between random slopes
# and no covariance...
m <- lmer(
  QoL ~ time + age + x_tv_within + x_tv_between + z1_ti + z2_ti + 
    (1 + time | ID) + (0 + x_tv_within | ID),
  data = d
)
```

* `x_tv_within` indicates the _within-subject_ effect
* `x_tv_between` indicates the _between-subject_ effect
* `z1_ti` and `z2_ti` also indicate a _between-subject_ effect

## Further critics of the FE-approach

(source: http://andrewgelman.com/2012/04/02/fixed-effects-and-identification/)

> "But the so-called fixed effects model does not in general minimize bias. It only minimizes bias under some particular models. As I wrote above, 'it’s just another model.' Another way to see this, in the time-series cross-sectional case, is to recognize that there’s no reason to think of group-level coefficients as truly 'fixed'. One example I remember was a regression on some political outcomes, with 40 years of data for each of 50 states, where the analysis included 'fixed effects' for states. I’m sorry but it doesn’t really make sense to think of Vermont from 1960 through 2000 as being 'fixed' in any sense."

> "I just don’t see how setting the group-level variance to infinity can be better than estimating it from the data or setting it to a reasonable finite value. That said, the big advantage of multilevel (“random effects”) modeling comes when you are interested in the varying coefficients themselves, or if you’re interested in predictions for new groups, or if you want the treatment effect itself to vary by group. On a slightly different note, I’m unhappy with many of the time-series cross-sectional analyses I’ve seen because I don’t buy the assumption of constancy over time. That is, I don’t really think those effects are “fixed”!"

> "I don’t know that there’s anything much that’s time-invariant in what I study. But, in any case, the so-called fixed-effects analysis is mathematically a special case of multilevel modeling in which the group-level variance is set to infinity. I agree that there’s no need to “believe” that model for the method to work; however, I think it works because of some implicit additivity assumptions. I’d prefer to (a) allow the group-level variance to be finite, and (b) work in the relevant assumptions more directly."

## Further Readings

- Discussion at [Cross-Validted](https://stats.stackexchange.com/q/100227)

# References