-
Notifications
You must be signed in to change notification settings - Fork 41
/
Copy pathextensions.Rmd
941 lines (702 loc) · 41.5 KB
/
extensions.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
```{r chunk_setup-ext, include=FALSE, eval=TRUE}
knitr::opts_chunk$set(
# cache
cache.rebuild = FALSE,
cache = TRUE
)
```
# Common Extensions
<br>
```{r top-plot-crossed, echo=FALSE, cache.rebuild=TRUE}
tags$div(
style = "width:50%; margin:auto auto; font-size:50%",
DiagrammeR::grViz(
'scripts/crossed.gv',
width = '100%',
height = '33%'
)
)
```
<br>
## Additional Grouping Structure
### Cross-classified models
Oftentimes there will be additional sources of variance beyond one grouping factor. Consider as an example, a visual perception experiment where there are multiple trials for each individual, to go along with specific images displayed to all individuals. Such data might look like this.
```{r demodata_crossed, echo=FALSE}
crossing(Person = 1:20, Image = letters[1:10]) %>%
mutate(Score = sample(1:10, 200, replace = TRUE)) %>%
DT::datatable(
options = list(
dom = 'tp',
autoWidth = F,
columnDefs = list(
list(width = '10px', targets = 0:2),
list(className = 'dt-center', targets = 0:2)
)
),
rownames = F,
width = 250
)
```
<br>
<br>
In this scenario, we have observations clustered within both person and image, but person and image are not nested within one another- all participants see all 10 images. Such a situation is typically referred to as having *crossed* random effects, which pretty much just means non-nested, or even more to the point, 'default'. The larger issue for the situations we'll look at next is that we will have multiple sources variances to consider.
#### Example: Student achievement
For this demonstration we'll look at achievement scores for students. The sources of dependency are due to students having gone to the same primary or secondary schools. However, in this example, going to a primary school doesn't necessarily mean you'll go to a specific secondary school. Note also that there are no repeated measures, we see each student only once. Here's a quick look a the data, and for more detail, check the [appendix][Data].
```{r pupil_nurses_setup, echo=FALSE, eval=FALSE}
pupils = read_sav('data/raw_data/joop_hox_data2/9 CrossClass/pupcross.sav') %>%
as_factor() %>%
mutate(ACHIEV = as.numeric(as.character(ACHIEV)),
PUPSEX = factor(PUPSEX, labels = c('male', 'female'))) %>%
rename(
achievement = ACHIEV,
primary_school_id = PSCHOOL,
secondary_school_id = SSCHOOL,
sex = PUPSEX,
ses = PUPSES,
primary_denominational = PDENOM,
secondary_denominational = SDENOM
)
save(pupils, file='data/pupils.RData')
nurses = read_sav('data/raw_data/joop_hox_data2/2 Basic Model/nurses.sav') %>%
as_factor() %>%
rename(experience = experien,
treatment = expcon,
sex = gender) %>%
mutate(
treatment = factor(treatment, labels = c('Ctrl', 'Training')),
age = as.numeric(as.character(age)),
sex = factor(sex, labels = c('Male', 'Female'))
) %>%
select(-starts_with('Z'),-starts_with('C'))
save(nurses, file = 'data/nurses.RData')
```
```{r examine_pupil_data, echo=1}
load('data/pupils.RData')
DT::datatable(pupils,
options = list(
dom = 'tp',
scrollX = T,
autoWidth = T
),
# columnDefs = list(list(width = '150px', targets = 1),
# list(width = '100px', targets = 3))),
rownames = F)
```
<br>
<br>
For our mixed model, we'll look at the effects of `sex` and socioeconomic status, `ses`, a six level variable from low to high, on scholastic achievement. The range of achievement scores is roughly `r round(min(pupils$achievement))` to `r round(max(pupils$achievement))`, with mean of `r round(mean(pupils$achievement), 1)` and standard deviation `r round(sd(pupils$achievement), 1)`. We'll take into account the clustering at primary school and secondary school. To incorporate the additional structure in <span class="pack">lme4</span> syntax is very easy- we just do as we did before, though now for both grouping factors[^crossed_notation].
```{r cross_classified, eval=1}
pupils_crossed = lmer(
achievement ~
sex
+ ses
+ (1|primary_school_id)
+ (1|secondary_school_id),
data = pupils
)
summary(pupils_crossed, correlation = FALSE)
```
```{r cross_classified_fixed, echo=FALSE}
pupils_crossed %>%
extract_fixed_effects() %>%
select(-(t:p_value)) %>%
kable_df()
```
The fixed effects tell us there is a positive effect of being female on achievement, and in general, relative to lowest SES category, being in the upper categories of SES also has a positive effect. Now for the random effects.
```{r cross_classified_random, echo=FALSE}
crossed_var_cor = extract_vc(pupils_crossed, ci_level = 0)
crossed_var_cor %>%
kable_df(digits = 2)
```
When we look at the variance components, we see that primary and secondary school contributes about `r perc(sum(crossed_var_cor$variance[1:2])/sum(crossed_var_cor$variance), digits = 0)` of the total variance. Most of the variance attributable to school comes from the primary school.
If we inspect the random effects, we can see that we now have two sets of effects- `r n_distinct(pupils$primary_school_id)` for the primary schools, and `r n_distinct(pupils$secondary_school_id)` for the secondary. Both would be incorporated into any pupil-specific prediction.
```{r crossed_re}
glimpse(ranef(pupils_crossed))
```
Let's look at them visually using <span class="pack">merTools</span>.
```{r crossed_re_plot, echo=FALSE}
p = merTools::plotREsim(merTools::REsim(pupils_crossed)) +
theme_clean()
p
# multiplots with patchwork still do non-transparent background
# library(patchwork)
# p1 = plot_coefficients(pupils_crossed, ranef=T, which_ranef='primary_school_id') +
# labs(x='Primary School') +
# lims(y = c(-1.5,1.5)) +
# theme(axis.ticks = element_blank(),
# axis.text.x = element_blank())
# p2 = plot_coefficients(pupils_crossed, ranef=T, which_ranef='secondary_school_id') +
# labs(x='Secondary School', y='') +
# lims(y = c(-1.5,1.5)) +
# theme(axis.ticks = element_blank(),
# axis.text.x = element_blank(),
# axis.text.y = element_blank())
# {p1 + p2}
```
The above visualization allows us to actually see less variability among students for secondary schools. Note that we have the usual extensions here if desired. As an example, we could also do random slopes for student characteristics.
### Hierarchical structure
Now that we have looked at cross-classified models, we can proceed to examine *hierarchical* cluster structuring. In this situation, we have clusters nested within other clusters, which may be nested within still other clusters. A typical example might be cities within counties, and counties within states.
#### Example: Nurses and stress
For our demonstration we'll use the nurses data set. Here we are interested in the effect of a training program (`treatment`) on stress levels (on a scale of 1-7) of nurses. In this scenario, nurses are nested within wards, which themselves are nested within hospitals, so we will have random effects pertaining to ward (within hospital) and hospital. For more information see the [appendix][Data].
```{r nurses_data, echo=1}
load('data/nurses.RData')
DT::datatable(nurses,
options = list(
dom = 'tp',
scrollX = T,
autoWidth = T
),
# columnDefs = list(list(width = '150px', targets = 1),
# list(width = '100px', targets = 3))),
rownames = F)
```
<br>
<br>
For the model, we examine effects of the treatment, as well as several other covariates, at least one each at the nurse, ward, and hospital levels. Again, when it comes to the fixed effects portion, you can simply think about that part as you would any standard regression, we just add covariates as theory/exploration would suggest. To incorporate this type of random effects structure is not too different from the cross-classified approach, but does have a slight change to the syntax.
```{r hierarchical, eval=1:2}
nurses_hierarchical = lmer(
stress ~
age
+ sex
+ experience
+ treatment
+ wardtype
+ hospsize
+ (1 | hospital)
+ (1 | hospital:ward),
data = nurses
)
# same thing!
nurses_hierarchical = lmer(
stress ~
age
+ sex
+ experience
+ treatment
+ wardtype
+ hospsize
+ (1 | hospital/ward),
data = nurses
)
summary(nurses_hierarchical, correlation = F)
```
```{r hierarchical_fixed, echo=FALSE}
nurses_hierarchical %>%
extract_fixed_effects() %>%
select(-(t:p_value)) %>%
kable_df()
```
<br>
As far as the fixed effects go, about the only thing that doesn't have a statistical effect is ward type[^signflip].
<br>
```{r hierarchical_random, echo=FALSE}
# note tidy doesn't work with multiple random effects and conf.int
hierarch_var_cor = extract_vc(nurses_hierarchical, ci_level = 0)
hierarch_var_cor %>%
kable_df()
```
Concerning the random effects, there appears to be quite a bit of variability from ward to ward especially, but also hospital. Recall that stress is a 7 point scale, so from ward to ward we can expect scores to bounce around about half a point on average, which is quite dramatic in my opinion. Again we inspect it visually.
```{r hierarchical_re_plot, echo=FALSE}
p = merTools::plotREsim(merTools::REsim(nurses_hierarchical)) +
theme_clean()
p
```
### Crossed vs. nested
The following shows the difference in the results from treating ward as a nested (within hospital) vs. crossed random effect. What do you notice is different in the result of these two models?
```{r crossed_vs_nested, echo=1:3}
nurses_hierarchical = lmer(
stress ~
age
+ sex
+ experience
+ treatment
+ wardtype
+ hospsize
+ (1 | hospital)
+ (1 | hospital:wardid), #**#
data = nurses
)
nurses_crossed = lmer(
stress ~
age
+ sex
+ experience
+ treatment
+ wardtype
+ hospsize
+ (1 | hospital)
+ (1 | wardid), #**#
data = nurses
)
hierarch_var_cor %>%
kable_df()
crossed_var_cor = extract_vc(nurses_crossed, ci_level = 0)
crossed_var_cor %>%
kable_df()
```
Nothing? Good, you're not crazy. Here's a quote from the [lme4 text](http://lme4.r-forge.r-project.org/book/Ch2.pdf), section 2.2.1.1, which is definitely worth your time.
> The blurring of mixed-effects models with the concept of multiple,
hierarchical levels of variation results in an unwarranted emphasis on 'levels'
when defining a model and leads to considerable confusion. It is perfectly
legitimate to define models having random effects associated with non-nested
factors. The reasons for the emphasis on defining random effects with respect to
nested factors only are that such cases do occur frequently in practice, and that
some of the computational methods for estimating the parameters in the models
can only be easily applied to nested factors.
>
This is not the case for the methods used in the lme4 package. *Indeed there is
nothing special done for models with random effects for nested factors*. When
random effects are associated with multiple factors, exactly the same
computational methods are used whether the factors form a nested sequence or are
partially crossed or are completely crossed.
You might have noticed that we were using `wardid` rather than the `ward` grouping variable as in our first example. Even though every ward is unique, the `ward` column labels them with an arbitrary sequence starting with 1 within each hospital. While this might seem natural, ward 1 in hospital 1 is not the same as ward 1 in hospital 2, so it's probably not a good idea to give them the same label. The `wardid` column properly distinguishes the wards with unique values (e.g. 11, 12).
What would have happened had we used that variable as a crossed random effect?
```{r bad_cross, echo=1}
nurses_crossed_bad_data = lmer(
stress ~
age
+ sex
+ experience
+ treatment
+ wardtype
+ hospsize
+ (1|hospital)
+ (1|ward),
data = nurses
)
extract_vc(nurses_crossed_bad_data, ci_level = 0) %>%
kable_df()
```
This is certainly not the result we want. The variance in `ward` is already captured by treatment and type. However, as demonstrated, this can be avoided with the proper syntax, or proper labeling in the data to allow unique clusters to have unique identifiers.
For more, see this [discussion on Stack Exchange](https://stats.stackexchange.com/questions/228800/crossed-vs-nested-random-effects-how-do-they-differ-and-how-are-they-specified), as well as [this from the GLMM FAQ](https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#nested-or-crossed) from one of the <span class="pack">lme4</span> developers. Josh Errickson at the University of Michigan's CSCAR group also has a nice [write-up with visual depiction](http://errickson.net/stats-notes/vizrandomeffects.html) of the underlying matrices of interest, which served as inspiration for some of the visualization in the next section.
So there you have it. When it comes to <span class="pack">lme4</span>, or mixed models more generally, crossed vs. nested is simply a state of mind (data)[^crossnest].
## Residual Structure
Sometimes we will want to obtain more specific estimates regarding the residual covariance/correlation structure. This is especially the case in the longitudinal setting, where we think that observations closer in time would be more strongly correlated than those further apart, or that the variance changes over time. What does this sort of model look like?
Let's begin by thinking about the covariance/correlation matrix for the entire set of observations for our target variable, and how we want to represent the dependency in those observations. I'll show a visualization of the first 5 people from our GPA data and modeling situation. Recall that each person has 6 observations. This display regards the results from our random intercepts (only) model for GPA.
```{r residual_varcov, echo=FALSE}
rescov <- function(model, data) {
var.d <- crossprod(getME(model,"Lambdat"))
Zt <- getME(model,"Zt")
vr <- sigma(model)^2
var.b <- vr*(t(Zt) %*% var.d %*% Zt)
sI <- vr * Diagonal(nrow(data))
var.y <- var.b + sI
invisible(var.y)
}
gpa_vis_cov = lmer(gpa ~ occasion + (1|student), data=gpa)
# summary(gpa_vis_cov)
rc1 <- rescov(gpa_vis_cov, gpa)
# image(rc1[1:60,1:60])
rc1[1:30, 1:30] %>%
as.matrix() %>%
reshape2::melt() %>%
mutate(value = factor(round(value, 5))) %>%
ggplot(aes(Var1, Var2, fill = value)) +
geom_tile(color = 'gray92', size = 1) +
scale_y_continuous(trans = 'reverse') +
scale_fill_manual(values = c('gray92', '#00aaff80', '#ff550080')) +
theme_void() +
theme(
legend.key = ggplot2::element_rect(fill = 'transparent', colour = NA),
legend.background = ggplot2::element_rect(fill = 'transparent', colour = NA),
panel.background = ggplot2::element_blank(),
panel.grid = ggplot2::element_blank(),
strip.background = ggplot2::element_blank(),
plot.background = ggplot2::element_rect(fill = "transparent", colour = NA)
)
# ggplotly()
# library(plotly)
# library(viridis)
# plot_ly(z =~ as.matrix(rc1[1:30, 1:30]), type='contour', colors = c('gray92','#00aaff80', '#ff550080')) %>%
# layout(yaxis = list(autorange = "reversed"))
```
Each block represents the covariance matrix pertaining to observations within an individual. Within the person, there are variances on the diagonal and covariances of observations on the off-diagonal. When considering the whole data, we can see that observations from one person have no covariance with another person (gray). Furthermore, the covariance within a person is a constant value, the variance is also a constant value. Where did those values come from? Let's look again at the result from our random intercept model.
```{r residual_varcov2, echo=F}
vc = extract_vc(gpa_vis_cov, ci_level = 0)
totvar = sum(vc$variance)
vc %>%
kable_df()
```
Remember that there are two sources of variance in this model, the residual observation level variance, and that pertaining to person. Combined they provide the total residual variance that we aren't already capturing with our model features (i.e. fixed effect variables). In this case, it's about `r round(totvar, 2)`, the value displayed on our diagonal. The off-diagonal is the variance attributable to student, which we alternately interpreted as an intraclass correlation (dividing by the total variance converts it to the correlation metric).
More generically, and referring to previous notation for our estimated variances, we can see the covariance matrix (for a cluster) as follows.
$$\Sigma =
\left[
\begin{array}{ccc}
\color{orange}{\sigma^2 + \tau^2} & \tau^2 & \tau^2 & \tau^2 & \tau^2 & \tau^2 \\
\tau^2 & \color{orange}{\sigma^2 + \tau^2} & \tau^2 & \tau^2 & \tau^2 & \tau^2 \\
\tau^2 & \tau^2 & \color{orange}{\sigma^2 + \tau^2} & \tau^2 & \tau^2 & \tau^2 \\
\tau^2 & \tau^2 & \tau^2 & \color{orange}{\sigma^2 + \tau^2} & \tau^2 & \tau^2\\
\tau^2 & \tau^2 & \tau^2 & \tau^2 & \color{orange}{\sigma^2 + \tau^2} & \tau^2 \\
\tau^2 & \tau^2 & \tau^2 & \tau^2 & \tau^2 & \color{orange}{\sigma^2 + \tau^2} \\
\end{array}\right]$$
<br>
This represents a covariance structure called *compound symmetry*, where correlations, the off-diagonal values, are constant. It is the default in most mixed model settings, and the same as what is shown visually above. Now let's start to think about other types of covariance structures.
Consider the following model for an individual and just three time points to keep things simpler to show.
$$\boldsymbol{y} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$$
So we have three observations of $y$ that are multivariate normally distributed. The mean $\mu$ is a function of covariates just like in standard regression.
$$\mu = b_0 + b_1\cdot \mathrm{time} + b_2\cdot x_1 ...$$
However, instead of just plopping an $\epsilon$ at the end, we want to go further in defining the entire residual variance/covariance structure for all three time points.
In the simplest setting of a standard linear regression model, we have constant variance and no covariance.
$$\Sigma =
\left[
\begin{array}{ccc}
\sigma^2 & 0 & 0 \\
0 & \sigma^2 & 0 \\
0 & 0 & \sigma^2 \\
\end{array}\right]$$
Next, we can relax the assumption of equal variances, and estimate each separately. In this case of *heterogeneous variances*, we might see more or less variance over time, for example.
$$\Sigma =
\left[
\begin{array}{ccc}
\sigma_1^2 & 0 & 0 \\
0 & \sigma_2^2 & 0 \\
0 & 0 & \sigma_3^2 \\
\end{array}\right]$$
Now let's say we actually want to get at the underlying covariance/correlation. I'll switch to the correlation representation, but you can still think of the variances as constant or separately estimated. So now we have something like this, where $\rho$ represents the residual correlation among observations.
$$\Sigma = \sigma^2
\left[
\begin{array}{ccc}
1 & \rho_1 & \rho_2 \\
\rho_1 & 1 & \rho_3 \\
\rho_2 & \rho_3 & 1 \\
\end{array}\right]$$
In this case we'd estimate a different correlation for all time point pairs (with constant variance). This is typically described as an *unstructured*, or simply 'symmetric', correlation structure.
If you are familiar with repeated measures ANOVA, which is a [special case of a mixed model](https://m-clark.github.io/docs/mixedModels/anovamixed.html), you may recall that the usual assumption is a *sphericity*. Sphericity is a relaxed form of *compound symmetry*, where all the correlations have the same value, i.e. $\rho_1=\rho_2=\rho_3$, and all variances are equal.
Another very commonly used correlation structure (for time-based settings) is an *autocorrelation* structure, of lag order one, for the residuals. What this means is that we assume the residuals at one time point apart correlate with some value $\rho$, observations at two time points apart correlate $\rho^2$, and so on. As such we only need to estimate $\rho$, while the rest are then automatically determined. Here's what it'd look like for four time points.
$$\Sigma = \sigma^2
\left[
\begin{array}{cccc}
1 & \rho & \rho^2 & \rho^3 \\
\rho & 1 & \rho & \rho^2 \\
\rho^2 & \rho & 1 & \rho \\
\rho^3 & \rho^2 & \rho & 1 \\
\end{array}\right]$$
If our model estimated $\rho$ to be .5, the resulting correlation matrix would look like the following within each cluster.
$$\Sigma = \sigma^2
\left[
\begin{array}{cccc}
1 & .5 & .25 & .06 \\
.5 & 1 & .5 & .25 \\
.25 & .5 & 1 & .5 \\
.06 & .25 & .5 & 1 \\
\end{array}\right]$$
Again, the main point is that points further apart in time are assumed to have less correlation.
Know that there are many patterns and possibilities to potentially consider, and that they are not limited to the repeated measures scenario. For example, the correlation could represent spatial structure, where units closer together geographically would be more correlated. And as noted, we could also have variances that are different at each time point[^residstruct]. We'll start with that for the next example.
### Heterogeneous variance
Unfortunately, <span class="pack">lme4</span> does not provide the ability to model the residual covariance structure, at least [not in a straightforward fashion](https://bbolker.github.io/mixedmodels-misc/notes/corr_braindump.html), though many other mixed model packages do[^lmerho]. In fact, two packages that come with the basic R installation do so, <span class="pack">mgcv</span> and <span class="pack">nlme</span>. We'll demonstrate with the latter.
The <span class="pack">nlme</span> package will have a different random effect specification, though not *too* different. In addition, to estimate heterogeneous variances, we'll need to use an additional `weights` argument. The following will allow each time point of occasion to have a unique estimate.
```{r heterovar, echo=1:5, eval=-5}
library(nlme)
heterovar_res = lme(
gpa ~ occasion,
data = gpa,
random = ~ 1 | student,
weights = varIdent(form = ~ 1 | occasion)
)
summary(heterovar_res)
extract_fixed_effects(heterovar_res) %>%
kable_df()
vc = extract_vc(heterovar_res, ci_level = 0)
vc %>%
kable_df()
```
At this point we're getting the same stuff we're used to. Now the not-so fun part. For the values we're interested in for this example, i.e. the variances at each occasion, <span class="pack">nlme</span> does not make it easy on someone to understand initially, as the output regards the way things are for estimation, not for what one would usually have to report. The variances are scaled relative to the first variance estimate, which is actually the reported residual variance in the random effects part. Additionally the values are also on the standard deviation rather than variance scale. From the default output display, we can see that variance decreases over time in this case, but the actual values are not provided.
```{r heterovar_variances}
summary(heterovar_res$modelStruct)
```
Relative values are fine, I guess, but what we'd want are the actual estimates. Here's how you can get them using the residual standard deviation to scale those values, then square them to get on the variance scale.
```{r heterovar_extract_vars_from_nlmes_cold_dead_hands}
(c(1.0000000, coef(heterovar_res$modelStruct$varStruct, unconstrained = FALSE))*heterovar_res$sigma)^2
```
Yeah. You'll have to look this up every time you want to do it, or just make your own function that takes the model input. Here is how my function would extract the information.
```{r mixedup-extract_het_var}
mixedup::extract_het_var(heterovar_res, scale = 'var')
```
Another alternative to keep in mind is <span class="pack">glmmTMB</span>, which is provided by the primary developer of <span class="pack" style = "">lme4</span>. It would allow one to stay more in the <span class="pack">lme4</span> syntax style for similar models, and provide similar looking output. In this case we have a special syntax for the case of heterogeneous variances, specifically using tthe <span class="func" style = "">diag</span> function within the model formula.
```{r glmmTMB_hetero}
library(glmmTMB)
heterovar_res2 = glmmTMB(
gpa ~ occasion + (1|student) + diag(0 + occas |student),
data = gpa
)
summary(heterovar_res2)
```
Note that the variances displayed for each time point are not conflated with the residual variance. To compare with <span class="pack">nlme</span>, just add the residual variance to those estimates. As with every mixed model package (apparently), it still takes a bit to get the variances in a usable form from the <span class="func">VarCorr</span> object. The following shows how though, and compares to the <span class="pack">nlme</span> result.
```{r extract-het-var-glmmtmb, eval = FALSE}
vc_glmmtmb = VarCorr(heterovar_res2)
vc_glmmtmb = attr(vc_glmmtmb$cond$student.1, 'stddev')^2 + sigma(heterovar_res2)^2
```
Here is the output from my function:
```{r mixedup-extract_het_var-tmb}
mixedup::extract_het_var(heterovar_res2, scale = 'var')
```
In any case, putting these together shows we get the same result.
```{r glmmTMB_hetero_compare_nlme, echo=F}
vc_glmmtmb = extract_het_var(heterovar_res2, scale = 'var')[-1] # remove group id
vc_nlme = extract_het_var(heterovar_res, scale = 'var')
names(vc_glmmtmb) = names(vc_nlme) =
unite(expand_grid(paste('year', 1:3), paste('sem', 1:2)), 'x', everything(), sep = ' ')$x
vc_compare = rbind(glmmTMB = vc_glmmtmb, nlme = vc_nlme)
vc_compare %>%
kable_df()
```
### Autocorrelation
The following example demonstrates the autocorrelation structure we described previously. In <span class="pack">nlme</span> we use the built-in <span class="func">corAR1</span> function and `correlation` argument similar to how we did with the `weights` argument for heterogeneous variances.
```{r corr_residual, echo=1:5, eval=-5}
library(nlme)
corr_res = lme(
gpa ~ occasion,
data = gpa,
random = ~ 1 | student,
correlation = corAR1(form = ~ occasion)
)
summary(corr_res)
extract_fixed_effects(corr_res) %>%
kable_df()
vc = extract_vc(corr_res, ci_level = 0)
vc %>%
kable_df()
```
<br>
Notice first that the fixed effect for occasion is the same as [before][Mixed model]. The variance estimates have changed slightly along with the variances of the fixed effects (i.e. the standard errors). The main thing is that we have a new parameter called `Phi` in the <span class="pack">nlme</span> output that represents our autocorrelation, with value of `r round(coef(corr_res$model$corStruct, unconstrained = F), 3)`. This suggests at least some correlation exists among the residuals for observations next to each other in time, though it diminishes quickly as observations grow further apart.
Note that glmmTMB will analyze such structure as well. However, we need the factor form for occasion for this specification `occas`, and also note how it is part of model formula, like another random effect. More on this in the [supplemental section][Correlation Structure Revisited].
```{r corr_residual_glmmTMB, eval=FALSE, echo=1:4}
corr_res_tmb = glmmTMB(
gpa ~ occasion + ar1(0 + occas | student) + (1 | student),
data = gpa
)
corr_res_brm = brms::brm(
gpa ~ occasion + ar(occasion, student) + (1 | student),
data = gpa,
cores = 4
)
summarise_model(corr_res)
summary(corr_res_tmb)
summarise_model(corr_res_brm)
extract_cor_structure(corr_res_tmb, which_cor = 'ar1')
extract_cor_structure(corr_res_brm)
extract_cor_structure(corr_res)
```
We can use my package function to extract the autocorrelation parameter from these types of models.
```{r extract-corr}
mixedup::extract_cor_structure(corr_res)
```
```{r glmmTMB_autocor, eval=FALSE, echo=FALSE}
# currently as of 2018-2 there is a bug in VarCorr at the end where it calls
# structure() when adding random intercept. Thus the model runs but one can't do
# anything with it because practically everything calls VarCorr; Inspection
# suggests the estimate (0.80877943) duplicates
# https://bbolker.github.io/mixedmodels-misc/notes/corr_braindump.html and brms
# gets .83
# update 2020-09 brms and nlme are notably closer with gpa data using brms new approach to ar structure, while glmmTMB estimates no student variance and correlation value almost double the others with the gpa data
simCor1 <- function(
phi = 0.8,
sdgrp = 2,
sdres = 1,
npergrp = 20,
ngrp = 20,
seed = NULL,
## set linkinv/simfun for GLMM sims
linkinv = identity,
simfun = identity
) {
if (!is.null(seed))
set.seed(seed)
cmat <- sdres * phi ^ abs(outer(0:(npergrp - 1), 0:(npergrp - 1), "-"))
errs <- MASS::mvrnorm(ngrp, mu = rep(0, npergrp), Sigma = cmat)
ranef <- rnorm(ngrp, mean = 0, sd = sdgrp)
d <- data.frame(f = rep(1:ngrp, each = npergrp))
eta <-
ranef[as.numeric(d$f)] + c(t(errs)) ## unpack errors by row
mu <- linkinv(eta)
d$y <- simfun(mu)
d$tt <- factor(rep(1:npergrp, ngrp))
return(d)
}
d <- simCor1(
phi = 0.8,
sdgrp = 2,
sdres = 1,
seed = 101
)
corr_res2 = glmmTMB(gpa ~ occasion + (1 | student) + ar1(0 + occasion | student),
data = gpa)
summary(corr_res2)
VarCorr(corr_res2, sigma = .16707)
corr_res2$obj$env$report(corr_res2$fit$parfull)$corr[[2]]
test = brm(
gpa ~ occasion + (1 | student),
data = gpa,
autocor = cor_ar(form = ~ occas),
cores = 4
)
(lme_simple_fit <- lme(
y ~ 1,
random = ~ 1 | f,
data = d,
correlation = corAR1()
))
glmmTMB_simple_fit <-
glmmTMB(y ~ 1 + (1 | f) + ar1(tt - 1 | f), data = d, family = gaussian)
glmmTMB_simple_fit
glmmTMB_simple_fit$obj$env$report(glmmTMB_simple_fit$fit$parfull)$corr[[2]][2, 1]
corr_res2$obj$env$report(corr_res2$fit$parfull)$corr[[2]][2, 1]
```
## Generalized Linear Mixed Models
Just as generalized linear models extend the standard linear model, we can generalize (linear) mixed models to *generalized linear mixed models*. Furthermore, there is nothing restricting us to only the exponential family, as other packages would potentially allow for many other target distributions.
For this example, we'll do a logistic mixed model with a binary outcome. In this case, we'll use the speed dating data set. In this demonstration, the experimental setting had randomly assigned each participant to ten short (four minutes) dates with other participants. For each date, each person rated six attributes (attractive, sincere, intelligent, fun, ambitious, shared interests) of the other person on a 10-point scale and wrote down whether he or she would like to see the other person again.
Our target variable is whether the participant would be willing to date the person again (`decision`, recorded as Yes or No). To keep things simple, the predictors will be limited to the sex of the participant (`sex`), whether the partner was of the same race (`samerace`), and three of the attribute ratings the participant gave of their partner- attractiveness (`attractive`), sincerity (`sincere`), and intelligence (`intelligent`). The latter have been scaled to have zero mean and standard deviation of .5, which puts them on a more even footing with the binary covariates (`_sc`)[^scalecontbin].
```{r speed_dating, echo=FALSE, eval=FALSE}
speed_dating0 = readr::read_csv('data/raw_data/ARM_Data/Speed Dating Data.csv')
speed_dating = speed_dating0 %>%
select(1:17, attr, sinc, intel, fun, amb, shar, dec) %>%
rename(
id_win_wave = id,
sex = gender,
partner_id = pid,
n_met_in_wave = round,
partner_age = age_o,
partner_race = race_o,
attractive = attr,
sincere = sinc,
intelligent = intel,
fun = fun,
ambitious = amb,
shared_interests = shar,
decision = dec
) %>%
mutate(
decision = factor(decision, labels = c('No', 'Yes')),
sex = factor(sex, labels = c('Female', 'Male')),
samerace = factor(samerace, labels = c('No', 'Yes')),
attractive_sc = scale(attractive, scale = .5)[, 1],
sincere_sc = scale(sincere, scale = .5)[, 1],
intelligent_sc = scale(intelligent, scale = .5)[, 1],
fun_sc = scale(fun, scale = .5)[, 1],
ambitious_sc = scale(ambitious, scale = .5)[, 1],
shared_interests_sc = scale(shared_interests, scale = .5)[, 1]
) %>%
group_by(iid) %>%
mutate(never_always = if_else(all(decision == 'Yes') |
all(decision == 'No'), 1, 0)) %>%
ungroup() %>%
filter(never_always == 0) %>% # as in Fahrmeier
select(-never_always)
# describeAll(speed_dating)
save(speed_dating, file = 'data/speed_dating.RData')
```
```{r glmm_init, eval=FALSE, echo=FALSE}
# pretty much dupes fahrmeier although their table has a typo, and their would be 500, not 390 individuals after getting rid of constant
# sd_model = glmer(decision ~ sex*attractive_sc + sex*shared_interests_sc
# + (1|iid), data=speed_dating, family=binomial)
load('data/speed_dating.RData')
sd_model = glmer(
decision ~
sex
+ samerace
+ attractive_sc
+ sincere_sc
+ intelligent_sc
+ (1 | iid),
data = speed_dating,
family = binomial
)
summary(sd_model, correlation = F)
glmm_var_cor = tidy(VarCorr(sd_model)) %>% # because for some reason knitr can't find an object it just used in the previous chunk.
select(-var2) %>%
rename(variance = vcov, sd = sdcor) %>%
mutate_if(is.numeric, arm::fround, digits = 2)
save(sd_model, glmm_var_cor, file = 'data/speed_dating_model.RData')
```
To fit the model, we'll use the <span class="func" style = "">glmer</span> function and specify the family as `binomial`, just as we would for the usual logistic setting using <span class="func" style = "">glm</span> in base R.
```{r glmm_speed_dating, eval=FALSE}
load('data/speed_dating.RData')
sd_model = glmer(
decision ~
sex
+ samerace
+ attractive_sc
+ sincere_sc
+ intelligent_sc
+ (1 | iid),
data = speed_dating,
family = binomial
)
summary(sd_model, correlation = FALSE)
```
```{r glmm_fixed, echo=FALSE}
load('data/speed_dating_model.RData')
extract_fixed_effects(sd_model) %>%
kable_df()
```
<br>
The fixed effects results are as expected for the attributes, with attractiveness being a very strong effect in particular, while sex of the participant was statistically negligible. You are free to exponentiate the coefficients to get the odds ratios if desired, just as you would with standard logistic regression. Now for the random effects result.
<br>
```{r glmm_random, echo=FALSE, eval=TRUE}
extract_vc(sd_model, ci_level = 0) %>%
select(-var_prop) %>%
kable_df()
```
<br>
For the variance components, notice that there is no residual variance. This is because we are not modeling with the normal distribution for the target variable, thus there is no $\sigma$ to estimate. However, the result suggests that there is quite a bit of variability from person to person on the logit scale.
## Exercises for Extensions
### Sociometric data
In the following data, kids are put into different groups and rate each other in terms of how much they would like to share some activity with the others. We have identifying variables for the person doing the rating (sender), the person being rated (receiver), what group they are in, as well as age and sex for both sender and receiver, as well as group size.
```{r socio_setup, echo=FALSE, eval=FALSE}
soc = read_spss('data/raw_data/joop_hox_data2/9 CrossClass/SocsLong.sav')
glimpse(soc)
sociometric = soc %>%
mutate(sexsend = factor(sexsend, labels = c('Male', 'Female')), # from text 0 male, 1 female
sexrec = factor(sexrec, labels = c('Male', 'Female')))
save(sociometric, file='data/sociometric.RData')
```
To run a mixed model, we will have three sources of structure to consider:
- senders (within group)
- receivers (within group)
- group
First, load the sociometric data.
```{r load_socio}
load('data/sociometric.RData')
```
To run the model, we will proceed with the following modeling steps. For each, make sure you are creating a separate model object for each model run.
- Model 1: No covariates, only sender and receiver random effects. Note that even though we don't add group yet, still use the nesting approach to specify the effects (e.g. `1|group:receiver`)
- Model 2: No covariates, add group random effect
- Model 3: Add all covariates: `agesend/rec`, `sexsend/rec`, and `grsize` (group size)
- Model 4: In order to examine sex matching effects, add an interaction of the sex variables to the model `sexsend:sexrec`.
- Compare models with AIC (see the note about [model comparison][model comparison]), e.g. `AIC(model_1, model_2, ...)`. A lower value would indicate the model is preferred.
```{r socio, echo=F, eval=FALSE}
model1 = lmer(rating ~ (1|group:sender) + (1|group:receiver),
data = sociometric)
summary(model1, correlation = FALSE)
model2 = lmer(rating ~ (1|group:sender) + (1|group:receiver) + (1|group),
data = sociometric)
summary(model2, correlation = F)
model3 = lmer(rating ~ sexsend + sexrec + agesend + agerec + grsize + (1|group:sender) + (1|group:receiver) + (1|group),
data = sociometric)
summary(model3, correlation = FALSE)
model4 = lmer(
rating ~ sexsend*sexrec + agesend + agerec + grsize +
(1|group:sender) + (1|group:receiver) + (1|group),
data = sociometric)
summary(model4, correlation = FALSE)
c(AIC(model1), AIC(model2), AIC(model3), AIC(model4))
```
### Patents
Do a Poisson mixed effect model using the [patent data][Data]. Model the number of citations (`ncit`) based on whether there was opposition (`opposition`) and if it was for the biotechnology/pharmaceutical industry (`biopharm`). Use year as a random effect to account for unspecified economic conditions.
```{r patent_setup, echo=FALSE, eval=FALSE}
patents0 = readr::read_tsv('data/raw_data/patent.raw')
patents = patents0 %>%
rename(opposition = opp)
save(patents, file='data/patents.RData')
glmer(ncit ~ opposition + biopharm + (1|year), data=patents, family='poisson')
```
```{r patent_starter, eval=FALSE}
load('data/patents.RData')
```
Interestingly, one can model overdispersion in a Poisson model by specifying an random intercept for each observation (`subject` in the data). In other words, no specific clustering or grouped structure is necessary, but we can use the random effect approach to get at the extra variance beyond what would be assumed with a Poisson distribution.
[^residstruct]: One reason to do so would be that you expect variability to decrease over time, e.g. due to experience. You might also allow that variance to be different due to some other grouping factor entirely (e.g. due to treatment group membership).
[^crossed_notation]: I don't show the formal model here as we did before, but this is why depicting mixed models solely as 'multilevel' becomes a bit problematic in my opinion. In the standard mixed model notation it's straightforward though, you just add an additional random effect term, just as we do in the actual model syntax.
[^signflip]: Setting aside our discussion to take a turn regarding regression modeling more generally, this is a good example of 'surprising' effects not being so surprising when you consider them more closely. Take a look at the effect of experience. More experience means less stress, this is probably not surprising. Now look at the age effect. It's positive! But wouldn't older nurses have more experience? What's going on here? When interpreting experience, it is with age *held constant*, thus more experience helps with lowering stress no matter what your age. With age, we're holding experience constant. If experience doesn't matter, being older is affiliated with more stress, which might be expected given the type of very busy and high pressure work often being done (the mean age is `r median(nurses$age)`). A good way to better understand this specifically is to look at predicted values when age is young, middle, and older vs. experience levels at low, middle, and high experience, possibly explicitly including the interaction of the two in the model. Also note that if you take experience out of the model, the age effect is negative, which is expected, as it captures experience also.
[^crossnest]: Just a reminder, it *does* matter if you label your data in a less than optimal fashion. For example, if in the nesting situation you start your id variable at 1 for each nested group, then you have to use the nested notation in <span class="pack">lme4</span>, otherwise, e.g. it won't know that id = 1 in group 1 is different from id 1 in group 2. In our hospital example, this would be akin to using `ward` instead of `wardid` as we did. Again though, this wouldn't be an issue if one practices good data habits. Note also the `:` syntax. In other modeling contexts in R this denotes an interaction, and that is no different here. In some contexts, typically due to experimental designs, one would want to explore random effects of the sort 1|A, 1|B and 1|A:B. However, this is relatively rare.
[^lmerho]: This feature request has been made by its users for over a decade at this point- it's not gonna happen. The issue is that the way <span class="pack">lmer</span> works by default employs a method that won't allow it (this is why it is faster and better performing than other packages). Unfortunately the common response to this issue is 'use <span class="pack">nlme</span>'. However many other packages work with <span class="pack">lme4</span> model results, but not <span class="pack">nlme</span>, and if you aren't going to use <span class="pack">lme4</span> for mixed models you might as well go Bayesian with <span class="pack">rstanarm</span> or <span class="pack">brms</span> instead of <span class="pack">nlme</span>. I would even prefer <span class="pack">mgcv</span> to <span class="pack">nlme</span> because of the other capabilities it provides, and the model objects created are easier to work with in my opinion.
[^scalecontbin]: Note that for a balanced binary variable, the mean `p=.5` and standard deviation is `sqrt(p*(1-p)) = .5`