# District-Party Ideology and Primary Outcomes {#ch:voting}
$\renewcommand{\ind}[0]{\perp \!\!\! \perp}$
$\renewcommand{\doop}[1]{\mathit{do}\left(#1\right)}$
$\renewcommand{\diff}[1]{\, \mathrm{d}#1}$
$\renewcommand{\E}[1]{\mathbb{E}\left[#1\right]}$
$\renewcommand{\p}[1]{p\left(#1\right)}$
```{r stopper, eval = FALSE, cache = FALSE, include = FALSE}
knitr::knit_exit()
```
```{r knitr-05-1-voting, include = FALSE, cache = FALSE}
source(here::here("assets-bookdown", "knitr-helpers.R"))
```
```{r}
library("here")
library("magrittr")
library("tidyverse")
library("broom")
library("ggdag")
library("ggtext")
library("scales")
library("latex2exp")
library("patchwork")
library("ggforce")
```
How does district-party ideology matter for primary outcomes?
The strategic positioning dilemma theory predicts that candidates position themselves as a compromise between the district electorate and the partisan primary electorate.
They care about the primary electorate to fend off competition in primaries, either by deterring primary competitors from running at all or by being the "best fitting" candidate to represent the district-party constituency.
I find in Chapter \@ref(ch:positioning) that district-party ideology does affect candidate positioning, even after controlling for aggregate partisan voting in the district.
This chapter asks whether candidates' efforts to position themselves toward their partisan constituency ultimately help them win primary elections.
Do more conservative districts nominate more conservative nominees?
I argue that this research question presents problems for statistical modeling and causal inference that come from the same basic limitation of the data: district-party ideology does not vary across candidates in the same district.
While it is possible to measure the correlation between district-party ideology and the CF score of the primary _winner_, this method selects on the dependent variable.
We already know from Chapter \@ref(ch:positioning) that candidates are generally more conservative when the electorate is more conservative, so the simple correlation between district-party ideology and nominee ideology does not capture whether more conservative electorates prefer more conservative candidates, _conditional on the set of available candidates_ in the primary.
Conditioning on a primary race, however, also conditions on district-party ideology, removing all statistical (and therefore causal) variation in district-party ideology across candidates in the same district.
To understand the role of district-party ideology in primary outcomes, we must reframe the research question around causal quantities that are actually identifiable.
I confront these statistical and causal challenges using an augmented conditional logit modeling approach.
Traditionally, conditional logit is a model that predicts discrete choice based on covariates that vary across _alternatives_ within a choice set (candidates), holding data on the chooser (the electorate) constant within the choice set.
This modeling limitation means that chooser-level features, such as district-party ideology, cannot directly affect candidate choice, but they can indirectly affect candidate choice.
I discuss these indirect effects below, and I devise a statistical model that flexibly estimates a related causal quantity: the causal effect of candidate ideology on primary outcomes, with heterogeneous effects that vary across primary electorates with different district-party ideologies.
Using this modeling approach, I find a noisy effect of candidate ideology on primary outcomes.
Primary candidates are less likely to win primary elections when their CF scores are especially centrist or especially extreme, but the estimates are not especially precise.
Furthermore, I find no evidence that the effect varies with district-party ideology across primary electorates.
Although candidates appear to position themselves strategically to fit the particular partisan constituency in their district (Chapter \@ref(ch:positioning)), I find no evidence that partisan constituencies reward these positioning maneuvers any differently as a function of public ideology.
## Spatial Voting and Candidate Choice
```{r utility-data}
chooser_data <-
tibble(
x = -2,
utility = 0,
math_label = "bar(theta)[g]",
label = "District-Party\nIdeology"
)
util_data <- tibble(
cand = seq(-10, 10, .1),
u_distance = cand - chooser_data$x,
utility = -u_distance^2
)
```
```{r utility-model}
ggplot(util_data) +
aes(x = cand, y = utility) +
geom_line(color = primary) +
geom_hline(yintercept = 0) +
geom_point(data = chooser_data, aes(x = x)) +
geom_text(
data = chooser_data, aes(x = x, label = math_label), parse = TRUE, vjust = 2
) +
geom_text(
data = chooser_data, aes(x = x, label = label), vjust = -0.5
) +
annotate(
geom = "text", label = "← Candidate position →",
x = 10, y = 0,
hjust = 1, vjust = 2
) +
annotate(
geom = "richtext",
label = glue::glue("<b style='color:{primary}'>Candidate utility</b> decreases<br>when candidate is farther<br>from group ideal point"),
x = 4, y = -60,
hjust = 1,
label.color = NA, fill = NA,
family = font_fam
) +
coord_cartesian(ylim = c(-100, 20)) +
theme_mgd_dag() +
labs(
x = NULL, y = NULL,
title = "Spatial Model of Candidate Choice"
)
```
How do the ideal points of candidates and electorates affect primary elections?
Spatial voting models argue that primary candidates are more likely to win the nomination when they position their candidacies closer in ideological space to the median primary voter [@downs:1957:economic-theory; @aldrich:1983:downsian-parties].
This is an essential mechanism underlying the strategic positioning dilemma theory, which states that a candidate must strike a balance between the median partisan voter and the district median voter to win both the primary and the general election [@burden:2001:polarizing-primaries; @brady-han-pope:2007:out-of-step].
This intuition appears to hold in general elections for U.S. House: candidates who are too progressive or too conservative perform worse than candidates who are "just right" [@canes-wrone-et-al:2002:out-of-step; @simas:2013:house-proximity].
Figure \@ref(fig:utility-model) plots the key claim of spatial voting models: a candidate is most appealing to a constituency when the candidate's ideological location (represented on a left–right ideological continuum) matches the constituency's preferred ideological outcome.
The candidate is less appealing, or provides less _utility_ (or "value") to the constituency, when the ideological distance between the candidate and the constituency grows larger.
This utility loss occurs whether the candidate is too progressive or too conservative.
```{r utility-model, include = TRUE, fig.scap = "Spatial proximity and candidate utility.", fig.cap = "A spatial voting model's description of candidate utility (value) as a function of candidate position and district-party ideology. Candidate value is maximized at the group ideal point $\\bar{\\theta}_{g}$ and decreases in either direction. The example in this plot assumes quadratic utility loss."}
```
One important shortcoming of existing primary elections research is the inability of empirical models to capture this "optimal positioning" in primaries.
Studies often measure the relationship between candidate "extremity" and their performance in primary elections—finding that more extreme candidates are more likely to win primary elections [@king-et-al:2016:twitter-primary] or that this effect is limited to extreme Republicans [@nielson-visalvanick:2017:primary-elections]—but extremity is allowed only a constant or monotonic effect on the candidate's primary performance [@hall-snyder:2015:ideology; @king-et-al:2016:twitter-primary; @nielson-visalvanick:2017:primary-elections].
Without the possibility of non-monotonicity in the extremity–victory relationship, these empirical models do not reflect their underlying theoretical models.
Furthermore, without a measure of the partisan constituency's ideal point (district-party ideology), these studies have no way to know whether the optimal candidate ideology is different in more conservative or more progressive electorates.
Because my project measures district-party ideology, I can estimate the optimal candidate ideology in different districts with different partisan constituency ideologies.
Another important limitation of many existing studies is that they examine only incumbent members of Congress or primary nominees, which selects on the dependent variable, so the factors affecting primary choice cannot be inferred from them [e.g. @brady-han-pope:2007:out-of-step; @hirano-et-al:2010:primary-polarization; @mcghee-et-al:2014:nomination-systems; @kujala:2020:primary-donors].
Without somehow accounting for the menu of candidates that a primary electorate can choose from, we cannot infer whether candidates with certain ideological positions are actually preferred over candidates with other ideological positions.
The analysis in this chapter confronts this problem by modeling primary candidate choice using a conditional logit approach, similar to other recent studies of primary choice or multiparty elections [@alvarez-nagler:1998:clogit; @porter-treul:2020:primary-experience; @simas:2017:primary-electability; @ansolabehere-et-al:2004:direct-primary-party-loyalty].
As I discuss in Chapter \@ref(ch:arg), there are several reasons to doubt the explanatory power of a spatial model for House primary voting.
Few voters are likely to be aware of candidate positioning in contexts where the party label does not provide differentiating information between candidates [@norrander:1989:primary-voters].
Voters do respond to policy differences between candidates if they are made aware of those differences [@lelkes:2019:policy-over-party], but learning about the issue positions of House primary candidates is costly.
Voters may also strategically prefer more electable candidates even if those candidates are not closest to their ideal points, but even sophisticated primary voters may not be familiar with House candidates enough to know which candidates present the starkest ideology–electability trade-offs [@simas:2017:primary-electability].
Non-ideological traits like incumbency, an "outsider" reputation [@porter-treul:2020:primary-experience], early fundraising [@bonica:2020:lawyers-in-congress], gender [especially in Democratic races, @thomsen:2020:ideology-gender], or other "valence" features [@nyhuis:2018:separate-valences] may be easier for primary voters to detect and act upon than candidates' ideological stances.
It is also possible that candidate ideology's effect on primary elections is mainly a selection function, deterring moderate candidates from entering a primary at all, leaving only the ideologically faithful candidates who present less ideological contrast to voters [@thomsen:2014:moderate-candidates].
### Causal and statistical identifiability
This project is interested in understanding district-party ideology and how it shapes primary elections.
An essential constraint of this chapter's analysis is that the "effect of district-party ideology on primary election outcomes" is not a convenient causal quantity to work with.
This is because district-party ideology is constant across all candidates who compete in the same primary contest, so it has no direct effect on the probability that any one candidate wins.
It can only have indirect effects that interact with other characteristics of the candidates.
This section discusses these indirect effects, how they interact with the modeling constraints for primary election data, and how we define causal estimands under these constraints.
Consider a primary race $r$ containing $n_{r} > 1$ primary candidates, each candidate indexed $i$.
Let $y_{r} = i$ signify that candidate $i$ wins race $r$, with the probability that $i$ wins $r$ given by $\psi_{ir}$.
Choice settings such as this, where one chooser must select among several alternatives in the choice set, are traditionally modeled using a conditional logit likelihood [@mcfadden:1973:conditional-logit].
Conditional logit has been employed to study candidate choice in U.S. primaries by @ansolabehere-et-al:2004:direct-primary-party-loyalty, @culbert:2015:strategic-voting-presidential-primaries, @simas:2017:primary-electability, and @porter-treul:2020:primary-experience.
Conditional logit supposes that the chooser—in this case, the electorate for race $r$—selects a candidate $i$ by comparing the utility they receive from each candidate in the race.
Suppose that this utility $\omega_{ir}$ contains a systematic component $u_{ir}$ and a stochastic component $e_{ir}$.
\begin{align}
\omega_{ir} &= u_{ir} + e_{ir}
(\#eq:utility-error)
\end{align}
The probability that $i$ is chosen is defined as the probability that $\omega_{ir}$ is greatest among the alternatives in $r$.
Because the error term $e_{ir}$ is unknown and idiosyncratic to the chooser-choice pairing, conditional logit makes a distributional assumption for the error term and calculates the probability that $\omega_{ir}$ is greatest given knowledge of the systematic utility component only.
This probability is calculated as the softmax function of the systematic components in the choice set,
\begin{align}
\begin{split}
p\left(y_{r} = i\right) &= \psi_{ir} \\
&= \frac{\text{exp}\left(u_{ir}\right) }{\sum\limits_{j = 1}^{n_{r}}\text{exp}\left(u_{jr}\right)}
\end{split}
(\#eq:softmax-probability)
\end{align}
which follows from the assumption that $e_{ir}$ is distributed Gumbel, as in logistic regression.
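As a minimal sketch of Equation \@ref(eq:softmax-probability), the following R code computes choice probabilities for one hypothetical race and checks them against simulated Gumbel errors. The function name `softmax()`, the utility values, and the number of simulations are illustrative assumptions, not objects from the analysis in this chapter.

```r
# Softmax choice probabilities for the candidates in one race.
# Subtracting the maximum utility guards against overflow and
# leaves the probabilities unchanged.
softmax <- function(u) {
  z <- exp(u - max(u))
  z / sum(z)
}

# Hypothetical systematic utilities for a three-candidate race
u_race <- c(a = 1.0, b = 0.5, c = -0.2)
psi <- softmax(u_race)

# Monte Carlo check: add standard Gumbel errors to the systematic
# utilities and record how often each candidate has the largest total.
set.seed(1)
n_sims <- 20000
gumbel <- matrix(-log(-log(runif(n_sims * 3))), ncol = 3)
omega <- sweep(gumbel, 2, u_race, "+")
sim_share <- tabulate(apply(omega, 1, which.max), nbins = 3) / n_sims

# Analytic and simulated win probabilities, side by side
round(rbind(softmax = psi, simulated = sim_share), 3)
```

The simulated win shares should approximate the softmax probabilities, which is the sense in which the softmax form follows from the Gumbel assumption.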
The distinguishing feature of conditional logit is that _chooser_ attributes, which enter as additive utility shocks specific to the chooser, do not have identifiable effects on the choice probability because they are constant within a choice set.
As a result, researchers using conditional logit tend to model the choice problem as a function of the alternatives only.
In the case of primary elections, choosers are primary electorates, and district-party ideology is fixed for a given electorate.^[
District-party groups are not perfectly synonymous with primary electorates, since some constituents who belong to the district-party do not vote in the primary, and some primary voters may not identify with the party.
While this conceptual gap could be explored in future research projects, this project tolerates the inconsistency because the most recent evidence on the representativeness of primary electorates finds that they resemble the demographic profile and policy attitudes of the district-party public [@sides-et-al:2018:primary-representativeness].
This analysis contains more years of data and relies on fewer modeling assumptions than analyses that conclude that primary electorates are more polarized than district-parties [@jacobson:2012:polarization-origins; @hill:2015:nominating-institution].
]
This means that district-party ideology, $\bar{\theta}_{g[r]}$ for group $g$ in which $r$ takes place, cannot _directly_ affect the probability that a candidate is chosen.
This is consistent with the spatial model intuition from Figure \@ref(fig:utility-model): shifting the district-party ideal point $\bar{\theta}_{g[r]}$ left or right affects utility only because it changes the distance between $\bar{\theta}_{g[r]}$ and the candidate location, so the interaction between district-party ideology and candidate location is key.
More generally, chooser-level features can be included in conditional logit models as long as there is some cross-level interaction with the choice-level data for statistical identification [@fox-et-al:2012:random-coef-logit].^[
I refer to an "interaction" generally as a function that depends on both district-party ideology and candidate location.
It does not necessarily imply a multiplicative "interaction term" that is more common in linear modeling, although multiplicative interaction terms are an example of such a function.
]
Building a statistical model that enables this interactivity is an important contribution of this research design.
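The identification argument can be illustrated with a short sketch. It assumes, only for illustration, that candidate positions and district-party ideology share a scale and that utility is a quadratic loss; the values of `cf` and `theta` and the coefficients are hypothetical.

```r
softmax <- function(u) exp(u - max(u)) / sum(exp(u - max(u)))

cf    <- c(-0.5, 0.4, 1.2)  # hypothetical candidate positions in one race
theta <- 0.3                # hypothetical district-party ideology (same for all candidates)

# (1) Additive chooser-level term: the same constant is added to every
#     candidate's utility, so it cancels in the softmax.
u_base     <- -0.8 * cf
u_additive <- u_base + 2.0 * theta
p_base     <- softmax(u_base)
p_additive <- softmax(u_additive)
all.equal(p_base, p_additive)

# (2) Cross-level interaction: utility depends on the distance between
#     candidate and electorate, so moving theta changes the probabilities.
p_near <- softmax(-(cf - theta)^2)
p_far  <- softmax(-(cf - (theta + 1))^2)
round(rbind(p_near, p_far), 3)
```

Case (1) leaves the choice probabilities untouched, while case (2) changes them because the chooser-level value enters through a function of both levels.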
This conditional logit model's identifiability constraint matters for causal inference as well, because it affects which causal quantities are feasible to estimate.
Consider the potential outcome $\omega_{ir}\left(\text{CF}_{ir}, \bar{\theta}_{g}\right)$, the candidate utility resulting from a given candidate ideology and district-party ideology.^[
For notational convenience, let $g$ imply $g[r]$.
]
Imagine that we intervene on district-party ideology and measure the average utility effect^[
For the current discussion, we consider the effect on utility instead of the effect on win probability.
This is because win probability is complicated by the presence of other candidates, whereas utility is a straightforward function of chooser and choice features.
It is important to understand the relationship between the causal model structure and the outcome scale because treatments can have different effects on different scales [@vanderweele:2009:interaction-modification].
I discuss causal effects on win probability in Section \@ref(sec:causal-probs).
]
of setting $\bar{\theta}_{g} = \theta$ versus some other value $\theta'$, $\E{\omega_{ir}\left(\text{CF}_{ir}, \theta\right) - \omega_{ir}\left(\text{CF}_{ir}, \theta'\right)}$.
This effect does exist for individual candidates: changing district-party ideology affects the primary electorate's candidate utility by increasing or decreasing the ideological distance between the district-party and the candidate.
As a result, the average effect of district-party ideology is an average over all of its interactive effects with candidate ideology.
But because the conditional logit model does not provide an easy interface for modeling chooser-level effects directly, it is impractical to condition on other district-level characteristics to render district-party ideology ignorable.
It is much simpler, instead, to consider the average effect of candidate positioning on candidate utility.
Conditioning on candidate features is more straightforward with conditional logit, so causal identification of alternative-level effects is more analytically straightforward as well.
The conditional average effect of CF score on candidate utility would thus be
$\E{\omega_{ir}\left(\text{CF}, \theta\right) - \omega_{ir}\left(\text{CF}', \theta\right) \mid C_{ir} = c, r}$,
for a comparison of two values $\text{CF}$ and $\text{CF}'$, fixing the district-party ideology at $\theta$ and conditioning on other candidate-varying attributes $C_{ir} = c$ and the race $r$.^[
Conditioning on the race, which defines the choice set, is inherent to conditional logit.
Conditioning on the choice set is what undermines the identifiability of chooser-level effects without cross-level interactions.
]
This effect is also an average over the interactive effects with district-party ideology, but conditioning on confounders is much easier.
Because this project is focused on the added value of my district-party ideology measure, I go one step further to model effect heterogeneity over district-party ideology instead of holding it constant.
Because identifying ignorable variation in $\bar{\theta}_{g}$ is a challenge in conditional logit, I approach this heterogeneity from an effect modification perspective.
This means that any heterogeneity in causal effects over district-party ideology is not causally attributed to district-party ideology.
Instead, it reflects only the causal effects of CF scores conditional on a given district-party ideology value [see @kam-trussler:2017:HTEs].
To clarify this point, I rewrite the potential outcome as $\omega_{ir}(\text{CF}_{ir})$, removing the causal effect of $\bar{\theta}_{g}$ from the notation.
Formally, we say that district-party ideology is an "indirect modifier" if the CF score effect ($\text{CF}$ versus $\text{CF}'$) varies across levels of district-party ideology ($\theta$ versus $\theta'$), conditional on stratum $c$ and race $r$ [@vanderweele-robins:2007:effect-modification].
In other words, the conditional average effect of candidate ideology is heterogeneous over district-party ideology if the following quantity is not zero:
\begin{align}
\E{\omega_{ir}\left(\text{CF}\right) - \omega_{ir}\left(\text{CF}'\right) \mid \bar{\theta}_{g} = \theta, c, r}
&-
\E{\omega_{ir}\left(\text{CF}\right) - \omega_{ir}\left(\text{CF}'\right) \mid \bar{\theta}_{g} = \theta', c, r}.
(\#eq:hte)
\end{align}
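As a worked illustration of Equation \@ref(eq:hte), suppose (purely as an assumption for this example) that utility follows the deterministic quadratic loss from Figure \@ref(fig:utility-model), $\omega_{ir}(\text{CF}) = -(\text{CF} - \bar{\theta}_{g})^{2}$, with CF scores and district-party ideology measured on a common scale. Writing $\tau(\theta)$ for the CF score effect in an electorate with $\bar{\theta}_{g} = \theta$, the effect and its difference across two electorates would be

\begin{align}
\begin{split}
\tau(\theta) &= -\left(\text{CF} - \theta\right)^{2} + \left(\text{CF}' - \theta\right)^{2} \\
  &= \left(\text{CF}'\right)^{2} - \text{CF}^{2} + 2\theta\left(\text{CF} - \text{CF}'\right) \\
\tau(\theta) - \tau(\theta') &= 2\left(\theta - \theta'\right)\left(\text{CF} - \text{CF}'\right)
\end{split}
\end{align}

which is nonzero whenever $\text{CF} \neq \text{CF}'$ and $\theta \neq \theta'$. Under this stylized utility function, the same shift in candidate position has a different utility effect in electorates with different ideal points, which is the kind of heterogeneity that Equation \@ref(eq:hte) describes.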
<!------- TO DO ---------
- or maybe we should just say the expectation is over i \in r?
instead of conditioning on R?
------------------------->
```{r choice-dag}
clogit_dag <-
dagify(
Y ~ CF + C + U,
CF ~ G + C,
G ~ U,
exposure = "G",
outcome = "Y",
coords = tribble(
~ name , ~ x , ~ y ,
"C" , 1 , 2 ,
"CF" , 1 , 1 ,
"G" , 0 , 1,
"U" , 0 , 0 ,
"Y" , 2 , 1
),
labels = c(
"G" = "bar(theta)[g]",
"CF" = "CF[ir]",
"C" = "C[ir]",
"Y" = "omega[ir]",
"U" = "U"
)
) %>%
tidy_dagitty() %>%
as_tibble() %>%
print()
```
```{r plot-choice-dag}
ggplot(clogit_dag) +
aes(x = x, y = y, xend = xend, yend = yend) +
geom_dag_edges(data_directed = filter(clogit_dag, name != "U")) +
geom_dag_edges(
data_directed = clogit_dag %>%
filter((name == "U" & to == "G")),
edge_color = "gray",
edge_linetype = 2
) +
geom_dag_edges_arc(
data = clogit_dag %>% filter(to == "Y" & name == "U"),
curvature = -0.3,
edge_linetype = 2,
edge_color = "gray"
) +
geom_dag_point(
data = filter(clogit_dag, name != "U"),
color = "gray80"
) +
geom_dag_node(
data = filter(clogit_dag, name == "U"),
internal_color = "gray",
color = "white"
) +
geom_dag_text(
aes(label = label),
parse = TRUE,
color = "black",
family = font_fam
) +
theme_mgd_dag() +
theme(legend.position = "none") +
labs(
x = NULL, y = NULL,
title = "How CF Score Affects Primary Victory",
subtitle = "Indirect modification by district-party ideology"
) +
NULL
```
Figure \@ref(fig:plot-choice-dag) plots a causal graph of the system under consideration.
The causal effect of candidate position $\text{CF}_{ir}$ on candidate utility $\omega_{ir}$ is unidentified without conditioning on pre-treatment candidate features $C_{ir}$.
District-party ideology is included as an indirect modifier of the CF score effect $\text{CF}_{ir} \rightarrow \omega_{ir}$, represented with the path $\bar{\theta}_{g} \rightarrow \text{CF}_{ir}$ and no direct path between $\bar{\theta}_{g}$ and $\omega_{ir}$ [@vanderweele-robins:2007:effect-modification].
<!------- TO DO ---------
- could add labels to describe the U path, modification, etc.
------------------------->
Because district-party ideology is included as an indirect modifier instead of as a joint treatment, back-door paths that connect district-party ideology and candidate utility through unobserved variables $U$ are allowed to exist without confounding the CF score effect or the effect modification interpretation [@vanderweele:2009:interaction-modification].
They do confound the causal effects of district-party ideology, however, which is why effect heterogeneity cannot be described as the causal effect of district-party ideology.
```{r plot-choice-dag, include = TRUE, out.width = "60%", fig.height = 6, fig.width = 6, fig.scap = "Causal diagram of CF score effect on win probability.", fig.cap = "Causal diagram of CF score effect on win probability. District-party ideology is an indirect modifier because it has no direct effect on primary outcomes except through candidate proximity. Unobservables $U$ are uncontrolled, so the effect of district-party ideology is not identified. The CF score effect is identified conditioning on $C$ and district-party ideology."}
```
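The identification claims in the figure can be checked mechanically. The sketch below re-enters the same edges in the `dagitty` package (on which `ggdag` depends), declares $U$ as latent, and asks for adjustment sets. The object names are hypothetical, and this check is an illustration rather than part of the chapter's analysis.

```r
library("dagitty")

# Same edges as the causal graph above; U is declared latent (unobserved)
choice_dag <- dagitty("dag {
  U [latent]
  C -> CF
  C -> Y
  CF -> Y
  G -> CF
  U -> G
  U -> Y
}")

# Minimal adjustment sets for the effect of CF score on the outcome
sets_cf <- adjustmentSets(choice_dag, exposure = "CF", outcome = "Y")

# Adjustment sets for the total effect of district-party ideology
sets_g <- adjustmentSets(choice_dag, exposure = "G", outcome = "Y")

print(sets_cf)
print(sets_g)
```

Consistent with the caption, the CF score effect should require conditioning on both $C$ and district-party ideology, while no set of observed variables should identify the effect of district-party ideology because the path through $U$ cannot be blocked.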
## Modeling Causal Heterogeneity with Continuous Interactions
This section describes a statistical model for primary candidate choice that achieves two key objectives.
First, the model is designed to capture the heterogeneous causal effect of candidate positioning, conditional on district-party ideology.
That is, the model contains appropriate interactions to include chooser-level attributes in the conditional choice model.
And second, the model contains the flexibility to capture non-monotonic effects of candidate positioning: utility losses for candidates that position themselves too far from the district-party ideal point in either ideological direction.
The model detailed below achieves these objectives using two tactics.
The first tactic: I model candidate utility using a linear combination of CF scores and district-party ideology.
This linear combination projects CF scores and district-party ideology into a common space that can be interpreted as an "ideological distance" between CF scores and district-party ideology, allowing candidate utility to increase or decrease as a function of the ideological distance metric.
The second tactic: The distance metric's effect on candidate utility is modeled with a spline function.
The spline function serves the dual purpose of capturing nonlinearities in candidate utility—an essential component of the spatial voting model—and preserving the interaction between chooser and choice data through those nonlinearities.
This strategy enables the effect of candidate positioning on candidate choice to be heterogeneous across candidates with different CF scores and heterogeneous across primary electorates with different district-party ideology values.
The conditional logit model begins by defining the probability that candidate $i$ is chosen in race $r$ as a softmax function of $u_{ir}$, the systematic component of a candidate's utility conditional on the choice set.
\begin{align}
\begin{split}
p\left(y_{r} = i\right) &= \psi_{ir} \\
\psi_{ir} &= \frac{\text{exp}\left(u_{ir}\right) }{\sum\limits_{j \in r}\text{exp}\left(u_{jr}\right)} \\
u_{ir} &= f\left(\text{CF}_{ir}, \bar{\theta}_{g[r]}\right) + \mathbf{c}_{ir}^{\intercal}\gamma
\end{split}
(\#eq:clogit-likelihood)
\end{align}
I use $f()$ to represent a flexible function of candidate $i$'s CF score and the district-party public ideology $\bar{\theta}_{g[r]}$ for group $g$ in which race $r$ is held.
I include a vector of candidate-level covariates $\mathbf{c}_{ir}$ with regression coefficients $\gamma$.
Causal inference requires the assumption that conditioning on candidate features renders CF scores ignorable among the candidates in $r$, conditioning also on all features of $r$.
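To make the likelihood in Equation \@ref(eq:clogit-likelihood) concrete, the sketch below evaluates a conditional logit log-likelihood in base R. The function `clogit_loglik()`, the toy data, and the quadratic utility used in the second call are hypothetical and serve only to show how utilities are normalized within each race.

```r
# Conditional logit log-likelihood.
# u:    systematic utility for each candidate
# race: race identifier for each candidate
# won:  1 for the winner of each race, 0 otherwise (exactly one winner per race)
clogit_loglik <- function(u, race, won) {
  stopifnot(all(tapply(won, race, sum) == 1))
  # log of the softmax denominator within each race (log-sum-exp)
  log_denominator <- tapply(
    u, race,
    function(x) max(x) + log(sum(exp(x - max(x))))
  )
  sum(u[won == 1]) - sum(log_denominator)
}

# Hypothetical data: race A has two candidates, race B has three
toy <- data.frame(
  race = c("A", "A", "B", "B", "B"),
  cf   = c(0.2, 1.1, -0.4, 0.3, 0.9),
  won  = c(1, 0, 0, 1, 0)
)

# With equal utilities, each candidate in a race is equally likely to win
ll_null <- clogit_loglik(u = rep(0, nrow(toy)), race = toy$race, won = toy$won)

# A hypothetical utility function that penalizes squared CF scores
ll_quad <- clogit_loglik(u = -toy$cf^2, race = toy$race, won = toy$won)

c(null = ll_null, quadratic = ll_quad)
```

Because the denominator is computed separately for each race, any term that is constant within a race drops out of the likelihood, which is why $f()$ must depend on candidate-level data.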
I then construct $f()$ as a flexible spline function of CF scores and district-party ideal points.
Although CF scores and district-party ideology both represent ideal points, the two measures are not constructed in the same ideal point space, so calculating the absolute or squared distance between ideal points [e.g. @adams-et-al:2004:discounting-directional-voting] is not immediately possible.
To rectify this, I create a function that maps these two measures into a common space.
Let $\Delta_{ir}$ be a linear combination of $\text{CF}_{ir}$ and $\bar{\theta}_{g}$ with coefficients $\alpha$ and $\beta$,
\begin{align}
\begin{split}
\Delta_{ir} &= \alpha \text{CF}_{ir} + \beta\bar{\theta}_{g[r]} \\
\alpha^{2} + \beta^{2} &= 1
\end{split}
(\#eq:linear-combo)
\end{align}
which represents an assumption that CF scores and district-party ideology space are affine transformations of one another, similar to the way Aldrich–McKelvey scaling estimates an affine mapping between ideology spaces [@aldrich-mckelvey:1977:scaling; @hare-et-al:2015:bayes-aldrich-mckelvey].
Another way to interpret $\Delta_{ir}$ is that the common ideal point space is a weighted average of CF scores and district-party ideology, with weights that are estimated from the data.
The second line of \@ref(eq:linear-combo) restricts the coefficients to have a norm of $1$, which is an identifiability restriction on the location and scale of the $\Delta$ space that would otherwise be arbitrary.
The restriction implies a direct mapping between $\text{CF}$ space and $\bar{\theta}_{g}$ space, since $\beta$ is defined in terms of $\alpha$,
\begin{align}
\begin{split}
1 &= \alpha^{2} + \beta^{2} \\
\beta &= \pm\sqrt{(1 - \alpha^{2})}
\end{split}
\end{align}
which clarifies how the linear transformation is estimating essentially a scale factor between the two ideal point spaces, parameterized by $\alpha$ only.
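As a minimal sketch of this constraint, the coefficient pair can be built from $\alpha$ and a sign choice alone. The helper name `make_delta()` and the toy inputs are hypothetical and are not part of the estimated model; only the formula for $\Delta_{ir}$ and the unit-norm restriction come from the text.

```r
# Hypothetical helper: map CF scores and district-party ideology into Delta
# using coefficients constrained to the unit circle.
make_delta <- function(cf, theta_bar, alpha, sign_beta = 1) {
  stopifnot(abs(alpha) <= 1, sign_beta %in% c(-1, 1))
  beta <- sign_beta * sqrt(1 - alpha^2)  # beta is determined by alpha up to sign
  list(alpha = alpha, beta = beta, delta = alpha * cf + beta * theta_bar)
}

# Toy inputs (made up for illustration)
cf        <- c(-1.2, 0.3, 0.9)
theta_bar <- c(0.5, 0.5, 0.5)

out <- make_delta(cf, theta_bar, alpha = 0.6, sign_beta = -1)
out$alpha^2 + out$beta^2   # the norm constraint holds
out$delta                  # 0.6 * cf - 0.8 * theta_bar
```

Choosing `sign_beta = -1` yields a $\Delta$ that behaves like a rescaled difference between the two ideal points, while `sign_beta = 1` yields a rescaled sum.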
Because $\Delta_{ir}$ is a linear transformation of CF scores and district-party ideology, it has the algebraic interpretation of a "distance measure" of the candidate's CF score and the district-party public ideology in the $\Delta$ space.
For convenience, I therefore refer to $\Delta_{ir}$ as "ideological distance."^[
It is important to note here that my use of "distance" refers more generally to vector spaces than it does to ideal point "differences."
The "difference" $(x - z)$ is a special case of the distance $\alpha x + \beta z$ where $\alpha = 1$ and $\beta = -1$.
For a linear regression of $y$ on $(\alpha x + \beta z)$, regression predictions for $y$ would be invariant to any nonzero combination of $\alpha$ and $\beta$ values.
So although I refer to $\Delta_{ir}$ as an ideal point "distance," it contains the same information as an ideal point "difference" up to an arbitrary rotation of the $\Delta$ space [e.g. @armstrong-et-al:2014-spatial-models, xv].
Restricting the rotation of $\Delta$—for example, by fixing $\beta < 0$—would improve the interpretation of $\Delta$ as an ideal point "difference," but it would make Bayesian estimation more difficult by introducing unnecessary boundaries and discontinuities in the posterior distribution over $\alpha$ and $\beta$.
For ease of estimation, I therefore leave the rotation of the $\Delta$ space unrestricted.
<!------- TO DO ---------
- some cite on identifiability in Bayesian models?
------------------------->
]
I then create a function that lets candidate utility be a nonlinear function of the ideal point distance $\Delta_{ir}$.
This captures the spatial voting intuition: shocks to either CF scores or district-party ideology change the ideal point distance $\Delta_{ir}$, which has a nonlinear effect on candidate utility depending on whether the shock moves the ideal point distance toward or away from the optimal distance.
I create this nonlinear effect using b-splines.
I construct a set of basis functions of $\Delta_{ir}$ using a degree-$3$ polynomial basis with $30$ knots across the range of $\Delta_{ir}$.^[
I restrict the range of possible $\Delta_{ir}$ to be equidistant from $0$ by centering CF scores and district-party ideology within the model so their respective minima and maxima are equidistant from zero.
This implies separate $\Delta$ spaces for the two parties, as CF scores and district-party ideology take different values for Republicans and Democrats.
It also means that the knot locations change for each combination of $\alpha$ and $\beta$ values.
]
Let $b_{k}(\Delta_{ir})$ be the $k$^th^ basis function out of $K$ total, each with a coefficient $\phi_{k}$.
The function $f()$ from \@ref(eq:clogit-likelihood) then becomes a spline regression on $\Delta_{ir}$.
\begin{align}
\begin{split}
u_{ir} &= f\left(\text{CF}_{ir}, \bar{\theta}_{g[r]}\right) + \mathbf{c}_{ir}^{\intercal}\gamma \\
f\left(\text{CF}_{ir}, \bar{\theta}_{g[r]}\right) %_
&= \sum\limits_{k} b_{k}\left(\Delta_{ir}\right)\phi_{k}
\end{split}
(\#eq:spline-function)
\end{align}
The spline function ultimately is a sum of the weighted basis functions.
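The weighted sum can be sketched in a few lines of R with `splines::bs()`. This is an illustration under stated assumptions rather than the estimated model: the $\Delta$ values and coefficients are made up, the basis count is assumed to be knots plus degree, and the way `bs()` places knots for a given `df` may differ from the model's Stan implementation.

```r
library(splines)

num_knots     <- 30
spline_degree <- 3
n_basis       <- num_knots + spline_degree  # assumed basis count: knots plus degree

# Toy Delta values on a range symmetric about zero (made up for illustration)
delta <- seq(-2, 2, length.out = 200)

# Basis matrix: one row per observation, one column per basis function b_k
B <- bs(delta, df = n_basis, degree = spline_degree, intercept = TRUE)

# Hypothetical coefficients phi_k; in the model these are estimated
set.seed(1)
phi <- rnorm(n_basis)

# f(Delta) = sum_k b_k(Delta) * phi_k
f_delta <- as.vector(B %*% phi)
```

Each row of `B` holds the basis functions evaluated at one $\Delta$ value, so the matrix product returns one spline value per candidate.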
The spline enables a continuous interaction effect between CF scores and district-party ideology because the basis functions are nonlinear transformations of district-party ideology and CF scores.
Because the function is nonlinear, the chain rule ensures that the derivative of $u_{ir}$ with respect to CF scores (the instantaneous effect of CF scores) is a function that contains district-party ideology $\bar{\theta}_{g}$.
By specifying the interaction between chooser- and choice-level data in this way, I sidestep the identifiability limitation of a simpler conditional logit model, allowing the causal effect of CF score to vary in different electorates with different district-party ideologies.
Interacting two continuous variables through the spline function is much more flexible than a multiplicative interaction term between CF scores and district-party ideology, which would fail to capture both the utility optimum predicted by spatial voting models and any other non-constant interactions.
Creating the ideal point distance metric also has a generative interpretation that is superior to the multiplicative interaction, because the common ideal point metric is a more faithful representation of spatial voting models.
A multiplicative interaction has no comparable generative interpretation.
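The chain-rule argument above can be written out explicitly. Holding the covariates $\mathbf{c}_{ir}$ fixed and writing $b_{k}'$ for the derivative of the $k$^th^ basis function (notation introduced here for illustration only), differentiating \@ref(eq:spline-function) with respect to the CF score gives
\begin{align}
\begin{split}
\frac{\partial u_{ir}}{\partial \text{CF}_{ir}}
  &= \sum\limits_{k} \phi_{k} \, b_{k}'\left(\Delta_{ir}\right) \frac{\partial \Delta_{ir}}{\partial \text{CF}_{ir}} \\
  &= \alpha \sum\limits_{k} \phi_{k} \, b_{k}'\left(\alpha \text{CF}_{ir} + \beta\bar{\theta}_{g[r]}\right)
\end{split}
\end{align}
so the marginal effect of the CF score depends on $\bar{\theta}_{g[r]}$ whenever $\beta \neq 0$ and the fitted spline is not linear in $\Delta_{ir}$.
For comparison, a multiplicative specification with hypothetical coefficients $\lambda_{1}$, $\lambda_{2}$, and $\lambda_{3}$, written as $\lambda_{1}\text{CF}_{ir} + \lambda_{2}\bar{\theta}_{g[r]} + \lambda_{3}\text{CF}_{ir}\bar{\theta}_{g[r]}$, has the derivative $\lambda_{1} + \lambda_{3}\bar{\theta}_{g[r]}$, which is constant in the CF score and therefore cannot produce an interior utility optimum.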
Although the interpretation of $\Delta$ as a common ideal point metric is algebraically sensible, a limitation of the approach is that the model does not very accurately identify which $\alpha$ and $\beta$ values create more plausible common spaces in terms of posterior probability.
This is because the spline regression is flexible enough to create sensible regression functions out of the many configurations of $\Delta$ space, so there is no need for the model to detect a single "correct" configuration.
If a particular draw of $\alpha$ and $\beta$ values "compress" the ideal point space in some way, the spline coefficients are able to "stretch" that space back out to fit the data.
As a result, the posterior distribution of spline functions is identified from the data even if its component parameters—$\alpha$, $\beta$, and spline weights $\phi_{k}$—are not strongly identifiable on their own.
This trade-off between global and local identifiability appears in other flexible modeling approaches such as neural networks [@mackay:1992:bayes-neural-net-backpropagation; @beck-et-al:2004:neural-net] and is naturally suited to a Bayesian framework because unidentified or over-parameterized models pose no special problem for probabilistic inference [@jackman:2009:bayesian, 272].
In short, while the model sacrifices some interpretability to fit a flexible regression function, the trade-off is worth the ability to capture nonlinear patterns in spatial voting while avoiding specific assumptions about the form of the candidate utility function or the ideal point mappings.
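The simplest case of this compensation is a reflection rather than a compression. The sketch below, which uses made-up data and a toy set of symmetric knots rather than the model's own, shows that flipping the signs of $\alpha$ and $\beta$ mirrors the $\Delta$ space, and that reversing the order of the spline coefficients undoes the mirror. Two different parameter configurations therefore produce the same regression function.

```r
library(splines)

# Symmetric knots on a symmetric range (toy values, not the model's)
m        <- 2
interior <- seq(-m, m, length.out = 31)[2:30]   # 29 interior knots
make_basis <- function(x) {
  bs(x, knots = interior, degree = 3, intercept = TRUE,
     Boundary.knots = c(-m, m))
}

set.seed(2)
cf        <- runif(100, -1, 1)
theta_bar <- runif(100, -1, 1)
alpha <- 0.6
beta  <- -0.8                          # one point on the unit circle
delta <- alpha * cf + beta * theta_bar

phi <- rnorm(ncol(make_basis(delta)))  # hypothetical spline coefficients

# Configuration 1: (alpha, beta) with coefficients phi
f1 <- as.vector(make_basis(delta) %*% phi)

# Configuration 2: (-alpha, -beta) mirrors Delta; reversing phi undoes the mirror
f2 <- as.vector(make_basis(-delta) %*% rev(phi))

max(abs(f1 - f2))  # numerically zero: two parameter settings, one function
```

The equivalence relies on the knots being symmetric about zero, which matches the centering described in the text but is an assumption of this sketch.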
### Data {#sec:vote-data}
```{r data-5}
mcmc_path <- file.path("data", "mcmc", "5-voting")
fits_raw <- here(mcmc_path, "local_vb-main.rds") %>%
read_rds() %>%
group_by(party, control_spec) %>%
print()
fits_data <- fits_raw %>%
select(data, stan_data) %>%
print()
```
```{r data-no-incumbents}
noinc_raw <- here(mcmc_path, "local_vb-no_incumbents.rds") %>%
read_rds() %>%
group_by(party, control_spec) %>%
print()
```
The data for this analysis are drawn primarily from two secondary sources, the Database on Interests, Money in Politics, and Elections [DIME, @bonica:2019:dime] and the Primary Timing Project [PTP, @boatright-et-al:2020:primary-timing-data].
Cases are organized at the candidate-contest level, with identifiers for each primary contest indexing political party $\times$ congressional district $\times$ election cycle.
Because primary candidates can run unopposed, I restrict the data to primary races containing at least two candidates.
I keep only primary races where the number of winning candidates equals $1$, which removes any election where the winner lacked a CF score estimate (so is missing from the DIME) or where primary outcomes are miscoded in the original data sources.
I also drop any primary race where the outcome was decided by a convention instead of an election (coded in the PTP).
Lastly, I remove all blanket and top-two primary races, which are not limited to candidates in a single party.
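These inclusion rules can be sketched on a toy data frame. All column names below (`race`, `won`, `convention`, `blanket`) are hypothetical stand-ins and do not correspond to actual DIME or PTP variable names.

```r
# Toy candidate-contest data; all column names are hypothetical
cands <- data.frame(
  race       = c("A", "A", "B", "C", "C", "D", "D", "E", "E"),
  won        = c(1, 0, 1, 0, 0, 1, 0, 1, 0),
  convention = c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE),
  blanket    = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE)
)

# Race-level summaries repeated on every candidate row
n_cands   <- ave(cands$won, cands$race, FUN = length)
n_winners <- ave(cands$won, cands$race, FUN = sum)

keep <- n_cands >= 2 &        # contested races only
  n_winners == 1 &            # exactly one recorded winner
  !cands$convention &         # decided by election, not convention
  !cands$blanket              # excludes blanket and top-two primaries

analysis_set <- cands[keep, ]
unique(analysis_set$race)
```

In this toy example only race `A` survives: `B` is uncontested, `C` has no recorded winner, `D` was decided by convention, and `E` is a blanket primary.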
<!------- TO DO ---------
- how many?
------------------------->
```{r}
link_tab <-
here("data", "_model-output", "05-voting", "link-sum.rds") %>%
read_rds() %>%
rename(
matched = `1`,
unmatched = `NA`
) %>%
mutate(
total = matched + unmatched
) %>%
print()
```
The DIME database contains most of the essential data used for this analysis: CF scores and primary outcome indicators.
Primary outcomes for the 2016 election cycle were less thoroughly coded than the primary outcomes for the 2012 and 2014 cycles, which led to a large amount of missing data.
Missing primary outcomes in the DIME were supplemented with primary outcome data from the PTP.
Matching the same candidacy across databases was not easy using candidate identifiers,^[
Candidate IDs in the DIME are regenerated with each vintage of the database, creating inconsistencies in the same candidate's IDs over time.
As a result, the DIME identifiers that were initially copied into the PTP do not match the DIME identifiers in more recent DIME vintages.
]
so I merge the databases using the probabilistic record-linkage algorithm developed by @enamorado-et-al:2018:record-linkage.
I link candidates by name, state, district number, election cycle, and political party.
This process matches `r filter(link_tab, cycle == 2016) %>% pull(p) %>% percent(accuracy = 1)` of candidacies in 2016 and
`r (sum(link_tab$matched) / sum(link_tab$total)) %>% percent(accuracy = 1)` of candidacies in the entire dataset.
For candidacies where the DIME and the PTP disagree about the outcome of a primary race, I defer to the PTP because its narrower substantive focus on primary elections lends it more credibility.
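The reconciliation rule amounts to preferring the PTP outcome wherever one exists and falling back to the DIME otherwise. A minimal sketch follows, with hypothetical column names and made-up records.

```r
# Toy linked records; column names are hypothetical
linked <- data.frame(
  candidate = c("a", "b", "c", "d"),
  won_dime  = c(1, NA, 0, 1),
  won_ptp   = c(1, 0, 1, NA)
)

# Use the PTP outcome whenever it exists; fall back to the DIME otherwise
linked$won <- ifelse(is.na(linked$won_ptp), linked$won_dime, linked$won_ptp)

# Flag the rows where both sources are coded but disagree
linked$conflict <- !is.na(linked$won_dime) & !is.na(linked$won_ptp) &
  linked$won_dime != linked$won_ptp
```

Keeping the `conflict` flag makes it possible to count how many outcomes were overridden by the PTP coding.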
<!------- TO DO ---------
- ???
------------------------->
Predictive data include dynamic CF scores for every candidate and district-party ideal points from the IRT model in Chapter \@ref(ch:model).
The conditional logit does not identify district-level shocks to candidate utility because these variables are fixed for all candidates in a primary race, so the choice of controls in $\mathbf{c}_{ir}$ differs sharply from the district-level controls in Chapter \@ref(ch:positioning).
Instead of including district-level demographics, economic indicators, or political background characteristics such as the previous presidential vote in the district, $\mathbf{c}_{ir}$ contains candidate-level features that could affect their ideological positioning as well as their likelihood of winning the primary.^[
Features of a party-group or congressional district can certainly affect candidate CF scores _on average_, which is the focus of Chapter \@ref(ch:positioning).
It is helpful to think about the conditional logit as using only the "residual" variation in CF scores and other candidate features after these average effects are controlled by conditioning on the choice set.
]
I include an indicator variable for female candidate, which is associated with greater progressivism and a slightly higher primary win probability at least among Democrats [@thomsen-swers:2017:women-run; @thomsen:2019:women-win; @thomsen:2020:ideology-gender].
I also include an indicator for incumbent candidates, who both have more moderate CF scores (seen in Chapter \@ref(ch:positioning)) and are more likely to win their primary reelections.
I include no additional indicators for challengers and open-seat candidates, since open-seat races only compare open-seat candidates to one another, and non-incumbency implies challenger status for any race containing an incumbent candidate.
The standard control specification includes one last covariate for the contribution amount that a candidate donates to their own campaign, which is logged and standardized.
This control is intended to block a back-door path from CF scores to primary victory through candidate wealth, which could affect both the candidate's ideological position and their win probability.
Although there are additional measures of a candidate's campaign fundraising and spending available in the DIME, I do not use these variables as controls to identify the CF score effect.
This is because previous research suggests that candidate ideology is more likely to influence a candidate's fundraising than vice-versa [@stone-simas:2010:candidate-valence; @barber-et-al:2016:ideological-donors; @thomsen-swers:2017:women-run].
The utility model underlying CF scores assumes that this is true _ex ante_, by modeling campaign contributions as a function of ideological affinity.
```{r clogit-n}
bind_rows(noinc_raw, fits_data) %>%
unnest(data) %>%
ungroup() %>%
mutate(
Party = ifelse(party == "D", "Democrats", "Republicans"),
party = NULL,
Subset = case_when(
control_spec == "main" ~ "Full data",
TRUE ~ "No incumbents"
),
control_spec = NULL
) %>%
group_by(Party, Subset) %>%
summarize(
`Primary Races` = comma(n_distinct(set)),
`Total Candidates` = comma(n())
) %>%
arrange(Subset) %>%
knitr::kable(
caption.short = "Number of primary races and primary candidates",
caption = "Number of primary races and primary candidates",
booktabs = TRUE
)
```
I estimate separate models for Republicans and Democrats because control variables may confound the treatment effect differently for each party.
For instance, gender is thought to have a greater impact in Democratic primaries than in Republican primaries [@thomsen-swers:2017:women-run; @thomsen:2019:women-win; @thomsen:2020:ideology-gender].
It also may be the case that causal effects vary across party, either because Republican and Democratic voters are not equally aware of candidate ideology or because district-party ideology has different modifying effects for Republicans and Democrats.
I also estimate the same model with the sample limited to primary contests with no incumbent present, a practice employed by earlier researchers to sidestep the overwhelming likelihood that incumbents win reelection [e.g. @porter-treul:2020:primary-experience].
Table \@ref(tab:clogit-n) displays the number of candidates and primary contests in each of these subsets of data.
```{r clogit-n, include = TRUE}
```
### Bayesian modeling, priors, and prior simulation
Like other models featured in this project, the Bayesian setup of this model provides several important benefits.
The most important benefit is regularization in the spline function.
Although the spline function is beneficial because it can fit many complex functions, complex models always run a risk of overfitting.
The trade-off between flexibility and overfitting is especially salient for modeling heterogeneous treatment effects because growing the number of possible comparisons will also grow the number of false positives if no additional methodological adjustments are made.
This concern has led researchers to use regularized estimators to detect heterogeneous effects, which introduce bias to shrink heterogeneities toward zero.
Bayesian additive regression trees, for example, model flexible interactions by regularizing the tree structure in favor of shorter trees and partial pooling of "leaf" estimates toward the mean of the data [@hill:2011:bart; @green-kern:2012:bart].
```{r spline-coef-prior}
plot_spline_coef_draws <- tibble(
raw = rnorm(10000),
eta = abs(1.5 * rt(10000, df = 3)),
phi = raw*eta,
) %>%
ggplot() +
aes(x = phi) +
geom_histogram(
boundary = 0, bins = 100, fill = primary, alpha = 0.7
) +
xlim(c(-10, 10)) +
labs(
x = TeX("Spline coefficient $\\phi_{k}$"),
y = "Count",
title = "Prior for Spline Coefficient",
subtitle = "Normal prior with T(3) scale"
)
```
```{r spline-prior}
# possible spline functions
n_coef_draws <- 10
num_knots <- 30
spline_degree <- 3
coef_draws <-
tibble(
k = 1:(num_knots + spline_degree)
) %>%
crossing(
rep = 1:n_coef_draws
) %>%
mutate(
eta = 1.5 * rt(n(), df = 3) %>% abs(),
phi_raw = rnorm(n()),
phi = phi_raw * eta
) %>%
select(rep, k, phi) %>%
pivot_wider(
names_from = "rep",
values_from = "phi",
) %>%
select(-k) %>%
print()
spline_data <- tibble(delta = seq(0, 1, length.out = 1000))
spline_data <- spline_data %$%
splines::bs(
delta,
df = num_knots + spline_degree,
degree = spline_degree,
intercept = TRUE
) %>%
(function(x) x %*% as.matrix(coef_draws)) %>%
as_tibble() %>%
set_names(~ str_glue("f_{.}")) %>%
bind_cols(spline_data, .) %>%
pivot_longer(
cols = starts_with("f_"),
names_to = "draw",
values_to = "spline"
) %>%
print()
plot_prior_spline_functions <-
ggplot(spline_data) +
aes(x = delta, y = spline) +
geom_line(
aes(group = draw),
color = primary
) +
geom_hline(yintercept = 0) +
coord_cartesian(ylim = c(-8, 8)) +
labs(
x = TeX("$\\Delta_{ir}$: Linear combination of CF score and $\\bar{\\theta}_{g}$"),
y = "Spline function",
title = "Prior Draws of Spline Function",
subtitle = str_glue("Prior simulations from {n_coef_draws} draws")
) +
scale_x_continuous(breaks = c(0, 1), labels = c("Min", "Max")) +
scale_y_continuous(breaks = seq(-6, 6, 3))
```
```{r plot-spline-priors}
plot_spline_coef_draws + plot_prior_spline_functions
```
I use a hierarchical prior for the spline coefficients to penalize the complexity of the spline function.
The prior for each basis function's coefficient $\phi_{k}$ has a Normal distribution,
\begin{align}
\phi_{k} &\sim \text{Normal}\left(0, \eta \right)
(\#eq:spline-marginal)
\end{align}
where $\eta$ is estimated from the data.
Estimating an adaptive prior distribution for the spline coefficients shrinks them toward zero through partial pooling.
This prior is implemented in Stan using a non-centered parameterization, which decomposes $\phi_{k}$ into a standard Normal variable $\tilde{\phi}_{k}$ and a scale factor $\eta$.
\begin{align}
\begin{split}
\phi_{k} &= \tilde{\phi}_{k}\eta \\ %_
\tilde{\phi}_{k} &\sim \text{Normal}\left(0, 1\right) %_
\end{split}
(\#eq:spline-shrinkage)
\end{align}
The non-centered parameterization stretches a standard Normal distribution to create a Normal distribution with a scale of $\eta$.
This parameterization is valuable for Bayesian estimation because it de-correlates random variables in the posterior distribution, creating an easier posterior geometry for estimation algorithms.
I give the scale factor $\eta$ a Half-$T$ prior with $3$ degrees of freedom and a scale of $1.5$,
\begin{align}
\eta &\sim \text{Half-T}\left(\nu = 3, \mu = 0, \sigma = 1.5\right)
(\#eq:scale-T)
\end{align}
which regularizes the scale value toward zero, but has a modestly flat tail to allow strong signals from the data to depart from the prior.
This Normal-T mixture is similar to a "horseshoe prior" [@carvalho-et-al:2010:horseshoe-prior; @piironen-vehtari:2017:horseshoe-hyperprior; @piironen-vehtari:2017:horseshoe-sparse-vs-reg], which is a popular prior for estimating sparse coefficients with regularization.^[
Note that "sparsity" in this context does not imply coefficients of exactly-zero as it does with non-Bayesian L1 regularization [@tibshirani:1996:lasso; @ratkovic-tingley:2017:sparse-lasso-plus].
Sparse priors may result in posterior _modes_ at zero, but posterior intervals will contain non-zero values [@park-casella:2008:bayesian-lasso].
]
Unlike the horseshoe, which uses a half-Cauchy scale, the Half-T scale places lower probability on extremely large coefficients but does not regularize as strongly as a Half-Normal prior.
The left-side panel in Figure \@ref(fig:plot-spline-priors) plots a histogram of simulated coefficient draws from this prior, which features a spike at zero and flatter tails than a Normal-Normal mixture.^[
The tails are long enough that many draws actually fall far outside the region plotted in the figure.
These values are much rarer than the values contained in the plotted region, but they are much more probable than they would be under, for example, a Normal-Normal prior.
]
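The tail comparison described above can be checked by simulation. The sketch below draws from the Normal-T mixture in its non-centered form and from a Normal-Normal mixture with the same scale of $1.5$. The cutoff of $10$ and the number of draws are arbitrary choices made for illustration.

```r
set.seed(3)
n <- 1e5

phi_raw <- rnorm(n)                       # standard Normal component
eta_t   <- abs(1.5 * rt(n, df = 3))       # Half-T(3) scale with sigma = 1.5
eta_n   <- abs(1.5 * rnorm(n))            # Half-Normal scale, for comparison

phi_t <- phi_raw * eta_t                  # Normal-T mixture (non-centered form)
phi_n <- phi_raw * eta_n                  # Normal-Normal mixture

# Share of prior draws beyond +/- 10 under each prior
tail_t <- mean(abs(phi_t) > 10)
tail_n <- mean(abs(phi_n) > 10)
c(normal_t = tail_t, normal_normal = tail_n)
```

Both priors concentrate mass near zero, but the share of extreme draws should be noticeably larger under the Half-T scale.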
<!------- TO DO ---------
- plot a prior of the spline function???
- without a prior, this can oscillates to \pm infinity
- order-4 (3 degree) basis splines, 30 knots, N-C(0, 1) prior on 30 coefs
------------------------->
```{r plot-spline-priors, include = TRUE, fig.width = 9, fig.height = 5, out.width = "100%", fig.scap = "Prior draws of spline coefficient and spline function.", fig.cap = "Prior draws of spline coefficient and spline function. Left: histogram of prior draws for an individual spline coefficient. Right: draws from the implied prior over spline functions."}
```
The right panel of Figure \@ref(fig:plot-spline-priors) shows `r n_coef_draws` prior predictive draws of the spline function, resulting from `r n_coef_draws` coefficient vectors drawn from the hierarchical prior.
There are a few important details to note about the construction of this prior.
First, most of the "peaks" of the spline function are in a neighborhood near zero, especially within the $(-3, 3)$ interval.
Although at first this sounds like a very narrow prior, it is important to remember that the spline function is defined on the utility (logit) scale, where small changes in utility can have large, nonlinear effects.
For context, a coefficient of $3$ on the logit scale would increase the success probability from $.5$ to `r plogis(3) %>% round(2)` in a two-candidate choice set, which is a larger effect than almost anything that occurs regularly in elections.
Furthermore, a preference for spline functions near zero is essential for regularization, so this amount of prior information is appropriate for controlling the spline fit.
At the same time, there are several peaks that decisively escape the $(-3, 3)$ neighborhood.
These larger peaks reflect the flatter-tailed $T$ prior on $\eta$, allowing larger coefficients.
The shape of the $T$ tail retains enough flexibility to detect a spike in utility even if the center of the prior concentrates spline functions near zero.
This plot also shows that 30 knots provide more than enough flexibility to capture a utility spike along $\Delta$ space.
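To connect the utility scale to win probabilities, recall that choice probabilities in a conditional logit are a softmax of candidate utilities within a race. The helper `choice_probs()` below is a hypothetical name used for illustration. It reproduces the two-candidate calculation referenced above and can be applied to larger choice sets.

```r
# Choice probabilities in a conditional logit: softmax of utilities within a race
choice_probs <- function(u) {
  z <- exp(u - max(u))   # subtract the max for numerical stability
  z / sum(z)
}

# Two otherwise identical candidates; one receives a utility bump of 3
p <- choice_probs(c(3, 0))
p[1]
plogis(3)                # same quantity via the logistic function

# Three-candidate race with the same bump, for comparison
choice_probs(c(3, 0, 0))[1]
```

In the two-candidate case the softmax reduces to the logistic function of the utility difference, which is why `plogis(3)` gives the same answer.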
For the remaining coefficients $\gamma$, I specify a weakly informative prior,
\begin{align}
\gamma &\sim \text{Normal}\left(0, 5\right)
(\#eq:clogit-wt-priors)
\end{align}
which rules out explosive coefficient values while still allowing candidate attributes like incumbency to exhibit large correlations with candidate utility.
For causal inference, it is important not to over-regularize the coefficients on confounders, since doing so can re-introduce bias into treatment effect estimates [@hahn-et-al:2018:regularization-confounding; @hahn-et-al:2020:bayesian-causal-forests].^[
For high dimensional problems where regularization cannot be avoided, recent work recommends separate treatment and response models [@hahn-et-al:2018:regularization-confounding; @hahn-et-al:2020:bayesian-causal-forests] with a split-sample approach [@ratkovic:2019:rehabilitating-regression].
]
```{r a-b-data}
unit_params <- tibble(
a_raw = rnorm(10000),
b_raw = rnorm(10000)
) %>%
mutate(
across(
.cols = c(a_raw, b_raw),
.fns = list(id = ~ . / sqrt(a_raw^2 + b_raw^2))
)
) %>%
rename(
`Constrained α` = a_raw_id,
`Constrained β` = b_raw_id
)
```
```{r a-b-priors}
unit_params %>%
pivot_longer(
cols = contains("Constrained"),
names_to = "param",
values_to = "value"
) %>%
ggplot() +
aes(x = value) +
facet_wrap(
~ param,
scales = "free"
) +
geom_histogram(alpha = 0.7, fill = primary) +
labs(
x = "Prior value", y = NULL,
title = "Prior Draws for Ideal Point Distance Coefficients"
) +
ggeasy::easy_remove_y_axis()
```
Because $\alpha$ and $\beta$ are constrained to have a norm of $1$, their values fall on the unit circle.
I give these parameters a joint prior that is flat along the unit circle.
Stan implements this prior automatically by drawing unnormalized parameters $\tilde{\alpha}$ and $\tilde{\beta}$ from independent standard Normal distributions and then dividing by their norm,
\begin{align}
\begin{split}
\tilde{\alpha}, \tilde{\beta} &\sim \text{Normal}\left(0, 1\right) \\
\alpha &= \frac{\tilde{\alpha}}{\sqrt{\tilde{\alpha}^2 + \tilde{\beta}^2}} \\
\beta &= \frac{\tilde{\beta}}{\sqrt{\tilde{\alpha}^2 + \tilde{\beta}^2}}
\end{split}
\end{align}
which creates a flat density over the unit circle.^[
Technically, this transformation is undefined if the norm is exactly zero, which realistically never happens.
]
The marginal densities for $\alpha$ and $\beta$, shown in Figure \@ref(fig:a-b-priors), are not exactly flat due to the nonlinear transformation from Cartesian coordinates to polar coordinates.
```{r a-b-priors, include = TRUE, out.width = "100%", fig.width = 9, fig.height = 4, fig.scap = "Prior draws for ideal point distance coefficients.", fig.cap = "Prior draws of coefficients that map CF scores and district-party ideology to the common ideal point distance metric $\\Delta$. These priors create a flat prior on unit circle coordinates, even though the marginal priors are not flat."}
```
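The behavior of this prior can be checked by simulation: normalizing independent standard Normal draws by their joint norm places every pair on the unit circle with a uniformly distributed angle, while each marginal piles up near $-1$ and $1$. The sketch below is illustrative only, and the quadrant check and the $0.9$ cutoff are arbitrary choices.

```r
set.seed(4)
n <- 1e5

a_raw  <- rnorm(n)
b_raw  <- rnorm(n)
radius <- sqrt(a_raw^2 + b_raw^2)

alpha <- a_raw / radius   # constrained coefficients on the unit circle
beta  <- b_raw / radius

# Every draw satisfies the norm constraint
range(alpha^2 + beta^2)

# The angle is uniform, so each quadrant holds about a quarter of the draws
angle <- atan2(beta, alpha)
quadrant_share <- table(cut(angle, breaks = c(-pi, -pi / 2, 0, pi / 2, pi))) / n

# The marginals pile up near -1 and 1 rather than being flat
mean(abs(alpha) > 0.9)    # a flat marginal on [-1, 1] would give 0.1
```

The uniform angle is what makes the joint prior flat along the circle, even though neither coordinate has a flat marginal.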
## Findings {#sec:vote-findings}
```{r tidy-5}
fits <- fits_raw %>%
transmute(
tidy_fit = map(
.x = vb_fit,
.f = tidy,
conf.int = TRUE,
conf.level = 0.9
)
) %>%
print()
```
```{r draws-5}
vb_draws <- fits_raw %>%
transmute(
draws = map(
.x = vb_fit,
.f = rstan::extract
)
) %>%
print()
```
```{r spline-coefs-data}
main_coefs <- fits %>%
unnest(tidy_fit) %>%
ungroup() %>%
filter(
str_detect(term, "_post") == FALSE,
str_detect(term, "wt") | str_detect(term, "spline_scale")
) %>%
mutate(
index = parse_number(term),
term_label = case_when(
term == "wt[1]" ~ "Female",
term == "wt[2]" ~ "Incumbent",
term == "wt[3]" ~ "Log self-contribs (std.)",
str_detect(term, "wt_spline") ~
str_glue("Basis {index}") %>% as.character(),
term == "spline_scale" ~ "Spline scale"
),
prefix = case_when(
str_detect(term, "spline") ~ "Spline Parameters",
str_detect(term, "wt") ~ "Regression Coefs",
TRUE ~ "Aux"
)
) %>%
filter(str_detect(term, "linkers") == FALSE) %>%
mutate(
term_label = fct_reorder(term_label, index) %>% fct_rev(),
party_name = ifelse(party == "D", "Democrats", "Republicans")
) %>%
print()
```
```{r plot-spline-coefs}
ggplot(main_coefs) +
aes(
x = term_label,
y = estimate,
color = party_name
) +
geom_hline(yintercept = 0) +
geom_pointrange(
aes(ymin = conf.low, ymax = conf.high, shape = party_name),
position = position_dodge(width = -0.25),
fill = "white"
) +
facet_wrap(~ prefix, scales = "free") +
coord_flip() +
scale_color_manual(values = party_colors) +
scale_shape_manual(values = c("Democrats" = 16, "Republicans" = 22)) +
labs(
x = NULL,
y = "Posterior parameter value",
color = NULL,
shape = NULL,
title = "Conditional Logit Parameters",
subtitle = "Fullrank variational estimations"
) +
theme(
legend.position = c(0.25, 0.2),
legend.background = element_rect(fill = alpha("white", 0.9)),
plot.title.position = "plot"
)
```
```{r linker-plot}
vb_draws %>%
transmute(
link_draws = map(
.x = draws,
.f = ~ {
.x$linkers %>%
as_tibble(.name_repair = "unique") %>%
set_names(c("alpha", "beta"))
}
)
) %>%
unnest(link_draws) %>%
ggplot() +
aes(x = alpha, y = beta, color = party) +
geom_hline(yintercept = 0, color = "black") +
geom_vline(xintercept = 0, color = "black") +
geom_jitter(width = .1, height = .1, shape = 16, alpha = 0.5) +
facet_wrap(
~ party,
labeller = as_labeller(c("D" = "Democrats", "R" = "Republicans"))
) +
coord_fixed() +
scale_color_manual(values = party_code_colors) +
labs(
title = "Coefficients for Ideal Point Distance (Δ)",
subtitle = "Samples (jittered) from variational posterior",
x = "CF score weight (α)",
y = "District-party ideology\nweight (β)"
) +
theme(
legend.position = "none",
plot.title.position = "plot"
)
```
All models were estimated using Stan's full-rank variational inference algorithm, which approximates the posterior distribution with a multivariate Normal distribution over the unconstrained parameters, with a full-rank covariance matrix [@kucukelbir:2015:ADVI].
The main discussion of results focuses on the models estimated using the full datasets.
I briefly review the key trends among non-incumbent races in Section \@ref(sec:no-incumbents).
To facilitate the interpretation of the spline model, I first show Figure \@ref(fig:linker-plot), which contains posterior samples of the coefficients that map CF scores and district-party ideology into the common ideal point distance measure $\Delta$.
Because I identify the latent $\Delta$ space by constraining these coefficients to have a norm of $1$, all pairs of parameters fall on the unit circle.
Points are jittered in the plot to convey which values have greater posterior probability.
As mentioned above, many possible ideal point mappings can be rationalized as part of the spline function, so the posterior distribution does not concentrate very tightly around particular combinations of $\alpha$ and $\beta$ values.
This results in posterior samples that cover all four quadrants of the unit circle.
This is not concerning, however, because the common ideal point space is created to facilitate heterogeneous effects, not to be interpreted directly.
```{r linker-plot, include = TRUE, out.width = "100%", fig.height = 5, fig.width = 9, fig.scap = "Posterior draws of linear mapping parameters.", fig.cap = "Posterior draws of parameters that map CF scores and district-party ideology into $\\Delta$ space. Points all fall on the unit circle but are slightly jittered to convey posterior density."}
```
Coefficients from the candidate utility model are presented in Figure \@ref(fig:plot-spline-coefs).
The left panel shows regression coefficients for control variables in $\mathbf{c}_{ir}$: gender, incumbency, and candidate self-fundraising.
These coefficients indicate that gender is positively related to candidate utility among Democrats more than Republicans, a finding that reflects recent evidence from @thomsen:2019:women-win.
Unsurprisingly, incumbency has a strong, positive relationship to candidate utility in both parties.
Candidate self-fundraising does not strongly relate to candidate utility in either party.
This could be because heavier self-funders reflect a mixture of wealthy candidates, who may be advantaged because of their connections to other wealthy funders, and down-on-their-luck candidates who rely more heavily on self-fundraising to make up for meager fundraising receipts elsewhere.
The right panel shows all spline basis function coefficients and the scale parameter in the smoothing prior for the spline coefficients.
Most spline coefficients have posterior point estimates near zero, which is the intended result of the regularizing prior on the coefficients.
A few coefficients do depart from the prior, an initial indication that the spline regression detects a smooth function with a small number of "wiggles" rather than a highly variable function with many local peaks and troughs.
```{r plot-spline-coefs, include = TRUE, out.width = "100%", fig.height = 8, fig.width = 9, fig.scap = "Posterior parameters from conditional logit.", fig.cap = "Posterior parameters from conditional logit. Points and intervals are variational point estimates and 90 percent quantile intervals from approximate posterior. Left panel shows regression weights for covariates. Right panel shows basis function coefficients and hierarchical scale parameter. There are greater than 30 spline coefficients because higher spline degrees create additional basis functions."}
```
```{r spline-plot}
spline_means <- fits %>%
left_join(fits_data) %>%
mutate(
spline_means = map2(
.x = tidy_fit,
.y = stan_data,
.f = ~ {
rhs <- tibble(
CF = .y$CF,
i = .y$i
)
means <- .x %>%
filter(
str_detect(term, "spline_mean_post") |
str_detect(term, "spline_lower_post") |
str_detect(term, "spline_upper_post")
) %>%
mutate(i = parse_number(term))
left_join(means, rhs, by = "i")
}
)
) %>%
select(spline_means) %>%
unnest(spline_means) %>%
print()
```
```{r plot-spline-posterior}
spline_means %>%
filter(str_detect(term, "mean")) %>%
ggplot() +
aes(x = CF, y = estimate, color = party) +
geom_vline(xintercept = 0, color = "gray") +
geom_hline(yintercept = 0, color = "black") +
geom_ribbon(
aes(ymin = conf.low, ymax = conf.high, fill = party),
color = NA,
alpha = 0.3,
show.legend = FALSE
) +
geom_line(size = 1, show.legend = FALSE) +
geom_line(
data = filter(spline_means, str_detect(term, "mean") == FALSE),
aes(linetype = str_detect(term, "lower")),
color = "black"
) +
geom_rug(aes(y = NULL), alpha = 0.2, show.legend = FALSE) +
scale_color_manual(values = party_code_colors) +
scale_fill_manual(values = party_code_colors) +
scale_linetype_manual(
values = c(2, 3),
labels = c("TRUE" = "1 sd below mean", "FALSE" = "1 sd above mean")
) +
facet_wrap(
~ party,
scales = 'free_x',
labeller = as_labeller(c("D" = "Democrats", "R" = "Republicans"))
) +
labs(
title = "How CF Score Affects Candidate Utility",
subtitle = "Negligible interaction with district-party ideology",
x = "Candidate CF Score",
y = "Spline function of ideal point distance",
linetype = TeX("$\\bar{\\theta}_{g}$ value")
) +
theme(
legend.position = c(.9, .8),
legend.background = element_rect(fill = alpha("white", 0.5))
) +
# coord_cartesian(xlim = c(-5.5, 5.5))
NULL
```
The key finding from the conditional logit model is the spline function plotted in Figure \@ref(fig:plot-spline-posterior).
The spline is a function of the common ideal point metric $\Delta$, which means the spline is a function of both CF scores and district-party ideology.
This means that the shape of the spline function comes from two signals in the data.
The first is which $\Delta$ values are related to candidate utility, and the second is which combination of CF scores and district-party ideology (in terms of $\alpha$ and $\beta$ values) more strongly affects candidate utility.
I show candidate CF scores along the horizontal axis, and spline functions holding district-party ideology fixed at different values are plotted on the vertical axis.
Solid lines show the spline function conditioned on _average_ district-party ideology in each party, while dashed and dotted lines condition the spline function on district-party ideology values one standard deviation above and below the mean.
The shaded region shows the 90% posterior interval for the spline function conditioning on the average district-party ideology, calculated from samples from the variational posterior.^[
It is worth noting the value of Bayesian computation for generating uncertainty intervals for a complex function such as this.