-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathpublication--08--voting.Rmd
146 lines (113 loc) · 5.45 KB
/
publication--08--voting.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
---
title: Graphing voting geography
bibliography: bibliography.bibtex
---
```{r init, echo = FALSE, message = FALSE}
library(knitr)
if (! is.null(current_input())) { # if knitr is being used
knitr::read_chunk("settings.R")
} else {
source("settings.R")
}
```
## Parameters
### Document rendering
```{r rendering_settings, echo = FALSE, warning=FALSE, message=FALSE}
```
### Analysis input/output
```{r io_settings}
```
## Introduction
Although metacoder has been designed for use with taxonomic data, any data that can be assigned to a heirarchy can be used.
To demonstrate this, we have used metacoder to display the results of the 2016 Democratic primary election.
The data used can be dowloaded here:
https://www.kaggle.com/benhamner/2016-us-election/
## Read in data
We will use the `readr` package to read in the data.
```{r}
library(readr)
file_path <- file.path(input_folder, "primary_results.csv")
data <- read_csv(file_path)
print(data)
```
## Add region and divisons columns
The data does not include the region or division of states/counties.
Adding these will make the visulization more interesting.
```{r}
divisions <- stack(list("New England" = c("Connecticut", "Maine", "Massachusetts",
"New Hampshire", "Rhode Island", "Vermont"),
"Mid-Atlantic" = c("New Jersey", "New York", "Pennsylvania"),
"NE Central" = c("Illinois", "Indiana", "Michigan", "Ohio", "Wisconsin"),
"NW Central" = c("Iowa", "Kansas", "Minnesota", "Missouri",
"Nebraska", "North Dakota", "South Dakota"),
"South Atlantic" = c("Delaware", "Florida", "Georgia", "Maryland",
"North Carolina", "South Carolina", "Virginia",
"Washington D.C.", "West Virginia"),
"SE Central" = c("Alabama", "Kentucky", "Mississippi", "Tennessee"),
"SW Central" = c("Arkansas", "Louisiana", "Oklahoma", "Texas"),
"Mountain" = c("Arizona", "Colorado", "Idaho", "Montana", "Nevada",
"New Mexico", "Utah", "Wyoming"),
"Pacific" = c("Alaska", "California", "Hawaii", "Oregon", "Washington")))
data$division <- as.character(divisions$ind[match(data$state, divisions$values)])
regions <- stack(list("Northeast" = c("New England", "Mid-Atlantic"),
"Midwest" = c("NE Central", "NW Central"),
"South" = c("South Atlantic", "SE Central", "SW Central"),
"West" = c("Mountain", "Pacific")))
data$region <- as.character(regions$ind[match(data$division, regions$values)])
data$country <- "USA"
print(data)
```
## Create and parse classifications
The code below creates a single column in the data set that contains all of the levels of the geographic hierarchy for each location.
It is then parsed using `parse_taxonomy_table` so that the other columns are preserved in the `taxmap` object.
```{r}
library(metacoder)
voting_data <- parse_tax_data(data, class_cols = c("country", "region", "division", "state", "county"))
print(voting_data)
```
## Get canidate vote counts
We have now need to sum the data for geographic region.
```{r}
voting_data <- mutate_obs(voting_data, data = "place_data",
taxon_id = taxon_ids,
total_votes = unlist(obs_apply(voting_data, "tax_data", sum, value = "votes")),
clinton_votes = sapply(obs(voting_data, "tax_data"),
function(i) {
subset <- voting_data$data$tax_data[i, ]
sum(subset$votes[subset$candidate == "Hillary Clinton"])
}),
sanders_votes = sapply(obs(voting_data, "tax_data"),
function(i) {
subset <- voting_data$data$tax_data[i, ]
sum(subset$votes[subset$candidate == "Bernie Sanders"])
})
)
```
## Get top counties
I will get a list of the "taxon" IDs for the county in each state with the most votes.
```{r}
top_counties <- unlist(subtaxa_apply(voting_data, subset = n_supertaxa == 3, value = "votes",
function(x) names(x[which.max(x)])))
```
## Plotting results
```{r}
voting_data %>%
heat_tree(node_size = total_votes,
node_size_range = c(0.0002, 0.06),
node_color = (clinton_votes - sanders_votes) / total_votes * 100,
edge_label = ifelse(taxon_ids %in% top_counties | n_supertaxa <= 3, taxon_names, ""),
edge_label_size_trans = "area",
edge_label_size_range = c(0.008, 0.025),
node_color_range = c("#a6611a", "lightgray", "#018571"),
node_color_interval = c(-50, 50),
edge_color_range = c("#a6611a", "lightgray", "#018571"),
edge_color_interval = c(-50, 50),
node_color_axis_label = "Clinton Sanders",
node_size_axis_label = "Total votes",
repel_labels = FALSE,
output_file = result_path("voting"))
```
## Software and packages used
```{r}
sessionInfo()
```