publication--08--voting.Rmd

---
title: Graphing voting geography
bibliography: bibliography.bibtex
---

```{r init, echo = FALSE, message = FALSE}
library(knitr)
if (! is.null(current_input())) { # if knitr is being used
  knitr::read_chunk("settings.R")
} else {
  source("settings.R")
}
```

## Parameters

### Document rendering

```{r rendering_settings, echo = FALSE, warning=FALSE, message=FALSE}
```

### Analysis input/output

```{r io_settings}
```


## Introduction

Although metacoder has been designed for use with taxonomic data, any data that can be assigned to a heirarchy can be used. 
To demonstrate this, we have used metacoder to display the results of the 2016 Democratic primary election. 
The data used can be dowloaded here:

https://www.kaggle.com/benhamner/2016-us-election/

## Read in data

We will use the `readr` package to read in the data.

```{r}
library(readr)
file_path <- file.path(input_folder, "primary_results.csv")
data <- read_csv(file_path)
print(data)
```

## Add region and divisons columns

The data does not include the region or division of states/counties. 
Adding these will make the visulization more interesting. 

```{r}
divisions <- stack(list("New England" = c("Connecticut", "Maine", "Massachusetts",
                                          "New Hampshire", "Rhode Island", "Vermont"),
                        "Mid-Atlantic" = c("New Jersey", "New York", "Pennsylvania"),
                        "NE Central" = c("Illinois", "Indiana", "Michigan", "Ohio", "Wisconsin"),
                        "NW Central" = c("Iowa", "Kansas", "Minnesota", "Missouri",
                                         "Nebraska", "North Dakota", "South Dakota"),
                        "South Atlantic" = c("Delaware", "Florida", "Georgia", "Maryland",
                                             "North Carolina", "South Carolina", "Virginia",
                                             "Washington D.C.", "West Virginia"),
                        "SE Central" = c("Alabama", "Kentucky", "Mississippi", "Tennessee"),
                        "SW Central" = c("Arkansas", "Louisiana", "Oklahoma", "Texas"),
                        "Mountain" = c("Arizona", "Colorado", "Idaho", "Montana", "Nevada",
                                       "New Mexico", "Utah", "Wyoming"),
                        "Pacific" = c("Alaska", "California", "Hawaii", "Oregon", "Washington")))
data$division <- as.character(divisions$ind[match(data$state, divisions$values)])

regions <- stack(list("Northeast" = c("New England", "Mid-Atlantic"),
                      "Midwest" = c("NE Central", "NW Central"),
                      "South" = c("South Atlantic", "SE Central", "SW Central"),
                      "West" = c("Mountain", "Pacific")))
data$region <- as.character(regions$ind[match(data$division, regions$values)])

data$country <- "USA"

print(data)
```


## Create and parse classifications

The code below creates a single column in the data set that contains all of the levels of the geographic hierarchy for each location. 
It is then parsed using `parse_taxonomy_table` so that the other columns are preserved in the `taxmap` object.

```{r}
library(metacoder)
voting_data <- parse_tax_data(data, class_cols = c("country", "region", "division", "state", "county"))
print(voting_data)
```

## Get canidate vote counts

We have now need to sum the data for geographic region.

```{r}
voting_data <- mutate_obs(voting_data, data =  "place_data",
                          taxon_id = taxon_ids,
                          total_votes = unlist(obs_apply(voting_data, "tax_data", sum, value = "votes")),
                          clinton_votes = sapply(obs(voting_data, "tax_data"),
                                                 function(i) {
                                                   subset <- voting_data$data$tax_data[i, ]
                                                   sum(subset$votes[subset$candidate == "Hillary Clinton"])
                                                 }),
                          sanders_votes = sapply(obs(voting_data, "tax_data"),
                                                 function(i) {
                                                   subset <- voting_data$data$tax_data[i, ]
                                                   sum(subset$votes[subset$candidate == "Bernie Sanders"])
                                                 })
)
```

## Get top counties

I will get a list of the "taxon" IDs for the county in each state with the most votes.

```{r}
top_counties <- unlist(subtaxa_apply(voting_data, subset = n_supertaxa == 3, value = "votes",
                                     function(x) names(x[which.max(x)])))
```

## Plotting results

```{r}
voting_data %>%
  heat_tree(node_size = total_votes,
            node_size_range = c(0.0002, 0.06),
            node_color = (clinton_votes - sanders_votes) / total_votes * 100,
            edge_label = ifelse(taxon_ids %in% top_counties | n_supertaxa <= 3, taxon_names, ""),
            edge_label_size_trans = "area",
            edge_label_size_range = c(0.008, 0.025),
            node_color_range = c("#a6611a", "lightgray", "#018571"),
            node_color_interval = c(-50, 50),
            edge_color_range = c("#a6611a", "lightgray", "#018571"),
            edge_color_interval = c(-50, 50),
            node_color_axis_label = "Clinton               Sanders",
            node_size_axis_label = "Total votes",
            repel_labels = FALSE,
            output_file = result_path("voting"))
```

## Software and packages used

```{r}
sessionInfo()
```