This living document tracks our project's goals, expressed as the research questions we plan to investigate. We expect to keep updating it (adding content and restructuring as needed) as we identify additional interesting questions. To suggest a question, raise it on the #election-transparency Slack channel within d4d, or submit a diff to this document via a PR.
We are initially focused on three questions (in rough priority order, based upon availability of data):
- Examine factors that explain county-level Presidential election results in 2016, and identify counties exhibiting anomalies or unexplained variation.
- Examine ways in which the 2016 election differed from recent (or maybe even not-so-recent) elections, taking socio-economic and demographic characteristics of the electorate into account.
- Examine the significance and impact of redistricting on election outcomes.
Our data currently include, for every county in the United States:
- Vote counts for Trump (R), Clinton (D), Johnson (L), Stein (G), and Other candidates in the 2016 election (data.world)
- Number of registered voters eligible to vote in the 2016 election (data.world)
- In the 28 states plus the District of Columbia where state law allows party affiliation at registration, the total number of voters registered in the following parties:
  - Democratic
  - Green
  - Libertarian
  - Republican
  - Unaffiliated
  - Other
- In the other 22 states, just total registered voters (categorized as "unaffiliated")
- Limited (but growing) voter registration data for periods earlier than the 2016 general election
- For most counties (i.e., where data are available), various socio-economic and demographic variables (data.world link pending)
  - From the American Community Survey, 2015, 5-year estimates:
    - Total population
    - Age (population by Census age categories, median age)
    - Sex
    - Race
    - Ethnicity (Hispanic / non-Hispanic)
    - Educational attainment (population by Census categories)
    - Median household income
  - Employment and Labor Market Participation
    - Oct 2016 employment and unemployment (from the Census Current Population Survey, via the Bureau of Labor Statistics)
    - Total employment and manufacturing employment in 1980, 1990, 2000, 2010, and 2016 from the Bureau of Economic Analysis
- Election-related attributes of each of the 50 states plus DC, including number of electoral votes and type of voter ID law in place in 2016 (data.world)
We also have the FIPS county code, county name, state FIPS code, state name, and state abbreviation to support creation of choropleths and geographic rollups.
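As a rough illustration of how these identifiers support rollups, the sketch below builds a 5-digit county FIPS string and sums county votes to the state level. It assumes the county code is stored as a separate 3-digit part (it may already be a full 5-digit code in our data). The function names, the row layout, and the second county's vote counts are hypothetical and made up for the example. Only the Winnebago County, Illinois totals come from this document.

```python
from collections import defaultdict

def county_fips(state_fips, county_code):
    """Build a 5-digit FIPS string: 2-digit state part + 3-digit county part, zero-padded."""
    return f"{int(state_fips):02d}{int(county_code):03d}"

def rollup_by_state(rows):
    """Sum county-level vote counts up to the state level."""
    totals = defaultdict(lambda: defaultdict(int))
    for row in rows:
        for candidate, votes in row["votes"].items():
            totals[row["state_abbr"]][candidate] += votes
    return {state: dict(by_candidate) for state, by_candidate in totals.items()}

# Hypothetical row layout; the second county's counts are invented for illustration.
rows = [
    {"state_abbr": "IL", "fips": county_fips(17, 201),
     "votes": {"clinton": 55713, "trump": 55624}},
    {"state_abbr": "IL", "fips": county_fips(17, 1),
     "votes": {"clinton": 100, "trump": 200}},
]
state_totals = rollup_by_state(rows)
print(state_totals)
```

Keeping FIPS codes as zero-padded strings rather than integers avoids losing the leading zero for states such as Alabama (state FIPS 01), which matters when joining to map shapefiles for choropleths.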
This first question is our initial priority because it is best supported by the data we currently have on hand. The overall objective is to examine factors that explain county-level Presidential election results in 2016 and to identify counties exhibiting anomalies or unexplained variation.
- State/county choropleths for the swing states (viz)
- Some basic viz (e.g., stacked bar charts) of (candidate vote totals, party reg, D/R victory margin) ~ (each of the socio-economic and demographic factors)
- Imbalance of the "impact" of a vote due to the electoral college (viz)
- Characteristics of the n closest counties (n=20? n=50?). For example, what made the race so close in Winnebago County (Rockford), Illinois, where HRC won 55,713 to 55,624, a margin of 0.08%?
- Characteristics of counties where Johnson and Stein votes were greater than the margin of victory. Which candidate won?
- Scatterplots (blue (D) vs red (R) dots, possibly with dot size representing victory margin) for 2D crosstabs (viz), such as:
  - Educational attainment / unemployment
  - Age / race
  - Race / manufacturing decline
- Turnout and margin of victory (i.e., did more voters turn out in counties where the race wound up being close?)
- Counties where the margin of victory differed significantly from that party's registration margin (in the 28 states + DC that allow affiliation) (viz - maps and tables)
- Basic regressions (various combinations...expect this will be an iterative process!):
  - D/R victory margin ~ median age; % white non-Hispanic; % high school ed only; % college and above; income; manufacturing decline; unemployment; LFPR; turnout; party registration
  - (logistic) D (or R) win ~ the same predictors as the victory-margin regression above
- Once we get a model with a good fit, look at the residuals and see if that helps identify some anomalies
- Train various classification models, then look at counties that consistently aren't classified properly
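The residual check described above can be sketched roughly as follows. This is a minimal illustration in plain Python, not our actual modeling code. It fits ordinary least squares with a single predictor rather than the full predictor list. The helper names (`two_party_margin`, `fit_line`, `flag_anomalies`), the cutoff of 2 standard deviations, and the ten sample counties are all assumptions made for the example.

```python
from statistics import mean, pstdev

def two_party_margin(dem_votes, rep_votes):
    """D minus R as a share of the two-party vote (positive means D led)."""
    return (dem_votes - rep_votes) / (dem_votes + rep_votes)

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def flag_anomalies(names, x, y, z_cutoff=2.0):
    """Return names whose residual exceeds z_cutoff residual standard deviations."""
    intercept, slope = fit_line(x, y)
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    spread = pstdev(residuals)
    if spread == 0:
        return []  # perfect fit, nothing stands out
    return [n for n, r in zip(names, residuals) if abs(r) / spread > z_cutoff]

# Made-up data: ten hypothetical counties on a straight line, one pushed off it.
names = [f"county_{i}" for i in range(10)]
predictor = [float(i) for i in range(10)]
margins = [0.05 * xi - 0.2 for xi in predictor]
margins[4] += 0.3
print(flag_anomalies(names, predictor, margins))

# Winnebago County, Illinois, using the vote counts quoted earlier in this document.
print(round(two_party_margin(55713, 55624) * 100, 2))
```

With the made-up data, only `county_4` is flagged. A real analysis would more likely fit the multi-predictor model in a statistics package and inspect studentized residuals, since a single large outlier also inflates the residual spread it is measured against.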
Further modeling ideas for the 2016 results:
- Proportions of registered Democrats and registered Republicans (hard to include, because not every state allows registration by party)
- Total county population
- Demographics:
  - Proportion of males
  - Proportions of children (0-19 years) and older adults (65 years and up)
  - Proportion of adults never married
  - Proportion Hispanic
  - Whether county is majority white, majority black, or no racial majority
  - Inverse Simpson diversity index
    - Is the effect of the diversity index modified by which race is in the majority?
  - Proportion with less than high school, high school, and more than high school education
    - This may be modified by the proportion of adults never married. The thinking is that a farming community with relatively stable family structures and relatively low rates of college education may differ from a community with a high proportion of unmarried adults and low rates of college education.
- Economy
  - Proportion manufacturing employment, 2015 (note: it would be really interesting to interact this with proportion manufacturing employment in 2001, but we'd need 2001 population estimates for this)
  - Proportion unemployed, 2015 (unemployed / labor force)
  - Proportion in the labor force (how much does this correlate with proportions of children, adults, older adults?)
  - Median housing costs
  - Median household income
- Urban-rural classification scheme, 2013; perhaps interact this with same in 2006 or 1990? How many counties changed?
- Proportion of adults who voted
- Proportion of votes for third-party candidates (Stein, Johnson, and other)
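For reference, the inverse Simpson diversity index listed above is 1 / Σ p_i², where p_i is each group's share of the county population. The sketch below computes it alongside the majority-group category. The function names, the group labels, and the reading of "majority" as a share strictly above 50% are assumptions for illustration, and the counts in the usage lines are made up.

```python
def inverse_simpson(counts):
    """Inverse Simpson index: 1 / sum(p_i ** 2) over group population shares p_i."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return 1.0 / sum((c / total) ** 2 for c in counts)

def majority_group(counts_by_group):
    """Return the group whose share is strictly above 50%, or 'none' if there is none."""
    total = sum(counts_by_group.values())
    for group, count in counts_by_group.items():
        if total > 0 and count / total > 0.5:
            return group
    return "none"

# Made-up county populations by group, for illustration only.
county = {"white": 60, "black": 30, "other": 10}
print(inverse_simpson(county.values()), majority_group(county))
```

The index equals 1 when a single group makes up the whole population and equals the number of groups when all groups are the same size, so it can be read as an "effective number of groups". Interacting it with the majority category is one way to probe whether the diversity effect depends on which race is in the majority.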