Quantfish workshop planning

Robert Wildermuth edited this page Oct 16, 2018 · 9 revisions

Fall 2018, Data analysis using R

The idea is a series of tutorials that give people some simple tools to facilitate their data analysis and visualization. Organizers: G. Fay, M. Winton, R. Wildermuth, J. Cummings

Dates and topics

Nov 13, 2018: Intro to R

  • Quick teaser/preview of potential uses of R in fisheries science

Nov 20, 2018: R Data Basics

  • Reading in data, including readxl package
  • Data types, including date formatting
  • Simple graphs
  • Viewing tables
  • summarize(), str()
  • R for Data Science (http://r4ds.had.co.nz/index.html): Ch 4, Workflow: basics; Ch 6, Workflow: scripts
  • Products:
    • Histograms, boxplots
    • Saved script with notes
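A rough sketch of what the Data Basics session could walk through. The data, column names, and file are made up for illustration (not a workshop dataset), and the script assumes readr, dplyr, lubridate, and ggplot2 are installed.

```r
library(readr)
library(dplyr)
library(lubridate)
library(ggplot2)

# Hypothetical survey records, written to a temporary CSV so the sketch
# runs without any external file.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "station,date,species,length_cm",
  "A1,05/03/2018,cod,42.5",
  "A1,05/03/2018,haddock,31.0",
  "B2,06/11/2018,cod,55.2",
  "B2,06/11/2018,haddock,28.4",
  "C3,06/30/2018,cod,NA"
), csv_path)

# Read with explicit column types; dates come in as text first.
# For an Excel workbook, readxl::read_excel() plays the same role.
catch <- read_csv(
  csv_path,
  col_types = cols(
    station = col_character(),
    date = col_character(),
    species = col_character(),
    length_cm = col_double()
  )
)

# Convert month/day/year text into a proper Date column.
catch <- catch %>% mutate(date = mdy(date))

# Inspect the structure, then compute a simple summary.
str(catch)
length_summary <- catch %>%
  summarize(n_fish = n(), mean_length = mean(length_cm, na.rm = TRUE))

# Simple graphs; use print(p_hist) or print(p_box) to draw them.
p_hist <- ggplot(catch, aes(x = length_cm)) +
  geom_histogram(binwidth = 5)
p_box <- ggplot(catch, aes(x = species, y = length_cm)) +
  geom_boxplot()
```

Saving this as a commented script would cover the "saved script with notes" product.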

Nov 27, 2018: R Data Wrangling

  • Joins, subsetting
  • filter(), select(), mutate()
  • R for Data Science Ch 5: Data transformation
  • Products:
    • Group means summary
    • Sample size/counts info
    • "pivot table"-like object
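One possible way to get the three products listed above, sketched on invented tow data (the `tows` table and its columns are hypothetical). It uses `spread()` from tidyr, which matches the gather/spread functions mentioned elsewhere in these notes; newer tidyr versions also offer `pivot_wider()`.

```r
library(dplyr)
library(tidyr)

# Hypothetical tow-level catch data (made up for illustration).
tows <- tibble(
  area    = c("north", "north", "north", "south", "south", "south"),
  species = c("cod", "cod", "haddock", "cod", "haddock", "haddock"),
  cpue    = c(10, 14, 3, 6, 8, 12)
)

# Group means plus sample sizes in one summary table.
group_summary <- tows %>%
  group_by(area, species) %>%
  summarise(mean_cpue = mean(cpue), n_tows = n()) %>%
  ungroup()

# Counts on their own: number of tows per species.
species_counts <- tows %>% count(species)

# "Pivot table"-like object: one row per area, one column per species.
pivot <- group_summary %>%
  select(area, species, mean_cpue) %>%
  spread(key = species, value = mean_cpue)

print(group_summary)
print(pivot)
```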

Tutorial Title topics

  1. What is R? Ecological data analysis in R (product: map of locations with CPUE for 1 year, fitting a GLM [maturity ogive?])
  2. Reading data into R, creating a histogram/boxplot (product: table of summary stats, boxplot/histogram?)
  3. Data wrangling (joins, filter, select, purrr, etc.)
  4. Plotting (product: data visualization of own dataset?) [Jonathan]
  5. Mapping (product: map that uses multiple layers from shapefiles, polygons, etc.) [Megan]
  6. Linear modeling (product: plots of diagnostics, predictive evaluation, model selection, plotting predictions)
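As a possible starting point for the modeling items (the maturity ogive in tutorial 1 and the model selection and predictions in tutorial 6), here is a minimal base R sketch. The maturity data are invented for illustration, and a logistic GLM is only one way an ogive could be fit.

```r
# Hypothetical maturity data: fish length (cm) and maturity status (0/1).
maturity <- data.frame(
  length_cm = c(20, 25, 30, 35, 40, 45, 50, 55, 60, 65),
  mature    = c(0, 0, 0, 1, 0, 1, 0, 1, 1, 1)
)

# Logistic regression of maturity on length, plus an intercept-only
# model to compare against.
ogive      <- glm(mature ~ length_cm, family = binomial, data = maturity)
null_model <- glm(mature ~ 1, family = binomial, data = maturity)

# Simple model selection: lower AIC is preferred.
aic_table <- AIC(null_model, ogive)
print(aic_table)

# Length at 50% maturity: where the linear predictor equals zero.
l50 <- unname(-coef(ogive)[1] / coef(ogive)[2])

# Predicted probability of maturity over a grid of lengths, for plotting.
new_lengths <- data.frame(length_cm = seq(20, 65, by = 5))
new_lengths$p_mature <- predict(ogive, newdata = new_lengths, type = "response")
```

Calling `plot(ogive)` would give the standard diagnostic plots for the fitted model.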

Suggestions for topics / functions, etc.

Want to id some folks to teach individual sessions (i.e. not all done by GF & MW)

  1. What do we want people to obtain experience with?
  2. What are functions that are easy to teach to accomplish these?
  3. What are example exercises / common problems that will cover these functions.
  4. What are some example data sets that can be used to demonstrate these exercises/problems?
  • Intro to R & RStudio, and the tidyverse. Very little on this (just enough to enable people to work). Emphasis on using the tools: their value comes from seeing how they are useful, rather than from exposition on the tools.

(GF: Contemplate setting up a cloud-based RStudio Server so that we can alleviate installation problems...)

A useful data set would be one with spatial information, multiple (>1) measured variables, and some type of grouping structure.

  • Data wrangling (split over multiple tutorials) / exploratory data analysis
  1. reading data into R (read_csv, readxl, read from Google Drive)
  2. intro to data frames as objects to work with in R (tibbles)
    • characters/real/factors
    • dates, lubridate
  3. dplyr functions (common API structure- functions are verbs that signal action, 1st argument is a data frame, returns data frame (generally), other arguments reference columns by name)
  4. choose columns (select)
  5. choose rows (filter) [use of logicals, multiple conditions, is.na()]
  6. magrittr (the pipe!) i.e. f(x, y) is the same as x %>% f(y)
  7. introduce some simple vector-based arithmetic functions (e.g. sum()) and sorting
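The select/filter/pipe items above could be demonstrated with something like the following. The `catch` table is hypothetical, and the sketch assumes dplyr is installed (it re-exports the magrittr pipe).

```r
library(dplyr)

# Hypothetical catch records (made up for illustration).
catch <- tibble(
  station   = c("A1", "A1", "B2", "B2", "C3"),
  species   = c("cod", "haddock", "cod", "haddock", "cod"),
  length_cm = c(42.5, 31.0, 55.2, 28.4, NA),
  weight_kg = c(0.9, 0.4, 1.8, 0.3, 1.1)
)

# f(x, y) is the same as x %>% f(y): the nested and piped versions
# below produce the same table.
nested <- select(
  filter(catch, species == "cod", !is.na(length_cm)),
  station, length_cm
)
piped <- catch %>%
  filter(species == "cod", !is.na(length_cm)) %>%
  select(station, length_cm)
all.equal(nested, piped)

# Multiple conditions with a logical OR.
small_or_haddock <- catch %>%
  filter(species == "haddock" | weight_kg < 1)

# Simple vector arithmetic and sorting.
total_weight <- sum(catch$weight_kg)
by_length <- catch %>% arrange(desc(length_cm))
```

The piped version reads top to bottom in the order the steps happen, which is the main argument for introducing `%>%` early.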

--

  1. create/change variables (mutate) [can bring in additional arithmetic functions here, e.g. mean()]
  2. summarising many rows (summarise), one or more variables
  3. summarise across groups (group_by), (e.g. do group_by and summarise to obtain group means, sds, etc.)
  4. plotting with multiple groups (e.g. fill with geom_histogram)
  5. count(), n()
  6. joining data tables together (left_join(); this is most common (at least for me), perhaps mention other types of join...)
  7. other functions, e.g. drop_na()
  8. reshaping data (wide/long format), benefits of doing analysis in long format, recording often easier in wide format (gather, spread)
  9. iteration - for loops can be slow and verbose; purrr map functions as the alternative
  10. list columns - can start to bring in statistical models now (e.g. t-tests or lm on groups), use of map functions to extract diagnostics from statistical model object (e.g. pull p-value into a separate column)
  11. unnest

Think we could do all the above with a dataset like a trawl survey, with different species, or different management areas as groups. (we'd subset the data, but leave some of the mess to demonstrate some of the above)
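A minimal sketch of the later steps in that list (join, list columns, map, unnest), using invented length-weight data rather than a real trawl survey. The tables, column names, and stock areas are hypothetical, and `unnest(cols = ...)` assumes tidyr 1.0 or later.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical length-weight records for two species.
fish <- tibble(
  species   = rep(c("cod", "haddock"), each = 5),
  length_cm = c(30, 40, 50, 60, 70, 20, 25, 30, 35, 40),
  weight_kg = c(0.3, 0.7, 1.2, 2.3, 3.4, 0.1, 0.15, 0.3, 0.4, 0.7)
)

# Hypothetical lookup table to join on.
species_info <- tibble(
  species    = c("cod", "haddock"),
  stock_area = c("area 1", "area 2")
)

# left_join keeps every row of fish and adds the matching stock_area.
fish_joined <- fish %>% left_join(species_info, by = "species")

# One row per species, with that species' records in a list column.
nested <- fish_joined %>%
  group_by(species) %>%
  nest() %>%
  ungroup()

# Fit a linear model per group with map(), then pull the slope and its
# p-value out of each model object into ordinary columns.
models <- nested %>%
  mutate(
    model   = map(data, ~ lm(weight_kg ~ length_cm, data = .x)),
    slope   = map_dbl(model, ~ coef(.x)[["length_cm"]]),
    p_value = map_dbl(model, ~ summary(.x)$coefficients["length_cm", "Pr(>|t|)"])
  )

# unnest returns to one row per fish, carrying the model results along.
unnested <- models %>%
  select(species, data, slope, p_value) %>%
  unnest(cols = data)
```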

  • Plotting (main types of plots): scatterplots, histograms, boxplots, barplots, pairplots, maps

  1. ggplot2 package
  2. ggplot approach, call to ggplot, then add layers
  3. visual elements via geoms
  4. appearance via aesthetics
  5. geom_histogram & geom_point during data wrangling tut
  6. multipanel plots via faceting
  7. some additional stuff from chap 28 of R4DS
  8. saving figures to file - hi-res figures (think initially the way to bring this up is just as machinery to get to visuals, a later tutorial might talk about the ggplot mechanics)
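The layered approach in the list above might be shown with a short example like this one. The `surveys` data are invented, and the output path is just a temporary file chosen for the sketch.

```r
library(ggplot2)

# Hypothetical survey data (made up for illustration).
surveys <- data.frame(
  year    = rep(c(2016, 2017, 2018), each = 4),
  species = rep(c("cod", "haddock"), times = 6),
  depth_m = c(40, 55, 80, 95, 35, 60, 75, 110, 45, 50, 85, 100),
  cpue    = c(12, 8, 20, 5, 15, 6, 18, 4, 10, 9, 22, 7)
)

# Call ggplot() with data and aesthetics, then add layers:
# a geom for the visual element, facets for a multipanel layout,
# and labels for appearance.
p <- ggplot(surveys, aes(x = depth_m, y = cpue, colour = species)) +
  geom_point() +
  facet_wrap(~ year) +
  labs(x = "Depth (m)", y = "CPUE", colour = "Species")

# Save a high-resolution figure; width and height are in inches.
out_file <- file.path(tempdir(), "cpue_by_depth.png")
ggsave(out_file, plot = p, width = 6, height = 4, dpi = 300)
```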
  • Mapping
    • making maps in ggplot
    • creating easy maps to show data layers, probably including info from Google Maps etc.; a ggmap exercise is probably the easiest entry point
    • reading shapefiles into R (prefer sf approach for compatibility with other topics, but rgdal/sp functions useful too)
    • rgeos
    • rasters, etc.
    • spatial objects (multi-dimensional arrays)
    • making polygons
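A small sketch of the sf route for the mapping session, assuming the sf and ggplot2 packages are installed. The station positions and the rectangular "management area" are invented; a real exercise would more likely read a shapefile with `st_read()`.

```r
library(sf)
library(ggplot2)

# Hypothetical station positions (longitude/latitude, WGS84 = EPSG:4326).
stations <- data.frame(
  station = c("A1", "B2", "C3"),
  lon     = c(-70.5, -69.8, -70.1),
  lat     = c(41.6, 42.1, 42.6),
  cpue    = c(12, 30, 7)
)
stations_sf <- st_as_sf(stations, coords = c("lon", "lat"), crs = 4326)

# Making a polygon: a made-up rectangular area, given as a closed ring
# of coordinates (first and last points are the same).
ring <- rbind(c(-71, 41), c(-69, 41), c(-69, 43), c(-71, 43), c(-71, 41))
area_sf <- st_sf(
  area     = "example area",
  geometry = st_sfc(st_polygon(list(ring)), crs = 4326)
)

# Multiple layers on one map: polygon underneath, points sized by CPUE.
map_plot <- ggplot() +
  geom_sf(data = area_sf, fill = "lightblue", colour = "grey40") +
  geom_sf(data = stations_sf, aes(size = cpue)) +
  labs(size = "CPUE")
```

Because both layers are sf objects with the same coordinate reference system, `geom_sf()` can stack them without any manual projection step.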

The above we might be able to do in ~4 tutorials. (thinking there will likely need to be some recaps during sessions)

Other topics: (some simple, some more complex...)

  • writing functions
  • R Markdown (could bring this in during a tutorial but not have it be the main focus - in fact we could just view R Markdown as a default environment for working in R)
  • generalized linear modeling
  • interactive apps via R Shiny (perhaps the goal would be to turn a previous tut into a web app)
  • image analysis?
  • text analysis
  • time series modeling
  • Rcpp