OZ FETP: Introduction to data management #92

Open
zkamvar opened this issue Nov 14, 2019 · 1 comment

zkamvar commented Nov 14, 2019

https://github.com/ArminderD/Data_management

This exercise was developed for the Australian Field Epidemiology Training Program, the Masters of Philosophy (Applied Epidemiology) offered through the Australian National University. It introduces the RStudio environment, including basic commands and the generation of basic graphs. It also serves as an introduction to data management in epidemiology, covering data checking, data cleaning, and manipulation of variables.

zkamvar commented Nov 14, 2019

Preview document: https://htmlpreview.github.io/?https://github.com/ArminderD/Data_management/blob/master/Data_managment.html

Because there are questions and potential solutions, this will be classified as a practical instead of a Case Study.

Some issues to address (some of these need to be cleared with the authors of the material beforehand):

  • Remove attach(). There is never a good reason to use this function.
  • Convert image tables to text
  • Find out why there is an extra column "gram" in the data set
  • Standardize code formatting
  • Standardize markdown formatting (e.g. Graphics is a level 4 header, but should be something like a level 2)
  • Standardize language used (e.g. some places mention a graphics menu, but this doesn't exist in RStudio)
  • Standardize data subsetting (e.g. sometimes the user is instructed to use x[x$a > 1 & x$b < 40, ] and sometimes they are instructed to use subset())
  • Change ifelse cascade to dplyr::case_when()
  • Update factor manipulation (they don't need to be converted to numbers first)
  • Update dates section to have users not convert dates to character
  • Consider using tidyverse functions in place of merge and reshape (maybe?)
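
A rough sketch of what a few of these changes might look like in practice. The `linelist` data frame, its columns, and the age cut-offs are all made up for illustration and do not come from the exercise; it also assumes the dplyr package is installed.

```r
# Hypothetical line list; none of these columns come from the original exercise.
linelist <- data.frame(
  id    = 1:6,
  age   = c(3, 17, 25, 41, 68, 80),
  sex   = factor(c("m", "f", "f", "m", "f", "m")),
  onset = as.Date(c("2019-01-03", "2019-01-05", "2019-01-05",
                    "2019-01-09", "2019-01-12", "2019-01-20"))
)

# Instead of attach(): refer to columns explicitly through the data frame.
mean_age <- mean(linelist$age)

# Pick one subsetting style and stick to it. These two select the same rows.
adults_a <- linelist[linelist$age > 18 & linelist$age < 70, ]
adults_b <- subset(linelist, age > 18 & age < 70)

# A nested ifelse() cascade, as it might appear in the current material ...
age_group_nested <- ifelse(linelist$age < 5, "0-4",
                    ifelse(linelist$age < 18, "5-17",
                    ifelse(linelist$age < 65, "18-64", "65+")))

# ... and the same logic with dplyr::case_when(), evaluated top to bottom.
linelist$age_group <- dplyr::case_when(
  linelist$age < 5  ~ "0-4",
  linelist$age < 18 ~ "5-17",
  linelist$age < 65 ~ "18-64",
  TRUE              ~ "65+"
)

# Relabel a factor directly; no detour through numeric codes.
linelist$sex_label <- factor(linelist$sex,
                             levels = c("f", "m"),
                             labels = c("Female", "Male"))

# Keep dates as Date objects and do arithmetic on them directly.
linelist$days_since_first <- as.numeric(linelist$onset - min(linelist$onset))
```

Whichever subsetting style is chosen, the point is only that the practical uses it consistently; `format()` can still be used at the very end when a date needs to be displayed as text.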

zkamvar added a commit that referenced this issue Nov 27, 2019
This is nearly the raw RMarkdown document I received. I have attempted
to add the credits at the bottom where they belong

This will address #92
zkamvar added a commit that referenced this issue Nov 28, 2019
this addressed #92