tidybiology/tidybiology-plusds

tidybiology

Introduction to Data Science in Health Care (Sessions 1 and 2)

The ability to make data-driven decisions is redefining the future of patient care. This two-part series introduces the emerging field of health data science using the R programming language, covering data analysis and visualization with a particular focus on generating insight in healthcare. No prior knowledge of data science or computer programming is assumed; a laptop is required. Attendees will receive example healthcare datasets and be introduced to the R packages and code used to examine them. Particular attention will be paid to code interpretation and data provenance, with attendees learning to generate reproducible data output files. Although specific datasets will be analyzed in class, the workshop provides broadly applicable tools for reproducibly analyzing and visualizing data across the healthcare continuum. Participants are encouraged to register for both sessions (April 8 and 9, 2020).

Heart Disease Public Health Dataset

https://www.kaggle.com/johnsmith88/heart-disease-dataset/version/2

This data set dates from 1988 and consists of four databases: Cleveland, Hungary, Switzerland, and Long Beach V. It contains 76 attributes, including the predicted attribute, but all published experiments refer to using a subset of 14 attributes:
1. age: age in years
2. sex: (1 = male; 0 = female)
3. cp: chest pain type (4 values)
4. trestbps: resting blood pressure (in mm Hg on admission to the hospital)
5. chol: serum cholesterol in mg/dl
6. fbs: fasting blood sugar > 120 mg/dl (1 = true; 0 = false)
7. restecg: resting electrocardiographic results (values 0,1,2)
8. thalach: maximum heart rate achieved
9. exang: exercise-induced angina (1 = yes; 0 = no)
10. oldpeak: ST depression induced by exercise relative to rest
11. slope: the slope of the peak exercise ST segment
12. ca: number of major vessels (0-3) colored by fluoroscopy
13. thal: 0 = normal; 1 = fixed defect; 2 = reversible defect
14. target: the presence of heart disease in the patient (0 = no disease; 1 = disease)
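
As a minimal sketch of how the coded columns above might be made readable in R, the example below converts `sex`, `fbs`, `exang`, and `target` into labelled factors using the codes from the list. The three rows in `toy_heart` are invented for illustration and are not records from the dataset, and the names `toy_heart` and `label_heart` are hypothetical. A real analysis would read the downloaded Kaggle CSV instead.

```r
# Toy rows invented for illustration only; NOT real records from the dataset.
toy_heart <- data.frame(
  age    = c(52, 61, 45),
  sex    = c(1, 0, 1),
  fbs    = c(0, 1, 0),
  exang  = c(0, 1, 0),
  target = c(0, 1, 1)
)

# Hypothetical helper: turn the 0/1 codes documented above into labelled factors.
label_heart <- function(df) {
  df$sex    <- factor(df$sex,    levels = c(0, 1), labels = c("female", "male"))
  df$fbs    <- factor(df$fbs,    levels = c(0, 1), labels = c("<= 120 mg/dl", "> 120 mg/dl"))
  df$exang  <- factor(df$exang,  levels = c(0, 1), labels = c("no", "yes"))
  df$target <- factor(df$target, levels = c(0, 1), labels = c("no disease", "disease"))
  df
}

labelled <- label_heart(toy_heart)

# Cross-tabulate sex against disease status in the toy data.
print(table(labelled$sex, labelled$target))
```

Storing the codes as factors keeps the original 0/1 ordering while letting tables and plots show readable labels. This sketch uses base R only, so it runs without installing any packages.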

Links

| Class | Topic | Code | Recording |
|---|---|---|---|
| Session 1 | Introduction to Data Science, R Fundamentals, Data Manipulation | LINK | Video |
| Session 2 | Data Visualization, Importing and Joining, Making Reproducible Reports | LINK | Video |

About

tidybiology lecture for +DS course
