The ability to make data-driven decisions is redefining the future of patient care. This two-part series introduces the emerging field of health data science using the R programming language, covering data analysis and visualization with a particular focus on generating insight in healthcare. No prior knowledge of data science or computer programming is assumed; laptops are required. Attendees will be provided with example healthcare datasets and introduced to the R packages and code used to examine them. Particular attention will be paid to code interpretation and data provenance, which attendees will practice by generating reproducible output files. Although specific datasets will be used for analysis in class, the workshop provides broadly applicable tools for reproducibly analyzing and visualizing data across the healthcare continuum. Participants are encouraged to register for both sessions (April 8 and 9, 2020).
https://www.kaggle.com/johnsmith88/heart-disease-dataset/version/2
This data set dates from 1988 and consists of four databases: Cleveland,
Hungary, Switzerland, and Long Beach V. It contains 76 attributes,
including the predicted attribute, but all published experiments refer
to using a subset of 14 attributes:
1. age: age in years
2. sex: (1 = male; 0 = female)
3. cp: chest pain type, (4 values)
4. trestbps: resting blood pressure (in mm Hg on admission to the
hospital)
5. chol: serum cholesterol in mg/dl
6. fbs: fasting blood sugar > 120 mg/dl (1 = true; 0 = false)
7. restecg: resting electrocardiographic results (values 0,1,2)
8. thalach: maximum heart rate achieved
9. exang: exercise induced angina, (1 = yes; 0 = no)
10. oldpeak: ST depression induced by exercise relative to rest
11. slope: the slope of the peak exercise ST segment
12. ca: number of major vessels (0-3) colored by fluoroscopy
13. thal: 0 = normal; 1 = fixed defect; 2 = reversible defect
14. target: the presence of heart disease in the patient (0 = no
disease; 1 = disease)
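
The 0/1 codes in the list above are easier to read once they are turned into labelled factors. The following is a minimal base R sketch of that step, not the workshop's own code: the helper `label_heart()` is hypothetical, and the three rows are invented purely to show the column layout (they are not taken from the Kaggle file).

```r
# Illustrative rows only: these values are made up to mirror the column
# names described above and do NOT come from the real dataset.
heart <- data.frame(
  age    = c(52, 61, 45),
  sex    = c(1, 0, 1),
  fbs    = c(0, 1, 0),
  exang  = c(0, 1, 0),
  target = c(0, 1, 1)
)

# Hypothetical helper: recode the 0/1 columns as labelled factors,
# following the coding given in the attribute list.
label_heart <- function(df) {
  df$sex    <- factor(df$sex,    levels = c(0, 1), labels = c("female", "male"))
  df$fbs    <- factor(df$fbs,    levels = c(0, 1),
                      labels = c("<= 120 mg/dl", "> 120 mg/dl"))
  df$exang  <- factor(df$exang,  levels = c(0, 1), labels = c("no", "yes"))
  df$target <- factor(df$target, levels = c(0, 1),
                      labels = c("no disease", "disease"))
  df
}

heart_labelled <- label_heart(heart)

# Inspect the recoded columns and cross-tabulate sex against the outcome.
str(heart_labelled)
table(heart_labelled$sex, heart_labelled$target)
```

With the downloaded data, the same helper could be applied to the result of `read.csv()`; the local file name depends on how the Kaggle download was saved.
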
Class | Topic | Code | Recording |
---|---|---|---|
Session 1 | Introduction to Data Science, R Fundamentals, Data Manipulation | LINK | Video |
Session 2 | Data Visualization, Importing and Joining, Making Reproducible Reports | LINK | Video |