DS-SF-30 | Unit Project 2: Exploratory Data Analysis

Submission:

Please push your assignment to your fork (your GitHub repository of the course) and submit a link to it via the form shared in Slack.

PROMPT

In this project, you will implement the exploratory data analysis plan developed in Project 1. This will lay the groundwork for our modeling exercise in Project 3.

Before completing an analysis, it is critical to understand your data. You will need to identify all the biases of the variables in your model in order to accurately assess the strengths and limitations of your analysis and predictions.

Following these steps will help you better understand your dataset.

Objective: A Jupyter notebook writeup that provides a dataset overview with visualizations and statistical analysis.

DELIVERABLES

Jupyter Notebook Exploratory Data Analysis

Requirements:
- Read in your dataset, determine how many samples are present, and identify any missing data.
- Create a table of descriptive statistics for each of the variables (count, mean, standard deviation, ...).
- Describe the distributions of your data.
- Plot boxplots for each variable.
- Create a covariance matrix.
- Determine any issues or limitations based on your exploratory analysis.
- Outline exploratory analysis methods.
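
The first few requirements above can be sketched in pandas as follows. This is a minimal illustration, not the starter code: the DataFrame and its column names (`height`, `weight`) are hypothetical stand-ins, so swap in your own dataset when you adapt it.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the project dataset (column names are made up).
# In your notebook you would load your own file instead, e.g. with pd.read_csv.
df = pd.DataFrame({
    "height": [1.0, 2.0, 3.0, 4.0, np.nan],
    "weight": [2.0, 4.0, 6.0, 8.0, 10.0],
})

n_samples = len(df)            # number of rows (samples)
missing = df.isnull().sum()    # count of missing values in each column
summary = df.describe()        # count, mean, std, min, quartiles, max per column
cov_matrix = df.cov()          # covariance matrix; missing values are excluded pairwise

print("Samples:", n_samples)
print(missing)
print(summary)
print(cov_matrix)
```

Note that `describe()` reports a `count` of non-missing values, so comparing it with the number of rows is a quick way to spot columns with gaps.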

RESOURCES

Dataset

The dataset is available here.

Starter code

For this project we will be using a Jupyter notebook. This notebook will use matplotlib for plotting and visualizing our data. This type of visualization is handy for prototyping and quick data analysis. We will discuss more advanced data visualizations for disseminating your work.

Open the starter code notebook in Anaconda.
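
As a rough idea of what quick matplotlib prototyping can look like, here is a sketch that draws a boxplot and a histogram for each variable. It is not taken from the starter notebook: the data and the column names (`score_a`, `score_b`) are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this also runs outside a notebook; omit in Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data (made-up column names); replace with your own DataFrame.
df = pd.DataFrame({
    "score_a": [1, 2, 2, 3, 3, 3, 4, 4, 5, 12],
    "score_b": [10, 11, 12, 12, 13, 13, 14, 15, 15, 16],
})

# One column of plots per variable: boxplot on top, histogram below.
fig, axes = plt.subplots(nrows=2, ncols=len(df.columns), figsize=(8, 6))
for i, column in enumerate(df.columns):
    axes[0, i].boxplot(df[column])
    axes[0, i].set_title("Boxplot of " + column)
    axes[1, i].hist(df[column], bins=5)
    axes[1, i].set_title("Histogram of " + column)
fig.tight_layout()
```

In a Jupyter notebook the figure is displayed below the cell once inline plotting is enabled. The boxplot makes outliers easy to spot, while the histogram shows the overall shape of the distribution.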

Suggestions for Getting Started

- Read in your dataset.
- Try out a few pandas commands for describing your data:
  - df.describe()
  - df['columnName'].sum()
  - df['columnName'].mean()
  - df['columnName'].count()
  - df.corr()
- Read the documentation for pandas. Most of the time, there is a tutorial that you can follow; learning to read documentation is crucial to your success as a data scientist.
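
To see what the column-level commands listed above return, you could try them on a tiny DataFrame first. The sketch below uses invented values, and `otherColumn` is a hypothetical second column added only so that `df.corr()` has something to compare.

```python
import numpy as np
import pandas as pd

# Tiny made-up DataFrame; 'columnName' mirrors the placeholder used in the list above.
df = pd.DataFrame({
    "columnName": [1.0, 2.0, 3.0, np.nan],
    "otherColumn": [2.0, 4.0, 6.0, 8.0],
})

total = df["columnName"].sum()          # sum of the values, skipping NaN
average = df["columnName"].mean()       # mean of the non-missing values
non_missing = df["columnName"].count()  # number of non-missing values
corr = df.corr()                        # pairwise Pearson correlation matrix

print(total, average, non_missing)
print(corr)
```

By default these pandas methods skip missing values, so `count()` can be smaller than the number of rows.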

Past Projects

Look at some sample notebooks for an example of the types of visualizations you can use in your notebook.

Example Notebook

Additional Links

EVALUATION

The rubric is available here.
