DS-SF-30 | Final Project 2: Experimental Write-Up and Exploratory Data Analysis

Submission:

  • Please push your assignment to your fork (your GitHub repository of the course) and submit a link to it via the form shared in Slack.

PROMPT

In this project, you will complete the problem statement and research design outline for your final project. This will serve as the starting point for your analysis. Make sure to include a specific aim and hypothesis, well-defined risks and assumptions, and clearly articulated goals and success metrics. You will create a Jupyter notebook that explores your dataset mathematically.

Objectives:

  • Create an outline of your research design approach, including hypothesis, assumptions, goals, and success metrics.
  • Create an exploratory data analysis notebook with statistical analysis and visualization.

DELIVERABLES

Project Design Writeup

  • Well-articulated problem statement with "specific aim" and hypothesis, based on your lightning talk.
  • An outline of any potential methods and models.
  • Detailed explanation of the available data (e.g., build a data dictionary or link to pre-built data dictionaries).
  • Describe any outstanding questions, assumptions, risks, and caveats.
  • Demonstrate domain knowledge, including specific features or relevant benchmarks from similar projects.
  • Define your goals and criteria, in order to explain what success looks like.
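
As one way to start on the data dictionary mentioned above, the sketch below infers a type, a missing-value count, and an example value for each column of a CSV. It is only an illustration: the sample data, the column names, and the helper `build_data_dictionary` are hypothetical, and a real data dictionary should also carry a human-written description of each field.

```python
import csv
import io


def _infer_type(value):
    """Return 'int', 'float', or 'str' for a non-empty string value."""
    for caster, label in ((int, "int"), (float, "float")):
        try:
            caster(value)
            return label
        except ValueError:
            pass
    return "str"


def _widen(current, new):
    """Keep the most general type seen so far (int < float < str)."""
    order = ["int", "float", "str"]
    if current is None:
        return new
    return max(current, new, key=order.index)


def build_data_dictionary(rows):
    """Summarize each column: inferred type, missing count, one example value."""
    dictionary = {}
    for row in rows:
        for name, raw in row.items():
            entry = dictionary.setdefault(
                name, {"type": None, "missing": 0, "example": None}
            )
            value = raw.strip() if raw is not None else ""
            if value == "":
                entry["missing"] += 1
                continue
            if entry["example"] is None:
                entry["example"] = value
            entry["type"] = _widen(entry["type"], _infer_type(value))
    return dictionary


# Hypothetical sample data, inlined so the example is self-contained.
SAMPLE_CSV = """temperature,humidity,rained
61.5,80,yes
75,,no
58.2,91,yes
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
data_dictionary = build_data_dictionary(rows)
for name, info in data_dictionary.items():
    print(f"{name:<12} {str(info['type']):<6} "
          f"missing={info['missing']} example={info['example']}")
```

With your own data, you would replace `io.StringIO(SAMPLE_CSV)` with an open file handle; the inferred types are a starting point to review by hand, not a substitute for reading the source documentation.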

Exploratory Analysis Writeup

  • A well-organized Jupyter notebook with code that has been fully run from top to bottom.
  • At least one visual for each independent variable and, if possible, its relationship to your dependent variable.
    • It's just as important to show what's not correlated as it is to show any actual correlations found.
    • Visuals should be well labeled and intuitive based on the data types.
      • For example, if your x variable is temperature and y is "did it rain," a reasonable visual would be two histograms of temperature, one where it rained, and one where it didn't.
    • Tables are a perfectly valid visualization tool! Interweave them into your work.
  • Provide insight about the dataset and its impact on your hypothesis.
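
The temperature and "did it rain" example above can also be shown as a table: bin the temperatures separately for rainy and dry observations and print the counts side by side. This is a minimal sketch using only the standard library; the observations, the bin width, and the helper names are hypothetical.

```python
from collections import Counter


def binned_counts(values, bin_width):
    """Count values per bin; each bin is labeled by its lower edge."""
    return Counter(int(v // bin_width) * bin_width for v in values)


def split_histogram_table(records, feature, outcome, bin_width):
    """Return rows of (bin_start, count_when_true, count_when_false)."""
    when_true = binned_counts(
        [r[feature] for r in records if r[outcome]], bin_width
    )
    when_false = binned_counts(
        [r[feature] for r in records if not r[outcome]], bin_width
    )
    bins = sorted(set(when_true) | set(when_false))
    # Counter returns 0 for bins that one group never hit.
    return [(b, when_true[b], when_false[b]) for b in bins]


# Hypothetical observations, for illustration only.
observations = [
    {"temperature": 52, "rained": True},
    {"temperature": 57, "rained": True},
    {"temperature": 61, "rained": True},
    {"temperature": 64, "rained": False},
    {"temperature": 73, "rained": False},
    {"temperature": 78, "rained": False},
    {"temperature": 55, "rained": False},
]

BIN_WIDTH = 10
table = split_histogram_table(observations, "temperature", "rained", BIN_WIDTH)

print(f"{'temp bin':<10}{'rained':>8}{'dry':>8}")
for start, rained, dry in table:
    label = f"{start}-{start + BIN_WIDTH}"
    print(f"{label:<10}{rained:>8}{dry:>8}")
```

In a notebook, the same two groups of temperatures could instead be passed to a plotting library's histogram function (for example `matplotlib.pyplot.hist`) to draw the two histograms directly.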

RESOURCES

Suggestions for Getting Started

  • The more time you spend researching, the less time you'll likely spend writing; this is a positive sign!
  • While researching, keep track of all of your resources. Make sure they're trustworthy.
  • If you've seen similar work online, see if you can find the code that implemented the data munging. It might come in handy.
  • If your project requires using an API, make sure you can get access to it. Not everyone gives away API keys immediately, and you don't want to be caught with no data with one week left to work!
  • Consider building some helper functions that help you quickly visualize and interpret data.
    • Exploratory data analysis should be formulaic; the code should not be holding you back. There are plenty of "starter code" examples from class materials.
  • DRY: Don't Repeat Yourself! If you find yourself copying and pasting code a lot, turn it into a function and use the function instead!
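
To make the helper-function and DRY suggestions concrete, here is a small sketch: one `summarize` function and one `pearson` function, applied to every column in a loop instead of a copy-pasted block per column. The column names, values, and target are hypothetical, and the helpers assume numeric columns where `None` marks a missing value.

```python
import math
import statistics


def summarize(values):
    """Return count, missing, mean, median, and sample stdev for one column."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "stdev": statistics.stdev(present) if len(present) > 1 else 0.0,
    }


def pearson(xs, ys):
    """Pearson correlation over pairs where both values are present.

    Assumes neither column is constant (otherwise the denominator is zero).
    """
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    mean_x = statistics.mean([x for x, _ in pairs])
    mean_y = statistics.mean([y for _, y in pairs])
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical independent variables and dependent variable.
columns = {
    "temperature": [50, 60, 70, 80, None],
    "humidity": [90, 80, 70, 60, 55],
}
target = [4, 3, 2, 1, 1]

# One loop replaces a copy-pasted block per column.
report = {
    name: {**summarize(values), "corr_with_target": pearson(values, target)}
    for name, values in columns.items()
}
for name, stats in report.items():
    print(name, stats)
```

Adding a new column to `columns` is then enough to get its summary and its correlation with the target, including the columns that turn out not to be correlated.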

Specific Tips

  • Give a sense of the project's depth and scale, which can guide where most of your time on the project should be spent.
  • Show a clear connection between the datasets and the problem presented. The project should avoid independent variables (or features) that would not ordinarily be available at the time you need to predict your target.