Global Datathon 2018 (2018-09-28)
Working on the case for Telelink - A study to predict air pollution for the next 24-hour period!
More info here
Our article page on the site of Data Science Society: Link
We reached 2nd place in the competition finals! Great job, everybody!
The project is structured following the Cross-Industry Standard Process for Data Mining (CRISP-DM). We include the following folders:
0. Data - Contains the raw data necessary to complete the analysis, generously provided by Telelink.
1. Business Understanding - Documentation about the business case and goal of the project.
2. Data Understanding - Exploratory scripts and output graphics.
3. Data Preparation - Scripts that transform the data after a thorough exploration phase. The output from the scripts can be saved in 0. Data.
4. Modelling - Scripts that conduct the actual modelling operations.
5. Evaluation - Model accuracy testing scripts.
6. Deployment - After optimal models have been selected, we can create new scripts that automate the entire process so far and export the results in a user-friendly system such as R Shiny, interactive Excel tables or similar.
7. Documentation - We can draft our final article here before uploading it on the Datathon website.
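The folder layout above could be created in one step with a small script. This is a minimal sketch in Python (the project's own scripts are written in R); the function name `create_project_skeleton` and the constant `CRISP_DM_FOLDERS` are hypothetical and not part of the repository.

```python
from pathlib import Path

# Folder names copied from the list above.
CRISP_DM_FOLDERS = [
    "0. Data",
    "1. Business Understanding",
    "2. Data Understanding",
    "3. Data Preparation",
    "4. Modelling",
    "5. Evaluation",
    "6. Deployment",
    "7. Documentation",
]


def create_project_skeleton(root):
    """Create each CRISP-DM folder under `root` and return the folder paths.

    Hypothetical helper: existing folders are left untouched.
    """
    root = Path(root)
    created = []
    for name in CRISP_DM_FOLDERS:
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Calling `create_project_skeleton(".")` from the repository root would create any missing folders without modifying those that already exist.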
It is strongly advised to add a sequential prefix to each filename. That way, we know the order in which each file or folder in the project should be executed in order to arrive at the final result.
Example:
- Data Preparation
  - DP_010 First steps.R
  - DP_011 First steps graph 01.png
  - DP_012 First steps graph 02.png
  - DP_020 Second approach.R
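To illustrate why the prefixes help, the execution order can be recovered from the filenames alone. The sketch below is in Python rather than R and assumes the prefix has the form seen in the example (an uppercase phase code, an underscore, a number, then a space); the names `PREFIX_PATTERN`, `execution_order` and `scripts_only` are hypothetical.

```python
import re

# Assumed prefix shape, inferred from the example: one or more uppercase
# letters, an underscore, digits, then whitespace (e.g. "DP_010 First steps.R").
PREFIX_PATTERN = re.compile(r"^([A-Z]+)_(\d+)\s+(.*)$")


def execution_order(filenames):
    """Return filenames sorted by phase code, then by numeric step.

    Raises ValueError for any name without a sequential prefix.
    """
    def sort_key(name):
        match = PREFIX_PATTERN.match(name)
        if match is None:
            raise ValueError(f"missing sequential prefix: {name!r}")
        phase, step, _rest = match.groups()
        return (phase, int(step))

    return sorted(filenames, key=sort_key)


def scripts_only(filenames, extension=".R"):
    """Return only the script files, in execution order."""
    return [name for name in execution_order(filenames) if name.endswith(extension)]
```

Comparing the step as an integer means the ordering stays correct even if some prefixes use a different number of digits.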