This project is part of my Data Science Nanodegree at Udacity. In this project, I try to build an ML model that can predict when users at Sparkify are about to churn.
libraries : I used many, but the important ones are 1- numpy 2- matlab
problems i got : the large amount of data was one of the biggest problems.
Results : By the end of this project, two main iterations of a churn-prediction model were implemented and evaluated. The first model used a simple pivot of the event that seemed to show the most relevant difference between churning and non-churning users.
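As a rough illustration of what "a simple pivot of the event" can look like, here is a minimal sketch in plain Python. It turns a log of (user, event) pairs into one row of per-event counts for each user. The event names, the user ids, and the helper `pivot_event_counts` are hypothetical placeholders and are not taken from the project code or the Sparkify data.

```python
from collections import Counter, defaultdict

# Hypothetical event log: (user_id, event_name) pairs.
# These names are placeholders, not real Sparkify events.
events = [
    ("u1", "play_song"),
    ("u1", "play_song"),
    ("u1", "thumbs_down"),
    ("u2", "play_song"),
    ("u2", "downgrade"),
    ("u2", "downgrade"),
]


def pivot_event_counts(events):
    """Return (rows, columns): one list of per-event counts for each user."""
    counts = defaultdict(Counter)
    for user_id, event_name in events:
        counts[user_id][event_name] += 1
    # Sorted column order keeps the feature layout stable across users.
    columns = sorted({name for _, name in events})
    rows = {user: [c[name] for name in columns] for user, c in counts.items()}
    return rows, columns


rows, columns = pivot_event_counts(events)
```

Each row can then be used as a feature vector for a classifier, with the churn flag as the label.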
random forest classifier model :
Random Forest -> PR AUC: 1.0 Random Forest | precision = 1.0 | recall = 1.0 | F1-Score = 1.0
gradient boosted trees (a boosting ensemble, similar in spirit to AdaBoost):
Gradient Boosted Trees -> PR AUC: 1.0 Gradient Boosted Trees | precision = 1.0 | recall = 1.0 | F1-Score = 1.0
SVM:
Support Vector Machine -> PR AUC: 1.0 Support Vector Machine | precision = 1.0 | recall = 1.0 | F1-Score = 1.0
logistic regression model:
Logistic Regression -> PR AUC: 1.0 Logistic Regression | precision = 0.4 | recall = 0.4 | F1-Score = 0.4
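For reference, the metrics reported above can be computed as in the sketch below. It is a plain-Python illustration under assumptions, not the evaluation code used in the project: the labels and scores are made up, the 0.5 threshold is an assumption, and PR AUC is estimated here as average precision. Precision, recall and F1 depend on the chosen threshold, while PR AUC depends only on how the scores rank the users, so the two kinds of numbers can differ for the same model.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for binary labels (1 = churn, 0 = no churn)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def average_precision(y_true, scores):
    """Estimate the area under the precision-recall curve."""
    positives = sum(y_true)
    if positives == 0:
        return 0.0
    # Rank users from highest to lowest predicted churn score.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / positives


# Made-up labels and scores, for illustration only.
y_true = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.8, 0.6, 0.1, 0.4]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
pr_auc = average_precision(y_true, scores)
```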
files : 1- readme 2- sparkify