This project is for an African telecommunications company that provides customers with airtime and mobile data bundles. The objective of this project is to develop a machine learning model to predict the likelihood of each customer “churning,” i.e. becoming inactive and not making any transactions for 90 days.
This solution will help the company serve its customers better by identifying which of them are at risk of leaving.
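To make the churn definition concrete, here is a minimal sketch of the 90-day inactivity rule. The dataset already ships with a CHURN label and no transaction dates, so the function name `has_churned`, its parameters, and the treatment of exactly 90 days as churned are illustrative assumptions rather than part of the project code.

```python
from datetime import date, timedelta

# Inactivity window taken from the project description (90 days).
INACTIVITY_WINDOW = timedelta(days=90)


def has_churned(last_transaction: date, as_of: date) -> bool:
    """Return True when at least 90 days have passed since the last transaction.

    Assumption: a customer whose last transaction was exactly 90 days
    ago already counts as churned.
    """
    return as_of - last_transaction >= INACTIVITY_WINDOW


# Hypothetical dates, for illustration only.
print(has_churned(date(2023, 1, 1), date(2023, 4, 1)))  # True (90 days apart)
print(has_churned(date(2023, 3, 1), date(2023, 4, 1)))  # False (31 days apart)
```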
Code | Name | Deployed App |
---|---|---|
CP | EXPRESSO CUSTOMER CHURN PREDICTION | STREAMLIT APP |
The complete dataset can be found and downloaded from Zindi: Zindi Churn Challenge
Train data: Contains information about 1 million customers. The CHURN column indicates whether a client churned or not. This is the target variable, and the task is to estimate the likelihood that a client churns. We will use this file to train our model.
Test data: This is similar to the train data but without the CHURN column. We will use this file to test our model.
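A quick first check on the training file is the share of churned customers. The sketch below uses only the standard library and a tiny in-memory stand-in for the file; the rows are made up, and the helper name `churn_rate` is an assumption for illustration.

```python
import csv
import io


def churn_rate(csv_file) -> float:
    """Return the share of rows whose CHURN column equals 1."""
    reader = csv.DictReader(csv_file)
    total = 0
    churned = 0
    for row in reader:
        total += 1
        churned += int(row["CHURN"])
    if total == 0:
        raise ValueError("no rows found")
    return churned / total


# Made-up rows standing in for the real training file.
sample = io.StringIO(
    "user_id,REGULARITY,CHURN\n"
    "a1,54,0\n"
    "b2,3,1\n"
    "c3,17,0\n"
    "d4,1,1\n"
)
print(churn_rate(sample))  # 0.5
```

With the downloaded data, the same function could be called on an opened file, for example `with open(path_to_train_csv, newline="") as f: churn_rate(f)`, where `path_to_train_csv` is wherever you saved the train file.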
The dataset has 19 variables. Below are the definitions of variables found in the datasets:
VARIABLE | DEFINITION | FEATURE TYPE |
---|---|---|
user_id | A unique identification number of a client | Numeric |
REGION | The location of each client | Categorical |
TENURE | Duration of the network usage | Numeric |
MONTANT | The top-up amount | Numeric |
FREQUENCE_RECH | The number of times the customer refilled | Numeric |
REVENUE | The monthly income of each client | Numeric |
ARPU_SEGMENT | Income over 90 days / 3 | Numeric |
FREQUENCE | The number of times the client has made an income | Numeric |
DATA_VOLUME | The number of connections | Numeric |
ON_NET | Inter Expresso calls | Numeric |
ORANGE | Calls to the Orange network | Numeric |
TIGO | Calls to the Tigo network | Numeric |
ZONE1 | Calls to zones1 | Numeric |
ZONE2 | Calls to zones2 | Numeric |
MRG | A client who is going | Categorical |
REGULARITY | The number of times the client is active for 90 days | Numeric |
TOP_PACK | The most active packs | Categorical |
FREQ_TOP_PACK | The number of times the client has activated the top pack packages | Numeric |
CHURN | Whether a client still patronizes the network or not. This is the variable to predict (Target Variable) | Binary |
It is recommended to have Visual Studio Code or any other standard code editor on your local machine.
Install the required packages locally on your computer.
It is recommended that you run Python 3 or later. You can download the required Python version from here.
Use these recommended steps to set up your local machine for this project:
- Clone the repo: copy the URL below and paste it into GitHub Desktop or the code editor on your local machine.
  https://github.com/elvis-darko/Team_Zurich_Capstone_Project.git
- Create the Python virtual environment: this isolates the libraries the project requires and avoids conflicts. Use whichever of these commands works on your local machine: `python3 -m venv venv` or `python -m venv venv`
- Activate the Python virtual environment: this ensures that the Python kernel and libraries are those of the isolated environment you created.
  - On Windows: `venv\Scripts\activate`
  - On Linux and macOS: `source venv/bin/activate`
- Upgrade pip: pip is the manager for installed libraries and packages. Upgrading it gives you an up-to-date version that will work correctly: `python -m pip install --upgrade pip`
- Install the required libraries/packages: the libraries and packages this project needs are listed in the `requirements.txt` file. Installing from that file lets you import them in the project's Python scripts and notebooks without any issue: `python -m pip install -r requirements.txt`
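After working through the steps above, a short script can confirm that the interpreter meets the version recommendation and that the virtual environment is active. This is only a sketch using the standard library; the function names are illustrative and not part of the repository.

```python
import sys


def python_is_supported(minimum=(3, 0)) -> bool:
    """Check the running interpreter against the recommended minimum version."""
    return sys.version_info >= minimum


def in_virtual_env() -> bool:
    """Return True when running inside an environment created by venv.

    Inside a venv, sys.prefix points at the environment while
    sys.base_prefix still points at the base Python installation.
    """
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("Python version:", sys.version.split()[0])
    print("Version supported:", python_is_supported())
    print("Virtual environment active:", in_virtual_env())
```

If the last line prints `False`, the environment was probably not activated in the current terminal session.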
A Streamlit app was added for further exploration of the model. The app provides a simple graphical user interface where predictions can be made from user inputs.
- Run the demo app from the root of the repository: `streamlit run streamlit.app.py`
The evaluation metric for this challenge is Area Under the Curve (AUC).
Predicted values must lie between 0 and 1, inclusive, where 1 indicates that the customer churned and 0 indicates that the customer stayed with Expresso.
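As a rough illustration of what AUC measures, the sketch below computes it directly from its pairwise definition: the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner, with ties counted as half. The function name `roc_auc` and the labels and scores in the example are made up for illustration; this is not the challenge's scoring code.

```python
def roc_auc(labels, scores):
    """Pairwise AUC: share of (churner, non-churner) pairs ranked correctly.

    labels: iterable of 0/1 ground-truth values (1 = churned).
    scores: iterable of predicted churn probabilities, same order.
    """
    pairs = list(zip(labels, scores))
    positives = [s for y, s in pairs if y == 1]
    negatives = [s for y, s in pairs if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one churner and one non-churner")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # churner ranked above non-churner
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Made-up labels and scores, for illustration only.
print(roc_auc([1, 0, 0, 1], [0.98, 0.12, 0.37, 0.30]))  # 0.75
```

This loop compares every churner with every non-churner, so it is only suitable for small samples. On the full dataset a library routine such as scikit-learn's `roc_auc_score` would be the practical choice.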
The final output would look like this:
user_id | CHURN |
---|---|
00001dbe00e56fc4b1c1b65dda63de2a5ece55f9 | 0.98 |
000055d41c8a62052dd426592e8a4a3342bf565d | 0.12 |
000081dd3245e6869a4a9c574c7050e7bb84c2c8 | 0.37 |
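An output in this shape could be produced along the following lines. This is a sketch that assumes a comma-separated file with a `user_id,CHURN` header; the helper name `write_submission` is hypothetical, and the range check simply enforces the 0 to 1 requirement.

```python
import csv
import io


def write_submission(predictions, out_file):
    """Write (user_id, churn_probability) pairs as CSV with a header row."""
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow(["user_id", "CHURN"])
    for user_id, probability in predictions:
        # Predictions must be probabilities between 0 and 1, inclusive.
        if not 0.0 <= probability <= 1.0:
            raise ValueError(
                f"probability out of range for {user_id}: {probability}"
            )
        writer.writerow([user_id, probability])


# The three rows from the example output, written to an in-memory buffer.
buffer = io.StringIO()
write_submission(
    [
        ("00001dbe00e56fc4b1c1b65dda63de2a5ece55f9", 0.98),
        ("000055d41c8a62052dd426592e8a4a3342bf565d", 0.12),
        ("000081dd3245e6869a4a9c574c7050e7bb84c2c8", 0.37),
    ],
    buffer,
)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with `open(path, "w", newline="")` in place of the buffer.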
Here are some resources you can read to gain a good understanding of the tools, packages, and concepts used in the project:
- How to improve machine learning models
- Machine Learning tutorial - A step by step guide
- Create user interfaces for machine learning models
- Getting started with Streamlit
NAME | COUNTRY |
---|---|
ELVIS DARKO | GHANA |
FAITH BERIDA | NIGERIA |
RICHMOND E.Y. ABAKE | GHANA |
RICHMOND TETTEH | GHANA |
JOSEPH GIKUBU | KENYA |
MARIE GRACE KAGAJU | RWANDA |