
This project is part of an assessment for the Data Science Pathway of the Talent Mobility Program by Azubi Africa.

elvis-darko/AZUBI-AFRICA---TALENT-MOBILITY-PROGRAM-ASSESSMENT


AZUBI AFRICA TALENT MOBILITY PROGRAM ASSESSMENT

PROJECT DESCRIPTION

As a member of the data analytics team, my role involves creating tools that use the bank's operational data to help the business achieve its goals and projections.
For this project, I have been tasked to predict whether a client will subscribe to a term deposit (indicated by the target variable "y", which takes the value "yes" or "no"). My task involves analyzing the dataset to assess trends and insights. I am also tasked to build a predictive model that determines the likelihood of a client subscribing to a term deposit based on the features provided in the dataset.

SUMMARY

PROJECT CODE | NAME | DEPLOYED APP | DESCRIPTION
TMP_1 | CLIENT TERM DEPOSIT PREDICTION | STREAMLIT APP | The data scientist is tasked to train and deploy a machine learning model that predicts the likelihood of a client subscribing to the bank's term deposit.

A sample of my tasks is as follows:

  1. Conduct Exploratory Data Analysis (EDA)
    I identify patterns, correlations, and any necessary data preprocessing steps, such as handling missing values, outliers, and data normalization.

  2. Feature Engineering
    I evaluate which features might be most relevant to predicting client subscription and consider creating new features if applicable.

  3. Build a Predictive Model
    I use a machine learning algorithm of choice to build a model predicting the subscription outcome.

  4. Evaluate Model Performance
    I use appropriate metrics such as accuracy, precision, recall, and F1 score to assess model effectiveness. I also account for imbalanced classes and adjust accordingly, for example by oversampling, undersampling, or adjusting the class weights.

  5. Explain the Findings and Insights
    I summarize key findings from the EDA and insights from the model, such as which features were most impactful, common characteristics of clients likely to subscribe, and actionable recommendations for the marketing team.
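Step 4 mentions adjusting class weights for imbalanced classes. Below is a minimal sketch, not the project's actual code, of the "balanced" weighting heuristic (the same formula scikit-learn uses for `class_weight="balanced"`): each class gets `n_samples / (n_classes * class_count)`, so the rarer "yes" class receives a larger weight. The function name `balanced_class_weights` and the toy labels are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return a weight per class: n_samples / (n_classes * class_count).

    Rare classes get larger weights, so each class contributes roughly
    equally to a weighted loss regardless of how often it occurs.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical toy labels: 8 "no" and 2 "yes", an imbalanced target
labels = ["no"] * 8 + ["yes"] * 2
weights = balanced_class_weights(labels)
print(weights)  # {'no': 0.625, 'yes': 2.5}
```

These weights can then be passed to any model that accepts per-class or per-sample weights, as an alternative to resampling the data.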

THE DATA

The data relates to direct marketing campaigns of a banking institution. The marketing campaigns were based on phone calls. Often, more than one contact with the same client was required to assess whether the product (a bank term deposit) would be subscribed ('yes') or not ('no').

There are four datasets:

  • bank-additional-full.csv with all examples (41188) and 20 inputs, ordered by date (from May 2008 to November 2010), very close to the data analyzed.
  • bank-additional.csv with 10% of the examples (4119), randomly selected from bank-additional-full.csv, and 20 inputs.
  • bank-full.csv with all examples and 17 inputs, ordered by date (an older version of this dataset with fewer inputs).
  • bank.csv with 10% of the examples and 17 inputs, randomly selected from bank-full.csv (an older version of this dataset with fewer inputs).
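The UCI copies of these files are, as far as I know, semicolon-separated with quoted text values; check this against your local copy. A minimal standard-library sketch that reads such a file and reports the class balance of the target `y`. The function name is hypothetical, and the inline rows are invented for illustration, not taken from the dataset.

```python
import csv
import io
from collections import Counter


def target_distribution(lines, target="y"):
    """Count the values of the target column in a ';'-delimited bank CSV."""
    reader = csv.DictReader(lines, delimiter=";")
    return Counter(row[target] for row in reader)


# Invented rows in the same layout (only a few columns kept for brevity)
sample = io.StringIO(
    '"age";"job";"balance";"y"\n'
    '41;"admin.";1200;"no"\n'
    '35;"technician";150;"no"\n'
    '29;"student";800;"yes"\n'
)
counts = target_distribution(sample)
print(counts)  # Counter({'no': 2, 'yes': 1})

# With a real file (path assumed):
# with open("bank-full.csv", newline="") as f:
#     print(target_distribution(f))
```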

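The attributes are listed in the table below. The table combines the 17 attributes of the older files (bank.csv, bank-full.csv) with the five socio-economic indicators (EMP.VAR.RATE to NR.EMPLOYED) that appear only in the bank-additional files.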

VARIABLE | DEFINITION | DATA TYPE
AGE | Age of the client | Numeric
JOB | Type of job | Categorical
MARITAL | Marital status of the client | Categorical
EDUCATION | Education level of the client | Categorical
DEFAULT | Has the client's credit defaulted? | Binary
HOUSING | Does the client have a housing loan? | Binary
BALANCE | Client's average yearly balance | Numeric
LOAN | Does the client have a personal loan? | Binary
CONTACT | Type of communication with the bank | Categorical
DAY | Last contact day of the month | Numeric
MONTH | Last contact month of the year | Categorical
DURATION | Last contact duration, in seconds | Numeric
EMP.VAR.RATE | Employment variation rate | Numeric
CONS.PRICE.IDX | Consumer price index | Numeric
CONS.CONF.IDX | Consumer confidence index | Numeric
EURIBOR3M | 3-month Euro Interbank Offered Rate | Numeric
NR.EMPLOYED | Number of bank employees | Numeric
CAMPAIGN | Number of contacts performed during this campaign for this client (includes the last contact) | Numeric
PDAYS | Number of days that passed after the client was last contacted in a previous campaign (-1 means the client was not previously contacted) | Numeric
PREVIOUS | Number of contacts performed before this campaign for this client | Numeric
POUTCOME | Outcome of the previous marketing campaign | Categorical
Y | Has the client subscribed to a term deposit? | Binary
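PDAYS uses -1 as a sentinel rather than a real day count, and several columns hold yes/no values. A minimal feature-engineering sketch for a single record read from the CSV as a dict of strings, assuming lower-case column names as in the UCI files. The function `engineer_features` and the `previously_contacted` feature are hypothetical illustrations, not the project's actual pipeline.

```python
# Hypothetical encoding for the yes/no columns listed in the table above
BINARY_MAP = {"yes": 1, "no": 0}
BINARY_COLUMNS = ("default", "housing", "loan", "y")


def engineer_features(row):
    """Return a copy of one record with simple engineered features.

    - yes/no columns are encoded as 1/0 (other values, such as "unknown",
      are left unchanged)
    - 'previously_contacted' is derived from the PDAYS sentinel, where -1
      means the client was not previously contacted
    """
    out = dict(row)
    for col in BINARY_COLUMNS:
        if col in out:
            out[col] = BINARY_MAP.get(out[col], out[col])
    pdays = int(out["pdays"])
    out["pdays"] = pdays
    out["previously_contacted"] = 0 if pdays == -1 else 1
    return out


# Invented example record
record = {"age": "35", "housing": "yes", "loan": "no", "pdays": "-1", "y": "no"}
features = engineer_features(record)
print(features["previously_contacted"], features["housing"], features["y"])  # 0 1 0
```

The explicit flag lets a model treat "never contacted" separately from a genuinely small number of days, which a raw -1 would blur.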

SETUP

It is recommended to have Visual Studio Code or any other standard code editor on your local machine.
Install the required packages locally on your computer.

It is recommended that you run Python 3.0 or above. You can download the required Python version from the official Python website.

Use these recommended steps to set up your local machine for this project:

  1. Clone the repo: copy the URL below and paste it into GitHub Desktop or your code editor on your local machine.

     https://github.com/elvis-darko/AZUBI-AFRICA---TALENT-MOBILITY-PROGRAM-ASSESSMENT.git
    
  2. Create the Python virtual environment:
    This isolates the project's required libraries to avoid conflicts.
    Use whichever of the following commands works on your local machine.

         python3 -m venv venv
         python -m venv venv
    
  3. Activate the Python virtual environment:
    This ensures that the Python interpreter and libraries are those of the isolated environment you created.

         - for Windows:
                      venv\Scripts\activate
    
         - for Linux & macOS:
                      source venv/bin/activate
    
  4. Upgrade pip:
    pip is the manager for installed libraries/packages. Upgrading pip gives you an up-to-date version that works correctly.

         python -m pip install --upgrade pip
    
  5. Install the required libraries/packages:
    The libraries and packages required for this project are listed in the requirements.txt file.
    Installing them from this file ensures you can import them in the project's Python scripts and notebooks without issues.

         python -m pip install -r requirements.txt 
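After steps 2 and 3, one way to confirm that the virtual environment is active is to compare `sys.prefix` with `sys.base_prefix`, which differ inside a venv. A minimal sketch; the script name (e.g. `check_env.py`) is hypothetical and not part of the repository.

```python
import sys


def in_virtualenv():
    """True when running inside a venv.

    venv points sys.prefix at the environment directory, while
    sys.base_prefix still points at the base Python installation.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
    print("virtual environment active:", in_virtualenv())
```

Run it with the same `python` you use for the project; if it reports `False`, repeat the activation step.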
    

MACHINE LEARNING MODEL DEPLOYMENT

Run Streamlit App

A Streamlit app was added for further exploration of the model. The Streamlit app provides a simple graphical user interface where predictions can be made from inputs.

  • Run the demo app (from the root of the repository):

      streamlit run streamlit.app.py
    

EVALUATION

The evaluation metric is the F1 score. The model predicts the likelihood of a client subscribing to a term deposit given certain features. The model with the highest F1 score is selected as the best-performing model.
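The F1 score is the harmonic mean of precision and recall, F1 = 2 · precision · recall / (precision + recall), computed here with "yes" as the positive class. Unlike accuracy, it is not inflated by correctly predicting the majority "no" class. A plain-Python sketch with invented labels; the function name `binary_f1` is hypothetical, and in practice a library implementation such as scikit-learn's `f1_score` would typically be used.

```python
def binary_f1(y_true, y_pred, positive="yes"):
    """F1 score for the given positive class from parallel label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Invented labels: 2 true positives, 1 false positive, 1 false negative
y_true = ["yes", "no", "yes", "no", "yes"]
y_pred = ["yes", "no", "no", "yes", "yes"]
score = binary_f1(y_true, y_pred)
print(round(score, 3))  # 0.667
```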

The final output would look like this:

        client id                                   TERM DEPOSIT
        00001                                            no
        000055                                           yes
        000081                                           no
  • no means the client is not likely to subscribe to a term deposit
  • yes means the client is likely to subscribe to a term deposit
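A minimal sketch of how predicted probabilities could be turned into this yes/no table, assuming a model that outputs the probability of subscribing and a decision threshold of 0.5 (both assumptions; the function name `to_output_rows` is hypothetical and the scores are invented).

```python
def to_output_rows(client_ids, probabilities, threshold=0.5):
    """Map each client's predicted probability of subscribing to 'yes'/'no'.

    The 0.5 threshold is an assumed default; it can be tuned, for example
    to maximize F1 on a validation set.
    """
    return [
        (client_id, "yes" if prob >= threshold else "no")
        for client_id, prob in zip(client_ids, probabilities)
    ]


ids = ["00001", "000055", "000081"]
probs = [0.12, 0.81, 0.34]  # invented scores for illustration
rows = to_output_rows(ids, probs)

print(f"{'client id':<12}TERM DEPOSIT")
for client_id, label in rows:
    print(f"{client_id:<12}{label}")
```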

CONTRIBUTORS

NAME | COUNTRY | E-MAIL
ELVIS DARKO | GHANA | [email protected]
