The objective of this project is to create a simple feature engineering module that generates some features/variables that are useful for further Machine & Deep Learning modeling. The python module (which is dockerized) is called via an API endpoint, returning some features/variables. 11 more endpoints were built to support the functionality
For project the following tech stack, APIs, architecture was used and applied:
- Python✅
- FastAPI ✅
- Docker Compose ✅
- Logger ✅
- Mark1 System Design + Configuration ✅
- Spreadsheet API (through gspread wrapper) ✅
api
│ requirements.txt
│ Dockerfile
│ docker-compose.yml
│
└───src
│ │ __init__.py
│ │ config.ini
│ │ config.py
│ │ data_loading.py
│ │ feature_engineering.py
│ │ data_reporting.py
│ │ main.py
│ │ models.py
│ └─── markI.py
│
└───data
│ │ data.csv
│ └─── data.json
│
└───features
│ │ features_loans.csv
│ └─── features_customers.csv
│
└───db
│ │ db.db
│ └─── db.py
│
└───config
└─── gspread
└─── service.json
Please feel free to hit all our endpoints by visiting http://localhost:8000/docs#/. The list of the endpoints is the following
[GET] /api/v1/features/{ontology}
: The basic endpoint which returns all the features generated after applying the feature engineering analysis. The permitted ontologies here are (a) customers (b) loans. In the project requirements we are asked to create the endpoint only for customers, however we expanded our work. This permit us eg. to create a machine learning model that is dedicated to loans analysis, so having features in a loan-based level may be proven useful[GET] /api/v1/api_status
: The endpoint which returns {“status” : “UP”} if both our basic endpoints /api/v1/features/customers and /api/v1/features/loans return a status code of 200 after being hit
[GET] /api/v1/loans/
: This endpoint is used when we need to fetch the list of loans currently existing[GET] /api/v1/loans/{loand_id}
: This endpoint is used when we need to fetch a specific loan currently existing[POST] /api/v1/loans/
: This endpoint can be used to register a new loan (for a specific customer)[DELETE] /api/v1/loans/{loan_id}
: This endpoint can be used to delete a specific loan
[GET] /api/v1/customers/
: This endpoint is used when we need to fetch the list of customers currently existing[GET] /api/v1/customers/{customer_id}
: This endpoint is used when we need to fetch a specific customer currently existing[POST] /api/v1/customers/
: This endpoint can be used to register a new customer[DELETE] /api/v1/customers/{customer_id}
: This endpoint can be used to delete a specific customer
[GET] /api/v1/database
: Fetches the list of customers defined in the ./db/optasia_db.py file, creates local dbs, serialises the data and pushes them into these 2 new data tables (of the same ./db/optasia_db.db database)[POST] /api/v1/database/{db_name}
: This endpoint is used to clear (recreate an empty data table) for a specific datatable options are (customers, loans)
In order for the unit testing to be successful we moved from a
- json loading logic to a dual
- json loading and local db loading logic, so that the code is not hard coded only for the provided given data.
Therefore, during the unit testing (all the tests can be found under /tests/test_api.py) we can now create customer and loan objects that are properly pushed to a local db (2 respective data tables). All the endpoints retrieve the necessary data from the local db (which is smoothly and systematically updated), except for the [POST] /api/v1/features/{ontology} endpoint (which works smoothly for both json and local db cases)
In the spreadsheet Feature Engineering API Reporter several tabs can be found and all are split into 2 clusters
- api analysis tabs (1)
customers_features
(2)loans_features
in which the features generated by the feature engineering process are pushed here - custom analysis tabs : Firstly, we can push (manually, or automatically) data to the following tabs : (1)
customers_raw
and (2)loans_raw
tabs. Then we calculated some statistics (normal distribution) for some of the features at (3)customers_statistics
and (4)loans_statistics
tabs and finally we gathered all the visualisations under the tab calledvisualisations
for further inspection. This approach is considered scalable and is useful when tech people should provide business insights to their stakeholders.
- apt-get update
- sudo apt install python3-pip
- download the installer for your python version (run "python -V" or "python3 -V" to see your version)
- run the installer and follow instructions
- source ~/.bashrc #or restart shell
- conda install python=3.8
- conda create --name featengapi python=3.8
- conda activate featengapi
export PYTHONPATH="${PYTHONPATH}:/path/to/your/project/"
Execute the command
$ docker-compose up --build
After the application starts, navigate to http://localhost:8000 in your web browser and you should see the following json response:
{
"Hello": "World"
}
Visit http://localhost:8000/docs#/.
- Hit endpoint
[POST] /api/v1/database/customers
with arguments (customers, customer_id) to flush the local db “customers” - Hit endpoint
[POST] /api/v1/database/loans
with arguments (loans, loan_id) to flush the local db “loans” - Hit endpoint
[GET] /api/v1/api_status
to validate that both [GET] /api/v1/features/customers and [GET] /api/v1/features/loans are running smoothly - Hit endpoint
[GET] /api/v1/features/customers
to retrieve the features of customers in json format - Hit endpoint
[GET] /api/v1/features/loans
to retrieve the features in json format - Hit endpoint
[POST] /api/v1/features/customers
to upload the customers features on the Optasia API Reporter - Hit endpoint
[POST] /api/v1/features/loans
to upload the loans features on the Optasia API Reporter
$ docker-compose down
Feature Work would be
- To build more endpoints
- To extract more features manually
- To build baseline machine learning models and extract *feature importances (eg. through the use of the LIME algorithm)
- To optimise *machine learning models by experimenting (by properly storing the experiment configurations and results eg. Spreadsheet)
- To optimise deep learning models