Creating a Web API with Python (Remote Procedure Invocation)

In this exercise we will not just consume an API, but create one with Flask. The steps are very similar to the Docker/Flask exercise. The main difference is that we will return a JSON document instead of an HTML page. This tutorial is partially based on this page.

An alternative to Flask is FastAPI, an elegant and fast library for creating Web APIs with Python. FastAPI uses Python type hints to infer the structure of the API from a function's parameters, and it also automatically generates beautiful documentation for your API. However, because we already know some Flask and getting started with Flask is easier, we will use Flask.

  1. Create a folder mlapi, work inside the folder and open the folder in VS Code.

  2. We create a simple Flask API first. This is very similar to the Docker exercise. The difference is that this Flask application will return JSON, not HTML.

We create the following files (exactly the same as in the Flask example): Dockerfile:

# Use an official Python runtime as a parent image
FROM python:3.10-slim-buster

# Set the working directory to /app
WORKDIR /app

COPY app/requirements.txt requirements.txt

# Install any needed packages specified in requirements.txt
RUN pip install -r requirements.txt

# Copy the current directory contents into the container at /app
COPY app/ /app

# Make port 80 available to the world outside this container
EXPOSE 80

# Run app.py when the container launches
CMD ["python", "app.py"]

Create a .gitignore file (with . at the beginning) with the following content:

.env
venv
.idea
.ipynb_checkpoints
.vscode
.DS_Store
.ipython
.local
.cache
.jupyter

Create a .dockerignore file with the following content:

.env
venv
.idea
.ipynb_checkpoints
.vscode
.DS_Store
.git
.gitignore
.ipython
.local
.cache
.jupyter

docker-compose.yml file:

services:
  web:
    build: .
    stop_signal: SIGINT
    ports:
      - '80:80'
    volumes:
      - ./app:/app

Create a folder app in the folder mlapi. Create the following files:

requirements.txt file with the content:

Flask

Create the app.py file in the app folder with the content:

from flask import Flask

app = Flask(__name__)

@app.route('/hello')
def hello():
    return {'message': 'Hello World'}, 200

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=80, debug=True)

The only difference to the Flask app from the former exercise is that we return JSON. The line:

    return {'message': 'Hello World'}, 200

first returns a Python dictionary {'message': 'Hello World'}, which Flask serializes as JSON. The second value (200) is the HTTP status code; it means everything went okay and the result has been returned.

Run

docker-compose up
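If you want to check the endpoint from Python instead of a separate tool, Flask's built-in test client can call the route without starting a real server. A minimal sketch (the app from above is redefined inline so the snippet is self-contained):

```python
from flask import Flask

app = Flask(__name__)

@app.route('/hello')
def hello():
    # Flask serializes the dictionary as JSON; 200 is the HTTP status code
    return {'message': 'Hello World'}, 200

# The test client sends requests to the app without a running server
with app.test_client() as client:
    response = client.get('/hello')

print(response.status_code)  # 200
print(response.get_json())   # {'message': 'Hello World'}
```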

Download and install Insomnia https://insomnia.rest/download

Open Insomnia. Enter http://127.0.0.1/hello in the URL field in Insomnia. Select GET. Click send.

On the right side, you see a 200 OK result with the JSON return data.

Select POST (the drop-down to the left of the URL) and click Send. You get the 405 error "Method Not Allowed".

Select GET, enter http://127.0.0.1/app, and click Send. You get the 404 error "Not Found".

Now we want to send a GET parameter. Change the code of app.py:

from flask import Flask, request

app = Flask(__name__)

@app.route('/hello')
def hello():
    name = request.args.get('name', '')
    message = f'Hello {name}'
    return {'message': message}, 200

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=80, debug=True)

We additionally import the request object from the Flask package in the first line. Then we read the query parameter via the args attribute:

name = request.args.get('name', '')

The second argument ('') is the default value that is returned when no name parameter was sent.

Go to the Insomnia app and enter http://127.0.0.1/hello in the URL field. Click on the Query tab and enter the name/value pair name and BIPM. Click Send.

You should now see on the right:

{
	"message": "Hello BIPM"
}

Parameters in the URL are limited in size. Therefore it makes sense to use POST and JSON to also send data to the API.

Update app.py:

from flask import Flask, request

app = Flask(__name__)

@app.route('/hello', methods=['POST'])
def hello():
    data = request.get_json()
    name = data.get('name', '')
    message = f'Hello {name}'
    return {'message': message}, 200

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=80, debug=True)

We changed two things:

@app.route('/hello', methods=['POST'])

This line says that the function now accepts the POST method.

    data = request.get_json()
    name = data.get('name')

Now we want to send the data as JSON. The first line again uses the request object, but this time reads the sent JSON data with get_json. The result is saved as a Python dictionary data. data.get('name') then gets the item with the name key from the dictionary and returns its value.

In Insomnia, enter http://127.0.0.1/hello and change the method from GET to POST. Delete any Query parameters that are still there. Click on the drop-down menu next to Body and select JSON. Copy this JSON into the Body field:

{
	"name": "BIPM"
}

Check the results on the right side.
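The same POST request can also be sketched with Flask's test client, which sends the JSON body just like Insomnia does (self-contained sketch, app redefined inline):

```python
from flask import Flask, request

app = Flask(__name__)

@app.route('/hello', methods=['POST'])
def hello():
    data = request.get_json()
    name = data.get('name', '')
    return {'message': f'Hello {name}'}, 200

# json=... sets the request body and the Content-Type header for us
with app.test_client() as client:
    response = client.post('/hello', json={'name': 'BIPM'})

print(response.get_json())  # {'message': 'Hello BIPM'}
```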

Creating an API for a ML model

Great. Now we will train a machine learning model and expose the trained model with a Web API.

You can create a separate Python environment (but you can also use your global Python environment in the next steps).

We will use the Iris flower data set. The data set consists of 150 samples from three species of Iris flowers (Iris setosa, Iris virginica and Iris versicolor). The data set has four features: the length and the width of the sepals and petals. Download the iris.csv file from Moodle. Create a new dev folder in the mlapi folder and save iris.csv in this folder.

Change the content of requirements.txt (in the app folder):

Flask
pandas
scikit-learn
joblib

Create in the dev folder a Jupyter Notebook file 01-training.ipynb. Open the Notebook file in VS Code.

The first cell in the notebook should install the requirements (if they are not yet installed):

%pip install -r ../app/requirements.txt

Then in the next cell, we import the required functions:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn import metrics
import joblib

Read the iris.csv file as a DataFrame with the name data:

data = pd.read_csv('iris.csv')

Save the label as y and the features as X:

y = data['species']
X = data.drop(columns=['species'])

Train-test split:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=23)

Save Column Names:

column_names = list(X_train.columns)

We create a simple imputer that uses the median value of the column for missing values:

imp = SimpleImputer(strategy='median')

We apply this imputer on all columns:

ct = ColumnTransformer([('imputer', imp, column_names)])
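To see what this preprocessing step does, here is a small sketch of the same imputer and ColumnTransformer applied to a toy DataFrame (the values are made up for illustration, not iris data):

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer

# Toy data with one missing value in column 'a'
df = pd.DataFrame({'a': [1.0, np.nan, 3.0], 'b': [4.0, 5.0, 6.0]})

imp = SimpleImputer(strategy='median')
ct = ColumnTransformer([('imputer', imp, ['a', 'b'])])

out = ct.fit_transform(df)
print(out[1, 0])  # 2.0 -> the NaN was replaced by the median of [1.0, 3.0]
```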

We will use a Random Forest as the classifier:

clf = RandomForestClassifier(random_state=23)

The whole pipeline combines the preprocessing through the imputer and then the classifier:

pipe = Pipeline([
    ('preprocessor', ct),
    ('classifier', clf)]
)

Now we can train the pipeline:

pipe.fit(X_train, y_train)

To check the performance, we will apply the trained pipeline to the test data and compare the prediction with the real results in the test data:

y_pred = pipe.predict(X_test)
print(metrics.classification_report(y_test, y_pred))

How is the performance?

Save the model in the app folder:

joblib.dump(pipe, '../app/iris.mdl')
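joblib simply serializes the fitted object to disk, so the API can load it back later with joblib.load. A self-contained sketch of that round-trip, using a trivial stand-in classifier and a temporary file instead of the real pipeline and iris.mdl:

```python
import os
import tempfile

import joblib
from sklearn.dummy import DummyClassifier

# Stand-in model: always predicts the most frequent training label
clf = DummyClassifier(strategy='most_frequent')
clf.fit([[0], [1]], ['a', 'a'])

# Dump to a file and load it back, exactly as app.py does with iris.mdl
path = os.path.join(tempfile.mkdtemp(), 'model.mdl')
joblib.dump(clf, path)
restored = joblib.load(path)

print(restored.predict([[0]])[0])  # 'a'
```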

In VS Code, change the code of app.py:

from flask import Flask, request
import joblib
import pandas as pd

app = Flask(__name__)
pipe = joblib.load('iris.mdl')

@app.route('/hello', methods=['POST'])
def hello():
    data = request.get_json()
    name = data.get('name')
    message = f'Hello {name}'
    return {'message': message}, 200

@app.route('/predict', methods=['POST'])
def predict():
    column_names = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
    data = request.get_json()
    data_vector = [data.get('sepal_length'), 
                   data.get('sepal_width'), 
                   data.get('petal_length'), 
                   data.get('petal_width')]
    X_new = pd.DataFrame([data_vector], columns=column_names)
    y_pred = pipe.predict(X_new)[0]
    return {'prediction': y_pred}, 200


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=80, debug=True)

We made the following changes:

  • We imported joblib and pandas
  • With pipe = joblib.load('iris.mdl') we load the pipeline from the stored iris.mdl file.
  • We added the predict() function
  • We define the column names
  • Then we get the data as JSON
  • Then we create a Python list based on the different features from the JSON data.
  • Then we transform the Python list into a Pandas DataFrame
  • With pipe.predict(X_new) we predict the species. Because we only have one row, we get the first row from the predictions: pipe.predict(X_new)[0]
  • Then we return the prediction as JSON with a HTTP code 200.
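The whole predict flow can be exercised end to end with Flask's test client. A sketch with a small stand-in model trained inline instead of loading the stored iris.mdl (the two training rows and their labels below are made up for illustration):

```python
import pandas as pd
from flask import Flask, request
from sklearn.tree import DecisionTreeClassifier

column_names = ["sepal_length", "sepal_width", "petal_length", "petal_width"]

# Tiny stand-in model; the real app loads the full pipeline from iris.mdl
X = pd.DataFrame([[5.1, 3.5, 1.4, 0.2],
                  [6.3, 2.5, 5.0, 1.9]], columns=column_names)
model = DecisionTreeClassifier(random_state=0).fit(
    X, ["Iris-setosa", "Iris-virginica"])

app = Flask(__name__)

@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json()
    # Build one row in the training column order, then predict
    row = [data.get(c) for c in column_names]
    X_new = pd.DataFrame([row], columns=column_names)
    return {'prediction': model.predict(X_new)[0]}, 200

with app.test_client() as client:
    response = client.post('/predict', json={
        "sepal_length": 5.0, "sepal_width": 3.4,
        "petal_length": 1.5, "petal_width": 0.2})

print(response.get_json())  # {'prediction': 'Iris-setosa'}
```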

In the command line, start your web server with:

docker-compose up 

Open Insomnia.

  • On the left side, click on the arrow next to New Request and rename it to Hello World

  • On the left side, under Cookies, click on the plus and then on HTTP request. Rename it to Predict API

  • Change the HTTP method from GET to POST

  • Enter http://127.0.0.1/predict in the URL

  • Click on Body and select JSON

  • Copy in the Body:

{
    "petal_length": 2,
    "sepal_length": 2,
    "petal_width": 0.5,
    "sepal_width": 3
}

The result on the right side should be:

{
	"prediction": "Iris-setosa"
}

Try out these values:

{
    "petal_length": 5.8,
    "sepal_length": 2.6,
    "petal_width": 5.1,
    "sepal_width": 2.2
}

What happens when some data is missing:

{
    "petal_length": 5.7,
    "sepal_length": 2.8,
    "sepal_width": 1.3
}
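This still works because a missing key makes data.get return None, pandas stores the None as NaN, and the imputer in the pipeline fills in the training median. A sketch of that first step, using the JSON body above:

```python
import pandas as pd

# JSON body without petal_width, as in the example above
data = {'petal_length': 5.7, 'sepal_length': 2.8, 'sepal_width': 1.3}

column_names = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
row = [data.get(c) for c in column_names]   # data.get('petal_width') -> None
X_new = pd.DataFrame([row], columns=column_names)

# pandas turns the None into NaN, which the SimpleImputer later replaces
print(int(X_new.isna().sum().sum()))  # 1
```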

Now let us deploy it.

In VS Code, click on the left on the Source Control icon. Click Initialize Repository. Click on the plus icon next to changes. Enter a commit message "Initial commit" and commit. Publish the Branch.

Go to the CapRover Web GUI and create an app with the name iris.

In your terminal, type

caprover deploy

Choose the iris app and follow the instructions.

Go to the CapRover Web GUI and copy the app URL.

Go to Insomnia and replace http://127.0.0.1/predict with your URL, e.g. http://iris.dev.example.com/predict, and try if it still works.

You might have to wait a few seconds after the deploy until your server is ready. Just retry.

Submit the web URL and the GitHub link of your repository on Moodle.