https://github.com/davew-msft/MLOps-E2E
In this workshop, you will learn how to:
- set up and configure Azure Databricks for analytics
- set up and configure Azure ML Services to integrate your data science work into your DevOps pipelines
- think like a data scientist by looking at a few different use cases
This workshop uses the following technologies:
- Databricks (dbx)
- Azure Machine Learning Service (AMLS)
- Azure Container Instances (ACI)
- Azure Kubernetes Service (AKS)
- Azure DevOps (AzDO) or GitHub (and gh actions)
- JupyterLab Notebooks using vscode and AMLS Compute targets
This workshop is geared toward:
- Data Scientists
- App Developers
- AI Engineers
- DevOps Engineers
- Anyone wanting to learn to think like a Data Scientist and take their analytics to the next level
The solution will look something like this:
Our company sells bikes. We have data about who does and does not buy bikes. We'd like to use that data to build a predictive model to determine who will buy bikes in the future.
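As a toy sketch of that scenario (entirely made-up numbers, not the workshop's actual dataset), a "who will buy a bike" model can be as simple as a one-feature logistic regression trained with plain gradient descent:

```python
import math

# Hypothetical bike-buyer data: income (scaled to 0-1) -> bought a bike?
incomes = [0.2, 0.3, 0.4, 0.6, 0.7, 0.8]
bought  = [0, 0, 0, 1, 1, 1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# One-feature logistic regression trained with plain gradient descent.
w, b, lr = 0.0, 0.0, 1.0
for _ in range(20000):
    dw = db = 0.0
    for x, y in zip(incomes, bought):
        err = sigmoid(w * x + b) - y  # prediction error for this example
        dw += err * x
        db += err
    w -= lr * dw / len(incomes)
    b -= lr * db / len(incomes)

def predict(income_scaled):
    """1 = likely bike buyer, 0 = unlikely."""
    return int(sigmoid(w * income_scaled + b) >= 0.5)

accuracy = sum(predict(x) == y for x, y in zip(incomes, bought)) / len(incomes)
```

In the labs we let AMLS and automl do this work for us; the point here is only that a "predictive model" is, at bottom, a function fit to historical data.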
- Lab 1: Setup the resources needed for this workshop
- Lab 2: Learn about Azure Machine Learning Services
- Lab 2a: Getting Databricks Ready for DevOps/MLOps
- Lab 3: Build and Deploy an ML Model
Our company is interested in building a safe driver prediction model that we want to integrate into our website. We'll use Jupyter notebooks, AMLS, and AzDO/GitHub Actions to manage the MLOps pipelines. This builds on Day 1 by adding in Pipelines concepts. We'll determine how to retrieve the best model, package it with a web app, and deploy an inferencing web service. Then we'll figure out how to monitor the model's performance after it is deployed (on AKS). Our model will be deployed in ONNX format, which means it can run on the Edge or anywhere else.
We are going to build something that looks similar to this:
The overall approach used in this lab is to orchestrate continuous integration and continuous delivery pipelines from Azure DevOps (or GitHub Actions). These pipelines are triggered by changes to the artifacts that describe a machine learning pipeline created with the Azure Machine Learning SDK. In the lab, you make a change to the model training script, which executes the Azure Pipelines build pipeline that trains the model and creates the container image. This in turn triggers an Azure Pipelines release pipeline that deploys the model as a web service to AKS, using the Docker image created in the build pipeline. Once in production, the scoring web service is monitored using a combination of Application Insights and Azure Storage.
Day 2 has only a few dependencies on the Day 1 workshop. Make sure you run Lab 1 above and you should be fine.
These labs can be divided between data scientists and DevOps engineers.
These tasks are geared toward data scientists:
- make sure you run Lab 1 above, which has the prerequisites
- Lab12: Refactor the Model Training Code to use AMLS
- we want to use an experiment to train and register a model, and to log various metrics
- Lab14: Deploy a Real-time Inferencing Service
- open this notebook in your AMLS compute environment and follow the steps
- Lab15: Deploy a Real-time Inferencing Service to AKS (Kubernetes)
- open this notebook in your AMLS compute environment and follow the steps to deploy to AKS
- this is very similar to the ACI example in Lab14
- Lab16: Monitoring the webservice with AppInsights
- wip
Lab19: A different example with deep learning on text
- The previous labs involve a lot of steps to remember. In this lab we'll work through a different example without getting into the weeds, and hopefully you'll start to see the patterns.
- Start by uploading ./jupyter-notebooks/DeepLearningwithText.ipynb to your AMLS environment and begin there.
Hopefully your team now understands, and has implemented, the fundamental concepts in a local "execute-from-my-notebook" development experience and needs to apply all of these concepts to a production-ready workflow. A notebook is convenient for experimentation, but it is not suited to automating a full workflow. You could use AMLS Pipelines, as we did above, which are geared toward data scientists. Or our workflows could be implemented in a true DevOps tool like Azure DevOps. Using Azure Pipelines to operationalize Azure ML pipelines enables powerful tools such as version management, model/data validation, model evaluation/selection, and staged deployments to QA/production. Your team will take the learnings and relevant python scripts from the previous labs to do this.
- The word 'pipeline' has started to take on multiple meanings - make sure you don't get pipeline types mixed up. See here for a description of the pipeline types. For clarity, these challenges are referring to 'Azure Pipelines' as 'DevOps pipelines'.
These tasks are geared toward Azure DevOps engineers and can be done in parallel with the tasks above, if desired. If you are using GitHub Actions please see Labs 30-34.
Overview of the MLOps/DevOps Approach for Data Science
- recommended reading
- a templatized approach to do MLOps using a starter repo. This should work for gh actions or azdo pipelines but focuses on the latter.
- Lab20: Setup AzDO.
- This also includes some Recommended Reading.
This task should be done by both the data scientist and DevOps engineer when using Azure DevOps:
- Lab21: Setup and Run a Build Pipeline
- Lab22: Setup and Run a Release Pipeline
- Lab23: Test Our Pipelines
- Lab24: Monitoring Model Performance
These tasks are geared toward GitHub Actions engineers and can be done in parallel with the tasks above, if desired. If you are using Azure DevOps please see Labs 20-24.
- Lab30: Setup GitHub: this is an alternate lab if you'd rather use github for git repos and CI/CD pipelines
This task should be done by both the data scientist and DevOps engineer when using GitHub Actions:
TODO: these labs are wip
- Lab31: Setup and Run a Build Workflow
- Lab32: Setup and Run a Release Workflow
- Lab33: Test Our Workflows
- Lab34: Monitoring Model Performance
These labs aren't specific to automl but they build upon each other. In these labs we'll look at employee attrition using a dataset provided by IBM.
- Lab40: Using Datasets and Datastores in AMLS: we'll first get the data into a dataset and explore the data
- Lab40a: Using "Filesets" Datasets in AMLS
- this is the pattern I use to connect to files in a data lake or blob store
- good for training CNNs with directories of image files
- Lab41: Automated Machine Learning (automl): we'll use the IBM dataset to look at the causes of employee attrition, using the AMLS GUI.
- Lab42: automl from a python notebook: We will look at running automl from within a python notebook using automl API calls. We'll forecast energy demand using NYC's open dataset. Please see the sample notebooks area for other approaches using ARIMA and deep learning.
- Lab43: automl full end-to-end MLOps pipelines. We use an open-source accounts receivable dataset with automl to predict late payments, and we deploy the model to AKS or ACI.
- we use the AMLS UI for the initial setup
- we will do continuous retraining whenever the dataset changes using AMLS Pipelines. For this we will do everything programmatically from Jupyter notebooks.
- Lab80: Batch inferencing: most ML models are deployed for real-time inferencing and are therefore deployed on something like AKS as a container. But this pattern doesn't work well for batch inferencing. In this notebook we look at one possible pattern for batch inferencing by leveraging the AMLS Pipelines feature.
- Lab85: Batch Scoring Videos Using Deep Learning Models With Azure Machine Learning
- demonstrates batch inferencing using NNs by doing neural style transfer to an uploaded video.
- Upload a video file to storage.
- The video file will trigger a Logic App to send a request to the AML pipeline's published endpoint.
- The pipeline will then process the video, apply style transfer with MPI, and postprocess the video. The output will be saved back to blob storage once the pipeline is completed.
- we can also do this using AKS
- Lab90: Time Series Analysis : we specifically look at time series analytics in these labs with a focus on how AMLS can help.
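Lab80's batch pattern boils down to an entry script whose `run()` receives a mini-batch of input file paths and returns one result per file (this is the contract AMLS ParallelRunStep expects). A runnable local stand-in, with a line-counting stub in place of a real model:

```python
import os
import tempfile

def run(mini_batch):
    """Entry-point contract used by AMLS batch scoring (ParallelRunStep):
    receives a list of input file paths, returns one result per file.
    The 'scoring' here is a stub -- it just counts lines -- standing in
    for a real model.predict() call."""
    results = []
    for path in mini_batch:
        with open(path) as f:
            n_rows = sum(1 for _ in f)
        results.append(f"{os.path.basename(path)}: {n_rows} rows scored")
    return results

# Local dry run with two throwaway files, mimicking a mini-batch.
tmp = tempfile.mkdtemp()
paths = []
for name, rows in [("a.csv", 3), ("b.csv", 5)]:
    p = os.path.join(tmp, name)
    with open(p, "w") as f:
        f.write("\n".join(["x,y"] * rows))
    paths.append(p)

output = run(paths)
```

In the actual lab, AMLS hands `run()` its mini-batches and collects the results; the sketch above just exercises the same shape locally.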
MLOps currently has very few industry-wide best practices to improve time-to-market. Obviously, we like MLOps using AMLS, but Kubeflow is an excellent alternative that we can integrate into AMLS. We'll build a solution using Kubeflow in these labs.
The kubeflow project is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable and scalable. Our goal is not to recreate other services, but to provide a straightforward way to deploy best-of-breed open-source systems for ML to diverse infrastructures. Anywhere you are running Kubernetes, you should be able to run Kubeflow.
Kubeflow is really the following:
- JupyterHub, which allows you to request an instance of a dedicated Jupyter Notebook. We recommend using AMLS compute instances, but this is an excellent alternative. The drawback is that it's harder to control costs, since spinning down idle Jupyter container instances relies on the Kubernetes HPA (horizontal pod autoscaler).
- Training controllers: this component makes deploying training jobs easier. We will only deploy training controllers for tf jobs in this workshop, but there are controllers for PyTorch and others. The AMLS analog is training compute instances, the benefit again being that those can be autoscaled down better when not in use.
- a model serving component: this isn't much different from AMLS inference clusters.
Kubeflow is meant to build E2E workflow pipelines, while MLflow is used to track metrics and deploy models. AMLS experiments and pipelines are really a superset of MLflow, and you can use the MLflow APIs to talk to AMLS, essentially making AMLS a PaaS offering for an MLflow server. Kubeflow is its own thing that you deploy into an AKS/k8s cluster.
- Lab100: Kubeflow Prerequisites, Background, and Motivation
- Lab101: Containerizing a TensorFlow Model
- Lab102: Kubeflow Installation
- Lab103: Running JupyterHub with Kubeflow on AKS
- Lab104: Using tfjob to run training jobs
MLflow is an OSS platform for tracking machine learning experiments and managing models. You can use the MLflow logging APIs with the Azure Machine Learning service: the metrics and artifacts are logged to your Azure ML workspace. MLflow is deeply embedded in Databricks and is their model management tool of choice. MLflow is a great tool for local ML experimentation tracking; however, using it alone is like using git without GitHub.
- we do "local" training. The only requirement is that the AMLS SDK must be installed. To do that quickly, we use an AMLS compute target.
- wip
These labs will get you a little more intimate with the AMLS service. You may want to start your AMLS journey here. Most of these are .ipynb files and can be run either from your local workstation (vscode, pycharm, whatever) or using the JupyterLab notebooks on the AMLS compute instances (or anywhere else supporting .ipynb files).
Remember, we do all of these labs in code, but much of it can be examined using the AMLS workspace.
Lab | Description |
---|---|
Lab200: The Azure ML SDK | |
Lab201: Running Experiments | |
Lab202: Working with conda and python envs using Jupyter notebooks | |
Lab203: Working with Pipelines | |
Lab204: Model Interpretability and Explainability with AMLS | we use features of AMLS and LIME |
Lab205: Monitoring a Model | |
Lab206: Monitor Data Drift | Over time, models can become less effective at predicting accurately due to changing trends in feature data. This phenomenon is known as data drift, and it's important to monitor your machine learning solution to detect it so you can retrain your models if necessary. |
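As a back-of-the-envelope illustration of the drift idea behind Lab206 (AMLS has its own data-drift monitors; this hand-rolled population stability index is just our stand-in):

```python
import math

def psi(expected, actual, bins=4):
    """Population Stability Index between a baseline sample and a new sample.
    A common rule of thumb: PSI < 0.1 means no drift, > 0.25 means significant drift."""
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(bins + 1)]
    edges[-1] += 1e-9  # include the max value in the last bin

    def frac(sample):
        counts = [0] * bins
        for v in sample:
            for i in range(bins):
                if edges[i] <= v < edges[i + 1]:
                    counts[i] += 1
                    break
        return [max(c / len(sample), 1e-6) for c in counts]  # avoid log(0)

    e, a = frac(expected), frac(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

baseline = [10, 12, 11, 13, 12, 11, 10, 13, 12, 11]  # training-time feature values
same     = [11, 12, 10, 13, 11, 12, 13, 10, 12, 11]  # new data, same distribution
shifted  = [18, 19, 20, 21, 19, 18, 20, 21, 19, 20]  # new data, clearly drifted
```

The lab's monitors do this (and much more) over registered datasets on a schedule; the sketch shows the underlying idea of comparing feature distributions over time.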
These are alternate labs with different approaches to solving problems.
- Lab300: A Day in the Life of a Data Scientist...or...The Data Science Process in Action
You are somewhat new to data science and your boss hands you a new dataset and says, "make something out of this." Where do you even start? In this lab we do exactly that. In this lab we are given a dataset of support tickets and we are told, "unlock some insights to help the support team become more efficient and provide better service to customers." We work through how to get started on an analytics problem like this.
- What do we do?
- use standard data science techniques to explore the data
- determine deductively what are some interesting problems we can solve
- use automl to see if we can quickly predict those problems
- deploy the best model, assuming it meets our goals
- present your interesting analytics to executive leadership
- How do we do it?
- AMLS
- Jupyter/python/pandas/visualizations
- automl
- deploy an automl "no-code" container
- consume the model's REST API
- Power BI for the final data presentation
- Lab301: Text Analytics from Cognitive Services to a custom solution (wip). Companies would like to do text analytics, but they want to move quickly and iterate. The goal is to have a solution quickly (Cognitive Services) and, only if the project seems to have a positive NPV, to build a custom model (if needed). CogSvc handles the "80/20" rule: it will handle 80% of the use cases you'll need and solve problems in 20% of the time. It also doesn't require a data scientist to do this initial work. We will explore the complete path of integrating text analysis into our business processes, starting from pre-built models available as Cognitive Services, up to training a third-party custom neural model for Aspect-Based Sentiment Analysis, available as part of Intel NLP Architect, using Azure Machine Learning Service. We'll discuss cases when one needs a custom model, demonstrate quick ways to create such a model from scratch using AutoML, and show how to fine-tune model hyperparameters using HyperDrive.
- Lab302: Integrating Pre-Built AI into your application (wip)
- We are going to use a pre-built e-commerce website and deploy 2 AI services to add a "human touch"
- First, the website will allow a customer to upload a picture of what they are looking for and it will give a recommendation. (With some additional tweaks we could use our phone's camera to take the picture.) The recommendation won't be very accurate using the pre-built Computer Vision model.
- Next, we'll customize it and train it on additional photos using Azure Custom Vision Service. Then we'll take the generated model (in ONNX format) and deploy that into our website which will make the item identification much more accurate
- Finally, we'll look at using the Personalizer Service (which is reinforcement learning under-the-covers) to quickly make a recommender service for our e-commerce site.
- What you'll learn:
- How to integrate AI into a website
- How to re-train a pre-built AI model to make it better
- How to use pre-built AI to build a recommender engine...QUICKLY
- Lab303: Deep Learning Hack: Build a CNN from scratch, then make it perfect with transfer learning
We build a computer vision solution entirely from scratch using tensorflow. First we learn all of the concepts of how neural nets work. Then we show just how difficult it is to train a CNN from scratch. But don't despair... we then look at how to use a pre-built tf model from the internet and use a trick called transfer learning to make our model almost perfect.
- Lab304: (wip) Computer Vision at the Edge, E2E with MLOps/DevOps deployments
- this is WIP
- works with both LVA (Live Video Analytics) and OpenCV module.
- create an IoT Edge deployment with simulated cameras
- upload simulated video and train/re-train the models and deploy them to the iot edge devices.
- Lab400: (wip) training and deployment using AMLS Pipelines
- we still keep the code in git but we use AMLS Pipelines (via its REST APIs) to call and handle the training and validation
- Lab401: (wip) a simplified Azure DevOps/MLOps solution
- We deploy a simple model quickly and don't worry about the implementation details of model development
- we do focus on how score.py works
- we build up the azdo pipelines and focus there on how to do the automation
- the focus of this lab is simply to understand the patterns for the azdo pipelines
- Lab410: AMLS Compute auto-shutdown
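For reference, the score.py mentioned in Lab401 follows a simple init()/run() contract. Here is a hedged, locally runnable sketch; the stub predictor and the commented-out joblib load are illustrative assumptions, not the lab's actual model:

```python
import json
import os

model = None

def init():
    """Called once when the scoring container starts; load the model here."""
    global model
    # AMLS mounts the registered model under AZUREML_MODEL_DIR;
    # we fall back to a trivial stub so this script can be exercised locally.
    model_dir = os.getenv("AZUREML_MODEL_DIR", ".")
    # model = joblib.load(os.path.join(model_dir, "model.pkl"))  # real model load
    model = lambda rows: [int(sum(r) > 1.0) for r in rows]       # stub predictor

def run(raw_data):
    """Called once per request; raw_data is the JSON body sent to the endpoint."""
    try:
        rows = json.loads(raw_data)["data"]
        return json.dumps({"result": model(rows)})
    except Exception as e:
        return json.dumps({"error": str(e)})
```

The same init()/run() shape is what Lab14/Lab15 deploy behind ACI and AKS endpoints; only the model-loading line changes per project.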
For this set of labs we are going to use the CogSvcs SDKs. Instead of using a remote AMLS Compute instance we will use a Docker DevContainer and vscode to do our development. If you don't have Docker Desktop and vscode you can most definitely still use a remote AMLS Compute instance.
- Open a new instance of vscode to the folder: MLOps-E2E\Lab500
- vscode will prompt you to open the folder in the container. Do that.
Now open each ipynb file within the container.
We do this because the docker container already has all of the Azure and AMLS dependencies baked in, saving us a lot of time and workstation configuration misery.
Lab | Description |
---|---|
Lab500: Computer Vision | Using CompViz with the SDK from Python |
Lab501: Face API | Using the Face API |
Lab502: OCR capabilities | optical character recognition |
Lab503: text analytics | text analytics |
Lab503a: sentiment analysis on Yelp! data | text analytics |
Lab504: Form Recognizer | Form Recognizer |
- Lab900: RFM Analytics
- Lab901: Association Rules
- if you have several events we can search for links between those events. This is useful to determine things like:
- cross-selling behaviors of customers
- find the links between many products
- analyze the paths customers take through our stores/website
- what products should be highlighted or removed
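The association-rules idea in Lab901 can be sketched in a few lines: support is how often an itemset appears across transactions, and confidence is the conditional probability of the consequent given the antecedent. The products and baskets below are made up for illustration:

```python
# Hypothetical market-basket data: each transaction is a set of products.
transactions = [
    {"bike", "helmet"},
    {"bike", "helmet", "lock"},
    {"bike", "lock"},
    {"helmet", "gloves"},
    {"bike", "helmet", "gloves"},
]

def support(itemset):
    """Fraction of transactions containing every item in the set."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """P(consequent | antecedent): of the baskets with the antecedent,
    how many also contain the consequent?"""
    return support(antecedent | consequent) / support(antecedent)

# Rule "bike -> helmet": 4 of 5 baskets have a bike; 3 of those also have a helmet.
rule_conf = confidence({"bike"}, {"helmet"})
```

Real association-rule mining (e.g. Apriori) prunes the exponential space of itemsets; the lab's approach does the heavy lifting for you, but the support/confidence definitions are exactly these.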
- Remove the resource group (RG) you created to prevent incurring further Azure costs