This document outlines the requirements we have identified for MLOps and how we might address them.
Category | Details | Software candidates
---|---|---
Data versioning | Tools to track and recall versions of training data | options
Model registry | Tools to store published models together with their metadata | options
Model serving | Tools to deploy approved models to production | options
Model monitoring | Tools to track model performance in testing and in production | options
CI/CD orchestration | Tools to streamline the CI/CD workflow | options
This section summarizes the main requirements we have identified for each category.
A trained model is fundamentally tied to the data it was trained on, so it is important to track data versions much like code versions. Git is geared towards tracking plain-text files and handles large or binary datasets poorly, so we require a dedicated data versioning tool (a sketch follows the requirements below).
- Ability to store and recall any particular version of data that has been used to train a model.
- Ability to associate any particular tagged data version to a particular trained model.
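As a minimal sketch of how these requirements could be met, the snippet below recalls a pinned data version at training time. DVC is named purely as an illustrative candidate; the repository URL, file path, and tag are placeholders, not real project values.

```python
# Minimal sketch, assuming DVC were the chosen tool. The repo URL,
# file path, and tag below are placeholders, not real project values.
import dvc.api

# Read the exact dataset revision tagged "data-v1.2" from the project repo,
# so a training run can later be reproduced against that data version.
data = dvc.api.read(
    "data/training_set.csv",
    repo="https://example.com/our-project.git",  # placeholder repository
    rev="data-v1.2",                             # git tag pinning the data version
)
```

Recording the same tag alongside the trained model would then satisfy the second requirement.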
When the right candidate for production is found, it is pushed to a model registry: a centralized hub capturing the metadata for each published model, such as its name, version, and publication date. The registry acts as a communication layer between the research and production environments, providing a deployed model with the information it needs at runtime (see the sketch after the requirement below).
- Ability to associate any trained model with a particular code and data version
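The sketch below shows one way a registry entry could carry its code and data versions. MLflow is used only as an illustrative candidate; the commit hash, data tag, and registered model name are placeholders.

```python
# Minimal sketch, assuming MLflow were the chosen registry. The commit hash,
# data tag, and registered model name are placeholders.
import mlflow
import mlflow.sklearn
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy stand-in for the real training step.
X, y = np.random.rand(20, 3), np.random.randint(0, 2, 20)
model = LogisticRegression().fit(X, y)

with mlflow.start_run() as run:
    # Attach code and data versions so the registry entry stays traceable.
    mlflow.log_param("git_commit", "abc1234")      # placeholder code version
    mlflow.log_param("data_version", "data-v1.2")  # placeholder data tag
    mlflow.sklearn.log_model(model, artifact_path="model")

# Publish the run's model to the registry under a project-level name.
mlflow.register_model(f"runs:/{run.info.run_id}/model", "example-classifier")
```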
When a model is ready to be deployed, a model serving tool can be used to automate and facilitate the process (a sketch follows the requirement below). Given that we do not expect models to require frequent or continuous updates, this requirement is a lower priority.
- Provides a simple method for moving a tested and approved model to deployment
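For illustration only, the snippet below shows the shape of the hand-rolled prediction service that a dedicated serving tool would replace. Loading a registered model via MLflow and serving it with Flask are assumptions, and the model URI is a placeholder.

```python
# Minimal sketch of a prediction endpoint, assuming MLflow's registry and
# Flask. A dedicated serving tool would replace this hand-rolled service.
import mlflow.pyfunc
import pandas as pd
from flask import Flask, jsonify, request

app = Flask(__name__)
model = mlflow.pyfunc.load_model("models:/example-classifier/1")  # placeholder URI

@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON list of feature records, e.g. [{"f0": 0.1, "f1": 0.2}, ...].
    features = pd.DataFrame(request.get_json())
    return jsonify(predictions=model.predict(features).tolist())

if __name__ == "__main__":
    app.run(port=8000)
```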
An interface is required to evaluate model performance following changes to code and/or data. Additionally, after release to production, model performance may be affected by numerous factors, such as an initial mismatch between the training data and the live population. Monitoring is therefore required to continuously review model performance on the live implementation (see the sketch after the requirement below).
- Ability to track testing and live model performance and associate performance metric with code and data versions.
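The sketch below shows what such tracking could look like, again assuming MLflow as an illustrative candidate; the tag names and metric are assumptions, not a chosen design.

```python
# Minimal sketch: log a live performance metric tied to the code and data
# versions the model was built from, assuming MLflow. Tag names are placeholders.
import numpy as np
import mlflow

def log_live_accuracy(y_true, y_pred, code_version, data_version):
    """Record live accuracy so it can be compared across code/data versions."""
    accuracy = float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
    with mlflow.start_run(run_name="live-monitoring"):
        mlflow.set_tag("git_commit", code_version)
        mlflow.set_tag("data_version", data_version)
        mlflow.log_metric("live_accuracy", accuracy)

# Example: log_live_accuracy([1, 0, 1], [1, 1, 1], "abc1234", "data-v1.2")
```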
Our CI/CD/CT workflow will benefit from the ability to process data and train models on local hardware. The requirements are listed below, followed by a sketch of an automated quality gate.
- Functionality to run data processing/training on local hardware
- Automatic execution of training pipeline
- Conventional CI/CD tools: issue tracking, git integration, automated tests/pipelines.
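As one example of the automated-tests requirement, the sketch below is a model-quality gate that a CI pipeline could run on every commit. The use of pytest, the toy dataset, and the accuracy floor are all illustrative assumptions.

```python
# Minimal sketch of a CI quality gate, assuming pytest. The toy dataset and
# accuracy floor are illustrative assumptions, not project values.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def test_model_meets_accuracy_floor():
    # Toy stand-in for loading the pinned training data version.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    y = (X[:, 0] > 0).astype(int)  # deliberately easy, separable target
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    model = LogisticRegression().fit(X_tr, y_tr)

    # Fail the pipeline if the model regresses below the agreed floor.
    assert model.score(X_te, y_te) >= 0.8
```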