
[RFC]: Storage Optimization for OPEA Workloads proposal #1118

Open
hle2 opened this issue Jan 7, 2025 · 2 comments
Labels
feature New feature or request

Comments


hle2 commented Jan 7, 2025

Priority

P2-High

OS type

Ubuntu

Hardware type

Xeon-GNR

Running nodes

Multiple Nodes

Description

Storage Optimization for OPEA Workloads proposal

This RFC proposes a solution to manage distributed storage and optimize data access performance (by leveraging third-party solutions) for OPEA workloads.

Author(s)

qwren
ichbinblau
hhb584520
airren
hualongfeng
majianpeng
hle2

Status

Under Review

Objective

Goals:

  • A plugin-based framework to manage and integrate local/distributed storage solutions and data cache solutions
  • Plugins for local storage and distributed storage solutions (e.g. NFS, Ceph)
  • Plugins for data cache solutions (e.g. Fluid)

Non-Goals:

  • Implement OPEA's own distributed storage or data cache solutions
  • Deploy third-party storage or data cache solutions

Motivation

AI applications typically require large amounts of data that can be shared: e.g. models, training data, and context data in RAG systems. However, OPEA currently has no component to manage this data sharing, so OPEA users need to (1) explicitly manage (e.g. create, delete) their own data and (2) create Persistent Volumes (PV) and Persistent Volume Claims (PVC) and solve sharing issues themselves.
There is no effective way in OPEA to address the data-access latency and bandwidth challenges that stem from remote data pulls caused by the separation of compute and storage. Solutions like Fluid help accelerate data access for data-intensive applications, but they require the OPEA user to understand the backend storage system and are usually hard to configure.
There is no support in OPEA for offline environments or intermittent network connectivity.

Design Proposal

We follow the Kubernetes CSI specification to implement an OPEA CSI driver that communicates with the K8s API Server and Kubelet to manage PVs/PVCs for OPEA workloads. The CSI driver manages the backend storage solution through the IDataEngine interface and the cache solution through the ICacheEngine interface, as shown in the following diagram:

flowchart TD;
    A(K8s API Server)<-->B(OPEA CSI Plugin);
    Kubelet<-->B;
    B-- IDataEngine -->E(Local Disk);
    B-- IDataEngine -->F(Network Disk);
    B-- ICacheEngine -->Fluid;
    style B fill:#f9f;

API Definitions

  • IDataEngine
    The OPEA CSI Driver uses IDataEngine to manage plugins for local or distributed storage solutions. The engineParam parameter (which carries information such as the URL and security details needed to communicate with the storage solution) is used to configure the plugin. The CreateStorage/DeleteStorage methods are invoked by the CSI plugin to create/delete the actual storage when required; that storage is then used by the cache engine (when applicable) to manage data for OPEA workloads.
type IDataEngine interface {
	// Type returns the data engine type, e.g. "local", "nfs" or "ceph".
	Type() string
	// Configure applies the engineParam derived from the StorageClass.
	Configure(engineParam interface{}) error
	// CreateStorage creates backend storage for the given key and returns its id and StorageInfo.
	CreateStorage(key string) (string, *StorageInfo, error)
	// DeleteStorage removes the backend storage identified by id.
	DeleteStorage(id string) error
}
  • ICacheEngine
    The OPEA CSI Driver uses ICacheEngine to manage plugins for cache solutions. The engineParam parameter (which carries information on how to configure the cache engine) is used to configure the plugin. The CreateCache/DeleteCache methods are invoked by the CSI plugin to create/delete the cache on the local node; the resulting cache information is then used to generate the PV for OPEA workloads. A minimal plugin sketch follows the interface definitions below.
type ICacheEngine interface {
	// Type returns the cache engine type, e.g. "none" or "fluid".
	Type() string
	// Configure applies the engineParam derived from the StorageClass.
	Configure(engineParam interface{}) error
	// CreateCache creates a cache on the local node for the given key and returns its id and PVInfo.
	CreateCache(key string) (string, *PVInfo, error)
	// DeleteCache removes the cache identified by id.
	DeleteCache(id string) error
}
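For illustration only, below is a minimal sketch of how plugins might implement the two interfaces. The names LocalDiskEngine and NoopCacheEngine, and the field layout of StorageInfo/PVInfo, are assumptions for this sketch and not part of the proposal.

// Sketch of hypothetical fallback plugins: a node-local data engine and a
// pass-through (no-cache) cache engine. Names and struct fields are illustrative.
package engines

import (
	"fmt"
	"os"
	"path/filepath"
)

// StorageInfo / PVInfo stand in for the structs referenced by the interfaces.
type StorageInfo struct {
	Backend string // e.g. "local", "nfs", "ceph"
	Path    string // mount point or export path
}

type PVInfo struct {
	VolumeHandle string
	MountPath    string
}

// LocalDiskEngine is the fallback data engine that carves directories out of
// a node-local base path.
type LocalDiskEngine struct {
	basePath string
}

func (e *LocalDiskEngine) Type() string { return "local" }

func (e *LocalDiskEngine) Configure(engineParam interface{}) error {
	params, ok := engineParam.(map[string]string)
	if !ok {
		return fmt.Errorf("unexpected engineParam type %T", engineParam)
	}
	e.basePath = params["basePath"]
	return nil
}

func (e *LocalDiskEngine) CreateStorage(key string) (string, *StorageInfo, error) {
	dir := filepath.Join(e.basePath, key)
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return "", nil, err
	}
	return key, &StorageInfo{Backend: "local", Path: dir}, nil
}

func (e *LocalDiskEngine) DeleteStorage(id string) error {
	return os.RemoveAll(filepath.Join(e.basePath, id))
}

// NoopCacheEngine models the "no cache" fallback: it exposes the backing
// storage path directly as the PV, without any acceleration layer.
type NoopCacheEngine struct {
	storage *StorageInfo
}

func (c *NoopCacheEngine) Type() string { return "none" }

func (c *NoopCacheEngine) Configure(engineParam interface{}) error {
	info, ok := engineParam.(*StorageInfo)
	if !ok {
		return fmt.Errorf("unexpected engineParam type %T", engineParam)
	}
	c.storage = info
	return nil
}

func (c *NoopCacheEngine) CreateCache(key string) (string, *PVInfo, error) {
	return key, &PVInfo{VolumeHandle: key, MountPath: c.storage.Path}, nil
}

func (c *NoopCacheEngine) DeleteCache(id string) error { return nil }

A Fluid-based ICacheEngine plugin would follow the same shape, with Configure and CreateCache creating the corresponding cache resources instead of passing the storage path through.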

Workflow:

  • Deployment Stage
    The system admin is expected to deploy the distributed storage solution and cache solution in the cluster, then create a StorageClass (as shown below) to let the OPEA CSI Driver know which storage and cache solutions to use and how to use them. Each plugin can define its own CRD for the dataEngine and cacheEngine parameters; the CSI Driver translates these into the engineParam used to configure the DataEngine and CacheEngine plugins.
    If no dataEngine parameter is configured, or no data storage solution is deployed, the CSI Driver falls back to the node's local disk for data access of OPEA workloads.
    If no cacheEngine parameter is configured, or no cache solution is deployed, the CSI Driver falls back to creating PVs without a cache for OPEA workloads.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-opea-storage1
provisioner: opea-storage.csi.k8s.io
reclaimPolicy: Delete
parameters:
   dataEngine: |
       kind:
       apiVersion:
       name:
   cacheEngine: |
       kind:
       apiVersion:
       name:
  • Working Stage
    The OPEA user creates a PVC (PersistentVolumeClaim) as below to use the OPEA storage solution.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-modelA
  annotations:
    opea-storage-key: modelA
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: csi-opea-storage1
  resources:
    requests:
      storage: 10Gi   # required by the PVC API; the size here is illustrative

The PVC should define an annotation with the "opea-storage-key" key to enable data sharing between workloads. The OPEA CSI Driver maintains a "PV Meta Cache" internally: it first checks whether the key for the PVC already exists, and if it does, the OPEA CSI Driver reuses the already created remote storage for the workload (a sketch of this lookup follows the flow description below).

PV Meta Cache example:

Key               Access     Backend
TenantId.ModelA   ReadOnly   NFS
  • General flow
    The general flow is shown in below diagram:

(Diagram: general flow of OPEA storage provisioning; image not included)
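To make the reuse logic concrete, here is a minimal sketch of the PV Meta Cache lookup described above. The PVMetaCache type, its fields, and the provision callback are assumptions used only to illustrate the lookup-or-create flow; the actual driver may structure this differently.

// Sketch of the in-memory "PV Meta Cache" consulted before provisioning.
package driver

import "sync"

// PVMeta is the record kept per opea-storage-key, mirroring the
// Key / Access / Backend columns of the table above.
type PVMeta struct {
	Access  string // e.g. "ReadOnly", "ReadWriteMany"
	Backend string // e.g. "NFS", "Ceph", "local"
	PVName  string // PV already provisioned for this key, if any
}

// PVMetaCache maps opea-storage-key values (e.g. "TenantId.ModelA") to PVMeta.
type PVMetaCache struct {
	mu      sync.Mutex
	entries map[string]*PVMeta
}

func NewPVMetaCache() *PVMetaCache {
	return &PVMetaCache{entries: map[string]*PVMeta{}}
}

// LookupOrCreate returns the existing record for key so the already provisioned
// storage can be reused; otherwise it runs the provision callback (data engine
// plus cache engine calls) and registers the result.
func (c *PVMetaCache) LookupOrCreate(key string, provision func() (*PVMeta, error)) (*PVMeta, bool, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if meta, ok := c.entries[key]; ok {
		return meta, true, nil // reuse the already created remote storage
	}
	meta, err := provision()
	if err != nil {
		return nil, false, err
	}
	c.entries[key] = meta
	return meta, false, nil
}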

Alternatives Considered

Using the Fluid solution directly
Fluid provides a CSI Driver so that it can be used by K8s workloads, but it:

  • Requires installing the Fluid solution
  • Requires the user to know the backend storage solution; different solutions need different Fluid configurations, which cannot support a "write once, run anywhere" experience.
  • Requires the user to have a global view of the cluster for cache management.

Compatibility

N/A; the solution is only used when the user defines a PVC explicitly.

Miscellaneous

Other information users and developers may care about:

Staging plan

  • POC to integrate a data abstraction and acceleration solution (e.g. Fluid) and a distributed storage solution (e.g. Ceph) with OPEA
  • Implement the OPEA CSI driver with None/Fluid as the cache engine and local/NFS as the data engine; enable the E2E flow for ChatQnA with the optimized OPEA storage solution
  • Add more DataEngines and CacheEngines, multi-tenant support, model management, and E2E flows for more OPEA examples

poussa commented Jan 7, 2025

I would still like the OPEA project to use Fluid, or any existing caching solution. Fluid is an existing project with a wide community, targeting data-intensive (e.g., AI) applications. The OPEA project should focus on creating Fluid configurations and deployments for each use case.

I don't see the added value of creating an OPEA CSI caching solution if there is already a solution the project can leverage.


hle2 commented Jan 8, 2025

@poussa Thanks very much for the comments! Let me add some clarifications for reference: this RFC does NOT aim to create an OPEA CSI caching solution; instead, it aims to provide a solution to manage existing caching solutions (e.g. Fluid: https://github.com/fluid-cloudnative/fluid/, Rok: https://www.arrikto.com/rok-data-management-platform/, etc.) and data storage solutions (e.g. local, NFS, Ceph, etc.). Based on the experience of our POC (Fluid + NFS/Ceph), we think the proposal brings the following value to OPEA users:

  1. It makes the use of cache and data storage solutions transparent to OPEA users. OPEA users usually run their workloads in different environments, such as a test environment or a production environment (private cloud, public cloud), and different environments usually have different data storage or cache solutions deployed. It is usually a burden for OPEA users to learn the details of the data storage solution (which is deployed by the cluster admin) if they are not the owner of the cluster; e.g. to use Fluid with Ceph, the user needs to create a Dataset CR with a special Ceph configuration like https://github.com/fluid-cloudnative/fluid/blob/8740eb3212e02b08e80d0692ec04b830d7f0499c/docs/en/samples/s3_configuration.md?plain=1. With the proposed solution, the OPEA CSI driver configures the cache solution (e.g. creates the Fluid Dataset CR) automatically according to the StorageClass created by the cluster admin, and the OPEA user can use the same YAML file to run their workload in different environments (the cluster admin creates StorageClasses with different dataEngine and cacheEngine parameters mapping to the different environments).

  2. It manages the cache and data storage solutions in a more efficient and unified way. E.g., workload1 and workload2 of user1 use model1, and workload3 of user2 also uses model1. Workload1 and workload2 are expected to share the storage and dataset (if Fluid is used) of model1, while workload3 should use a separate storage and dataset (if Fluid is used) for model1. With the proposed solution, the OPEA CSI driver can handle this logic efficiently (e.g. based on tenancy information) in a unified way (e.g. handling all the differences between data storage and cache solutions internally), as sketched below.
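One possible way to express point 2 in code: a tenancy-scoped key determines which workloads share storage. The storageKey helper below is only an illustration of the intended sharing behaviour, not a committed design.

package driver

import "fmt"

// storageKey scopes a model key by tenant, so workloads of the same tenant
// (user1's workload1 and workload2) resolve to the same PV Meta Cache entry
// and share the underlying storage/dataset, while another tenant (user2's
// workload3) resolves to a separate entry for the same model.
func storageKey(tenantID, modelKey string) string {
	return fmt.Sprintf("%s.%s", tenantID, modelKey)
}

// Example:
//   storageKey("user1", "model1") == "user1.model1"  // shared by workload1 and workload2
//   storageKey("user2", "model1") == "user2.model1"  // separate storage for workload3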
