Add Design for Upload Speed Test with BSL Config for Object Storage #1558

Open
wants to merge 3 commits into
base: master
Conversation

shubham-pampattiwar
Member

@shubham-pampattiwar shubham-pampattiwar commented Oct 15, 2024

Why the changes were made

Design for Upload Speed Test CRD and Controller

How to test the changes made

Review the Design

@openshift-ci bot added the do-not-merge/work-in-progress label Oct 15, 2024
openshift-ci bot commented Oct 15, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci bot added the approved label Oct 15, 2024
config:
region: us-east-1
bucket: my-backup-bucket
uploadSpeedTest:
Contributor

If we're not going to support automatic speed checks, I'm confused why the DPA needs to change initially.

Member Author

Whether the check is automatic depends on the user configuration here. Each configured BSL has an uploadSpeedTest struct containing an enabled flag and the uploadSpeedTest config. If the user sets these, the OADP Operator automatically creates the UST CR for that particular BSL; otherwise the user can create the UST CR themselves.
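
For illustration, the opt-in described here could look roughly like the following in a DPA. This is a sketch only: `uploadSpeedTest` and its `enabled` flag come from this thread, but the placement of the struct next to `velero`, the nested `uploadSpeedTestConfig` key, and the `fileSize`/`timeout` field names are assumptions, not taken from the design doc.

```yaml
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: sample-dpa              # hypothetical name
  namespace: openshift-adp
spec:
  backupLocations:
    - velero:
        provider: aws
        default: true
        objectStorage:
          bucket: my-backup-bucket
        config:
          region: us-east-1
        credential:
          name: cloud-credentials
          key: cloud
      # Assumed placement: a sibling of `velero` within this BSL entry.
      uploadSpeedTest:
        enabled: true           # true: the operator creates the UST CR for this BSL
        uploadSpeedTestConfig:  # assumed key; fields below are hypothetical
          fileSize: 100MB
          timeout: 120s
```

With `enabled` unset or false, no UST CR would be created for this BSL, and the user could still create one by hand.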

docs/design/speed-test.md (outdated)
## Abstract

This document presents the design of a Custom Resource Definition (CRD) and its controller to test the upload speed from an OpenShift cluster to cloud object storage.
The design leverages the BackupStorageLocation configuration from the OADP operator’s DPA CRD to specify object storage locations for testing. The controller will use
Contributor

Do we intend to measure only BSLs created by the DPA, and not all BSLs in the OADP namespace?

Member Author

These are two separate things:

  • You can independently create a UST CR for any BSL config (so this works for both OADP and non-OADP BSLs).
  • There is a trigger for OADP BSLs (in this case the DPA controller creates the UST CR).

So I believe all the scenarios are covered.

cloudProviderSecretRef:
  name: cloud-credentials
  namespace: openshift-adp
```
Contributor

Could we change this to have only 2 fields in spec (uploadSpeedTestConfig and backupLocationName)?

cloudProviderSecretRef can be fetched from the BSL, right? So it does not seem needed.

Copying the BSL spec here makes it harder for a user to create one manually. Even with DPA integration, if the BSL spec changes, the UST must change too. Wouldn't a plain name reference simplify the workflow?

Member Author

Yeah, I was thinking exactly the same thing. Maybe make cloudProviderSecretRef optional, or just remove it?

Member Author

Done, removed cloudProviderSecretRef from UST CRD spec, tested AWS POC as well.
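
As a rough sketch of the resulting shape, a manually created UST CR without `cloudProviderSecretRef` might look like the following. The `apiVersion`, the exact `kind` spelling, and the fields under `uploadSpeedTestConfig` are assumptions for illustration. The credential is assumed to come from the `credential` selector that a Velero BSL spec already carries.

```yaml
# Sketch only: apiVersion, kind spelling, and the uploadSpeedTestConfig
# fields are hypothetical and not taken from the design doc.
apiVersion: oadp.openshift.io/v1alpha1
kind: UploadSpeedTest
metadata:
  name: my-upload-speed-test
  namespace: openshift-adp
spec:
  uploadSpeedTestConfig:
    fileSize: 100MB             # hypothetical field
    timeout: 120s               # hypothetical field
  backupLocation:               # same shape as a Velero BSL spec
    provider: aws
    objectStorage:
      bucket: my-backup-bucket
    config:
      region: us-east-1
    credential:                 # assumed to stand in for cloudProviderSecretRef
      name: cloud-credentials
      key: cloud
```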

## High-Level Design
Components involved and their responsibilities:

- UploadSpeedTest (UST) CRD:
Contributor

Having read the whole design, would it make sense to change the CRD name to something more generic (NetworkSpeedTest?)? If more features such as latency and download speed are added, the CRD name would no longer tell users everything it can do.

Member Author

Yeah, I had thought about that. Any network speed test would cover upload as well as download speed, and the risk with NetworkSpeedTest is that it might give the false impression that we also measure download speed, which is not our goal here. Hence UploadSpeedTest.

@shubham-pampattiwar shubham-pampattiwar marked this pull request as ready for review November 12, 2024 02:03
@openshift-ci bot removed the do-not-merge/work-in-progress label Nov 12, 2024
@openshift-ci openshift-ci bot requested review from mrnold and sseago November 12, 2024 02:04
Member

@kaovilai left a comment

We said we want to reference by name, so make .spec.backupLocation.<> optional if .spec.backupLocationName is specified.

metadata:
  name: my-upload-speed-test
spec:
  backupLocation:
Member

Suggested change
- backupLocation:
+ backupStorageLocationName: <name of bsl to test>
+ backupLocation:

Member Author

I think we want UST to be usable independently (in addition to via DPA) by providing the BSLConfig. Thinking of this use case:

  • The user has a BSLConfig,
  • they test the BSLConfig via UST,
  • and then use the BSLConfig to create a BSL.

Member

So we do not want to reference by name to an already existing BSL?

Member Author

Yeah, because we already have the functionality to test a particular BSL via DPA, proposed here. Why keep it bi-directional? I believe this keeps the CRD simpler.

Member

So an upstream Velero user would not be able to use this without copying the whole bsl.spec?

Member

Perhaps that was not the plan.

Contributor

not an upstream feature :)

Member

Sounds like an easy add. If it does not complicate our designs, I would propose we do things in an upstream usable way.

It'd be much cooler to say "you can try our team's tool to debug your bsl speed"
vs
"If you were using OADP, you'd have ability to debug your bsl speed"
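
To make the two options in this thread concrete, here is a hedged sketch of both spec shapes. The by-name field is a reviewer proposal, not part of the design as discussed. Its spelling appears in this thread as both `backupLocationName` and `backupStorageLocationName`, so the name used below is an assumption, as is the BSL name `default`.

```yaml
# Option A (reviewer proposal): reference an existing BSL by name.
# The field name and the BSL name are assumptions for illustration.
spec:
  backupLocationName: default
---
# Option B (author's position): supply the BSL config inline, so the
# config can be tested before any BSL object exists.
spec:
  backupLocation:
    provider: aws
    objectStorage:
      bucket: my-backup-bucket
    config:
      region: us-east-1
```

Option A would let an upstream Velero user point at a BSL they already have. Option B supports the test-before-create use case described earlier in the thread.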

Member

@kaovilai left a comment

Can you clarify which path it would write to?
How will it handle "bucket full" errors?
Let's say I want to test an upload of 2TB and we hit bucket full at 1TB. Do we throw away the current results and error out, or do we preserve the info gathered so far, i.e. how long it took to upload 1TB even though the user specified 2TB?
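
If the second behavior were chosen (keeping what was measured before the failure), the CR status could surface it along these lines. Every field name below is hypothetical: nothing in this thread defines a status schema, and the angle-bracket values are placeholders rather than real results.

```yaml
# Hypothetical status shape for a test that failed partway through.
status:
  phase: Failed
  message: <error returned by the object store>
  requestedSize: 2TB
  uploadedSize: <bytes uploaded before the failure>
  elapsed: <time spent uploading>
  speed: <uploadedSize divided by elapsed>
```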

Member

@kaovilai left a comment

Do we want to have a continuous test as an option?
i.e. don't stop uploading until the user changes spec from running to false.

Will the user see live progress of the current speed (i.e. speed over a short interval like 1s), or will they get results only after the full test?

Contributor

@weshayutin left a comment

This looks good to me. Can you please clarify that when an admin configures the DPA with UST settings, nothing (Velero) is restarted, etc.? Just to make that point obvious.

openshift-ci bot commented Dec 3, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: shubham-pampattiwar, weshayutin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [shubham-pampattiwar]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci bot commented Jan 21, 2025

@shubham-pampattiwar: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files.
4 participants