config, cluster: add an option to halt the cluster scheduling #6498
Signed-off-by: JmPotato <[email protected]>
// HaltScheduling is the option to halt the scheduling. Once it's on, PD will halt the scheduling,
// and any other scheduling configs will be ignored.
HaltScheduling bool `toml:"halt-scheduling" json:"halt-scheduling,string,omitempty"`
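A minimal sketch of how a flag like this might gate a scheduling loop. The names `options`, `scheduler`, and `dispatch` are hypothetical and are not taken from the PD code base; only the semantics (once halted, nothing is scheduled, whatever the other settings say) follow the field comment above.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// options is a hypothetical stand-in for a live, reloadable config.
type options struct {
	haltScheduling atomic.Bool
}

// scheduler is a hypothetical scheduler with its own enable switch.
type scheduler struct {
	name    string
	enabled bool
}

// dispatch returns the names of the schedulers that are allowed to run.
// When the halt flag is on, it returns nothing and ignores every
// per-scheduler setting.
func dispatch(opt *options, scheds []scheduler) []string {
	if opt.haltScheduling.Load() {
		return nil
	}
	var ran []string
	for _, s := range scheds {
		if s.enabled {
			ran = append(ran, s.name)
		}
	}
	return ran
}

func main() {
	opt := &options{}
	scheds := []scheduler{{"balance-region", true}, {"balance-leader", false}}

	fmt.Println(dispatch(opt, scheds)) // halt is off: enabled schedulers run

	opt.haltScheduling.Store(true)
	fmt.Println(dispatch(opt, scheds)) // halt is on: nothing runs
}
```

Checking the flag once at the top of the dispatch path keeps the halt independent of every other scheduling knob, which is the behavior the comment describes.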
Previously, I was trying to introduce a scheduling mode to cover this case. For me, it's OK to use an individual config to control it. Maybe we can name it enable-scheduling or something else.
I think it's best to use a configuration name whose default value is false to control the global scheduling switch. This avoids unexpected behavior in scenarios with compatibility concerns, such as upgrades. From this perspective, names like "disable" or "halt" are more appropriate. At the same time, a global scheduling shutdown should not be long-term, and we already have the concept and operation of "pause" for schedulers. So I ultimately chose the word "halt". WDYT?
While working on #6553, I found that it may be better to use one config for both unsafe recovery and halt, so that we can decouple the dependencies between cluster and coordinator.
Codecov Report
Patch coverage:
Additional details and impacted files
@@ Coverage Diff @@
## master #6498 +/- ##
==========================================
+ Coverage 74.66% 74.97% +0.31%
==========================================
Files 414 410 -4
Lines 42323 41910 -413
==========================================
- Hits 31599 31421 -178
+ Misses 7936 7727 -209
+ Partials 2788 2762 -26
rest LGTM
"dashLength": 10,
"dashes": false,
"datasource": "${DS_TEST-CLUSTER}",
"description": "The allowance status of the scheduling.",
How about putting it near "Scheduler is running"?
But it also makes sense where it is now.
I prefer leaving it here, since it is more of a cluster-level status than a scheduler one. Another reason is that placing it in the Scheduler panel may cause many changes to the Grafana JSON file, whereas appending it here keeps the diff small.
lgtm
@rleungx @binshi-bing PTAL, thx.
/merge
@JmPotato: It seems you want to merge this PR, I will help you trigger all the tests: /run-all-tests
This pull request has been accepted and is ready to merge. Commit hash: e84b299
In response to a cherrypick label: new pull request created to branch
ref tikv#6493 Signed-off-by: ti-chi-bot <[email protected]>
…#6558) ref #6493, ref #6498 Add an option to halt the cluster scheduling. Signed-off-by: husharp <[email protected]> Co-authored-by: husharp <[email protected]>
) ref tikv#6493 Add an option to halt the cluster scheduling. Signed-off-by: JmPotato <[email protected]>
What problem does this PR solve?
Issue Number: ref #6493.
What is changed and how does it work?
Check List
Tests
During the halt:
Release note