mcs: use a controller to manage scheduling jobs #7270
Signed-off-by: Ryan Leung <[email protected]>
server/cluster/cluster.go
Outdated
    c.enabledServices.Store(mcsutils.SchedulingServiceName, true)
} else if !c.schedulingController.running.Load() {
    c.startSchedulingJobs()
    c.enabledServices.Delete(mcsutils.SchedulingServiceName)
The name enabledServices is confusing to me. It reads more like servicesToEnable, since we delete the service name from it after starting.
If the scheduling server is started, enabledServices stores its name; otherwise, the name is deleted from enabledServices.
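The Store/Delete semantics described above can be sketched as follows. This is a minimal illustration, not the PD implementation: serviceRegistry, markStarted, markStopped, and isEnabled are hypothetical names, and schedulingServiceName stands in for mcsutils.SchedulingServiceName.

```go
package main

import (
	"fmt"
	"sync"
)

// schedulingServiceName is an assumed stand-in for mcsutils.SchedulingServiceName.
const schedulingServiceName = "scheduling"

// serviceRegistry is a hypothetical wrapper around a sync.Map that records
// which services currently run as independent servers.
type serviceRegistry struct {
	services sync.Map // service name -> struct{}
}

// markStarted records that the named service is running independently.
func (r *serviceRegistry) markStarted(name string) { r.services.Store(name, struct{}{}) }

// markStopped removes the named service from the registry.
func (r *serviceRegistry) markStopped(name string) { r.services.Delete(name) }

// isEnabled reports whether the named service is currently recorded.
func (r *serviceRegistry) isEnabled(name string) bool {
	_, ok := r.services.Load(name)
	return ok
}

func main() {
	var r serviceRegistry
	fmt.Println(r.isEnabled(schedulingServiceName)) // false
	r.markStarted(schedulingServiceName)
	fmt.Println(r.isEnabled(schedulingServiceName)) // true
	r.markStopped(schedulingServiceName)
	fmt.Println(r.isEnabled(schedulingServiceName)) // false
}
```

Read this way, presence in the map means "an independent server is handling this service", which is the meaning the naming discussion in this thread is trying to capture.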
It is also confusing to me
Any better idea?
How about independentServices? And do we need to maintain this in PD mode?
server/server.go
Outdated
@@ -489,7 +489,7 @@ func (s *Server) startServer(ctx context.Context) error {
     s.safePointV2Manager = gc.NewSafePointManagerV2(s.ctx, s.storage, s.storage, s.storage)
     s.hbStreams = hbstream.NewHeartbeatStreams(ctx, s.clusterID, "", s.cluster)
     // initial hot_region_storage in here.
-    if !s.IsAPIServiceMode() {
+    if !s.IsServiceEnabled(mcs.SchedulingServiceName) {
When running startServer, is the RaftCluster running?
Yes
sc.coordinator.Stop()
sc.cancel()
sc.wg.Wait()
sc.running.Store(false)
It's better to use CompareAndSwap to avoid stopping more than once.
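A minimal sketch of the CompareAndSwap suggestion, assuming the running flag is an atomic.Bool from sync/atomic (Go 1.19+). The controller type and the stops counter are hypothetical; the counter only stands in for the real teardown (coordinator.Stop, cancel, wg.Wait) so the effect is observable.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// controller is a hypothetical stand-in for the scheduling controller.
type controller struct {
	running atomic.Bool
	stops   atomic.Int32 // counts how many times teardown actually ran
}

// start flips running from false to true; it returns false if already running.
func (c *controller) start() bool {
	return c.running.CompareAndSwap(false, true)
}

// stop runs the teardown only for the single caller that wins the
// true -> false swap; every other caller returns immediately.
func (c *controller) stop() bool {
	if !c.running.CompareAndSwap(true, false) {
		return false
	}
	c.stops.Add(1) // stand-in for coordinator.Stop(), cancel(), wg.Wait()
	return true
}

func main() {
	var c controller
	c.start()

	// Several goroutines race to stop the controller.
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.stop()
		}()
	}
	wg.Wait()
	fmt.Println(c.stops.Load()) // 1
}
```

One caveat with this shape: the flag flips before the teardown finishes, so a concurrent start could overlap with a teardown still in progress. The later snippet in this thread guards a plain bool with sc.mu instead, which serializes start and stop entirely.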
server/cluster/cluster.go
Outdated
checkFn := func() {
    if c.isAPIServiceMode {
        once.Do(c.initSchedulers)
        c.enabledServices.Store(mcsutils.SchedulingServiceName, true)
As this code is written, scheduling will not start again in API mode. I think we can delete runServiceCheckJob for now and add it in the next PR, because once this PR is merged, the scheduling service must be deployed.
Makes sense.
Rest LGTM
sc.mu.Lock()
defer sc.mu.Unlock()
if sc.running {
    log.Warn("scheduling service is already running")
It will print frequently
func (c *RaftCluster) GetPausedSchedulerDelayUntil(name string) (int64, error) {
    return c.coordinator.GetSchedulersController().GetPausedSchedulerDelayUntil(name)
// IsServiceIndependent returns whether the service is independent.
func (c *RaftCluster) IsServiceIndependent(name string) bool {
Do we need to add a comment indicating which services are supported? Only scheduling?
Maybe later. Right now, we only have a scheduling service.
lgtm
/merge
@nolouch: It seems you want to merge this PR, I will help you trigger all the tests: /run-all-tests You only need to trigger
This pull request has been accepted and is ready to merge. Commit hash: e807c69
@rleungx: Your PR was out of date, I have automatically updated it for you. If the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.
ref tikv#5839
Signed-off-by: Ryan Leung <[email protected]>
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
What problem does this PR solve?
Issue Number: Ref #5839.
What is changed and how does it work?
This PR uses a controller to start and stop the scheduling jobs, so that they can be controlled dynamically depending on whether a scheduling service is present.
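The start/stop controller idea can be sketched roughly as below. This is an illustrative shape under stated assumptions, not the code merged in this PR: the field names mirror the snippets reviewed in this thread (mu, running, cancel, wg), while stopSchedulingJobs, runJob, and the ticker interval are hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// schedulingController is a simplified, hypothetical controller that owns
// the lifecycle of background scheduling jobs.
type schedulingController struct {
	mu      sync.Mutex
	running bool
	cancel  context.CancelFunc
	wg      sync.WaitGroup
}

// startSchedulingJobs launches the background job unless it is already running.
func (sc *schedulingController) startSchedulingJobs() bool {
	sc.mu.Lock()
	defer sc.mu.Unlock()
	if sc.running {
		return false
	}
	ctx, cancel := context.WithCancel(context.Background())
	sc.cancel = cancel
	sc.wg.Add(1)
	go sc.runJob(ctx)
	sc.running = true
	return true
}

// runJob is a placeholder for a periodic scheduling job; it exits on cancel.
func (sc *schedulingController) runJob(ctx context.Context) {
	defer sc.wg.Done()
	ticker := time.NewTicker(10 * time.Millisecond)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			// Periodic scheduling work would go here.
		}
	}
}

// stopSchedulingJobs cancels the job and waits for it to exit.
// Calling it while stopped is a no-op.
func (sc *schedulingController) stopSchedulingJobs() bool {
	sc.mu.Lock()
	defer sc.mu.Unlock()
	if !sc.running {
		return false
	}
	sc.cancel()
	sc.wg.Wait()
	sc.running = false
	return true
}

func main() {
	sc := &schedulingController{}
	fmt.Println(sc.startSchedulingJobs()) // true
	fmt.Println(sc.startSchedulingJobs()) // false
	fmt.Println(sc.stopSchedulingJobs())  // true
	fmt.Println(sc.stopSchedulingJobs())  // false
}
```

Holding the mutex across both the state check and the teardown keeps start and stop from interleaving; a caller such as a periodic service check could then call start when no scheduling service is found and stop when one appears.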
Check List
Tests
Release note