-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Only scale up zone after all leader zone replicas are ready #164
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tl;dr: if I understand this correctly, I don't believe it will solve the problem we're hoping it will. I think we need to wait until all replicas in the leader StatefulSet are ready.
If I'm following the change here correctly, this will scale up the follower StatefulSet based on the number of ready replicas in the leader StatefulSet.
However, I don't think this will work for the problem we're trying to solve in the context of Mimir's store-gateways, for two reasons.
First: I think it can still lead to a read outage for a tenant, as it's still possible to get into a state where all store-gateways for a tenant aren't ready. This is easiest illustrated with an example:
Imagine we currently have one store-gateway per zone and three zones, and a tenant configured with a replication factor of three (one per zone). Let's say we've just initiated a scale up to 10 store-gateways per zone, and zone B is configured to follow zone A, and zone C follows zone B.
Let's say the tenant is sharded to new store-gateway a-7
in zone A, and all other store-gateways in zone A are ready.
Zone B will then scale up to 9 replicas. Let's say the tenant is sharded to new store-gateway b-3
in zone B.
After some time, all store-gateways except a-7
and b-3
are ready. Zone C will then scale up to 8 replicas. Let's say the tenant is sharded to new store-gateway c-5
in zone C.
At this point, all three store-gateways for the tenant aren't ready, so queries for this tenant will fail.
Second: this will cause a lot of churn in zones B and C. For example, as more and more replicas in zone A come ready, more replicas will be added to zone B. However, each time the number of replicas change, this causes reshuffling of tenants and blocks between store-gateways. So if we gradually scale up zone B as new store-gateways in zone A become ready, zone B will experience quite a bit of churn, and the same will happen in zone C as zone B's new store-gateways become ready.
I think the solution to this is to only scale up the follower zones once all desired replicas of the leader zone are up and ready - this ensures we don't end up in a state with all store-gateways for a tenant in a non-ready state, and minimises the amount of churn in follower zones.
Good catch, thank you. I'll make it so scale-up only occurs once all pods are ready. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
No description provided.