Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add custom topologySpreadConstraints support to coredns #16983

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
177 changes: 177 additions & 0 deletions k8s/crds/kops.k8s.io_clusters.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -3742,6 +3742,183 @@ spec:
type: string
type: object
type: array
topologySpreadConstraints:
description: TopologySpreadConstraints describes how a group of
pods ought to spread across topology domains.
items:
description: TopologySpreadConstraint specifies how to spread
matching pods among the given topology.
properties:
labelSelector:
description: |-
LabelSelector is used to find matching pods.
Pods that match this label selector are counted to determine the number of pods
in their corresponding topology domain.
properties:
matchExpressions:
description: matchExpressions is a list of label selector
requirements. The requirements are ANDed.
items:
description: |-
A label selector requirement is a selector that contains values, a key, and an operator that
relates the key and values.
properties:
key:
description: key is the label key that the selector
applies to.
type: string
operator:
description: |-
operator represents a key's relationship to a set of values.
Valid operators are In, NotIn, Exists and DoesNotExist.
type: string
values:
description: |-
values is an array of string values. If the operator is In or NotIn,
the values array must be non-empty. If the operator is Exists or DoesNotExist,
the values array must be empty. This array is replaced during a strategic
merge patch.
items:
type: string
type: array
x-kubernetes-list-type: atomic
required:
- key
- operator
type: object
type: array
x-kubernetes-list-type: atomic
matchLabels:
additionalProperties:
type: string
description: |-
matchLabels is a map of {key,value} pairs. A single {key,value} in the matchLabels
map is equivalent to an element of matchExpressions, whose key field is "key", the
operator is "In", and the values array contains only "value". The requirements are ANDed.
type: object
type: object
x-kubernetes-map-type: atomic
matchLabelKeys:
description: |-
MatchLabelKeys is a set of pod label keys to select the pods over which
spreading will be calculated. The keys are used to lookup values from the
incoming pod labels, those key-value labels are ANDed with labelSelector
to select the group of existing pods over which spreading will be calculated
for the incoming pod. The same key is forbidden to exist in both MatchLabelKeys and LabelSelector.
MatchLabelKeys cannot be set when LabelSelector isn't set.
Keys that don't exist in the incoming pod labels will
be ignored. A null or empty list means only match against labelSelector.

This is a beta field and requires the MatchLabelKeysInPodTopologySpread feature gate to be enabled (enabled by default).
items:
type: string
type: array
x-kubernetes-list-type: atomic
maxSkew:
description: |-
MaxSkew describes the degree to which pods may be unevenly distributed.
When `whenUnsatisfiable=DoNotSchedule`, it is the maximum permitted difference
between the number of matching pods in the target topology and the global minimum.
The global minimum is the minimum number of matching pods in an eligible domain
or zero if the number of eligible domains is less than MinDomains.
For example, in a 3-zone cluster, MaxSkew is set to 1, and pods with the same
labelSelector spread as 2/2/1:
In this case, the global minimum is 1.
| zone1 | zone2 | zone3 |
| P P | P P | P |
- if MaxSkew is 1, incoming pod can only be scheduled to zone3 to become 2/2/2;
scheduling it onto zone1(zone2) would make the ActualSkew(3-1) on zone1(zone2)
violate MaxSkew(1).
- if MaxSkew is 2, incoming pod can be scheduled onto any zone.
When `whenUnsatisfiable=ScheduleAnyway`, it is used to give higher precedence
to topologies that satisfy it.
It's a required field. Default value is 1 and 0 is not allowed.
format: int32
type: integer
minDomains:
description: |-
MinDomains indicates a minimum number of eligible domains.
When the number of eligible domains with matching topology keys is less than minDomains,
Pod Topology Spread treats "global minimum" as 0, and then the calculation of Skew is performed.
And when the number of eligible domains with matching topology keys equals or greater than minDomains,
this value has no effect on scheduling.
As a result, when the number of eligible domains is less than minDomains,
scheduler won't schedule more than maxSkew Pods to those domains.
If value is nil, the constraint behaves as if MinDomains is equal to 1.
Valid values are integers greater than 0.
When value is not nil, WhenUnsatisfiable must be DoNotSchedule.

For example, in a 3-zone cluster, MaxSkew is set to 2, MinDomains is set to 5 and pods with the same
labelSelector spread as 2/2/2:
| zone1 | zone2 | zone3 |
| P P | P P | P P |
The number of domains is less than 5(MinDomains), so "global minimum" is treated as 0.
In this situation, new pod with the same labelSelector cannot be scheduled,
because computed skew will be 3(3 - 0) if new Pod is scheduled to any of the three zones,
it will violate MaxSkew.
format: int32
type: integer
nodeAffinityPolicy:
description: |-
NodeAffinityPolicy indicates how we will treat Pod's nodeAffinity/nodeSelector
when calculating pod topology spread skew. Options are:
- Honor: only nodes matching nodeAffinity/nodeSelector are included in the calculations.
- Ignore: nodeAffinity/nodeSelector are ignored. All nodes are included in the calculations.

If this value is nil, the behavior is equivalent to the Honor policy.
This is a beta-level feature default enabled by the NodeInclusionPolicyInPodTopologySpread feature flag.
type: string
nodeTaintsPolicy:
description: |-
NodeTaintsPolicy indicates how we will treat node taints when calculating
pod topology spread skew. Options are:
- Honor: nodes without taints, along with tainted nodes for which the incoming pod
has a toleration, are included.
- Ignore: node taints are ignored. All nodes are included.

If this value is nil, the behavior is equivalent to the Ignore policy.
This is a beta-level feature default enabled by the NodeInclusionPolicyInPodTopologySpread feature flag.
type: string
topologyKey:
description: |-
TopologyKey is the key of node labels. Nodes that have a label with this key
and identical values are considered to be in the same topology.
We consider each <key, value> as a "bucket", and try to put balanced number
of pods into each bucket.
We define a domain as a particular instance of a topology.
Also, we define an eligible domain as a domain whose nodes meet the requirements of
nodeAffinityPolicy and nodeTaintsPolicy.
e.g. If TopologyKey is "kubernetes.io/hostname", each Node is a domain of that topology.
And, if TopologyKey is "topology.kubernetes.io/zone", each zone is a domain of that topology.
It's a required field.
type: string
whenUnsatisfiable:
description: |-
WhenUnsatisfiable indicates how to deal with a pod if it doesn't satisfy
the spread constraint.
- DoNotSchedule (default) tells the scheduler not to schedule it.
- ScheduleAnyway tells the scheduler to schedule the pod in any location,
but giving higher precedence to topologies that would help reduce the
skew.
A constraint is considered "Unsatisfiable" for an incoming pod
if and only if every possible node assignment for that pod would violate
"MaxSkew" on some topology.
For example, in a 3-zone cluster, MaxSkew is set to 1, and pods with the same
labelSelector spread as 3/1/1:
| zone1 | zone2 | zone3 |
| P P P | P | P |
If WhenUnsatisfiable is set to DoNotSchedule, incoming pod can only be scheduled
to zone2(zone3) to become 3/2/1(3/1/2) as ActualSkew(2-1) on zone2(zone3) satisfies
MaxSkew(1). In other words, the cluster can still be imbalanced, but scheduler
won't make it *more* imbalanced.
It's a required field.
type: string
required:
- maxSkew
- topologyKey
- whenUnsatisfiable
type: object
type: array
upstreamNameservers:
description: UpstreamNameservers sets the upstream nameservers
for queries not on the cluster domain
Expand Down
2 changes: 2 additions & 0 deletions pkg/apis/kops/cluster.go
Original file line number Diff line number Diff line change
Expand Up @@ -599,6 +599,8 @@ type KubeDNSConfig struct {
MemoryLimit *resource.Quantity `json:"memoryLimit,omitempty"`
// NodeLocalDNS specifies the configuration for the node-local-dns addon
NodeLocalDNS *NodeLocalDNSConfig `json:"nodeLocalDNS,omitempty"`
// TopologySpreadConstraints describes how a group of pods ought to spread across topology domains.
TopologySpreadConstraints []corev1.TopologySpreadConstraint `json:"topologySpreadConstraints,omitempty"`
}

// NodeLocalDNSConfig are options of the node-local-dns
Expand Down
2 changes: 2 additions & 0 deletions pkg/apis/kops/v1alpha2/cluster.go
Original file line number Diff line number Diff line change
Expand Up @@ -578,6 +578,8 @@ type KubeDNSConfig struct {
MemoryLimit *resource.Quantity `json:"memoryLimit,omitempty"`
// NodeLocalDNS specifies the configuration for the node-local-dns addon
NodeLocalDNS *NodeLocalDNSConfig `json:"nodeLocalDNS,omitempty"`
// TopologySpreadConstraints describes how a group of pods ought to spread across topology domains.
TopologySpreadConstraints []corev1.TopologySpreadConstraint `json:"topologySpreadConstraints,omitempty"`
}

// NodeLocalDNSConfig are options of the node-local-dns
Expand Down
2 changes: 2 additions & 0 deletions pkg/apis/kops/v1alpha2/zz_generated.conversion.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

7 changes: 7 additions & 0 deletions pkg/apis/kops/v1alpha2/zz_generated.deepcopy.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 2 additions & 0 deletions pkg/apis/kops/v1alpha3/cluster.go
Original file line number Diff line number Diff line change
Expand Up @@ -564,6 +564,8 @@ type KubeDNSConfig struct {
MemoryLimit *resource.Quantity `json:"memoryLimit,omitempty"`
// NodeLocalDNS specifies the configuration for the node-local-dns addon
NodeLocalDNS *NodeLocalDNSConfig `json:"nodeLocalDNS,omitempty"`
// TopologySpreadConstraints describes how a group of pods ought to spread across topology domains.
TopologySpreadConstraints []corev1.TopologySpreadConstraint `json:"topologySpreadConstraints,omitempty"`
}

// NodeLocalDNSConfig are options of the node-local-dns
Expand Down
2 changes: 2 additions & 0 deletions pkg/apis/kops/v1alpha3/zz_generated.conversion.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

7 changes: 7 additions & 0 deletions pkg/apis/kops/v1alpha3/zz_generated.deepcopy.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

7 changes: 7 additions & 0 deletions pkg/apis/kops/zz_generated.deepcopy.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Original file line number Diff line number Diff line change
Expand Up @@ -145,6 +145,9 @@ spec:
{{ ToYAML .KubeDNS.Affinity | indent 8 }}
{{- end }}
topologySpreadConstraints:
{{- if .KubeDNS.TopologySpreadConstraints }}
{{ ToYAML .KubeDNS.TopologySpreadConstraints | indent 6 }}
{{- else }}
- maxSkew: 1
topologyKey: "topology.kubernetes.io/zone"
whenUnsatisfiable: ScheduleAnyway
Expand All @@ -158,6 +161,7 @@ spec:
labelSelector:
matchLabels:
k8s-app: kube-dns
{{- end }}
containers:
- name: coredns
image: {{ if KubeDNS.CoreDNSImage }}{{ KubeDNS.CoreDNSImage }}{{ else }}registry.k8s.io/coredns/coredns:v1.11.3{{ end }}
Expand Down
Loading