Worker node role can't be set #6750

Open
vhurtevent opened this issue Jan 17, 2023 · 11 comments
@vhurtevent

Bug Report

When creating a cluster, I want the worker nodes to have an explicit role, as displayed by the kubectl describe node command.

I tried to set the worker role by setting a node label in the machine config spec:

machine:
  nodeLabels:
    node-role.kubernetes.io/worker: "true"

When I query NodeLabel with talosctl, the label exists:

But the label isn't set on the nodes, and their role is still <none>.

Logs

In the logs, we can see this error:

[ 83.519643] [talos] controller failed {"component": "controller-runtime", "controller": "k8s.NodeLabelsApplyController", "error": "1 error(s) occurred:\n\tnodes \"dbaas1-worker-0\" is forbidden: is not allowed to modify labels: node-role.kubernetes.io/worker"}
It looks like a label in a protected domain, but how can we set the role through Talos node provisioning?

Environment

  • Talos version: 1.3.2
  • Kubernetes version: 1.24.9
  • Platform: OpenStack
@andrewrynhard
Member

This label is not allowed to be set by the kubelet, and for the same reason it is unsafe for Talos to set it. Allowing this would let a worker node promote itself and potentially gain privileges it shouldn't have.

@vhurtevent
Author

Hello @andrewrynhard,

Thank you for your answer; I understand the security problem.

In my use case, I would like to distinguish worker nodes, which only run workloads, from edge nodes, which I dedicate to running Ingress controllers and which are the only backend members of my L4 load balancers.

Do you suggest that I drop node-role.kubernetes.io/<any role> and use a label with a fully custom domain and value, which Talos could set through the machine.nodeLabels spec?

Thank you

@smira
Member

smira commented Jan 18, 2023

You can set this label outside of Talos as the last provisioning step, or have the node label itself with something like "my.dev/role" and let something with the appropriate permissions add a matching node-role label. By Kubernetes design, a worker node can't put a role label on itself, so something else, running either in the cluster or outside of it, has to do that.
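A minimal sketch of the second option, under stated assumptions: the node sets the custom key my.dev/role on itself through machine.nodeLabels, and a process running with credentials that may label nodes (for example an admin kubeconfig) copies it into node-role.kubernetes.io/<role>. The function name apply_role_labels and the KUBECTL override are hypothetical; only standard kubectl commands are used.

```shell
# Hypothetical sketch: copy a custom, kubelet-settable role label
# (my.dev/role) into the restricted node-role.kubernetes.io/<role> label.
# Run it with credentials that may label nodes (e.g. an admin kubeconfig),
# not with the node's own kubelet/Talos identity.

# Reads "NODE ROLE" pairs on stdin and labels each node with its role.
# Set KUBECTL=echo to print the kubectl arguments instead of running them.
apply_role_labels() {
  while read -r node role; do
    # Skip blank lines and nodes without a role value.
    if [ -z "$node" ] || [ -z "$role" ]; then
      continue
    fi
    # An empty value is enough: kubectl derives the role from the key suffix.
    "${KUBECTL:-kubectl}" label node "$node" "node-role.kubernetes.io/${role}=" --overwrite
  done
}

# Example (assumes the nodes already carry my.dev/role):
#   kubectl get nodes -l my.dev/role --no-headers \
#     -o custom-columns='NAME:.metadata.name,ROLE:.metadata.labels.my\.dev/role' \
#     | apply_role_labels
```

Running this as the last provisioning step (or periodically) keeps the roles in sync; an in-cluster controller watching Node objects would be the more robust version of the same idea.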

@sergelogvinov
Contributor

Can we add node-label validation for this?

As far as I know, these labels cannot be set by the kubelet:

node-role.kubernetes.io
kubernetes.io/role
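As a rough illustration of such a validation, here is a shell check modelled on the NodeRestriction rules as documented upstream: the node-restriction.kubernetes.io prefix is always rejected, and other kubernetes.io and k8s.io labels are rejected unless they are one of the well-known kubelet labels or live under kubelet.kubernetes.io or node.kubernetes.io. The function name kubelet_may_set_label is hypothetical, and the allow-list should be checked against the Kubernetes version in use. Under these rules both labels listed above are rejected, which matches the error in the original report.

```shell
# Hypothetical sketch modelled on the upstream NodeRestriction rules.
# Returns 0 if a node's own credentials (kubelet, and likewise Talos) may set
# label KEY on its Node object, 1 if the API server would reject it.
kubelet_may_set_label() {
  case $1 in
    */*) ns=${1%%/*} ;;   # label prefix before the "/"
    *) return 0 ;;        # unprefixed keys are not restricted
  esac

  # node-restriction.kubernetes.io labels are reserved for administrators.
  case $ns in
    node-restriction.kubernetes.io|*.node-restriction.kubernetes.io) return 1 ;;
  esac

  # Prefixes outside kubernetes.io and k8s.io are not restricted.
  case $ns in
    kubernetes.io|*.kubernetes.io|k8s.io|*.k8s.io) ;;
    *) return 0 ;;
  esac

  # Well-known labels the kubelet is allowed to manage.
  case $1 in
    kubernetes.io/hostname|kubernetes.io/os|kubernetes.io/arch|\
    beta.kubernetes.io/os|beta.kubernetes.io/arch|\
    beta.kubernetes.io/instance-type|node.kubernetes.io/instance-type|\
    topology.kubernetes.io/region|topology.kubernetes.io/zone|\
    failure-domain.beta.kubernetes.io/region|failure-domain.beta.kubernetes.io/zone)
      return 0 ;;
  esac

  # Allowed namespaces: kubelet.kubernetes.io and node.kubernetes.io.
  case $ns in
    kubelet.kubernetes.io|*.kubelet.kubernetes.io|node.kubernetes.io|*.node.kubernetes.io)
      return 0 ;;
  esac

  # Any other kubernetes.io / k8s.io label is rejected.
  return 1
}

# Example: flag restricted keys in a machine config (assumes mikefarah yq v4
# and a hypothetical worker.yaml):
#   yq '.machine.nodeLabels | keys | .[]' worker.yaml | while read -r key; do
#     kubelet_may_set_label "$key" || echo "restricted: $key"
#   done
```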

@nogweii

nogweii commented Jun 3, 2024

Adding validation to catch this configuration error would be very much appreciated, as I didn't realize this.

Special handling would also be very nice, but I think it would have to happen in talosctl when it parses a machine's configuration, rather than in Talos itself.

@sergelogvinov
Contributor

@nogweii

nogweii commented Jun 3, 2024

Interesting! @sergelogvinov, not to go too far off-topic, but does talos-ccm work in a bare-metal cluster running in a homelab? (I'm running a Talos cluster on a Turing Pi 2 with RK1 compute modules.)

@sergelogvinov
Contributor

Talos CCM works inside a Talos cluster :) It doesn't matter whether Talos runs in a cloud or on bare metal.

@mydoomfr

I'm unable to set any nodeLabels when bootstrapping worker nodes.

I'm using Talm to set up the worker node, but I don't think it's an issue on Talm's side, because I can see the nodeLabels values in the machine configuration with the talosctl command.

Reproduce the issue

  • Tested on Talos v1.7.4
  • The issue only concerns the worker machine type. It works as expected with the controlplane type.

1. Reset the worker node, then apply the configuration

machine:
  nodeLabels:
    node.cloudprovider.kubernetes.io/platform: proxmox
    topology.kubernetes.io/region: Region-1
    topology.kubernetes.io/zone: pve03
    # truncated
talm apply -f nodes/worker-01.yaml -i

2. Wait for the worker node to join the cluster and describe the node labels

kubectl describe node worker-01
Name:               worker-01
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=worker-01
                    kubernetes.io/os=linux
Annotations:        node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Thu, 20 Jun 2024 22:52:21 +0200
Taints:             <none>
Unschedulable:      false

3. Ensure nodeLabels is correctly set up in the machine configuration

talosctl get mc --nodes 192.168.100.21 -e 192.168.100.21 --talosconfig=./talosconfig -oyaml | yq -r '.spec.machine.nodeLabels'
node.cloudprovider.kubernetes.io/platform: proxmox
topology.kubernetes.io/region: Region-1
topology.kubernetes.io/zone: pve03

Workaround: Set the labels via kubectl after the nodes join the cluster

kubectl label node worker-01 node.cloudprovider.kubernetes.io/platform=proxmox
kubectl label node worker-01 topology.kubernetes.io/region=Region-1
kubectl label node worker-01 topology.kubernetes.io/zone=pve03
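The workaround can be scripted so that the labels are still maintained only in the machine config. A minimal sketch, assuming yq prints nodeLabels as a flat key: value map as in step 3 above; node_labels_to_args is a hypothetical helper. Since kubectl runs with the operator's credentials rather than the node's, NodeRestriction does not limit which labels it can set.

```shell
# Hypothetical helper: turn the flat "key: value" map printed by
#   yq -r '.spec.machine.nodeLabels'
# into key=value arguments for a single `kubectl label` call.
node_labels_to_args() {
  while IFS= read -r line; do
    case $line in
      ''|'#'*) continue ;;   # skip blank lines and comments
    esac
    key=${line%%:*}          # label keys and values never contain ":"
    value=${line#*:}
    value=${value# }         # drop the space after the colon
    # Strip optional surrounding quotes, e.g. "true" -> true.
    value=${value#\"}; value=${value%\"}
    value=${value#\'}; value=${value%\'}
    printf '%s=%s\n' "$key" "$value"
  done
}

# Example, reusing the commands from the steps above:
#   kubectl label node worker-01 --overwrite $(
#     talosctl get mc --nodes 192.168.100.21 -e 192.168.100.21 \
#       --talosconfig=./talosconfig -oyaml |
#     yq -r '.spec.machine.nodeLabels' | node_labels_to_args)
```

The command substitution is left unquoted on purpose: label keys and values cannot contain whitespace, so word splitting yields one argument per label.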

I can open a new issue if needed.

@smira
Member

smira commented Jun 28, 2024

Please see the NodeRestriction documentation: this admission plugin is enabled by default on the Kubernetes side, and there's nothing we can do on the Talos side to work around it.

If you use labels which are not restricted, the Kubernetes API server will allow them to be set. Talos Linux has the same level of access as the kubelet running on the node, so the same restrictions apply to it.

There might be some better way to do config validation/documentation, but there is no "fix" whatsoever, except for changing the admission controller rules.

@danieljkemp

Just throwing it out there that the docs are still somewhat lacking on this. For a worker/storage node, I had to dig up the kubelet args and set:

machine:
  kubelet:
    extraArgs:
      node-labels: "node.kubernetes.io/instance-type=ceph-storage"
      register-with-taints: "node.kubernetes.io/instance-type=ceph-storage:NoSchedule"

instead of using the intuitive

machine:
  nodeLabels:
  nodeTaints:

Maybe those would work on a controlplane node?
