talosctl: installing rook on docker provisioned cluster corrupts host's LUKS partition #5519

Open
sauterp opened this issue May 9, 2022 · 7 comments

Comments

@sauterp
Contributor

sauterp commented May 9, 2022

Bug Report

Description

  1. I provisioned a Talos cluster with Docker on Fedora 35: talosctl cluster create --wait --extra-disks 1 --workers 3
  2. I followed this guide and installed Rook.

After I rebooted my machine, it no longer booted. All my partitions were intact except the LUKS partition, which had been reformatted as cephBluestore.
I haven't tried to reproduce the issue, since that would mean setting up my machine from scratch again. It's possible that something else I did caused the problem.

  • talosctl version
Client:
	Tag:         v1.0.1
	SHA:         65d872ed
	Built:
	Go version:  go1.17.8
	OS/Arch:     linux/amd64
  • Platform: Fedora 35
@smira
Member

smira commented May 12, 2022

The root cause is that talosctl cluster create does the equivalent of docker run --privileged, and that exposes host block devices to the container, which in turn exposes them to pods running on Kubernetes in Talos inside the container. So Rook can detect and mistakenly try to use a host block device.

This feels like a bug to me, and we should fix it. The problem is that I don't see an equivalent of --privileged among the other options in the Docker API that would allow us to disable device passthrough.
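
To see the exposure described above for yourself, here is a minimal sketch that lists the block device nodes visible under /dev in whatever environment it runs in. The function name `block_device_nodes` is hypothetical and not part of Talos or Rook. The assumption is that, in a privileged container, host disks show up as ordinary block device nodes directly under /dev.

```python
import os
import stat


def block_device_nodes(dev_dir="/dev"):
    """Return the sorted names of block device nodes directly under dev_dir.

    Hypothetical helper for illustration only. Symlinks are not followed,
    so only real device nodes are reported.
    """
    found = []
    try:
        entries = list(os.scandir(dev_dir))
    except OSError:
        # Directory is missing or unreadable: nothing to report.
        return found
    for entry in entries:
        try:
            mode = entry.stat(follow_symlinks=False).st_mode
        except OSError:
            # Entry vanished or cannot be inspected: skip it.
            continue
        if stat.S_ISBLK(mode):
            found.append(entry.name)
    return sorted(found)
```

Calling `block_device_nodes()` inside the container and comparing the result with the host's own disks should show whether any host block devices are reachable before a storage operator such as Rook is installed.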

@sanmai-NL
Contributor

sanmai-NL commented Apr 16, 2024

@smira Why wrap the Docker CLI tool in the first place? It's opaque and pretty dangerous, as it turns out here. Someone used to running Linux containers shouldn't be discouraged by having to issue a lengthy command line. In fact, they could use the compose (Docker Engine, Podman, nerdctl) and/or kube play (Podman) subcommands and you could define a sample spec in YAML in the docs, to keep it brief.

@sanmai-NL
Contributor

@smira

This feels like a bug to me, and we should fix it. The problem is that I don't see an equivalent of --privileged among the other options in the Docker API that would allow us to disable device passthrough.

Do I understand correctly that you want everything --privileged provides, but without the host's /dev/, or more specifically its block devices, being exposed inside the container? Would https://docs.docker.com/reference/cli/docker/container/run/#device-cgroup-rule help you restrict that?

But see also: #4385 (comment)
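
For readers unfamiliar with the rule syntax behind the option linked above, here is a small sketch that parses a device cgroup rule string of the form "type major:minor permissions", such as "b 8:* rwm". The names `DeviceRule` and `parse_device_cgroup_rule` are hypothetical, and the validation pattern is my own assumption about the accepted format. It illustrates the syntax only. It does not show that such a rule can narrow what --privileged already grants, which is the open question here.

```python
import re
from typing import NamedTuple

# Assumed format: device type (a = all, b = block, c = char),
# major:minor (a number or * each), and 1-3 permission letters
# (r = read, w = write, m = mknod).
_RULE = re.compile(r"([abc]) (\d+|\*):(\d+|\*) ([rwm]{1,3})")


class DeviceRule(NamedTuple):
    dev_type: str
    major: str
    minor: str
    perms: str


def parse_device_cgroup_rule(rule):
    """Split a rule such as 'b 8:* rwm' into its four fields.

    Hypothetical helper for illustration; raises ValueError when the
    string does not match the assumed format.
    """
    match = _RULE.fullmatch(rule)
    if match is None:
        raise ValueError(f"not a valid device cgroup rule: {rule!r}")
    return DeviceRule(*match.groups())
```

For example, `parse_device_cgroup_rule("b 8:* rwm")` yields a block-device rule for major number 8 with any minor number and all three permissions.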

@rothgar
Member

rothgar commented Oct 2, 2024

Maybe we should disable --extra-disks when Docker is the provisioner. It's dangerous, and anyone who wants to add extra disks probably wants a stronger isolation boundary from their host, so we should force them to use QEMU or similar VM isolation.

@sanmai-NL
Contributor

Your latter assertion does not apply in our case. Due to restrictions, we cannot run Talos Linux on bare metal or as a VM. But we still need raw block device access.

@rothgar
Member

rothgar commented Oct 3, 2024

Running Talos inside Docker isn't recommended for anything beyond testing and learning. It's similar to kind, which you can run locally, but it's not a full Kubernetes experience. Talos in Docker (or any container) will have a variety of limitations in areas like networking and extensions.

@sanmai-NL
Contributor

I know. Yet the alternative is not using Talos Linux at all.
