Add simple service to comment / in fstab #3372

Open
champtar wants to merge 1 commit into base: main

@champtar champtar commented Jan 18, 2025

ostree modes have conflicting needs for the / mount:

  • "hardlinks" mode needs / to be rw
  • composefs mode needs / to be ro

Some installation methods (at least Anaconda) add / to fstab,
so when systemd-remount-fs.service tries to remount
the composefs / rw, it fails because it can only be ro.

To be able to edit /etc this early during boot, it relies on
having the 'rw' karg.

bootc has a systemd generator to edit fstab but not everyone uses bootc (yet),
and it adds 'ro' instead of commenting the whole line,
which breaks disabling composefs (downgrade).

We only comment out the line when its mount options are exactly 'defaults'.
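
The rule above can be sketched in a few lines. This is only an illustration in Python, not the sed expression from the patch (which this thread does not show); the function name `comment_root_defaults` and the sample entries are made up for the example. It comments out an fstab entry only when the mount point is / and the options field is exactly `defaults`, and leaves every other line alone.

```python
def comment_root_defaults(fstab_text: str) -> str:
    """Comment out the / entry when its options field is exactly 'defaults'."""
    out = []
    for line in fstab_text.splitlines(keepends=True):
        fields = line.split()
        # A real entry has at least: device, mount point, fs type, options.
        is_entry = len(fields) >= 4 and not fields[0].startswith("#")
        if is_entry and fields[1] == "/" and fields[3] == "defaults":
            line = "# " + line
        out.append(line)
    return "".join(out)


if __name__ == "__main__":
    sample = (
        "UUID=abcd / xfs defaults 0 0\n"
        "UUID=efgh /boot xfs defaults 0 0\n"
    )
    print(comment_root_defaults(sample), end="")
```

Because an already commented line starts with `#`, running the function a second time changes nothing, which matters for a unit that runs on every boot.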

Fixes #3193

Notes:
This was tested with EL 9.5.
The idea is to have a common fix for this step towards composefs, maybe disabled by default in the packages but ready to use by image maintainers if they know it's safe.

openshift-ci bot commented Jan 18, 2025

Hi @champtar. Thanks for your PR.

I'm waiting for a ostreedev member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@champtar
Author

Seeing https://bugzilla.redhat.com/show_bug.cgi?id=2332319#c1, I will change the regex a bit to match ' defaults '

@champtar champtar marked this pull request as draft January 18, 2025 03:42
@cgwalters
Member

Only tangential to this PR: 👋 @champtar thanks for all of your recent comments and work, it's very much appreciated! I would like to help support you for sure as well. One thing on the back of my mind is to try to recreate an "ostree/bootc community dev meeting" and you'd be one of the people I'd like to invite and make sure we can work through design and goals etc.

Moving on to the actual issue at hand here:

> bootc has a systemd generator to edit fstab but not everyone uses bootc (yet).

Yes, but that systemd generator runs even if bootc is not used - so it should just work to add bootc to your image, right? Any reason that would be a problem for you?

I'm not opposed inherently to carrying a reimplementation of that here, but I'm certainly not excited by it, especially in shell script. The bootc version has unit tests, etc.

@champtar champtar marked this pull request as ready for review January 20, 2025 02:41
@champtar
Author

> Only tangential to this PR: 👋 @champtar thanks for all of your recent comments and work, it's very much appreciated!

My pleasure! I've been using rpm-ostree for 3 years and it has been a pretty smooth ride;
even EL 8 -> EL 9 was invisible to our users.

> I would like to help support you for sure as well.
> One thing on the back of my mind is to try to recreate an "ostree/bootc community dev meeting" and you'd be one of the people I'd like to invite and make sure we can work through design and goals etc.

I could definitely join such meetings.

Right now we have 2 appliance products based on rpm-ostree and we want to switch more.
Switching to bootc would let us have one team responsible for the base OS, while the product teams just put their stuff on top with a simple Containerfile, without knowledge of rpm-ostree/bootc.

What we do that is a bit different than other rpm-ostree users:

  • offline: updates are pushed onto the servers (SSH or WebUI), and the update package is the install ISO (simpler). It's common to have no IT infrastructure and no outgoing internet connection
  • upgrade: we try not to impose any restrictions (no required version path, no manual actions, ...); big version jumps happen often
  • downgrade (not rollback): also allowed without version restriction. You might need to wipe the product config, but there is no need to reinstall the base OS, which is pretty important if you don't have OOB (or have crappy OOB)

Moving on to the actual issue at hand here:

>> bootc has a systemd generator to edit fstab but not everyone uses bootc (yet).
>
> Yes, but that systemd generator runs even if bootc is not used - so it should just work to add bootc to your image, right? Any reason that would be a problem for you?

Right now it adds 'ro' instead of commenting the line, breaking downgrades.

> I'm not opposed inherently to carrying a reimplementation of that here, but I'm certainly not excited by it, especially in shell script. The bootc version has unit tests, etc.

Not excited either; it's cleaner to write in Rust and have unit tests, but bootc is really not the right place for this IMO.

If we stick to commenting / with 'defaults', this simple call to sed is easy enough to review.

Side note: using a generator for a single unit seems weird to me, and makes it harder to inspect.

@cgwalters
Member

cgwalters commented Jan 20, 2025

> Not excited either, it's cleaner to write in rust and have unit tests, but bootc is really not the right place for this IMO.

But you didn't really answer the question: anything blocking you from just adding bootc to your images?

As far as it being the right place, I'd agree it's not: but I don't really think ostree is a lot more "right" either.

Also, there is the issue at the moment that this logically overlaps with the bootc one and I'd like to not support both.

EDIT: To be clear especially thinking about problems like "what if they run concurrently" etc

@cgwalters
Member

> Right now it adds 'ro' instead of commenting the line, breaking downgrades.

Sorry, you did explain here why the bootc one doesn't work for you. Okay, but just commenting the line out means that any system that wasn't using rootflags= kargs and has non-default options in /etc/fstab for / stops having those options honored.

I'm not quite comfortable just doing that by default either, although it's probably a pretty small set.

@cgwalters
Member

> Right now it adds 'ro' instead of commenting the line, breaking downgrades.

I think we could change the bootc one to do the inverse change if we detect the situation where / is not composefs and we did find the fstab edit line

@champtar
Author

>> Not excited either, it's cleaner to write in rust and have unit tests, but bootc is really not the right place for this IMO.
>
> But you didn't really answer the question: anything blocking you from just adding bootc to your images?

I haven't played enough with bootc yet. If bootc-fstab-edit switched to commenting the whole line it would be OK, I think,
but that would be 35MB for just bootc-fstab-edit (7.05 MB + skopeo 29.18 MB).
Also, we use local layering with rpm-ostree right now, so I might need to wait for https://gitlab.com/fedora/bootc/tracker/-/issues/4 to really switch to bootc.

> As far as it being the right place, I'd agree it's not: but I don't really think ostree is a lot more "right" either.

It affects all ostree users (depending on installer), and keeping the migration script with the project that needs it makes sense to me.

> Also, there is the issue at the moment that this logically overlaps with the bootc one and I'd like to not support both.
> EDIT: To be clear especially thinking about problems like "what if they run concurrently" etc

I've already put a Before=bootc-fstab-edit.service; we could turn it into an After= if preferred.

@cgwalters
Member

> but that would be 35MB for just bootc-fstab-edit (7.05 MB + skopeo 29.18 MB)

Do you already ship podman?

@champtar
Author

>> Right now it adds 'ro' instead of commenting the line, breaking downgrades.
>
> I think we could change the bootc one to do the inverse change if we detect the situation where / is not composefs and we did find the fstab edit line

A bit over-engineered IMO, and it doesn't work if you downgrade to a version without this new bootc fix (or a version without bootc).
I would comment the whole line if options == 'defaults' (what is now done here), else print a warning pointing the user to some documentation on how to fix it manually.
Or, if we are confident fstab is correct, move the non-default options to rootflags= automatically and comment the line in fstab.
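
As a rough illustration of the two alternatives in this comment, the sketch below takes the options field of the / entry and decides what to do with it. It is an assumption-laden example, not code from ostree or bootc: `plan_root_entry` and its return values are hypothetical, and it does not check whether a given option is actually accepted in rootflags=.

```python
from typing import Optional, Tuple


def plan_root_entry(options: str) -> Tuple[str, Optional[str]]:
    """Decide how to handle the / fstab entry based on its options field.

    Returns ("comment", None) when the options are exactly 'defaults',
    otherwise ("rootflags", "<karg>") with the remaining options carried
    over into a rootflags= kernel argument.
    """
    opts = [o for o in options.split(",") if o]
    extra = [o for o in opts if o != "defaults"]
    if not extra:
        return ("comment", None)
    # Hypothetical: a real tool would have to validate each option here,
    # or fall back to printing a warning and leaving fstab untouched.
    return ("rootflags", "rootflags=" + ",".join(extra))


if __name__ == "__main__":
    print(plan_root_entry("defaults"))
    print(plan_root_entry("defaults,usrquota"))
```

The warning-only variant would simply report the second case instead of returning a karg.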

@champtar
Author

>> but that would be 35MB for just bootc-fstab-edit (7.05 MB + skopeo 29.18 MB)
>
> Do you already ship podman?

No, k8s / containerd for the container tools for now

@cgwalters
Member

One thing I don't like about the unit here is that the way sed works, it applies changes line by line, and writes a new version of the file, even if nothing changed. The bootc generator at least is careful to no-op if nothing needs changing.
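
To make the no-op behaviour concrete, here is a minimal sketch in Python of "only write when something changed". It is not the bootc generator's code; `write_if_changed` is a hypothetical helper, and it assumes the file is small enough to read into memory and that replacing it by rename is acceptable (ownership and SELinux labels are not handled here).

```python
import os
import shutil
import tempfile


def write_if_changed(path: str, new_text: str) -> bool:
    """Replace `path` only when its content differs; return True if rewritten."""
    with open(path, encoding="utf-8") as f:
        if f.read() == new_text:
            return False  # nothing to do: the file is left untouched
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(new_text)
            f.flush()
            os.fsync(f.fileno())
        shutil.copymode(path, tmp)  # keep the original permission bits
        os.replace(tmp, path)       # atomic rename within the same directory
    except BaseException:
        os.unlink(tmp)
        raise
    return True
```

The early return is the point: when the transformed text equals what is already on disk, no temporary file is created and the original inode and mtime stay as they were.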

The more I think about it, the more I'd like to try to follow up on a systemd-side fix (ref #3193 (comment) ).

Also an overall fundamental issue with editing /etc/fstab like this in boot (that applies to the bootc one too) is that by the time we do it, systemd-fstab-generator has already run. It mostly works because systemd-remount-fs parses it again, but I could imagine that changing.

So... if we were to try an ostree-side generic fix like this, I think I'd prefer adding this as part of deployment finalization, similar to how we handle SELinux policy recompilation. Perhaps in general, we could try to split up ostree-finalize-staged.service into phases:

  • Create the new deployment dir with the new /etc (not running one)
  • Run arbitrary units on the new deployment root which can edit that /etc
  • Perform bootloader swap, etc.

Actually, something that has come up in the past too is supporting running code from the new deployment (ref containers/bootc#640 ). If we did that, it'd be a very powerful generic tool, as it eases transitions like this: you don't need to have intermediate "fix up state so we can go to new OS version" releases.

Member

@cgwalters cgwalters left a comment

Marking requested changes per review

@cgwalters
Member

Side note, I will probably change bootc to flock when writing /etc/fstab and this service probably should too (if we do end up choosing to ship it here, or you choose to ship it in your OS builds). Unit ordering is good of course but I think locking here would add a lot more reliability.
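
For reference, a minimal sketch of what flock-protected editing could look like, in Python. This is not what bootc does or will do; `edit_locked` is a hypothetical helper, and locking the fstab file itself (rather than a separate lock file) is an assumption. Both writers would have to agree on the same lock target for this to help, since flock is advisory.

```python
import fcntl
from typing import Callable


def edit_locked(path: str, transform: Callable[[str], str]) -> bool:
    """Apply transform(text) -> text to `path` under an exclusive flock.

    Returns True if the file content was changed.
    """
    with open(path, "r+", encoding="utf-8") as f:
        # Blocks until every other cooperating writer has released the lock.
        fcntl.flock(f, fcntl.LOCK_EX)
        try:
            old = f.read()
            new = transform(old)
            changed = new != old
            if changed:
                # Rewrite in place so the locked inode stays the same.
                f.seek(0)
                f.truncate()
                f.write(new)
                f.flush()
            return changed
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```

The in-place rewrite is a deliberate trade-off in this sketch: replacing the file by rename would swap in a new inode that the other process's lock does not cover, so a rename-based writer would want a separate, agreed-upon lock file instead.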

@champtar
Author

champtar commented Feb 1, 2025

> One thing I don't like about the unit here is that the way sed works, it applies changes line by line, and writes a new version of the file, even if nothing changed. The bootc generator at least is careful to no-op if nothing needs changing.

TIL! I thought sed was a bit smarter, but I just looked at strace and you are 100% right: it always replaces the file even when there are no changes. That could be fixed by using ExecCondition=grep ...

> The more I think about it, the more I'd like to try to follow up on a systemd-side fix (ref #3193 (comment) ).

Is systemd-remount-fs actually needed when we have the rw karg? Maybe instead of fixing /etc/fstab we can just skip systemd-remount-fs with something like ConditionKernelCommandLine=!rw (I sent an email to systemd-devel with you in CC).

> Also an overall fundamental issue with editing /etc/fstab like this in boot (that applies to the bootc one too) is that by the time we do it, systemd-fstab-generator has already run. It mostly works because systemd-remount-fs parses it again, but I could imagine that changing.

Hopefully by that time this migration is not needed anymore :), and the advantage is that it doesn't need an intermediate version for the upgrade.

> So... if we were to try an ostree-side generic fix like this, I think I'd prefer adding this as part of deployment finalization, similar to how we handle SELinux policy recompilation. Perhaps in general, we could try to split up ostree-finalize-staged.service into phases:
> * Create the new deployment dir with the new /etc (not running one)
> * Run arbitrary units on the new deployment root which can edit that /etc
> * Perform bootloader swap, etc.
>
> Actually, something that has come up in the past too is supporting running code from the new deployment (ref containers/bootc#640 ). If we did that, it'd be a very powerful generic tool, as it eases transitions like this: you don't need to have intermediate "fix up state so we can go to new OS version" releases.

This is a good long-term plan, and will definitely be good for the next migration, but for this composefs migration it gets us back to needing an intermediate version.

@champtar
Author

champtar commented Feb 1, 2025

> Side note, I will probably change bootc to flock when writing /etc/fstab and this service probably should too (if we do end up choosing to ship it here, or you choose to ship it in your OS builds). Unit ordering is good of course but I think locking here would add a lot more reliability.

When you do rework the bootc side, my wishlist:

  • comment out the whole / line if the options field is defaults; that would make it usable for the downgrade case
  • make it a simple unit without a generator. Using a generator here just hides the fact that we have something always running, ready to edit /etc/fstab; just run the unit and exit if there is nothing to do. It would also make the unit name more official, so we can order against it or maybe conflict with it.

@champtar
Author

champtar commented Feb 4, 2025

> Is systemd-remount-fs actually needed when we have the rw karg? Maybe instead of fixing /etc/fstab we can just skip systemd-remount-fs with something like ConditionKernelCommandLine=!rw (I sent an email to systemd-devel with you in CC).

https://lists.freedesktop.org/archives/systemd-devel/2025-February/051163.html
systemd-remount-fs is needed for stuff like usrquota, so we can't just skip it for now.

Successfully merging this pull request may close these issues.

systemd-remount-fs.service fail with composefs enabled