OpenZFS cannot write/create/delete on parent dataset #14011
I would note here that I have always heard ZFS is enterprise grade, but so far it seems extremely unreliable, with no ability to debug the issue.
After scrub:
The issue may be related to my earlier attempt to move the encrypted child datastore to a new pool. This was the target pool. But no error really happened during the process.

How can I debug the inability to create/modify files on ZFS?
(no output)
No change.
@exander77 Can you give us more details on how you have moved the dataset(s) between pools? Could you check your shell history for the exact commands, so we may rule out edge cases that don't apply here and find out what has happened to your data? Are you able to set a different mountpoint (as an absolute path) for the affected dataset? I am also not perfectly sure how the TrueNAS ZFS implementation works, or whether there are any security mechanisms at work here.
I did it from the TrueNAS interface.
The mountpoint directory stays behind when I unmount the dataset. I can then create files in that directory, so the dataset is writable, just not directly in the parent dataset's mountpoint.
It is expected that the directory created for mounting a dataset remains, even when the dataset is unmounted. The issue then seems to be with your root dataset. Could you share the output of the following?

```sh
zfs get all secure2
zfs get all secure2/charts
zfs get all secure2/nextcloud2
```

I would suggest to also consult the TrueNAS community and to find people who are knowledgeable about their ZFS implementation.
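For illustration, a minimal sketch of the mountpoint behavior described above (dataset names taken from this thread; the /mnt paths are assumed):

```sh
# Unmounting a child dataset leaves its mountpoint directory behind as a
# plain directory that now lives on the parent dataset.
zfs unmount secure2/charts
ls -ld /mnt/secure2/charts     # still exists, now an ordinary directory
touch /mnt/secure2/charts/x    # this write lands on the parent dataset
zfs mount secure2/charts       # remounting may fail over a non-empty dir,
                               # unless overlay mounting is enabled
```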
I am seeing that both child datasets carry a custom user property which is not present on the root dataset.
I have no idea what that property is; should I add it to the root dataset? I have another dataset I could compare against.
I would assume it is some kind of TrueNAS metadata and should not affect the ZFS behavior? You could also compare the zfs get output of the affected dataset against a working one, e.g. as sketched below.
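A minimal sketch of such a comparison (dataset names taken from this thread):

```sh
# Compare all properties of two datasets side by side; -H suppresses
# headers, -o limits output to the property name and its value.
diff <(zfs get -H -o property,value all secure2) \
     <(zfs get -H -o property,value all secure2/charts)
```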
It is a user property (indicated by the colon in its name).
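For illustration, ZFS user properties always contain a colon separating a namespace from the property name; a sketch with a hypothetical property:

```sh
zfs set org.example:note="test" secure2/charts   # set a user property
zfs get org.example:note secure2/charts          # read it back
zfs inherit org.example:note secure2/charts      # remove the local value
```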
Diff of the two outputs:
I don't really see anything suspicious.
My main question is what can prevent writing into a dataset (mounted rw, readonly=off) and how to get rid of it. There is no error anywhere whatsoever.
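For context, the usual suspects for silently failing writes can be checked along these lines (a sketch; the pool name is from this thread, the /mnt path is assumed):

```sh
zfs get readonly,mounted,mountpoint secure2   # dataset-level read-only?
mount | grep secure2                          # 'ro' among the mount options?
dmesg | tail                                  # any kernel-side errors?
lsattr -d /mnt/secure2                        # 'i' = immutable directory
```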
Yes, I am not perfectly sure. It smells a little as if this was connected to:
Can you try to set different mountpoints for the affected datasets?

The TrueNAS Storage community should be a good place to ask ZFS-related questions about TrueNAS. Reading their documentation also tells me that v10 has not been supported for many years. Is your TrueNAS system version up to date, e.g. on version 13.0-U2? (https://www.truenas.com/software-status/) The custom property suggests otherwise, but it could also be a remnant from the time of dataset creation.

You could also try to reach out on the TrueNAS community forums.
I can mount and unmount those datasets without issues, but even when they are unmounted, I cannot write into the mountpoint directory. I have the latest TrueNAS (currently even a nightly of Bluefin, as I tried to see if the nightly fixes this).
Where did you get v10? It's 22.12 (SCALE, not CORE).
The ZFS version is in the second post:
I got that from this property. At this point I would like to ask others from the community to step up and bring in new ideas, as I'm unable to dig deeper into this.
That's an IP address.
Btw, I could back up the data and recreate the pool, but this needs to be investigated. This is a small drive. But what if something like that happens on a 20 TB pool?
It is generally good to consider the root dataset of a pool read-only, if mounted at all, and to have empty encryptionroots (see #12000 (comment)) with subsequent user data in separate datasets below those; a sketch follows below. When recreating the pool, please be aware of these issues, too.
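A minimal sketch of such a layout (pool, device, and dataset names are hypothetical):

```sh
# Root dataset unmountable and read-only; an empty dataset acts as the
# encryptionroot; user data goes in children below it.
zpool create -O mountpoint=none -O readonly=on tank /dev/sdX
zfs create -o mountpoint=none -o encryption=on -o keyformat=passphrase tank/enc
zfs create -o mountpoint=/mnt/data -o readonly=off tank/enc/data
```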
@almereyda It was in reality used read-only; there were only directories for mounting the child datasets. But I can't create new datasets if the root is really read-only. Actually, I can, but I can't mount them in there; I can only mount them elsewhere, as I can't create the mount folders.
This has nothing to do with OpenZFS; these settings and locks are managed by iX Systems for their Kubernetes system. Please close accordingly.
How is this not related to OpenZFS when I cannot write to ZFS? If OpenZFS has some functionality to make a dataset read-only, I should be able to list it, view it, and remove it, shouldn't I?
Because iX-Systems locks things down for you. It's an appliance, not an OS. You need to file this as an issue with iX Systems (on Jira, NOT GitHub). They manage this for you, not the OpenZFS project.
I am not sure if I follow this argument. First: I haven't touched ix-applications directly; I used their GUI. Secondly, as far as I can tell, it ran standard zfs commands through the zfs command-line utilities. Thirdly, if I took the pool drives out of the TrueNAS system and put them into a clean BSD or Linux environment, would I be able to write to that pool's root dataset? I doubt it. If their ZFS drivers are not modified in any way and the pool was managed through the standard zfs command-line infrastructure, then it is a ZFS issue. It may have been caused by the weird combination of encryption and dataset moving TrueNAS did, but it was still caused, and the issue lies within ZFS itself. If I created a new pool on a separate system, extracted the commands TrueNAS is using, and ran them there, would it not break that pool the same way?
I think I need to explain how this works in open source a bit: in open source you have things called "downstreams" and "upstreams". When you encounter issues, you need to go to the lowest downstream, report it there, and either: they fix it, they ask the upstream what's wrong, or they forward you to ask the upstream instead. In some cases your lowest downstream is completely unresponsive; in that case you should ask the upstream anyway.

The reason things are done this way is customisation: you don't know precisely what customisations downstreams make to the code or use case, so you're never going to be able to give the upstream 100% of the info they need to fix it. It's also distracting for upstream maintainers when they get flooded with issues that are, potentially, caused by the downstream.

In most cases the lowest downstream is your vendor or packager. In your case iX-Systems is your vendor; you need to report it to them, then wait for their response on how it can be fixed and/or wait for a bugfix. Give it a month or so before going "up the ladder" to the next upstream. It's of no use to ask this project "how is iX-Systems locking this down", as they basically have nothing to do with that.

So, I'm not saying your bug doesn't exist. It looks like it does.
I have zfs sent the dataset and imported it as a child, and the issue persists. I have a 54 kB dump of the affected dataset and can share it, including the encryption key, for debugging purposes.
To comment on @Ornias1993's comment above: this should probably be dubbed upstream for clarity purposes. Thanks for making the effort to explain the process.

To comment on @exander77 trying to debug their situation: can you replicate the issue on another machine, when raw sending the datasets to another pool on a non-TrueNAS device, keeping the encryptionroot, IV and salt intact? Here we don't know how TrueNAS locks down their system, and are also not supposed to know. We need additional information from the vendor (iX-Systems) or power users within their community to be able to answer this in all detail.

I would suggest converting this issue into a discussion, since this does not appear to be a technical deficiency with OpenZFS itself. And thanks for moving the conversation over there.
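A sketch of such a raw replication (host and target pool names are hypothetical); the -w flag sends blocks as stored on disk, so the encryption parameters travel with the stream:

```sh
zfs snapshot -r secure2@repro
zfs send -w -R secure2@repro | ssh otherhost zfs receive -u tank2/repro
# The received datasets remain encrypted with the original key:
ssh otherhost zfs load-key -r tank2/repro
```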
@almereyda https://nextcloud.t4d.cz/s/Z9BfEGrMXYKQ6zp I have dumped the dataset and its encryption key. The issue persists after importing it. There are only directories and one file, so it is just 50 kB.
So, if anybody can take a look. Behavior:
I just remembered a case where I saw something similar once. If there are extended ACLs in a format OpenZFS on Linux doesn't support (e.g. if acltype=nfsv4 were used on FreeBSD), it can behave strangely in a way similar to this, where it looks like your operations should succeed, but then they fail in strange ways. So I would suspect if you go peering around on it, on FreeBSD, you'll spot some less invisible ACLs present, though I don't have my FBSD VMs available to check.
How would they have gotten there? The drive was never in any BSD system, and I tried to list POSIX ACLs etc.
I can't speak to how they got there; I was just remarking on a time that I saw something similar, and what caused it. I think you could probably coax zdb into telling you if that's the case, e.g. as sketched below. That link, to what I assume was a send stream or a pool image, also just refuses connection for me.
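A sketch of that kind of poking (paths from the thread; the object number is hypothetical and would be looked up via the inode number, since on ZFS inode and object numbers coincide):

```sh
ls -di /mnt/secure2       # inode number == ZFS object number for this dir
zdb -dddd secure2 34      # dump that object at high verbosity; ACL
                          # contents show up in the output
zfs get acltype secure2   # how the dataset is configured to handle ACLs
```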
@rincebrain Somebody already asked whether the pool was made on BSD FreeNAS, but no. I already played with zdb today; I'll check whether I can get ACLs from that dataset and some other one and compare them.
As you solved the issue and it wasn't ZFS related in the end: please close.
@Ornias1993 I don't see a resolution on here, other than you saying "close this" over and over again. And as for my earlier suspicion: having received it on FBSD... so yes, it does have invisible ACL bits compared to the baseline dataset. Tada. We should probably make that more obvious on Linux, if that's going to break things...
Instead of being passive-aggressive, you could also assume it was not solved here... considering the user basically posted this issue everywhere under the sun ;-) Also: I asked for this to be closed precisely two times. So much for your claimed "over and over again".
@Ornias1993 Not everywhere.
I don't think I was being passive-aggressive; I think I was just directly telling you it wasn't resolved and to stop posting "close this" over and over in a bug. If that wasn't clear enough: "don't do that."

Interestingly, the fact that that works seems like a bug, since Linux doesn't think +i is set if you ask on the dataset linked above, at least.
If we consider this a bug, then we need a resolution path. This was presented with a screenshot from an undisclosed location. From what I am seeing, this is about the immutable bit being present, but not visible to the VFS layer. This is not due to cross-platform contingencies between Linux and BSD, as the latter was not involved.

To get the reproducer working on Linux, I am also blocked by an internal server error at https://nextcloud.t4d.cz/s/Z9BfEGrMXYKQ6zp

In my local tests with a random file on a random dataset, I could create similar output.
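A sketch of that kind of test (file and dataset paths hypothetical, not the original output):

```sh
touch /tank/test/file
chattr +i /tank/test/file    # set the immutable flag
rm /tank/test/file           # rm: cannot remove '/tank/test/file':
                             #     Operation not permitted
lsattr /tank/test/file       # ----i---------------- /tank/test/file
chattr -i /tank/test/file    # clearing the flag makes rm succeed again
```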
May someone in possession of the dataset dump confirm this? Sorry I didn't think of the immutable flag earlier.

Yet still, I would not consider this an OpenZFS bug. It's rather the VFS which is not very verbose about why the file system state changes are prohibited. This is Linux yelling at us, and not the file system underneath. Therefore I am of the same opinion that this bug can be closed in OpenZFS, or moved to a (rather interesting!) discussion about side effects of opaque third-party developer choices in handling the file system.

In a situation where no actual evidence is present that the error in question is caused by the suspected system, as is the case here, I would favour going with the flow and keeping each other informed about results of the investigation in other places, so we can voluntarily choose to move the discussion there when deemed appropriate. It would also have been nice if the TrueNAS developers had left us a hint somewhere about what is going on, but that cannot be expected.

As for the tone of the conversation, please bear in mind that there are actual real people on the other end of the screen, and that we need to self-moderate to keep this community friendly and alive.
TrueCharts discord. |
I would still be happy to keep this open as an OpenZFS bug, since even if it's fallout from the Linux VFS layer, we still need to care, as we're not going to convince them to change things there. But that's just my $0.02.

Linux does show, with lsattr, that +i is set, which surprises me, as I would have expected ls -l to show a + when that's true, as it does on FreeBSD 13 (where that screenshot is from), not just when facls exist. But other filesystems react strangely in the same way, I see, if you set chattr +i on a dir and then try writing to it: the create doesn't appear to fail, but still doesn't happen. Wild. I love complex systems.
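A sketch of that experiment on a directory (path hypothetical):

```sh
mkdir /tmp/imm
chattr +i /tmp/imm     # mark the directory itself immutable
touch /tmp/imm/f       # typically fails with EPERM; as noted above, some
                       # filesystems make it look like nothing went wrong
ls -A /tmp/imm         # no file was created either way
chattr -i /tmp/imm     # back to normal
```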
Please don't strawman me. As I said, I just said it twice. A single needless repetition is not saying something "over and over". On topic:
This is not a ZFS issue. TrueNAS middleware manipulates the immutable flag on the mountpoints of locked encrypted datasets to prevent accidentally writing into them while thinking they were mounted. It's possible this was handled incorrectly after the replication you performed. If you have not already, then please report your issue here: https://ixsystems.atlassian.net/jira/software/c/projects/NAS/issues

I'm closing this, as it doesn't seem there's anything OpenZFS needs to fix here at this time. Feel free to reopen the issue if you disagree.

P.S. You might want to clear the immutable flag on the affected mountpoints.
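Assuming the immutable flag is indeed what blocks the writes here, checking and clearing it would look like this (the /mnt path is assumed from this thread):

```sh
lsattr -d /mnt/secure2    # an 'i' among the flags marks the dir immutable
chattr -i /mnt/secure2    # needs root; clears the flag
touch /mnt/secure2/test   # should succeed afterwards
```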
System information
It works for child datastores without issues.
FS is mounted rw:
No hw error, scrub without issues. I use encryption and had some issues, see: #12000 (comment)
I am not sure what is the cause and what is the effect; I solved the encryption issue, but still have some troubles. I cannot seem to mount any child datastore, most likely due to the inability to create a directory for it to be mounted on.