OpenZFS cannot write/create/delete on parent dataset #14011

Closed
exander77 opened this issue Oct 10, 2022 · 52 comments

@exander77

exander77 commented Oct 10, 2022

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | TrueNAS SCALE |
| Distribution Version | Bluefin |
| Kernel Version | 5.15.62+truenas |
| Architecture | x86_64 |
| OpenZFS Version | 2.1.5-1 |

root@silverrock[/mnt/secure2]# ls -la 
total 35
drwxr-xr-x 5 root root      5 Sep 26 16:01 .
drwxr-xr-x 6 root root      6 Oct 10 06:04 ..
drwxr-xr-x 6 root root      6 Oct 10 19:45 charts
drwxr-xr-x 2 root root      2 Sep 26 16:01 ix-applications
drwxrwx--- 8 root www-data 13 Oct 10 01:57 nextcloud2
root@silverrock[/mnt/secure2]# rm -rf ix-applications 
rm: cannot remove 'ix-applications': Operation not permitted
root@silverrock[/mnt/secure2]# touch test
touch: setting times of 'test': No such file or directory
root@silverrock[/mnt/secure2]# touch ix-applications/test
touch: setting times of 'ix-applications/test': No such file or directory
root@silverrock[/mnt/secure2]# touch charts/test
root@silverrock[/mnt/secure2]# rm charts/test

It works for child datasets without issues.
FS is mounted rw:

secure2/nextcloud2 on /mnt/secure2/nextcloud2 type zfs (rw,noatime,xattr,posixacl,casesensitive)
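
Note that the mount line above belongs to the child dataset secure2/nextcloud2, not to /mnt/secure2 itself. A minimal sketch for looking up the options of one exact mountpoint, assuming a Linux-style mounts table; the function names `mount_opts` and `is_mounted_rw` and the optional file argument are illustrative and not part of any existing tool.

```shell
# mount_opts MOUNTPOINT [MOUNTS_FILE]
# Print the mount options recorded for exactly MOUNTPOINT.
# MOUNTS_FILE defaults to /proc/mounts (fields: device mountpoint fstype options ...).
mount_opts() {
    awk -v mp="$1" '$2 == mp { print $4 }' "${2:-/proc/mounts}"
}

# is_mounted_rw MOUNTPOINT [MOUNTS_FILE]
# Succeed if the options of the last entry for MOUNTPOINT start with "rw".
is_mounted_rw() {
    case "$(mount_opts "$@" | tail -n 1)" in
        rw|rw,*) return 0 ;;
        *)       return 1 ;;
    esac
}
```

For example, `mount_opts /mnt/secure2` would show the options of the parent dataset's own mount rather than those of a child.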

No hardware errors, and the scrub completes without issues. I use encryption and had some issues with it, see: #12000 (comment)

I am not sure what is the cause and what is the effect. I solved the encryption issue but still have some trouble: I cannot seem to mount any child dataset, most likely because the directory for it to be mounted on cannot be created.

@exander77
Author

I would note here that I have always heard how enterprise-grade ZFS is, but so far it seems extremely unreliable, with no way to debug the issue.

Linux silverrock 5.15.62+truenas #1 SMP Wed Sep 28 10:32:59 UTC 2022 x86_64 GNU/Linux
zfs-2.1.5-1
zfs-kmod-2.1.5-

After scrub:

root@silverrock[/mnt/secure2]# zpool status secure2
  pool: secure2
 state: ONLINE
  scan: scrub repaired 0B in 00:10:13 with 0 errors on Mon Oct 10 14:09:00 2022
config:

  NAME                                      STATE     READ WRITE CKSUM
  secure2                                   ONLINE       0     0     0
    mirror-0                                ONLINE       0     0     0
      bf9e63ff-9a22-4fcb-ba95-d1a6769d50c0  ONLINE       0     0     0
      367a663c-c9e7-4d6e-9fb9-48ab2ecdfebe  ONLINE       0     0     0

@exander77
Author

The issue may be related to my earlier attempt to move the encrypted child dataset to a new pool. This was the target pool. But no error really happened during the process.

@exander77
Author

How can I debug the inability to create/modify files on zfs?

@exander77
Author

root@silverrock[/mnt/secure2]# getfattr -d -m ".*" /mnt/secure2

(no output)

root@silverrock[/mnt/secure2]# getfacl /mnt/secure2
getfacl: Removing leading '/' from absolute path names
# file: mnt/secure2
# owner: root
# group: root
user::rwx
group::r-x
other::r-x

@exander77
Author

root@silverrock[/mnt/secure2]# zfs list -o space secure2
NAME     AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
secure2   757G   157G        0B    296K             0B       157G

@exander77
Author

zfs set readonly=off secure2

No change.

@exander77
Author

root@silverrock[/mnt/secure2]# zfs create secure2/test 
cannot mount '/mnt/secure2/test': failed to create mountpoint: Operation not permitted
filesystem successfully created, but not mounted
root@silverrock[/mnt/secure2]# zfs destroy secure2/test

exander77 changed the title from "OpenZFS cannot write/create/delete on parent datastore" to "OpenZFS cannot write/create/delete on parent dataset" on Oct 10, 2022

@almereyda

> The issue may be related to my earlier attempt to move the encrypted child dataset to a new pool. This was the target pool. But no error really happened during the process.

@exander77 Can you give us more details on how you moved the dataset(s) between pools? Could you check your shell history for the exact commands, so that we can rule out edge cases that don't apply here and find out what has happened to your data? Are you able to set a different mountpoint (as an absolute path) for nextcloud2, e.g. /mnt/nextcloud2, and have it mounted there?

I am also not entirely sure how the TrueNAS ZFS implementation works, or whether there are any security mechanisms at work here.

  • Did you use some kind of UI, or were all maintenance commands issued in a terminal?
  • In your example, is the charts directory a separate dataset, or a directory within the secure2 dataset? How about ix-applications?

@exander77
Author

exander77 commented Oct 10, 2022

> > The issue may be related to my earlier attempt to move the encrypted child dataset to a new pool. This was the target pool. But no error really happened during the process.
>
> @exander77 Can you give us more details on how you moved the dataset(s) between pools? Could you check your shell history for the exact commands, so that we can rule out edge cases that don't apply here and find out what has happened to your data? Are you able to set a different mountpoint (as an absolute path) for nextcloud2, e.g. /mnt/nextcloud2, and have it mounted there?
>
> I am also not entirely sure how the TrueNAS ZFS implementation works, or whether there are any security mechanisms at work here.
>
> * Did you use some kind of UI, or were all maintenance commands issued in a terminal?
> * In your example, is the `charts` directory a separate dataset, or a directory within the `secure2` dataset? How about `ix-applications`?

I did it from the TrueNAS interface (for the ix-applications dataset), but it ran a standard zfs send and zfs receive (I watched the processes), and the system worked after that, or at least I didn't notice the issue. I have already removed the copied dataset (as it didn't work). charts is a separate dataset and it works (writing etc. work there without any issues). ix-applications was the dataset I copied and it is now removed; for some reason its mount directory was left over. I cannot remove it and I cannot write files into it.

root@silverrock[~]# zfs list -r secure2
NAME                 USED  AVAIL     REFER  MOUNTPOINT
secure2              157G   757G      296K  /mnt/secure2
secure2/charts       776M   757G      776M  /mnt/secure2/charts
secure2/nextcloud2   157G   757G      157G  /mnt/secure2/nextcloud2

root@silverrock[~]# ls -la /mnt/secure2
total 35
drwxr-xr-x 5 root root      5 Sep 26 16:01 .
drwxr-xr-x 6 root root      6 Sep 27 17:00 ..
drwxr-xr-x 6 root root      6 Oct 10 19:45 charts
drwxr-xr-x 2 root root      2 Sep 26 16:01 ix-applications
drwxrwx--- 8 root www-data 13 Oct 10 01:57 nextcloud2

@exander77
Author

exander77 commented Oct 10, 2022

The nextcloud2 dataset works fine as well now, but I had issues unlocking it.

I can also unmount the nextcloud2 dataset, and its mount directory stays there (most likely it cannot be removed either).

I can then create files in that directory, so the dataset is writable, just not directly in /mnt/secure2 or /mnt/secure2/ix-applications/. I can write to the charts dataset in /mnt/secure2/charts and to the nextcloud2 dataset in /mnt/secure2/nextcloud2, and I can also write to the leftover mount directories when these datasets are unmounted. See:

root@silverrock[/mnt/secure2]# zfs umount secure2/nextclou2
cannot open 'secure2/nextclou2': dataset does not exist
root@silverrock[/mnt/secure2]# zfs umount secure2/nextcloud2
root@silverrock[/mnt/secure2]# ls -la
total 35
drwxr-xr-x 5 root root     5 Sep 26 16:01 .
drwxr-xr-x 6 root root     6 Sep 27 17:00 ..
drwxr-xr-x 6 root root     6 Oct 10 19:45 charts
drwxr-xr-x 2 root root     2 Sep 26 16:01 ix-applications
drwxrwx--- 2 root www-data 2 Oct 10 15:50 nextcloud2
root@silverrock[/mnt/secure2]# cd nextcloud2 
root@silverrock[/mnt/secure2/nextcloud2]# ls -la
total 17
drwxrwx--- 2 root www-data 2 Oct 10 15:50 .
drwxr-xr-x 5 root root     5 Sep 26 16:01 ..
root@silverrock[/mnt/secure2/nextcloud2]# touch test
root@silverrock[/mnt/secure2/nextcloud2]# 
root@silverrock[/mnt/secure2]# touch test
touch: setting times of 'test': No such file or directory
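
The manual `touch` tests above can be repeated across several directories with a small helper. This is only a sketch: the function name `probe_write` and the scratch file name are hypothetical, and it does nothing more than attempt to create and then delete one empty file.

```shell
# probe_write DIR
# Try to create and then remove a scratch file in DIR and report the outcome.
# The file name is made up for this sketch; the shell PID keeps it unique.
probe_write() {
    probe="$1/.write-probe.$$"
    # The subshell keeps a failed redirection from aborting the calling shell.
    if ( : > "$probe" ) 2>/dev/null; then
        if rm -f "$probe" 2>/dev/null; then
            echo "$1: create ok, delete ok"
            return 0
        fi
        echo "$1: create ok, delete FAILED"
        return 1
    fi
    echo "$1: create FAILED"
    return 1
}
```

On the system in this thread it could be called as `for d in /mnt/secure2 /mnt/secure2/ix-applications /mnt/secure2/charts; do probe_write "$d"; done` to show at a glance which directories accept new files.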

@almereyda

It is expected that the directory created for mounting a dataset remains, even when the dataset is unmounted.

The issue then seems to be with your secure2 root dataset of the secure2 pool. As requested in #12000 (comment), can you provide us with the output of these commands for comparison?

zfs get all secure2
zfs get all secure2/charts
zfs get all secure2/nextcloud2

I would also suggest consulting the TrueNAS community to find people who are knowledgeable about their ZFS implementation.

@exander77
Author

@almereyda

NAME     PROPERTY              VALUE                  SOURCE
secure2  type                  filesystem             -
secure2  creation              Sat Jul 16  0:42 2022  -
secure2  used                  157G                   -
secure2  available             757G                   -
secure2  referenced            296K                   -
secure2  compressratio         1.01x                  -
secure2  mounted               yes                    -
secure2  quota                 none                   default
secure2  reservation           none                   default
secure2  recordsize            128K                   default
secure2  mountpoint            /mnt/secure2           default
secure2  sharenfs              off                    default
secure2  checksum              on                     default
secure2  compression           lz4                    local
secure2  atime                 off                    local
secure2  devices               on                     default
secure2  exec                  on                     default
secure2  setuid                on                     default
secure2  readonly              off                    local
secure2  zoned                 off                    default
secure2  snapdir               hidden                 default
secure2  aclmode               discard                local
secure2  aclinherit            passthrough            local
secure2  createtxg             1                      -
secure2  canmount              on                     default
secure2  xattr                 on                     default
secure2  copies                1                      default
secure2  version               5                      -
secure2  utf8only              off                    -
secure2  normalization         none                   -
secure2  casesensitivity       sensitive              -
secure2  vscan                 off                    default
secure2  nbmand                off                    default
secure2  sharesmb              off                    default
secure2  refquota              none                   default
secure2  refreservation        none                   default
secure2  guid                  12095579002055522207   -
secure2  primarycache          all                    default
secure2  secondarycache        all                    default
secure2  usedbysnapshots       0B                     -
secure2  usedbydataset         296K                   -
secure2  usedbychildren        157G                   -
secure2  usedbyrefreservation  0B                     -
secure2  logbias               latency                default
secure2  objsetid              54                     -
secure2  dedup                 off                    default
secure2  mlslabel              none                   default
secure2  sync                  standard               default
secure2  dnodesize             legacy                 default
secure2  refcompressratio      1.00x                  -
secure2  written               296K                   -
secure2  logicalused           159G                   -
secure2  logicalreferenced     110K                   -
secure2  volmode               default                default
secure2  filesystem_limit      none                   default
secure2  snapshot_limit        none                   default
secure2  filesystem_count      none                   default
secure2  snapshot_count        none                   default
secure2  snapdev               hidden                 default
secure2  acltype               posix                  local
secure2  context               none                   default
secure2  fscontext             none                   default
secure2  defcontext            none                   default
secure2  rootcontext           none                   default
secure2  relatime              off                    default
secure2  redundant_metadata    all                    default
secure2  overlay               on                     default
secure2  encryption            aes-256-gcm            -
secure2  keylocation           prompt                 local
secure2  keyformat             hex                    -
secure2  pbkdf2iters           0                      default
secure2  encryptionroot        secure2                -
secure2  keystatus             available              -
secure2  special_small_blocks  0                      default
NAME            PROPERTY               VALUE                  SOURCE
secure2/charts  type                   filesystem             -
secure2/charts  creation               Mon Aug 15 19:26 2022  -
secure2/charts  used                   776M                   -
secure2/charts  available              757G                   -
secure2/charts  referenced             776M                   -
secure2/charts  compressratio          1.62x                  -
secure2/charts  mounted                yes                    -
secure2/charts  quota                  none                   local
secure2/charts  reservation            none                   local
secure2/charts  recordsize             128K                   default
secure2/charts  mountpoint             /mnt/secure2/charts    default
secure2/charts  sharenfs               off                    default
secure2/charts  checksum               on                     default
secure2/charts  compression            lz4                    inherited from secure2
secure2/charts  atime                  off                    inherited from secure2
secure2/charts  devices                on                     default
secure2/charts  exec                   on                     default
secure2/charts  setuid                 on                     default
secure2/charts  readonly               off                    inherited from secure2
secure2/charts  zoned                  off                    default
secure2/charts  snapdir                hidden                 default
secure2/charts  aclmode                discard                inherited from secure2
secure2/charts  aclinherit             discard                local
secure2/charts  createtxg              456582                 -
secure2/charts  canmount               on                     default
secure2/charts  xattr                  sa                     local
secure2/charts  copies                 1                      local
secure2/charts  version                5                      -
secure2/charts  utf8only               off                    -
secure2/charts  normalization          none                   -
secure2/charts  casesensitivity        sensitive              -
secure2/charts  vscan                  off                    default
secure2/charts  nbmand                 off                    default
secure2/charts  sharesmb               off                    default
secure2/charts  refquota               none                   local
secure2/charts  refreservation         none                   local
secure2/charts  guid                   18407887342451819439   -
secure2/charts  primarycache           all                    default
secure2/charts  secondarycache         all                    default
secure2/charts  usedbysnapshots        0B                     -
secure2/charts  usedbydataset          776M                   -
secure2/charts  usedbychildren         0B                     -
secure2/charts  usedbyrefreservation   0B                     -
secure2/charts  logbias                latency                default
secure2/charts  objsetid               101                    -
secure2/charts  dedup                  off                    default
secure2/charts  mlslabel               none                   default
secure2/charts  sync                   standard               default
secure2/charts  dnodesize              legacy                 default
secure2/charts  refcompressratio       1.62x                  -
secure2/charts  written                776M                   -
secure2/charts  logicalused            1.08G                  -
secure2/charts  logicalreferenced      1.08G                  -
secure2/charts  volmode                default                default
secure2/charts  filesystem_limit       none                   default
secure2/charts  snapshot_limit         none                   default
secure2/charts  filesystem_count       none                   default
secure2/charts  snapshot_count         none                   default
secure2/charts  snapdev                hidden                 default
secure2/charts  acltype                posix                  local
secure2/charts  context                none                   default
secure2/charts  fscontext              none                   default
secure2/charts  defcontext             none                   default
secure2/charts  rootcontext            none                   default
secure2/charts  relatime               off                    default
secure2/charts  redundant_metadata     all                    default
secure2/charts  overlay                on                     default
secure2/charts  encryption             aes-256-gcm            -
secure2/charts  keylocation            none                   default
secure2/charts  keyformat              hex                    -
secure2/charts  pbkdf2iters            0                      default
secure2/charts  encryptionroot         secure2                -
secure2/charts  keystatus              available              -
secure2/charts  special_small_blocks   0                      default
secure2/charts  org.truenas:managedby  10.9.9.1               local
NAME                PROPERTY               VALUE                    SOURCE
secure2/nextcloud2  type                   filesystem               -
secure2/nextcloud2  creation               Mon Aug 15 19:26 2022    -
secure2/nextcloud2  used                   157G                     -
secure2/nextcloud2  available              757G                     -
secure2/nextcloud2  referenced             157G                     -
secure2/nextcloud2  compressratio          1.01x                    -
secure2/nextcloud2  mounted                yes                      -
secure2/nextcloud2  quota                  none                     local
secure2/nextcloud2  reservation            none                     local
secure2/nextcloud2  recordsize             128K                     default
secure2/nextcloud2  mountpoint             /mnt/secure2/nextcloud2  default
secure2/nextcloud2  sharenfs               off                      default
secure2/nextcloud2  checksum               on                       default
secure2/nextcloud2  compression            lz4                      inherited from secure2
secure2/nextcloud2  atime                  off                      inherited from secure2
secure2/nextcloud2  devices                on                       default
secure2/nextcloud2  exec                   on                       default
secure2/nextcloud2  setuid                 on                       default
secure2/nextcloud2  readonly               off                      inherited from secure2
secure2/nextcloud2  zoned                  off                      default
secure2/nextcloud2  snapdir                hidden                   default
secure2/nextcloud2  aclmode                discard                  inherited from secure2
secure2/nextcloud2  aclinherit             discard                  local
secure2/nextcloud2  createtxg              456593                   -
secure2/nextcloud2  canmount               on                       default
secure2/nextcloud2  xattr                  sa                       local
secure2/nextcloud2  copies                 1                        local
secure2/nextcloud2  version                5                        -
secure2/nextcloud2  utf8only               off                      -
secure2/nextcloud2  normalization          none                     -
secure2/nextcloud2  casesensitivity        sensitive                -
secure2/nextcloud2  vscan                  off                      default
secure2/nextcloud2  nbmand                 off                      default
secure2/nextcloud2  sharesmb               off                      default
secure2/nextcloud2  refquota               none                     local
secure2/nextcloud2  refreservation         none                     local
secure2/nextcloud2  guid                   15970448320002913650     -
secure2/nextcloud2  primarycache           all                      default
secure2/nextcloud2  secondarycache         all                      default
secure2/nextcloud2  usedbysnapshots        0B                       -
secure2/nextcloud2  usedbydataset          157G                     -
secure2/nextcloud2  usedbychildren         0B                       -
secure2/nextcloud2  usedbyrefreservation   0B                       -
secure2/nextcloud2  logbias                latency                  default
secure2/nextcloud2  objsetid               1415                     -
secure2/nextcloud2  dedup                  off                      default
secure2/nextcloud2  mlslabel               none                     default
secure2/nextcloud2  sync                   standard                 default
secure2/nextcloud2  dnodesize              legacy                   default
secure2/nextcloud2  refcompressratio       1.01x                    -
secure2/nextcloud2  written                157G                     -
secure2/nextcloud2  logicalused            158G                     -
secure2/nextcloud2  logicalreferenced      158G                     -
secure2/nextcloud2  volmode                default                  default
secure2/nextcloud2  filesystem_limit       none                     default
secure2/nextcloud2  snapshot_limit         none                     default
secure2/nextcloud2  filesystem_count       none                     default
secure2/nextcloud2  snapshot_count         none                     default
secure2/nextcloud2  snapdev                hidden                   default
secure2/nextcloud2  acltype                posix                    local
secure2/nextcloud2  context                none                     default
secure2/nextcloud2  fscontext              none                     default
secure2/nextcloud2  defcontext             none                     default
secure2/nextcloud2  rootcontext            none                     default
secure2/nextcloud2  relatime               off                      default
secure2/nextcloud2  redundant_metadata     all                      default
secure2/nextcloud2  overlay                on                       default
secure2/nextcloud2  encryption             aes-256-gcm              -
secure2/nextcloud2  keylocation            none                     default
secure2/nextcloud2  keyformat              hex                      -
secure2/nextcloud2  pbkdf2iters            0                        default
secure2/nextcloud2  encryptionroot         secure2                  -
secure2/nextcloud2  keystatus              available                -
secure2/nextcloud2  special_small_blocks   0                        default
secure2/nextcloud2  org.truenas:managedby  10.9.9.1                 local

@almereyda

I am seeing that both the charts and nextcloud2 datasets have a property

org.truenas:managedby  10.9.9.1                 local

which is not present on the secure2 dataset. Would you know why this could be the case?

@exander77
Author

> I am seeing that both the charts and nextcloud2 datasets have a property
>
> org.truenas:managedby  10.9.9.1                 local
>
> which is not present on the secure2 dataset. Would you know why this could be the case?

I have no idea what that property is. Should I add it to the root dataset? I have another root dataset called secure, and it doesn't have it:

NAME    PROPERTY              VALUE                  SOURCE
secure  type                  filesystem             -
secure  creation              Thu Jan 20 18:58 2022  -
secure  used                  5.85T                  -
secure  available             1.28T                  -
secure  referenced            5.85T                  -
secure  compressratio         1.04x                  -
secure  mounted               yes                    -
secure  quota                 none                   default
secure  reservation           none                   default
secure  recordsize            128K                   default
secure  mountpoint            /mnt/secure            default
secure  sharenfs              off                    default
secure  checksum              on                     default
secure  compression           lz4                    local
secure  atime                 off                    local
secure  devices               on                     default
secure  exec                  on                     default
secure  setuid                on                     default
secure  readonly              off                    default
secure  zoned                 off                    default
secure  snapdir               hidden                 default
secure  aclmode               discard                local
secure  aclinherit            passthrough            local
secure  createtxg             1                      -
secure  canmount              on                     default
secure  xattr                 on                     default
secure  copies                1                      default
secure  version               5                      -
secure  utf8only              off                    -
secure  normalization         none                   -
secure  casesensitivity       sensitive              -
secure  vscan                 off                    default
secure  nbmand                off                    default
secure  sharesmb              off                    default
secure  refquota              none                   default
secure  refreservation        none                   default
secure  guid                  10444130510769249471   -
secure  primarycache          all                    default
secure  secondarycache        all                    default
secure  usedbysnapshots       0B                     -
secure  usedbydataset         5.85T                  -
secure  usedbychildren        1.19G                  -
secure  usedbyrefreservation  0B                     -
secure  logbias               latency                default
secure  objsetid              54                     -
secure  dedup                 off                    default
secure  mlslabel              none                   default
secure  sync                  standard               default
secure  dnodesize             legacy                 default
secure  refcompressratio      1.04x                  -
secure  written               5.85T                  -
secure  logicalused           6.07T                  -
secure  logicalreferenced     6.07T                  -
secure  volmode               default                default
secure  filesystem_limit      none                   default
secure  snapshot_limit        none                   default
secure  filesystem_count      none                   default
secure  snapshot_count        none                   default
secure  snapdev               hidden                 default
secure  acltype               posix                  local
secure  context               none                   default
secure  fscontext             none                   default
secure  defcontext            none                   default
secure  rootcontext           none                   default
secure  relatime              off                    default
secure  redundant_metadata    all                    default
secure  overlay               on                     default
secure  encryption            aes-256-gcm            -
secure  keylocation           prompt                 local
secure  keyformat             hex                    -
secure  pbkdf2iters           0                      default
secure  encryptionroot        secure                 -
secure  keystatus             available              -
secure  special_small_blocks  0                      default

@exander77
Author

I would assume it is some kind of TrueNAS metadata and should not affect the zfs behavior? You can also compare output of secure (working) and secure2 (not working).

@almereyda

It is a user property (indicated by the : colon in its name) set by TrueNAS, probably to mark the datasets as theirs. I believe it does not interfere with ZFS mounting the dataset (secure2 mounted yes -), so we can probably ignore it.
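
Since user property names always contain a colon, they can be picked out of `zfs get all` output by filtering on the PROPERTY column. A minimal sketch, assuming the four-column NAME PROPERTY VALUE SOURCE layout shown earlier in this thread; the function name `user_props` is made up for illustration.

```shell
# user_props
# Read `zfs get all` style output (NAME PROPERTY VALUE SOURCE) on stdin and
# print only the rows whose property name (second column) contains a colon.
user_props() {
    awk '$2 ~ /:/'
}
```

Usage would be along the lines of `zfs get all secure2/charts | user_props`, which for the output posted above should leave only the org.truenas:managedby row.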

@exander77
Author

Diff of secure and secure2:

1c1
< NAME    PROPERTY              VALUE                  SOURCE
---
> NAME     PROPERTY              VALUE                  SOURCE
3,7c3,7
< creation              Thu Jan 20 18:58 2022  -
< used                  5.86T                  -
< available             1.28T                  -
< referenced            5.85T                  -
< compressratio         1.04x                  -
---
> creation              Sat Jul 16  0:42 2022  -
> used                  157G                   -
> available             757G                   -
> referenced            560K                   -
> compressratio         1.01x                  -
12c12
< mountpoint            /mnt/secure            default
---
> mountpoint            /mnt/secure2           default
38c38
< guid                  10444130510769249471   -
---
> guid                  12095579002055522207   -
42,43c42,43
< usedbydataset         5.85T                  -
< usedbychildren        2.82G                  -
---
> usedbydataset         560K                   -
> usedbychildren        157G                   -
51,54c51,54
< refcompressratio      1.04x                  -
< written               5.85T                  -
< logicalused           6.07T                  -
< logicalreferenced     6.07T                  -
---
> refcompressratio      1.32x                  -
> written               560K                   -
> logicalused           159G                   -
> logicalreferenced     408K                   -
73c73
< encryptionroot        secure                 -
---
> encryptionroot        secure2                -

I don't really see anything suspicious.
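
The first hunk of the diff above (1c1) differs only in column padding. A sketch of a comparison that drops the NAME column and squeezes whitespace first, so that only real property differences remain; it assumes both `zfs get all` outputs were saved to files, and the function name `prop_diff` is hypothetical.

```shell
# prop_diff FILE_A FILE_B
# Compare two saved `zfs get all` outputs after dropping the first (NAME)
# column and squeezing runs of whitespace into single spaces.
# Returns diff's exit status: 0 if identical, 1 if different.
prop_diff() {
    a=$(mktemp) && b=$(mktemp) || return 2
    sed -E 's/^[^[:space:]]+[[:space:]]+//; s/[[:space:]]+/ /g' "$1" > "$a"
    sed -E 's/^[^[:space:]]+[[:space:]]+//; s/[[:space:]]+/ /g' "$2" > "$b"
    status=0
    diff "$a" "$b" || status=$?
    rm -f "$a" "$b"
    return "$status"
}
```

For example: `zfs get all secure > secure.txt; zfs get all secure2 > secure2.txt; prop_diff secure.txt secure2.txt`. Rows such as guid, creation and the space accounting will still differ, as they do above.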

@exander77
Author

My main question is what can prevent writing into a dataset that is mounted rw with zfs readonly=off, and how to get rid of it.

There is no error anywhere whatsoever.
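
The checks tried so far in this thread can be gathered into one pass. The sketch below is not a diagnosis: the function names `diagnose` and `run` are made up, and the `lsattr` and `findmnt` lines are my own additions that nobody in the thread has run. File attribute flags such as immutable can block writes even for root on Linux in general, but whether that applies here is not established.

```shell
# diagnose DATASET MOUNTPOINT
# Run read-only checks against a dataset and its mountpoint.
# Tools that are not installed are skipped instead of aborting the run.
diagnose() {
    ds="$1"; mp="$2"
    run() {
        echo "== $*"
        if command -v "$1" >/dev/null 2>&1; then
            "$@" 2>&1 || echo "(exit status $?)"
        else
            echo "(skipped: $1 not installed)"
        fi
    }
    # Checks taken from the thread:
    run zfs get -H -o property,value,source readonly,mounted,canmount,mountpoint "$ds"
    run zfs list -o space "$ds"
    run getfacl "$mp"
    run getfattr -d -m ".*" "$mp"
    # Assumptions added for this sketch, not tried in the thread:
    run lsattr -d "$mp"
    run findmnt --target "$mp"
}
```

It would be called as `diagnose secure2 /mnt/secure2`, and again for a working pair such as `diagnose secure /mnt/secure` to compare the two outputs.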

@almereyda

almereyda commented Oct 10, 2022

Yes, I am not perfectly sure. It smells a little as if this were connected to:

Can you try to set different mountpoints for charts and nextcloud2, e.g. /srv/charts and /srv/nextcloud2, or /mnt/charts and /mnt/nextcloud2, and then try unmounting and remounting the secure2 dataset in its place, or eventually somewhere else? This should not affect the availability status of the key.

The TrueNAS Storage community should be a good place to ask for ZFS related questions with TrueNAS:

Reading their documentation also tells me that v10 has not been supported for many years. Is your TrueNAS system version up to date, e.g. on version 13.0-U2? (https://www.truenas.com/software-status/) The custom property suggests otherwise, but it could also be a remnant from the time of dataset creation.

You could also try to reach out on the zfs-discuss mailing list or r/zfs, where more people check more regularly than here:

@exander77
Author

exander77 commented Oct 11, 2022

I can mount and unmount those datasets without issues, but even if they are unmounted I cannot write into the secure2. I can mount and unmount secure2 as well. I tried mounting secure elsewhere, no issue. Should I mount the child datasets?

I have latest TrueNAS (currently even nightly of Bluefin as I tried to see if nightly fixes this).

@exander77
Author

Where you got v10? It's 22.12 (Scale, not Core).

@exander77
Author

Zfs version is in second post:

zfs-2.1.5-1
zfs-kmod-2.1.5-

@almereyda

Where you got v10?

I got that from this property: org.truenas:managedby 10.9.9.1 local, but I don't know if it's accurate, since I don't know TrueNAS.

At this point I would like to ask others from the community to step up and bring in new ideas, as I'm unable to dig deeper into this.

@exander77
Author

That's an IP address.

@exander77
Author

Btw, I could back up the data and recreate the pool, but this needs to be investigated. This is a small drive, but what if something like that happens on a 20 TB pool?

@almereyda

almereyda commented Oct 11, 2022

It is generally good to consider the root dataset of a pool read-only, if it is mounted at all, and to keep encryption roots empty (see #12000 (comment)), with user data in separate datasets below them.

When recreating the pool, please be aware of these issues, too:

@exander77
Author

exander77 commented Oct 11, 2022

@almereyda It was in reality used read only. There were only directories for mounting the child datasets. But I can't create new datasets if the root is really read only. Actually I can, but I can't mount them in there, I can only mount them elsewhere, as I can't create the mount folders.

@PrivatePuffin
Contributor

PrivatePuffin commented Oct 11, 2022

This has nothing to do with OpenZFS, these settings and locks are managed by iX Systems for their kubernetes system.
Users should never use the command line to alter anything with those datasets.

@PrivatePuffin
Contributor

please close accordingly.

@exander77
Author

This has nothing to do with OpenZFS, these settings and locks are managed by iX Systems for their kubernetes system.

How is this not related to openzfs when I cannot write to zfs? If openzfs has some functionality to make dataset read only, I should be able to list it and view it and remove it, shouldn't I?

@PrivatePuffin
Contributor

PrivatePuffin commented Oct 11, 2022

Because iX-Systems locks things down for you. It's an appliance not an OS.
It's intended behavior, users are not supposed to try to even touch ix-applications directly.

You need to file this as an issue with iX Systems (on JIRA, NOT github). They manage this for you, not the OpenZFS project.

@exander77
Author

exander77 commented Oct 11, 2022

I am not sure if I follow this argument. First: I haven't touched ix-applications directly, I used their GUI. Secondly, as far as I can tell, it ran standard zfs commands through zfs command line utilities. Thirdly, if I take the pool drives from TrueNAS system and put them into clean BSD or Linux environment, would I be able to write to that pool's root dataset? I doubt that. If their zfs drivers are not modified in any way and zfs pool was managed through standard zfs command line infrastructure, then it is a zfs issue. It may be caused by weird combination of encryption and dataset moving TrueNAS did, but it was still caused and the issue is within zfs itself.

If I create a new pool in a separate system and extract commands TrueNAS is using and run them there, would it not break that pool the same way?

@PrivatePuffin
Contributor

PrivatePuffin commented Oct 11, 2022

I think I need to explain a bit how this works in open source:

In open source you have things called "downstreams" and "upstreams".
"Upstreams" are the dependencies; those get used and/or modified by their "downstreams".

When you encounter issues, you need to go to the lowest downstream and report it there. Then either they fix it, they ask the upstream what's wrong, or they forward you to ask the upstream instead.

In some cases your lowest downstream is completely unresponsive, in that case you should ask the upstream anyway.

The reason things are done that way is customisation: you don't precisely know all the customisations downstreams make to the code or use case, so you're never going to be able to give the upstream 100% of the info they need to fix it. It's also distracting for downstream maintainers when they get flooded with issues that are, potentially, caused by the downstream.

In most cases the lowest downstream is your vendor or packager. In your case iX-Systems is your vendor, so you need to report it to them. Wait for their response on how it can be fixed and/or wait for a bugfix.

Give it a month or so, before going "up the ladder" to the next upstream.

It's of no use to ask this project "how is iX-Systems locking this down", as they basically have nothing to do with that.


So, I'm not saying your bug doesn't exist. It looks like it does.
But it's an iX-Systems bug, not an OpenZFS bug, by the looks of it.

@exander77
Author

I have zfs send the dataset and imported as a child and the issue persists. I have a 54kB dump of the affected dataset and I can share it including encryption key for debug purposes.

@exander77
Author

root@ghost[/mnt/test]# find
.
./nextcloud2
./nextcloud2/nextcloud.log
./charts
./ix-applications
root@ghost[/mnt/test]# touch charts/test
root@ghost[/mnt/test]# touch ix-applications/test
touch: setting times of 'ix-applications/test': No such file or directory
root@ghost[/mnt/test]# touch nextcloud2/test
root@ghost[/mnt/test]# touch test
touch: setting times of 'test': No such file or directory

@almereyda

almereyda commented Oct 11, 2022

To comment on @Ornias1993's comment above:

It's also distracting for downstream maintainers when they get flooded with issues that are, potentially, caused by the downstream.

This should probably be dubbed upstream for clarity purposes. Thanks for making the effort to explain the process.

To comment on @exander77 trying to debug their situation:

I have zfs send the dataset and imported as a child and the issue persists.

Can you replicate the issue on another machine, when raw sending the datasets to another pool on a non-TrueNAS device, keeping the encryptionroot, IV and salt intact?
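For anyone wanting to try this, here is a rough sketch of what such a raw replication could look like. It only prints the commands instead of running them; the function name `print_raw_send_plan`, the snapshot name and the destination dataset are placeholders, not anything used in this thread.

```sh
# Print (not run) the commands for a raw replication of one dataset.
# $1 = source dataset, $2 = snapshot name, $3 = destination dataset.
print_raw_send_plan() {
    src=$1
    snap=$2
    dst=$3
    printf 'zfs snapshot %s@%s\n' "$src" "$snap"
    # -w (--raw) sends the encrypted blocks as they are stored on disk,
    # so the stream can be received without the key being loaded.
    printf 'zfs send -w %s@%s | zfs receive %s\n' "$src" "$snap" "$dst"
}

# Example with placeholder names:
# print_raw_send_plan secure2 debug tank/secure2-copy
```

To reach another machine, the receive side of the pipe would typically be wrapped in ssh. The received dataset still needs its key loaded (zfs load-key) before it can be mounted and tested.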

Here we don't know how TrueNAS locks down their system, and are also not supposed to know. We need additional information from the vendor (iX-Systems) or power users within their community to be able to answer this in all detail.


I would suggest to convert this issue into a discussion, since this does not appear to be a technical deficiency with OpenZFS itself.

And thanks for moving the conversation over to

@exander77
Author

@almereyda https://nextcloud.t4d.cz/s/Z9BfEGrMXYKQ6zp

I have dumped the dataset and its encryption key. The issue persists after importing it. There are only directories and one file, so it is just 50 kB.

@exander77
Author

So, if anybody can zfs receive it and confirm that it persists for them as well, that would be great.

Behavior:

root@ghost[/mnt/test]# find
.
./nextcloud2
./nextcloud2/nextcloud.log
./charts
./ix-applications
root@ghost[/mnt/test]# touch charts/test
root@ghost[/mnt/test]# touch ix-applications/test
touch: setting times of 'ix-applications/test': No such file or directory
root@ghost[/mnt/test]# touch nextcloud2/test
root@ghost[/mnt/test]# touch test
touch: setting times of 'test': No such file or directory

@rincebrain
Contributor

rincebrain commented Oct 11, 2022

I just remembered a case where I saw something similar once.

If there are extended ACLs in a format OpenZFS on Linux doesn't support (e.g. if acltype=nfsv4 were used on FreeBSD), it can behave strangely in a way similar to this, where it looks like your operations should succeed, but then they fail in strange ways.

So I would suspect if you go peering around on it, on FreeBSD, you'll spot some less invisible ACLs present, though I don't have my FBSD VMs available to check.

@exander77
Author

How would they have gotten there? The drive was never in any BSD system, and I did try to list POSIX ACLs etc.

@rincebrain
Contributor

I can't speak to how they got there, I was just remarking on a time that I saw something similar, and what caused it. I think you could probably coax zdb into telling you if that's the case.

That link to what I assume was a send stream or a pool image also just refuses connection for me.

@exander77
Author

@rincebrain Somebody already asked whether the pool was created on BSD-based FreeNAS, but no. I already played with zdb today; I will check whether I can get the ACLs from that dataset and from some other one and compare them.

@PrivatePuffin
Contributor

As you solved the issue and it wasn't ZFS-related in the end, please close.

@rincebrain
Contributor

@Ornias1993 I don't see a resolution on here, other than you saying "close this" over and over again.

And as far as my earlier suspicion, having received it on FBSD...

[three screenshots from FreeBSD 13, not preserved]

So yes, it does have invisible ACL bits compared to the baseline dataset. Tada.

We should probably make that more obvious on Linux, if that's going to break...

@PrivatePuffin
Contributor

PrivatePuffin commented Oct 12, 2022

Instead of being passive-aggressive, you could also assume it was solved somewhere other than here... Considering the user basically posted an issue everywhere under the sun ;-)

Also: I asked for this to be closed precisely two times. So much for your claimed "over and over again".

Here is the solution:
[screenshot, not preserved]

@exander77
Author

@Ornias1993 Not everywhere.

@rincebrain
Contributor

I don't think I was being passive aggressive, I think I was just directly telling you it wasn't resolved and stop posting "close this" over and over in a bug.

If that wasn't clear enough - "don't do that."

Interestingly, the fact that that works seems like a bug, since Linux doesn't think +i is set if you ask on the dataset linked above, at least.

@almereyda

almereyda commented Oct 12, 2022

If we consider this a bug, then we need a resolution path. This was presented with a screenshot from an undisclosed location. From what I am seeing this is about the immutable bit being present, but not visible to the VFS layer. This is not due to cross-platform contingencies between Linux and BSD, as the latter was not involved.

To get the reproducer working on Linux, I am also blocked by an internal server error at https://nextcloud.t4d.cz/s/Z9BfEGrMXYKQ6zp

In my local tests with a random file on a random dataset, I could create the following output:

$ touch test
$ lsattr test
---------------------- test
$ getfacl test
# file: test
# owner: yala
# group: yala
user::rw-
group::rw-
other::r--
$ sudo chattr +i test
$ getfacl test       
# file: test
# owner: yala
# group: yala
user::rw-
group::rw-
other::r--
$ lsattr test
----i----------------- test
$ getfattr test
$ rm test
rm: cannot remove 'test': Operation not permitted
$ echo $?
1
$ sudo chattr -i test
$ lsattr test
---------------------- test
$ rm test
$ echo $?
0

Could someone in possession of the secure2 dataset please check its mountpoint (while it is mounted) with the lsattr command, to see if any immutable bits are set? getfacl didn't show them for me.

Sorry I didn't think of the xattrs in the first place. It might be that the presented workaround works with sudo chattr -i /mnt/secure2 already.
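The check suggested above could be scripted roughly as follows. This is a sketch under the assumption that lsattr prints the flag field first, followed by a space and the path, as in the transcript above; the function names are made up for illustration.

```sh
# Succeed if the flag field of one lsattr output line contains "i".
# Input: a line such as "----i----------------- test".
line_has_immutable_flag() {
    flags=${1%% *}    # everything before the first space is the flag field
    case "$flags" in
        *i*) return 0 ;;
        *)   return 1 ;;
    esac
}

# Report whether a directory itself (not its contents) is immutable.
# lsattr -d lists the directory entry rather than what is inside it.
check_dir_immutable() {
    line=$(lsattr -d "$1") || return 2
    if line_has_immutable_flag "$line"; then
        echo "immutable: $1 (clear as root with: chattr -i $1)"
    else
        echo "not immutable: $1"
    fi
}

# Example (left commented out because it depends on the local system):
# check_dir_immutable /mnt/secure2
```

Without -d, lsattr on a directory lists the attributes of the entries inside it, which would miss a flag set on the mountpoint directory itself.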


Yet still, I would not consider this an OpenZFS bug. It's rather the VFS which is not very verbose about why the file system state changes are prohibited. This is Linux yelling at us, and not the file system underneath.

Therefore I am of the same opinion that this bug can be closed in OpenZFS, or moved to a (rather interesting!) discussion (about side effects of opaque third-party developer choices in handling the file system).

In case of a situation where no actual evidence is present that the error in question is caused by the suspected system, as it is here, I would favour if we could go with the flow and keep each other informed about results of the investigation in other places, to be able to voluntarily choose to move the discussion there when deemed appropriate. Also it would have been nice if the TrueNAS developers would have left us a hint somewhere about what is going on, but that cannot be expected.

As for the tone of the conversation, please bear in mind that it is actual real people on the other end of the screen, and that we need to self-moderate ourselves to keep this community friendly and alive.

@PrivatePuffin
Contributor

This was presented with a screenshot from an undisclosed location.

TrueCharts discord.

@rincebrain
Contributor

I would still be happy to keep this open as an OpenZFS bug, since even if it's fallout from the Linux VFS layer, we still need to care, since we're not going to convince them to change things there, but that's just my $0.02.

Linux does show, with lsattr, +i set, which surprises me, as I would have expected ls -l to have shown a + when that's true, as it does on FreeBSD 13 (where that screenshot is from), not just when facls exist.

But other filesystems react strangely in the same way, I see, if you set chattr +i on a dir and then try writing to it - the create doesn't appear to fail, but still doesn't happen. Wild. I love complex systems.

@PrivatePuffin
Contributor

PrivatePuffin commented Oct 12, 2022

over and over

If that wasn't clear enough - "don't do that."

Please don't strawman me. As I said, I said it just twice. A single needless repetition is not saying something "over and over".
Beyond that: luckily I'm entitled to explain my reasoning without your permission.


On topic:
It's important to note that we still don't know how that flag got set. It's not unlikely that it was set by TrueNAS itself (considering that this behavior seemed to start after a middleware-triggered dataset migration, and that iX is actively trying to prevent users from modifying the ix-applications dataset, which also left remnants behind that shouldn't be there).

@ghost

ghost commented Oct 12, 2022

This is not a ZFS issue. TrueNAS middleware manipulates the immutable flag on the mountpoints of locked encrypted datasets to prevent accidentally writing in them and thinking they were mounted. It's possible this was handled incorrectly after the replication you performed. If you have not already then please report your issue here: https://ixsystems.atlassian.net/jira/software/c/projects/NAS/issues

I'm closing this as it doesn't seem there's anything OpenZFS needs to fix here at this time. Feel free to reopen the issue if you disagree.

P.S. you might want to zfs inherit readonly secure2 to undo your zfs set readonly=off secure2 if you haven't yet done so.
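Whether that P.S. applies can be read from the SOURCE column of the property. A small sketch, assuming the value and source come from `zfs get -H -o value,source readonly secure2`; the helper name `readonly_advice` is hypothetical.

```sh
# Given the value and the source of the readonly property, say whether
# there is a local override that zfs inherit would remove.
readonly_advice() {
    value=$1
    src=$2
    case "$src" in
        local) echo "readonly=$value is set locally; 'zfs inherit readonly' would revert it" ;;
        *)     echo "readonly=$value comes from '$src'; nothing to revert" ;;
    esac
}

# Example (needs a real pool, so it is left commented out):
# zfs get -H -o value,source readonly secure2 |
#     while IFS="$(printf '\t')" read -r v s; do readonly_advice "$v" "$s"; done
```

A source of "local" means the property was set explicitly on that dataset, which is what zfs set readonly=off secure2 would have left behind.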

@ghost ghost closed this as not planned Won't fix, can't repro, duplicate, stale Oct 12, 2022
@ghost ghost removed the Type: Defect Incorrect behavior (e.g. crash, hang) label Oct 12, 2022
This issue was closed.