Feature request: Handle fail cases caused by missing LVM devices. #326
Comments
Today this issue recurred on a different cluster: the resource was stuck in the Resizing state because of a missing LV:
# linstor r l
Defaulted container "linstor-controller" out of: linstor-controller, kube-rbac-proxy
+-------------------------------------------------------------------------------------------------------------------------------------+
| ResourceName | Node | Port | Usage | Conns | State | CreatedOn |
|=====================================================================================================================================|
| pvc-96665a02-7aaa-4f19-b10a-74ec53fac434 | slt-dev-kube-system-01 | 7000 | InUse | Ok | Resizing, UpToDate | 2022-10-06 09:32:06 |
+-------------------------------------------------------------------------------------------------------------------------------------+
# linstor vd l
Defaulted container "linstor-controller" out of: linstor-controller, kube-rbac-proxy
+------------------------------------------------------------------------------------------------+
| ResourceName | VolumeNr | VolumeMinor | Size | Gross | State |
|================================================================================================|
| pvc-96665a02-7aaa-4f19-b10a-74ec53fac434 | 0 | 1000 | 100 GiB | | resizing |
+------------------------------------------------------------------------------------------------+
# linstor vd set-size pvc-96665a02-7aaa-4f19-b10a-74ec53fac434 0 100G
Defaulted container "linstor-controller" out of: linstor-controller, kube-rbac-proxy
SUCCESS:
Description:
Volume definition with number '0' of resource definition 'pvc-96665a02-7aaa-4f19-b10a-74ec53fac434' modified.
Details:
Volume definition with number '0' of resource definition 'pvc-96665a02-7aaa-4f19-b10a-74ec53fac434' UUID is: a58b59cd-ce4a-46c2-b9cd-1d7a7eca1b4e
ERROR:
(Node: 'slt-dev-kube-system-01') Failed to access DRBD super-block of volume pvc-96665a02-7aaa-4f19-b10a-74ec53fac434/0
Show reports:
linstor error-reports show 639FE3FF-E8C1E-000009
command terminated with exit code 10
# linstor vd l
Defaulted container "linstor-controller" out of: linstor-controller, kube-rbac-proxy
+------------------------------------------------------------------------------------------------+
| ResourceName | VolumeNr | VolumeMinor | Size | Gross | State |
|================================================================================================|
| pvc-96665a02-7aaa-4f19-b10a-74ec53fac434 | 0 | 1000 | 100 GiB | | resizing |
+------------------------------------------------------------------------------------------------+
# linstor error-reports show 639FE3FF-E8C1E-000009
ERROR REPORT 639FE3FF-E8C1E-000009
============================================================
Application: LINBIT® LINSTOR
Module: Satellite
Version: 1.20.0
Build ID: 9c6f7fad48521899f7a99c564b1d33aeacfdbfa8
Build time: 2022-11-07T16:37:38+00:00
Error time: 2022-12-28 11:16:25
Node: slt-dev-kube-system-01
============================================================
Reported error:
===============
Description:
Failed to access DRBD super-block of volume pvc-96665a02-7aaa-4f19-b10a-74ec53fac434/0
Category: LinStorException
Class name: VolumeException
Class canonical name: com.linbit.linstor.core.devmgr.exceptions.VolumeException
Generated at: Method 'hasMetaData', Source file 'DrbdLayer.java', Line #1087
Error message: Failed to access DRBD super-block of volume pvc-96665a02-7aaa-4f19-b10a-74ec53fac434/0
Error context:
An error occurred while processing resource 'Node: 'slt-dev-kube-system-01', Rsc: 'pvc-96665a02-7aaa-4f19-b10a-74ec53fac434''
Call backtrace:
Method Native Class:Line number
hasMetaData N com.linbit.linstor.layer.drbd.DrbdLayer:1087
adjustDrbd N com.linbit.linstor.layer.drbd.DrbdLayer:622
process N com.linbit.linstor.layer.drbd.DrbdLayer:396
process N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:900
processResourcesAndSnapshots N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:358
dispatchResources N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:168
dispatchResources N com.linbit.linstor.core.devmgr.DeviceManagerImpl:309
phaseDispatchDeviceHandlers N com.linbit.linstor.core.devmgr.DeviceManagerImpl:1083
devMgrLoop N com.linbit.linstor.core.devmgr.DeviceManagerImpl:735
run N com.linbit.linstor.core.devmgr.DeviceManagerImpl:631
run N java.lang.Thread:829
Caused by:
==========
Category: Exception
Class name: NoSuchFileException
Class canonical name: java.nio.file.NoSuchFileException
Generated at: Method 'translateToIOException', Source file 'UnixException.java', Line #92
Error message: /dev/data/pvc-96665a02-7aaa-4f19-b10a-74ec53fac434_00000
Call backtrace:
Method Native Class:Line number
translateToIOException N sun.nio.fs.UnixException:92
rethrowAsIOException N sun.nio.fs.UnixException:111
rethrowAsIOException N sun.nio.fs.UnixException:116
newFileChannel N sun.nio.fs.UnixFileSystemProvider:182
open N java.nio.channels.FileChannel:292
open N java.nio.channels.FileChannel:345
readObject N com.linbit.linstor.layer.drbd.utils.MdSuperblockBuffer:74
hasMetaData N com.linbit.linstor.layer.drbd.DrbdLayer:1082
adjustDrbd N com.linbit.linstor.layer.drbd.DrbdLayer:622
process N com.linbit.linstor.layer.drbd.DrbdLayer:396
process N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:900
processResourcesAndSnapshots N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:358
dispatchResources N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:168
dispatchResources N com.linbit.linstor.core.devmgr.DeviceManagerImpl:309
phaseDispatchDeviceHandlers N com.linbit.linstor.core.devmgr.DeviceManagerImpl:1083
devMgrLoop N com.linbit.linstor.core.devmgr.DeviceManagerImpl:735
run N com.linbit.linstor.core.devmgr.DeviceManagerImpl:631
run N java.lang.Thread:829
END OF ERROR REPORT.
I know that this is not a LINSTOR issue, but since we rely on existing technologies, we need to know how to live with their bugs and how to work around them. The issue above was fixed by recreating the symlink manually:
# lvscan | grep pvc
ACTIVE '/dev/data/pvc-96665a02-7aaa-4f19-b10a-74ec53fac434_00000' [100.02 GiB] inherit
# ls -lah /dev/data/pvc-96665a02-7aaa-4f19-b10a-74ec53fac434_00000
ls: cannot access '/dev/data/pvc-96665a02-7aaa-4f19-b10a-74ec53fac434_00000': No such file or directory
# dmsetup ls | grep pvc
data-pvc--96665a02--7aaa--4f19--b10a--74ec53fac434_00000 (253:0)
# ls -lah /dev/dm-* | grep "253, 0"
brw-rw---- 1 root disk 253, 0 Dec 28 10:06 /dev/dm-0
# ln -s /dev/dm-0 /dev/data/pvc-96665a02-7aaa-4f19-b10a-74ec53fac434_00000
# linstor vd set-size pvc-96665a02-7aaa-4f19-b10a-74ec53fac434 0 100G
SUCCESS:
Description:
Volume definition with number '0' of resource definition 'pvc-96665a02-7aaa-4f19-b10a-74ec53fac434' modified.
Details:
Volume definition with number '0' of resource definition 'pvc-96665a02-7aaa-4f19-b10a-74ec53fac434' UUID is: a58b59cd-ce4a-46c2-b9cd-1d7a7eca1b4e
Thus symlinks can be recovered without invoking |
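The manual recovery above relies on two naming conventions: device-mapper names double every hyphen inside the VG and LV names and join the two with a single hyphen, and the /dev/dm-N node number is the device-mapper minor. A minimal shell sketch of both mappings follows. The function names dm_name_to_lv_path and dm_line_to_node are hypothetical helpers, not part of LVM or LINSTOR, and the second one assumes the "(major:minor)" output format shown in the transcript.

```shell
# Hypothetical helpers, for illustration only.

# Convert a device-mapper name (as printed by `dmsetup ls`) into the
# /dev/<vg>/<lv> path that LVM normally exposes as a symlink.
# Assumes a plain LV: "--" encodes a literal hyphen, the first single "-"
# separates the VG name from the LV name.
dm_name_to_lv_path() {
    printf '%s\n' "$1" \
        | sed -e 's|--|/|g' -e 's|-| |' -e 's|/|-|g' \
        | {
            read -r dm_vg dm_lv
            printf '/dev/%s/%s\n' "$dm_vg" "$dm_lv"
        }
}

# Convert one line of `dmsetup ls` output, e.g. "name (253:0)",
# into the matching /dev/dm-<minor> node.
dm_line_to_node() {
    printf '%s\n' "$1" | sed -n 's|.*(\([0-9]*\):\([0-9]*\))$|/dev/dm-\2|p'
}
```

With these, the link target and link name used in the `ln -s` step can be derived from a single `dmsetup ls` line instead of being looked up by hand.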
Today I ran into the missing symlink problem again. I went through many bugs while trying to fix that resize attempt, e.g.:
root@slt-dev-kube-system-02:/# linstor r l
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName ┊ Node ┊ Port ┊ Usage ┊ Conns ┊ State ┊ CreatedOn ┊
╞═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ pvc-96665a02-7aaa-4f19-b10a-74ec53fac434 ┊ slt-dev-kube-system-01 ┊ 7000 ┊ InUse ┊ Ok ┊ Resizing, UpToDate ┊ 2022-10-06 09:32:06 ┊
┊ pvc-96665a02-7aaa-4f19-b10a-74ec53fac434 ┊ slt-dev-kube-system-02 ┊ 7000 ┊ ┊ Ok ┊ Resizing, Unknown ┊ 2023-01-31 15:38:42 ┊
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
root@slt-dev-kube-system-02:/# linstor r d slt-dev-kube-system-02 pvc-96665a02-7aaa-4f19-b10a-74ec53fac434
SUCCESS:
Description:
Node: slt-dev-kube-system-02, Resource: pvc-96665a02-7aaa-4f19-b10a-74ec53fac434 preparing for deletion.
Details:
Node: slt-dev-kube-system-02, Resource: pvc-96665a02-7aaa-4f19-b10a-74ec53fac434 UUID is: 8691638c-2caf-4779-a462-a6b54f13cd71
SUCCESS:
Preparing deletion of resource on 'slt-dev-kube-system-02'
ERROR:
(Node: 'slt-dev-kube-system-01') Failed to access DRBD super-block of volume pvc-96665a02-7aaa-4f19-b10a-74ec53fac434/0
Show reports:
linstor error-reports show 63D51331-E8C1E-000017
ERROR:
Description:
Deletion of resource 'pvc-96665a02-7aaa-4f19-b10a-74ec53fac434' on node 'slt-dev-kube-system-02' failed due to an unknown exception.
Details:
Node: slt-dev-kube-system-02, Resource: pvc-96665a02-7aaa-4f19-b10a-74ec53fac434
Show reports:
linstor error-reports show 63CACC00-00000-000007
linstor error-reports show 63CACC00-00000-000007
linstor error-reports show 63D51331-E8C1E-000017
I found that |
|
Seems related to piraeusdatastore/piraeus@9a9e383 and https://bugs.debian.org/932433 |
Introduce an additional handler that checks for the device path before each such modification. If the device is not found, it attempts to fix the symlink using dmsetup output. This change is a workaround for a specific set of issues, often related to udev, which lead to the disappearance of symlinks for LVM devices on a working system. These issues commonly manifest during device resizing and deactivation, causing LINSTOR exceptions when accessing the DRBD super-block of a volume. fixes: LINBIT#326 Signed-off-by: Andrei Kvapil <[email protected]>
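The check-then-repair idea in this commit message can be sketched in shell. This is not the actual patch, which lives in the linstor-server Java code. It is a minimal illustration, and the function name ensure_dev_symlink and its two arguments are assumptions made for the example.

```shell
# Hypothetical helper, for illustration only: make sure a device path exists
# before it is used, and recreate the symlink if it has disappeared.
#   $1: expected path, e.g. /dev/<vg>/<lv>
#   $2: real device node it should point to, e.g. /dev/dm-0
ensure_dev_symlink() {
    eds_link=$1
    eds_target=$2

    # `-e` follows symlinks, so a dangling link also counts as missing.
    if [ -e "$eds_link" ]; then
        return 0
    fi

    # The VG directory under /dev may be gone as well.
    mkdir -p "$(dirname "$eds_link")" || return 1

    # -f replaces a dangling link, -n avoids descending into an existing one.
    ln -sfn "$eds_target" "$eds_link"
}
```

A caller would run this right before any step that opens the backing device. An existing, working path is left untouched, so the check is cheap to repeat.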
I faced the same problem |
Same here |
This has been fixed in recent Piraeus Operator releases. It does not have a lot to do with LINSTOR, as that is just calling the usual See piraeusdatastore/piraeus-operator#728 for details on the fix. The problem with "detecting" the issue in LINSTOR is that there is nothing to detect until it is already too late, because the missing link is caused by the resize command itself. So please upgrade to the latest Piraeus Operator version (at least 2.7.0), and the issue will go away. |
After the upgrade I'm getting "No space" error when resizing:
|
@WanzenBug we can't use several volumes now because of this. Could you please look? |
The weird thing is that both LVM and XFS are already resized to the requested size. Manually mounting the drbd device and running xfs_growfs produces the same error. |
Please open an issue in piraeus operator, this has nothing to do with LINSTOR itself. This may also be an issue with the kernel. I remember a version of RHEL 9.2 that produces this error when trying to resize a volume that does not need to be resized. So perhaps try to update everything? |
Hi, I just ran into an issue when resizing a volume:
# linstor r l -r pvc-e5cc28a6-44f0-4afd-b831-502bb0882d1d
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName ┊ Node ┊ Port ┊ Usage ┊ Conns ┊ State ┊ CreatedOn ┊
╞═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ pvc-e5cc28a6-44f0-4afd-b831-502bb0882d1d ┊ monster-killer ┊ 7004 ┊ InUse ┊ Ok ┊ Resizing, UpToDate ┊ 2022-09-13 09:47:31 ┊
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
I tried to invoke the resize operation manually:
error report:
It seems it wasn't able to find the /dev/linstor/pvc-e5cc28a6-44f0-4afd-b831-502bb0882d1d_00000 device. Okay, let's exec into the pod:
LVM found (already resized):
# lvs | grep pvc-e5cc28a6-44f0-4afd-b831-502bb0882d1d
pvc-e5cc28a6-44f0-4afd-b831-502bb0882d1d_00000 linstor -wi-ao---- 18.63g
DRBD found (not resized):
# lsblk /dev/drbd1004
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
drbd1004 147:1004 0 10G 0 disk /var/lib/kubelet/pods/56332201-3640-4de8-9ebb-52244111c406/volumes/kubernetes.io~csi/pvc-e5cc28a6-44f0-4afd-b831-502bb0882d1d/mount
drbdadm adjust does nothing:
drbdadm down/up could not complete because of the missing device:
lvchange makes the device appear on the node again:
This is not the first time I have seen LVM devices disappear from a node this way.
Since we can't influence LVM to make it work more predictably, I suggest a few enhancements in linstor-server to improve diagnostics and the troubleshooting process:
InUse, run drbdadm down; lvchange -an; lvchange -ay; drbdadm up. Or is there any better method?
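The recovery sequence mentioned above is given without arguments. As a hedged sketch, the helper below only prints the four commands for a given DRBD resource and backing LV, so the plan can be reviewed before anything is run. The function name print_recovery_plan and the choice of arguments (resource name for drbdadm, VG/LV for lvchange) are assumptions, not something the thread specifies.

```shell
# Hypothetical helper, for illustration only: print (do not execute) the
# down / deactivate / activate / up sequence for one resource.
#   $1: DRBD resource name
#   $2: backing logical volume as <vg>/<lv>
print_recovery_plan() {
    prp_res=$1
    prp_lv=$2
    printf 'drbdadm down %s\n' "$prp_res"
    printf 'lvchange -an %s\n' "$prp_lv"   # deactivate the LV
    printf 'lvchange -ay %s\n' "$prp_lv"   # reactivate it, recreating the device node
    printf 'drbdadm up %s\n' "$prp_res"
}
```

Printing first is a deliberate choice: taking a DRBD resource down while it is still in use is disruptive, so the operator should confirm the resource state before running the printed commands.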