
Sometimes misses drive detection with lots of drives #1133

Closed
martinpitt opened this issue Jun 29, 2023 · 18 comments · Fixed by #1221

Comments

@martinpitt
Contributor

martinpitt commented Jun 29, 2023

In cockpit we have a storage scaling integration test. It creates 200 SCSI devices/drives with modprobe scsi_debug num_tgts=200, and then checks that they all propagate correctly through the kernel → udev → udisks → dbus → cockpit chain.

This test now fails very often on Arch Linux: it often shows fewer than the expected 200 SCSI drives (plus the virtio root disk and the cloud-init ISO, 202 in total), e.g. only 188 or 193.

This seems to be a udisks bug. In the "good" case, udisks should pick up all 200 devices and drives:

# udisksctl dump | grep devices/sd|wc -l
200
# udisksctl dump | grep ^/org.*drives/Linux_scsi_debug_|wc -l
200

But when the test fails, it has fewer drives:

# udisksctl dump | grep devices/sd|wc -l
200
# udisksctl dump | grep ^/org.*drives/Linux_scsi_debug_|wc -l
179

The udisksctl dump diff between the good and the broken state shows that the device isn't properly connected to the drive, and the corresponding drive object is missing:

--- /tmp/good	2023-06-29 12:58:11.013662135 +0200
+++ /tmp/broken	2023-06-29 12:58:05.021631876 +0200
@@ -4707,7 +4707,7 @@
     CryptoBackingDevice:        '/'
     Device:                     /dev/sdfu
     DeviceNumber:               33536
-    Drive:                      '/org/freedesktop/UDisks2/drives/Linux_scsi_debug_18500'
+    Drive:                      '/'
     HintAuto:                   false
     HintIconName:
     HintIgnore:                 false

However, it is good in udev. The udevadm info --export-db diff between the
good and broken situation looks identical aside from time stamps and minor reordering:

@@ -11034,8 +11034,8 @@
 D: b 131:0
 N: sdfu
 L: 0
-S: disk/by-diskseq/188
 S: disk/by-id/scsi-3333333300002d2a8
+S: disk/by-diskseq/188
 S: disk/by-id/wwn-0x333333300002d2a8
 Q: 188
 E: DEVPATH=/devices/pseudo_0/adapter0/host3/target3:0:177/3:0:177:0/block/sdfu
@@ -11061,7 +11061,7 @@
 E: ID_TARGET_PORT=400
 E: ID_SCSI_SERIAL=18500
 E: ID_BUS=scsi
-E: DEVLINKS=/dev/disk/by-diskseq/188 /dev/disk/by-id/scsi-3333333300002d2a8 /dev/disk/by-id/wwn-0x333333300002d2a8
+E: DEVLINKS=/dev/disk/by-id/scsi-3333333300002d2a8 /dev/disk/by-diskseq/188 /dev/disk/by-id/wwn-0x333333300002d2a8
 E: TAGS=:systemd:
 E: CURRENT_TAGS=:systemd:

So far we only see this on Arch Linux, not (yet?) on Fedora/Debian/RHEL.

# pacman -Q udisks2 systemd linux
udisks2 2.9.4-4
systemd 253.5-1
linux 6.3.7.arch1-1

Arch's udisks package has no patches. Fedora 38 has kernel 6.3.9 and systemd 253.3 (same version), so Arch is not really ahead here. So I'm not yet sure what the significant difference is; it might just be sheer timing luck.

@martinpitt
Contributor Author

martinpitt commented Jun 29, 2023

I am trying to reproduce this outside of cockpit with a repr.sh script (it has to end in .txt so that I can attach it here -- please rename). But so far no luck -- that does fail after a few iterations, but on a different issue. So this needs something more, perhaps because Cockpit is connected to udisks and has signal listeners, etc.? I'll try to refine this and report back if I have a working shell reproducer.

Update: I ran busctl monitor org.freedesktop.UDisks2 > bus.log in parallel, and that didn't help. Nor with running udisksctl monitor > udisks.log (or logging to stderr), repr.sh keeps working fine. This is tricky.

@tbzatek
Member

tbzatek commented Jun 29, 2023

One potential issue at first sight - it takes a while until udisksd processes all uevents. There's a side probing thread that may do additional I/O (e.g. for CD-ROM drives). Even after udevadm settle and similar, it is not guaranteed that everything is up-to-date on the UDisks side. See the similar issue #711.

What happens if you put sleep(60) or something in there?
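
For instance, rather than a blind sleep, a rough polling sketch along these lines (hypothetical, not part of the actual test) would show whether udisks is merely lagging behind or never catches up:

# hypothetical wait loop: poll until udisks reports all 200 scsi_debug drives,
# giving up after 60 seconds
for i in $(seq 60); do
    n=$(udisksctl dump | grep -c '^/org.*drives/Linux_scsi_debug_')
    [ "$n" -ge 200 ] && break
    sleep 1
done
echo "udisks sees $n scsi_debug drives"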

@martinpitt
Contributor Author

I forgot to mention -- it's not a race condition. I've kept the VM up in that state for an hour, and it didn't catch up. I can reload the cockpit page or do udisksctl dump all I want, it's a stable state. Only restarting udisks helped.

@tbzatek
Member

tbzatek commented Jun 29, 2023

Thanks, I don't like race conditions :-)

Any chance to get ssh access to that machine? I'd be interested in:

  • lsscsi -wiu
  • ls -al /dev/disk/by-id/

We've seen some scsi_debug weirdness with missing identifiers before but so far I haven't managed to find the root cause: https://bugzilla.redhat.com/show_bug.cgi?id=2139318

And perhaps the obvious:

  • udisksctl dump
  • udevadm info -e

@martinpitt
Contributor Author

udisks and udevadm dumps are already linked from the description (the "good" and "broken" links).

I reproduced the failure again, and grabbed all four things. Unfortunately uploading to this github issue is broken right now, so I put them onto my server: https://piware.de/tmp/udisks-info/

I'm happy to give you ssh access to that machine. It's a VM running on my laptop, so it requires some port forwarding, but that can be arranged. Please send me your public SSH key, and let's coordinate on Slack or so? (My laptop isn't always on, and I'll be working on a train tomorrow afternoon.) Alternatively, I can talk you through running the test on your machine, in a container (it's quite a lot of stuff to download, but no danger to your OS or home dir).

@martinpitt
Contributor Author

I tried to run the daemon with /usr/lib/udisks2/udisksd --debug, but that does not show any extra messages at all: even when running the above repr.sh, the whole output stays at

udisks-Message: 05:50:41.039: udisks daemon version 2.9.4 starting
udisks-Message: 05:50:41.060: Acquired the name org.freedesktop.UDisks2 on the system message bus

With G_MESSAGES_DEBUG=all I get tons of

(udisksd:32988): GLib-GIO-DEBUG: 05:51:26.255: GTask 0x5607c9739930 (source object: 0x5607c97395d0, source tag: (nil)) finalized without ever returning (using g_task_return_*()). This potentially indicates a bug in the program.

but that's not very useful, and unrelated to this bug (I suppose).
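
For the systemd-managed daemon instance, the same environment could be injected with a drop-in (just a sketch, assuming the unit name udisks2.service):

# hypothetical drop-in to run the service with GLib debug messages enabled
mkdir -p /etc/systemd/system/udisks2.service.d
printf '[Service]\nEnvironment=G_MESSAGES_DEBUG=all\n' > /etc/systemd/system/udisks2.service.d/debug.conf
systemctl daemon-reload
systemctl restart udisks2.service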

I played around further with repr.sh, but no luck so far. I updated the comment above with the results. So I am trying my luck with catching it from the other side, and modifying our integration test. That does the following steps:

  1. boot VM
  2. modprobe scsi_debug
  3. log into cockpit, go to storage page
  4. wait for expected number of devices

I tried to vary this. "stable" means it survived 3 parallel runs of 10 iterations each; "sleep" means adding time.sleep(5); "settle" means calling udevadm settle.

  • boot, sleep, modprobe, login: fail -- okay, so it's not a race against stuff that is still happening during boot
  • boot, modprobe, sleep, login: stable -- not surprising, udisks is less disturbed by the cockpit login and d-bus listeners
  • boot, modprobe, settle, login: stable -- ditto
  • boot, login, modprobe: stable - !! that is surprising -- I had expected this to disturb udisksd more, not less
  • boot, login to overview, modprobe, open storage page: fail
  • boot, login to overview, sleep, modprobe, open storage page [1]: fail -- okay, so it's really not the login or the overview page which disturbs things here, it's the storage page connecting to udisks.

Current theory: happens when connecting to udisks while events are going on. I tried to simulate that with an updated
repr.sh, with varying sleeps, and serial/parallel monitor commands, but still no luck.

There must be something specific about cockpit's udisks monitoring. The Storage page makes a single D-Bus connection to udisks (it also connects to Stratis). Right after connecting, it enables the LVM2 and iscsi modules; that also sounds significant. I'll play around with that.
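
For reference, that module enabling boils down to a single call on the Manager object -- roughly the equivalent of the following (shown with busctl only as an approximation; cockpit does it over its own D-Bus connection):

# rough equivalent of what the storage page triggers right after connecting;
# EnableModules loads all available udisks modules (lvm2, iscsi, ...)
busctl call org.freedesktop.UDisks2 /org/freedesktop/UDisks2/Manager \
    org.freedesktop.UDisks2.Manager EnableModules b true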

[1] diff, for posterity:

--- test/verify/check-storage-scaling
+++ test/verify/check-storage-scaling
@@ -17,6 +17,8 @@
 # You should have received a copy of the GNU Lesser General Public License
 # along with Cockpit; If not, see <http://www.gnu.org/licenses/>.
 
+import time
+
 import storagelib
 import testlib
 
@@ -27,9 +29,14 @@ class TestStorageScaling(storagelib.StorageCase):
         m = self.machine
         b = self.browser
 
+        self.login_and_go()
+        time.sleep(5)
         m.execute("modprobe scsi_debug num_tgts=200")
 
-        self.login_and_go("/storage")
+        b.go("/storage")
+        b.enter_page("/storage")
+        b.wait_visible("#devices")
+
         with b.wait_timeout(60):
             b.click("#drives button:contains(Show all 202 drives)")
             b.wait_not_in_text("#drives", "Show all")

@martinpitt
Contributor Author

martinpitt commented Jun 30, 2023

I reworked repr.sh a bit to be closer to what cockpit does: initially udisksd is not running, then it gets started and modules enabled. But still no luck with reproducing that way 😢 I need to stop here.

Is there some other debugging that I could turn on during the test? --debug unfortunately doesn't seem to help at all.

You can also try to run it yourself. To make sure you don't have to change your running system, it's safest and easiest to do that in a toolbox with our devel/CI container:

toolbox create --image quay.io/cockpit/tasks -c cockpit
toolbox enter cockpit

Inside, check out cockpit, build an arch image:

git clone https://github.com/cockpit-project/cockpit
cd cockpit/
test/image-prepare -q arch

Then open 2 or 3 terminals (for more parallelism, and to fail faster), and run

for i in `seq 10`; do TEST_OS=arch test/verify/check-storage-scaling -stv || break; done

at some point this should fail and "sit":

wait_js_cond(ph_is_present("#drives button:contains(Show all 202 drives):not([disabled]):not([aria-disabled=true])")): Uncaught (in promise) Error: condition did not become true
[...]
SSH ACCESS
  $ ssh -p 2401 -i bots/machine/identity [email protected]

COCKPIT
  http://127.0.0.2:9291
Press RET to continue...

Then you can log into the VM using the provided command (the port number will vary depending on which parallel test failed). You can log in as user "admin" or "root", both have password "foobar".

jelly pushed a commit to cockpit-project/bots that referenced this issue Jun 30, 2023
@tbzatek
Member

tbzatek commented Jul 3, 2023

(rather busy here at the moment, sorry, and PTO until Jul 17)

Just another thing that I've noticed:

sdc           8:32   0    8M  0 disk  
└─mpathc    253:0    0    8M  0 mpath 
sdd           8:48   0    8M  0 disk  
└─mpathb    253:2    0    8M  0 mpath 
sde           8:64   0    8M  0 disk  
└─mpatha    253:3    0    8M  0 mpath 
sdf           8:80   0    8M  0 disk  
sdg           8:96   0    8M  0 disk  
sdh           8:112  0    8M  0 disk  
sdi           8:128  0    8M  0 disk  
sdj           8:144  0    8M  0 disk  
└─mpathd    253:1    0    8M  0 mpath 

For some reason, multipathd is kicking in.

@martinpitt
Contributor Author

@tbzatek : Very well spotted!

There is no multipathd process running. However, there is /usr/lib/udev/rules.d/56-multipath.rules, which triggers a lot of processes like "/sbin/multipath -u sdbv". I tried

--- test/verify/check-storage-scaling
+++ test/verify/check-storage-scaling
@@ -27,6 +27,7 @@ class TestStorageScaling(storagelib.StorageCase):
         m = self.machine
         b = self.browser
 
+        m.execute("rm /usr/lib/udev/rules.d/56-multipath.rules; udevadm control --reload")
         m.execute("modprobe scsi_debug num_tgts=200")
 
         self.login_and_go("/storage")

I have run it 3x parallel for 15 minutes now, and it eventually failed. It may have taken longer than usual.

lsscsi -wiu is "clean" -- it has exactly 200 entries like

[3:0:0:0]    disk    3333333000001f40                  0x33333333000001f40                 /dev/sda   33333333000001f40

However, that was already the case in my original lsscsi output, with multipath. What command did you run that produced the "sdc-mpathc" output? I don't see that anywhere in any of the dumps. If it was lsblk, this also looks clean in the failed instance with disabled mpath:

NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
sda      8:0    0    8M  0 disk 
[... they all look the same, no mpath ...]
sdgr   132:112  0    8M  0 disk 
sr0     11:0    1  366K  0 rom  
vda    254:0    0   10G  0 disk 
├─vda1 254:1    0    1M  0 part 
├─vda2 254:2    0  300M  0 part /efi
└─vda3 254:3    0  9.7G  0 part /

@martinpitt
Contributor Author

This now also affects Fedora 38.

@tbzatek
Member

tbzatek commented Nov 16, 2023

Testing this on an F39 cloud image:

# modprobe scsi_debug num_tgts=200
# udisksctl dump | grep devices/sd|wc -l
184
# udisksctl dump | grep ^/org.*drives/Linux_scsi_debug_|wc -l
184
# udisksctl dump | grep '^/org/freedesktop/UDisks2/block_devices/sd' | wc -l
184
# lsblk | grep '^sd' | wc -l
200
# ls -1 /dev/sd* | wc -l
200
# ls -1 /dev/disk/by-id/wwn-0x33333330000* | wc -l
200
# ls -1 /dev/disk/by-id/scsi-333333330000* | wc -l
200
# udevadm info --export-db | grep -e '^P: /devices/pseudo_.*/adapter.*/host.*/target.*/block/sd.*' | wc -l
200

Not good.
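
For anyone reproducing this, a quick way to list exactly which sd devices are missing on the udisks side is to compare the kernel view against the udisks object paths (a hypothetical helper, not something used above):

# devices known to the kernel (lsblk) but absent from udisksctl dump
comm -23 \
    <(lsblk -dno NAME | grep '^sd' | sort) \
    <(udisksctl dump | sed -n 's|^/org/freedesktop/UDisks2/block_devices/\(sd[a-z]*\):.*|\1|p' | sort)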

@tbzatek
Member

tbzatek commented Nov 16, 2023

Btw. same thing as last time - if I start udisksd from a root shell, everything works; it only fails when started through the systemd service. Same binary, no extra args. SELinux is permissive.

@martinpitt
Contributor Author

Weird... I suppose the next thing to look at is udev, which sits in the middle? I checked that the udev dump is identical in the good/bad situations, but it would be good to find out whether udev fails to send some uevents, or whether udisks fails to receive/handle some of them.

@tbzatek
Member

tbzatek commented Nov 16, 2023

It's a performance issue mixed with what looks like a ring buffer on the kernel side, read through netlink. Reading the gudev and libudev sources, records are polled and read directly and not buffered anywhere.

Just putting a 100ms sleep at the beginning of the udisks uevent processing chain results in about 90% uevent loss after a burst of about 250-300 uevents is read, likely because older records get overwritten in the ring buffer. Looking at the uevent sequence numbers after the buffer is exhausted, there are significant gaps. I also noticed a random order of incoming records (but that's generally not guaranteed anyway and we're kinda ready for that).

So we'll need a dedicated high-priority side thread that only reads and queues all incoming uevents; these then go into a probing queue processed in another thread, and the results are finally sent to the main thread where the appropriate D-Bus objects are created.

@martinpitt
Contributor Author

Nice debugging! I'm not entirely sure if we aren't asking too much here. I added that test case back then as we got a bug report from users with a gazillion drives -- but they usually aren't being hotplugged all at the same time.

We could certainly also rewrite the test to create the 200 drives first and then restart udisks, so that they are treated as "coldplug".
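
Roughly like this in the test setup (just a sketch, assuming the service is named udisks2.service):

# create the drives first, then restart udisks so that it enumerates them
# during its coldplug pass at startup instead of via a burst of uevents
modprobe scsi_debug num_tgts=200
udevadm settle
systemctl restart udisks2.service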

@tbzatek
Member

tbzatek commented Nov 16, 2023

Oh no worries, the change will be reasonably small I think. Sooner or later we'd hit the limit anyway, and it might actually have been the source of some random failures we've been seeing.

The problem is that there's a lot going on in the main thread, and things would only get worse once new features are added.

@tbzatek
Member

tbzatek commented Nov 21, 2023

With G_MESSAGES_DEBUG=all I get tons of

(udisksd:32988): GLib-GIO-DEBUG: 05:51:26.255: GTask 0x5607c9739930 (source object: 0x5607c97395d0, source tag: (nil)) finalized without ever returning (using g_task_return_*()). This potentially indicates a bug in the program.

Another good catch, addressed in #1224.

@martinpitt
Contributor Author

Great work @tbzatek ! 💯
