High Disk usage without any messages queued #852

Closed

viktorerlingsson opened this issue Nov 18, 2024 · 8 comments

viktorerlingsson (Member) commented Nov 18, 2024

Describe the bug
After running LavinMQ for a while, the disk fills up even if there are no messages left in any queues.

df reports usage, but du does not show it, pointing towards memory-mapped files being deleted but not unmapped.
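
A quick way to check for this symptom (a minimal sketch; the data directory path and process name below are assumptions based on a default install):

# Filesystem-level usage counts deleted-but-still-open files...
df -h /var/lib/lavinmq

# ...while a directory walk does not, so du reports less than df.
du -sh /var/lib/lavinmq

# List open files with a link count below one, i.e. files that were
# deleted but are still held open or mapped by the process.
lsof +L1 -p "$(pidof lavinmq)"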

Describe your setup

How to reproduce
Not sure yet.

Expected behavior
LavinMQ should not use any significant disk space if there are no messages queued.

viktorerlingsson changed the title from High Disk usage to High Disk usage without any messages queued on Nov 18, 2024
kickster97 added the bug label on Nov 18, 2024
viktorerlingsson removed their assignment on Dec 17, 2024
fkollmann commented

We experience this behavior on our PROD system. Maybe file descriptors are being leaked?

We are keeping an eye on this issue and will inject lsof into the image to continue our diagnosis.

fkollmann commented Jan 8, 2025

This is the disk-usage behavior we see (on Dec 31st, 2024):

[screenshot: 2024-12-30_19-25]

It only happens on PROD and runs totally fine on DEV. And it only affects the master node, never one of the followers.

This is before the restart:

[screenshot: 2024-12-30_19-03]

And after the restart:

[screenshot: 2024-12-30_19-04]

viktorerlingsson (Member, Author) commented

Thanks for the extra information @fkollmann 👍

Sorry for not updating earlier. We're aware of what's causing the issue, but we haven't been able to come up with a good solution yet.
The issue is that memory-mapped files are sometimes deleted but not unmapped (finalize is not properly called for them). When this happens, lsof should show a number of memory-mapped files that are marked as deleted but not released by the process.
It seems to happen somewhat randomly, and we think garbage collection is the culprit. It also seems to happen more frequently when an instance is under high load, which explains why you see this more often in PROD than in DEV environments.

Sending multiple USR2 signals in succession (killall -USR2 lavinmq or pkill -USR2 lavinmq), forcing LavinMQ to run GC, might work as a work-around for now if you do not wish to restart LavinMQ, but we've had mixed results with it.
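
As a sketch of that work-around in practice (the retry loop and the lsof verification step are illustrative assumptions, not part of LavinMQ):

# Send USR2 a few times to force GC runs, pausing between signals.
for i in 1 2 3; do
    pkill -USR2 lavinmq
    sleep 5
done

# Files marked as deleted but still mapped should drop out of this list.
lsof +L1 -p "$(pidof lavinmq)"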

fkollmann commented

Thanks for the feedback! This matches what we see: more open files than files actually present in the file system:

kubectl exec lavinmq-2 --namespace uplift -c lavinmq -- sh -c 'find /var/lib/lavinmq/42099b4af021e53fd8fd4e056c2568d7c2e3ffa8 -type f | wc -l'
--> 293

kubectl exec lavinmq-2 --namespace uplift -c lavinmq -- sh -c 'lsof +D /var/lib/lavinmq/42099b4af021e53fd8fd4e056c2568d7c2e3ffa8 | wc -l'
--> 352

The workaround indeed does help. Running it once freed the disk space:

kubectl exec lavinmq-2 --namespace uplift -c lavinmq -- pkill -USR2 lavinmq

Thank you very much for this! We will add a container which runs this on a regular basis.

fkollmann commented

The workaround works fine for us:

[screenshot: 2025-01-11_13-17]

This is what we did:

In the k8s manifest, we added a container which sends the USR2 signal to the LavinMQ process:

    spec:
      shareProcessNamespace: true # required to allow sending signal from 'garbage-collect' to 'lavinmq'

      containers:
      - name: lavinmq
        ....

      - name: garbage-collect
        ....

        command: [ "/usr/local/bin/sp_garbage_collect.sh" ]

We use the following script to send the signal:

#!/bin/sh

# This script manually triggers the garbage collection of LavinMQ.
#
# There is currently a bug which prevents LavinMQ from actually freeing
# disk space, because garbage collection is not triggered correctly.
#
# For more details, see https://github.com/cloudamqp/lavinmq/issues/852

echo "Starting garbage collection every 100 minutes..."

while true
do
    sleep 100m

    echo "Triggering garbage collection..."

    pkill -USR2 lavinmq
done

Hope this helps anyone else who has this issue.
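
As a rough way to confirm each pass actually reclaims space (the path below is an assumption; adjust to your data directory), watch the gap between df and du shrink after a signal:

# df counts deleted-but-still-mapped files, du does not;
# the difference between the two is the leaked space.
df /var/lib/lavinmq
du -s /var/lib/lavinmq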

viktorerlingsson (Member, Author) commented

This should be fixed with the release of LavinMQ v2.1.0. Please upgrade to that and let us know if any problems remain!

fkollmann commented

I can confirm that the issue is fixed for our environment:

[screenshot]

Thanks for the hard work!

viktorerlingsson (Member, Author) commented

> I can confirm that the issue is fixed for our environment

That's great, thanks for verifying!
