lvcreate: --quiet doesn't suppress overallocation warning. #57

Open
dseomn opened this issue Aug 24, 2021 · 20 comments

Comments

@dseomn

dseomn commented Aug 24, 2021

Is there any way to suppress the warning below? I only want to see errors.

dseomn@ilus:~$ sudo lvcreate --quiet --quiet -s -n delete-me ilus-vg/data_test
  WARNING: Sum of all thin volume sizes (<2.01 TiB) exceeds the size of thin pool ilus-vg/thin_pool and the size of whole volume group (465.28 GiB).

In case it helps:

dseomn@ilus:~$ sudo lvcreate --version
  LVM version:     2.03.11(2) (2021-01-08)
  Library version: 1.02.175 (2021-01-08)
  Driver version:  4.43.0
  Configuration:   ./configure --build=x86_64-linux-gnu --prefix=/usr --includedir=${prefix}/include --mandir=${prefix}/share/man --infodir=${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --disable-option-checking --disable-silent-rules --libdir=${prefix}/lib/x86_64-linux-gnu --runstatedir=/run --disable-maintainer-mode --disable-dependency-tracking --libdir=/lib/x86_64-linux-gnu --sbindir=/sbin --with-usrlibdir=/usr/lib/x86_64-linux-gnu --with-optimisation=-O2 --with-cache=internal --with-device-uid=0 --with-device-gid=6 --with-device-mode=0660 --with-default-pid-dir=/run --with-default-run-dir=/run/lvm --with-default-locking-dir=/run/lock/lvm --with-thin=internal --with-thin-check=/usr/sbin/thin_check --with-thin-dump=/usr/sbin/thin_dump --with-thin-repair=/usr/sbin/thin_repair --with-udev-prefix=/ --enable-applib --enable-blkid_wiping --enable-cmdlib --enable-dmeventd --enable-editline --enable-lvmlockd-dlm --enable-lvmlockd-sanlock --enable-lvmpolld --enable-notify-dbus --enable-pkgconfig --enable-udev_rules --enable-udev_sync --disable-readline
@utkonos

utkonos commented May 6, 2022

My own use case is that I make a large number of snapshots which eventually trigger all of these warnings even though the actual used space is nowhere near the size that will endanger the pool. When I am done with a particular task, all the snapshots are deleted leaving the base volume. I don't need any of these warnings and I would like them all suppressed. I know of other projects and users who would like to suppress this warning as well along with the other three that you are suppressing via --quiet. Here is the relevant source code:

if (sz != UINT64_C(~0)) {
    log_warn("WARNING: Sum of all thin volume sizes (%s) exceeds the "
             "size of thin pool%s%s%s (%s).",
             display_size(cmd, thinsum),
             more_pools ? "" : " ",
             more_pools ? "s" : display_lvname(pool_lv),
             txt,
             (sz > 0) ? display_size(cmd, sz) : "no free space in volume group");
    if (max_threshold > 99 || !min_percent)
        log_print_unless_silent("WARNING: You have not turned on protection against thin pools running out of space.");
    if (max_threshold > 99)
        log_print_unless_silent("WARNING: Set activation/thin_pool_autoextend_threshold below 100 to trigger automatic extension of thin pools before they get full.");
    if (!min_percent)
        log_print_unless_silent("WARNING: Set activation/thin_pool_autoextend_percent above 0 to specify by how much to extend thin pools reaching the threshold.");
    /* FIXME Also warn if there isn't sufficient free space for one pool extension to occur? */
}

As you can see, the first warning is logged using a different method than the other three: log_warn vs log_print_unless_silent. A simple solution would be to make all four of these warnings log_print_unless_silent. However, I think this is a suboptimal solution that may also suppress other useful warnings, throwing out the baby with the bathwater.
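A minimal sketch of the distinction described above, written as a toy model rather than lvm2's actual logging code. The names toy_msg_kind, toy_should_emit and toy_log are hypothetical, and the only behaviour modelled is the one reported in this thread: one call style honours silent mode, the other does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the two call styles; not lvm2's real logging API. */
enum toy_msg_kind {
    TOY_MSG_WARN,                /* models the log_warn() call site */
    TOY_MSG_PRINT_UNLESS_SILENT  /* models the log_print_unless_silent() call sites */
};

/* Decide whether a message is shown, given whether silent mode is active. */
static bool toy_should_emit(enum toy_msg_kind kind, bool silent)
{
    if (kind == TOY_MSG_PRINT_UNLESS_SILENT)
        return !silent;
    return true; /* in this model the plain warning ignores silent mode */
}

/* Print the message to stderr when allowed; return 1 if printed, else 0. */
static int toy_log(enum toy_msg_kind kind, bool silent, const char *text)
{
    assert(text != NULL);
    if (!toy_should_emit(kind, silent))
        return 0;
    fprintf(stderr, "%s\n", text);
    return 1;
}
```

In this model, calling toy_log(TOY_MSG_WARN, true, ...) still prints, which mirrors the complaint that the "Sum of all thin volume sizes" line survives --quiet while the other three lines do not.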

I have read what I think are all of the historical mailing list threads and forum threads on this particular topic. I have found one proposed fix that appears to be optimal for my use case: adding an envvar LVM_SUPPRESS_POOL_WARNINGS. This was proposed by @zkabelac here:
https://listman.redhat.com/archives/linux-lvm/2017-September/024332.html

As stated in that proposal, there are two precedents for this type of suppression that are already implemented in LVM2: LVM_SUPPRESS_FD_WARNINGS and LVM_SUPPRESS_LOCKING_FAILURE_MESSAGES.

Here are two historical threads about this feature request from the mailing list:
https://listman.redhat.com/archives/linux-lvm/2016-April/023529.html
https://listman.redhat.com/archives/linux-lvm/2017-September/024323.html

Here is an issue in the bugtracker at RedHat (unfortunately marked WONTFIX):
https://bugzilla.redhat.com/show_bug.cgi?id=1465974

Here is a forum thread over at Proxmox tracking this issue downstream:
https://forum.proxmox.com/threads/solved-you-have-not-turned-on-protection-against-thin-pools-running-out-of-space.91055/

I posted about all of this to the linux-lvm mailing list, and there is interest in this change from the Qubes OS project and @DemiMarie:
https://listman.redhat.com/archives/linux-lvm/2022-May/026169.html

I am now going to suggest that all users and projects that want this change add a 👍 to this issue so that there is a measurement of interest.

@utkonos

utkonos commented May 8, 2022

I've done a bit more digging. These are the locations where messages/warnings are suppressed elsewhere:

LVM_SUPPRESS_LOCKING_FAILURE_MESSAGES

lvm2/lib/locking/locking.c

Lines 128 to 157 in 8dccc23

int init_locking(struct cmd_context *cmd,
                 int file_locking_sysinit, int file_locking_readonly, int file_locking_ignorefail)
{
    int suppress_messages = 0;
    if (file_locking_sysinit || getenv("LVM_SUPPRESS_LOCKING_FAILURE_MESSAGES"))
        suppress_messages = 1;
    _blocking_supported = find_config_tree_bool(cmd, global_wait_for_locks_CFG, NULL);
    _file_locking_readonly = file_locking_readonly;
    _file_locking_sysinit = file_locking_sysinit;
    _file_locking_ignorefail = file_locking_ignorefail;
    log_debug("File locking settings: readonly:%d sysinit:%d ignorelockingfailure:%d global/metadata_read_only:%d global/wait_for_locks:%d.",
              _file_locking_readonly, _file_locking_sysinit, _file_locking_ignorefail,
              cmd->metadata_read_only, _blocking_supported);
    if (!init_file_locking(&_locking, cmd, suppress_messages)) {
        log_error_suppress(suppress_messages, "File locking initialisation failed.");
        _file_locking_failed = 1;
        if (file_locking_sysinit || file_locking_ignorefail)
            return 1;
        return 0;
    }
    return 1;
}

LVM_SUPPRESS_FD_WARNINGS

lvm2/tools/lvmcmdline.c

Lines 3550 to 3586 in 8dccc23

    if (getenv("LVM_SUPPRESS_FD_WARNINGS"))
        suppress_warnings = 1;
    if (!(d = opendir(_fd_dir))) {
        if (errno != ENOENT) {
            log_sys_error("opendir", _fd_dir);
            return 0; /* broken system */
        }
        /* Path does not exist, use the old way */
        if (getrlimit(RLIMIT_NOFILE, &rlim) < 0) {
            log_sys_error("getrlimit", "RLIMIT_NOFILE");
            return 1;
        }
        for (fd = 3; fd < (int)rlim.rlim_cur; fd++) {
            if ((fd != custom_fds->out) &&
                (fd != custom_fds->err) &&
                (fd != custom_fds->report)) {
                _close_descriptor(fd, suppress_warnings, command, ppid,
                                  parent_cmdline);
            }
        }
        return 1;
    }
    while ((dirent = readdir(d))) {
        fd = atoi(dirent->d_name);
        if ((fd > 2) &&
            (fd != dirfd(d)) &&
            (fd != custom_fds->out) &&
            (fd != custom_fds->err) &&
            (fd != custom_fds->report)) {
            _close_descriptor(fd, suppress_warnings,
                              command, ppid, parent_cmdline);
        }
    }

@utkonos

utkonos commented May 8, 2022

Would this change do the trick?

if (sz != UINT64_C(~0)) { => if (sz != UINT64_C(~0) && !getenv("LVM_SUPPRESS_POOL_WARNINGS")) {

Is an envvar available in this way to thin_manip.c?
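As a rough illustration of the proposed gate: getenv() is part of the standard C library, so any source file that includes stdlib.h can call it, as the two existing LVM_SUPPRESS_* checks quoted earlier do. The sketch below is an assumption-laden stand-in, not a patch against thin_manip.c. LVM_SUPPRESS_POOL_WARNINGS is the proposed variable and does not exist in lvm2 today, and the toy_* function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Pure decision: warn only if a size was computed and no opt-out is set.
 * suppress_env is whatever getenv() returned, or NULL when the variable is unset. */
static bool toy_should_warn_overprovision(uint64_t sz, const char *suppress_env)
{
    if (sz == UINT64_C(~0))   /* same sentinel as the excerpt: nothing to report */
        return false;
    return suppress_env == NULL;
}

/* Thin wrapper that reads the proposed (not yet existing) variable. */
static bool toy_should_warn_overprovision_env(uint64_t sz)
{
    return toy_should_warn_overprovision(sz, getenv("LVM_SUPPRESS_POOL_WARNINGS"));
}
```

Splitting the decision from the getenv() call keeps the logic easy to check without touching the process environment. Like the existing checks, this treats any value of the variable, even an empty string, as "suppress".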

@justinclift

Interesting. This issue has 41 upvotes on the issue submission comment, so it looks like it has fairly wide appeal.

Anyone interested in taking a crack at getting this done?

@shodanx2

shodanx2 commented Sep 9, 2024

Hello,
Just wanted to chime in, this warning creates a rabbithole for occasional lvm users.
And this rabbithole leads to the conclusion "this warning doesn't matter and there is nothing to be done"

As a first step I would change the warning to give better advice or be less dramatic. The entire point of thin pools is to allow volumes whose total allocated size is larger than the physical space to coexist in that smaller space.

All it's really saying is "You're going to run out of space in your thin pool if your volumes exceed the space of the thin pool"

Here is one discussion that demonstrates the confusion

https://forum.proxmox.com/threads/solved-you-have-not-turned-on-protection-against-thin-pools-running-out-of-space.91055/

There is also the warning that

"WARNING: You have not turned on protection against thin pools running out of space."

Which suggests that you have to do that. It is also not clear how the system will handle running out of space. Gracefully, I hope? Or will all the VMs crash horribly all at once when they try to write into space that doesn't exist?

In most cases, I think thin pools are going to take all the space on the disk, so autoextend won't help anyway ?

Preferably I would like more suggestive language: "If you use more space than there is in total in your thinpool, bad things (specify) will happen. If you have extra space in your volume group, you can allow the thinpool to autoextend"

Also, maybe autoextend being enabled should be the default setting for new thinpools ?

@zkabelac
Contributor

zkabelac commented Sep 9, 2024

It would first be good to understand that a thin-pool 'running out of space' is nowhere near similar to running out of space in your filesystem - it may get seriously worse for the user of a thin-pool and he can lose a big amount of data (hopefully users do make their backups...)

So yes - the message should 'scare a user', so that he does not assume this is a state he can run into on a daily basis and expect an 'easy-to-fix' scenario, as it's simply not. So we try to kindly push users towards some 'monitored' usage of thin-pool, so there is at least log evidence that a possible disaster is coming..

It is usually not complex to 'recover' the thin-pool itself so it starts again, but what is much worse is to fix the consistency of all the data stored within the thin-pool.

So if there were 'a trivial' knob to disable all these warnings - we are pretty sure many distros would pick the easiest path and set this as 'a new default' - then we would just get requests for recovery help and be blamed for 'unexpected' data loss...

One idea we could think about here is to actually detect the 'real' state of the thin-pool and emit this warning only if we are past a certain threshold (70%, maybe 80%) - although for small volume sizes this may simply not be early enough.... But at least there would be no messages for nearly empty pools....

But the main part is - a thin-pool is NOT supposed to be operated in a 'full pool state' - this should always be prevented!
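The threshold idea mentioned above can be sketched roughly as follows. This is only an illustration of the arithmetic, not lvm2 code. The function names are hypothetical, and the 80% figure used in the note is just one of the values floated in the comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Warn only once the pool's real data usage has reached the threshold. */
static bool toy_pool_needs_warning(double data_percent, double warn_threshold)
{
    assert(data_percent >= 0.0 && data_percent <= 100.0);
    return data_percent >= warn_threshold;
}

/* Bytes still free in the pool, to show how little headroom a small pool
 * has left by the time a percentage threshold fires. */
static uint64_t toy_pool_free_bytes(uint64_t pool_bytes, double data_percent)
{
    assert(data_percent >= 0.0 && data_percent <= 100.0);
    return (uint64_t)((double)pool_bytes * (100.0 - data_percent) / 100.0);
}
```

With a threshold of 80, a nearly empty pool produces no message, while a pool at 97.33% does. The second helper makes the caveat concrete: at 80% full, a 1 GiB pool has only about 200 MiB left, which a busy writer can consume well before anyone reads the warning.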

@dseomn
Author

dseomn commented Sep 9, 2024

can lose a big amount of data

much worse is to fix the consistency of all the data stored within the thin-pool

Is that still true with --errorwhenfull y? I thought running out of space would effectively just make the thin volumes read-only in that case. How would that cause consistency issues or data loss?

@shodanx2

shodanx2 commented Sep 9, 2024

Does LVM "do the right thing" when a thin pool runs out of space ?

I would imagine, like previous generations, that running out of space would be treated as a failing block device when the last free extent is filled up.

I think it should send a signal similar to a failing hard drive and remount all associated file systems to read only.

I notice the warning suggests enabling "auto extend" for the thin pool.

In my case, there is never any space left to extend the thin pool into; my basic systems have only ever had one thin pool that uses all the space left in the volume group.

But is there a reason not to have auto extend enabled by default if there were free space available in the volume group ?

Do most users prefer to crash their system rather than step over the empty space left in their volume group if any ?

And if you have auto extend enabled, but you also use up all the space left in your volume group, then what happens ? Does it break worse than your thin pool running out of space internally ?

These are all the questions that this warning makes me ask, and I imagine any responsible system admin will have to find the answers on their own when confronted with that warning.

Frankly, that's a lot of extra research that you don't expect to have to do when you were just trying to set up partitions; it could easily eat up an entire Sunday afternoon!

I would like the reassurance that "LVM does the right thing in all cases" and I think that means "every filesystem flips to read only as it tries to write into new extents that you don't have, your write call fails but existing data is not lost".

Also I would expect LVM to have a kind of extent reserve system, a kind of landing strip that sets off serious alarm bells when you start using the reserve. Something similar to "tune2fs -m " but for extents.

And imagine the following case that could happen to anyone: one of your VMs has a bug and starts to write junk data at 1GB/s until something breaks. Any overprovisioned system will fail (that's everyone using a thin pool), and auto extend will not help.

EDIT:

I ran this whole discussion through chatgpt, which you can review here

Its responses are

Summary:

    What happens when a thin pool runs out of space? LVM stops writes, potentially causing data corruption, unless --errorwhenfull is used to make volumes read-only.

    Why isn't auto-extend enabled by default? To avoid unintentional space usage without admin oversight.

    What happens when both the pool and volume group are full? Writes fail, and the system may behave worse than just a full pool.

    Can admin workload be reduced? Yes, through auto-extend and monitoring thresholds.

    Is there a reserve system like tune2fs -m? No, but thresholds and monitoring can serve a similar function.

    Does --errorwhenfull improve failure modes? Yes, it makes failures more graceful by preventing further writes.

    How does LVM handle rapid data consumption? With proper monitoring and thresholds, admins can mitigate such situations.

So I take it the warning can be dealt with in two ways: "ignore it, or build your own LVM2 monitoring infrastructure; either way, running out of space will fail the last write and go read-only like any broken hard drive"

Can anyone actually knowledgeable on the topic back these answers up for the future admins who end up on this page while searching for the warning?

@zkabelac
Contributor

can lose a big amount of data

much worse is to fix the consistency of all the data stored within the thin-pool

Is that still true with --errorwhenfull y? I thought running out of space would effectively just make the thin volumes read-only in that case. How would that cause consistency issues or data loss?

This option only changes the behavior for 'instant' erroring - so there is no delay and you get 'write' errors as soon as they happen - otherwise there is by default a 60sec window so lvm2 has plenty of time to extend data or metadata volume.

Now let me describe a couple of issues here - where people do not realize how problematic an out-of-space thin-pool is.
The main trouble is that some 'blocks' fail to write while others still proceed to the already provisioned area - i.e. one user can keep using a fully provisioned thin volume while another one is being 'punished' by missing new writable chunks. One can also guess that if there is some interleaving of written/unwritten/written blocks/sectors across different thin-pool chunks - and those may even sit within filesystem journals - it may get the filesystem into a muddy position. Surely filesystems do get better over time, but it's still not a very well tested area, and thin-pool was the reason for a long list of fixes....
So recovering filesystems on an overfilled pool is 'a joy' - especially when there is not much new space the user can add. Even booting such a system can be a lot of fun if the rootfs is within some thin-pool... Also, the switch to a read-only fs doesn't rescue all cases either - for ext4 this works well when you write 'data', but it's problematic with 'metadata'...

@dseomn
Author

dseomn commented Sep 10, 2024

Thank you! That finally makes sense why it's bad to let it run out of space.

Any chance of changing it so that all writes fail when it's out of space, instead of only writes that need the space? (That would solve the consistency and data loss issues, right?) Or would checking for free space before all writes introduce too much of a performance hit? Would it work to have an atomic int somewhere in memory that's initialized to zero, checked on all writes, and flipped to non-zero by any thread that detects an out-of-space condition? Because of the race condition, that might allow some blocks to get written that shouldn't be, but hopefully that can be addressed by using more expensive locks to handle sync calls? Or maybe there could be an atomic int that represents when the thin pool usage is over ~95%, and when it's set, more expensive locks are always used. So for pools <95%, the performance penalty is minimal, and pools >=95% pay the performance penalty but get the data consistency benefit. I'm not familiar with Linux's block device interface, but I'd be happy to take a look and try to come up with something if you want.

If not, any chance of updating the lvmthin man page to explain what you did about some writes erroring but others going through "successfully"?
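The "atomic int checked on all writes" idea can be sketched as a userspace toy using C11 atomics. This is not dm-thin kernel code and makes no claim about how the real target is structured. The toy_pool type and its functions are hypothetical, and the free-chunk counter is deliberately left unsynchronized, so the sketch only shows the latch itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_pool {
    uint64_t free_chunks;     /* chunks still available for provisioning */
    atomic_int out_of_space;  /* 0 = healthy, non-zero = fail every write */
};

static void toy_pool_init(struct toy_pool *p, uint64_t free_chunks)
{
    assert(p != NULL);
    p->free_chunks = free_chunks;
    atomic_init(&p->out_of_space, 0);
}

/* Returns true if the write is accepted. needs_new_chunk says whether the
 * write targets an area that has not been provisioned yet. */
static bool toy_pool_write(struct toy_pool *p, bool needs_new_chunk)
{
    assert(p != NULL);
    if (atomic_load(&p->out_of_space))   /* cheap check done on every write */
        return false;
    if (!needs_new_chunk)
        return true;                     /* overwrite of a provisioned chunk */
    if (p->free_chunks == 0) {
        atomic_store(&p->out_of_space, 1);  /* latch: all later writes fail */
        return false;
    }
    p->free_chunks--;
    return true;
}
```

Once one write fails for lack of space, every later write is rejected, including writes to already provisioned chunks. As the comment itself notes, writes already in flight when the flag flips can still slip through, so the latch alone does not close the race.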

@utkonos

utkonos commented Sep 10, 2024

I'm glad there is some movement on this issue. I want to also point out that I am not asking for warnings to be removed or changed nor am I wanting to argue about the validity of warnings. What I see in the source code is that some of the warnings are sent to log via a function that allows warnings to be silenced, and the first warning is not able to be silenced.

My main goal is simply to have an end-user option to silence this warning. I am 1000% OK with this option being off by default so that the current behavior is retained, while I am able to control it myself.

Additionally, I proposed using an environment variable, but I'm also not a project maintainer. I defer to whatever solution maintainers feel is the correct way to go for adding a knob to silence this warning as a user option.

@teigland
Contributor

I think we should remove the warnings about overprovisioning and just replace them with improved warnings about thin pools running low on space, at percent-full levels that are configurable by the user, e.g.

WARNING: thin pool "foo" has reached 90% full, and will be autoextended at 95% full.
WARNING: thin pool "bar" has reached 75% full, and will require manual lvextend!
WARNING: thin pool "foo" has reached 80% full, and the VG lacks 1GB for the next autoextend!

I suggested something like this back in 2017 in https://bugzilla.redhat.com/show_bug.cgi?id=1465974

@utkonos

utkonos commented Sep 10, 2024

I love any solution that you think is appropriate as long as the warning that says "WARNING: Sum of all thin volume sizes (%s) exceeds the size of thin pool%s%s%s (%s)." can be suppressed by a configuration change or other appropriate user option.

@zkabelac
Contributor

No

I think we should remove the warnings about overprovisioning and just replace them with improved warnings about thin pools running low on space, at percent-full levels that are configurable by the user, e.g.

WARNING: thin pool "foo" has reached 90% full, and will be autoextended at 95% full.
WARNING: thin pool "bar" has reached 75% full, and will require manual lvextend!
WARNING: thin pool "foo" has reached 80% full, and the VG lacks 1GB for the next autoextend!

This is not going to work for our needs - we need to be sure we provide a warning for a user who uses 'defaults', activates an empty pool and eventually fills the pool completely on the first activation.

However - after looking at our option list - this can possibly work when the --monitor option is explicitly listed on the cmdline.
While lvm.conf monitoring==0 would still be showing the warning.

lvcreate --monitor n -T....
lvchange -ay --monitor n vg/thinLV

This might be a way - lvm2 code can skip the message - when the user intentionally created an unmonitored volume - at the same time we have metadata logged with this content about the user's 'opt-out'.

@utkonos

utkonos commented Sep 11, 2024

@zkabelac

I agree that warnings should absolutely not be removed. My original concern, way up top in my first comment, is that for some reason this one single warning is not written with log_print_unless_silent but with log_warn. Therefore that one warning cannot be silenced no matter what a user does.

@teigland
Contributor

This is not going to work for our needs - we need to be sure we provide a warning for a user who uses 'defaults', activates an empty pool and eventually fills the pool completely on the first activation.

However - after looking at our option list - this can possibly work when the --monitor option is explicitly listed on the cmdline. While lvm.conf monitoring==0 would still be showing the warning.

lvcreate --monitor n -T....
lvchange -ay --monitor n vg/thinLV

This might be a way - lvm2 code can skip the message - when the user intentionally created an unmonitored volume - at the same time we have metadata logged with this content about the user's 'opt-out'.

You're talking about warning the user when a thin pool is created or activated without monitoring (and --monitor n can silence the warning). That's reasonable, and I included something similar in my list of suggestions ("will need manual extension" is about the same as "not monitored".) So, I think that a rough plan would be:

  • remove the current "Sum of all..." warning about over-provisioning.
  • add warning from lvcreate and lvchange -ay if a pool is not monitored (--monitor n to silence)
  • add warning from lvs if a pool has reached N% full and is not monitored (where N is configurable, N=0 to silence)
  • add warning from lvs if a pool has reached N% full and the VG has insufficient space to autoextend.
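The two lvs bullets in the rough plan above could be sketched as a small decision function. This is a hypothetical illustration, not lvm2 code. The toy_* types are invented, and treating N=0 as "silent" for the insufficient-VG-space case as well is an assumption, since the plan only states it for the unmonitored case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum toy_pool_warning {
    TOY_WARN_NONE,
    TOY_WARN_NOT_MONITORED,  /* pool will require a manual lvextend */
    TOY_WARN_VG_TOO_SMALL    /* VG cannot fit the next autoextend */
};

struct toy_pool_state {
    bool monitored;              /* false when created/activated with --monitor n */
    double data_percent;         /* how full the pool is right now */
    uint64_t vg_free_bytes;      /* free space left in the volume group */
    uint64_t next_extend_bytes;  /* size of the next autoextend step */
};

/* Pick the warning lvs would print for one pool; warn_percent is the
 * user-configurable N from the plan, with 0 meaning "never warn". */
static enum toy_pool_warning toy_lvs_warning(const struct toy_pool_state *s,
                                             double warn_percent)
{
    assert(s != NULL);
    if (warn_percent <= 0.0 || s->data_percent < warn_percent)
        return TOY_WARN_NONE;
    if (!s->monitored)
        return TOY_WARN_NOT_MONITORED;
    if (s->vg_free_bytes < s->next_extend_bytes)
        return TOY_WARN_VG_TOO_SMALL;
    return TOY_WARN_NONE;
}
```

Under this sketch, a pool that is far from full never produces a message, which addresses the snapshot-heavy use case described earlier in the thread, while an unmonitored or unextendable pool that is filling up still does.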

@teigland
Contributor

WARNING: thin pool "foo" is not monitored and will require manual lvextend (currently N% full.)

This message could be printed from lvcreate, lvchange -ay, and lvs. Maybe --ignoremonitoring could silence the warning for all of these?

WARNING: VG "vg" has insufficient space for the next autoextend of thin pool "foo" (currently N% full.)

This message about insufficient VG space could be printed from the same commands.

@shodanx2

Hi,

This is tangentially related and for the people who end up on this thread from google.
The issue of monitoring the space left in the thin pool is a bit of a puzzle

The webmin dash doesn't see it

image

df -h doesn't see it

image

The way to find out is this command

root@proxmox:~# lvs -a -o+lv_size,data_percent,metadata_percent
  LV                          VG  Attr       LSize    Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert LSize    Data%  Meta%
  ct-template                 pve Vwi-aotz--  200.00g data        2.07                                     200.00g 2.07
  data                        pve twi-aotz-- <319.61g             97.33  3.75                             <319.61g 97.33  3.75
  [data_tdata]                pve Twi-ao---- <319.61g                                                     <319.61g
  [data_tmeta]                pve ewi-ao----   <3.26g                                                       <3.26g
  iso                         pve Vwi-aotz--  200.00g data        44.42                                    200.00g 44.42
  large-language-models       pve Vwi-aotz--  200.00g data        2.07                                     200.00g 2.07
  [lvol0_pmspare]             pve ewi-------   <3.26g                                                       <3.26g
  root                        pve -wi-ao----   96.00g                                                       96.00g
  stable-diffusion-extensions pve Vwi-aotz--  200.00g data        2.07                                     200.00g 2.07
  stable-diffusion-models     pve Vwi-aotz--  200.00g data        31.94                                    200.00g 31.94
  stable-diffusion-webui      pve Vwi-aotz--  200.00g data        2.07                                     200.00g 2.07
  swap                        pve -wi-ao----    8.00g                                                        8.00g
  vm-100-disk-0               pve Vwi-a-tz--   32.00g data        4.24                                      32.00g 4.24
  vm-1001-disk-0              pve Vwi-aotz--    2.00g data        5.52                                       2.00g 5.52
  vm-101-disk-0               pve Vwi-a-tz--    4.00m data        14.06                                      4.00m 14.06
  vm-101-disk-1               pve Vwi-a-tz--   32.00g data        37.31                                     32.00g 37.31
  vm-102-disk-0               pve Vwi-a-tz--   64.00g data        26.09                                     64.00g 26.09
  vm-103-disk-0               pve Vwi-aotz--   32.00g data        0.00                                      32.00g 0.00
  vm-104-disk-0               pve Vwi-aotz--    4.00m data        14.06                                      4.00m 14.06
  vm-104-disk-1               pve Vwi-aotz--   32.00g data        30.62                                     32.00g 30.62
  vm-105-disk-0               pve Vwi-aotz--    4.00m data        14.06                                      4.00m 14.06
  vm-105-disk-1               pve Vwi-aotz--    4.00m data        1.56                                       4.00m 1.56
  vm-105-disk-2               pve Vwi-aotz--   64.00g data        60.20                                     64.00g 60.20
  vm-106-disk-0               pve Vwi-aotz--    4.00m data        14.06                                      4.00m 14.06
  vm-106-disk-1               pve Vwi-aotz--   64.00g data        15.43                                     64.00g 15.43
  vm-107-disk-0               pve Vwi-a-tz--   64.00g data        3.47                                      64.00g 3.47
  vm-115-disk-0               pve Vwi-aotz--  128.00g data        11.25                                    128.00g 11.25
  vm-116-disk-0               pve Vwi-aotz--  128.00g data        22.05                                    128.00g 22.05
  vm-125-disk-0               pve Vwi-a-tz--  128.00g data        2.76                                     128.00g 2.76
  vm-130-disk-0               pve Vwi-aotz--   32.00g data        9.50                                      32.00g 9.50
  vm-131-disk-0               pve Vwi-aotz--   32.00g data        6.12                                      32.00g 6.12
  vm-999-disk-0               pve Vwi-aotz--    2.00g data        5.52                                       2.00g 5.52

And for something as important as this appears to be, it's a surprisingly obscure command to find !

I'm saying that because if there MUST be such a warning about thin pools getting overfilled, maybe it should include the command to check at that point ?

And also, why isn't there a simple command to check that ?

(
Okay, actually there is, if you already know what you're looking for

lvdisplay /dev/pve/data
  --- Logical volume ---
  LV Name                data
  VG Name                pve
  LV UUID                ec8Jod-V47J-tsCN-3uaE-8XlO-9K0j-HPfwWu
  LV Write Access        read/write (activated read only)
  LV Creation host, time proxmox, 2024-09-07 21:44:22 -0500
  LV Pool metadata       data_tmeta
  LV Pool data           data_tdata
  LV Status              available
  # open                 0
  LV Size                <319.61 GiB
  Allocated pool data    97.33%
  Allocated metadata     3.75%
  Current LE             81820
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           252:5

)

Or better yet, since LVM is a core Linux component, why doesn't df report the space left in the thin pool that is the parent of mounted filesystems? That was the first place I checked, and probably the first place new LVM users would check!

Lastly, now I am kind of curious to see what's going to happen to this system if I bust the thin pool (it's a test system anyway)

@zkabelac
Contributor

Hi,

This is tangentially related and for the people who end up on this thread from google. The issue of monitoring the space left in the thin pool is a bit of a puzzle

The webmin dash doesn't see it
df -h doesn't see it

Clearly both tools are not related to 'lvm2' - so if you want to see this info in 'webmin', you would need to bother the authors of that project.

And the 'df' tool is 'filesystem' specific - so it's not its job to report e.g. disk errors, an out-of-space thin-pool or a broken raid1 leg - these are essentially the same type of errors.
And while you could propose that the maintainers of the 'df' tool change things, I'd be rather skeptical here....

The way to find out is this command

root@proxmox:~# lvs -a -o+lv_size,data_percent,metadata_percent

Yep - this is the correct command to work with the 'LVs' - in a similar way, when you work with md raid you use 'mdadm', and for encrypted DMs we have the cryptsetup tool.

And for something as important as this appears to be, it's a surprisingly obscure command to find !

Well, we were thinking about possibly adding 'colors' on color-capable terminals - or maybe a '!' prefix on the LVs that need some attention - but these were just ideas - nothing has materialized yet...

I'm saying that because if there MUST be such a warning about thin pools getting overfilled, maybe it should include the command to check at that point ?

There is surely a warning in journal/syslog - but I admit there are not many admins checking it...

And also, why isn't there a simple command to check that ?

lvs has tons of options.

So e.g. you could let it select and show all thin-pools with a usage percentage higher than X...
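As a rough stand-in for that kind of filtering, the sketch below parses one line of `lvs --noheadings -o lv_name,data_percent` output and checks it against a threshold. It assumes each line holds a name followed by an optional percentage, as in the listing earlier in this thread. The toy_* helpers are hypothetical and have nothing to do with how lvs implements its own selection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Parse "  <lv_name> <data_percent>" from one output line.
 * Returns false when the line has no percentage (e.g. a non-thin LV). */
static bool toy_parse_lvs_line(const char *line, char *name, size_t name_len,
                               double *percent)
{
    char buf[128];

    assert(line != NULL && name != NULL && percent != NULL && name_len > 0);
    if (sscanf(line, " %127s %lf", buf, percent) != 2)
        return false;
    snprintf(name, name_len, "%s", buf);
    return true;
}

/* True when the line describes an LV whose data usage is at or above the threshold. */
static bool toy_over_threshold(const char *line, double threshold)
{
    char name[128];
    double pct;

    return toy_parse_lvs_line(line, name, sizeof name, &pct) && pct >= threshold;
}
```

A monitoring script could feed each output line through toy_over_threshold() and alert on matches. Lines for LVs that report no data percentage, such as root or swap in the listing above, are simply skipped.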

( Okay, actually there is, if you already know what you're looking for

lvdisplay /dev/pve/data

lvdisplay is an 'older' version of the more universal 'lvs' tool - lvs is fully configurable and can be used in any scripting without any complicated parsing - you can just select columns & format and even get json output...

Or better yet, since LVM is a core Linux component, why doesn't df report the space left in the thin pool that is the parent of mounted filesystems? That was the first place I checked, and probably the first place new LVM users would check!

lvm2 is managing block device layer.

df is reporting 'filesystem layer'

and 'btrfs' can tell stories....

So you are mixing apples and oranges - 'df' has no way to interpret 'free space' in a thin-pool.

And strictly speaking 'df' is lying about free space in the filesystem all the time anyway ;) - as you can have holes in files.....

@shodanx2

shodanx2 commented Oct 3, 2024

For reference, here is the kind of damage that running a thin pool out of space causes.
I think it might have failed mid-installation of ffmpeg via apt.

root@sdweb:~# apt install -y ffmpeg
E: dpkg was interrupted, you must manually run 'dpkg --configure -a' to correct the problem.
root@sdweb:~# apt install -y ffmpeg^C
root@sdweb:~# dpkg --configure -a
Setting up libspeex1:amd64 (1.2.1-2) ...
Setting up libshine3:amd64 (3.1.1-2) ...
Setting up libsnappy1v5:amd64 (1.1.9-3) ...
dpkg: dependency problems prevent configuration of librsvg2-2:amd64:
 librsvg2-2:amd64 depends on libcairo-gobject2 (>= 1.12.16); however:
  Package libcairo-gobject2 is not installed.
 librsvg2-2:amd64 depends on libgdk-pixbuf-2.0-0 (>= 2.31.1); however:
  Package libgdk-pixbuf-2.0-0:amd64 is not installed.

dpkg: error processing package librsvg2-2:amd64 (--configure):
 dependency problems - leaving unconfigured
Setting up libasound2-data (1.2.8-1) ...
Setting up libva2:amd64 (2.17.0-1) ...
Setting up alsa-topology-conf (1.2.5.1-2) ...
Setting up libva-drm2:amd64 (2.17.0-1) ...
Setting up libasound2:amd64 (1.2.8-1+b1) ...
Setting up libmfx1:amd64 (22.5.4-1) ...
Setting up libwebpmux3:amd64 (1.2.4-0.2+deb12u1) ...
Setting up alsa-ucm-conf (1.2.8-1) ...
Processing triggers for libc-bin (2.36-9+deb12u3) ...
ldconfig: /lib/x86_64-linux-gnu/libvorbisenc.so.2.0.12 is not an ELF file - it has the wrong magic bytes at the start.

ldconfig: /lib/x86_64-linux-gnu/libspeex.so.1.5.2 is not an ELF file - it has the wrong magic bytes at the start.

ldconfig: /lib/x86_64-linux-gnu/libgdk_pixbuf-2.0.so.0.4200.10 is not an ELF file - it has the wrong magic bytes at the start.

ldconfig: /lib/x86_64-linux-gnu/libx264.so.164 is not an ELF file - it has the wrong magic bytes at the start.

ldconfig: /lib/x86_64-linux-gnu/librsvg-2.so.2 is not an ELF file - it has the wrong magic bytes at the start.

ldconfig: /lib/x86_64-linux-gnu/libgdk_pixbuf-2.0.so.0 is not an ELF file - it has the wrong magic bytes at the start.

ldconfig: /lib/x86_64-linux-gnu/libspeex.so.1 is not an ELF file - it has the wrong magic bytes at the start.

ldconfig: /lib/x86_64-linux-gnu/librsvg-2.so.2.48.0 is not an ELF file - it has the wrong magic bytes at the start.

ldconfig: /lib/x86_64-linux-gnu/libvorbisenc.so.2 is not an ELF file - it has the wrong magic bytes at the start.

Errors were encountered while processing:
 librsvg2-2:amd64
root@sdweb:~# apt update
Ign:1 https://download.webmin.com/download/newkey/repository stable InRelease
Hit:2 https://download.webmin.com/download/newkey/repository stable Release
Hit:4 http://deb.debian.org/debian bookworm InRelease
Hit:5 http://security.debian.org bookworm-security InRelease
Hit:6 http://deb.debian.org/debian bookworm-updates InRelease
Hit:7 https://nvidia.github.io/libnvidia-container/stable/deb/amd64  InRelease
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
66 packages can be upgraded. Run 'apt list --upgradable' to see them.
root@sdweb:~# apt install -y ffmpeg
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
You might want to run 'apt --fix-broken install' to correct these.
The following packages have unmet dependencies:
 ffmpeg : Depends: libavcodec59 (>= 7:5.0)
          Depends: libavdevice59 (>= 7:5.0) but it is not going to be installed
          Depends: libavfilter8 (>= 7:5.1)
          Depends: libavformat59 (>= 7:5.1)
          Depends: libavutil57 (>= 7:5.1) but it is not going to be installed
          Depends: libpostproc56 (>= 7:5.0) but it is not going to be installed
          Depends: libsdl2-2.0-0 (>= 2.0.12) but it is not going to be installed
          Depends: libswresample4 (>= 7:5.1) but it is not going to be installed
          Depends: libswscale6 (>= 7:5.0) but it is not going to be installed
 libgdk-pixbuf-2.0-0 : Depends: libgdk-pixbuf2.0-common (>= 2.42.10+dfsg-1+deb12u1) but it is not going to be installed
                       Recommends: libgdk-pixbuf2.0-bin but it is not going to be installed
 librsvg2-2 : Depends: libcairo-gobject2 (>= 1.12.16) but it is not going to be installed
              Recommends: librsvg2-common but it is not going to be installed
 libvorbisenc2 : Depends: libvorbis0a (= 1.3.7-1) but it is not going to be installed
E: Unmet dependencies. Try 'apt --fix-broken install' with no packages (or specify a solution).
root@sdweb:~# apt install -y ffmpeg --fix-broken
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
You might want to run 'apt --fix-broken install' to correct these.
The following packages have unmet dependencies:
 ffmpeg : Depends: libavcodec59 (>= 7:5.0)
          Depends: libavdevice59 (>= 7:5.0) but it is not going to be installed
