This repository has been archived by the owner on Jan 31, 2023. It is now read-only.

check_yum accumulating processes #16

Open
calestyo opened this issue Apr 10, 2015 · 5 comments

Comments

@calestyo
Owner

A user reports via mail:
Hi! We had an issue with check_yum spawning yum processes without
killing them when yum is stuck waiting for a lock. To see this happen, run

yum install isdn4k-utils

(or some other package that is not already installed) and do NOT answer yes,
but leave it waiting.

The timeout code will kill the Python script itself after 55 seconds,
but the child process will be left behind. We had a server die from
lack of memory after a while, since Icinga runs the check every 5
minutes when it is in a non-OK state...

The included patch simply disables check_yum's own signal handler
for SIGALRM and then sends SIGALRM to all processes in its
process group. This will include the forked nrpe parent, but not nrpe
itself. When run interactively in a shell without job control, it may
also terminate that interactive shell. I don't think it is worthwhile
to complicate the code to avoid that behaviour.

Also, have you heard that Google Code is shutting down? It would be good
to migrate your project to GitHub or similar. If you have already done
so, please update

http://exchange.nagios.org/directory/Plugins/Operating-Systems/Linux/check_yum/details

thanks!
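A minimal sketch of the failure mode described in the mail, assuming Python 3 on a POSIX system. PLUGIN is a hypothetical stand-in, not the real check_yum code: it starts sleep in place of a yum process stuck on the lock, arms signal.alarm(), and, like the plugin's timeout handler, simply exits when SIGALRM fires. child_survives_timeout is likewise a hypothetical helper.

```python
import os
import signal
import subprocess
import sys
import textwrap

# Hypothetical stand-in for check_yum: `sleep` plays the part of a yum child
# stuck waiting for the lock. The SIGALRM handler exits only the plugin.
PLUGIN = textwrap.dedent("""
    import signal, subprocess, sys

    child = subprocess.Popen(["sleep", "30"],
                             stdout=subprocess.DEVNULL,
                             stderr=subprocess.DEVNULL)
    print(child.pid, flush=True)

    def on_alarm(signum, frame):
        # Like the plugin's timeout handler: report and exit, nothing more.
        sys.exit(3)  # 3 = UNKNOWN in the Nagios plugin API

    signal.signal(signal.SIGALRM, on_alarm)
    signal.alarm(1)
    child.wait()
""")


def child_survives_timeout():
    """Run the stand-in plugin and report whether its child outlived it."""
    result = subprocess.run([sys.executable, "-c", PLUGIN],
                            capture_output=True, text=True, timeout=10)
    if result.returncode != 3:
        raise RuntimeError("stand-in plugin did not time out as expected")
    pid = int(result.stdout.split()[0])
    try:
        os.kill(pid, 0)  # signal 0 only checks that the process exists
    except ProcessLookupError:
        return False
    os.kill(pid, signal.SIGTERM)  # tidy up the orphan this demo created
    return True
```

child_survives_timeout() should return True: the stand-in exits with UNKNOWN after about a second while its child keeps running, reparented (normally to init), which is how leftover yum processes pile up when the check is rescheduled every few minutes.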

@calestyo
Owner Author

And he sent:

Index: modules/nagios/files/plugins/check_yum_updates
===================================================================
--- modules/nagios/files/plugins/check_yum_updates  (revision 108477)
+++ modules/nagios/files/plugins/check_yum_updates  (revision 108478)
@@ -198,7 +198,11 @@

     def timeout_signal_handler(self, signum, frame):
         """Function to be called by signal.alarm to kill the plugin."""
-        
+
+        # Send SIGALRM to all other processes in process group.
+        signal.signal(signal.SIGALRM, signal.SIG_IGN)
+        os.kill(0, signal.SIGALRM)
+
         end(UNKNOWN, "YUM nagios plugin has self terminated after exceeding the timeout (%s seconds)" % self.timeout)
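The handler from the patch can be exercised in isolation with a hypothetical stand-in plugin (again, sleep plays yum); this is a sketch under that assumption, not the real plugin. The harness starts the stand-in with start_new_session=True so that os.kill(0, ...) stays inside the stand-in's own process group. Without that, the signal would also reach whatever launched it (here, the harness itself), which is the same side effect the mail mentions for the forked nrpe parent and interactive shells. run_patched_plugin is a hypothetical helper name.

```python
import signal
import subprocess
import sys
import textwrap

# Hypothetical stand-in plugin whose SIGALRM handler follows the patch:
# ignore SIGALRM in this process, then send it to the whole process group.
PLUGIN = textwrap.dedent("""
    import os, signal, subprocess, sys, time

    class PluginTimeout(Exception):
        pass

    child = subprocess.Popen(["sleep", "30"],
                             stdout=subprocess.DEVNULL,
                             stderr=subprocess.DEVNULL)

    def timeout_signal_handler(signum, frame):
        signal.signal(signal.SIGALRM, signal.SIG_IGN)
        os.kill(0, signal.SIGALRM)  # pid 0 = every process in our group
        raise PluginTimeout

    signal.signal(signal.SIGALRM, timeout_signal_handler)
    signal.alarm(1)
    try:
        time.sleep(30)  # stands in for the plugin waiting on yum
    except PluginTimeout:
        # The child got SIGALRM too; a negative code means "killed by signal".
        print(child.wait(), flush=True)
        sys.exit(3)  # UNKNOWN
""")


def run_patched_plugin():
    """Return (plugin exit code, whether its child was killed by SIGALRM)."""
    result = subprocess.run(
        [sys.executable, "-c", PLUGIN],
        capture_output=True, text=True, timeout=10,
        # Own session => own process group, so os.kill(0, ...) inside the
        # stand-in cannot reach this script or whatever is running it.
        start_new_session=True,
    )
    child_rc = int(result.stdout.strip())
    return result.returncode, child_rc == -signal.SIGALRM
```

run_patched_plugin() should return (3, True): the plugin still reports UNKNOWN, but its child is now terminated, because the default action for SIGALRM is to terminate the receiving process.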

@calestyo
Owner Author

First, to the issue itself:
I've tried that but cannot reproduce it.
For me (SL 6.6), check_yum runs through normally even while yum install is waiting for Y/N.

How exactly do you invoke check_yum? And as which user?

In issue #7 I mention an option that upstream implemented (--setopt=exit_on_lock=true), which could help us with locking issues. I haven't felt like revisiting it so far, and it has a problem: if we simply exit, we cannot report "OK"; on the other hand, we don't want non-OK statuses all the time just because yum exited because of a lock.
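A sketch of how the exit_on_lock idea might be wired up, and why it runs into the "OK" problem. The helper names and the WARNING rating for pending updates are hypothetical; the exit codes 0, 100 and 1 are what the yum man page documents for check-update. The actual subprocess call is left out.

```python
# Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def build_yum_command(exit_on_lock=True):
    """Hypothetical helper: the yum invocation a plugin might run."""
    cmd = ["yum"]
    if exit_on_lock:
        # Give up immediately if another yum holds the lock, instead of waiting.
        cmd.append("--setopt=exit_on_lock=true")
    cmd.append("check-update")
    return cmd


def status_from_check_update(returncode):
    """Hypothetical mapping from `yum check-update` exit codes to a status.

    yum documents 100 = updates available, 0 = none, 1 = error.
    """
    if returncode == 0:
        return OK
    if returncode == 100:
        return WARNING  # assumption: how pending updates might be rated
    # Any other code, including yum bailing out because the lock is held,
    # means we learned nothing about pending updates, so OK would be wrong.
    return UNKNOWN
```

A yum run that gave up because of the lock only shows up as an exit code other than 0 or 100, so the honest answer is UNKNOWN, which is exactly the non-OK status one would rather not see just because of a lock.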

@calestyo
Owner Author

(Oh, I just managed to reproduce your issue, but it only happens when I run check_yum as root.)

@calestyo
Owner Author

Then to your patch:

  1. I'd probably prefer to simply remove the timeout code from check_yum altogether. There are plenty of other ways to set a timeout: Icinga/Nagios already have their own timeouts, and there is the timeout(1) program.
    Users should IMHO simply use the standard GNU tools for that.

  2. I'm a bit reluctant to do this. As far as I understand, the process group could also include other programs, such as those that in turn invoke check_yum, and we shouldn't kill those. If anything, we should only kill our own children.
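If check_yum is to keep a timeout of its own, one way to kill only its own children is to start them in a process group of their own and signal just that group. A minimal sketch, assuming Python 3 on a POSIX system; run_with_group_timeout is a hypothetical helper, not existing check_yum code. start_new_session=True makes the child call setsid(), so it and everything it forks (yum included) share a fresh process group whose ID is the child's PID, while the process that invoked check_yum is left alone. GNU timeout(1) also times out the command's children by default; its man page notes that with --foreground they are not timed out.

```python
import os
import signal
import subprocess


def run_with_group_timeout(argv, timeout, grace=5):
    """Run argv in a process group of its own; on timeout, SIGTERM that group.

    Returns (returncode, stdout, timed_out). Only the child and whatever it
    forked are signalled, never the caller's own process group.
    """
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, text=True,
                            start_new_session=True)
    try:
        out, _ = proc.communicate(timeout=timeout)
        return proc.returncode, out, False
    except subprocess.TimeoutExpired:
        try:
            # The child leads its new group, so its pid is also the pgid.
            os.killpg(proc.pid, signal.SIGTERM)
        except ProcessLookupError:
            pass  # everything in the group already exited
        # Collect the remaining output. This raises TimeoutExpired again if a
        # group member ignores SIGTERM and keeps the pipe open past `grace`.
        out, _ = proc.communicate(timeout=grace)
        return proc.returncode, out, True
```

For example, run_with_group_timeout(["sh", "-c", "sleep 30 & echo started; wait"], timeout=1) returns after about a second with timed_out set and the shell killed by SIGTERM. The backgrounded sleep is terminated as well; otherwise it would hold the stdout pipe open and the final communicate() would not finish within grace.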

@calestyo
Owner Author

Last but not least: yes, I read that Google Code is shutting down when it became public. I had also started the migration, but it kept failing, so I opened a ticket with Google.
Apparently they have since redone it, and the issues were migrated as well.
So I have now moved all references to this site and marked the Google Code site as closed.
