Zombie process created on plugin stop #1203

iwankgb · 2016-09-14T13:43:05Z

We are facing issues with SIGKILL while working with one of our projects:

1. A collector plugin launches a long-running process that is needed to return metrics.
2. The plugin process is killed with SIGKILL.
3. The child process mentioned in step 1 becomes a zombie.

For now we are going to use systemd to launch snapd, as it will kill all the zombies. It would be great if the termination signal could be handled gracefully, so that the plugin could take care of its child processes, open sockets, and other resources on its own.

I am aware of the implications that @andrzej-k mentioned in #711 and I do not expect the collector interface to be extended before 1.0. Would there be a chance to send another signal, prior to SIGKILL, to indicate that the plugin is to be shut down in, let's say, 5 seconds? It could work as a temporary fix until a more robust version is developed.

lynxbat · 2016-09-14T15:32:28Z

@iwankgb Have you tested this against master? I know one of these kinds of bugs is solved.

jcooklin · 2016-09-14T15:40:28Z

#1092 👈 is the commit that likely addresses this issue

IRCody · 2016-09-14T17:05:42Z

I think #1092 addresses zombie plugin processes but not children of plugin processes.

iwankgb · 2016-09-15T08:26:51Z

@lynxbat, @jcooklin, @IRCody - I will have it tested by the end of the week.

iwankgb · 2016-09-15T15:57:31Z

@lynxbat - @IRCody seems to be right, plugin child process becomes a zombie on current master (4ac7b4d).

lynxbat · 2016-09-15T16:18:27Z

@iwankgb @IRCody - in this case, wouldn't the plugin author need to implement behavior to catch and kill?

IRCody · 2016-09-15T18:27:47Z

@lynxbat: I think that is what this issue is requesting, since right now the signal we are sending is not catchable.

@iwankgb: As a temporary fix, would it be possible to update the child processes to periodically check whether their parent pid is 1 (init) and exit if it is? A parent pid of 1 signifies that the original parent process has died. This has the advantage of working now without any changes to snap.

A more permanent fix may be something like what is outlined here. Basically setting the plugin process as a separate process group. Then whenever we want to kill the plugin, we instead kill the process group.

This seems like it would work, but I haven't tested it. I think it's also specific to Linux, so we'd probably want to think about a way to abstract it away (if that isn't already done by Go somewhere) so that we don't preclude supporting other methods for different OSes.

lynxbat · 2016-09-15T18:31:59Z

What about a facility in the plugin libs for spawning child processes where we can manage the shutdown from the plugin side? Telling the author to use the facility so we can help them shut them down should the framework stop talking to the plugin.

IRCody · 2016-09-15T19:19:42Z

@lynxbat: Do you mean as some sort of plugin-lib extension or as a change to the plugin interface?

I think going forward we should first change the initial signal to be a catchable signal and give the plugin some time to kill its own children before following up with SIGKILL, as was mentioned when the issue was opened.

As far as strategies for a plugin author to do this, one interesting option would be to set Pdeathsig to SIGKILL in the syscall.SysProcAttr struct when launching child processes, so that child processes are automatically cleaned up. It is Linux-specific, but that would be appropriate for a Linux-specific plugin.

iwankgb · 2016-09-16T08:09:17Z

@IRCody - as a temporary fix we decided to use systemd to launch snapd, and the zombie is killed after some time. We also experimented with process groups some time ago and realized that relying on them might not work in all cases (a process can still call setpgid() whenever it wants).

Another solution that would make sure that all created processes are killed is to launch each plugin in its own PID namespace. The plugin process would become the init process of the namespace, so when it is killed, all other processes in the namespace would be killed too. We use this solution in some of our projects and I'm happy to share the experience with you.

I am going to verify Pdeathsig behavior and will get back to you guys.

@lynxbat - in my opinion, as a plugin author, providing process spawning logic specific to Swan might not be the best option. I can see a lot of possible issues related to stdout/stderr piping, for instance (in our scenario, being able to read process output is pretty critical). I would prefer to receive a catchable signal that allows me to stop all my children gracefully.

iwankgb · 2016-09-16T09:25:59Z

@IRCody - using Pdeathsig does not seem to solve the issue that I am facing.

candysmurf · 2016-09-16T16:22:34Z

As we're moving to gRPC, it supports multiplexing and flow control. Should we allow plugins to unload themselves and notify Snap? I'm just trying to throw in a different approach, which would require some changes in Snap, if that's a direction we'd like to consider.

The "net/context" package allows a client to cancel pending RPC requests to servers.
The "net/trace" package allows tracing of RPCs and long-lived objects.

IRCody · 2016-09-16T16:57:40Z

> @IRCody - using Pdeathsig does not seem to solve the issue that I am facing.

@iwankgb: That stinks. The documentation around it seems like it'd be exactly what you want:

 Pdeathsig    Signal         // Signal that the process will get when its parent dies (Linux only)

Can you link me to the code for the plugin so I can easily reproduce the behavior and play around with potential fixes?

iwankgb · 2017-03-03T12:56:51Z

This is no longer relevant, I believe.

kindermoumoute added the type/bug label Nov 11, 2016

snapbot added the tracked label Nov 14, 2016

iwankgb closed this as completed Mar 3, 2017

iwankgb reopened this Mar 3, 2017

katarzyna-z closed this as completed Sep 6, 2017
