Zombie process created on plugin stop #1203
@iwankgb Have you tested this against
#1092 👈 is the commit that likely addresses this issue.
I think #1092 addresses zombie plugin processes but not children of plugin processes.
@lynxbat: I think that is what this issue is requesting, since right now the signal we send is not catchable. @iwankgb: As a temporary fix, would it be possible to update the child processes to periodically check whether their parent is init and kill themselves in that case? (Basically, check whether the parent PID is 1 and exit if it is, since that signifies the parent process has died.) This has the advantage of working now without any changes to snap. A more permanent fix may be something like what is outlined here: start the plugin process in a separate process group, and whenever we want to kill the plugin, kill the whole process group instead. This seems like it would work, but I haven't tested it. It is also Linux-specific, I think, so we'd probably want to abstract it away (if Go doesn't already do that somewhere) so that we don't preclude supporting other mechanisms on other OSes.
What about a facility in the plugin libs for spawning child processes, so that we can manage their shutdown from the plugin side? We would tell authors to use the facility so that we can help shut their children down should the framework stop talking to the plugin.
@lynxbat: Do you mean some sort of plugin-lib extension, or a change to the plugin interface? I think that going forward we should first change the initial signal to a catchable one and give the plugin some time to kill its own children before sending SIGKILL, as was mentioned when the issue was opened. As for strategies a plugin author could use, one interesting option is to set the Pdeathsig field of the syscall.SysProcAttr struct to SIGKILL when launching child processes, so that the children are cleaned up automatically. It is Linux-specific, but appropriate for a Linux-specific plugin.
@IRCody - as a temporary fix we decided to use systemd to launch snapd, and the zombie is killed after some time. We were also experimenting with process groups some time ago and realized that relying on them might not work in all cases (as you can still call Another solution that could be used to make sure that all the processes created are killed would be launching each plugin in its own PID namespace: the plugin process would become the init process of the namespace, so when it is killed, all other processes in the namespace would be killed too. We use this solution in some of our projects and I'm happy to share the experience with you. I am going to verify @lynxbat - in my opinion, as a plugin author, providing process-spawning logic specific to Swan might not be the best option. I can see a lot of possible issues related to stdout/stderr piping, for instance (in our scenario, being able to read process output is pretty critical). I would prefer to receive a catchable signal that allows me to stop all my children gracefully.
@IRCody - using
As we're moving to gRPC, which supports multiplexing and flow control, should we allow plugins to unload themselves but notify Snap? I'm just throwing in a different approach, which would require some changes in Snap. If that's the direction, we'd like to think it through. The "net/context" package allows a client to cancel pending RPC requests to servers.
@iwankgb: That stinks. The documentation around it seems like it'd be exactly what you want: "Pdeathsig Signal // Signal that the process will get when its parent dies (Linux only)". Can you link me to the code for the plugin so I can easily reproduce the behavior and play around with potential fixes?
This is no longer relevant, I believe.
We are facing issues with SIGKILL while working with one of our projects:
As of now we are going to use systemd to launch snapd, as it will kill all the zombies. It would be great if we could handle a termination signal gracefully, so that the plugin could take care of its child processes, open sockets and other resources on its own.
I am aware of the implications that @andrzej-k mentioned in #711, and I do not expect the collector interface to be extended before 1.0. Would there be a chance to send another signal before SIGKILL, to indicate that the plugin is about to be shut down in, let's say, 5 seconds? It could work as a temporary fix until a more robust solution is developed.