Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Podman may kill another processes instead of exec sessions #25104

Open
AlexIvchenko opened this issue Jan 23, 2025 · 3 comments
Open

Podman may kill another processes instead of exec sessions #25104

AlexIvchenko opened this issue Jan 23, 2025 · 3 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@AlexIvchenko
Copy link

Issue Description

When container is being removed, podman iterates through its exec sessions and checks whether exec session pid is still alive.
The problem is that the pid can be reused for other processes, so that it may not belong to exec session. In this scenario podman may kill another process

I have seen a situation when podman-rm kills itself because pid was assigned to podman-rm itself :)
The result is corrupted podman DB which doesn't allow to create container with same name, but also disallow to remove it because "container has already been removed". Probably, fresher versions of podman can handle this and container will be possible to remove, idk

But still, podman may kill processes it's not supposed to kill - this is the main issue

We have forked version of podman, but looking through the code of latest one it seems to me the issue still exits.
Such problems caused by pid reuse are quite uncommon, but they happen in large clusters

The first idea I came up to fix this is to store process start timestamp as well as pid. Combination of (pid, start) is good enough to determine whether pid belongs to created exec session or not

P.s. We use forked (long time ago) version, but it seems to me, problem is relevant for latest podman as well

Thanks in advance

Steps to reproduce the issue

Steps to reproduce the issue

  1. Create container
  2. Exec into it
  3. Kill exec session
  4. Wait exec session pid to be reused by another process
  5. Remove the container

Describe the results you received

if pid exists it will be killed by podman

Describe the results you expected

pid is not killed because it belongs to another process

podman info output

version: forked from 1.6.2

Podman in a container

No

Privileged Or Rootless

Privileged

Upstream Latest Release

No

Additional environment details

Additional environment details

Additional information

No response

@AlexIvchenko AlexIvchenko added the kind/bug Categorizes issue or PR as related to a bug. label Jan 23, 2025
@baude
Copy link
Member

baude commented Jan 23, 2025

Are you able to observe this in podman 5.x or only 1.x? If the latter only, we would not be inclined to fix this.

cc @mheon

@Luap99
Copy link
Member

Luap99 commented Jan 23, 2025

Pid reuse is indeed still a large problem across our code base that was never paid much (if any) attention to.
On new kernel we should use pid fds instead but the problem is that pidfds doesn't work with the non daemon architecture of podman as the fd needs a process to stay open. We do not have that and can just record the pid in the DB.

In recent kernels pidfds got inode numbers that we can get via stat so it theory we could get the pid and also store the inode number in the db, then when we use we open a pidfd for the pid number and then check if the indoe is still the same.
AFAIK systemd moved to such a model.

Using the start time itself is still racy as the pid might have exited between our check and when we kill it. We must use pidfs to avoid races.

@brauner
Copy link

brauner commented Jan 23, 2025

Pid reuse is indeed still a large problem across our code base that was never paid much (if any) attention to. On new kernel we should use pid fds instead but the problem is that pidfds doesn't work with the non daemon architecture of podman as the fd needs a process to stay open. We do not have that and can just record the pid in the DB.

In recent kernels pidfds got inode numbers that we can get via stat so it theory we could get the pid and also store the inode number in the db, then when we use we open a pidfd for the pid number and then check if the indoe is still the same. AFAIK systemd moved to such a model.

Using the start time itself is still racy as the pid might have exited between our check and when we kill it. We must use pidfs to avoid races.

Starting with v6.14 you can use pidfd file handles as well (name_to_handle_at() -> store the pidfd file handle -> open_by_handle_at(/* works unprivileged */) // reopen pidfd file handle to get pidfd if a) process still alive and b) process is within your pidns hierarchy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug.
Projects
None yet
Development

No branches or pull requests

4 participants