Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AIO-interface: add Unhealthy container state #5307

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

docjyJ
Copy link
Collaborator

@docjyJ docjyJ commented Sep 21, 2024

@szaimen szaimen added 3. to review Waiting for reviews enhancement New feature or request labels Sep 25, 2024
@szaimen szaimen added this to the next milestone Sep 25, 2024
@szaimen szaimen changed the title Ench/noid/heathcheck AIO: add IContainerState Sep 25, 2024
@docjyJ docjyJ self-assigned this Sep 28, 2024
@docjyJ docjyJ requested a review from szaimen September 28, 2024 18:21
Copy link
Member

@st3iny st3iny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Jean-Yves, Simon asked me to have a look at your changes. I left some feedback regarding the code itself.

I really like you refactoring the container state logic from an interface to an enum. However, I think that both concerns could have been split into separate PRs. One PR to refactor the interface and another one to implement the new healthy states. But the code is already there so why not.

Looks good otherwise but did not test.

php/src/Container/Container.php Outdated Show resolved Hide resolved
php/src/Controller/DockerController.php Outdated Show resolved Hide resolved
php/src/Docker/DockerActionManager.php Outdated Show resolved Hide resolved
php/src/Docker/DockerActionManager.php Outdated Show resolved Hide resolved
php/src/Docker/DockerActionManager.php Outdated Show resolved Hide resolved
php/src/Docker/DockerActionManager.php Outdated Show resolved Hide resolved
php/README.md Outdated Show resolved Hide resolved
php/src/Container/IContainerState.php Outdated Show resolved Hide resolved
@docjyJ
Copy link
Collaborator Author

docjyJ commented Oct 1, 2024

Hi,

Thanks for the answer, as soon as I have time I'll look into it.

I will try to separate and make several PR.

Take care,

@docjyJ docjyJ added 2. developing Work in progress blocked and removed 3. to review Waiting for reviews labels Oct 2, 2024
@docjyJ docjyJ marked this pull request as draft October 2, 2024 13:11
@docjyJ
Copy link
Collaborator Author

docjyJ commented Oct 2, 2024

I keep this PR open for the unhealthy state.
See #5372 for enum

@docjyJ docjyJ force-pushed the ench/noid/heathcheck branch from 1f3f231 to b9ca83a Compare October 7, 2024 07:45
@docjyJ docjyJ changed the title AIO: add IContainerState AIO: add Unhealthy container state Oct 7, 2024
@szaimen
Copy link
Collaborator

szaimen commented Oct 8, 2024

Conflicts :/

@docjyJ
Copy link
Collaborator Author

docjyJ commented Oct 8, 2024

Don't worry, i'll handle it.

@docjyJ docjyJ force-pushed the ench/noid/heathcheck branch from b9ca83a to 6a3c340 Compare October 8, 2024 09:23
@docjyJ
Copy link
Collaborator Author

docjyJ commented Oct 8, 2024

Solved and up to date and ready (to be tested anyway...)

@docjyJ docjyJ marked this pull request as ready for review October 8, 2024 09:24
@docjyJ docjyJ added 3. to review Waiting for reviews and removed 2. developing Work in progress blocked labels Oct 8, 2024
@szaimen szaimen changed the title AIO: add Unhealthy container state AIO-interface: add Unhealthy container state Oct 8, 2024
@docjyJ
Copy link
Collaborator Author

docjyJ commented Oct 8, 2024

I have podman on my machine and I can't launch the container... I can't test it and I don't have time to debug...

@szaimen
Copy link
Collaborator

szaimen commented Oct 8, 2024

I have podman on my machine and I can't launch the container...

Why may I ask? Do you run into an issue here?

Copy link
Member

@st3iny st3iny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! Looks good to me now!

@docjyJ
Copy link
Collaborator Author

docjyJ commented Oct 15, 2024

Health checks may need to be adjusted, probably reduce the interval during startup to speed up startup of all containers.

start period provides initialization time for containers that need time to bootstrap. Probe failure during that period will not be counted towards the maximum number of retries. However, if a health check succeeds during the start period, the container is considered started and all consecutive failures will be counted towards the maximum number of retries.

start interval is the time between health checks during the start period. This option requires Docker Engine version 25.0 or later.

See: https://docs.docker.com/reference/dockerfile/#healthcheck

@docjyJ
Copy link
Collaborator Author

docjyJ commented Oct 15, 2024

This should allow for better dependency management.
Can be used for helm chart, I guess.

@szaimen
Copy link
Collaborator

szaimen commented Oct 15, 2024

Thanks for the idea! However our health checks are currently built in a way that they never fail after a specific time. See for example https://github.com/nextcloud/all-in-one/blob/main/Containers/apache/healthcheck.sh. for the rest the defaults are good enough imho

@docjyJ
Copy link
Collaborator Author

docjyJ commented Oct 15, 2024

Thanks for the idea! However our health checks are currently built in a way that they never fail after a specific time. See for example https://github.com/nextcloud/all-in-one/blob/main/Containers/apache/healthcheck.sh. for the rest the defaults are good enough imho

The problem with doing this is that docker considers the container ready... This should immediately take the container out of the starting state.

Copy link
Collaborator

@szaimen szaimen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking, see above

@szaimen
Copy link
Collaborator

szaimen commented Oct 15, 2024

This is the only reliable way to check if the container is really up and running.

@docjyJ
Copy link
Collaborator Author

docjyJ commented Oct 15, 2024

A fix would be to fail the container if nextcloud:9000 is not reachable and add a startup period (e.g. 10 minutes)

#!/bin/bash

- nc -z "$NEXTCLOUD_HOST" 9000 || exit 0
+ nc -z "$NEXTCLOUD_HOST" 9000 || exit 0
nc -z 127.0.0.1 8000 || exit 1
nc -z 127.0.0.1 "$APACHE_PORT" || exit 1
if ! nc -z "$NC_DOMAIN" 443; then
    echo "Could not reach $NC_DOMAIN on port 443."
    exit 1
fi
- HEALTHCHECK CMD /healthcheck.sh
+ HEALTHCHECK --start-period=10m CMD /healthcheck.sh

@szaimen
Copy link
Collaborator

szaimen commented Oct 15, 2024

A fix would be to fail the container if nextcloud:9000 is not reachable and add a startup period (e.g. 10 minutes)

This is the problem: we cannot spwcify a time here as is depending on the overall setup like installed apps and features, amount of users, given hardware and else especially during upgrades

@docjyJ
Copy link
Collaborator Author

docjyJ commented Oct 15, 2024

This is the problem: we cannot spwcify a time here as is depending on the overall setup like installed apps and features, amount of users, given hardware and else especially during upgrades

Yes, I see...

Would it be better to manage dependencies like docker compose?

The solution for detecting the ready state of a service is to use the condition attribute with one of the following options:

  • service_started
  • service_healthy. This specifies that a dependency is expected to be “healthy”, which is defined with healthcheck, before starting a dependent service.

https://docs.docker.com/compose/how-tos/startup-order/

@szaimen
Copy link
Collaborator

szaimen commented Oct 15, 2024

I fear this is going to have the same problem afaics, no?

@docjyJ docjyJ requested a review from szaimen October 18, 2024 21:40
@szaimen
Copy link
Collaborator

szaimen commented Oct 21, 2024

@docjyJ LGTM now 😊

Can you please rebase the PR and squash the commits?
Afterwards we can merge this :)

Signed-off-by: Jean-Yves <[email protected]>
@docjyJ docjyJ force-pushed the ench/noid/heathcheck branch from 9482e43 to 4798489 Compare October 21, 2024 09:19
@docjyJ
Copy link
Collaborator Author

docjyJ commented Oct 21, 2024

Done

Comment on lines +17 to +35
public function isStarting(): bool {
return $this == self::Starting;
}

public function isRestarting(): bool {
return $this == self::Restarting;
}

public function isHealthy(): bool {
return $this == self::Healthy;
}

public function isUnhealthy(): bool {
return $this == self::Unhealthy;
}

public function isRunning(): bool {
return $this->isHealthy() || $this->isUnhealthy() || $this->isStarting() || $this->isRestarting();
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Running state corresponds to the old GetRunningContainerState state.
Maybe there is another clearer state. Healthy means that the container is running without any problem detected by the AIO.

@@ -28,7 +30,7 @@ private function PerformRecursiveContainerStart(string $id, bool $pullImage = tr

// Don't start if container is already running
// This is expected to happen if a container is defined in depends_on of multiple containers
if ($container->GetRunningState() === ContainerState::Running) {
if ($container->GetContainerState()->isRunning()) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

container->GetRunningState does not include Starting and Restarting

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid it being hard to understand I went through the isSomething function

Comment on lines -52 to -56
if ($responseBody['State']['Running'] === true) {
return ContainerState::Running;
} else {
return ContainerState::Stopped;
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is true regardless of the Healthy or Starting or Running state.

Copy link
Collaborator

@szaimen szaimen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@docjyJ I discussed the latest changes with @st3iny and we agree that the latest changes do not make much sense or at least are not easy enough to merge them. I am very sorry for that.

I would like to ask you if you could restore the changes from b20e2a8 by running git checkout <hash> and then push the old changes to a new branch via so: git checkout -b ench/noid/heathcheck-restored. From that commit hash, I think only such a check like https://github.com/nextcloud/all-in-one/pull/5307/files#diff-502ad3fbc5a2714763795f78cf314d4a76ada0cb3746530f2fb86fa471dd897bR91-R105 is missing. (best in a new commit)

Can you please do the above? I would do it myself but I unfortunately don't have your changes locally available anymore

@szaimen szaimen added 2. developing Work in progress and removed 3. to review Waiting for reviews labels Oct 27, 2024
@szaimen szaimen modified the milestones: v9.8.0, next Oct 30, 2024
@szaimen szaimen marked this pull request as draft November 4, 2024 09:24
@szaimen szaimen removed this from the next milestone Nov 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2. developing Work in progress enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants