Activator health checks #15575

Open
thorweijie opened this issue Oct 16, 2024 · 1 comment
Labels
kind/question Further information is requested

Comments

@thorweijie

Ask your question here:

We have a Kubernetes cluster with many inference services. After all the inference services were restarted, we noticed that the istio-proxy containers in the activator pods had high CPU usage and that health checks were failing with response code 0, so we set the target burst capacity to 0 to bypass the activator and fix the issue. We noticed that, despite being bypassed, the activator pods kept attempting health checks, which kept failing with response code 0, until they were restarted. We would like to know whether the activator's health checks are cached, and whether their frequency can be configured.

thorweijie added the kind/question label Oct 16, 2024
@skonto
Contributor

skonto commented Oct 23, 2024

Hi @thorweijie!

> After all the inference services were restarted, we noticed that the istio-proxy containers in the activator pods had high CPU usage and that health checks were failing with response code 0

Which health checks were failing, the activator ones?

> We noticed that, despite being bypassed, the activator pods kept attempting health checks, which kept failing with response code 0, until they were restarted. We would like to know whether the activator's health checks are cached, and whether their frequency can be configured.

The probing mechanism starts when endpoints are created or updated, and it probes every 200ms by default.
If probing finishes successfully, you should see this message, assuming you have enabled activator debug logging:

{"severity":"DEBUG","timestamp":"2024-10-23T14:20:52.082125337Z","logger":"activator","caller":"net/revision_backends.go:348","message":"Done probing, got 1 healthy pods","commit":"0abee66","knative.dev/controller":"activator","knative.dev/pod":"activator-8675c9944c-mdfj9","knative.dev/key":"default/autoscale-go-00001"}
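If you want to check a large volume of activator logs for these entries, a small filter can help. The following is only a sketch: it assumes the message format matches the single line shown above (it may differ between versions), and `logEntry` and `healthyPods` are hypothetical names, not part of Knative.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
	"strconv"
)

// logEntry mirrors only the fields we need from the debug line above.
type logEntry struct {
	Severity string `json:"severity"`
	Message  string `json:"message"`
	Key      string `json:"knative.dev/key"`
}

// Assumption: the message text is exactly as in the sample log line.
var doneProbing = regexp.MustCompile(`^Done probing, got (\d+) healthy pods$`)

// healthyPods returns the revision key and healthy pod count when line is a
// "Done probing" entry; ok is false for any other or malformed line.
func healthyPods(line string) (key string, count int, ok bool) {
	var e logEntry
	if err := json.Unmarshal([]byte(line), &e); err != nil {
		return "", 0, false
	}
	m := doneProbing.FindStringSubmatch(e.Message)
	if m == nil {
		return "", 0, false
	}
	n, err := strconv.Atoi(m[1])
	if err != nil {
		return "", 0, false
	}
	return e.Key, n, true
}

func main() {
	line := `{"severity":"DEBUG","logger":"activator","message":"Done probing, got 1 healthy pods","knative.dev/key":"default/autoscale-go-00001"}`
	if key, n, ok := healthyPods(line); ok {
		fmt.Printf("%s: %d healthy pods\n", key, n)
	}
}
```

Revisions that never produce such an entry after an endpoint update would be the ones where probing has not completed.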

Once all pods are ready (and stay that way), probing should stop. The idea is that the activator is on standby to handle traffic, so each activator instance needs to know which targets are ready in order to route traffic to them if needed.
AFAIK there is no caching. Maybe @ReToCode or @dprotaso have more to say here.
