[enhancement] Disregard non-running pods #128

Open
rakvay opened this issue Oct 30, 2024 · 1 comment
@rakvay

rakvay commented Oct 30, 2024

Describe the enhancement you'd like

I have been using the Kubernetes dashboards provided in this repository, and I appreciate the work that has gone into creating them. However, I’ve noticed that some of the metrics currently include non-running pods, which can lead to inaccurate resource usage and performance insights.
Specifically, I would like the PromQL queries for resource requests and limits (kube_pod_container_resource_requests and kube_pod_container_resource_limits) to explicitly filter out non-running pods, for example by joining against kube_pod_status_phase with phase="Running".

Current expressions:

sum(kube_pod_container_resource_requests{namespace=~"$namespace", resource="cpu", cluster="$cluster"})

sum(kube_pod_container_resource_limits{namespace=~"$namespace", resource="memory", cluster="$cluster"})

Proposed modifications:

sum(kube_pod_container_resource_requests{namespace=~"$namespace", resource="cpu", cluster="$cluster"} * on(namespace, pod) group_left() (sum(kube_pod_status_phase{phase="Running", cluster="$cluster"}) by (pod, namespace) == 1))

sum(kube_pod_container_resource_limits{namespace=~"$namespace", resource="memory", cluster="$cluster"} * on(namespace, pod) group_left() (sum(kube_pod_status_phase{phase="Running", cluster="$cluster"}) by (pod, namespace) == 1))

Similar modifications should be applied to all relevant metrics to accurately reflect the state of running pods.
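
To illustrate what the proposed join changes, here is a minimal Python sketch that mimics the matching logic on a handful of made-up samples. The namespace, pod, and container names and the millicore values are hypothetical, and the functions only approximate PromQL's `on(namespace, pod) group_left()` behaviour for this simple case; this is not how Prometheus evaluates the query internally.

```python
from collections import defaultdict

# Hypothetical samples standing in for kube_pod_container_resource_requests{resource="cpu"}.
# Each entry: (namespace, pod, container) -> requested CPU in millicores.
requests = {
    ("shop", "api-1", "app"): 500,
    ("shop", "api-1", "sidecar"): 100,
    ("shop", "job-7", "worker"): 2000,  # pod has already Succeeded
    ("shop", "api-2", "app"): 500,      # pod is still Pending
}

# Hypothetical samples standing in for kube_pod_status_phase.
# Each entry: (namespace, pod, phase) -> 1 if the pod is in that phase, else 0.
phase = {
    ("shop", "api-1", "Running"): 1,
    ("shop", "api-1", "Pending"): 0,
    ("shop", "job-7", "Running"): 0,
    ("shop", "job-7", "Succeeded"): 1,
    ("shop", "api-2", "Running"): 0,
    ("shop", "api-2", "Pending"): 1,
}


def running_pods(phase_samples):
    """Mimic: sum(kube_pod_status_phase{phase="Running"}) by (pod, namespace) == 1."""
    totals = defaultdict(int)
    for (namespace, pod, pod_phase), value in phase_samples.items():
        if pod_phase == "Running":
            totals[(namespace, pod)] += value
    return {key for key, total in totals.items() if total == 1}


def unfiltered_sum(request_samples):
    """Mimic the current expression: sum over every container series."""
    return sum(request_samples.values())


def running_only_sum(request_samples, phase_samples):
    """Mimic the proposed join: keep only containers whose pod is Running."""
    keep = running_pods(phase_samples)
    return sum(
        value  # multiplying by the right-hand side (always 1) leaves the value unchanged
        for (namespace, pod, _container), value in request_samples.items()
        if (namespace, pod) in keep
    )


print(unfiltered_sum(requests))           # counts Pending and Succeeded pods too
print(running_only_sum(requests, phase))  # counts only the Running pod
```

With this sample data the unfiltered total includes the completed job and the pending pod, while the filtered total covers only the two containers of the running pod.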

Additional context

No response

@rakvay rakvay added the enhancement New feature or request label Oct 30, 2024
@martin-ilavsky
Contributor

Hi, we have a similar issue with the memory/CPU metrics. When a pod is restarted, its metrics persist for a little while, which creates peaks in the CPU/memory panels because the query sums the two series together. We added id to the grouping in the expression to separate them.

Current expression:

"sum(rate(container_cpu_usage_seconds_total{namespace=\"$namespace\", pod=~\"$pod\", image!=\"\", container!=\"\", cluster=\"$cluster\"}[$__rate_interval])) by (container)"

Changed expression:

"sum(rate(container_cpu_usage_seconds_total{namespace=\"$namespace\", pod=~\"$pod\", image!=\"\", container!=\"\", cluster=\"$cluster\"}[$__rate_interval])) by (container,id)"

The same change applies to network and other metrics, just as for CPU.
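
The effect of adding id to the grouping can be sketched in a few lines of Python. The samples below are invented for illustration: the id values are placeholder strings rather than real cgroup paths, and the rates are given in millicores. The helper `sum_by` is a hypothetical stand-in for PromQL's `sum(...) by (...)` at a single evaluation instant.

```python
from collections import defaultdict

# Hypothetical per-series CPU rates (millicores) at one instant shortly after
# a restart. The old container's series is still being returned next to the
# series of its replacement; only the id label differs between the two.
rates = [
    ({"container": "app", "id": "old-container"}, 400),
    ({"container": "app", "id": "new-container"}, 450),
    ({"container": "sidecar", "id": "sidecar-container"}, 50),
]


def sum_by(samples, labels):
    """Sum sample values grouped by the given label names."""
    totals = defaultdict(int)
    for series_labels, value in samples:
        key = tuple(series_labels[name] for name in labels)
        totals[key] += value
    return dict(totals)


# Grouping by container merges the old and new series into one inflated value.
print(sum_by(rates, ["container"]))

# Grouping by container and id keeps the two series apart.
print(sum_by(rates, ["container", "id"]))
```

Grouping by container alone reports the sum of both "app" series for the overlap period, which is the peak described above; grouping by container and id returns one value per series instead, at the cost of showing two lines for the same container during the overlap.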
