chore(helm-chart): update helm release kube-prometheus-stack to v54.2.0 #3305
Merged
Signed-off-by: budimanjojo-bot <111944664+budimanjojo-bot[bot]@users.noreply.github.com>
budimanjojo-bot (bot) added labels on Nov 22, 2023:
renovate/helm: Pull request to a Renovate helm update
renovatebot: Pull request created by Renovate
type/minor: Pull request of type minor version bump
--- cluster/apps/monitoring-system/kube-prometheus-stack/base Kustomization: flux-system/monitoring-system-kube-prometheus-stack HelmRelease: monitoring-system/kube-prometheus-stack
+++ cluster/apps/monitoring-system/kube-prometheus-stack/base Kustomization: flux-system/monitoring-system-kube-prometheus-stack HelmRelease: monitoring-system/kube-prometheus-stack
@@ -10,13 +10,13 @@
chart: kube-prometheus-stack
interval: 15m
sourceRef:
kind: HelmRepository
name: prometheus-community-charts
namespace: flux-system
- version: 54.1.0
+ version: 54.2.0
install:
crds: CreateReplace
createNamespace: true
remediation:
retries: 5
interval: 15m
--- cluster/base HelmRelease: monitoring-system/kube-prometheus-stack PrometheusRule: monitoring-system/kube-prometheus-stack-node.rules
+++ cluster/base HelmRelease: monitoring-system/kube-prometheus-stack PrometheusRule: monitoring-system/kube-prometheus-stack-node.rules
@@ -13,22 +13,22 @@
heritage: Helm
spec:
groups:
- name: node.rules
rules:
- expr: |-
- topk by(cluster, namespace, pod) (1,
+ topk by (cluster, namespace, pod) (1,
max by (cluster, node, namespace, pod) (
label_replace(kube_pod_info{job="kube-state-metrics",node!=""}, "pod", "$1", "pod", "(.*)")
))
record: 'node_namespace_pod:kube_pod_info:'
- expr: |-
count by (cluster, node) (
node_cpu_seconds_total{mode="idle",job="node-exporter"}
* on (namespace, pod) group_left(node)
- topk by(namespace, pod) (1, node_namespace_pod:kube_pod_info:)
+ topk by (namespace, pod) (1, node_namespace_pod:kube_pod_info:)
)
record: node:node_num_cpu:sum
- expr: |-
sum(
node_memory_MemAvailable_bytes{job="node-exporter"} or
(
--- cluster/base HelmRelease: monitoring-system/kube-prometheus-stack PrometheusRule: monitoring-system/kube-prometheus-stack-kubernetes-storage
+++ cluster/base HelmRelease: monitoring-system/kube-prometheus-stack PrometheusRule: monitoring-system/kube-prometheus-stack-kubernetes-storage
@@ -27,15 +27,15 @@
kubelet_volume_stats_available_bytes{job="kubelet", namespace=~".*", metrics_path="/metrics"}
/
kubelet_volume_stats_capacity_bytes{job="kubelet", namespace=~".*", metrics_path="/metrics"}
) < 0.03
and
kubelet_volume_stats_used_bytes{job="kubelet", namespace=~".*", metrics_path="/metrics"} > 0
- unless on(namespace, persistentvolumeclaim)
+ unless on (namespace, persistentvolumeclaim)
kube_persistentvolumeclaim_access_mode{ access_mode="ReadOnlyMany"} == 1
- unless on(namespace, persistentvolumeclaim)
+ unless on (namespace, persistentvolumeclaim)
kube_persistentvolumeclaim_labels{label_excluded_from_alerts="true"} == 1
for: 1m
labels:
severity: critical
- alert: KubePersistentVolumeFillingUp
annotations:
@@ -52,15 +52,15 @@
kubelet_volume_stats_capacity_bytes{job="kubelet", namespace=~".*", metrics_path="/metrics"}
) < 0.15
and
kubelet_volume_stats_used_bytes{job="kubelet", namespace=~".*", metrics_path="/metrics"} > 0
and
predict_linear(kubelet_volume_stats_available_bytes{job="kubelet", namespace=~".*", metrics_path="/metrics"}[6h], 4 * 24 * 3600) < 0
- unless on(namespace, persistentvolumeclaim)
+ unless on (namespace, persistentvolumeclaim)
kube_persistentvolumeclaim_access_mode{ access_mode="ReadOnlyMany"} == 1
- unless on(namespace, persistentvolumeclaim)
+ unless on (namespace, persistentvolumeclaim)
kube_persistentvolumeclaim_labels{label_excluded_from_alerts="true"} == 1
for: 1h
labels:
severity: warning
- alert: KubePersistentVolumeInodesFillingUp
annotations:
@@ -74,15 +74,15 @@
kubelet_volume_stats_inodes_free{job="kubelet", namespace=~".*", metrics_path="/metrics"}
/
kubelet_volume_stats_inodes{job="kubelet", namespace=~".*", metrics_path="/metrics"}
) < 0.03
and
kubelet_volume_stats_inodes_used{job="kubelet", namespace=~".*", metrics_path="/metrics"} > 0
- unless on(namespace, persistentvolumeclaim)
+ unless on (namespace, persistentvolumeclaim)
kube_persistentvolumeclaim_access_mode{ access_mode="ReadOnlyMany"} == 1
- unless on(namespace, persistentvolumeclaim)
+ unless on (namespace, persistentvolumeclaim)
kube_persistentvolumeclaim_labels{label_excluded_from_alerts="true"} == 1
for: 1m
labels:
severity: critical
- alert: KubePersistentVolumeInodesFillingUp
annotations:
@@ -99,15 +99,15 @@
kubelet_volume_stats_inodes{job="kubelet", namespace=~".*", metrics_path="/metrics"}
) < 0.15
and
kubelet_volume_stats_inodes_used{job="kubelet", namespace=~".*", metrics_path="/metrics"} > 0
and
predict_linear(kubelet_volume_stats_inodes_free{job="kubelet", namespace=~".*", metrics_path="/metrics"}[6h], 4 * 24 * 3600) < 0
- unless on(namespace, persistentvolumeclaim)
+ unless on (namespace, persistentvolumeclaim)
kube_persistentvolumeclaim_access_mode{ access_mode="ReadOnlyMany"} == 1
- unless on(namespace, persistentvolumeclaim)
+ unless on (namespace, persistentvolumeclaim)
kube_persistentvolumeclaim_labels{label_excluded_from_alerts="true"} == 1
for: 1h
labels:
severity: warning
- alert: KubePersistentVolumeErrors
annotations:
--- cluster/base HelmRelease: monitoring-system/kube-prometheus-stack PrometheusRule: monitoring-system/kube-prometheus-stack-kubernetes-system-kubelet
+++ cluster/base HelmRelease: monitoring-system/kube-prometheus-stack PrometheusRule: monitoring-system/kube-prometheus-stack-kubernetes-system-kubelet
@@ -41,17 +41,17 @@
annotations:
description: Kubelet '{{ $labels.node }}' is running at {{ $value | humanizePercentage
}} of its Pod capacity.
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubelettoomanypods
summary: Kubelet is running at capacity.
expr: |-
- count by(cluster, node) (
- (kube_pod_status_phase{job="kube-state-metrics",phase="Running"} == 1) * on(instance,pod,namespace,cluster) group_left(node) topk by(instance,pod,namespace,cluster) (1, kube_pod_info{job="kube-state-metrics"})
+ count by (cluster, node) (
+ (kube_pod_status_phase{job="kube-state-metrics",phase="Running"} == 1) * on (instance,pod,namespace,cluster) group_left(node) topk by (instance,pod,namespace,cluster) (1, kube_pod_info{job="kube-state-metrics"})
)
/
- max by(cluster, node) (
+ max by (cluster, node) (
kube_node_status_capacity{job="kube-state-metrics",resource="pods"} != 1
) > 0.95
for: 15m
labels:
severity: info
- alert: KubeNodeReadinessFlapping
@@ -80,14 +80,14 @@
annotations:
description: Kubelet Pod startup 99th percentile latency is {{ $value }} seconds
on node {{ $labels.node }}.
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubeletpodstartuplatencyhigh
summary: Kubelet Pod startup latency is too high.
expr: histogram_quantile(0.99, sum(rate(kubelet_pod_worker_duration_seconds_bucket{job="kubelet",
- metrics_path="/metrics"}[5m])) by (cluster, instance, le)) * on(cluster, instance)
- group_left(node) kubelet_node_name{job="kubelet", metrics_path="/metrics"}
+ metrics_path="/metrics"}[5m])) by (cluster, instance, le)) * on (cluster,
+ instance) group_left(node) kubelet_node_name{job="kubelet", metrics_path="/metrics"}
> 60
for: 15m
labels:
severity: warning
- alert: KubeletClientCertificateExpiration
annotations:
--- cluster/base HelmRelease: monitoring-system/kube-prometheus-stack PrometheusRule: monitoring-system/kube-prometheus-stack-k8s.rules.container-memory-swap
+++ cluster/base HelmRelease: monitoring-system/kube-prometheus-stack PrometheusRule: monitoring-system/kube-prometheus-stack-k8s.rules.container-memory-swap
@@ -0,0 +1,24 @@
+---
+apiVersion: monitoring.coreos.com/v1
+kind: PrometheusRule
+metadata:
+ name: kube-prometheus-stack-k8s.rules.container-memory-swap
+ namespace: monitoring-system
+ labels:
+ app: kube-prometheus-stack
+ app.kubernetes.io/managed-by: Helm
+ app.kubernetes.io/instance: kube-prometheus-stack
+ app.kubernetes.io/part-of: kube-prometheus-stack
+ release: kube-prometheus-stack
+ heritage: Helm
+spec:
+ groups:
+ - name: k8s.rules.container_memory_swap
+ rules:
+ - expr: |-
+ container_memory_swap{job="kubelet", metrics_path="/metrics/cadvisor", image!=""}
+ * on (cluster, namespace, pod) group_left(node) topk by (cluster, namespace, pod) (1,
+ max by (cluster, namespace, pod, node) (kube_pod_info{node!=""})
+ )
+ record: node_namespace_pod_container:container_memory_swap
+
--- cluster/base HelmRelease: monitoring-system/kube-prometheus-stack PrometheusRule: monitoring-system/kube-prometheus-stack-k8s.rules.container-resource
+++ cluster/base HelmRelease: monitoring-system/kube-prometheus-stack PrometheusRule: monitoring-system/kube-prometheus-stack-k8s.rules.container-resource
@@ -0,0 +1,86 @@
+---
+apiVersion: monitoring.coreos.com/v1
+kind: PrometheusRule
+metadata:
+ name: kube-prometheus-stack-k8s.rules.container-resource
+ namespace: monitoring-system
+ labels:
+ app: kube-prometheus-stack
+ app.kubernetes.io/managed-by: Helm
+ app.kubernetes.io/instance: kube-prometheus-stack
+ app.kubernetes.io/part-of: kube-prometheus-stack
+ release: kube-prometheus-stack
+ heritage: Helm
+spec:
+ groups:
+ - name: k8s.rules.container_resource
+ rules:
+ - expr: |-
+ kube_pod_container_resource_requests{resource="memory",job="kube-state-metrics"} * on (namespace, pod, cluster)
+ group_left() max by (namespace, pod, cluster) (
+ (kube_pod_status_phase{phase=~"Pending|Running"} == 1)
+ )
+ record: cluster:namespace:pod_memory:active:kube_pod_container_resource_requests
+ - expr: |-
+ sum by (namespace, cluster) (
+ sum by (namespace, pod, cluster) (
+ max by (namespace, pod, container, cluster) (
+ kube_pod_container_resource_requests{resource="memory",job="kube-state-metrics"}
+ ) * on (namespace, pod, cluster) group_left() max by (namespace, pod, cluster) (
+ kube_pod_status_phase{phase=~"Pending|Running"} == 1
+ )
+ )
+ )
+ record: namespace_memory:kube_pod_container_resource_requests:sum
+ - expr: |-
+ kube_pod_container_resource_requests{resource="cpu",job="kube-state-metrics"} * on (namespace, pod, cluster)
+ group_left() max by (namespace, pod, cluster) (
+ (kube_pod_status_phase{phase=~"Pending|Running"} == 1)
+ )
+ record: cluster:namespace:pod_cpu:active:kube_pod_container_resource_requests
+ - expr: |-
+ sum by (namespace, cluster) (
+ sum by (namespace, pod, cluster) (
+ max by (namespace, pod, container, cluster) (
+ kube_pod_container_resource_requests{resource="cpu",job="kube-state-metrics"}
+ ) * on (namespace, pod, cluster) group_left() max by (namespace, pod, cluster) (
+ kube_pod_status_phase{phase=~"Pending|Running"} == 1
+ )
+ )
+ )
+ record: namespace_cpu:kube_pod_container_resource_requests:sum
+ - expr: |-
+ kube_pod_container_resource_limits{resource="memory",job="kube-state-metrics"} * on (namespace, pod, cluster)
+ group_left() max by (namespace, pod, cluster) (
+ (kube_pod_status_phase{phase=~"Pending|Running"} == 1)
+ )
+ record: cluster:namespace:pod_memory:active:kube_pod_container_resource_limits
+ - expr: |-
+ sum by (namespace, cluster) (
+ sum by (namespace, pod, cluster) (
+ max by (namespace, pod, container, cluster) (
+ kube_pod_container_resource_limits{resource="memory",job="kube-state-metrics"}
+ ) * on (namespace, pod, cluster) group_left() max by (namespace, pod, cluster) (
+ kube_pod_status_phase{phase=~"Pending|Running"} == 1
+ )
+ )
+ )
+ record: namespace_memory:kube_pod_container_resource_limits:sum
+ - expr: |-
+ kube_pod_container_resource_limits{resource="cpu",job="kube-state-metrics"} * on (namespace, pod, cluster)
+ group_left() max by (namespace, pod, cluster) (
+ (kube_pod_status_phase{phase=~"Pending|Running"} == 1)
+ )
+ record: cluster:namespace:pod_cpu:active:kube_pod_container_resource_limits
+ - expr: |-
+ sum by (namespace, cluster) (
+ sum by (namespace, pod, cluster) (
+ max by (namespace, pod, container, cluster) (
+ kube_pod_container_resource_limits{resource="cpu",job="kube-state-metrics"}
+ ) * on (namespace, pod, cluster) group_left() max by (namespace, pod, cluster) (
+ kube_pod_status_phase{phase=~"Pending|Running"} == 1
+ )
+ )
+ )
+ record: namespace_cpu:kube_pod_container_resource_limits:sum
+
--- cluster/base HelmRelease: monitoring-system/kube-prometheus-stack PrometheusRule: monitoring-system/kube-prometheus-stack-general.rules
+++ cluster/base HelmRelease: monitoring-system/kube-prometheus-stack PrometheusRule: monitoring-system/kube-prometheus-stack-general.rules
@@ -48,11 +48,12 @@
other alerts.
This alert fires whenever there's a severity="info" alert, and stops firing when another alert with a
severity of 'warning' or 'critical' starts firing on the same namespace.
This alert should be routed to a null receiver and configured to inhibit alerts with severity="info".
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/general/infoinhibitor
summary: Info-level alert inhibition.
- expr: ALERTS{severity = "info"} == 1 unless on(namespace) ALERTS{alertname !=
- "InfoInhibitor", severity =~ "warning|critical", alertstate="firing"} == 1
+ expr: ALERTS{severity = "info"} == 1 unless on (namespace) ALERTS{alertname
+ != "InfoInhibitor", severity =~ "warning|critical", alertstate="firing"} ==
+ 1
labels:
severity: none
--- cluster/base HelmRelease: monitoring-system/kube-prometheus-stack PrometheusRule: monitoring-system/kube-prometheus-stack-k8s.rules.pod-owner
+++ cluster/base HelmRelease: monitoring-system/kube-prometheus-stack PrometheusRule: monitoring-system/kube-prometheus-stack-k8s.rules.pod-owner
@@ -0,0 +1,65 @@
+---
+apiVersion: monitoring.coreos.com/v1
+kind: PrometheusRule
+metadata:
+ name: kube-prometheus-stack-k8s.rules.pod-owner
+ namespace: monitoring-system
+ labels:
+ app: kube-prometheus-stack
+ app.kubernetes.io/managed-by: Helm
+ app.kubernetes.io/instance: kube-prometheus-stack
+ app.kubernetes.io/part-of: kube-prometheus-stack
+ release: kube-prometheus-stack
+ heritage: Helm
+spec:
+ groups:
+ - name: k8s.rules.pod_owner
+ rules:
+ - expr: |-
+ max by (cluster, namespace, workload, pod) (
+ label_replace(
+ label_replace(
+ kube_pod_owner{job="kube-state-metrics", owner_kind="ReplicaSet"},
+ "replicaset", "$1", "owner_name", "(.*)"
+ ) * on (replicaset, namespace) group_left(owner_name) topk by (replicaset, namespace) (
+ 1, max by (replicaset, namespace, owner_name) (
+ kube_replicaset_owner{job="kube-state-metrics"}
+ )
+ ),
+ "workload", "$1", "owner_name", "(.*)"
+ )
+ )
+ labels:
+ workload_type: deployment
+ record: namespace_workload_pod:kube_pod_owner:relabel
+ - expr: |-
+ max by (cluster, namespace, workload, pod) (
+ label_replace(
+ kube_pod_owner{job="kube-state-metrics", owner_kind="DaemonSet"},
+ "workload", "$1", "owner_name", "(.*)"
+ )
+ )
+ labels:
+ workload_type: daemonset
+ record: namespace_workload_pod:kube_pod_owner:relabel
+ - expr: |-
+ max by (cluster, namespace, workload, pod) (
+ label_replace(
+ kube_pod_owner{job="kube-state-metrics", owner_kind="StatefulSet"},
+ "workload", "$1", "owner_name", "(.*)"
+ )
+ )
+ labels:
+ workload_type: statefulset
+ record: namespace_workload_pod:kube_pod_owner:relabel
+ - expr: |-
+ max by (cluster, namespace, workload, pod) (
+ label_replace(
+ kube_pod_owner{job="kube-state-metrics", owner_kind="Job"},
+ "workload", "$1", "owner_name", "(.*)"
+ )
+ )
+ labels:
+ workload_type: job
+ record: namespace_workload_pod:kube_pod_owner:relabel
+
--- cluster/base HelmRelease: monitoring-system/kube-prometheus-stack PrometheusRule: monitoring-system/kube-prometheus-stack-k8s.rules.container-memory-rss
+++ cluster/base HelmRelease: monitoring-system/kube-prometheus-stack PrometheusRule: monitoring-system/kube-prometheus-stack-k8s.rules.container-memory-rss
@@ -0,0 +1,24 @@
+---
+apiVersion: monitoring.coreos.com/v1
+kind: PrometheusRule
+metadata:
+ name: kube-prometheus-stack-k8s.rules.container-memory-rss
+ namespace: monitoring-system
+ labels:
+ app: kube-prometheus-stack
+ app.kubernetes.io/managed-by: Helm
+ app.kubernetes.io/instance: kube-prometheus-stack
+ app.kubernetes.io/part-of: kube-prometheus-stack
+ release: kube-prometheus-stack
+ heritage: Helm
+spec:
+ groups:
+ - name: k8s.rules.container_memory_rss
+ rules:
+ - expr: |-
+ container_memory_rss{job="kubelet", metrics_path="/metrics/cadvisor", image!=""}
+ * on (cluster, namespace, pod) group_left(node) topk by (cluster, namespace, pod) (1,
+ max by (cluster, namespace, pod, node) (kube_pod_info{node!=""})
+ )
+ record: node_namespace_pod_container:container_memory_rss
+
--- cluster/base HelmRelease: monitoring-system/kube-prometheus-stack PrometheusRule: monitoring-system/kube-prometheus-stack-k8s.rules
+++ cluster/base HelmRelease: monitoring-system/kube-prometheus-stack PrometheusRule: monitoring-system/kube-prometheus-stack-k8s.rules
@@ -1,164 +0,0 @@
----
-apiVersion: monitoring.coreos.com/v1
-kind: PrometheusRule
-metadata:
- name: kube-prometheus-stack-k8s.rules
- namespace: monitoring-system
- labels:
- app: kube-prometheus-stack
- app.kubernetes.io/managed-by: Helm
- app.kubernetes.io/instance: kube-prometheus-stack
- app.kubernetes.io/part-of: kube-prometheus-stack
- release: kube-prometheus-stack
- heritage: Helm
-spec:
- groups:
- - name: k8s.rules
- rules:
- - expr: |-
- sum by (cluster, namespace, pod, container) (
- irate(container_cpu_usage_seconds_total{job="kubelet", metrics_path="/metrics/cadvisor", image!=""}[5m])
- ) * on (cluster, namespace, pod) group_left(node) topk by (cluster, namespace, pod) (
- 1, max by(cluster, namespace, pod, node) (kube_pod_info{node!=""})
- )
- record: node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate
- - expr: |-
- container_memory_working_set_bytes{job="kubelet", metrics_path="/metrics/cadvisor", image!=""}
- * on (cluster, namespace, pod) group_left(node) topk by(cluster, namespace, pod) (1,
- max by(cluster, namespace, pod, node) (kube_pod_info{node!=""})
- )
- record: node_namespace_pod_container:container_memory_working_set_bytes
- - expr: |-
- container_memory_rss{job="kubelet", metrics_path="/metrics/cadvisor", image!=""}
- * on (cluster, namespace, pod) group_left(node) topk by(cluster, namespace, pod) (1,
- max by(cluster, namespace, pod, node) (kube_pod_info{node!=""})
- )
- record: node_namespace_pod_container:container_memory_rss
- - expr: |-
- container_memory_cache{job="kubelet", metrics_path="/metrics/cadvisor", image!=""}
- * on (cluster, namespace, pod) group_left(node) topk by(cluster, namespace, pod) (1,
- max by(cluster, namespace, pod, node) (kube_pod_info{node!=""})
- )
- record: node_namespace_pod_container:container_memory_cache
- - expr: |-
- container_memory_swap{job="kubelet", metrics_path="/metrics/cadvisor", image!=""}
- * on (cluster, namespace, pod) group_left(node) topk by(cluster, namespace, pod) (1,
- max by(cluster, namespace, pod, node) (kube_pod_info{node!=""})
- )
- record: node_namespace_pod_container:container_memory_swap
- - expr: |-
- kube_pod_container_resource_requests{resource="memory",job="kube-state-metrics"} * on (namespace, pod, cluster)
- group_left() max by (namespace, pod, cluster) (
- (kube_pod_status_phase{phase=~"Pending|Running"} == 1)
- )
- record: cluster:namespace:pod_memory:active:kube_pod_container_resource_requests
- - expr: |-
- sum by (namespace, cluster) (
- sum by (namespace, pod, cluster) (
- max by (namespace, pod, container, cluster) (
- kube_pod_container_resource_requests{resource="memory",job="kube-state-metrics"}
- ) * on(namespace, pod, cluster) group_left() max by (namespace, pod, cluster) (
- kube_pod_status_phase{phase=~"Pending|Running"} == 1
- )
- )
- )
- record: namespace_memory:kube_pod_container_resource_requests:sum
- - expr: |-
- kube_pod_container_resource_requests{resource="cpu",job="kube-state-metrics"} * on (namespace, pod, cluster)
- group_left() max by (namespace, pod, cluster) (
- (kube_pod_status_phase{phase=~"Pending|Running"} == 1)
- )
- record: cluster:namespace:pod_cpu:active:kube_pod_container_resource_requests
- - expr: |-
- sum by (namespace, cluster) (
- sum by (namespace, pod, cluster) (
- max by (namespace, pod, container, cluster) (
- kube_pod_container_resource_requests{resource="cpu",job="kube-state-metrics"}
- ) * on(namespace, pod, cluster) group_left() max by (namespace, pod, cluster) (
- kube_pod_status_phase{phase=~"Pending|Running"} == 1
- )
- )
- )
- record: namespace_cpu:kube_pod_container_resource_requests:sum
- - expr: |-
- kube_pod_container_resource_limits{resource="memory",job="kube-state-metrics"} * on (namespace, pod, cluster)
- group_left() max by (namespace, pod, cluster) (
- (kube_pod_status_phase{phase=~"Pending|Running"} == 1)
- )
- record: cluster:namespace:pod_memory:active:kube_pod_container_resource_limits
- - expr: |-
- sum by (namespace, cluster) (
- sum by (namespace, pod, cluster) (
- max by (namespace, pod, container, cluster) (
- kube_pod_container_resource_limits{resource="memory",job="kube-state-metrics"}
- ) * on(namespace, pod, cluster) group_left() max by (namespace, pod, cluster) (
- kube_pod_status_phase{phase=~"Pending|Running"} == 1
- )
- )
- )
- record: namespace_memory:kube_pod_container_resource_limits:sum
- - expr: |-
- kube_pod_container_resource_limits{resource="cpu",job="kube-state-metrics"} * on (namespace, pod, cluster)
- group_left() max by (namespace, pod, cluster) (
- (kube_pod_status_phase{phase=~"Pending|Running"} == 1)
- )
- record: cluster:namespace:pod_cpu:active:kube_pod_container_resource_limits
- - expr: |-
- sum by (namespace, cluster) (
- sum by (namespace, pod, cluster) (
- max by (namespace, pod, container, cluster) (
- kube_pod_container_resource_limits{resource="cpu",job="kube-state-metrics"}
- ) * on(namespace, pod, cluster) group_left() max by (namespace, pod, cluster) (
- kube_pod_status_phase{phase=~"Pending|Running"} == 1
- )
- )
- )
- record: namespace_cpu:kube_pod_container_resource_limits:sum
- - expr: |-
- max by (cluster, namespace, workload, pod) (
- label_replace(
- label_replace(
- kube_pod_owner{job="kube-state-metrics", owner_kind="ReplicaSet"},
- "replicaset", "$1", "owner_name", "(.*)"
- ) * on(replicaset, namespace) group_left(owner_name) topk by(replicaset, namespace) (
- 1, max by (replicaset, namespace, owner_name) (
- kube_replicaset_owner{job="kube-state-metrics"}
- )
- ),
- "workload", "$1", "owner_name", "(.*)"
- )
- )
- labels:
- workload_type: deployment
- record: namespace_workload_pod:kube_pod_owner:relabel
- - expr: |-
- max by (cluster, namespace, workload, pod) (
- label_replace(
- kube_pod_owner{job="kube-state-metrics", owner_kind="DaemonSet"},
- "workload", "$1", "owner_name", "(.*)"
- )
- )
- labels:
- workload_type: daemonset
- record: namespace_workload_pod:kube_pod_owner:relabel
- - expr: |-
- max by (cluster, namespace, workload, pod) (
- label_replace(
- kube_pod_owner{job="kube-state-metrics", owner_kind="StatefulSet"},
- "workload", "$1", "owner_name", "(.*)"
- )
- )
- labels:
- workload_type: statefulset
- record: namespace_workload_pod:kube_pod_owner:relabel
- - expr: |-
- max by (cluster, namespace, workload, pod) (
- label_replace(
- kube_pod_owner{job="kube-state-metrics", owner_kind="Job"},
- "workload", "$1", "owner_name", "(.*)"
- )
- )
- labels:
- workload_type: job
- record: namespace_workload_pod:kube_pod_owner:relabel
-
--- cluster/base HelmRelease: monitoring-system/kube-prometheus-stack PrometheusRule: monitoring-system/kube-prometheus-stack-k8s.rules.container-memory-cache
+++ cluster/base HelmRelease: monitoring-system/kube-prometheus-stack PrometheusRule: monitoring-system/kube-prometheus-stack-k8s.rules.container-memory-cache
@@ -0,0 +1,24 @@
+---
+apiVersion: monitoring.coreos.com/v1
+kind: PrometheusRule
+metadata:
+ name: kube-prometheus-stack-k8s.rules.container-memory-cache
+ namespace: monitoring-system
+ labels:
+ app: kube-prometheus-stack
+ app.kubernetes.io/managed-by: Helm
+ app.kubernetes.io/instance: kube-prometheus-stack
+ app.kubernetes.io/part-of: kube-prometheus-stack
+ release: kube-prometheus-stack
+ heritage: Helm
+spec:
+ groups:
+ - name: k8s.rules.container_memory_cache
+ rules:
+ - expr: |-
+ container_memory_cache{job="kubelet", metrics_path="/metrics/cadvisor", image!=""}
+ * on (cluster, namespace, pod) group_left(node) topk by (cluster, namespace, pod) (1,
+ max by (cluster, namespace, pod, node) (kube_pod_info{node!=""})
+ )
+ record: node_namespace_pod_container:container_memory_cache
+
--- cluster/base HelmRelease: monitoring-system/kube-prometheus-stack PrometheusRule: monitoring-system/kube-prometheus-stack-k8s.rules.container-cpu-usage-seconds-tot
+++ cluster/base HelmRelease: monitoring-system/kube-prometheus-stack PrometheusRule: monitoring-system/kube-prometheus-stack-k8s.rules.container-cpu-usage-seconds-tot
@@ -0,0 +1,25 @@
+---
+apiVersion: monitoring.coreos.com/v1
+kind: PrometheusRule
+metadata:
+ name: kube-prometheus-stack-k8s.rules.container-cpu-usage-seconds-tot
+ namespace: monitoring-system
+ labels:
+ app: kube-prometheus-stack
+ app.kubernetes.io/managed-by: Helm
+ app.kubernetes.io/instance: kube-prometheus-stack
+ app.kubernetes.io/part-of: kube-prometheus-stack
+ release: kube-prometheus-stack
+ heritage: Helm
+spec:
+ groups:
+ - name: k8s.rules.container_cpu_usage_seconds_total
+ rules:
+ - expr: |-
+ sum by (cluster, namespace, pod, container) (
+ irate(container_cpu_usage_seconds_total{job="kubelet", metrics_path="/metrics/cadvisor", image!=""}[5m])
+ ) * on (cluster, namespace, pod) group_left(node) topk by (cluster, namespace, pod) (
+ 1, max by (cluster, namespace, pod, node) (kube_pod_info{node!=""})
+ )
+ record: node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate
+
--- cluster/base HelmRelease: monitoring-system/kube-prometheus-stack PrometheusRule: monitoring-system/kube-prometheus-stack-kubernetes-system-apiserver
+++ cluster/base HelmRelease: monitoring-system/kube-prometheus-stack PrometheusRule: monitoring-system/kube-prometheus-stack-kubernetes-system-apiserver
@@ -19,47 +19,47 @@
annotations:
description: A client certificate used to authenticate to kubernetes apiserver
is expiring in less than 7.0 days.
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubeclientcertificateexpiration
summary: Client certificate is about to expire.
expr: apiserver_client_certificate_expiration_seconds_count{job="apiserver"}
- > 0 and on(job) histogram_quantile(0.01, sum by (job, le) (rate(apiserver_client_certificate_expiration_seconds_bucket{job="apiserver"}[5m])))
+ > 0 and on (job) histogram_quantile(0.01, sum by (job, le) (rate(apiserver_client_certificate_expiration_seconds_bucket{job="apiserver"}[5m])))
< 604800
for: 5m
labels:
severity: warning
- alert: KubeClientCertificateExpiration
annotations:
description: A client certificate used to authenticate to kubernetes apiserver
is expiring in less than 24.0 hours.
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubeclientcertificateexpiration
summary: Client certificate is about to expire.
expr: apiserver_client_certificate_expiration_seconds_count{job="apiserver"}
- > 0 and on(job) histogram_quantile(0.01, sum by (job, le) (rate(apiserver_client_certificate_expiration_seconds_bucket{job="apiserver"}[5m])))
+ > 0 and on (job) histogram_quantile(0.01, sum by (job, le) (rate(apiserver_client_certificate_expiration_seconds_bucket{job="apiserver"}[5m])))
< 86400
for: 5m
labels:
severity: critical
- alert: KubeAggregatedAPIErrors
annotations:
description: Kubernetes aggregated API {{ $labels.name }}/{{ $labels.namespace
}} has reported errors. It has appeared unavailable {{ $value | humanize
}} times averaged over the past 10m.
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubeaggregatedapierrors
summary: Kubernetes aggregated API has reported errors.
- expr: sum by(name, namespace, cluster)(increase(aggregator_unavailable_apiservice_total{job="apiserver"}[10m]))
+ expr: sum by (name, namespace, cluster)(increase(aggregator_unavailable_apiservice_total{job="apiserver"}[10m]))
> 4
labels:
severity: warning
- alert: KubeAggregatedAPIDown
annotations:
description: Kubernetes aggregated API {{ $labels.name }}/{{ $labels.namespace
}} has been only {{ $value | humanize }}% available over the last 10m.
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubeaggregatedapidown
summary: Kubernetes aggregated API is down.
- expr: (1 - max by(name, namespace, cluster)(avg_over_time(aggregator_unavailable_apiservice{job="apiserver"}[10m])))
+ expr: (1 - max by (name, namespace, cluster)(avg_over_time(aggregator_unavailable_apiservice{job="apiserver"}[10m])))
* 100 < 85
for: 5m
labels:
severity: warning
- alert: KubeAPIDown
annotations:
--- cluster/base HelmRelease: monitoring-system/kube-prometheus-stack PrometheusRule: monitoring-system/kube-prometheus-stack-kubelet.rules
+++ cluster/base HelmRelease: monitoring-system/kube-prometheus-stack PrometheusRule: monitoring-system/kube-prometheus-stack-kubelet.rules
@@ -13,24 +13,24 @@
heritage: Helm
spec:
groups:
- name: kubelet.rules
rules:
- expr: histogram_quantile(0.99, sum(rate(kubelet_pleg_relist_duration_seconds_bucket{job="kubelet",
- metrics_path="/metrics"}[5m])) by (cluster, instance, le) * on(cluster, instance)
+ metrics_path="/metrics"}[5m])) by (cluster, instance, le) * on (cluster, instance)
group_left(node) kubelet_node_name{job="kubelet", metrics_path="/metrics"})
labels:
quantile: '0.99'
record: node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile
- expr: histogram_quantile(0.9, sum(rate(kubelet_pleg_relist_duration_seconds_bucket{job="kubelet",
- metrics_path="/metrics"}[5m])) by (cluster, instance, le) * on(cluster, instance)
+ metrics_path="/metrics"}[5m])) by (cluster, instance, le) * on (cluster, instance)
group_left(node) kubelet_node_name{job="kubelet", metrics_path="/metrics"})
labels:
quantile: '0.9'
record: node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile
- expr: histogram_quantile(0.5, sum(rate(kubelet_pleg_relist_duration_seconds_bucket{job="kubelet",
- metrics_path="/metrics"}[5m])) by (cluster, instance, le) * on(cluster, instance)
+ metrics_path="/metrics"}[5m])) by (cluster, instance, le) * on (cluster, instance)
group_left(node) kubelet_node_name{job="kubelet", metrics_path="/metrics"})
labels:
quantile: '0.5'
record: node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile
--- cluster/base HelmRelease: monitoring-system/kube-prometheus-stack PrometheusRule: monitoring-system/kube-prometheus-stack-kubernetes-apps
+++ cluster/base HelmRelease: monitoring-system/kube-prometheus-stack PrometheusRule: monitoring-system/kube-prometheus-stack-kubernetes-apps
@@ -31,16 +31,16 @@
description: Pod {{ $labels.namespace }}/{{ $labels.pod }} has been in a non-ready
state for longer than 15 minutes.
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubepodnotready
summary: Pod has been in a non-ready state for more than 15 minutes.
expr: |-
sum by (namespace, pod, cluster) (
- max by(namespace, pod, cluster) (
+ max by (namespace, pod, cluster) (
kube_pod_status_phase{job="kube-state-metrics", namespace=~".*", phase=~"Pending|Unknown|Failed"}
- ) * on(namespace, pod, cluster) group_left(owner_kind) topk by(namespace, pod, cluster) (
- 1, max by(namespace, pod, owner_kind, cluster) (kube_pod_owner{owner_kind!="Job"})
+ ) * on (namespace, pod, cluster) group_left(owner_kind) topk by (namespace, pod, cluster) (
+ 1, max by (namespace, pod, owner_kind, cluster) (kube_pod_owner{owner_kind!="Job"})
)
) > 0
for: 15m
labels:
severity: warning
- alert: KubeDeploymentGenerationMismatch
@@ -221,13 +221,13 @@
annotations:
description: Job {{ $labels.namespace }}/{{ $labels.job_name }} is taking
more than {{ "43200" | humanizeDuration }} to complete.
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubejobnotcompleted
summary: Job did not complete in time
expr: |-
- time() - max by(namespace, job_name, cluster) (kube_job_status_start_time{job="kube-state-metrics", namespace=~".*"}
+ time() - max by (namespace, job_name, cluster) (kube_job_status_start_time{job="kube-state-metrics", namespace=~".*"}
and
kube_job_status_active{job="kube-state-metrics", namespace=~".*"} > 0) > 43200
labels:
severity: warning
- alert: KubeJobFailed
annotations:
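Most of the modified rule hunks above appear to differ only in spacing (`by(` becomes `by (`, `on(` becomes `on (`) and in where long expressions wrap. A minimal Python sketch for confirming that two versions of an expression are otherwise identical follows. The helper names `normalize` and `whitespace_only` are hypothetical, the keyword list is an assumption taken from the hunks in this diff, and the normalizer does not parse PromQL, so a keyword followed by `(` inside a quoted label value would also be rewritten.

```python
import re

# Keywords whose spacing changes in this diff. The list is an assumption
# based on the hunks above, not a full PromQL grammar.
_KEYWORDS = ("by", "on", "without", "ignoring")
_KEYWORD_PAREN = re.compile(r"\b(" + "|".join(_KEYWORDS) + r")\s*\(")


def normalize(expr: str) -> str:
    """Collapse runs of whitespace, then drop the optional space
    between a grouping/matching keyword and its opening parenthesis."""
    collapsed = " ".join(expr.split())
    return _KEYWORD_PAREN.sub(r"\1(", collapsed)


def whitespace_only(old: str, new: str) -> bool:
    """Return True when old and new differ only in spacing and wrapping."""
    return normalize(old) == normalize(new)
```

Called on the removed and added sides of a hunk, `whitespace_only` returning `True` suggests the change is cosmetic; `False` means the hunk deserves a closer read.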
This PR contains the following updates:
54.1.0 -> 54.2.0
Release Notes
prometheus-community/helm-charts (kube-prometheus-stack)
v54.2.0
kube-prometheus-stack collects Kubernetes manifests, Grafana dashboards, and Prometheus rules combined with documentation and scripts to provide easy to operate end-to-end Kubernetes cluster monitoring with Prometheus using the Prometheus Operator.
What's Changed
Full Changelog: prometheus-community/helm-charts@prometheus-25.8.0...kube-prometheus-stack-54.2.0
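The individual changelog entries are not reproduced here, but the rendered diff shows the single `k8s.rules` PrometheusRule being removed and per-topic rules (`k8s.rules.container-memory-swap`, `k8s.rules.pod-owner`, and so on) being added. One way to check that no recording rule is lost in such a split is to compare the `record:` names on removed and added lines. The following is a rough Python sketch under stated assumptions: `record_names` and `dropped_records` are hypothetical helpers, and the input is assumed to be unified-diff text in which the first character of each line is the `+` or `-` marker.

```python
import re

# Matches a removed or added "record:" line; the name may be single-quoted.
_RECORD = re.compile(r"^([+-])\s*record:\s*'?([^'\s]+)'?\s*$")


def record_names(diff_text: str) -> tuple[set[str], set[str]]:
    """Return (removed, added) recording-rule names found in diff_text."""
    removed: set[str] = set()
    added: set[str] = set()
    for line in diff_text.splitlines():
        match = _RECORD.match(line)
        if match:
            target = removed if match.group(1) == "-" else added
            target.add(match.group(2))
    return removed, added


def dropped_records(diff_text: str) -> set[str]:
    """Names that appear on removed lines but on no added line."""
    removed, added = record_names(diff_text)
    return removed - added
```

In the diff as rendered on this page, `node_namespace_pod_container:container_memory_working_set_bytes` appears only on removed lines. That may simply mean the rendered diff is incomplete rather than that the rule was dropped upstream, so it is worth checking against the deployed PrometheusRule objects.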
Configuration
📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).
🚦 Automerge: Enabled.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR has been generated by Renovate Bot.