The Kubecost installation consumes the following metrics.
The Cost Model both exports and consumes the following metrics.
| Metric | Description |
|---|---|
| node_cpu_hourly_cost | Hourly cost per vCPU on this node |
| node_gpu_hourly_cost | Hourly cost per GPU on this node |
| node_ram_hourly_cost | Hourly cost per GB of memory on this node |
| node_total_hourly_cost | Total node cost per hour |
| kubecost_load_balancer_cost | Hourly cost of a load balancer |
| kubecost_cluster_management_cost | Hourly cost paid as a cluster management fee |
| pv_hourly_cost | Hourly cost per GB on a persistent volume |
| node_gpu_count | Number of GPUs available on node |
| container_cpu_allocation | Average number of CPUs requested/used over last 1m |
| container_gpu_allocation | Average number of GPUs requested over last 1m |
| container_memory_allocation_bytes | Average bytes of RAM requested/used over last 1m |
| pod_pvc_allocation | Bytes provisioned for a PVC attached to a pod |
| kubecost_node_is_spot | Cloud provider info about node preemptibility |
| kubecost_network_zone_egress_cost | Total cost per GB egress across zones |
| kubecost_network_region_egress_cost | Total cost per GB egress across regions |
| kubecost_network_internet_egress_cost | Total cost per GB of internet egress |
| service_selector_labels | Service Selector Labels |
| deployment_match_labels | Deployment Match Labels |
| statefulSet_match_labels | StatefulSet Match Labels |
| kubecost_cluster_memory_working_set_bytes | (Created by recording rule) |
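As a rough illustration of how the allocation and price metrics above relate, a container's hourly cost can be estimated by multiplying each allocation by the matching per-unit node price. This is a simplified sketch, not Kubecost's actual cost model: the function name, the sample values, and the choice of 1 GB = 1024^3 bytes are all assumptions made for the example.

```python
BYTES_PER_GB = 1024 ** 3  # assumption: RAM is priced per 2^30 bytes


def estimate_container_hourly_cost(
    cpu_allocation: float,
    memory_allocation_bytes: float,
    gpu_allocation: float,
    node_cpu_hourly_cost: float,
    node_ram_hourly_cost: float,
    node_gpu_hourly_cost: float,
) -> float:
    """Estimate hourly cost from allocation metrics and per-unit node prices."""
    cpu_cost = cpu_allocation * node_cpu_hourly_cost
    ram_cost = (memory_allocation_bytes / BYTES_PER_GB) * node_ram_hourly_cost
    gpu_cost = gpu_allocation * node_gpu_hourly_cost
    return cpu_cost + ram_cost + gpu_cost


# Hypothetical sample values, not real pricing data.
cost = estimate_container_hourly_cost(
    cpu_allocation=0.5,                          # container_cpu_allocation
    memory_allocation_bytes=2 * BYTES_PER_GB,    # container_memory_allocation_bytes
    gpu_allocation=0,                            # container_gpu_allocation
    node_cpu_hourly_cost=0.03,
    node_ram_hourly_cost=0.004,
    node_gpu_hourly_cost=0.95,
)
print(f"estimated hourly cost: {cost:.4f}")
```

The sketch ignores persistent volumes, load balancers, network egress, and shared cluster fees, each of which has its own metric in the table.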
The Kubecost network-costs daemonset collects node network data and exports the egress, ingress, and performance statistics.
| Metric | Description |
|---|---|
| kubecost_pod_network_egress_bytes_total | egressed byte counts by pod |
| kubecost_pod_network_ingress_bytes_total | ingressed byte counts by pod |
| kubecost_network_costs_parsed_entries | total parsed conntrack entries |
| kubecost_network_costs_parse_time | total time in milliseconds it took to parse conntrack entries |
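Because the egress byte metrics are cumulative counters, a cost estimate needs the increase between two samples rather than the raw value. The sketch below shows one way to combine such an increase with a per-GB egress price (for example, kubecost_network_internet_egress_cost). The helper names, the reset handling, and the 1 GB = 1024^3 bytes conversion are assumptions for illustration and may not match how Kubecost computes network costs.

```python
BYTES_PER_GB = 1024 ** 3  # assumption: egress is priced per 2^30 bytes


def counter_increase(start: float, end: float) -> float:
    """Increase of a cumulative counter between two samples.

    A drop in value is treated as a counter reset to zero, so the
    increase is taken to be the later sample itself.
    """
    return end - start if end >= start else end


def egress_cost(start_bytes: float, end_bytes: float, cost_per_gb: float) -> float:
    """Estimated cost of the bytes egressed between two counter samples."""
    return counter_increase(start_bytes, end_bytes) / BYTES_PER_GB * cost_per_gb


# Hypothetical samples of kubecost_pod_network_egress_bytes_total and a
# hypothetical per-GB price.
print(egress_cost(0, 5 * BYTES_PER_GB, cost_per_gb=0.12))
```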
cAdvisor (Container Advisor) provides container users an understanding of the resource usage and performance characteristics of their running containers. It is a running daemon that collects, aggregates, processes, and exports information about running containers.
GitHub: https://github.com/google/cadvisor
| Metric | Description |
|---|---|
| container_memory_usage_bytes | Current memory usage, including all memory regardless of when it was accessed |
| container_fs_limit_bytes | Number of bytes that can be consumed by the container on this filesystem |
| container_fs_usage_bytes | Number of bytes that are consumed by the container on this filesystem |
| container_memory_working_set_bytes | Current working set |
| container_network_receive_bytes_total | Cumulative count of bytes received |
| container_network_transmit_bytes_total | Cumulative count of bytes transmitted |
| container_cpu_usage_seconds_total | Cumulative cpu time consumed |
| container_cpu_cfs_periods_total | Number of elapsed enforcement period intervals |
| container_cpu_cfs_throttled_periods_total | Number of throttled period intervals |
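The two CFS counters above are commonly read together: dividing the increase in throttled periods by the increase in elapsed periods gives the fraction of enforcement periods in which a container was throttled. The following is a minimal sketch of that ratio. The function name and sample values are hypothetical, and the source does not say that Kubecost computes throttling exactly this way.

```python
def throttle_ratio(
    periods_start: float,
    periods_end: float,
    throttled_start: float,
    throttled_end: float,
) -> float:
    """Fraction of CFS enforcement periods that were throttled in a window.

    Inputs are two samples each of container_cpu_cfs_periods_total and
    container_cpu_cfs_throttled_periods_total.
    """
    elapsed = periods_end - periods_start
    if elapsed <= 0:
        # No enforcement periods elapsed (or a counter reset): nothing to report.
        return 0.0
    return (throttled_end - throttled_start) / elapsed


# Hypothetical samples: 600 periods elapsed, 150 of them throttled.
print(throttle_ratio(1000, 1600, 50, 200))
```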
The following kube-state-metrics (KSM) metrics are both consumed and emitted by the Kubecost installation. The cost-model replicates all of these metrics, so a separate KSM installation is not required. Read more here.
GitHub: https://github.com/kubernetes/kube-state-metrics
| Metric | Description |
|---|---|
| kube_deployment_spec_replicas | Number of pods specified for a Deployment |
| kube_deployment_status_replicas_available | Number of pods currently available for a Deployment |
| kube_job_status_failed | The number of pods which reached Phase Failed and the reason for failure |
| kube_namespace_annotations | Kubernetes annotations converted to Prometheus labels |
| kube_namespace_labels | Kubernetes labels converted to Prometheus labels |
| kube_node_labels | Kubernetes labels converted to Prometheus labels |
| kube_node_status_allocatable | The allocatable for different resources of a node that are available for scheduling |
| kube_node_status_allocatable_cpu_cores | Total allocatable cpu cores of the node (Deprecated in KSM 2.0.0) |
| kube_node_status_allocatable_memory_bytes | Total allocatable memory bytes of the node (Deprecated in KSM 2.0.0) |
| kube_node_status_capacity | The capacity for different resources of a node |
| kube_node_status_capacity_cpu_cores | Total cpu cores available on the node (Deprecated in KSM 2.0.0) |
| kube_node_status_capacity_memory_bytes | Total memory available on the node (bytes) (Deprecated in KSM 2.0.0) |
| kube_node_status_condition | The condition of a cluster node |
| kube_persistentvolume_capacity_bytes | Total capacity of a persistent volume (bytes) |
| kube_persistentvolume_status_phase | Phase of a persistent volume (for example, Bound) |
| kube_persistentvolumeclaim_info | Information about persistent volume claim |
| kube_persistentvolumeclaim_resource_requests_storage_bytes | The capacity of storage requested by the persistent volume claim |
| kube_pod_annotations | Kubernetes annotations converted to Prometheus labels |
| kube_pod_container_resource_limits | The resource limits set for a container |
| kube_pod_container_resource_limits_cpu_cores | Limit on CPU cores that can be used by the container. (Deprecated in KSM 2.0.0) |
| kube_pod_container_resource_limits_memory_bytes | Limit on the amount of memory that can be used by the container. (Deprecated in KSM 2.0.0) |
| kube_pod_container_resource_requests | The resources requested by a container |
| kube_pod_container_status_restarts_total | The number of container restarts per container |
| kube_pod_container_status_running | Describes whether the container is currently in running state |
| kube_pod_container_status_terminated_reason | Describes the reason the container is currently in terminated state |
| kube_pod_labels | Kubernetes labels converted to Prometheus labels |
| kube_pod_owner | Information about the Pod's owner |
| kube_pod_status_phase | The pod's current phase (for example, Pending) |
| kube_replicaset_owner | Information about the ReplicaSet's owner |
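Several entries in the table above are marked as deprecated in KSM 2.0.0 and sit next to a generic counterpart (for example, kube_node_status_allocatable_cpu_cores next to kube_node_status_allocatable). The sketch below shows one way to translate a deprecated name into a selector on the generic metric. The `resource` and `unit` label names and values reflect the author's understanding of KSM 2.x and should be checked against the KSM documentation for your version; the mapping table and function name are hypothetical.

```python
# Deprecated per-resource metric -> (generic metric, label matchers).
# Assumption: KSM 2.x distinguishes resources via "resource" and "unit" labels.
DEPRECATED_TO_GENERIC = {
    "kube_node_status_allocatable_cpu_cores": (
        "kube_node_status_allocatable", {"resource": "cpu", "unit": "core"}),
    "kube_node_status_allocatable_memory_bytes": (
        "kube_node_status_allocatable", {"resource": "memory", "unit": "byte"}),
    "kube_node_status_capacity_cpu_cores": (
        "kube_node_status_capacity", {"resource": "cpu", "unit": "core"}),
    "kube_node_status_capacity_memory_bytes": (
        "kube_node_status_capacity", {"resource": "memory", "unit": "byte"}),
    "kube_pod_container_resource_limits_cpu_cores": (
        "kube_pod_container_resource_limits", {"resource": "cpu", "unit": "core"}),
    "kube_pod_container_resource_limits_memory_bytes": (
        "kube_pod_container_resource_limits", {"resource": "memory", "unit": "byte"}),
}


def to_selector(metric_name: str) -> str:
    """Return a selector string for the generic form of a deprecated metric.

    Names that are not in the mapping are returned unchanged.
    """
    if metric_name not in DEPRECATED_TO_GENERIC:
        return metric_name
    generic, labels = DEPRECATED_TO_GENERIC[metric_name]
    matchers = ",".join(f'{key}="{value}"' for key, value in labels.items())
    return f"{generic}{{{matchers}}}"


print(to_selector("kube_node_status_capacity_memory_bytes"))
```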
Node exporter is a Prometheus exporter for hardware and OS metrics exposed by *NIX kernels, written in Go with pluggable metric collectors.
GitHub: https://github.com/prometheus/node_exporter
| Metric | Description |
|---|---|
| node_cpu_seconds_total | Seconds the cpus spent in each mode |
| node_disk_reads_completed | The total number of reads completed successfully |
| node_disk_reads_completed_total | The total number of reads completed successfully |
| node_disk_writes_completed | The total number of writes completed successfully |
| node_disk_writes_completed_total | The total number of writes completed successfully |
| node_filesystem_device_error | Whether an error occurred while getting statistics for the given device |
| node_memory_Buffers_bytes | Memory information field Buffers_bytes |
| node_memory_Cached_bytes | Memory information field Cached_bytes |
| node_memory_MemAvailable_bytes | Memory information field MemAvailable_bytes |
| node_memory_MemFree_bytes | Memory information field MemFree_bytes |
| node_memory_MemTotal_bytes | Memory information field MemTotal_bytes |
| node_network_transmit_bytes_total | Network device statistic transmit_bytes |
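The memory fields above are usually combined rather than read individually. One common derivation is node memory utilization, computed as the share of MemTotal that is not reported as MemAvailable. The sketch below illustrates that calculation; the function name and sample values are hypothetical, and the source does not state that Kubecost uses this exact formula.

```python
def memory_utilization(mem_total_bytes: float, mem_available_bytes: float) -> float:
    """Fraction of node memory in use.

    Inputs correspond to node_memory_MemTotal_bytes and
    node_memory_MemAvailable_bytes for the same node and timestamp.
    """
    if mem_total_bytes <= 0:
        raise ValueError("mem_total_bytes must be positive")
    return 1.0 - mem_available_bytes / mem_total_bytes


# Hypothetical node with 16e9 bytes total and 4e9 bytes available.
print(memory_utilization(16e9, 4e9))
```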
Prometheus emits metrics which are used by Kubecost for diagnostic purposes:
| Metric | Description |
|---|---|
| up | Scrape target status |
| prometheus_target_interval_length_seconds | Amount of time between target scrapes |
NVIDIA GPU monitoring support is explained in more detail on the Kubecost Blog: Monitoring NVIDIA GPU Usage in Kubernetes with Prometheus. The following metrics are consumed:
GitHub: https://github.com/NVIDIA/k8s-device-plugin
| Metric | Description |
|---|---|
| DCGM_FI_DEV_GPU_UTIL | GPU utilization |