Metrics

Inspect Karpenter Metrics

Karpenter exposes several metrics in Prometheus format so that you can monitor cluster provisioning status. By default, these metrics are served at karpenter.kube-system.svc.cluster.local:8080/metrics. The port is configurable via the METRICS_PORT environment variable (see the Karpenter settings documentation).
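
As a rough illustration of reading this endpoint, the Python sketch below parses Prometheus text-format output into (name, labels, value) samples. The helper names parse_metrics and fetch_metrics are hypothetical and not part of Karpenter. The parser handles only simple sample lines (no escaped quotes or commas inside label values, no timestamps), and the URL assumes the default address above is reachable from wherever the script runs.

```python
import re
from urllib.request import urlopen

# Default address from the text above; adjust if METRICS_PORT is changed.
DEFAULT_URL = "http://karpenter.kube-system.svc.cluster.local:8080/metrics"

# One sample line: metric name, optional {labels}, then a numeric value.
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_metrics(text):
    """Parse Prometheus text exposition into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank, HELP and TYPE lines
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def fetch_metrics(url=DEFAULT_URL, timeout=5.0):
    """Fetch the metrics page and parse it (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return parse_metrics(response.read().decode("utf-8"))
```

From inside the cluster, calling fetch_metrics() with no arguments would return every sample currently exposed. In practice a Prometheus server usually scrapes this endpoint, so a script like this is mainly useful for ad hoc inspection.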

Controller Runtime Metrics

controller_runtime_terminal_reconcile_errors_total

Total number of terminal reconciliation errors per controller

  • Stability Level: STABLE

controller_runtime_reconcile_total

Total number of reconciliations per controller

  • Stability Level: STABLE

controller_runtime_reconcile_time_seconds

Length of time per reconciliation per controller

  • Stability Level: STABLE

controller_runtime_reconcile_panics_total

Total number of reconciliation panics per controller

  • Stability Level: STABLE

controller_runtime_reconcile_errors_total

Total number of reconciliation errors per controller

  • Stability Level: STABLE

controller_runtime_max_concurrent_reconciles

Maximum number of concurrent reconciles per controller

  • Stability Level: STABLE

controller_runtime_active_workers

Number of currently used workers per controller

  • Stability Level: STABLE
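
One way to combine the counters above is an error ratio per controller. The sketch below is a minimal example under stated assumptions: samples arrive as (metric_name, labels, value) tuples, each sample carries a controller label (suggested by the "per controller" wording, but not spelled out on this page), and the function name reconcile_error_ratio is hypothetical.

```python
from collections import defaultdict


def reconcile_error_ratio(samples):
    """Return {controller: errors / reconciliations} from counter samples.

    `samples` is an iterable of (metric_name, labels, value) tuples.
    The "controller" label name is an assumption, not taken from this page.
    """
    totals = defaultdict(float)
    errors = defaultdict(float)
    for name, labels, value in samples:
        controller = labels.get("controller")
        if controller is None:
            continue
        if name == "controller_runtime_reconcile_total":
            totals[controller] += value  # summed across any other labels
        elif name == "controller_runtime_reconcile_errors_total":
            errors[controller] += value
    # Controllers with no recorded reconciliations are left out.
    return {c: errors.get(c, 0.0) / t for c, t in totals.items() if t > 0}
```

Because both metrics are counters, this ratio covers the whole lifetime of the process. To look at a recent window, apply the same calculation to the difference between two scrapes.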

Workqueue Metrics

workqueue_work_duration_seconds

How long in seconds processing an item from workqueue takes.

  • Stability Level: STABLE

workqueue_unfinished_work_seconds

How many seconds of work has been done that is in progress and hasn’t been observed by work_duration. Large values indicate stuck threads. One can deduce the number of stuck threads by observing the rate at which this increases.

  • Stability Level: STABLE
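
The deduction described for workqueue_unfinished_work_seconds can be sketched as follows: if k workers are stuck, the gauge grows by roughly k seconds for every second of wall-clock time, so the slope between two readings approximates k. The function name estimate_stuck_workers is hypothetical, and the result is only an estimate, since items that finish between readings lower the gauge.

```python
def estimate_stuck_workers(earlier, later):
    """Estimate stuck workers from two (timestamp_seconds, gauge_value) readings."""
    t0, v0 = earlier
    t1, v1 = later
    elapsed = t1 - t0
    if elapsed <= 0:
        raise ValueError("later reading must have a larger timestamp")
    # Seconds of unfinished work gained per second of wall-clock time.
    slope = (v1 - v0) / elapsed
    # A flat or falling gauge means work is completing, so report zero.
    return max(0, round(slope))
```

For example, a gauge that rises from 120 to 240 over 60 seconds gains 2 seconds of unfinished work per second, which points to about two stuck workers.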

workqueue_retries_total

Total number of retries handled by workqueue

  • Stability Level: STABLE

workqueue_queue_duration_seconds

How long in seconds an item stays in workqueue before being requested

  • Stability Level: STABLE

workqueue_longest_running_processor_seconds

How many seconds has the longest running processor for workqueue been running.

  • Stability Level: STABLE

workqueue_depth

Current depth of workqueue by workqueue and priority

  • Stability Level: STABLE

workqueue_adds_total

Total number of adds handled by workqueue

  • Stability Level: STABLE

AWS SDK Go Metrics

aws_sdk_go_request_total

The total number of AWS SDK Go requests

  • Stability Level: STABLE

aws_sdk_go_request_retry_count

The total number of AWS SDK Go retry attempts per request

  • Stability Level: STABLE

aws_sdk_go_request_duration_seconds

Latency of AWS SDK Go requests

  • Stability Level: STABLE

aws_sdk_go_request_attempt_total

The total number of AWS SDK Go request attempts

  • Stability Level: STABLE

aws_sdk_go_request_attempt_duration_seconds

Latency of AWS SDK Go request attempts

  • Stability Level: STABLE
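
As one possible use of the request and attempt counters, the sketch below computes the average number of attempts per request. It assumes samples are (metric_name, labels, value) tuples and sums across all label sets, because the labels on these metrics are not listed on this page. The function name average_attempts_per_request is hypothetical.

```python
def average_attempts_per_request(samples):
    """Return total attempts divided by total requests, or None if no requests.

    `samples` is a list of (metric_name, labels, value) tuples.
    """
    requests = sum(v for n, _, v in samples if n == "aws_sdk_go_request_total")
    attempts = sum(
        v for n, _, v in samples if n == "aws_sdk_go_request_attempt_total"
    )
    if requests == 0:
        return None
    return attempts / requests
```

A value close to 1 suggests that most requests succeed on the first attempt, while a noticeably higher value suggests that retries are common.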

Leader Election Metrics

leader_election_slowpath_total

Total number of times the slow path was exercised when renewing leader leases. ’name’ is the string used to identify the lease. Please make sure to group by name.

  • Stability Level: STABLE

leader_election_master_status

Gauge indicating whether the reporting system is master of the relevant lease: 0 indicates backup, 1 indicates master. ’name’ is the string used to identify the lease. Please make sure to group by name.

  • Stability Level: STABLE
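
The advice to group by name can be illustrated with a small sketch that collects, for each lease, the replicas reporting a master status of 1. The input shape (a dict from a replica identifier to its parsed samples) and the function name leaders_by_lease are assumptions made for this example.

```python
def leaders_by_lease(samples_by_replica):
    """Map each lease name to the replicas reporting master status 1.

    `samples_by_replica` maps a replica identifier to a list of
    (metric_name, labels, value) tuples scraped from that replica.
    """
    leaders = {}
    for replica, samples in samples_by_replica.items():
        for name, labels, value in samples:
            if name != "leader_election_master_status":
                continue
            lease = labels.get("name", "")
            leaders.setdefault(lease, [])  # record the lease even with no leader
            if value == 1:
                leaders[lease].append(replica)
    return leaders
```

Each lease would typically map to exactly one replica. An empty list, or a list with more than one entry, may be worth investigating.
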
Last modified July 15, 2025: chore: Release v1.6.0 (#8275) (7370b5fd)