Configure Space-level observability

Important

This feature has been generally available (GA) since v1.14.0, requires Spaces v1.6.0, and is off by default. To enable it, set observability.enabled=true (features.alpha.observability.enabled=true before v1.14.0) when installing Spaces:

up space init --token-file="${SPACES_TOKEN_PATH}" "v${SPACES_VERSION}" \
...
--set "observability.enabled=true" \

This guide explains how to configure Space-level observability, which lets Space administrators observe the cluster infrastructure where the Space software is installed. The feature applies only to administrators of self-hosted Spaces.

When you enable observability in a Space, Upbound deploys a single OpenTelemetry Collector to collect and export metrics and logs to your configured observability backends.

Prerequisites

This feature requires the OpenTelemetry Operator on the Space cluster. Install this now if you haven't already:

kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/download/v0.116.0/opentelemetry-operator.yaml

If running Spaces v1.11 or later, use OpenTelemetry Operator v0.110.0 or later due to breaking changes in the OpenTelemetry Operator.

Configuration

To configure how Upbound exports telemetry, review the spacesCollector value in your Space installation Helm chart. Below is an example that exports to an otlphttp-compatible endpoint.

observability:
  spacesCollector:
    config:
      exporters:
        otlphttp:
          endpoint: "<your-endpoint>"
          headers:
            api-key: YOUR_API_KEY
      exportPipeline:
        logs:
          - otlphttp
        metrics:
          - otlphttp

You can export metrics and logs from your Crossplane installation, Spaces infrastructure (controller, API, router, etc.), provider-helm, and provider-kubernetes.

Router metrics

The Spaces router component uses Envoy as a reverse proxy and exposes detailed metrics about request handling, circuit breakers, and connection pooling. Upbound collects these metrics in your Space after you enable Space-level observability.

Envoy metrics in Upbound include:

  • Upstream cluster metrics - Request status codes, timeouts, retries, and latency for traffic to control planes and services
  • Circuit breaker metrics - Connection and request circuit breaker state for both DEFAULT and HIGH priority levels
  • Downstream listener metrics - Client connections and requests received
  • HTTP connection manager metrics - End-to-end HTTP request processing and latency

For a complete list of available router metrics and example PromQL queries, see the Router metrics reference.

Available metrics

Space-level observability collects metrics from multiple infrastructure components:

Infrastructure component metrics

  • Crossplane controller metrics
  • Spaces controller, API, and router metrics
  • Provider metrics (provider-helm, provider-kubernetes)

Router metrics

The router component exposes Envoy proxy metrics for monitoring traffic flow and service health. Key metric categories include:

  • envoy_cluster_upstream_rq_* - Upstream request metrics (status codes, timeouts, retries, latency)
  • envoy_cluster_circuit_breakers_* - Circuit breaker state and capacity
  • envoy_listener_downstream_* - Client connection and request metrics
  • envoy_http_downstream_* - HTTP request processing metrics

Example query to monitor total request rate:

sum(rate(envoy_cluster_upstream_rq_total{job="spaces-router-envoy"}[5m]))

For detailed router metrics documentation and more query examples, see the Router metrics reference.

OpenTelemetryCollector image

Control plane (SharedTelemetry) and Space observability deploy the same custom OpenTelemetry Collector image. This image supports the otlphttp, datadog, and debug exporters.

For more information on observability configuration, review the Helm chart reference.

Observability in control planes

Read the observability documentation to learn about the features Upbound offers for collecting telemetry from control planes.

Router metrics reference

Upstream cluster metrics

| Metric | Description |
| --- | --- |
| envoy_cluster_upstream_rq_xx_total | HTTP status codes (2xx, 3xx, 4xx, 5xx) with label envoy_response_code_class |
| envoy_cluster_upstream_rq_timeout_total | Requests that timed out waiting for upstream |
| envoy_cluster_upstream_rq_retry_limit_exceeded_total | Requests that exhausted retry attempts |
| envoy_cluster_upstream_rq_total | Total upstream requests |
| envoy_cluster_upstream_rq_time_bucket | Latency histogram (for P50/P95/P99 calculations) |
| envoy_cluster_upstream_rq_time_sum | Sum of request durations |
| envoy_cluster_upstream_rq_time_count | Count of requests |
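
As a rough illustration of how the upstream metrics in this table combine, the queries below sketch a 5xx error ratio and a P95 latency. They reuse the job="spaces-router-envoy" selector from the request-rate example earlier on this page. The label value envoy_response_code_class="5" follows Envoy's usual Prometheus output and is an assumption here, so check it against the series in your own backend before relying on it.

```promql
# Fraction of upstream requests that returned a 5xx response over the last 5 minutes.
# Assumption: the 5xx class is exposed as envoy_response_code_class="5".
sum(rate(envoy_cluster_upstream_rq_xx_total{job="spaces-router-envoy", envoy_response_code_class="5"}[5m]))
/
sum(rate(envoy_cluster_upstream_rq_total{job="spaces-router-envoy"}[5m]))

# Approximate P95 upstream request latency, computed from the histogram buckets.
histogram_quantile(
  0.95,
  sum by (le) (rate(envoy_cluster_upstream_rq_time_bucket{job="spaces-router-envoy"}[5m]))
)
```

Run each query separately. Envoy normally reports request-time histograms in milliseconds, so the P95 result is most likely in milliseconds rather than seconds.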

Circuit breaker metrics

| Name | Description |
| --- | --- |
| envoy_cluster_circuit_breakers_default_cx_open | DEFAULT priority connection circuit breaker open (gauge) |
| envoy_cluster_circuit_breakers_default_rq_open | DEFAULT priority request circuit breaker open (gauge) |
| envoy_cluster_circuit_breakers_default_remaining_cx | Available DEFAULT priority connections (gauge) |
| envoy_cluster_circuit_breakers_default_remaining_rq | Available DEFAULT priority request slots (gauge) |
| envoy_cluster_circuit_breakers_high_cx_open | HIGH priority connection circuit breaker open (gauge) |
| envoy_cluster_circuit_breakers_high_rq_open | HIGH priority request circuit breaker open (gauge) |
| envoy_cluster_circuit_breakers_high_remaining_cx | Available HIGH priority connections (gauge) |
| envoy_cluster_circuit_breakers_high_remaining_rq | Available HIGH priority request slots (gauge) |
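
The circuit breaker gauges lend themselves to simple threshold checks. The sketch below shows one way to find upstream clusters whose DEFAULT priority breakers are currently open. The envoy_cluster_name grouping label is an assumption based on Envoy's standard Prometheus output; substitute whatever label identifies the upstream cluster in your data.

```promql
# Clusters whose DEFAULT priority request circuit breaker is currently open.
# Assumption: each series carries an envoy_cluster_name label.
max by (envoy_cluster_name) (
  envoy_cluster_circuit_breakers_default_rq_open{job="spaces-router-envoy"}
) > 0

# The same check for the DEFAULT priority connection circuit breaker.
max by (envoy_cluster_name) (
  envoy_cluster_circuit_breakers_default_cx_open{job="spaces-router-envoy"}
) > 0
```

Either expression returns no series while every breaker is closed, which makes it a reasonable starting point for an alert condition.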

Downstream listener metrics

| Name | Description |
| --- | --- |
| envoy_listener_downstream_rq_xx_total | HTTP status codes for responses sent to clients |
| envoy_listener_downstream_rq_total | Total requests received from clients |
| envoy_listener_downstream_cx_total | Total connections from clients |
| envoy_listener_downstream_cx_active | Currently active client connections (gauge) |
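
For the listener metrics, a minimal sketch might track how many clients are connected and how quickly new connections arrive. Both queries assume the same job="spaces-router-envoy" selector used in the request-rate example earlier on this page.

```promql
# Client connections currently open across all router listeners (gauge, so no rate()).
sum(envoy_listener_downstream_cx_active{job="spaces-router-envoy"})

# New client connections per second, averaged over the last 5 minutes.
sum(rate(envoy_listener_downstream_cx_total{job="spaces-router-envoy"}[5m]))
```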

HTTP connection manager metrics

| Name | Description |
| --- | --- |
| envoy_http_downstream_rq_xx | HTTP status codes (note: no _total suffix for this metric family) |
| envoy_http_downstream_rq_total | Total HTTP requests received |
| envoy_http_downstream_rq_time_bucket | Downstream request latency histogram |
| envoy_http_downstream_rq_time_sum | Sum of downstream request durations |
| envoy_http_downstream_rq_time_count | Count of downstream requests |
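
The downstream latency histogram can be queried in the same way as the upstream one. The following sketch, again assuming the job="spaces-router-envoy" selector, estimates a P99 latency and a mean latency for requests as seen by clients.

```promql
# Approximate P99 end-to-end request latency from the downstream histogram buckets.
histogram_quantile(
  0.99,
  sum by (le) (rate(envoy_http_downstream_rq_time_bucket{job="spaces-router-envoy"}[5m]))
)

# Mean downstream request latency over the last 5 minutes (sum of durations / request count).
sum(rate(envoy_http_downstream_rq_time_sum{job="spaces-router-envoy"}[5m]))
/
sum(rate(envoy_http_downstream_rq_time_count{job="spaces-router-envoy"}[5m]))
```

Comparing these values with the upstream equivalents gives a rough sense of how much latency the router itself adds.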