Configure Space-level observability
This feature was in preview in v1.13.0 and has been GA since v1.14.0. It requires Spaces v1.6.0 and is off by
default. To enable it, set observability.enabled=true
(features.alpha.observability.enabled=true before v1.14.0) when installing
Spaces:
up space init --token-file="${SPACES_TOKEN_PATH}" "v${SPACES_VERSION}" \
  ...
  --set "observability.enabled=true"
The observability feature collects telemetry data from user-facing control plane workloads like:
- Crossplane
- Providers
- Functions
Self-hosted Spaces users can also collect telemetry from control plane system workloads, such as the
api-server and etcd, by setting the
observability.collectors.includeSystemTelemetry Helm flag to true.
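As a minimal sketch, the two settings named so far could be expressed in a Helm values file as follows. This assumes the dotted flag names map directly onto nested keys, which is the usual Helm convention; check the Helm chart reference for your Spaces version before relying on it.

```yaml
# Sketch of a values file fragment; key names are taken from the flags above.
observability:
  enabled: true                    # turns on the observability feature
  collectors:
    includeSystemTelemetry: true   # also collect from api-server, etcd, and similar system workloads
```

Passing the same keys with --set at install time should be equivalent to supplying them in a values file.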
Sensitive data
To avoid exposing sensitive data in the SharedTelemetryConfig resource, use
Kubernetes secrets to store the sensitive data and reference the secret in the
SharedTelemetryConfig resource.
Create the secret in the same namespace/group as the SharedTelemetryConfig
resource. The example below uses kubectl create secret to create a new secret:
kubectl create secret generic sensitive -n <STC_NAMESPACE> \
  --from-literal=apiKey='YOUR_API_KEY'
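If you manage resources declaratively, the same secret can be written as a manifest. The sketch below is meant to be equivalent to the kubectl command above; the name, namespace placeholder, and key are carried over from that command, and the value is a placeholder.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: sensitive
  namespace: <STC_NAMESPACE>   # placeholder: same namespace as the SharedTelemetryConfig
type: Opaque
stringData:
  apiKey: YOUR_API_KEY         # placeholder: stored base64-encoded by the API server
```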
Next, reference the secret in the SharedTelemetryConfig resource:
apiVersion: observability.spaces.upbound.io/v1alpha1
kind: SharedTelemetryConfig
metadata:
  name: newrelic
spec:
  configPatchSecretRefs:
    - name: sensitive
      key: apiKey
      path: exporters.otlphttp.headers.api-key
  controlPlaneSelector:
    labelSelectors:
      - matchLabels:
          org: foo
  exporters:
    otlphttp:
      endpoint: https://otlp.nr-data.net
      headers:
        api-key: dummy # This value is replaced by the secret value, can be omitted
  exportPipeline:
    metrics: [otlphttp]
    traces: [otlphttp]
    logs: [otlphttp]
The configPatchSecretRefs field in the spec specifies the secret name,
key, and path values to inject the secret value in the
SharedTelemetryConfig resource.
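To make the patching concrete, the fragment below illustrates what the exporter configuration would conceptually look like once the secret value has been injected at the path exporters.otlphttp.headers.api-key. This is only an illustration of the intended result, assuming the secret holds the placeholder value YOUR_API_KEY; the resolved value does not appear in the SharedTelemetryConfig resource itself.

```yaml
# Illustration only: effective exporter settings after the patch is applied.
exporters:
  otlphttp:
    endpoint: https://otlp.nr-data.net
    headers:
      api-key: YOUR_API_KEY   # taken from secret "sensitive", key "apiKey"
```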
This guide explains how to configure Space-level observability. The feature applies only to self-hosted Spaces and lets Space administrators observe the cluster infrastructure where the Space software is installed.
When you enable observability in a Space, Upbound deploys a single OpenTelemetry Collector to collect and export metrics, logs, and traces to your configured observability backends.
Prerequisites
This feature requires the OpenTelemetry Operator on the Space cluster. Install this now if you haven't already:
kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/download/v0.116.0/opentelemetry-operator.yaml
If running Spaces v1.11 or later, use OpenTelemetry Operator v0.110.0 or later due to breaking changes in the OpenTelemetry Operator.
Configuration
To configure how Upbound exports telemetry, review the spacesCollector value in your Space installation Helm chart. Below is an example that targets an otlphttp-compatible endpoint.
observability:
  spacesCollector:
    env:
      - name: API_KEY
        valueFrom:
          secretKeyRef:
            name: my-secret
            key: api-key
    config:
      exporters:
        otlphttp:
          endpoint: "<your-endpoint>"
          headers:
            api-key: ${env:API_KEY}
      exportPipeline:
        logs:
          - otlphttp
        metrics:
          - otlphttp
        traces:
          - otlphttp
You can export metrics, logs, and traces from your Crossplane installation, Spaces infrastructure (controller, API, router, etc.), provider-helm, and provider-kubernetes.
Available metrics
Space-level observability collects metrics from multiple infrastructure components:
Infrastructure component metrics
- Crossplane controller metrics
- Spaces controller, API, and router metrics
- Provider metrics (provider-helm, provider-kubernetes)
Router metrics
The router component exposes Envoy proxy metrics for monitoring traffic flow and service health. Key metric categories include:
- envoy_cluster_upstream_rq_* - Upstream request metrics (status codes, timeouts, retries, latency)
- envoy_cluster_circuit_breakers_* - Circuit breaker state and capacity
- envoy_listener_downstream_* - Client connection and request metrics
- envoy_http_downstream_* - HTTP request processing metrics
Example query to monitor total request rate:
sum(rate(envoy_cluster_upstream_rq_total{job="spaces-router-envoy"}[5m]))
Example query for P95 latency:
histogram_quantile(
  0.95,
  sum by (le) (
    rate(envoy_cluster_upstream_rq_time_bucket{job="spaces-router-envoy"}[5m])
  )
)
For detailed router metrics documentation and more query examples, see the Router metrics reference.
OpenTelemetryCollector image
Control plane (SharedTelemetry) and Space observability deploy the same custom
OpenTelemetry Collector image. The OpenTelemetry Collector image supports the
otlphttp, datadog, and debug exporters.
For more information on observability configuration, review the Helm chart reference.
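When validating a new setup, it can help to route telemetry to the debug exporter before pointing at a real backend. The sketch below reuses the spacesCollector layout from the Configuration section. The verbosity setting is an option of the upstream OpenTelemetry debug exporter; that the Spaces chart passes exporter settings through unchanged is an assumption to verify against the Helm chart reference.

```yaml
observability:
  spacesCollector:
    config:
      exporters:
        debug:
          verbosity: detailed   # upstream option; assumed to be passed through as-is
      exportPipeline:
        logs:
          - debug
        metrics:
          - debug
        traces:
          - debug
```

With this in place, the collected telemetry should appear in the collector's own logs rather than being sent to an external endpoint.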
Observability in control planes
Read the observability documentation to learn about the features Upbound offers for collecting telemetry from control planes.
Router metrics reference
To avoid overwhelming observability tools with hundreds of Envoy metrics, an allow-list filters metrics to only the following metric families.
Upstream cluster metrics
Metrics tracking requests sent from Envoy to configured upstream clusters. Individual control planes, spaces-api, and other services are each considered an upstream cluster. Use these metrics to monitor service health, identify upstream errors, and measure backend latency.
| Metric | Description |
|---|---|
| envoy_cluster_upstream_rq_xx_total | HTTP status codes (2xx, 3xx, 4xx, 5xx) with label envoy_response_code_class |
| envoy_cluster_upstream_rq_timeout_total | Requests that timed out waiting for upstream |
| envoy_cluster_upstream_rq_retry_limit_exceeded_total | Requests that exhausted retry attempts |
| envoy_cluster_upstream_rq_total | Total upstream requests |
| envoy_cluster_upstream_rq_time_bucket | Latency histogram (for P50/P95/P99 calculations) |
| envoy_cluster_upstream_rq_time_sum | Sum of request durations |
| envoy_cluster_upstream_rq_time_count | Count of requests |
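The upstream metrics above can be combined into an error-ratio alert. The following is a sketch of a Prometheus-style alerting rule, assuming your backend evaluates Prometheus rule files. The group name, alert name, threshold, and duration are hypothetical, and the label value "5" for the 5xx class is an assumption about how envoy_response_code_class is populated; confirm it against your own data.

```yaml
groups:
  - name: spaces-router-upstream                  # hypothetical group name
    rules:
      - alert: SpacesRouterUpstream5xxRatioHigh   # hypothetical alert name
        # Share of upstream requests answered with a 5xx status over 5 minutes.
        expr: |
          sum(rate(envoy_cluster_upstream_rq_xx_total{job="spaces-router-envoy", envoy_response_code_class="5"}[5m]))
            /
          sum(rate(envoy_cluster_upstream_rq_total{job="spaces-router-envoy"}[5m]))
            > 0.05
        for: 10m                                  # example duration, tune to taste
        labels:
          severity: warning
        annotations:
          summary: "More than 5% of upstream requests from the Spaces router returned 5xx"
```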
Circuit breaker metrics
Metrics tracking circuit breaker state and remaining capacity. Circuit breakers
prevent cascading failures by limiting connections and concurrent requests to
unhealthy upstreams. Two priority levels exist: DEFAULT for watch requests and
HIGH for API requests.
| Name | Description |
|---|---|
| envoy_cluster_circuit_breakers_default_cx_open | DEFAULT priority connection circuit breaker open (gauge) |
| envoy_cluster_circuit_breakers_default_rq_open | DEFAULT priority request circuit breaker open (gauge) |
| envoy_cluster_circuit_breakers_default_remaining_cx | Available DEFAULT priority connections (gauge) |
| envoy_cluster_circuit_breakers_default_remaining_rq | Available DEFAULT priority request slots (gauge) |
| envoy_cluster_circuit_breakers_high_cx_open | HIGH priority connection circuit breaker open (gauge) |
| envoy_cluster_circuit_breakers_high_rq_open | HIGH priority request circuit breaker open (gauge) |
| envoy_cluster_circuit_breakers_high_remaining_cx | Available HIGH priority connections (gauge) |
| envoy_cluster_circuit_breakers_high_remaining_rq | Available HIGH priority request slots (gauge) |
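Because the *_open gauges report whether a breaker has tripped, they lend themselves to simple alerts. The sketch below assumes a Prometheus-compatible rule evaluator and that an open breaker is reported as the value 1; the group and alert names and the durations are hypothetical.

```yaml
groups:
  - name: spaces-router-circuit-breakers               # hypothetical group name
    rules:
      - alert: SpacesRouterHighPriorityBreakerOpen     # hypothetical alert name
        # HIGH priority covers API requests, so an open breaker here is more urgent.
        expr: envoy_cluster_circuit_breakers_high_rq_open{job="spaces-router-envoy"} == 1
        for: 2m
        labels:
          severity: critical
      - alert: SpacesRouterDefaultPriorityBreakerOpen  # hypothetical alert name
        # DEFAULT priority covers watch requests.
        expr: envoy_cluster_circuit_breakers_default_rq_open{job="spaces-router-envoy"} == 1
        for: 5m
        labels:
          severity: warning
```

The same pattern applies to the connection breakers (the *_cx_open gauges). Alerting on the remaining_* gauges falling toward zero would give earlier warning, at the cost of choosing a threshold per upstream.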
Downstream listener metrics
Metrics tracking requests received from clients such as kubectl and API consumers. Use these metrics to monitor client connection patterns, overall request volume, and responses sent to external users.
| Name | Description |
|---|---|
| envoy_listener_downstream_rq_xx_total | HTTP status codes for responses sent to clients |
| envoy_listener_downstream_rq_total | Total requests received from clients |
| envoy_listener_downstream_cx_total | Total connections from clients |
| envoy_listener_downstream_cx_active | Currently active client connections (gauge) |
HTTP connection manager metrics
Metrics from Envoy's HTTP connection manager tracking end-to-end request processing. These metrics provide a comprehensive view of the HTTP request lifecycle including status codes and client-perceived latency.
| Name | Description |
|---|---|
| envoy_http_downstream_rq_xx | HTTP status codes (note: no _total suffix for this metric family) |
| envoy_http_downstream_rq_total | Total HTTP requests received |
| envoy_http_downstream_rq_time_bucket | Downstream request latency histogram |
| envoy_http_downstream_rq_time_sum | Sum of downstream request durations |
| envoy_http_downstream_rq_time_count | Count of downstream requests |
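Client-perceived latency is often worth precomputing so dashboards don't re-evaluate the histogram on every load. The following sketch shows a Prometheus-style recording rule for P99 downstream latency, mirroring the P95 upstream query shown earlier. It assumes a Prometheus-compatible rule evaluator; the group and rule names are hypothetical.

```yaml
groups:
  - name: spaces-router-downstream-latency                 # hypothetical group name
    rules:
      - record: job:envoy_http_downstream_rq_time:p99_5m   # hypothetical rule name
        # 99th percentile of downstream request time over a 5 minute window.
        expr: |
          histogram_quantile(
            0.99,
            sum by (le, job) (
              rate(envoy_http_downstream_rq_time_bucket{job="spaces-router-envoy"}[5m])
            )
          )
```

Envoy typically reports these durations in milliseconds, so label dashboard axes accordingly after confirming the unit against your data.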