Observability
Important

This feature is in preview. It is enabled by default in Cloud Spaces. To enable it in a self-hosted Space, set features.alpha.observability.enabled=true when installing the Space:

up space init --token-file="${SPACES_TOKEN_PATH}" "v${SPACES_VERSION}" \
  ...
  --set "features.alpha.observability.enabled=true" \

Upbound offers a built-in feature to help you collect and export logs, metrics, and traces for everything running in a control plane. The feature is an integrated observability pipeline built on the OpenTelemetry project.

Benefits

The observability feature allows you to:

  • Collect, process, and expose telemetry data in control planes.
  • Deploy a collector per control plane.
  • Pass data to external observability backends, such as Datadog and New Relic.

How it works

The pipeline deploys OpenTelemetry Collectors to collect, process, and expose telemetry data from control planes. Upbound deploys a collector per control plane, defined by a SharedTelemetryConfig set up at the group level. Control plane collectors pass their data to external observability backends defined in the SharedTelemetryConfig.

SharedTelemetryConfig

SharedTelemetryConfig is a custom resource that defines the telemetry configuration for a group of control planes. This resource allows you to specify the exporters and pipelines your control planes use to send telemetry data to your external observability backends.

The following is an example of a SharedTelemetryConfig resource that sends metrics, traces, and logs to New Relic:

apiVersion: observability.spaces.upbound.io/v1alpha1
kind: SharedTelemetryConfig
metadata:
  name: newrelic
  namespace: default
spec:
  controlPlaneSelector:
    labelSelectors:
      - matchLabels:
          org: foo
  exporters:
    otlphttp:
      endpoint: https://otlp.nr-data.net
      headers:
        api-key: YOUR_API_KEY
  exportPipeline:
    metrics: [otlphttp]
    traces: [otlphttp]
    logs: [otlphttp]

The controlPlaneSelector field specifies the control planes that use this configuration. The exporters field holds the configuration for each exporter. Each exporter's configuration follows the configuration of the corresponding OpenTelemetry Collector exporter. The exportPipeline field specifies the control plane pipelines that send telemetry data to the exporters. Its metrics, traces, and logs fields list the exporters that the control planes use to send metrics, traces, and logs respectively. The names listed there refer to exporters defined under exporters, in the same way as the exporters listed in an OpenTelemetry Collector service pipeline configuration.
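
To make the mapping concrete, the fragment below sketches how the New Relic example above might look as a plain OpenTelemetry Collector configuration. It is an illustration only, not the configuration Upbound generates: receivers and any Upbound-managed settings are omitted, so it is not a complete Collector configuration on its own.

```yaml
# Illustrative OpenTelemetry Collector fragment (assumption: simplified view,
# not Upbound's generated configuration; receivers are omitted).
exporters:
  otlphttp:                      # same key as spec.exporters.otlphttp
    endpoint: https://otlp.nr-data.net
    headers:
      api-key: YOUR_API_KEY
service:
  pipelines:
    metrics:
      exporters: [otlphttp]      # from spec.exportPipeline.metrics
    traces:
      exporters: [otlphttp]      # from spec.exportPipeline.traces
    logs:
      exporters: [otlphttp]      # from spec.exportPipeline.logs
```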

Usage

SharedTelemetryConfigs are group-scoped resources. This lets you configure telemetry collection for each group of control planes in a Space.

Important
Your control plane can only use a single SharedTelemetryConfig. If you create multiple SharedTelemetryConfigs that select the same control plane, the one applied first takes precedence. The other SharedTelemetryConfigs fail to provision in the control plane due to conflict.

Currently supported exporters are:

  • datadog (review the OpenTelemetry documentation for configuration details)
  • otlphttp (used by New Relic among others; review the New Relic documentation for configuration details)

The example below shows how to configure a SharedTelemetryConfig resource to send metrics, traces and logs to Datadog:

apiVersion: observability.spaces.upbound.io/v1alpha1
kind: SharedTelemetryConfig
metadata:
  name: datadog
  namespace: default
spec:
  controlPlaneSelector:
    labelSelectors:
      - matchLabels:
          org: foo
  exporters:
    datadog:
      api:
        site: ${DATADOG_SITE}
        key: ${DATADOG_API_KEY}
  exportPipeline:
    metrics: [datadog]
    traces: [datadog]
    logs: [datadog]

Control plane selection

To choose which control planes in a group get a telemetry collector, use the spec.controlPlaneSelector field. You can select control planes with labelSelectors or list their names directly. A control plane matches if any of the label selectors match.

This example matches all control planes in the group that have environment: production as a label:

apiVersion: observability.spaces.upbound.io/v1alpha1
kind: SharedTelemetryConfig
metadata:
  name: telemetry-collector
spec:
  controlPlaneSelector:
    labelSelectors:
      - matchLabels:
          environment: production

You can use the more expressive matchExpressions field to match labels based on an expression. This example matches control planes that have the label environment: production or environment: staging:

apiVersion: observability.spaces.upbound.io/v1alpha1
kind: SharedTelemetryConfig
metadata:
  name: telemetry-collector
spec:
  controlPlaneSelector:
    labelSelectors:
      - matchExpressions:
        - { key: environment, operator: In, values: [production,staging] }
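
Because a control plane matches if any of the label selectors match, listing several selectors acts as a logical OR. The sketch below assumes each labelSelectors entry follows standard Kubernetes label selector semantics, where all labels inside a single matchLabels must match. The team: platform label is a hypothetical example.

```yaml
apiVersion: observability.spaces.upbound.io/v1alpha1
kind: SharedTelemetryConfig
metadata:
  name: telemetry-collector
spec:
  controlPlaneSelector:
    labelSelectors:
      # Selector 1: both labels must be present (AND within one selector).
      - matchLabels:
          environment: production
          org: foo
      # Selector 2: any control plane labeled team: platform (hypothetical label).
      - matchLabels:
          team: platform
```

Under those assumptions, a control plane is selected if it satisfies selector 1 or selector 2.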

You can also specify the names of control planes directly:

apiVersion: observability.spaces.upbound.io/v1alpha1
kind: SharedTelemetryConfig
metadata:
  name: telemetry-collector
spec:
  controlPlaneSelector:
    names:
    - controlplane-dev
    - controlplane-staging
    - controlplane-prod

Sensitive data

Important
This feature is available from Spaces v1.10.

To avoid exposing sensitive data in the SharedTelemetryConfig resource, use Kubernetes secrets to store the sensitive data and reference the secret in the SharedTelemetryConfig resource.

Create the secret in the same namespace/group as the SharedTelemetryConfig resource. The example below uses kubectl create secret to create a new secret:

kubectl create secret generic sensitive -n <STC_NAMESPACE>  \
    --from-literal=apiKey='YOUR_API_KEY'
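
If you manage resources declaratively, a manifest along these lines should be roughly equivalent to the command above. It is a minimal sketch: <STC_NAMESPACE> and YOUR_API_KEY are placeholders, and you may prefer not to commit a manifest containing a real key to version control.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: sensitive
  namespace: <STC_NAMESPACE>   # placeholder: namespace/group of the SharedTelemetryConfig
type: Opaque
stringData:
  apiKey: YOUR_API_KEY         # placeholder: stored base64-encoded under data.apiKey
```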

Next, reference the secret in the SharedTelemetryConfig resource:

apiVersion: observability.spaces.upbound.io/v1alpha1
kind: SharedTelemetryConfig
metadata:
  name: newrelic
spec:
  configPatchSecretRefs:
    - name: sensitive
      key: apiKey
      path: exporters.otlphttp.headers.api-key
  controlPlaneSelector:
    labelSelectors:
      - matchLabels:
          org: foo
  exporters:
    otlphttp:
      endpoint: https://otlp.nr-data.net
      headers:
        api-key: dummy # This value is replaced by the secret value, can be omitted
  exportPipeline:
    metrics: [otlphttp]
    traces: [otlphttp]
    logs: [otlphttp]

Each entry in the spec.configPatchSecretRefs field specifies the secret name, the key within that secret, and the path in the SharedTelemetryConfig configuration where Upbound injects the secret value.
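
The same mechanism should apply to other exporters. The sketch below adapts it to the Datadog exporter shown earlier. It assumes the path follows the same dotted convention as the New Relic example. The secret name datadog-credentials and its key apiKey are hypothetical.

```yaml
apiVersion: observability.spaces.upbound.io/v1alpha1
kind: SharedTelemetryConfig
metadata:
  name: datadog
spec:
  configPatchSecretRefs:
    - name: datadog-credentials   # hypothetical secret name
      key: apiKey                 # hypothetical key in that secret
      path: exporters.datadog.api.key
  controlPlaneSelector:
    labelSelectors:
      - matchLabels:
          org: foo
  exporters:
    datadog:
      api:
        site: ${DATADOG_SITE}
        key: placeholder # expected to be replaced by the secret value
  exportPipeline:
    metrics: [datadog]
    traces: [datadog]
    logs: [datadog]
```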

Telemetry processing

Important
This feature is available from Spaces v1.11.

The SharedTelemetryConfig resource allows you to configure a processing pipeline for the telemetry data collected by the OpenTelemetry Collector. Like spec.exporters, the spec.processors field allows you to configure the processors that transform the telemetry data before it reaches the exporters. It follows the OpenTelemetry Collector processor configuration.

For now, the only supported processor is the transform processor.

Telemetry transforms

The transform processor allows for the transformation of telemetry data using the OpenTelemetry Transformation Language.

The transform processor can transform metrics, logs, and traces at different scopes and allows you to use conditionals to select specific data.

Examples of useful transformations include:

  • Adding, removing, and modifying attributes, such as renaming a label or concatenating multiple labels
  • Converting metric types, such as gauge to sum

For more information, review the transform processor README.
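
As a sketch of a metric type conversion, the fragment below uses the transform processor's convert_gauge_to_sum function in the metric context. The metric name example_gauge_metric is hypothetical, and whether this function is available depends on the Collector version your Space ships, so treat it as an assumption to verify.

```yaml
# Fragment of a SharedTelemetryConfig spec (sketch, not a complete resource).
processors:
  transform:
    error_mode: ignore
    metric_statements:
      - context: metric
        statements:
          # Convert one gauge into a non-monotonic cumulative sum.
          # "example_gauge_metric" is a hypothetical metric name.
          - convert_gauge_to_sum("cumulative", false) where name == "example_gauge_metric"
processorPipeline:
  metrics: [transform]
```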

Important considerations:

  • The context you choose determines the transformation scope.
  • The conditions field uses "any match" semantics. If your data meets any one of the listed conditions, the transformation applies to that data.
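
The any-match behavior of conditions can be sketched as follows. The fragment assumes the Collector version in use supports the conditions key of the transform processor. The second metric name is included only for illustration.

```yaml
# Fragment of a SharedTelemetryConfig spec (sketch, not a complete resource).
processors:
  transform:
    error_mode: ignore
    metric_statements:
      - context: datapoint
        # The statements below run if ANY of these conditions is true.
        conditions:
          - metric.name == "crossplane_managed_resource_ready"
          - metric.name == "crossplane_managed_resource_exists"
        statements:
          - set(attributes["newLabel"], "someLabel")
processorPipeline:
  metrics: [transform]
```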

Some useful examples:

Adding an attribute/label to metrics

apiVersion: observability.spaces.upbound.io/v1alpha1
kind: SharedTelemetryConfig
metadata:
  name: datadog
  namespace: default
spec:
  controlPlaneSelector:
    labelSelectors:
      - matchLabels:
          org: foo
  exporters:
    datadog:
      api:
        site: ${DATADOG_SITE}
        key: ${DATADOG_API_KEY}
  exportPipeline:
    metrics: [datadog]
    traces: [datadog]
    logs: [datadog]
  processors:
    transform:
      error_mode: ignore
      metric_statements:
        - context: datapoint
          statements:
            - set(attributes["newLabel"], "someLabel")
  processorPipeline:
    metrics: [transform]

You can also add the label only to a specific metric:

...
processors:
  transform:
    metric_statements:
      - context: datapoint
        statements:
          - set(attributes["newLabel"], "someLabel") where metric.name == "crossplane_managed_resource_ready"
...

Removing labels

From metrics:

...
processors:
  transform:
    metric_statements:
      - context: datapoint
        statements:
          - delete_key(attributes, "kubernetes_namespace")
...

From logs:

processors:
  transform:
    log_statements:
      - context: log
        statements:
          - delete_key(attributes, "log.file.name")

Modifying logs

...
processors:
  transform:
    log_statements:
      - context: log
        statements:
          - set(attributes["original"], body) # save the original log message
          - set(body, Concat(["log message:", body], " ")) # add a prefix to the log message
...
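
Renaming and concatenating labels

The fragment below is a minimal sketch built from the same set, delete_key, and Concat functions used in the earlier examples. The target attribute names and the org and team labels are hypothetical. The where clauses skip data points that lack the source attributes.

```yaml
# Fragment of a SharedTelemetryConfig spec (sketch, not a complete resource).
processors:
  transform:
    error_mode: ignore
    metric_statements:
      - context: datapoint
        statements:
          # Rename: copy the value to a new key, then delete the old key.
          - set(attributes["namespace"], attributes["kubernetes_namespace"]) where attributes["kubernetes_namespace"] != nil
          - delete_key(attributes, "kubernetes_namespace")
          # Concatenate two labels into one ("org" and "team" are hypothetical).
          - set(attributes["org_team"], Concat([attributes["org"], attributes["team"]], "/")) where attributes["org"] != nil and attributes["team"] != nil
processorPipeline:
  metrics: [transform]
```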

References

For more information, review the transform processor documentation.

Status

If successful, Upbound creates the SharedTelemetryConfig resource and provisions the OpenTelemetry Collector for the selected control plane. To see the status, run kubectl get stc:

kubectl get stc
NAME       SELECTED   FAILED   PROVISIONED   AGE
datadog    1          0        1             63s

  • SELECTED shows the number of control planes selected by the SharedTelemetryConfig.
  • FAILED shows the number of control planes that failed to provision the OpenTelemetry Collector.
  • PROVISIONED shows the number of control planes with a provisioned and running OpenTelemetry Collector.

To return the names of control planes selected and provisioned, review the resource status:

...
status:
  selected:
    - ctp
  provisioned:
    - ctp

If a conflict or another issue occurs, the resource status lists each failed control plane along with its failure conditions:

kubectl get stc
NAME       SELECTED   FAILED   PROVISIONED   AGE
datadog    1          1        0             63s
...
status:
  failed:
  - conditions:
    - lastTransitionTime: "2024-04-26T09:32:28Z"
      message: 'control plane dev is already managed by another SharedTelemetryConfig:
        newrelic'
      reason: SelectorConflict
      status: "True"
      type: Failed
    controlPlane: ctp
  selectedControlPlanes:
  - ctp

Upbound marks the control plane as provisioned only if the OpenTelemetry Collector is deployed and running. There could be a delay in the status update if the OpenTelemetry Collector is currently deploying:

kubectl get stc
NAME       SELECTED   FAILED   PROVISIONED   AGE
datadog    1          0        0             63s