Analyze pod crashes

important

This guide requires an Upbound control plane instance running UXP v2.0 or later. Upbound SaaS coming soon.

Upbound Crossplane is capable of running Intelligent Controllers, which define AI-augmented functions to perform tasks. This guide walks through a use case for using AI to analyze and remediate issues for app deployments, such as out-of-memory and pods stuck in a crashloopbackoff.

Prerequisites

Before you begin make sure you have:

An Upbound Account
The up CLI installed
An Anthropic API key
An AWS account

Set up your environment

Clone the repository upbound/configuration-deployment-analysis to your machine:

git clone git@github.com:upbound/configuration-deployment-analysis.git

This repository contains a control plane project that defines watch operations. These watch operations define workflows for:

watching for events emitted by pods in a cluster
analyzing them using LLMs and suggesting remediations
the remediations are gated by a human-in-the-loop approval
if the suggested remediation is approved, it gets applied to address the issues

Configure credentials and runtime settings

In the project directory, edit the operations/init-operation/operation.yaml and update the function's input.spec.anthropicApiKey to provide an API key from Anthropic. This operation runs whenever you launch the control plane and configures all the required runtime settings.

Launch the local UXP cluster

In the root of the project directory, launch the control plane locally:

up project run --local

After the control plane gets created, apply a ClusterRole to give Crossplane admin access, so it can create all the Kubernetes resources it needs on the cluster. This object is already defined in the examples folder of the project directory:

kubectl apply -f examples/admin.yaml

Apply example deployments and watch for issues

Apply the examples to demonstrate flows for catching and remediating out-of-memory and crashloopbackoff issues:

kubectl apply -f examples/oomkilled.yaml -f crashloopbackoff.yaml

These workloads intentionally cause the deployed pods to exhibit respective errors.

Observe analyses and remediations

The operation function pipeline creates Analysis and Remediation resources based on observed behaviors. Fetch an analysis object to observe suggestions for remediation made by the LLM:

kubectl get Analysis -n crossplane-system
kubectl describe Analysis -n crossplane-system <your-analysis-object>

Clean up

Clean up the local control plane to prevent it from continuing to invoke your LLM. Run the following command:

up project stop

Next steps

Read the concept documentation for Intelligent Controllers to learn more about using AI-powered functions in your function pipelines.

Prerequisites​

Set up your environment​

Configure credentials and runtime settings​

Launch the local UXP cluster​

Apply example deployments and watch for issues​

Observe analyses and remediations​

Clean up​

Next steps​