Deployment

You need a Kubernetes cluster as the hosting environment to run Spaces. You can install Spaces into any Kubernetes cluster, version v1.25 or later. Upbound validates that the Spaces software runs on AWS EKS, Google Cloud GKE, and Microsoft AKS. Upbound recommends dedicating the Kubernetes cluster to running Spaces as its sole workload.

Deployment requirements

Spaces requires three things:

  1. A Kubernetes cluster.
  2. The required prerequisites configured on that cluster.
  3. An Upbound account. Spaces is a feature available only to paying customers in the Business Critical tier of Upbound.

This guide helps you think through all steps needed to deploy Spaces for production workloads.

Sizing a Space

In a Space, the managed control planes you create get scheduled as pods across the cluster’s node pools. The hyperscale cloud providers each offer managed Kubernetes services that can support hundreds of nodes in their node pools. That means the number of control planes you can run in a single Space is on the order of hundreds, if not more.

Rightsizing a Space for a production deployment depends on several factors:

  1. The number of control planes you plan to run in the Space.
  2. The number of managed resources you plan each control plane to reconcile.
  3. The Crossplane providers you plan to install in each control plane.

Memory considerations

Control plane empty state memory usage

An idle, empty managed control plane consumes about 640 MB of memory. This covers the set of pods that constitute a managed control plane, which get deployed for each control plane instance.

Managed resource memory usage

In Upbound’s testing, memory usage isn’t strongly influenced by the number of managed resources under management. Memory usage rises by only about 100 MB when going from 100 to 1000 resource instances under management by a control plane. Hence, for simplicity, you don’t need to account for an increase in memory usage along this axis of the control plane.

Provider memory usage

When you install a Crossplane provider on a control plane, memory gets consumed according to the number of custom resources it defines. Upbound Official Provider families provide higher fidelity control to platform teams to install providers for only the resources they need, reducing the bloat of needlessly installing unused custom resources. Still, you must factor provider memory usage into your calculations to ensure you’ve rightsized the memory available in your Spaces cluster.

Important
Be careful not to conflate a managed resource with a custom resource definition. The former is an “instance” of an external resource in Crossplane, while the latter defines the API schema of that resource.

It’s estimated that each custom resource definition consumes ~3 MB of memory. The calculation is:

number_of_crds_defined_in_provider x 3 MB = memory_required

For example, if you plan to use provider-aws-ec2, provider-aws-s3, and provider-aws-iam, the resulting calculation is:

provider-aws-ec2: 98 x 3 MB = 294 MB
provider-aws-s3: 23 x 3 MB = 69 MB
provider-aws-iam: 22 x 3 MB = 66 MB
---
total memory: 429 MB

In this scenario, you should budget ~430 MB of memory for provider usage on this control plane.

Tip
Do this calculation for each provider you plan to install on your control plane. Then do this calculation for each control plane you plan to run in your Space.

Total memory usage

Add together the memory usage from the previous sections. Given the preceding examples, the result is a recommendation to budget ~1 GB of memory for each control plane you plan to run in the Space.
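
Putting the preceding example numbers together for a single control plane:

control plane base (idle, empty):   640 MB
example providers (ec2, s3, iam):   429 MB
---
total memory:                      1069 MB (~1 GB)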

Important
The 1 GB number is derived from the figures used in the preceding examples. You should input your own provider requirements to arrive at a final number for your own deployment.

CPU considerations

Managed resource CPU usage

The number of managed resources under management by a control plane is the largest contributing factor to CPU usage in a Space. CPU usage scales linearly with the number of managed resources under management by your control plane. In Upbound’s testing, CPU usage does vary from provider to provider. Using the Upbound Official Provider families as a baseline:

Provider                MR create operation (CPU core seconds)    MR update or reconciliation operation (CPU core seconds)
provider-family-aws     10                                         2 to 3
provider-family-gcp     7                                          1.5
provider-family-azure   7 to 10                                    1.5 to 3

When resources are in a non-ready state, Crossplane providers reconcile often (as fast as every 15 seconds). Once a resource reaches a READY state, each Crossplane provider defaults to a 10 minute poll interval. At that default, a 16-core machine has 16 x 10 x 60 = 9600 CPU core seconds available per poll interval. Interpreting this table:

  • A single control plane that needs to create 100 AWS MRs concurrently would consume 1000 CPU core seconds, or about 1.5 cores.
  • A single control plane that continuously reconciles 100 AWS MRs once they’ve reached a READY state would consume 300 CPU core seconds, or a little under half a core.

Since provider-family-aws has the highest recorded numbers for CPU time required, you can use that as an upper limit in your calculations.

Extrapolating from these calculations, on a 16-core machine it’s recommended that a single control plane not exceed 1000 MRs under management. Suppose you plan to run 10 control planes, each managing 1000 MRs. You want to make sure your node pool has capacity for 160 cores. With a machine type that has 16 cores per machine, that means a node pool of size 10; with a machine type that has 32 cores per machine, a node pool of size 5.
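
Putting those numbers together for the example above, using provider-family-aws as the upper limit and the default 10 minute poll interval:

create burst, one control plane:   1000 MRs x 10 CPU core seconds = 10000 CPU core seconds (roughly the 9600 a 16-core machine provides)
steady state, one control plane:   1000 MRs x 3 CPU core seconds  = 3000 CPU core seconds (~5 cores)
node pool capacity:                10 control planes x 16 cores   = 160 cores
                                   160 cores / 16 cores per machine = 10 machines
                                   160 cores / 32 cores per machine = 5 machines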

Cloud API latency

Oftentimes, you use Crossplane providers to talk to external cloud APIs, and those APIs often enforce global rate limits (examples: Azure limits, AWS EC2 limits).

For Crossplane providers built on Upjet (such as the Upbound Official Provider families), the providers use Terraform under the covers. They expose knobs (such as --max-reconcile-rate) you can use to tweak reconciliation rates, described in more detail in the Upjet sizing guide.
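
As a sketch, one way to set this flag on an Upjet-based provider is with a Crossplane DeploymentRuntimeConfig, assuming a Crossplane version that supports it; the provider package version and the rate of 20 below are illustrative assumptions:

apiVersion: pkg.crossplane.io/v1beta1
kind: DeploymentRuntimeConfig
metadata:
  name: tuned-reconcile-rate
spec:
  deploymentTemplate:
    spec:
      selector: {}
      template:
        spec:
          containers:
            - name: package-runtime
              args:
                # Illustrative value; tune to stay within your cloud API rate limits
                - --max-reconcile-rate=20
---
apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: provider-aws-ec2
spec:
  # Illustrative package version
  package: xpkg.upbound.io/upbound/provider-aws-ec2:v1.1.0
  runtimeConfigRef:
    name: tuned-reconcile-rate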

Resource buffers

The guidance in the preceding sections explains how to calculate CPU and memory usage requirements for:

  • a set of control planes in a Space
  • tuned to the number of providers you plan to use
  • according to the number of managed resource instances you plan to have managed by your control planes

Upbound recommends adding an extra 20% buffer to your resource capacity calculations. The numbers shared in the preceding sections are based on average measurements and don’t account for peaks or surges; the buffer covers those swings.
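
Applied to the example numbers above, the 20% buffer works out to roughly:

CPU:    160 cores x 1.2 = 192 cores
memory: 10 control planes x ~1 GB x 1.2 = ~12 GB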

Deploying more than one Space

You are welcome to deploy more than one Space; you just need to maintain a 1:1 mapping of Spaces to Kubernetes clusters. A Space is by its nature constrained to a single Kubernetes cluster, which is a regional entity. If you want to offer managed control planes in multiple cloud environments, or in multiple public clouds entirely, those are justifications for deploying more than one Space.

Cert-manager

A Spaces deployment uses the Certificate Custom Resource from cert-manager to provision certificates within the Space. This establishes a nice API boundary between what your platform may need and the Certificate requirements of a Space.

In the event you would like more control over the issuing Certificate Authority for your deployment or the deployment of cert-manager itself, this guide is for you.

Deploying

An Upbound Space deployment doesn’t have any special requirements for the cert-manager deployment itself. The only expectation is that cert-manager and the corresponding Custom Resources exist in the cluster.

You should be free to install cert-manager in the cluster in any way that makes sense for your organization. You can find some installation ideas in the cert-manager docs.
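
For example, a minimal Helm-based install might look like the following sketch; the release name and namespace are common cert-manager conventions, not Spaces requirements:

helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --set installCRDs=true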

Issuers

A default Upbound Space install includes a ClusterIssuer. This ClusterIssuer is a selfSigned issuer that other certificates are minted from. You have a couple of options available to you for changing the default deployment of the Issuer:

  1. Changing the issuer name.
  2. Providing your own ClusterIssuer.
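
For reference, the default selfSigned ClusterIssuer described above is roughly equivalent to the following (a sketch; the rendered manifest in your cluster may differ):

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: spaces-selfsigned
spec:
  selfSigned: {}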

Changing the issuer name

The ClusterIssuer name is controlled by the certificates.space.clusterIssuer Helm property. You can adjust this during installation by providing the following parameter (assuming your new name is ‘SpaceClusterIssuer’):

--set "certificates.space.clusterIssuer=SpaceClusterIssuer"

Providing your own ClusterIssuer

To provide your own ClusterIssuer, you first need to set up a ClusterIssuer in the cluster. The cert-manager docs describe a variety of options; see the Issuer Configuration docs for more details.
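
For example, a minimal sketch of a CA-backed ClusterIssuer, assuming you’ve already created a Secret named my-ca-keypair (a hypothetical name) containing your CA certificate and key in cert-manager’s cluster resource namespace:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: spaces-ca-issuer
spec:
  ca:
    # Secret holding tls.crt and tls.key for your CA
    secretName: my-ca-keypair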

Once you have your own ClusterIssuer set up in the cluster, you need to turn off the deployment of the ClusterIssuer included in the Spaces deployment. To do that, provide the following parameter during installation:

--set "certificates.provision=false"

Considerations:
If your ClusterIssuer has a name that’s different from the default name that the Spaces installation expects (‘spaces-selfsigned’), you need to also specify your ClusterIssuer name during install using:

--set "certificates.space.clusterIssuer=<your ClusterIssuer name>"
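
For example, pairing both flags in a hypothetical install command, where spaces-ca-issuer is the custom ClusterIssuer from the earlier sketch and <spaces-chart-reference> stands in for the chart used by your existing Spaces install:

helm upgrade --install spaces <spaces-chart-reference> \
  --namespace upbound-system \
  --set "certificates.provision=false" \
  --set "certificates.space.clusterIssuer=spaces-ca-issuer"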

Ingress

To route requests from an external client (kubectl, ArgoCD, etc.) to a control plane, a Spaces deployment includes a default Ingress manifest. To ease getting-started scenarios, the default Ingress includes configuration (properties and annotations) that assumes you installed the commonly used ingress-nginx ingress controller in the cluster. This section walks you through using a different Ingress, if that’s something your organization needs.

Default manifest

An example of the Ingress manifest included in a Spaces install is below:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: mxe-router-ingress
  namespace: upbound-system
  annotations:
    nginx.ingress.kubernetes.io/use-regex: "true"
    nginx.ingress.kubernetes.io/ssl-redirect: "false"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-request-buffering: "off"
    nginx.ingress.kubernetes.io/proxy-body-size: "0"
    nginx.ingress.kubernetes.io/proxy-http-version: "1.1"
    nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
    nginx.ingress.kubernetes.io/proxy-ssl-verify: "on"
    nginx.ingress.kubernetes.io/proxy-ssl-secret: "upbound-system/mxp-hostcluster-certs"
    nginx.ingress.kubernetes.io/proxy-ssl-name: spaces-router
    nginx.ingress.kubernetes.io/configuration-snippet: |
      more_set_headers "X-Request-Id: $req_id";
      more_set_headers "Request-Id: $req_id";
      more_set_headers "Audit-Id: $req_id";      
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - {{ .Values.ingress.host }}
      secretName: mxe-router-tls
  rules:
    - host: {{ .Values.ingress.host }}
      http:
        paths:
          - path: "/v1/controlPlanes"
            pathType: Prefix
            backend:
              service:
                name: spaces-router
                port:
                  name: http

The notable pieces are:

  1. Namespace

This property represents the namespace that the spaces-router is deployed to. In most cases this is upbound-system.

  2. proxy-ssl-* annotations

The spaces-router pod terminates TLS using certificates located in the mxp-hostcluster-certs Secret located in the upbound-system Namespace.

  3. proxy-* annotations

Requests coming into the ingress controller can vary depending on what the client is requesting. For example, kubectl get crds has different connection requirements than a ‘watch’ such as kubectl get pods -w. These annotations configure the ingress controller to account for either scenario.

  4. configuration-snippets

These commands add headers to the incoming requests that help with telemetry and diagnosing problems within the system.

  5. Rules

Requests coming into the control planes use a /v1/controlPlanes prefix and need to be routed to the spaces-router.

Using a different ingress manifest

Operators can choose an Ingress manifest and ingress controller that make the most sense for their organization. To turn off deploying the default Ingress manifest, provide the following parameter during installation:

--set "ingress.provision=false"

Considerations:

Operators need to take the following considerations into account when disabling the default Ingress deployment. A sketch that satisfies them follows the list.

  1. Ensure the custom Ingress manifest is placed in the same namespace as the spaces-router pod.
  2. Ensure that the ingress is configured to use the spaces-router as a secure backend and that the secret used for backend verification is the mxp-hostcluster-certs secret.
  3. Ensure that the ingress is configured to handle long-lived connections.
  4. Ensure that the routing rule sends requests prefixed with /v1/controlPlanes to the spaces-router using the http port.
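
As a sketch, a custom Ingress that satisfies these considerations could start from the following skeleton; the ingress class is a placeholder, and the controller-specific annotations for backend TLS verification and long-lived connections depend on your chosen controller:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: spaces-router-custom-ingress
  namespace: upbound-system   # same namespace as the spaces-router pod
  # Add controller-specific annotations here that (1) treat spaces-router as an
  # HTTPS backend verified against the upbound-system/mxp-hostcluster-certs
  # secret and (2) allow long-lived watch connections.
spec:
  ingressClassName: <your-ingress-class>
  tls:
    - hosts:
        - <your-spaces-host>
      secretName: mxe-router-tls
  rules:
    - host: <your-spaces-host>
      http:
        paths:
          - path: "/v1/controlPlanes"
            pathType: Prefix
            backend:
              service:
                name: spaces-router
                port:
                  name: http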