Scaling vCluster and etcd Resources

When running large workloads or migrating control planes, you may encounter resource constraints that impact performance. This guide explains how to scale vCluster and etcd resources for optimal performance in your self-hosted Space.

Signs of resource constraints

You may need to scale your vCluster or etcd resources if you observe:

  • API server timeout errors such as http: Handler timeout
  • Error messages reporting too many requests and asking you to try again later
  • Operations such as provider installation failing with errors like cannot apply provider package secret
  • vCluster pods experiencing continuous restarts
  • Degraded API performance as the number of resources grows
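
A quick way to check for these symptoms is to look at pod restart counts and recent API server logs. The commands below are only a sketch; the namespace and pod names vary by installation, so substitute the ones used in your Space:

# Look for vCluster or etcd pods with high restart counts
kubectl get pods -A | grep -Ei 'vcluster|etcd'

# Search recent API server logs for timeout or throttling messages
kubectl logs -n <control-plane-namespace> <vcluster-api-pod> --since=1h | grep -iE 'timeout|too many requests'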

Scaling vCluster API server resources

The vCluster API server (vcluster-api) handles Kubernetes API requests for your control planes. Deployments with multiple control planes or providers may exceed default resource allocations.

# Default settings
controlPlanes.api.resources.limits.cpu: "2000m"
controlPlanes.api.resources.requests.cpu: "100m"
controlPlanes.api.resources.requests.memory: "1000Mi"

For larger workloads, such as migrating from an existing control plane with several providers, increase these resource limits in your Spaces values.yaml file:

controlPlanes:
  api:
    resources:
      limits:
        cpu: "4000m"      # Increase to 4 cores
        memory: "6Gi"     # Increase to 6GB memory
      requests:
        cpu: "500m"       # Increase baseline CPU request
        memory: "2Gi"     # Increase baseline memory request

Scaling etcd storage

The Kubernetes API depends heavily on etcd performance, and an undersized volume can become an IOPS (input/output operations per second) bottleneck. For comparison, Upbound allocates 50Gi etcd volumes in its cloud environments to ensure adequate IOPS.

# Default setting
controlPlanes.etcd.persistence.size: "5Gi"

For production environments or when migrating large control planes, increase etcd volume size and specify an appropriate storage class:

controlPlanes:
  etcd:
    persistence:
      size: "50Gi"                 # Recommended for production
      storageClassName: "fast-ssd" # Use a high-performance storage class
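
Growing an existing etcd volume requires a storage class that supports volume expansion. As a quick check (substitute your storage class name), the following prints true when expansion is allowed:

kubectl get storageclass <storage-class-name> -o jsonpath='{.allowVolumeExpansion}'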

Storage class considerations

For AWS:

  • Use GP3 volumes with adequate IOPS
  • GP3 volumes provide a 3,000 IOPS baseline regardless of size; provisioned IOPS can be increased up to 500 IOPS per GiB
  • For optimal performance, provision at least 32Gi so the volume can reach the 16,000 IOPS maximum (see the example StorageClass below)
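
As a reference, here is a minimal StorageClass sketch for the AWS EBS CSI driver. The name fast-ssd matches the earlier example, and the IOPS and throughput values are illustrative rather than prescriptive:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "16000"      # up to 500 IOPS per GiB, so a 32Gi volume can reach 16,000 IOPS
  throughput: "500"  # MiB/s
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer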

For GCP and Azure:

  • Use SSD-based persistent disk types for optimal performance
  • Consider premium storage options for high-throughput workloads (sample manifests follow this list)
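
Equivalent sketches for GKE and AKS, assuming the standard CSI drivers and again using fast-ssd as an illustrative name:

# GKE: SSD persistent disks via the Compute Engine persistent disk CSI driver
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-ssd
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
---
# AKS: Premium SSD via the Azure Disk CSI driver
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: disk.csi.azure.com
parameters:
  skuName: Premium_LRS
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer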

Scaling Crossplane resources

Crossplane manages providers and their managed resources in your control planes. Control planes with many providers or managed resources may need more than the default Crossplane allocation:

# Default settings
controlPlanes.uxp.resourcesCrossplane.requests.cpu: "370m"
controlPlanes.uxp.resourcesCrossplane.requests.memory: "400Mi"

For environments with many providers or managed resources:

controlPlanes:
  uxp:
    resourcesCrossplane:
      limits:
        cpu: "1000m"      # Add CPU limit
        memory: "1Gi"     # Add memory limit
      requests:
        cpu: "500m"       # Increase CPU request
        memory: "512Mi"   # Increase memory request

High availability configuration

For production environments, enable High Availability mode to ensure resilience:

controlPlanes:
  ha:
    enabled: true

Best practices for migration scenarios

When migrating from existing control planes into a self-hosted Space:

  1. Pre-scale resources: Scale up resources before performing the migration
  2. Monitor resource usage: Watch resource consumption during and after migration with kubectl top pods (see the sketch after this list)
  3. Scale incrementally: If issues persist, increase resources incrementally until performance stabilizes
  4. Consider storage performance: etcd is sensitive to storage I/O performance
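
For step 2, a simple monitoring loop, assuming the metrics server is available in the host cluster, looks like this:

# Sort control plane components by CPU usage; repeat periodically during the migration
kubectl top pods -A --sort-by=cpu | grep -Ei 'vcluster|etcd|crossplane'

# Or refresh automatically every 30 seconds
watch -n 30 "kubectl top pods -A | grep -Ei 'vcluster|etcd|crossplane'"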

Helm values configuration

Apply these settings through your Spaces Helm values file:

controlPlanes:
  api:
    resources:
      limits:
        cpu: "4000m"
        memory: "6Gi"
      requests:
        cpu: "500m"
        memory: "2Gi"
  etcd:
    persistence:
      size: "50Gi"
      storageClassName: "gp3" # Use your cloud provider's fast storage class
  uxp:
    resourcesCrossplane:
      limits:
        cpu: "1000m" 
        memory: "1Gi"
      requests:
        cpu: "500m"
        memory: "512Mi"
  ha:
    enabled: true  # For production environments

Apply the configuration using Helm:

helm upgrade --install spaces oci://xpkg.upbound.io/spaces-artifacts/spaces \
  -f values.yaml \
  -n upbound-system
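
After the upgrade completes, confirm the release and pod health:

helm list -n upbound-system
kubectl get pods -n upbound-system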

Considerations

  • Provider count: Each provider adds resource overhead; consider using provider families to optimize resource usage
  • Managed resources: The number of managed resources impacts CPU usage more than memory
  • Vertical pod autoscaling: Consider using vertical pod autoscaling in Kubernetes to automatically adjust resources based on usage (a sketch follows this list)
  • Storage performance: Storage performance is as important as capacity for etcd
  • Network latency: Low-latency connections between components improve performance
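
For the vertical pod autoscaling item, the sketch below assumes the VPA controller is installed in the host cluster and that the vCluster API server runs as a Deployment named vcluster-api in the control plane's namespace; both names are hypothetical and vary by installation. updateMode: "Off" produces recommendations without evicting pods, a conservative choice when resources are also managed through Helm values:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: vcluster-api-vpa
  namespace: <control-plane-namespace>   # placeholder namespace
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment          # assumption; adjust if the API server runs as a StatefulSet
    name: vcluster-api        # hypothetical workload name
  updatePolicy:
    updateMode: "Off"         # recommendation-only mode
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed:
          cpu: "500m"
          memory: "1Gi"
        maxAllowed:
          cpu: "4"
          memory: "8Gi"

View the recommendations with kubectl describe vpa vcluster-api-vpa -n <control-plane-namespace>.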