With large workloads or during control plane migration, you may encounter performance-impacting resource constraints. This guide explains how to scale vCluster and etcd resources for optimal performance in your self-hosted Space.
Signs of resource constraints
You may need to scale your vCluster or `etcd` resources if you observe:
- API server timeout errors such as `http: Handler timeout`
- Error messages about `too many requests` and requests to `try again later`
- Operations like provider installation failing with errors such as `cannot apply provider package secret`
- vCluster pods experiencing continuous restarts
- Degraded API performance as the number of resources grows (example checks follow this list)
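As a quick way to confirm these symptoms, assuming you have `kubectl` access to the Spaces host cluster, you can check restart counts and scan recent API server logs. The namespace and pod names below are placeholders that vary by installation:

```shell
# List pods and restart counts across all namespaces; look for control plane
# pods that restart repeatedly (pod and namespace names vary by installation)
kubectl get pods -A

# Scan recent logs from a control plane API server pod for timeout and
# throttling errors (replace both placeholders with your values)
kubectl logs -n <control-plane-namespace> <vcluster-api-pod> --since=1h | \
  grep -Ei "handler timeout|too many requests|try again later"
```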
Scaling vCluster API server resources
The vCluster API server (`vcluster-api`) handles Kubernetes API requests for your control planes. Deployments with multiple control planes or providers may exceed the default resource allocations.
# Default settings
controlPlanes.api.resources.limits.cpu: "2000m"
controlPlanes.api.resources.requests.cpu: "100m"
controlPlanes.api.resources.requests.memory: "1000Mi"
For larger workloads, such as migrating from an existing control plane with several providers, increase these resource limits in your Spaces `values.yaml` file:
controlPlanes:
  api:
    resources:
      limits:
        cpu: "4000m"    # Increase to 4 cores
        memory: "6Gi"   # Increase to 6GB memory
      requests:
        cpu: "500m"     # Increase baseline CPU request
        memory: "2Gi"   # Increase baseline memory request
Scaling etcd storage
Kubernetes API performance depends heavily on `etcd`, which is sensitive to disk I/O and can become an IOPS (input/output operations per second) bottleneck. In its cloud environments, Upbound allocates 50Gi `etcd` volumes to ensure adequate IOPS performance.
# Default setting
controlPlanes.etcd.persistence.size: "5Gi"
For production environments or when migrating large control planes, increase the `etcd` volume size and specify an appropriate storage class:
controlPlanes:
  etcd:
    persistence:
      size: "50Gi"                  # Recommended for production
      storageClassName: "fast-ssd"  # Use a high-performance storage class
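To verify what was actually provisioned, you can inspect the `etcd` persistent volume claims. The namespace below is a placeholder for the namespace of the control plane you want to check:

```shell
# Show PVC capacity and storage class for a control plane's namespace
# (replace the placeholder with the control plane's namespace)
kubectl get pvc -n <control-plane-namespace>
```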
Storage class considerations
For AWS:
- Use GP3 volumes with adequate provisioned IOPS
- GP3 volumes provide a 3,000 IOPS baseline; additional IOPS can be provisioned up to 500 IOPS per GiB of volume size
- Provision at least 32Gi to support the GP3 maximum of 16,000 IOPS (see the example StorageClass after this list)
For GCP and Azure:
- Use SSD-based persistent disk types for optimal performance
- Consider premium storage options for high-throughput workloads
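As an illustration, a dedicated GP3 StorageClass with provisioned IOPS on AWS might look like the sketch below. The `fast-ssd` name and the IOPS and throughput values are illustrative, and the `ebs.csi.aws.com` provisioner assumes the EBS CSI driver is installed in your cluster:

```yaml
# Example gp3 StorageClass with provisioned IOPS (values are illustrative)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "16000"       # requires a volume of at least 32Gi (500 IOPS per GiB)
  throughput: "500"   # MiB/s
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
```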
Scaling Crossplane resources
Crossplane manages providers and their resources in your control planes. For larger deployments, you may need to increase the resources allocated to Crossplane itself:
# Default settings
controlPlanes.uxp.resourcesCrossplane.requests.cpu: "370m"
controlPlanes.uxp.resourcesCrossplane.requests.memory: "400Mi"
For environments with many providers or managed resources:
controlPlanes:
  uxp:
    resourcesCrossplane:
      limits:
        cpu: "1000m"     # Add CPU limit
        memory: "1Gi"    # Add memory limit
      requests:
        cpu: "500m"      # Increase CPU request
        memory: "512Mi"  # Increase memory request
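To gauge whether you need this headroom, you can count the providers and managed resources in a control plane. The commands below are a rough sketch and assume your kubeconfig points at the control plane itself rather than the host cluster:

```shell
# Count installed Crossplane providers in the control plane
kubectl get providers.pkg.crossplane.io --no-headers | wc -l

# Count managed resources across all installed providers
# (the "managed" category is registered by provider CRDs)
kubectl get managed --no-headers 2>/dev/null | wc -l
```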
High availability configuration
For production environments, enable high availability (HA) mode to improve resilience:
controlPlanes:
  ha:
    enabled: true
Best practices for migration scenarios
When migrating from existing control planes into a self-hosted Space:
- Pre-scale resources: Scale up resources before performing the migration
- Monitor resource usage: Watch resource consumption during and after migration with `kubectl top pods`, as shown in the example after this list
- Scale incrementally: If issues persist, increase resources incrementally until performance stabilizes
- Consider storage performance: `etcd` is sensitive to storage I/O performance
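For example, a simple way to watch per-container consumption while a migration runs (requires metrics-server; the namespace is a placeholder):

```shell
# Refresh per-container CPU and memory usage every 30 seconds
watch -n 30 kubectl top pods -n <control-plane-namespace> --containers
```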
Helm values configuration
Apply these settings through your Spaces Helm values file:
controlPlanes:
  api:
    resources:
      limits:
        cpu: "4000m"
        memory: "6Gi"
      requests:
        cpu: "500m"
        memory: "2Gi"
  etcd:
    persistence:
      size: "50Gi"
      storageClassName: "gp3"  # Use your cloud provider's fast storage class
  uxp:
    resourcesCrossplane:
      limits:
        cpu: "1000m"
        memory: "1Gi"
      requests:
        cpu: "500m"
        memory: "512Mi"
  ha:
    enabled: true  # For production environments
Apply the configuration using Helm:
helm upgrade --install spaces oci://xpkg.upbound.io/spaces-artifacts/spaces \
-f values.yaml \
-n upbound-system
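After the upgrade completes, you can confirm that the overrides were recorded for the release:

```shell
# Show the user-supplied values for the spaces release
helm get values spaces -n upbound-system

# Confirm the release deployed successfully
helm status spaces -n upbound-system
```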
Considerations
- Provider count: Each provider adds resource overhead; consider using provider families to optimize resource usage
- Managed resources: The number of managed resources impacts CPU usage more than memory
- Vertical pod autoscaling: Consider using vertical pod autoscaling in Kubernetes to automatically adjust resources based on usage; a sketch follows this list
- Storage performance: Storage performance is as important as capacity for `etcd`
- Network latency: Low-latency connections between components improve performance
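If you want to experiment with vertical pod autoscaling as mentioned above, a minimal sketch follows. It assumes the VPA components (recommender, updater, and admission controller) are installed in the host cluster, and the target deployment name and namespace are placeholders for the workload you want to autoscale:

```yaml
# Minimal VerticalPodAutoscaler sketch (requires the VPA components to be installed)
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: control-plane-vpa               # illustrative name
  namespace: <control-plane-namespace>  # placeholder
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: <control-plane-deployment>    # placeholder for the workload to autoscale
  updatePolicy:
    updateMode: "Auto"  # VPA evicts pods and recreates them with updated requests
```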