GPU and accelerator support
vCluster supports GPU and accelerator workloads when the node exposes those devices through standard Kubernetes mechanisms. vCluster does not configure the physical GPU, install the vendor driver, or choose the device presentation mode. The node image, operating system, and vendor device plugin or Dynamic Resource Allocation driver own that layer.
From the tenant cluster's perspective, GPU workloads use the same Kubernetes APIs they would use on a regular cluster:
- Extended resources such as nvidia.com/gpu or amd.com/gpu.
- Vendor device plugins, such as the NVIDIA device plugin, AMD GPU device plugin, or an accelerator vendor's equivalent plugin.
- GPU Operators, when the vendor provides one.
- Dynamic Resource Allocation (DRA) objects such as DeviceClass, ResourceClaim, and ResourceClaimTemplate.
- Optional higher-level schedulers or platforms, such as NVIDIA KAI Scheduler, NVIDIA Run:ai, or Slurm integrations.
vCluster role
vCluster provides the tenant Kubernetes control plane and syncs the Kubernetes objects that workloads need. It also lets tenants run isolated clusters on shared or private worker nodes. It does not sit in the device path between a pod and the GPU.
This means:
- If a node advertises nvidia.com/gpu, a tenant workload can request nvidia.com/gpu.
- If a node advertises amd.com/gpu, a tenant workload can request amd.com/gpu.
- If an accelerator vendor exposes a Kubernetes device plugin or DRA driver, vCluster can work with that driver's resources and DRA objects.
- If the required driver, runtime configuration, device plugin, or DRA driver is missing from the node or tenant cluster, vCluster cannot make the device appear by itself.
For private nodes, each tenant cluster can run its own GPU Operator, device plugin, DRA driver, scheduler, and accelerator CRDs. This is the common model for GPU cloud platforms because the tenant owns the full worker-node software stack.
For shared host nodes, the device plugin and drivers usually run on the control plane cluster nodes. Tenant workloads can use the resources that the shared nodes advertise, subject to the sync and scheduling configuration.
Install the NVIDIA GPU Operator in a tenant cluster
With Private Nodes, the tenant cluster can install the NVIDIA GPU Operator directly because its workloads run on dedicated worker nodes. This is the common pattern for AI cloud and inference provider platforms: the tenant cluster owns the GPU Operator, device plugin, DCGM Exporter, MIG Manager, and related CRDs for its private node pool.
Install the GPU Operator inside the tenant cluster only when that tenant owns the GPU node software stack. On shared host nodes, the platform team usually installs the GPU driver stack and device plugin on the control plane cluster nodes instead.
Before you install
Prepare the private GPU nodes before installing the Operator:
- Provision or join the GPU nodes to the tenant cluster.
- Confirm the nodes run a supported Linux distribution, kernel, and container runtime.
- Decide whether the Operator should install the NVIDIA driver or use a driver that is already installed in the node image.
- Decide whether the tenant cluster needs MIG, NVIDIA vGPU, CDI, GPUDirect, or DCGM metrics.
- Confirm kubectl and helm point at the tenant cluster, not the control plane cluster.
For the full vendor matrix and chart options, see the NVIDIA GPU Operator installation guide.
Install with Helm
Create the Operator namespace and label it for privileged workloads if your cluster uses Pod Security Admission:
kubectl create namespace gpu-operator
kubectl label --overwrite namespace gpu-operator pod-security.kubernetes.io/enforce=privileged
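If you manage namespaces declaratively, the two commands above correspond roughly to the manifest below. This is a sketch rather than a required step, and the file name gpu-operator-namespace.yaml is only an illustration.

```yaml
# gpu-operator-namespace.yaml (hypothetical file name)
# Declarative equivalent of the create and label commands above.
apiVersion: v1
kind: Namespace
metadata:
  name: gpu-operator
  labels:
    # Lets Pod Security Admission admit the Operator's privileged pods.
    pod-security.kubernetes.io/enforce: privileged
```

Apply it with kubectl apply -f gpu-operator-namespace.yaml while kubectl points at the tenant cluster.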
Add the NVIDIA Helm repository:
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
Install the Operator in the tenant cluster:
helm install gpu-operator nvidia/gpu-operator \
  --namespace gpu-operator \
  --wait
If your private node image already includes the NVIDIA driver, use this command instead of the default install so the Operator skips driver installation:
helm install gpu-operator nvidia/gpu-operator \
  --namespace gpu-operator \
  --wait \
  --set driver.enabled=false
For inference endpoints that use GPU metrics, enable pod labels on DCGM Exporter so Prometheus can associate GPU metrics with the consuming pod:
helm upgrade --install gpu-operator nvidia/gpu-operator \
  --namespace gpu-operator \
  --wait \
  --set dcgmExporter.enablePodLabels=true
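When helm upgrade receives new --set values, it does not carry over values from earlier commands unless you also pass --reuse-values. Keeping the options in one values file avoids losing a setting such as driver.enabled=false on a later upgrade. The sketch below combines the two options shown above. The file name gpu-operator-values.yaml is hypothetical, and you should keep only the keys that apply to your nodes.

```yaml
# gpu-operator-values.yaml (hypothetical file name)
driver:
  # Set to false only when the node image already ships the NVIDIA driver.
  enabled: false
dcgmExporter:
  # Attach pod labels to GPU metrics, as in the command above.
  enablePodLabels: true
```

Pass the file with -f gpu-operator-values.yaml to helm upgrade --install in place of the individual --set flags.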
Verify GPU availability
Check the Operator pods and the ClusterPolicy resource:
kubectl get pods -n gpu-operator
kubectl get clusterpolicy
Confirm the tenant cluster sees nvidia.com/gpu on the private GPU nodes:
kubectl get nodes -o 'custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
Run a CUDA smoke test. Save the following manifest as cuda-vectoradd.yaml:
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vectoradd
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vectoradd
      image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0-ubuntu22.04
      resources:
        limits:
          nvidia.com/gpu: 1
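Apply it, wait for the pod to complete, and check the logs:
kubectl apply -f cuda-vectoradd.yaml
kubectl wait --for=jsonpath='{.status.phase}'=Succeeded pod/cuda-vectoradd --timeout=300s
kubectl logs pod/cuda-vectoradd
kubectl delete -f cuda-vectoradd.yaml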
If the pod stays pending, check node readiness, taints, GPU resource names, project quotas, allowed node types, and whether the private node has joined the tenant cluster. If Operator pods fail, check the node OS, kernel, container runtime, driver installation mode, and the NVIDIA Operator logs.
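One common cause of a pending pod is a taint on the GPU nodes. As a sketch, assuming the nodes carry a taint with the key nvidia.com/gpu and the effect NoSchedule (an assumption; check the actual taints with kubectl describe node), the smoke-test pod would need a matching toleration:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vectoradd
spec:
  restartPolicy: OnFailure
  tolerations:
    # Assumed taint; replace the key and effect with what your nodes report.
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule
  containers:
    - name: cuda-vectoradd
      image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0-ubuntu22.04
      resources:
        limits:
          nvidia.com/gpu: 1
```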
Supported vendors and accelerators
NVIDIA
NVIDIA GPUs commonly use the NVIDIA GPU Operator or the NVIDIA device plugin.
The node advertises resources such as nvidia.com/gpu.
Workloads request that resource in resources.limits.
The node's driver and GPU Operator configuration control NVIDIA-specific modes such as MIG or NVIDIA vGPU. vCluster consumes the resulting Kubernetes resources. It does not create MIG partitions or configure vGPU profiles.
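As an illustration of consuming such resources: when the NVIDIA device plugin runs with the mixed MIG strategy, each MIG profile is advertised under its own resource name. The sketch below assumes the node advertises nvidia.com/mig-1g.5gb, which depends on the GPU model and the configured profile. With the single strategy, the resource name stays nvidia.com/gpu. The pod name is hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vectoradd-mig  # hypothetical name
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vectoradd
      image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0-ubuntu22.04
      resources:
        limits:
          # Assumed MIG profile; use the name your node actually advertises.
          nvidia.com/mig-1g.5gb: 1
```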
AMD
AMD GPUs use the same Kubernetes mechanism.
Install and configure the AMD driver stack and AMD GPU device plugin, AMD GPU Operator, or DRA driver.
The node then advertises the AMD resource, commonly amd.com/gpu.
Tenant workloads request that resource like any other Kubernetes extended resource.
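A minimal sketch of such a request, assuming the node advertises amd.com/gpu. The pod name and container image are placeholders, not real artifacts; substitute a ROCm-based image of your own.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: amd-gpu-workload  # hypothetical name
spec:
  restartPolicy: OnFailure
  containers:
    - name: app
      # Placeholder image; replace with your ROCm-based workload image.
      image: registry.example.com/rocm-app:latest
      resources:
        limits:
          amd.com/gpu: 1
```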
For DRA configuration, see Dynamic resource allocation and device classes.
Other accelerators
Other accelerators, such as SambaNova devices, FPGAs, DPUs, or custom AI accelerators, follow the same rule. If the vendor exposes the device to Kubernetes, vCluster can work with that Kubernetes-facing interface.
Check the vendor documentation for the exact resource name, driver installation steps, and CRDs. Also confirm where the operator or controller should run.
Dynamic resource allocation and device classes
Dynamic Resource Allocation is useful when workloads need more detail than a simple resource count. For example, workloads might need device attributes, capacity slices, or administrator-controlled device classes.
DRA sync is disabled by default. To use DRA with shared host nodes, enable the settings your workload needs:
- deviceClasses syncs allowed DeviceClass resources from the control plane cluster to the tenant cluster.
- resourceClaims syncs tenant-created ResourceClaim resources to the control plane cluster.
- resourceClaimTemplates syncs tenant-created ResourceClaimTemplate resources to the control plane cluster.
Once deviceClasses sync is enabled, platform administrators create DeviceClass resources on the control plane cluster and choose which classes are visible in each tenant cluster.
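The three settings might be enabled in vcluster.yaml along the lines of the sketch below. The setting names come from the list above, but their placement under sync.fromHost and sync.toHost and the enabled key follow vCluster's usual pattern for synced resources and are an assumption here. Verify the exact paths against the vcluster.yaml reference for your version before using them.

```yaml
# vcluster.yaml (sketch; key placement is an assumption, verify before use)
sync:
  fromHost:
    # Control plane cluster -> tenant cluster
    deviceClasses:
      enabled: true
  toHost:
    # Tenant cluster -> control plane cluster
    resourceClaims:
      enabled: true
    resourceClaimTemplates:
      enabled: true
```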
For private nodes, tenants can also run the DRA driver and related controllers inside their tenant cluster when they own the worker-node software stack.
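From the tenant's side, a DRA request might look like the following sketch. It assumes the cluster serves the resource.k8s.io/v1 API (Kubernetes 1.34 or later; earlier releases use beta API versions with a slightly different schema) and that a DeviceClass named example-gpu is visible in the tenant cluster. The class name, object names, and image are all hypothetical.

```yaml
# Template from which Kubernetes creates one ResourceClaim per pod.
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: single-gpu  # hypothetical name
spec:
  spec:
    devices:
      requests:
        - name: gpu
          exactly:
            # Hypothetical class; use a DeviceClass visible to the tenant.
            deviceClassName: example-gpu
---
apiVersion: v1
kind: Pod
metadata:
  name: dra-workload  # hypothetical name
spec:
  restartPolicy: OnFailure
  resourceClaims:
    # Pod-level claim generated from the template above.
    - name: gpu
      resourceClaimTemplateName: single-gpu
  containers:
    - name: app
      # Placeholder image; replace with your workload image.
      image: registry.example.com/inference:latest
      resources:
        claims:
          # Gives this container access to the device behind the claim.
          - name: gpu
```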
Hardware presentation modes
GPU presentation mode is determined before vCluster schedules a workload:
| Mode | Where it is configured | vCluster role |
|---|---|---|
| Bare-metal PCIe passthrough | Physical server, OS image, driver, and device plugin | Workloads request the advertised Kubernetes resource |
| NVIDIA vGPU | NVIDIA vGPU host and guest driver stack, OS image, and operator or plugin configuration | Workloads request the resource exposed by that stack |
| NVIDIA MIG | NVIDIA GPU Operator or device plugin configuration | Workloads request the MIG resources advertised by the plugin |
| DRA device allocation | Vendor DRA driver and DeviceClass resources | Syncs allowed DRA objects between the control plane cluster and tenant cluster |
If you provision physical GPU servers with vMetal, vMetal controls the bare metal lifecycle and node OS image. The OS image and post-provision configuration determine which GPU drivers, vGPU stack, MIG strategy, or vendor plugins are available. For that layer, see GPU presentation modes in vMetal.
Summary checklist
To make GPU or accelerator workloads work in a tenant cluster:
- Prepare the worker node with the required firmware, OS image, kernel modules, and vendor driver stack.
- Install the vendor device plugin, GPU Operator, or DRA driver in the right cluster.
- Confirm the node advertises the expected resource or DRA devices.
- Configure vCluster sync for any required CRDs, scheduler objects, or DRA objects.
- Run a workload that requests the advertised resource name or references the synced DeviceClass.
For GPU bare metal provisioning and OS image guidance, see vMetal GPU Quickstart.