Note: The content of this article relates to NVIDIA vGPU release 15.
When even your grandpa is asking you for some GPU resource to serve the latest
language model, it's probably time to get serious about GPUs. But you
don't want to give him that whole NVIDIA A100 80GB, right? That's where
NVIDIA's virtual GPU technology comes in; one or more virtual GPUs can be
created on a single physical card, and whilst this technology isn't new, it has
always been complex to set up. With the new vgpu role in our
stackhpc.linux Ansible collection,
we've tried to make this easier.
Before you get started
vGPU functionality requires a commercial license. The guest instances will need to
be able to reach a license server, which can either be cloud hosted on NVIDIA
infrastructure (CLS) or hosted on your own infrastructure (DLS). For more
details see this NVIDIA knowledge article.
Contact NVIDIA for purchasing details.
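To give a flavour of what the guest-side licensing looks like, here is a minimal sketch, assuming a client configuration token has already been downloaded from your CLS or DLS instance (the exact token file name will differ; consult the NVIDIA licensing documentation for the authoritative steps):
# Install the token where the nvidia-gridd licensing daemon looks for it
sudo cp client_configuration_token_*.tok /etc/nvidia/ClientConfigToken/
sudo systemctl restart nvidia-gridd
# Confirm that the guest has checked out a license
nvidia-smi -q | grep -A 2 "vGPU Software Licensed Product"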
To MIG or not to MIG
MIG stands for "Multi-Instance GPU" and it is a way to split a supported
card into multiple separate partitions with dedicated resources. It differs
from legacy vGPUs, which use time slicing to schedule work onto the GPU. In
MIG mode, NVIDIA promises deterministic latency and throughput, as each
workload runs in parallel on its own slice of the hardware. Sounds great,
right? But the cost is that roughly an eighth of the GPU's SM resources is
lost to management overhead:
There are seven SM slices, not eight, because some SMs cover operational overhead when MIG mode is enabled.
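The vgpu role described below can take care of MIG partitioning for you, but if you want to poke at MIG by hand first, enabling it and listing the available profiles looks roughly like this (GPU index 0 is an assumption here):
# Enable MIG mode on GPU 0 (takes effect after a GPU reset)
sudo nvidia-smi -i 0 -mig 1
# List the GPU instance profiles the card supports
sudo nvidia-smi mig -lgip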
Analysing the performance differences will be the focus of a future blog
article, but for those who can't wait, here are some benchmarking results from
VMware.
Ansible Role
Whether you choose MIG or time slicing for your vGPUs, the
stackhpc.linux.vgpu role has you covered. It's published on Ansible
Galaxy, so all you need to do is
add the following snippet to your requirements.yml:
collections:
  - name: stackhpc.linux
    version: 1.0.1
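and install the collection:
ansible-galaxy collection install -r requirements.yml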
Define some variables (e.g. in <ansible_inventory>/group_vars/vgpu):
# Path to GRID driver downloaded from the NVIDIA licensing portal
vgpu_driver_url: "file://{{ lookup('env', 'HOME') }}/NVIDIA-GRID-Linux-KVM-525.105.14-525.105.17-528.89.zip"
#nvidia-692 GRID A100D-4C
#nvidia-693 GRID A100D-8C
#nvidia-694 GRID A100D-10C
#nvidia-695 GRID A100D-16C
#nvidia-696 GRID A100D-20C
#nvidia-697 GRID A100D-40C
#nvidia-698 GRID A100D-80C
#nvidia-699 GRID A100D-1-10C
#nvidia-700 GRID A100D-2-20C
#nvidia-701 GRID A100D-3-40C
#nvidia-702 GRID A100D-4-40C
#nvidia-703 GRID A100D-7-80C
#nvidia-707 GRID A100D-1-10CME
vgpu_definitions:
  # Configuring a MIG backed vGPU
  - pci_address: "0000:17:00.0"
    virtual_functions:
      - mdev_type: nvidia-700
        index: 0
      - mdev_type: nvidia-700
        index: 1
      - mdev_type: nvidia-700
        index: 2
      - mdev_type: nvidia-699
        index: 3
    mig_devices:
      "1g.10gb": 1
      "2g.20gb": 3
  # Configuring a card in a time-sliced configuration (non-MIG backed)
  - pci_address: "0000:65:00.0"
    virtual_functions:
      - mdev_type: nvidia-697
        index: 0
      - mdev_type: nvidia-697
        index: 1
and run this simple playbook:
---
- hosts: vgpu
  tags:
    - iommu
  tasks:
    - import_role:
        name: stackhpc.linux.iommu
  handlers:
    - name: reboot
      reboot:
        reboot_timeout: 3600
      become: true

- hosts: vgpu
  tags:
    - vgpu
  tasks:
    - import_role:
        name: stackhpc.linux.vgpu
  handlers:
    - name: reboot
      reboot:
        reboot_timeout: 3600
      become: true
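Assuming the playbook above is saved as vgpu.yml (the file name is up to you), running it is a one-liner:
ansible-playbook -i <ansible_inventory> vgpu.yml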
Could it get much easier? See the role
documentation
for more information on the configuration options, as well as a more detailed
walkthrough of the steps needed to get everything fully configured.
It's Apache 2.0 licensed, so happy hacking, and don't forget to contribute any
useful changes back.
OpenStack Config
Of course, creating the mediated devices on the host is not enough. You have to pass
them through to a virtual machine to make use of them. Since we are an OpenStack
shop, we will use the example of configuring OpenStack Nova. Just add the following
snippet to your nova-compute service's nova.conf:
# Example for a hypervisor hosting the MIG-backed card from the example above
[devices]
enabled_mdev_types = nvidia-700, nvidia-699

[mdev_nvidia-700]
device_addresses = 0000:21:00.4,0000:21:00.5,0000:21:00.6,0000:81:00.4,0000:81:00.5,0000:81:00.6
mdev_class = CUSTOM_NVIDIA_700

[mdev_nvidia-699]
device_addresses = 0000:21:00.7,0000:81:00.7
mdev_class = CUSTOM_NVIDIA_699

# Example for a hypervisor hosting time-sliced (non-MIG) vGPUs
[devices]
enabled_mdev_types = nvidia-697

[mdev_nvidia-697]
device_addresses = 0000:21:00.4,0000:21:00.5,0000:81:00.4,0000:81:00.5
# Custom resource classes don't work when you only have a single mdev type.
mdev_class = vGPU
You will need to adjust the PCI addresses to match those of your vGPU virtual
functions. These can be obtained by checking the mdevctl configuration after
running the role:
# mdevctl list
73269d0f-b2c9-438d-8f28-f9e4bc6c6995 0000:17:00.4 nvidia-700 manual (defined)
dc352ef3-efeb-4a5d-a48e-912eb230bc76 0000:17:00.5 nvidia-700 manual (defined)
a464fbae-1f89-419a-a7bd-3a79c7b2eef4 0000:17:00.6 nvidia-700 manual (defined)
f3b823d3-97c8-4e0a-ae1b-1f102dcb3bce 0000:17:00.7 nvidia-699 manual (defined)
330be289-ba3f-4416-8c8a-b46ba7e51284 0000:65:00.4 nvidia-697 manual (defined)
1ba5392c-c61f-4f48-8fb1-4c6b2bbb0673 0000:65:00.5 nvidia-697 manual (defined)
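If you also want to map the nvidia-7xx identifiers back to the human-readable profile names listed in the comments earlier, mdevctl can report the types that each parent device supports:
# Show the mdev types (and their descriptions) offered by each parent device
mdevctl types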
The mdev_class maps to a resource class that you can set in your flavor definition.
Note that if you only define a single mdev type on a given hypervisor, then the
mdev_class configuration option is silently ignored and it will use the vGPU
resource class (bug?).
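To sanity-check that Nova and Placement have picked up the new devices, you can query the resource providers. This assumes the osc-placement CLI plugin is installed; the mediated devices should appear as nested providers under each compute node:
openstack resource provider list
# Inspect the inventory of one of the nested providers
openstack resource provider inventory list <provider_uuid>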
OpenStack Flavors
Define some flavors that request the resource class that was configured in nova.conf.
An example definition, which can be used with the openstack.cloud.compute_flavor Ansible module,
is shown below:
vgpu_a100_2g_20gb:
  name: "vgpu.a100.2g.20gb"
  ram: 65536
  disk: 30
  vcpus: 8
  is_public: false
  extra_specs:
    hw:cpu_policy: "dedicated"
    hw:cpu_thread_policy: "prefer"
    hw:mem_page_size: "1GB"
    hw:cpu_sockets: 2
    hw:numa_nodes: 8
    hw_rng:allowed: "True"
    resources:CUSTOM_NVIDIA_700: "1"
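If you would rather use the plain OpenStack CLI than the Ansible module, an equivalent flavor can be created along these lines (a sketch; trim the properties to suit your environment):
openstack flavor create vgpu.a100.2g.20gb \
  --ram 65536 --disk 30 --vcpus 8 --private \
  --property hw:cpu_policy=dedicated \
  --property hw:mem_page_size=1GB \
  --property resources:CUSTOM_NVIDIA_700=1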
You should now be able to launch a VM with this flavor.
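For example (the image, network and key names below are placeholders for whatever exists in your cloud):
openstack server create \
  --flavor vgpu.a100.2g.20gb \
  --image ubuntu-22.04 \
  --network demo-net \
  --key-name mykey \
  vgpu-test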
Wait! A Spanner in the Works
Just when we thought we'd got this vGPU thing sussed, NVIDIA threw a spanner in
the works. Starting with the V16 release
of the GRID drivers (July 2023), NVIDIA dropped support for the following cards:
- Graphics cards that support only C-series vGPUs, namely:
- NVIDIA H800 PCIe 80GB
- NVIDIA H100 PCIe 80GB
- NVIDIA A800 PCIe 80GB
- NVIDIA A800 PCIe 80GB liquid cooled
- NVIDIA A800 HGX 80GB
- NVIDIA A100 PCIe 80GB
- NVIDIA A100 PCIe 80GB liquid cooled
- NVIDIA A100X
- NVIDIA A100 HGX 80GB
- NVIDIA A100 PCIe 40GB
- NVIDIA A100 HGX 40GB
- NVIDIA A30
- NVIDIA A30X
Instead, these graphics cards are supported with NVIDIA AI Enterprise.
A full table can be found here. This means that
you can no longer use your Virtual Compute Server license and will need to
get it upgraded to the NVIDIA AI Enterprise equivalent (at least if you want
driver updates)! You also need to install alternative drivers on the
hypervisor. We will follow up with details of the technical implications of
this change in a subsequent blog article.
Future Blogs
This is the first blog in a series. The following articles are in the pipeline:
- vGPUs in OpenStack with Kayobe
- Switching to NVIDIA AI Enterprise
- Dynamic vGPU partitioning on OpenStack with Cyborg
- vGPU Performance shootout: MIG vs time slicing
Stay tuned...
Get in touch
If you would like to get in touch we would love to hear
from you. Reach out to us via Twitter,
LinkedIn
or directly via our contact page.