A Brief Introduction to Single-Root I/O Virtualisation (SR-IOV)
Setup for SR-IOV
Aside from OpenStack, deployment of SR-IOV involves configuration at many levels.
The BIOS needs to be configured to enable both Virtualization Technology and SR-IOV.
Mellanox NIC firmware must be configured to enable the creation of SR-IOV VFs and define the maximum number of VFs to support. This requires the installation of the Mellanox Firmware Tools (MFT) package from Mellanox OFED.
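As an illustrative sketch using MFT's mst and mlxconfig tools (the MST device path and VF count here are examples; list devices with mst status):

# Start Mellanox Software Tools to expose the configuration device
mst start
# Enable SR-IOV in firmware and set the maximum number of VFs
mlxconfig -d /dev/mst/mt4119_pciconf0 set SRIOV_EN=1 NUM_OF_VFS=16

A reboot (or firmware reset) is required before the new firmware settings take effect.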
Kernel boot parameters are required to support direct access to SR-IOV hardware:
intel_iommu=on iommu=pt
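On a system booting with GRUB, one way to apply these (a sketch; file locations vary by distribution) is to append them to the kernel command line and regenerate the GRUB configuration:

# /etc/default/grub: append to the existing GRUB_CMDLINE_LINUX value
GRUB_CMDLINE_LINUX="... intel_iommu=on iommu=pt"
# Regenerate the GRUB configuration (CentOS example), then reboot
grub2-mkconfig -o /boot/grub2/grub.cfg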
A number of VFs can be created by writing the required number to a file under /sys, for example: /sys/class/net/eno6/device/sriov_numvfs. This is typically done as a udev trigger script on insertion of the PF device. The upper limit on the number of VFs is given by another (read-only) file in the same directory, sriov_totalvfs.
NOTE: Certain NIC models (e.g. Mellanox ConnectX-3) do not support management via sysfs; these need to be configured using modprobe instead (see the modprobe.d man page).
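For example, to check the limit and then create four VFs on the example PF eno6 (run as root; the VF count is illustrative):

cat /sys/class/net/eno6/device/sriov_totalvfs
echo 4 > /sys/class/net/eno6/device/sriov_numvfs

For a NIC managed via modprobe, the equivalent is a module option, sketched here for the mlx4 driver:

# /etc/modprobe.d/mlx4_core.conf
options mlx4_core num_vfs=4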
As a framework for managing infrastructure using infrastructure-as-code principles and Ansible at every level, Kayobe provides support for running custom Ansible playbooks against the inventory and groups of the infrastructure deployment. Over time StackHPC has developed a number of roles to perform additional configuration as custom site playbooks. A recent addition is a Galaxy role for SR-IOV setup, stackhpc.sriov.
A simple custom site playbook could look like this:
---
- name: Configure SR-IOV
  hosts: compute_sriov
  tasks:
    - include_role:
        name: stackhpc.sriov
  handlers:
    - name: reboot
      include_tasks: tasks/reboot.yml
      tags: reboot
...
This playbook would then be invoked from the Kayobe CLI:
(kayobe) $ kayobe playbook run sriov.yml
Once the system is prepared for supporting SR-IOV, OpenStack configuration is required to enable VF resource management, scheduling according to VF availability, and pass-through of the VF to VMs that request it.
SR-IOV and LAGs
An additional complication arises when hypervisors use bonded NICs to provide network access for VMs, giving greater fault tolerance. However, a VF is normally associated with only one PF, and with two PFs in a bond this would lead to inconsistent connectivity.
Mellanox NICs have a feature, VF-LAG, which claims to enable SR-IOV to work in configurations where the ports of a 2-port NIC are bonded together.
Setup for VF-LAG requires additional steps and complexities, and we'll be covering it in greater detail in another blog post soon.
Nova Configuration
Scheduling with Hardware Resource Awareness
SR-IOV VFs are managed in the same way as PCI-passthrough hardware (e.g. GPUs): each VF is treated as a hardware resource. The Nova scheduler must be configured so that instances requesting SR-IOV resources are not scheduled to hypervisors with none available. This is done using the PciPassthroughFilter scheduler filter.
In Kayobe config, the Nova scheduler filters are configured by defining non-default parameters in nova.conf. In the kayobe-config repo, add this to etc/kayobe/kolla/config/nova.conf:
[filter_scheduler]
available_filters = nova.scheduler.filters.all_filters
enabled_filters = other-filters,PciPassthroughFilter
(The other filters listed may vary according to other configuration applied to the system).
Hypervisor Hardware Resources for Passthrough
The nova-compute service on each hypervisor requires configuration to define which hardware/VF resources are to be made available for passthrough to VMs. In addition, for infrastructure with multiple physical networks, an association must be made between VFs and the physical network to which they connect. This is done by defining a whitelist (pci_passthrough_whitelist) of available hardware resources on the compute hypervisors.

This can be tricky to configure in an environment with multiple variants of hypervisor hardware specification, where the available resources differ between hosts. One solution using Kayobe's inventory is to define whitelist hardware mappings either globally, in group variables, or even in individual host variables, as follows:
# Physnet to device mappings for SR-IOV, used for the pci
# passthrough whitelist and sriov-agent configs
sriov_physnet_mappings:
  p4p1: physnet2
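For example (the paths and the per-host override below are illustrative), the mapping could be set for a group and overridden for a host with different hardware:

# etc/kayobe/inventory/group_vars/compute_sriov/sriov
sriov_physnet_mappings:
  p4p1: physnet2
# etc/kayobe/inventory/host_vars/compute123 (override for one host)
sriov_physnet_mappings:
  ens2f0: physnet2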
This state can then be applied by adding a macro-expanded term to etc/kayobe/kolla/config/nova.conf:
[pci]
passthrough_whitelist = [{% for dev, physnet in sriov_physnet_mappings.items() %}{{ (loop.index0 > 0)|ternary(',','') }}{ "devname": "{{ dev }}", "physical_network": "{{ physnet }}" }{% endfor %}]
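With the example mapping above, this renders in nova.conf to:

[pci]
passthrough_whitelist = [{ "devname": "p4p1", "physical_network": "physnet2" }]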
We have used the network device name for designation here, but other options are available:
- devname: network device name (as used above).
- address: PCI bus address, taking the form [[[[<domain>]:]<bus>]:][<slot>][.[<function>]]. This is a good way of unambiguously selecting a single device in the hardware device tree.
- address: MAC address. Can be wild-carded, which is useful if the vendor of the SR-IOV NIC is different from all other NICs in the configuration, so that selection can be made by OUI.
- vendor_id and product_id: PCI vendor and device IDs. A good option for selecting a single hardware device model, wherever the devices are located. These values are 4-digit hexadecimal (but the conventional 0x prefix is not required).
The vendor ID and device ID are available from lspci -nn (or lspci -x for the hard-core). The IDs supplied should be those of the virtual function (VF), not the physical function, which may be slightly different.
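As a sketch, alternative whitelist entries using these selectors might look like the following (the PCI address and the Mellanox ConnectX-5 VF IDs, 15b3:1018, are illustrative):

[pci]
# By PCI bus address (wildcards are permitted):
passthrough_whitelist = { "address": "0000:5e:*.*", "physical_network": "physnet2" }
# Or by vendor and product ID of the VF:
passthrough_whitelist = { "vendor_id": "15b3", "product_id": "1018", "physical_network": "physnet2" }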
Neutron Configuration
Support for the Neutron SR-IOV NIC agent is enabled in Kayobe config, in etc/kayobe/kolla.yml:
kolla_enable_neutron_sriov: true
Neutron Server
SR-IOV usually connects to VLANs; here we assume Neutron has already been configured to support this. The sriovnicswitch ML2 mechanism driver must be enabled. In Kayobe config, this is added to etc/kayobe/neutron.yml:
# List of Neutron ML2 mechanism drivers to use. If unset the kolla-ansible
# defaults will be used.
kolla_neutron_ml2_mechanism_drivers:
- openvswitch
- l2population
- sriovnicswitch
Neutron SR-IOV NIC Agent
Neutron requires an additional agent to run on compute hypervisors with SR-IOV resources. The SR-IOV agent must be configured with mappings between physical network names and the interface names of the SR-IOV PFs. In Kayobe config, this should be added in a file etc/kayobe/kolla/config/neutron/sriov_agent.ini. Again, we can use an expansion with variables drawn from Kayobe config's inventory and extra variables:
[sriov_nic]
physical_device_mappings = {% for dev, physnet in sriov_physnet_mappings.items() %}{{ (loop.index0 > 0)|ternary(',','') }}{{ physnet }}:{{ dev }}{% endfor %}
exclude_devices =
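With the example mapping above (p4p1 on physnet2), the rendered configuration would be:

[sriov_nic]
physical_device_mappings = physnet2:p4p1
exclude_devices =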