What is the single biggest piece of avoidable toil in deploying OpenStack? Hard to choose, but at StackHPC we've recently been looking at one of our pet grievances: the way that we've been creating the images for provisioning an HPC-enabled overcloud.
An HPC-enabled overcloud might differ in various ways, in order to offer high-performance connectivity or greater efficiency - whether that be in compute overhead or data movement.
In this specific instance, we are looking at incorporating Open Fabrics for binding the network-oriented data services that our hypervisor is providing to its guests.
Open Fabrics on Mellanox Ethernet
We take the view that CPU cycles spent in the hypervisor are taken from our clients, and we do what we can to minimise this. We've had good success in demonstrating the advantages of both SR-IOV and RDMA for trimming the fat from hypervisor data movement.
Remote DMA (RDMA) is supported by integrating packages from the OpenFabrics Enterprise Distribution (OFED), an alternative networking stack that bypasses the kernel's TCP/IP stack to deliver data directly to the processes requesting it. Mellanox produce their own version of OFED, developed and targeted specifically for their NICs.
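As a quick sanity check on a host, Mellanox OFED ships the ofed_info utility, and libibverbs provides ibv_devices for listing RDMA-capable devices:

# Report the installed Mellanox OFED release
ofed_info -s

# List RDMA-capable devices known to the verbs stack
ibv_devices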
TripleO and the Red Hat OpenStack Ecosystem
Red Hat's ecosystem is built upon TripleO, which uses DiskImage-Builder (DIB) - with a good deal of extra customisation in the form of DIB elements.
The TripleO project have done a load of good work to integrate the invocation of DIB into the OpenStack client. The images created in TripleO's process include the overcloud images used for hypervisors and controller nodes deployed using TripleO. Conventionally the same image is used for all overcloud roles, but as we've shown in previous articles we can build distinct images tailored to compute, control, networking or storage as required.
Introducing the Tar Pit
We'd been following a process of taking the output from the OpenStack client's openstack overcloud image build command (the overcloud images are in QCOW2 format at this point) and then using virt-customize to boot a captive VM in order to apply site-specific transformations, including the deployment of OFED.
We've previously covered the issues around creating Mellanox OFED packages specifically built for the kernel version embedded in OpenStack overcloud images. The repo produced is made available on our intranet, and accessed by the captive VM instantiated by virt-customize.
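For illustration, that second stage amounted to something like the following - a sketch only, in which the repo file URL and package name are placeholders, though the flags are standard virt-customize options:

# Boot the overcloud image in a captive VM and layer MLNX OFED on top.
# The repo file URL and package name below are illustrative placeholders.
virt-customize -a overcloud-full.qcow2 \
    --run-command 'yum-config-manager --add-repo http://172.16.8.2/repo/mlnx-ofed.repo' \
    --install mlnx-ofed-hypervisor \
    --selinux-relabel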
This admittedly works, but sucks in numerous ways:
- It adds a heavyweight extra stage to our deployment process (and one that requires a number of extra software dependencies).
- OFED really fattens up the image, and this is probably the slowest possible way in which it could be integrated into the deployment.
- It adds significant complexity to scripting an automated ground-up redeployment.
The Rainy Day
Through our work on Kolla-on-Bifrost (a.k.a. Kayobe) we have been building our own DiskImage-Builder elements. Our deployments for the Square Kilometre Array telescope have had us looking again at the image building process. A quiet afternoon led us to put in the work to integrate our own HPC-specific DIB elements into a single-step process for generating overcloud images. For TripleO deployments, we now integrate our steps into the invocation of the TripleO OpenStack CLI, as described in the TripleO online documentation.
Here's how:
- We install our MLNX-OFED repo on an intranet webserver acting as a package repo as before. In TripleO this can easily be the undercloud seed node. It's best for future control plane upgrades if it is a server that is reachable from the OpenStack overcloud instances when they are active.
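The repo itself can be as simple as the unpacked MLNX_OFED tarball sitting under the webserver's document root - a sketch, with an illustrative docroot path:

# Unpack the MLNX_OFED distribution where the webserver can serve it
# (docroot path is illustrative; the tarball matches the version used below)
tar xzf MLNX_OFED_LINUX-4.0-2.0.0.1-rhel7.3-x86_64.tgz -C /var/www/html/repo/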
- We use a git repo of StackHPC's toolbox of DIB elements.
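For example (the repo URL is an assumption - point it at wherever your copy of the elements lives):

# Clone the DIB elements to match the ELEMENTS_PATH used below
git clone https://github.com/stackhpc/stackhpc-image-elements.git /home/stack/stackhpc-image-elements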
- We define some YAML for adding our element to TripleO's overcloud-full image build (call this overcloud-images-stackhpc.yaml):
disk_images:
  -
    imagename: overcloud-full
    elements:
      - mlnx-ofed
    environment:
      DIB_MLNX_OFED_VERSION: 4.0-2
      # Example: point this to your intranet's unpacked MLNX-OFED repo
      DIB_MLNX_OFED_REPO: http://172.16.8.2/repo/MLNX_OFED_LINUX-4.0-2.0.0.1-rhel7.3-x86_64
      DIB_MLNX_OFED_DELETE_REPO: n
      DIB_MLNX_OFED_PKGLIST: "mlnx-ofed-hypervisor mlnx-fw-updater"
- Define some environment variables. Here we choose to build Ocata stable images. DiskImage-Builder doesn't extend any existing value assigned for ELEMENTS_PATH, so we must define all of TripleO's element locations, plus our own:
export STABLE_RELEASE="ocata"
export DIB_YUM_REPO_CONF="/etc/yum.repos.d/delorean*"
export ELEMENTS_PATH=/home/stack/stackhpc-image-elements/elements:\
/usr/share/tripleo-image-elements:\
/usr/share/instack-undercloud:\
/usr/share/tripleo-puppet-elements
- Invoke the OpenStack client, providing configuration files - here for a CentOS overcloud image - plus our overcloud-images-stackhpc.yaml fragment:
openstack overcloud image build \
    --config-file /usr/share/openstack-tripleo-common/image-yaml/overcloud-images.yaml \
    --config-file /usr/share/openstack-tripleo-common/image-yaml/overcloud-images-centos7.yaml \
    --config-file /home/stack/stackhpc-image-elements/overcloud-images-stackhpc.yaml
All going to plan, the result is an RDMA-enabled overcloud image, done right (or at least, better than it was before).
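From here, the images can be uploaded to the undercloud's Glance in the usual TripleO way - assuming the build output landed in the current directory:

# Upload the built overcloud images ready for deployment
openstack overcloud image upload --image-path .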
Share and enjoy!