Published: Tue 06 June 2017
Updated: Tue 06 June 2017
By Stig Telfer
In Deployment.
tags: hpc openstack baremetal tripleo infrastructure
What is the single biggest avoidable toil in deployment of OpenStack?
Hard to choose, but at StackHPC we've recently been looking at one
of our pet grievances, around the way that we've been creating the
images for provisioning an HPC-enabled overcloud.
An HPC-enabled overcloud might differ from a standard one in various ways, in order to
offer high-performance connectivity, or greater efficiency - whether
that be in compute overhead or data movement.
In this specific instance, we are looking at incorporating Open
Fabrics for binding the network-oriented
data services that our hypervisor is providing to its guests.
Open Fabrics on Mellanox Ethernet
We take the view that CPU cycles spent in the hypervisor are taken
from our clients, and we do what we can to minimise this. We've
had good success in demonstrating the advantages of both SR-IOV and RDMA
for trimming the fat from hypervisor data movement.
Remote DMA (RDMA) is supported by integrating packages from the Open
Fabrics Enterprise Distribution (OFED), an alternative networking
stack that bypasses the kernel's TCP/IP stack to deliver data
directly to the processes requesting it. Mellanox produce their
own version of OFED, developed and targeted specifically for their NICs.
TripleO and the Red Hat OpenStack Ecosystem
Red Hat's ecosystem is built upon TripleO,
which uses DiskImage-Builder (DIB) - with
a good deal of extra customisation in the form of DIB elements.
The TripleO project have done a load of good work to integrate the
invocation of DIB into the OpenStack client. The images created
in TripleO's process include the overcloud images used for hypervisors
and controller nodes deployed using TripleO. Conventionally the
same image is used for all overcloud roles but as we've shown in
previous articles
we can build distinct images tailored to compute, control, networking
or storage as required.
Introducing the Tar Pit
We'd been following a process of taking the output from the OpenStack
client's openstack overcloud image build command (the overcloud
images are in QCOW2 format at this point) and then using
virt-customize to boot a captive VM in order to apply site-specific
transformations, including the deployment of OFED.
We've previously covered
the issues around creating Mellanox OFED packages specifically built
for the kernel version embedded in OpenStack overcloud images. The
repo produced is made available on our intranet, and accessed
by the captive VM instantiated by virt-customize.
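For illustration, the captive-VM step we had been using looked roughly like the sketch below. It is not our exact script: the function name customize_image, the repo file name mlnx-ofed.repo and the VIRT_CUSTOMIZE override are assumptions made for the example, and the repo file is assumed to already point at the intranet MLNX-OFED repo.

```shell
# Sketch only: add MLNX-OFED to a built overcloud image with virt-customize.
# Hypothetical names: customize_image, mlnx-ofed.repo, VIRT_CUSTOMIZE.

# Set VIRT_CUSTOMIZE=echo to print the arguments instead of running them.
VIRT_CUSTOMIZE=${VIRT_CUSTOMIZE:-virt-customize}

# usage: customize_image IMAGE REPO_FILE PKG[,PKG...]
customize_image() {
    local image=$1
    local repo_file=$2
    local packages=$3

    # --upload copies the yum repo definition into the image,
    # --install installs the comma-separated package list from it,
    # --selinux-relabel fixes up SELinux labels on the modified image.
    "$VIRT_CUSTOMIZE" -a "$image" \
        --upload "$repo_file:/etc/yum.repos.d/mlnx-ofed.repo" \
        --install "$packages" \
        --selinux-relabel
}
```

A call might look like: customize_image overcloud-full.qcow2 mlnx-ofed.repo mlnx-ofed-hypervisor,mlnx-fw-updater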
This admittedly works, but sucks in numerous ways:
It adds a heavyweight extra stage to our deployment process (and one that
requires a good deal of extra software dependencies).
OFED really fattens up the image and this is probably the slowest possible way
in which it could be integrated into the deployment.
It adds significant complexity to scripting an automated ground-up redeployment.
The Rainy Day
Through our work on Kolla-on-Bifrost (a.k.a. Kayobe) we have been building our
own DiskImage-Builder elements. Our deployments for the Square
Kilometre Array telescope have had us
looking again at the image building process. A quiet afternoon led
us to put the work into integrating our own HPC-specific DIB
elements into a single-step process for generating overcloud images.
For TripleO deployments, we now integrate our steps into the
invocation of the TripleO OpenStack CLI, as described in the TripleO
online documentation.
Here's how:
We install our MLNX-OFED repo on an intranet webserver acting
as a package repo as before. In TripleO this can easily be the
undercloud seed node. It's best for future control plane
upgrades if it is a server that is reachable from the
OpenStack overcloud instances when they are active.
We use a git repo of StackHPC's toolbox of DIB elements.
We define some YAML for adding our element to TripleO's
overcloud-full image build (call this overcloud-images-stackhpc.yaml):
disk_images:
  -
    imagename: overcloud-full
    elements:
      - mlnx-ofed
    environment:
      # Example: point this to your intranet's unpacked MLNX-OFED repo
      DIB_MLNX_OFED_VERSION: 4.0-2
      DIB_MLNX_OFED_REPO: http://172.16.8.2/repo/MLNX_OFED_LINUX-4.0-2.0.0.1-rhel7.3-x86_64
      DIB_MLNX_OFED_DELETE_REPO: n
      DIB_MLNX_OFED_PKGLIST: "mlnx-ofed-hypervisor mlnx-fw-updater"
Define some environment variables. Here we choose to build Ocata stable images.
DiskImage-Builder doesn't extend any existing value assigned for ELEMENTS_PATH,
so we must define all of TripleO's elements locations, plus our own:
export STABLE_RELEASE="ocata"
export DIB_YUM_REPO_CONF="/etc/yum.repos.d/delorean*"
export ELEMENTS_PATH=/home/stack/stackhpc-image-elements/elements:\
/usr/share/tripleo-image-elements:\
/usr/share/instack-undercloud:\
/usr/share/tripleo-puppet-elements
Invoke the OpenStack client providing configurations - here for a CentOS overcloud image -
plus our overcloud-images-stackhpc.yaml fragment:
openstack overcloud image build \
    --config-file /usr/share/openstack-tripleo-common/image-yaml/overcloud-images.yaml \
    --config-file /usr/share/openstack-tripleo-common/image-yaml/overcloud-images-centos7.yaml \
    --config-file /home/stack/stackhpc-image-elements/overcloud-images-stackhpc.yaml
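Since ELEMENTS_PATH has to list every elements location explicitly, a typo in one entry is easy to make. Before invoking the client, a small pre-flight check along these lines can confirm that each directory exists and that the mlnx-ofed element can be found. The helper names check_elements_path and find_element are hypothetical; they are not part of DiskImage-Builder or TripleO.

```shell
# Sketch only: pre-flight checks for ELEMENTS_PATH before an image build.
# Hypothetical helper names: check_elements_path, find_element.

# Return non-zero if the path is empty or any colon-separated entry
# is not an existing directory.
check_elements_path() {
    local path=${1:-${ELEMENTS_PATH:-}}
    local dir status=0
    if [ -z "$path" ]; then
        echo "ELEMENTS_PATH is empty" >&2
        return 1
    fi
    local IFS=:
    for dir in $path; do
        if [ ! -d "$dir" ]; then
            echo "missing elements directory: $dir" >&2
            status=1
        fi
    done
    return "$status"
}

# Print the first directory on the path that provides the named element.
find_element() {
    local name=$1
    local path=${2:-${ELEMENTS_PATH:-}}
    local dir
    local IFS=:
    for dir in $path; do
        if [ -d "$dir/$name" ]; then
            echo "$dir/$name"
            return 0
        fi
    done
    return 1
}
```

With the exports above in place, running check_elements_path && find_element mlnx-ofed should print the directory of our element before any build time is spent.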
All going to plan, the result is an RDMA-enabled overcloud image, done right (or at least, better
than it was before).
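One way to confirm that the packages made it into the image is to query its RPM database with guestfish from libguestfs. This is a sketch under assumptions: the helper name check_image_packages and the GUESTFISH override are hypothetical, and the image filename is whatever the build produced (for example overcloud-full.qcow2).

```shell
# Sketch only: query packages inside a built image without booting it.
# Hypothetical names: check_image_packages, GUESTFISH.

# Set GUESTFISH=echo to print the arguments instead of running them.
GUESTFISH=${GUESTFISH:-guestfish}

# usage: check_image_packages IMAGE PKG [PKG...]
check_image_packages() {
    local image=$1
    shift
    # --ro opens the image read-only; -i inspects it and mounts its
    # filesystems; "sh" runs the given command line against the guest's
    # root filesystem, so rpm here is the guest's own rpm.
    "$GUESTFISH" --ro -a "$image" -i sh "rpm -q $*"
}
```

For the image built above, check_image_packages overcloud-full.qcow2 mlnx-ofed-hypervisor mlnx-fw-updater should list the installed package versions; rpm -q reports any package that is not installed.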
Share and enjoy!