Our project with the Square Kilometre Array includes a
requirement for high-performance containerised runtime environments. We
have been building a system with bare metal infrastructure, multiple
physical networks, high-performance data services and tight
integration between OpenStack and container orchestration engines (COEs)
such as Kubernetes and Docker Swarm.
We have previously documented the upgrade of our OpenStack deployment from
Ocata to Pike.
This upgrade impacted Docker Swarm and Kubernetes: provisioning of both
COEs in a bare metal environment failed after the upgrade. We resolved
the issues with Docker Swarm but deferred the Kubernetes fix to the next
major release upgrade.
A fix was announced with the Queens release, along with swarm-mode support
for Docker. This strengthened the case for upgrading Magnum to Queens on an
underlying OpenStack Pike deployment. The design ethos of Kolla-Ansible and Kayobe, using containerisation to
avoid the nightmares of dependency interlock, made the targeted upgrade
of Magnum a relatively smooth ride.
Fixing Magnum deployment by upgrading from Pike to Queens
We use Kayobe to manage the configuration of our Kolla deployment.
Changing the version of a single OpenStack service (in this case,
Magnum) is as simple as setting the tag of its Docker container image,
as follows:
Prepare the Kayobe environment (assuming it is already installed):
cd src/kayobe-config
git checkout BRANCH-NAME
git pull
source kayobe-env
cd ../kayobe
source ../../venv/kayobe/bin/activate
export KAYOBE_VAULT_PASSWORD=**secret**
Add magnum_tag: 6.0.0.0 to
kayobe-config/etc/kayobe/kolla/globals.yml.
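For reference, one way to add this line from the environment prepared above (KAYOBE_CONFIG_PATH is exported by the kayobe-env script sourced earlier):
# append the Magnum container tag override to the Kolla globals file
cat >> ${KAYOBE_CONFIG_PATH}/kolla/globals.yml <<EOF
magnum_tag: 6.0.0.0
EOF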
Finally, build and deploy the new version of Magnum to the control
plane. To ensure that other OpenStack services are not affected
during the deployment, we use --kolla-tags and
--kolla-skip-tags:
kayobe overcloud container image build magnum --push \
  -e kolla_source_version=stable/queens \
  -e kolla_openstack_release=6.0.0.0 \
  -e kolla_source_url=https://git.openstack.org/openstack/kolla
kayobe overcloud container image pull --kolla-tags magnum \
  --kolla-skip-tags common
kayobe overcloud service upgrade --kolla-tags magnum \
  --kolla-skip-tags common
That said, the upgrade came with a few unforeseen issues:
We discovered that Kolla Ansible, the tool Kayobe uses to deploy
Magnum containers, assumes that host machines running Kubernetes can
reach Keystone on an internal endpoint. This was not an option in our
case, since the internal endpoints are reachable only from the
control plane, which does not include tenant networks and instances
(whether bare metal nodes or VMs). Since this assumption does not hold
in general, a patch was pushed upstream and was quickly approved in
code review. After applying the patch, the default configuration
templates for Heat must be regenerated, which takes a single Kayobe
command:
kayobe overcloud service reconfigure --kolla-tags heat \
  --kolla-skip-tags common
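A local override of the same general shape is also possible through Kayobe's Kolla custom config directory. This is only a sketch, under the assumption that the option in question is Heat's [clients_keystone] auth_uri; public-api.example.com stands in for the real public Keystone endpoint:
# sketch only: override heat.conf with the public Keystone endpoint
# (assumes [clients_keystone]/auth_uri is the relevant option; the URL is a placeholder)
cat > ${KAYOBE_CONFIG_PATH}/kolla/config/heat.conf <<EOF
[clients_keystone]
auth_uri = https://public-api.example.com:5000
EOF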
Docker Community Edition (v17.03.0-ce onwards) uses cgroupfs
as the default native.cgroupdriver. However, Magnum assumes the
driver is systemd and does not explicitly enforce this. As a
result, deployment fails. This was addressed in this
pull request.
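A quick way to check which driver a Docker host is actually using, along with the daemon flag that forces the systemd driver Magnum expects, is shown below:
# report the cgroup driver in use by the Docker daemon on a cluster node
docker info --format '{{ .CgroupDriver }}'
# to force the systemd driver, dockerd can be started with:
#   dockerd --exec-opt native.cgroupdriver=systemd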
By default, Magnum assigns a floating IP to each server in a
container infrastructure cluster. This means that all traffic from
external locations flows through the control plane (internal traffic
is direct). Disabling floating IPs appeared to have no effect, which
we filed as a bug on Launchpad. A patch to make Magnum correctly
handle disabled floating IPs in swarm mode is currently under way.
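For context, this is the template option whose handling the patch fixes; an illustrative example of creating a swarm-mode cluster template with floating IPs disabled (the image, external network and flavor names are placeholders for our environment):
# illustrative only: swarm-mode template with floating IPs disabled
# (image, external network and flavor names are placeholders)
openstack coe cluster template create swarm-fa27 \
  --coe swarm-mode \
  --image fedora-atomic-27 \
  --external-network public \
  --flavor m1.large \
  --floating-ip-disabled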
In the meantime, we patched kayobe-config
to update magnum_tag to 6.0.0.0 and to point
magnum_conductor_footer and magnum_api_footer at a
patched Magnum Queens fork, stackhpc/queens, on our GitHub
account.
Fedora Atomic 27 image for containers
The recently released Fedora Atomic 27 image (download link)
comes packaged with bare metal and Mellanox drivers, so it is no
longer necessary to build a custom image using diskimage-builder
to incorporate these drivers. However, a few one-off manual changes
to the image were still required, which we made through the
virsh console:
First, boot into the image using cloud-init credentials
defined within init.iso (built from these instructions):
sudo virt-install --name fa27 --ram 2048 --vcpus 2 \
  --disk path=/var/lib/libvirt/images/Fedora-Atomic-27-20180326.1.x86_64.qcow2 \
  --os-type linux --os-variant fedora25 --network bridge=virbr0 \
  --cdrom /var/lib/libvirt/images/init.iso --noautoconsole
sudo virsh console fa27
The image ships with Docker v1.13.1, which is 12 releases
behind the current stable release, Docker v18.03.1-ce (note that
the versioning scheme changed after v1.13.1, jumping to v17.03.0-ce).
To obtain up-to-date features required by our customers, we upgraded
to the latest release:
sudo su
cd /etc/yum.repos.d/
curl -O https://download.docker.com/linux/fedora/docker-ce.repo
rpm-ostree override remove docker docker-common cockpit-docker
rpm-ostree install docker-ce -r
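Once the machine comes back from the reboot triggered by the -r flag, a quick sanity check confirms the package override and the new Docker version:
# confirm the layered docker-ce package and the new Docker version
rpm-ostree status
docker --version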
Fedora Atomic 27 comes with packages for GlusterFS, a scalable
network file system. However, one of our customer requirements was
RDMA support for GlusterFS, in order to maximise IOPS for
data-intensive tasks compared to IP-over-Infiniband. The package is
available in the rpm-ostree repository as glusterfs-rdma. It was
installed and enabled as follows:
# installing glusterfs
sudo su
rpm-ostree upgrade
rpm-ostree install glusterfs-rdma fio
systemctl enable rdma
rpm-ostree install perftest infiniband-diags
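As a quick illustration of how the RDMA transport is then consumed from a cluster node, a GlusterFS volume created with RDMA transport enabled can be mounted with the transport option; gluster01 and gv0 below are placeholders for a real server and volume:
# illustrative: mount a GlusterFS volume over RDMA from a cluster node
# (gluster01 and gv0 are placeholders for a server and an RDMA-enabled volume)
sudo mkdir -p /mnt/gv0
sudo mount -t glusterfs -o transport=rdma gluster01:/gv0 /mnt/gv0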
Next, write a cloud-init script that runs once to resize the root
volume partition. The root volume is mounted as LVM, and the
conventional cloud-init growpart mechanism fails to grow it, so
containers deployed inside the swarm cluster quickly fill it up.
The following script, placed under
/etc/cloud/cloud.cfg.d/99_growpart.cfg, did the trick in our
case and generalises to different types of root block device:
#cloud-config
# resize volume
runcmd:
- lsblk
# find the physical volume backing the "atomicos" volume group, e.g. /dev/vda2
- PART=$(pvs | awk '$2 == "atomicos" { print $1 }')
- echo $PART
# split the device path into disk and partition number, then grow the partition
- /usr/bin/growpart ${PART:: -1} ${PART: -1}
# grow the physical volume, then extend the root logical volume and filesystem into the freed space
- pvresize ${PART}
- lvresize -r -l +100%FREE /dev/atomicos/root
- lsblk
Cleaning up the image makes it lighter, at the cost of preventing
users from rolling back the installation, an action we do not
anticipate they will need. Removing the cloud-init state allows it to
run again on the next boot and removes authorisation details. Clearing
service logs gives the image a fresh start.
# undeploy old images
sudo atomic host status
# current deployment <0/1> should have a * next to it
sudo ostree admin undeploy <0/1>
sudo rpm-ostree cleanup -r
# image cleanup
sudo rm -rf /var/log/journal/*
sudo rm -rf /var/log/audit/*
sudo rm -rf /var/lib/cloud/*
sudo rm /var/log/cloud-init*.log
sudo rm -rf /etc/sysconfig/network-scripts/ifcfg-*
# auth cleanup
sudo rm ~/.ssh/authorized_keys
sudo passwd -d fedora
# virsh cleanup
# press Ctrl+Shift+] to exit virsh console
sudo virsh shutdown fa27
sudo virsh undefine fa27
Ansible roles for managing container infrastructure
There are official Ansible modules for various OpenStack projects like
Nova, Heat and Keystone. However, Magnum currently has none, in
particular none that cover creating, updating and managing a container
infrastructure inventory. Magnum also lacks certain useful features
that are possible indirectly through the Nova API, such as attaching
multiple network interfaces to each node in the clusters it creates.
The ability to generate and reuse an existing cluster inventory is
further necessitated by a specific requirement of this project:
mounting GlusterFS volumes on each node in the container
infrastructure cluster.
In order to lay the foundation for performing preliminary data
consumption tests for the Square Kilometre Array's (SKA) Performance
Prototype Platform (P3), we needed to attach each node in the container
infrastructure cluster to multiple high speed network interfaces:
- 10G Ethernet
- 25G High Throughput Ethernet
- 100G Infiniband
We have submitted a blueprint
to support multiple networks through the Magnum API, since Nova already
allows multiple network interfaces to be attached. In the meantime, we
wrote an Ansible role to drive Magnum
and generate an Ansible inventory from the cluster deployment. Using
this inventory, further playbooks apply our enhancements to the deployment.
The role allows us to declare a specification of the container
infrastructure required, including a variable listing the networks to
attach to cluster nodes. A bespoke Ansible module, os_container_infra,
creates, updates or deletes the cluster as specified using
python-magnumclient. Another module, os_stack_facts,
then gathers facts about the container infrastructure
using python-heatclient, allowing us to generate an inventory of
the cluster. Finally, a module called os_server_interface uses
python-novaclient to attach each node in the container
infrastructure cluster to the additional network interfaces declared
in the specification.
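A sketch of the intended workflow is given below; the playbook and inventory names are illustrative placeholders rather than the role's documented interface:
# illustrative workflow only; playbook and inventory names are placeholders
ansible-galaxy install <the linked role>        # install the role from Ansible Galaxy
ansible-playbook container-infra.yml            # declares the cluster spec, including the list of networks
ansible-playbook -i cluster-inventory gluster-mounts.yml   # follow-up playbook against the generated inventory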
We make use of the recently announced openstacksdk Python module
for talking to OpenStack, which was conceived to assimilate the
shade and os_client_config projects that had been
performing similar functions under separate umbrellas. We enjoyed
using the openstacksdk API, which is largely consistent with its
parent projects. Ansible plans to eventually transition to
openstacksdk, but there are currently no specific plans to support
plugin libraries like python-magnumclient,
python-heatclient and python-novaclient, which provide
wider API coverage than openstacksdk's common-denominator interface
across various OpenStack cloud platforms.
With Magnum playing an increasingly important role in the OpenStack
ecosystem by allowing users to create and manage container orchestration
engines like Kubernetes, we expect this role to make life easier for
those of us who regularly use Ansible to manage complex, large-scale
HPC infrastructure.