Projects are an excellent way to separate OpenStack resources, but out of the box that separation is logical rather than physical. Virtual machines will schedule wherever they fit, provisioning on any hypervisor that offers up the right combination of resources and traits for their flavor. But what if you want to tightly control the compute resources of different projects? Perhaps there's a lower-priority project whose VMs should be squashed onto hypervisors with much higher overcommit ratios. Maybe an important project always needs to have compute resources available. What you need is a way of isolating projects to groups of hypervisors.
Nova configuration
For this guide, we're going to assume we have a project called top-priority, and we'll call the aggregate of private hypervisors private-hosts.
Before making changes at the OpenStack level, set the following in your nova.conf file for the Nova Scheduler:
[scheduler]
limit_tenants_to_placement_aggregate = True
enable_isolated_aggregate_filtering = True
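These options are only read at startup, so the Nova scheduler service needs a restart to pick them up. Exactly how depends on your deployment; on a systemd-managed control host it might look something like this (the service name is an assumption and varies between distributions, and Kolla Ansible users would restart the nova_scheduler container instead):
# Restart the scheduler so the new [scheduler] options take effect
sudo systemctl restart nova-scheduler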
According to the Nova documentation, limit_tenants_to_placement_aggregate lets the scheduler restrict tenants to specific placement aggregates. Now that's a lot of big words, so let's break it down a bit further. When a user creates a server instance in OpenStack, the scheduler decides which hypervisor host it should be allocated to. The scheduler is almost endlessly configurable and understanding exactly how it works is a rabbit hole we'll try to avoid for now. There are only really three things you need to know:
- Placement aggregates are logical groups of resource providers. In this case, that's the private-hosts group we want to isolate.
- "Tenant" is just an old term for "Project" in OpenStack. So what that option really does is just let us limit projects to only use certain hypervisors.
- That's only half the story though. While this lets us make sure top-priority VMs stay on the private-hosts, it does nothing to block other projects from also putting their VMs on private-hosts. For that, we need our second option.
Again, according to the Nova documentation, enable_isolated_aggregate_filtering allows the scheduler to restrict hosts in aggregates based on matching required traits in the aggregate metadata and the instance flavor/image. What this means is that we can have a set of required traits on our aggregate, so it will block any images or flavors without those traits. This might not sound very helpful at first, but OpenStack images and flavors can be private to a project. If we create a set of private flavors with some unique traits, we can tell the private-hosts to only allow those flavors in. And voila! We can now exclusively isolate projects to hypervisors!
OpenStack configuration
Well now we know it's all possible in theory, we just need to put it into practice.
It's worth noting that administrator access is assumed for all the OpenStack commands below. We'll also assume you're already using a Python virtual environment with regular OpenStack CLI access configured.
Step 0 is to install the placement client. It's a separate package to the normal OpenStack client:
pip install osc-placement
Placement used to be part of Nova, but has grown to be its own OpenStack service for tracking resource provider inventories and usage.
Step 0.1 is to take note of the UUIDs and hostnames of any hypervisors we want to isolate, along with the UUID of the top-priority project.
openstack --os-compute-api-version=2.53 hypervisor list
openstack project show top-priority -c name -c id
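If you're following along in a shell, it can save some copy-pasting to stash those IDs in variables for when you fill in the placeholders below (the variable names are purely illustrative):
# Illustrative helpers for the UUID placeholders used later on
PROJECT_ID=$(openstack project show top-priority -f value -c id)
HYPERVISOR_IDS=$(openstack --os-compute-api-version=2.53 hypervisor list -f value -c ID)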
The first real step is to create that private-hosts aggregate, and add the hosts we want to isolate:
openstack aggregate create private-hosts
openstack aggregate add host private-hosts <hypervisor1 hostname>
openstack aggregate add host private-hosts <hypervisor2 hostname>
...
It's worth noting that aggregates in Nova are not the same as aggregates in Placement! That being said, since the Rocky release the nova-api service will attempt to keep them in sync. When an administrator adds or removes a host to/from a Nova host aggregate, the change should be mirrored in placement. If they do get out of sync, there's a nova-manage command to bring them back together manually.
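That command is nova-manage placement sync_aggregates, run from somewhere with the nova-manage tool and Nova's configuration available (for example a controller host or a nova-api container):
# Mirror Nova host aggregate membership into the matching placement aggregates
nova-manage placement sync_aggregates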
Limiting the project to that aggregate is as easy as setting a single property on the aggregate:
openstack aggregate set --property filter_tenant_id=<top-priority UUID> private-hosts
It's worth noting that if the top-priority project spawns a sequel, it is very easy to put both projects on the same set of hosts. The filter_tenant_id key can be suffixed with any string to add multiple projects to the same aggregate, such as filter_tenant_id2=<top-priority-2 UUID>.
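For example, with a hypothetical second project called top-priority-2:
openstack aggregate set --property filter_tenant_id2=<top-priority-2 UUID> private-hosts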
Now to block any other projects from using those hosts, we need a custom trait. Custom traits must always start with CUSTOM_ so in this case we'll call it CUSTOM_PRIVATE_HOSTS.
openstack --os-placement-api-version 1.6 trait create CUSTOM_PRIVATE_HOSTS
For this example, we'll put the trait on a test flavor. In production, it can be slapped on any combination of images or flavors you want. The trait just needs to be present somewhere on the VM for it to be scheduled properly.
openstack flavor create --vcpus 2 --ram 8192 --disk 30 --private --project top-priority --property trait:CUSTOM_PRIVATE_HOSTS=required private-flavor
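The same trick works for images: the trait: property can be set in the image metadata instead of (or as well as) the flavor extra specs. As a sketch, assuming an image called private-image that has already been made private to the project:
openstack image set --property trait:CUSTOM_PRIVATE_HOSTS=required private-image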
Now for the awkward bit. The trait needs to be added to the aggregate and every hypervisor individually. Crazy, I know, but that's just the way it is.
Be careful setting traits on hypervisors using the CLI: the trait set command overwrites everything previously set. Standard traits will just be re-discovered automatically, but anything custom needs to be explicitly added again. The example below uses some bash trickery to get around this and simply append the new trait:
# Apply the trait to each hypervisor
traits=$(openstack --os-placement-api-version 1.6 resource provider trait list -f value <hypervisor UUID> | sed 's/^/--trait /')
openstack --os-placement-api-version 1.6 resource provider trait set $traits --trait CUSTOM_PRIVATE_HOSTS <hypervisor UUID>
# Apply the trait to the aggregate
openstack --os-compute-api-version 2.53 aggregate set --property trait:CUSTOM_PRIVATE_HOSTS=required private-hosts
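Before moving on, it's worth a quick sanity check that the trait actually stuck in both places:
# Check the trait is on the resource provider
openstack --os-placement-api-version 1.6 resource provider trait list <hypervisor UUID>
# Check the required trait and tenant filter are on the aggregate
openstack aggregate show private-hosts -c properties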
Hypervisors do have the alternative option of using a provider config file. This can be a much more manageable solution for larger deployments since the configuration can be version-controlled and deployment tools such as Kolla Ansible allow the configuration to be deployed to groups of hosts en masse. Below is a very simple example of a provider.yml file which would also set the custom trait.
meta:
  schema_version: '1.0'
providers:
  - identification:
      uuid: '$COMPUTE_NODE'
    traits:
      additional:
        - 'CUSTOM_PRIVATE_HOSTS'
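nova-compute picks these files up from the directory given by the [compute]/provider_config_location option (shown here with its default value), so it's a matter of dropping provider.yml into that directory on each compute host and restarting the nova-compute service:
[compute]
# Directory scanned by nova-compute for provider config YAML files
provider_config_location = /etc/nova/provider_config/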
And that's it! private-flavor is blocked outside of top-priority and is mandatory inside it (any other flavor will fail to schedule). So top-priority will never share a hypervisor with another project again.
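As a quick smoke test, a member of the top-priority project should now be able to boot the private flavor and land on one of the private hypervisors (the image and network names below are placeholders):
openstack server create --flavor private-flavor --image <image> --network <network> --wait test-private-vm
# As an admin, server show reports which hypervisor the instance landed on
openstack server show test-private-vm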
The upsides, the downsides, and the middlesides
A setup like this isn't perfect. The most obvious flaw is that you'll need a new set of flavors or images for your private project, which could mean a whole lot of unnecessary duplication.
Another issue is that it's not very easy to revert these changes once you've started creating VMs. The hypervisor configuration is very easy to remove: a few simple commands will remove the traits and take the host out of the aggregate. The flavor configuration, however, is different. Editing a flavor to remove a trait won't remove the trait from any existing VMs. The worst part is that there's no way to tell that these ghost traits still exist, except by manually inspecting the Nova database. That can leave you in the awkward situation of having an unschedulable VM: it will keep running where it is, but any migration operations will fail because no valid hosts exist. There are two ways to get out of this trap. One is to never edit your flavors (which is good advice in any case) and instead resize to a new flavor that doesn't have the old traits. The other is to dust off your SQL skills and manually edit the database to remove the offending trait. Neither solution is perfect: the first requires a restart of the VM, and the second is a sketchy workaround at best and a typo away from catastrophe at worst.
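For completeness, the resize route looks something like this (the flavor and server names are placeholders; the resize reboots the instance and has to be confirmed afterwards, and older clients use openstack server resize --confirm instead):
# Move the VM to a flavor without the stale trait, then confirm the resize
openstack server resize --flavor <clean flavor> <server>
openstack server resize confirm <server>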
One alternative would be to enable placement_aggregate_required_for_tenants for Nova and attach every project to an aggregate. That moves all of the configuration away from images and flavors and onto the aggregates, which are far easier to lock down.
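In nova.conf that would look something like this, alongside the option we set at the start (see below for why we didn't go down this route):
[scheduler]
limit_tenants_to_placement_aggregate = True
# Projects with no aggregate mapping will no longer be able to schedule anywhere
placement_aggregate_required_for_tenants = True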
That approach solves the first problem but exacerbates the second, and also doesn't scale particularly well. Our original method is nice because all the extra config doesn't interfere with any defaults. Projects and hypervisors without any of the extra config will still behave as normal. With the alternative approach, all new hypervisors and projects need to be added to aggregates, and the complexity eternally grows with the size of the cloud.
Unfortunately, there's just no one-size-fits-all solution. It would be lovely if a single configuration option existed, but alas it does not. If it particularly bothers you, dear reader, you are more than welcome to make an upstream contribution. For now, we must work with what we have.
Useful links
- https://docs.openstack.org/nova/latest/admin/aggregates.html#tenant-isolation-with-placement
- https://docs.openstack.org/nova/latest/configuration/config.html#scheduler.limit_tenants_to_placement_aggregate
- https://docs.openstack.org/nova/latest/reference/isolate-aggregates.html
- https://docs.openstack.org/nova/latest/configuration/config.html#scheduler.enable_isolated_aggregate_filtering