Published: Wed 15 November 2017
Updated: Tue 21 November 2017
By John Garbutt
In Bare metal.
tags: hpc openstack baremetal infrastructure scheduling placement
For many reasons, it is common for HPC users of OpenStack to use Ironic and
Nova together to deliver baremetal servers to their users. In this post we
look at recent changes to how Nova chooses which Ironic node to use for each
user's nova boot request.
Managing Capacity
A public cloud must present the illusion of infinite capacity. For
private cloud use cases, and research computing in particular, the
amount of available, unused capacity is of great interest. Most
small clouds soon hit the reality of running out of space. There
are two main approaches to dealing with capacity problems:
Explicit assignment (and pre-emption)
Co-operative multitasking
Given our situation described above, we have opted for co-operative
multitasking, where the users delete their own instances when they are finished
with those nodes, allowing others to do what they need.
To help reduce the strain on resources we are also prototyping shared
execution frameworks, such as a Heat provisioned OpenHPC Slurm cluster, a
Magnum provisioned Docker Swarm cluster and a Sahara provisioned RDMA
enabled Spark cluster from HiBD.
In this blog we are focusing on the capacity of Ironic based clouds. When you
add virtualisation into the mix, there are many questions around how different
combinations of flavors fit onto a single hypervisor and how to avoid wasted
space. Similarly, we are focusing on statically sized private clouds, so this
blog will ignore the whole world of capacity planning.
OpenStack Security Model
You can argue about this being a security model, or just the details of the
abstraction OpenStack presents, but the public APIs try their best to hide any
idea of physical hosts and capacity from non-cloud-admin users.
When building a public cloud as a publicly traded company, exposing in
realtime via the API how many physical hosts you run or how much free capacity
you have could probably break the law in some countries. But when you run a
private cloud, you really want a nice view of what your friends are using.
Co-operative Capacity Management
"Play nice, or I will delete all your servers every Friday afternoon!"
That is a very tempting thing to say, and I basically have said that. But it's
hard to play nice when you have no idea how much capacity is in use. So we
have a solution: the capacity dashboard.
Talking to the users of P3, it's clear that having a visual representation of
who is currently using what has been much more useful than a wiki page of
requests that quickly drifted out of sync with reality. In the future we may consider
Mistral to enforce lease times of
servers, or maybe Blazar for a more
formal reservation system, but for now giving the scientists the flexibility
of a more informal system is working well.
Building the Dashboard
Firstly we have our monitoring infrastructure. This is currently built using
OpenStack Monasca, making use of Kafka and InfluxDB. (We also use Monasca with
ELK to generate metrics from our logs, but that is a story for another day):
The dashboard is built using Grafana. There is a Monasca plugin that lets us
use the Monasca API as a data source, and a Keystone plugin that is used
to authenticate and authorise access to both Grafana and its use of the Monasca
APIs:
https://grafana.com/plugins/monasca-datasource
Our system metrics are kept in a project that general users don't have access
to, but the capacity metrics and dashboards are associated with the project
that all users of the system have access to.
Now that we have a system in place to ingest, store, query and visualize
metrics in a multi-tenant way, we need a tool to query the capacity and send
metrics into Monasca.
Querying Baremetal Capacity
We have created a small CLI tool called os-capacity. It uses os_client_config
and cliff to query the Placement API for details about the current cloud
capacity and usage. It also uses the Nova and Keystone APIs to get hold of
useful friendly names for the information that is in Placement.
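To make that more concrete, here is a rough sketch of the kind of Placement
query os-capacity builds on, not the actual implementation. It assumes a
clouds.yaml entry called "mycloud", and leaves out error handling, paging and
API microversions:

import os_client_config

# make_rest_client returns a keystoneauth adapter pointed at the placement
# endpoint found in the service catalog for the chosen cloud.
placement = os_client_config.make_rest_client('placement', cloud='mycloud')

# List every resource provider (for Ironic, roughly one per baremetal node),
# then fetch each provider's inventory and current usage.
providers = placement.get('/resource_providers').json()['resource_providers']
for rp in providers:
    inventories = placement.get(
        '/resource_providers/%s/inventories' % rp['uuid']).json()['inventories']
    usages = placement.get(
        '/resource_providers/%s/usages' % rp['uuid']).json()['usages']
    print(rp['name'], inventories, usages)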
For the Capacity dashboard we use data from two particular CLI calls. Firstly
we look at the capacity by calling:
os-capacity resources group
+----------------------------------+-------+------+------+-------------+
| Resource Class Groups | Total | Used | Free | Flavors |
+----------------------------------+-------+------+------+-------------+
| VCPU:1,MEMORY_MB:512,DISK_GB:20 | 5 | 1 | 4 | my-flavor-1 |
| VCPU:2,MEMORY_MB:1024,DISK_GB:40 | 2 | 0 | 2 | my-flavor-2 |
+----------------------------------+-------+------+------+-------------+
This tool is currently very focused on baremetal clouds. The flavor mapping is
done assuming the flavors should exactly match all the available resources for
a given Resource Provider. This is clearly not true for a virtualised scenario.
It is also not true in some baremetal clouds, but this works OK for our cloud.
Of course, patches welcome :)
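To illustrate that flavor mapping, here is a hedged sketch of the matching
described above; the helper names are made up for illustration and this is not
the exact os-capacity code. Each provider's inventory totals are flattened into
a key like VCPU:1,MEMORY_MB:512,DISK_GB:20, and a flavor is attached to a group
only when its vcpus, ram and disk exactly match that inventory:

def group_key(inventories):
    # Flatten a placement inventory dict into a stable string key, e.g.
    # {'VCPU': {'total': 1}, ...} -> "VCPU:1,MEMORY_MB:512,DISK_GB:20"
    order = ['VCPU', 'MEMORY_MB', 'DISK_GB']
    return ','.join('%s:%d' % (rc, inventories[rc]['total'])
                    for rc in order if rc in inventories)

def matching_flavors(key, flavors):
    # A flavor "matches" a group only when it would consume the whole node,
    # which is the baremetal assumption described above.
    return [f.name for f in flavors
            if group_key({'VCPU': {'total': f.vcpus},
                          'MEMORY_MB': {'total': f.ram},
                          'DISK_GB': {'total': f.disk}}) == key]

Here the flavors would come from the Nova API, for example novaclient's
flavors.list().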
Secondly we can look at the usage of the cloud by calling:
os-capacity usages group user --max-width 70
+----------------------+----------------------+----------------------+
| User | Current Usage | Usage Days |
+----------------------+----------------------+----------------------+
| 1e6abb726dd04d4eb4b8 | Count:4, | Count:410, |
| 94e19c397d5e | DISK_GB:1484, | DISK_GB:152110, |
| | MEMORY_MB:524288, | MEMORY_MB:53739520, |
| | VCPU:256 | VCPU:26240 |
| 4661c3e5f2804696ba26 | Count:1, | Count:3, |
| 56b50dbd0f3d | DISK_GB:371, | DISK_GB:1113, |
| | MEMORY_MB:131072, | MEMORY_MB:393216, |
| | VCPU:64 | VCPU:192 |
+----------------------+----------------------+----------------------+
You can also group by project, but in the current SKA cloud all users are in
the same project, so grouping by user works best.
The only additional step is converting the above information into metrics that
are fed into Monasca. For now this has also been integrated into the
os-capacity tool, enabled by a magic environment variable. Ideally we would
feed the JSON output of os-capacity into a separate tool that manages sending
metrics, but that is a nice task for a rainy day.
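For what it is worth, here is a rough sketch of what posting such a metric to
Monasca could look like with python-monascaclient. The metric name, dimensions
and endpoint are assumptions for illustration, not the exact integration in
os-capacity, and the authenticated keystoneauth session is assumed to already
exist:

import time

from monascaclient import client

# 'sess' is assumed to be an authenticated keystoneauth1 session, and the
# endpoint below is a placeholder for a real Monasca API endpoint.
monasca = client.Client('2_0', session=sess,
                        endpoint='https://monasca.example.com:8070/v2.0')

monasca.metrics.create(
    name='capacity.free',  # hypothetical metric name
    dimensions={'resource_class_group': 'VCPU:1,MEMORY_MB:512,DISK_GB:20'},
    timestamp=int(time.time() * 1000),  # Monasca timestamps are milliseconds
    value=4)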
What's Next?
Through our project in SKA we are starting to work very closely with CERN.
As part of that work we are looking at helping with the CERN prototype of
preemptible instances, and looking at many other ways that both the SKA and
CERN can work together to help our scientists be even more productive.
The ultimate goal is to deliver private cloud infrastructure for research
computing use cases that achieves levels of utilisation comparable to the
best examples of well-run conventional research computing clusters. Being
able to track available capacity is an important step in that direction.