For the first time, the November TOP500 list (published to coincide with Supercomputing 2020) includes fully OpenStack-based Software-Defined Supercomputers:
- At #99 is UM6P’s Toubkal
- At #421 is Cambridge University’s Cascade Lake extension of CSD3
Drawing on experience from projects including the SKA Telescope Science Data Processor Performance Prototyping Platform and Verne Global's hpcDIRECT, StackHPC has helped bootstrap these OpenStack deployments and is providing support for them. They are deployed and operated using OpenStack Kayobe and OpenStack Kolla-Ansible.
A key part of the solution is being able to deploy an OpenHPC-2.0 Slurm cluster on server infrastructure managed by OpenStack Ironic. The Dell C6420 servers are imaged with CentOS 8, and we use our OpenHPC Ansible role both to configure the system and to build images. Updated images are rolled out without disrupting running workloads, using a custom Slurm reboot script.
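As a rough illustration of how a rolling image update can be requested from Slurm, the sketch below builds an `scontrol reboot` command. It is not the custom reboot script itself, which is not shown here. The function name, node names and reason string are hypothetical. The sketch assumes the site-specific reimaging logic is hooked in through Slurm's `RebootProgram` setting, which is one plausible arrangement rather than a confirmed detail of these deployments.

```python
import shlex
from typing import List, Optional


def build_reboot_command(nodes: List[str], reason: Optional[str] = None) -> List[str]:
    """Build an `scontrol reboot` invocation for the given Slurm nodes.

    ASAP drains each node so it reboots as soon as its running jobs finish,
    and nextstate=RESUME returns the node to service afterwards.
    (Hypothetical helper, for illustration only.)
    """
    if not nodes:
        raise ValueError("at least one node name is required")
    cmd = ["scontrol", "reboot", "ASAP", "nextstate=RESUME"]
    if reason:
        cmd.append(f"reason={reason}")
    # Slurm accepts a comma-separated node list as the final argument.
    cmd.append(",".join(nodes))
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch is safe to try
    # on a machine without Slurm installed. Node names are made up.
    command = build_reboot_command(["compute-0", "compute-1"], reason="image-update")
    print(shlex.join(command))
```

Because nodes only reboot once their jobs complete, running work is not interrupted; whatever program Slurm is configured to run at reboot time is then free to rebuild the node with the new image.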
With OpenStack in control, you can quickly rebalance which workloads are deployed. Users can move capacity between multiple bare metal, virtual machine and container-based workloads. In particular, OpenStack Magnum provides on-demand creation of Kubernetes clusters, an approach popularised by CERN.
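To make the on-demand Kubernetes step concrete, here is a minimal sketch that assembles the Magnum command line for creating a cluster. The helper name, cluster name, template name and keypair are all hypothetical, and the cluster template is assumed to exist already in the cloud.

```python
import shlex
from typing import List, Optional


def build_cluster_create_command(
    name: str,
    template: str,
    node_count: int,
    master_count: int = 1,
    keypair: Optional[str] = None,
) -> List[str]:
    """Build an `openstack coe cluster create` command for Magnum.

    (Hypothetical helper; all argument values are supplied by the caller.)
    """
    if node_count < 1 or master_count < 1:
        raise ValueError("node_count and master_count must be at least 1")
    cmd = [
        "openstack", "coe", "cluster", "create",
        "--cluster-template", template,
        "--node-count", str(node_count),
        "--master-count", str(master_count),
    ]
    if keypair:
        cmd += ["--keypair", keypair]
    # The cluster name is the positional argument.
    cmd.append(name)
    return cmd


if __name__ == "__main__":
    # Names below are placeholders, not values from a real deployment.
    print(shlex.join(build_cluster_create_command("demo-k8s", "k8s-template", 3)))
```

Deleting the cluster later releases its nodes, which is what allows the same capacity to be handed back to other workloads.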
Beyond running user workloads, the solution uses the iDRAC and Redfish management interfaces to control server configuration, remediate faults and deliver system-wide metrics. This was critical in optimising the data centre environment and resulted in the high efficiency achieved in the TOP500 list.
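As an example of the kind of metric Redfish exposes, the sketch below totals power draw from Redfish `Power` resources. It assumes each payload was already fetched from a BMC (typically from a path such as `/redfish/v1/Chassis/<id>/Power`) and follows the standard `PowerControl` / `PowerConsumedWatts` layout. Newer Redfish schema versions may expose this through other resources, so treat the field names as an assumption to check against your firmware. The helper names and sample values are hypothetical, not measurements from these systems.

```python
from typing import Any, Dict, Iterable, Optional


def power_consumed_watts(power_resource: Dict[str, Any]) -> Optional[float]:
    """Sum PowerConsumedWatts over the PowerControl entries of one chassis.

    Returns None when the payload carries no usable reading.
    """
    readings = [
        entry.get("PowerConsumedWatts")
        for entry in power_resource.get("PowerControl", [])
    ]
    # BMCs may report null for a reading; keep only numeric values.
    numeric = [r for r in readings if isinstance(r, (int, float))]
    return float(sum(numeric)) if numeric else None


def total_power_watts(resources: Iterable[Dict[str, Any]]) -> float:
    """Aggregate power across many servers, skipping those without data."""
    per_server = (power_consumed_watts(r) for r in resources)
    return sum(w for w in per_server if w is not None)


if __name__ == "__main__":
    # Made-up payloads shaped like Redfish Power resources.
    sample = [
        {"PowerControl": [{"PowerConsumedWatts": 350}]},
        {"PowerControl": [{"PowerConsumedWatts": None}]},
        {"PowerControl": [{"PowerConsumedWatts": 410.5}]},
    ]
    print(total_power_watts(sample))
```

Collected across all nodes, a total like this is the sort of system-wide figure that can feed data centre tuning.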
For more details, please watch our recent presentation from the OpenInfra Summit:
Get in touch
If you would like to get in touch, we would love to hear from you. Reach out to us via Twitter or directly via our contact page.