Welcome to the second edition of Navigating Upstream from the team at StackHPC! We’ve had an exciting few months since our last newsletter: hosting our first customer event in Paris, winning the Superuser award at the OpenInfra Summit and rolling out our first deployments of OpenStack Epoxy.
Our Azimuth Cloud Portal is also highlighted in this edition, along with a look at OpenStack for AI infrastructure, upcoming events, and info on our next OpenStack training opportunities.
– The StackHPC team
StackHPC on stage for the Superuser award at the OpenInfra Summit in Paris.
StackHPC leaves Paris with the crown jewels
OpenInfra Europe Summit 2025
In October, we travelled to Paris for the OpenInfra Europe Summit 2025 and our first client event, hosted the day before the summit. It was a great extended weekend, bringing together the global community to share knowledge on all things open source and open infrastructure.
StackHPC colleagues featured in 13 talks and workshops at this year’s summit, the full list of which is available at the end of this section. The topics of these ranged from a hands-on guide to OpenStack deployments, to our learnings from running cutting-edge multi-architecture clouds.
Superuser Award Winners!
We are very proud to be this year’s winners of the OpenInfra Superuser award, in recognition of our contributions to open source infrastructure and the communities that build it. In particular, this award recognised our work on the Azimuth Cloud Portal, which is highlighted in this edition of the StackHPC newsletter.
StackHPC, 2025 OpenInfra Superuser award winners.
Paris client event
Our trip to France also saw our inaugural StackHPC client event. Over the course of a day, we had multiple StackHPC and customer presentations, in addition to discussions ranging from experiences using GPUs in production to networking and its pitfalls.
We are very grateful to everyone who found the time to attend, particularly to those who presented their experiences. It was a fantastic opportunity to meet people previously only seen through a webcam, to share success stories, and to build stronger connections: in short, to fully embody the value of an Open Community.
Bringing customers and colleagues together for a workshop session at our client event.
StackHPC talks at OpenInfra Europe Summit 2025
- Multi-Arch OpenStack in Production: ARM64/x86 Bare Metal and Hypervisors with Kolla-Ansible
- by Bartosz Bezak and Nathan Harper
- History of the OpenStack Compute API (video unavailable)
- by John Garbutt
- Dynamic resource sharing and diverse compute platforms for AI: The Slinky Solution
- by Stig Telfer and Scott Davidson
- OpenStack deployments made easy - Kolla-Ansible workshop
- by Michal Nasiadka and Jakub Darmach
- Scientific SIG at the Open Source Pavilion (not recorded)
- by Stig Telfer, Martial Michel and Blair Bethwaite
- All aboard the Release Train: Lessons from our last 100 OpenStack upgrades
- by Alex Welsh and Seunghun Lee
- Scientific SIG Working Group Session
- by Stig Telfer, Martial Michel and Blair Bethwaite
- Kolla User Forum
- by Michal Nasiadka and Bartosz Bezak
- Slicing Supercomputers for Trusted Research Environments
- by John Garbutt and Tunde Oyewo
- Scientific SIG at the Open Source Pavilion (not recorded)
- by Stig Telfer, Martial Michel and Blair Bethwaite
- A Universe from Nothing: A hands-on introduction to OpenStack
- by Massimiliano Favaro-Bedford and Matt Crees
- OpenStack Technical Committee Meet and Greet
- by Amy Marrich and Michal Nasiadka
- Industrial-Grade AI on OpenStack HPC: Real-World Use Cases & Future Roadmaps
- by Armstrong Foundjem, John Garbutt and Martial Michel
Why OpenStack for AI infrastructure? - 6G AI Sweden
Achieving peak performance from AI infrastructure is crucial for maximising returns on significant hardware and software investments, especially given the competitive nature of modern-day AI workloads.
6G AI Sweden case study
In pursuit of world-class AI capabilities, 6G AI Sweden has partnered with StackHPC to deliver absolute data sovereignty for Swedish companies. Together, we designed and deployed a technology stack built on OpenStack and Kubernetes.
For AI use cases, a significant advantage of OpenStack is its native support for bare metal, virtualisation and containerisation, all couched within the same reconfigurable infrastructure. OpenStack’s multi-tenancy model and fine-grained policy are also well suited to cloud-native service providers for AI infrastructure.
6G AI’s compute nodes are based on the Nvidia "HGX" reference architecture, with Nvidia H200 GPUs and NVMe storage local to each GPU. These are interconnected by 400G NDR InfiniBand networking from Nvidia, along with an Ethernet network based on 800G switches and network fabric. High-performance storage from VAST Data provides object, block and file storage services with multi-tenant isolation.
Bare metal compute infrastructure is created and managed using OpenStack Ironic. This enables provisioning and management of bare metal compute into isolated client tenancies, a critical requirement for 6G AI’s business model.
To make the most of this world-class AI infrastructure, 6G AI selected the Azimuth Cloud Portal - an intuitive platform for providing compute platforms on a self-service basis. Azimuth is a free and open-source cloud portal built to simplify deploying and accessing platforms for users. StackHPC, as the lead developers and custodians of Azimuth, provide services for deployment, support, and extension of the project. Find out more about the latest updates to Azimuth in the dedicated section below, and for a deeper dive into 6G AI’s system, click here.
At the Helm - Azimuth Cloud Portal Updates
We are constantly improving our award-winning Azimuth Cloud Portal, with even bigger things planned for the future.
The past few months have focused primarily on consolidation and on keeping pace with the fast-moving upstream Kubernetes and cloud-native ecosystems.
Azimuth 2025.10
Azimuth 2025.10 upgrades the management infrastructure to Kubernetes v1.31 and introduces experimental support for self-service applications directly on any Kubernetes cluster, removing the requirement for an OpenStack backend. As part of this work, we added:
- OIDC Authentication: Azimuth authentication can now come directly from an OIDC identity provider, eliminating the need to rely on OpenStack-based authentication.
- Flexible Credential Registration: Users can register either Kubernetes or OpenStack credentials for their Azimuth tenancy.
Note: The existing OpenStack-integrated functionality will continue to work as before, ensuring backwards compatibility for current users.
JupyterHub custom profiles
The Azimuth JupyterHub app has been upgraded to better support custom notebook profiles and GPU hardware. The default configuration now includes GPU-enabled notebooks for Nvidia and Intel GPUs, and a list of available notebook profiles is now fully customisable by Azimuth operators within their azimuth-config.
Azimuth 2025.11
This release upgrades the Azimuth management infrastructure to Kubernetes v1.32 and includes various quality of life improvements to the operator and user experience. New Kubernetes v1.34 templates are now available for users.
Breaking change in Azimuth 2025.5.0
Unfortunately, after publishing the Azimuth 2025.5.0 release, we discovered that it introduced a breaking change for existing Slurm platforms. Patching Slurm platforms created prior to upgrading to this release will cause their Open OnDemand services to become inoperable. Users can be prevented from patching their Slurm platforms by pinning them to the version packaged in the previous release. For more information, refer to the 2025.5.0 release notes.
For more details on recent changes to Azimuth, have a look at the release notes.
The View from the Release Train - Infrastructure updates
StackHPC (along with the wider OpenStack community) has been busy bringing upgrades, fixes and new features to OpenStack and Release Train in the last quarter.
OpenStack
- OpenStack Epoxy (2025.1): Following on from previous quarters’ efforts in validation and testing, we have successfully carried out our first deployments to OpenStack Epoxy, with a raft of system upgrades soon to commence.
- Read the full release notes
StackHPC Kayobe-config
Container images for Glance have been updated to provide a way to force multipart uploads to S3 backends. This stops Glance from eating too much memory when large images are uploaded.
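To illustrate why multipart uploads keep memory use bounded, here is a minimal Python sketch. It is not Glance's implementation: the `iter_parts` helper and the part size are hypothetical, chosen purely for illustration. The point is that reading a stream in fixed-size parts means only one part is held in memory at a time, whatever the total image size.

```python
import io
from typing import BinaryIO, Iterator

# Hypothetical part size for illustration only; real S3 backends impose
# their own limits on part sizes.
PART_SIZE = 8 * 1024 * 1024  # 8 MiB


def iter_parts(stream: BinaryIO, part_size: int = PART_SIZE) -> Iterator[bytes]:
    """Yield successive parts of at most part_size bytes from stream.

    Only one part is held in memory at a time, so peak memory use
    depends on part_size rather than on the total image size.
    """
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    while True:
        part = stream.read(part_size)
        if not part:
            break
        yield part


if __name__ == "__main__":
    image = io.BytesIO(b"x" * 20)  # stand-in for a large image stream
    print([len(p) for p in iter_parts(image, part_size=8)])
```

Each yielded part would be sent as one part of a multipart upload, so a very large image never needs to be buffered whole.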
The OVN container image tags have been bumped to bring in a fix for Open vSwitch log permissions, allowing Fluentd to scrape these logs (patch-notes).
Ceph and Magnum-CAPI-Helm have each received minor version bumps. The release notes for each of these changes are linked below.
- Ceph to v19.2.3 (release-notes)
- Update Magnum-CAPI-Helm driver to 1.3.0 (release-notes)
Kolla-Ansible
Override added for Octavia so that operators can add their own notification topics.
OVN container images now have default environment variables that ease running of OVN commands for operators.
Security
- Deny access to /server-status via the external frontend for deployments that use single frontend (patch-notes).
Fixes
- Fixes for issues regarding TLS certificate templating, accidental Nova Libvirt downgrades and CORS requests being blocked when attempting to upload an image via Horizon.
Kayobe
- Various fixes for issues including network connectivity, duplication of CA certificates, and configuration of backend TLS.
- Read the full release notes
Kolla
- Debian container image builds now use the Bookworm suite for RabbitMQ installation.
- Read the full release notes
CVE Watch
The second half of 2025 is shaping up to be a bit more lively in terms of security vulnerabilities potentially affecting our customers’ deployments. Hopefully, this is all the excitement the year has in store on that front.
CVE-2025-6000 Privileged Vault Operator May Execute Code on the Underlying Host
Affecting HashiCorp Vault and its open source counterpart, OpenBao, this relatively minor vulnerability can allow a Vault/Bao operator to execute code on the underlying hosts. The conditions are specific enough that they didn’t affect our customers, as our deployments of Vault/Bao don’t meet these requirements.
OSSA-2025-002 Unauthenticated access to EC2/S3 token endpoints can grant Keystone authorisation
CVE number TBD
This is a much more concerning vulnerability, which resulted in an embargo on the release of any details prior to publication. As soon as we received advance notice of this bug, we began preparing patched versions of the Keystone services, so that they would be available to our customers the moment the embargo was lifted on Tuesday 4 November at 15:00 GMT.
A vulnerability in the Keystone APIs “ec2tokens” and “s3tokens” allowed an unauthenticated attacker to retrieve a fully-scoped token (for ec2tokens), provided they could send the API endpoint a valid AWS signature (e.g., obtained from a presigned S3 URL). Such a token results in unauthorised access and possible privilege escalation. The attack requires access to the API endpoints, which puts deployments with public or user-reachable endpoints at real risk, the former more so than the latter.
All in all, a vulnerability to be taken seriously and patched quickly.
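As a small aid for auditing exposure, the sketch below lists the URLs of the two affected resources under a given Keystone v3 base URL, so they can be checked against proxy or firewall rules. It is only an illustrative helper: the `affected_endpoints` function and the example hostname are hypothetical, and it makes no network requests.

```python
from typing import List
from urllib.parse import urljoin

# The two Keystone API resources named in the advisory.
AFFECTED_RESOURCES = ("ec2tokens", "s3tokens")


def affected_endpoints(keystone_v3_url: str) -> List[str]:
    """Return the URLs of the affected resources under a Keystone v3 base URL."""
    # Ensure exactly one trailing slash so urljoin appends rather than replaces.
    base = keystone_v3_url.rstrip("/") + "/"
    return [urljoin(base, name) for name in AFFECTED_RESOURCES]


if __name__ == "__main__":
    # Hypothetical hostname, for illustration only.
    for url in affected_endpoints("https://keystone.example.com:5000/v3"):
        print(url)
```

Running this for each Keystone endpoint (public, internal) gives a short checklist of URLs to review.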
Upcoming training opportunities
Our in-person ‘A Universe from Nothing’ workshop at the OpenInfra Summit.
We have our next OpenStack Operations Training Workshop scheduled for the first week of December. These multi-client workshops cover key OpenStack operations and infrastructure best practices.
One of StackHPC’s core values is to help develop a greater degree of independence by growing the skills and confidence of the teams we work with, and these workshops are an embodiment of this.
If you're interested in how these workshops are run, our team delivered a condensed in-person session at the OpenInfra Summit in Paris, a full recording of which is available on the OpenInfra YouTube channel.
For pricing information and to register your interest, please contact us at operations@stackhpc.com.
StackHPC in the Community
Kubernetes Community Days UK
As well as the OpenInfra Summit, a delegation from StackHPC had the pleasure of attending the Kubernetes Community Days UK conference in Edinburgh, Scotland, an event that brought together organisations and professionals to discuss how open source collaboration and tools like Kubernetes shape the cloud-native landscape.
Over the course of the event, we attended a series of engaging talks and workshops delivered by industry leaders. The sessions explored a range of topics from the vital role of open source in building sustainable cloud-native businesses, to the evolution, current status and latest developments in technologies such as GitOps and Generative AI. We heard how companies like Monzo have used Kubernetes and Cluster API to efficiently handle surges in workload demand. Another highlight was Giant Swarm's journey of migrating from a custom operator-based cluster management system to Cluster API, including the live migration of hundreds of production clusters for major enterprises such as Adidas and Vodafone.
The event provided a wonderful opportunity to connect with the broader community, meet contributors and maintainers from diverse organisations and be a part of conversations that sparked new ideas and perspectives that we're excited to bring back to our own work.
Upcoming Events
Our busy calendar continues into Q4. Here are the next events we will be attending:
- Ceph Days Berlin 2025 in Berlin, Germany on 12-13 November
- SuperComputing 25 (SC25) in St Louis, USA on 16-21 November
- Computing Insight UK (CIUK) 2025 in Manchester, UK on 4-5 December
- DiRAC Day 2025 in Stoke-on-Trent, UK on 11 December
From the Blog
StackHPC at the 21st ECMWF HPC Workshop
Published 19 September 2025, by Stig Telfer
Stig’s round-up from the HPC Workshop marking ECMWF’s 50th anniversary. Highlights include AI-powered weather forecasting, IT4LIA AI Factory and the StackHPC team’s presentation on Slinky - a powerful combination of Slurm and Kubernetes.
Fresh Purrspectives - Improving the CloudKitty User Experience
Published 19 September 2025, by Leonie Chamberlain-Medd
Leonie discusses user-interface improvements brought to CloudKitty as part of her internship project with StackHPC, resulting in greater ease of use and a more intuitive user experience.
Slurm-Controlled Rebuild: Queue-Aware Cluster Upgrades
Published 4 September 2025, by Bertie Thorpe
Bertie dives into an exciting new tool for the OpenTofu-defined, Ansible-driven Slurm Appliance, which aims to reduce disruptive and costly downtime during upgrades of production Slurm clusters.
Keep an eye on our LinkedIn or the blog page of our website for the latest posts.
Parting words
Thank you for taking the time to read the second edition of Navigating Upstream!
We always welcome any feedback and suggestions.
If you’d prefer not to receive future editions, you can opt out at any time using the link below or with a simple reply. Otherwise, we look forward to keeping in touch.
– The StackHPC Team
Reach out to us via Bluesky, LinkedIn or directly via our contact page.