StackHPC Cloud Engineer
StackHPC is a dynamic OpenStack and cloud consultancy that works with leading research institutions to provide high-performance cloud infrastructure for data-intensive scientific challenges. By focusing on client needs, we identify and develop solutions that address the gaps in cloud for high-performance research computing.
We’ve been growing for five years and are approaching 20 team members, but still retain a startup mindset.
An important aspect of StackHPC’s corporate culture is dedication to open source and the additional values of open design, development and community. All of our development is upstream open source, and StackHPC is a committed member of the Open Infrastructure community as a founding Silver Member. Principal staff are active in community efforts around OpenStack and research computing, and our CTO is co-founder and co-chair of the OpenStack Foundation’s Scientific Special Interest Group.
StackHPC is looking for our next team member to join our growing cloud business and work with some of the best software engineers around. You will be interested in a career in cloud systems engineering and, ideally, familiar with complex cloud and HPC systems.
Typical work activities
- Design assistance and HPC consultancy.
- Creating new HPC deployments.
- Migration and upgrades of systems.
- Supporting existing client systems and resolving operational problems.
- Delivering training and knowledge transfer.
- Producing technical documentation for customers and our blog.
Skills and Experience
- Deployment and administration of Linux operating systems.
- Ansible configuration management.
- Infrastructure lifecycle management using tools such as Terraform.
- Cloud infrastructure concepts such as cloud-init.
- Systems and process automation using Python and Bash.
- Docker containerisation methods.
- Development lifecycle tools, such as Git and Jira.
- Use of monitoring and reporting tools, such as Prometheus and Grafana.
- HPC application experience.
- Performance profiling, monitoring tools, and software performance optimisation.
- Knowledge of configuring and optimising object and file systems used in research computing.
- System-level hardware and performance optimisation.
- Setup of HPC middleware.
- Experience of Kubernetes.
- Experience of Slurm.
- Public Cloud (e.g. AWS, Azure and GCP).
- Exposure to the design or deployment of OpenStack or other cloud services.
- Technical knowledge of Linux-based cloud infrastructure technologies, such as virtualisation, containerisation, software-defined networking (SDN), network function virtualisation (NFV), high-speed networks, orchestration, storage, metrics, control consoles, code management systems, and CI or test frameworks.
Our Technology Stack
We work with and deploy a wide range of OpenStack technologies and services, so experience with the following is always beneficial.
- O/S: CentOS, Ubuntu, RHEL, Rocky Linux
- Storage: Ceph, Cinder, Manila
- Supporting Components: Prometheus, Grafana, Elasticsearch, Kibana, CloudKitty, Keystone, Horizon
- Networking: Neutron, SR-IOV
- Compute and Workloads: Nova, Magnum, Slurm, Kubernetes, Octavia
Our technology stack is constantly evolving to meet market and user requirements, so we will help you get up to speed as required.
We have team members based across the globe, but our HQ is in central Bristol, voted one of the best places to live in the UK. The office is a short walk from Bristol Temple Meads station.
Salary and benefits
- Competitive salary and bonus.
- Generous stock option scheme.
- Discretionary remote working / flexible working practices.
- Flexible working hours and home/office hybrid working.
- 25 days paid holiday.
- Pension contribution.
- Support for travel to conferences and delivering presentations.
- Learning and employee development are a priority, including work time dedicated to R&D and technology.