Published: Wed 10 October 2018
Updated: Wed 10 October 2018
By Bharat Kunwar
Stig Telfer
In Data.
tags: beegfs deployment data baremetal cluster
BeeGFS is a parallel file system suitable for High Performance
Computing, with a proven track record in the scalable storage space.
In this article, we explore how the different components of BeeGFS
fit together and how we have incorporated them into an Ansible role
for a seamless storage cluster deployment experience.
We've previously described ways of integrating OpenStack and
High-Performance Data. In this post we'll focus on some practical
details of how to dynamically provision BeeGFS filesystems and/or
clients running in cloud environments. There are actually no
dependencies on OpenStack here, although we do like to draw our
Ansible inventory from Cluster-as-a-Service infrastructure.
As described here, BeeGFS comprises several components, which will be
familiar concepts to those working in the parallel file system space:
Management service: for registering and watching all other services
Storage service: for storing the distributed file contents
Metadata service: for storing access permissions and striping info
Client service: for mounting the file system to access stored data
Admon service (optional): for presenting administration and
monitoring options through a graphical user interface.
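Each of these components runs as a systemd service once deployed. As a
quick sanity check after deployment, their status can be queried on the
relevant hosts (a minimal sketch, assuming the standard service names
from the upstream BeeGFS packages):
# on the management host
systemctl status beegfs-mgmtd
# on metadata and storage hosts
systemctl status beegfs-meta beegfs-storage
# on the clients
systemctl status beegfs-client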
Introducing our Ansible role for BeeGFS...
We have an Ansible role published on Ansible Galaxy which handles the
end-to-end deployment of BeeGFS. It takes care of details all the way
from deployment of management, storage and metadata servers to setting
up client nodes and mounting the storage point. To install, simply run:
ansible-galaxy install stackhpc.beegfs
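Alternatively, if you manage roles through a requirements file, a
minimal requirements.yml along these lines should work:
---
- src: stackhpc.beegfs
followed by:
ansible-galaxy install -r requirements.yml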
There is a README
that describes the role parameters and example usage.
An Ansible inventory is organised into groups, each representing a
different role within the filesystem (or its clients). An example
inventory-beegfs file with two hosts bgfs1 and bgfs2 may
look like this:
[leader]
bgfs1 ansible_host=172.16.1.1 ansible_user=centos

[follower]
bgfs2 ansible_host=172.16.1.2 ansible_user=centos

[cluster:children]
leader
follower

[cluster_beegfs_mgmt:children]
leader

[cluster_beegfs_mds:children]
leader

[cluster_beegfs_oss:children]
leader
follower

[cluster_beegfs_client:children]
leader
follower
Through controlling the membership of each inventory group, it is
possible to create a variety of configurations: for example,
client-only deployments, server-only deployments, or hyperconverged
use cases in which the filesystem servers are also the clients
(as above).
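For instance, a disaggregated variant might keep the management,
metadata and storage services on the bgfs hosts but mount the
filesystem from a separate set of compute hosts (the compute host
names and addresses below are purely illustrative):
[cluster_beegfs_mgmt]
bgfs1 ansible_host=172.16.1.1 ansible_user=centos

[cluster_beegfs_mds]
bgfs1

[cluster_beegfs_oss]
bgfs1
bgfs2 ansible_host=172.16.1.2 ansible_user=centos

[cluster_beegfs_client]
compute1 ansible_host=172.16.1.10 ansible_user=centos
compute2 ansible_host=172.16.1.11 ansible_user=centos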
A minimal Ansible playbook, which we shall refer to as beegfs.yml, to
configure the cluster may look something like this:
---
- hosts:
    - cluster_beegfs_mgmt
    - cluster_beegfs_mds
    - cluster_beegfs_oss
    - cluster_beegfs_client
  roles:
    - role: stackhpc.beegfs
      beegfs_state: present
      beegfs_enable:
        mgmt: "{{ inventory_hostname in groups['cluster_beegfs_mgmt'] }}"
        oss: "{{ inventory_hostname in groups['cluster_beegfs_oss'] }}"
        meta: "{{ inventory_hostname in groups['cluster_beegfs_mds'] }}"
        client: "{{ inventory_hostname in groups['cluster_beegfs_client'] }}"
        admon: no
      beegfs_mgmt_host: "{{ groups['cluster_beegfs_mgmt'] | first }}"
      beegfs_oss:
        - dev: "/dev/sdb"
          port: 8003
        - dev: "/dev/sdc"
          port: 8103
        - dev: "/dev/sdd"
          port: 8203
      beegfs_client:
        path: "/mnt/beegfs"
        port: 8004
      beegfs_interfaces:
        - "ib0"
      beegfs_fstype: "xfs"
      beegfs_force_format: no
      beegfs_rdma: yes
...
To create a BeeGFS cluster spanning the two nodes defined in the
inventory, run the playbook. The same playbook handles both the setup
and the teardown of the BeeGFS storage cluster components, depending
on whether the beegfs_state flag is set to present or absent:
# build cluster
ansible-playbook beegfs.yml -i inventory-beegfs -e beegfs_state=present
# teardown cluster
ansible-playbook beegfs.yml -i inventory-beegfs -e beegfs_state=absent
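After the build, it is worth checking that all services have registered
with the management service. One way to do this (a hedged sketch using
the standard BeeGFS command-line tools, run from any client node) is:
# list the metadata and storage nodes registered with the management service
beegfs-ctl --listnodes --nodetype=meta
beegfs-ctl --listnodes --nodetype=storage
# display the connections this client has established
beegfs-net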
The playbook is designed to fail if the path specified for the BeeGFS
storage service under beegfs_oss is already being used by another
service. To override this behaviour, pass the extra option -e
beegfs_force_format=yes. Be warned that this will cause data loss: it
formats the disk if a block device is specified, and also erases
management and metadata server data if there is an existing BeeGFS
deployment.
Highlights of the Ansible role for BeeGFS:
The idempotent role will leave state unchanged if the configuration
has not changed compared to the previous deployment.
The tuning parameters recommended by the BeeGFS maintainers themselves
for optimal storage server performance are set automatically.
The role can be used to deploy both storage-as-a-service and
hyperconverged
architecture by the nature of how roles are ascribed to hosts in
the Ansible inventory. For example, the hyperconverged case would
have storage and client services running on the same nodes while
in the disaggregated case, the clients are not aware of storage
servers.
Other things we learnt along the way:
BeeGFS is sensitive to hostnames: it expects them to be consistent and
permanent, and services refuse to start if the hostname changes. This
is worth being mindful of during the initial setup.
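Pinning a static hostname up front avoids this; for example (a trivial
sketch using the standard systemd tooling):
# set a static hostname that persists across reboots
hostnamectl set-hostname bgfs1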
This is unrelated to BeeGFS specifically, but under instruction from
Dell we had to set the -K flag when formatting NVMe devices, to
prevent mkfs from discarding blocks. Otherwise the disk would
disappear with the following error message:
[ 7926.276759] nvme nvme3: Removing after probe failure status: -19
[ 7926.349051] nvme3n1: detected capacity change from 3200631791616 to 0
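For reference, the manual equivalent (assuming XFS, as selected by
beegfs_fstype above; the device path is illustrative) looks like this:
# -K: do not discard blocks at mkfs time
mkfs.xfs -K /dev/nvme0n1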
Looking Ahead
The simplicity of BeeGFS deployment and configuration makes it a
great fit for automated cloud-native deployments. We have seen
a lot of potential in the performance of BeeGFS, and we hope to be
publishing more details from our tests in a future post.
We are also investigating the current state of Kubernetes integration,
using the emerging CSI driver API to support the attachment of
BeeGFS filesystems to Kubernetes-orchestrated containerised workloads.
Watch this space!
In the meantime, if you would like to get in touch, we would love to
hear from you. Reach out to us via Twitter or directly via our
contact page.