In our current digital age, no other combination of words has been more on everyone's
lips than "AI" and "Cloud", except perhaps "GPUs" and "sorry, how much!?".
Despite the ever-increasing cost of AI-capable infrastructure,
domestic and international collaborative research continues to grow.
Federated cloud resources provide a solution by pooling investments
in AI infrastructure among a group of collaborating institutions.
But that raises the question: how do you efficiently authenticate
and manage an evolving group of external users from various
institutions for specific services, without it becoming a logistical
nightmare?
Auth to a Great Start
One of OpenStack's strengths is its support for federated authentication.
This includes being able to configure Keystone
(OpenStack's identity service) to permit external users, without a local OpenStack account, to access OpenStack
services; this is achieved by off-loading identity verification to the user's federated identity provider.
When OpenStack services are accessed by a federated user, the basic authentication flow orchestrated by the Keystone
service can be outlined in four steps:
1. The OpenStack service provider redirects the user to their institution's identity provider, requesting authentication.
2. The user provides credentials and is authenticated at the identity provider.
3. The user is referred back to the OpenStack service provider with further user metadata.
4. Finally, the OpenStack service provider determines authorisation depending on the metadata provided.
A more detailed explanation of the background of OpenStack federated
authentication can be found in a chapter of The Crossroads of Cloud and HPC,
written by members of the OpenStack Scientific SIG.
Mapping and registering federated authentication metadata claims into
authorisations on the OpenStack cloud is complex enough as it is,
and the complexity is multiplied by the variations in the metadata claims returned
by the various identity providers from the different institutions; which
is neither efficient nor good for your blood pressure. This is where Keycloak
comes in, acting as a proxy to the federated identity providers and presenting consistent metadata claims to the OpenStack Keystone service for mapping to authorisations.
More on Keycloak's role in federated authentication between cloud systems can be found in a previous blog post.
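To make the mapping side concrete, here is an illustrative Keystone mapping rule (not taken from any particular deployment; the group name, domain and claim prefix are placeholders) that consumes a consistent claim presented by Keycloak and maps the federated user into a local group:

```json
[
  {
    "local": [
      { "user": { "name": "{0}" } },
      { "group": { "name": "federated-users", "domain": { "name": "Default" } } }
    ],
    "remote": [
      { "type": "OIDC-preferred_username" }
    ]
  }
]
```

A rule along these lines would typically be registered with Keystone via the OpenStack CLI, e.g. `openstack mapping create --rules rules.json <mapping-name>`.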
Clouded by Federated Frustrations
Cloud federations are a great way to provide users with access to compute resources
without requiring them to make local investments in additional hardware. However, managing
the authentication of federated users can be an operational burden, particularly given
the diversity of compute platforms and their authentication requirements.
Picture a scenario where a researcher from an external institute
requires access to a Slurm cluster deployed on a federated cloud.
A conventional process for setting up an authorised external user may
begin with a request for a temporary account in the hosting
institution's LDAP service. On-boarding a user usually entails
further administrative toil, such as creating a local mailbox, signing
institutional agreements and periodically revalidating the local
account.
Alternatively, an institution may treat users authenticated via a
federated partner as equivalent to local users, and accept authorisations granted
by the federation at large. Even then, OpenID Connect's web-based authentication
flow would still require a separate mechanism for the management of SSH
public keys and their installation, rotation and revocation.
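As a sketch of the manual process this entails (illustrative paths and names, not any institution's actual practice), each collaborator's public key must be installed by hand, on every target host, and edited again whenever it is rotated or revoked:

```shell
# Generate a key pair for an external researcher (illustrative):
ssh-keygen -q -t ed25519 -f ./demo_user_key -C 'researcher@partner.example' -N ''

# Install the public key in the target account's authorized_keys on each host:
mkdir -p ./demo_home/.ssh && chmod 700 ./demo_home/.ssh
cat ./demo_user_key.pub >> ./demo_home/.ssh/authorized_keys
chmod 600 ./demo_home/.ssh/authorized_keys

# Rotation or revocation means editing this file again, on every host,
# for every user: exactly the toil that certificate-based access removes.
```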
A new way to automate and streamline the process of registering and
authorising user SSH keys has been developed by our colleagues at
the Bristol Centre for Supercomputing (BriCS).
BriCS is host to the Isambard-AI service,
a cornerstone of the UK's AI Research Resource Federation.
The BriCS team have developed a certificate authority, Conch, and an SSH connection
manager, Clifton. By building on these components we were able to create an authentication
flow using SSH certificates suitable for federated projects. These certificates differ from the
traditional pairing of private and public SSH keys in a number of ways. Chiefly, binding
a user's account to a newly generated public SSH key in the form of a certificate removes the need for
SSH keys to be manually installed in the target system's ~/.ssh/authorized_keys ; keys which
eventually require rotation, starting that process all over again. Instead, we have the user present
an SSH key and authenticate their account; after which they're returned a signed SSH certificate,
granting them instant access to their permitted services.
Now, that was a brief description of the user-facing side of the process, which obviously doesn't
account for what actually happens behind the scenes or the roles Conch & Clifton play in it.
An outline of the services, and how it all comes together, follows below!
Conch
As the SSH certificate authority (CA), Conch is responsible for the signing & issuing of SSH
certificates. As such, the SSH CA must be trusted by the identity provider, user and target
client.
Conch works by communicating with an OIDC provider, which itself is likely configured
to trust authorisations from federated institutional identity services. Once Conch has received
a user's metadata and SSH key from Clifton, it waits for the user to log into their
account to prove their identity; the success or failure of which is reported via OIDC
to Conch. If successful, Conch then takes the user's metadata and SSH key, and signs them
together with Conch's own private key, before the signed SSH certificate is returned to the user.
Conch's CA public key (the public counterpart to the private signing key)
is added to the target compute platform's /etc/ssh/sshd_config via the TrustedUserCAKeys directive. This
is so that the certificate's signature can be verified, granting the user access.
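The signing and verification roles described above can be sketched locally with plain OpenSSH tooling; the key file names and the principal alice.proj1 below are hypothetical:

```shell
# Conch's role: an SSH CA key pair (the private half stays with the CA):
ssh-keygen -q -t ed25519 -f ./demo_ca_key -C 'conch-ca' -N ''

# The user's key pair, whose public half would be presented via Clifton:
ssh-keygen -q -t ed25519 -f ./demo_alice_key -C '' -N ''

# Sign the user's public key into a certificate, embedding a certificate
# identity, a principal (the account it is valid for) and a validity window:
ssh-keygen -s ./demo_ca_key -I 'alice' -n 'alice.proj1' -V '+1h' ./demo_alice_key.pub

# Inspect the resulting certificate; sshd verifies its signature against
# the CA public key listed in TrustedUserCAKeys:
ssh-keygen -L -f ./demo_alice_key-cert.pub
```

Note that nothing needs to be installed in the target account's authorized_keys; trust flows entirely from the CA public key configured on the host.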
Clifton
Clifton is installed on the user's system.
Being the SSH connection manager, Clifton acts as the middleman between the user and Conch;
an SSH certificate broker, if you will. The user presents their public SSH key to Clifton,
wanting it signed, and Clifton, on behalf of Conch, requests the user's metadata
and proof of their identity in exchange. This is elegantly done by launching a
browser, or presenting a QR code, pointing to an online authentication portal.
Clifton offering a user a QR code to authenticate their account.
Once provided, Clifton then forwards all this information to Conch, who, if all the requirements
are met, then signs the user's SSH key. Clifton then returns the signed SSH certificate to the
user.
Unlocking the Cloud for Federated Freedom
Clifton's power is in how it handles and uses a range of custom metadata
fields, finally providing federated institutes with the level of per-user control and permission
customisation that they've been craving.
One such example is the ability to assign a list of projects to a user's account as part
of their metadata. This in turn provides a range of possibilities: by default, an individual
certificate is provided for each project, each with its own unique user defined as
{username}.{project_name} . However, with a little Rust know-how (the programming language,
not the 2013 multiplayer survival game, which was sadly not written in Rust) it is possible to
scope a single account to a dynamic set of projects whose permissions are inherited through
defined group scopes.
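As an illustration of the default {username}.{project_name} scheme, the per-project certificate users can be derived from a (hypothetical) projects attribute with jq:

```shell
# A hypothetical 'projects' attribute for user 'alice':
projects='{"proj1": ["slurm.ai.example"], "proj2": ["slurm.ai.example2"]}'

# Derive the per-project certificate user names:
echo "$projects" | jq -r 'keys[] | "alice." + .'
# alice.proj1
# alice.proj2
```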
At StackHPC, we have taken advantage of this feature to
integrate with the HPC management platform Waldur, providing
a seamless experience for users to access project-scoped Slurm clusters, with all the stress
of managing authentication and permissions taken care of by Keycloak, Conch & Clifton.
We'll have the headaches, so you don't have to!
Building Bridges
As Conch and Clifton are both still in development, there are still a few kinks to iron out when
it comes to installing and configuring them. Therefore, below is a brief outline of the steps
required to get them up and running using Helm.
Note
The following steps assume that you have an external Kubernetes cluster running, Helm
installed with access to the cluster's kubeconfig , and the kubeconfig path exported to
the KUBECONFIG environment variable.
Before we can deploy Conch, the OIDC provider, in this case Keycloak, needs to be configured to map
a few custom user attributes so that some variables from Keycloak are made available to Conch.
As the user attributes Conch is expecting are not part of the default Keycloak user attributes, they
need to be added as custom attribute fields which can be filled in for each user. To make these attributes
appear as fields in the user's profile, they need to be created in Realm Settings -> User profile
-> Attributes -> Create attribute .
Note
Realms are isolated spaces where users, permissions and groups,
amongst other things, are managed within Keycloak.
The two attributes that need to be added are:
short_name : The user's short name.
projects : A JSON list of project names the user is assigned to.
Note
Make sure to set 'Enabled when' to 'Always' for both attributes.
Once configured they should appear as fields in the user's profile, and can be filled in as needed.
However, Keycloak now needs to know to pass these attributes onto Conch. This is done by:
1. Create a client scope called something like 'extra' in the Client scopes menu.
2. Set Assigned type to Default , then click on the 'extra' client scope.
3. Within 'extra', in the Mappers tab, click Add mapper , select By configuration , then select User Attribute .
4. Fill in the following fields for short_name :
   - Name : short_name
   - User Attribute : select 'short_name' from the dropdown.
   - Token Claim Name : short_name
   - Claim JSON Type : String
   - Add to ID token : On
   - Add to access token : On
5. Repeat steps 3 & 4, filling in the following fields for projects :
   - Name : projects
   - User Attribute : select 'projects' from the dropdown.
   - Token Claim Name : projects
   - Claim JSON Type : JSON
   - Add to ID token : On
   - Add to access token : On
Keycloak custom Client scopes & Mappers.
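With the two mappers in place, tokens issued for a user should carry claims along these lines (an illustrative decoded token excerpt with standard claims omitted; the user and project names are placeholders):

```json
{
  "preferred_username": "alice",
  "short_name": "alice",
  "projects": {
    "proj1": ["slurm.ai.example"],
    "proj2": ["slurm.ai.example2", "random.example"]
  }
}
```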
Now we need a Keycloak client to point the authentication to:
1. Create a new client in the Clients menu.
2. Fill in the following fields:
   - Client ID : conch
   - Valid Redirect URIs : *
   - Valid post logout redirect URIs : *
   - Web origins : *
   - Authentication Flow : select 'Standard Flow', 'Direct Access Grants' and 'OAuth 2.0 Device Authorization Grant'.
3. Click Save .
Note
Don't forget to set the user's new attributes in the user's profile.
The projects attribute should be a JSON object mapping project names to lists of services, like so:
{"proj1": ["slurm.ai.example"], "proj2": ["slurm.ai.example2", "random.example"]}
From here, after making sure your KUBECONFIG variable has been exported,
we are ready to configure and deploy Conch using Helm:
Create a values.yaml file with the following content:
---
config:
  issuer: "https://keycloak.example.address.com/realms/example-realm-name"
  platforms:
    service-ood:
    service-login:
      alias: "conch.auth"
      hostname: target IP address
      proxy_jump: can use the same as hostname but cannot be blank
      port: 3000
      signing_key_dir: "directory/in/hostname/where/keys/are/stored"
  log_level: info
replicas:
ssh_signing_key_secret_name: conch-signing-key-secret
Create signing keys for Conch:
ssh-keygen -q -t ed25519 -f ssh_signing_key -C '' -N ''
Copy the public key to the hostname target client:
# If running on a local machine
scp ssh_signing_key.pub user@hostname_ip:/etc/ssh/ssh_signing_key.pub
# If running from the target client
cp ssh_signing_key.pub /etc/ssh/ssh_signing_key.pub
Add the public key to the target client's /etc/ssh/sshd_config :
TrustedUserCAKeys /path/to/ssh_signing_key.pub
Create a Kubernetes secret with the private signing key:
kubectl create secret generic conch-signing-key-secret --from-file=key=ssh_signing_key
Deploy Conch using Helm:
helm upgrade conch oci://ghcr.io/isambard-sc/charts/conch --version x.y.z --install --values values.yaml
Note
Avoid using a resolvable domain name for alias as it will cause issues
when it is added to the user's ~/.ssh/config .
In order to be able to access Conch externally you may need to configure an Ingress with
a configuration similar to the following:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: conch
  annotations:
    cert-manager.io/cluster-issuer: only applicable if using cert-manager
    meta.helm.sh/release-name: helm release name for conch
    meta.helm.sh/release-namespace: helm release namespace for conch
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - ca.hostname-IP-address-here.sslip.io # dashes instead of dots in the IP address, as in the rule below
      secretName: conch-signing-key-secret
  rules:
    - host: ca.123-123-123-123.sslip.io
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: conch
                port:
                  number: 80
Next up is Clifton, which is a little simpler:
Clone the Clifton repository:
git clone https://github.com/isambard-sc/clifton.git
From inside the Clifton directory, build the Clifton binary:
cargo build
Create a config.toml file with the following content:
# Should the browser be automatically opened when authenticating
open_browser = true
# Should the QR code be shown when authenticating
show_qr = true
# The URL of the Keycloak realm
issuer_url = "https://keycloak.example.address.com/realms/example-realm-name"
# The OIDC (Keycloak) client ID
client_id = "conch"
# The default location of the identity to use
identity = "/path/to/the/ssh_key"
Still from inside the Clifton directory, run Clifton:
./target/debug/clifton --config-file /path/to/your/clifton/config.toml auth
You should be presented with a QR code to scan, or your browser should open to authenticate. In order to
log in, the user must exist in Keycloak under Users .
Once authenticated, save the ssh config by running:
./target/debug/clifton ssh-config write
This will create a new file, ~/.ssh/config_clifton , and add Include ~/.ssh/config_clifton
to your ~/.ssh/config file.
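The exact contents of ~/.ssh/config_clifton are managed by Clifton, but conceptually it carries entries along these lines (the host alias, address, user and key file names here are purely illustrative):

```
Host conch.auth
    HostName 123.123.123.123
    User alice.proj1
    IdentityFile ~/.ssh/ssh_key
    CertificateFile ~/.ssh/ssh_key-cert.pub
```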
Congratulations, you are now the proud new owner of an automated SSH certificate distributor
with federation capabilities!
Conch-clusion
Here we have discussed the benefits of federated cloud resources, the challenges faced when
authenticating external users, and how these are amplified when considering SSH authentication.
However, we have also demonstrated how, with the use of Conch & Clifton, SSH
certificates can provide a pathway to a more manageable and secure user onboarding experience.
These may be early days, but the potential for these tools to be used in federated cloud projects
certainly looks promising. The main blockade preventing wider adoption as it stands is the
lack of native support by cloud management platforms to streamline the integration of services like
Conch & Clifton, with the main trouble coming from the large number of unique user attribute
mappings required for each service. Therefore, if user attributes were to be standardised or shared
across a cloud management service, like Waldur, then configuring federated SSH authentication would
seemingly become a trivial task.
Acknowledgements
I would like to take a moment to specifically thank and highlight Matt Williams,
who is the mastermind behind both Conch and Clifton. His work has helped collaborative research via
federated cloud take another step closer to being a reality.
Get in touch
If you would like to get in touch we would love to hear from you. Reach out to
us via Twitter, LinkedIn or directly via our contact
page.