In our current digital age, no other combination of words has been more on everyone's
lips than "AI" and "Cloud", except perhaps "GPUs" and "sorry, how much!?".
Despite the ever-increasing cost of AI-capable infrastructure,
domestic and international collaborative research continues to grow.
Federated cloud resources provide a solution by pooling investments
in AI infrastructure among a group of collaborating institutions.
But that raises the question: how do you efficiently authenticate
and manage an evolving group of external users from various
institutions for specific services, without it becoming a logistical
nightmare?
Auth to a Great Start
One of OpenStack's strengths is its support for federated authentication.
Keystone (OpenStack's identity service) can be configured to permit external users,
without a local OpenStack account, to access OpenStack services. This is achieved by
off-loading identity verification to the user's federated identity provider.
When OpenStack services are accessed by a federated user, the basic authentication flow orchestrated by the Keystone
service can be outlined by four steps:
- The OpenStack service provider redirects the user to their institution's identity provider, requesting authentication.
- The user provides credentials and is authenticated at the identity provider.
- The user is redirected back to the OpenStack service provider, along with metadata about the user.
- Finally, the OpenStack service provider determines authorisation depending on the metadata provided.
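As a rough illustration of the final step, the sketch below turns the group claims returned by an identity provider into role assignments. The rule table, group names and project name are made up for the example, and this is not Keystone's actual mapping format, just the idea behind it.

```python
# Hypothetical rule table: group claim -> (project, role).
# Keystone expresses this with its own mapping rules; this only shows the idea.
RULES = {
    "ai-users": ("ai-research", "member"),
    "ai-admins": ("ai-research", "admin"),
}


def authorise(claims):
    """Turn the groups asserted by the identity provider into role assignments."""
    groups = claims.get("groups", [])
    return sorted(RULES[group] for group in groups if group in RULES)


# Unknown groups grant nothing; known groups grant their mapped role.
assignments = authorise({"groups": ["ai-users", "chess-club"]})
```

A user with no recognised groups simply ends up with an empty list of assignments, which is the safe default for a federated user nobody has authorised yet.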
A more detailed explanation of the background to OpenStack federated
authentication can be found in a chapter of The Crossroads of Cloud and HPC,
written by members of the OpenStack Scientific SIG.
Mapping federated authentication metadata claims onto
authorisations in the OpenStack cloud is complex enough as it is.
The complexity is multiplied by the variations in the metadata claims returned
by the identity providers of the different institutions, which
is neither efficient nor good for your blood pressure. This is where Keycloak
comes in, acting as a proxy to the federated identity providers and presenting consistent
metadata claims to the OpenStack Keystone service for mapping to authorisations.
More on Keycloak's role in federated authentication between cloud systems can be found in a previous blog post.
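To make the proxying idea concrete, here is a minimal sketch of claim normalisation: two identity providers describe the same kind of user with different claim names, and a single alias table folds them into one consistent shape. The alias table and the sample claims are assumptions for illustration only; in practice Keycloak does this through its identity provider mappers rather than through code like this.

```python
# Hypothetical claim names; real identity providers will differ.
CLAIM_ALIASES = {
    "username": ("preferred_username", "eduPersonPrincipalName", "uid"),
    "email": ("email", "mail"),
    "groups": ("groups", "isMemberOf"),
}


def normalise_claims(raw):
    """Return the claims under one consistent set of names."""
    normalised = {}
    for target, aliases in CLAIM_ALIASES.items():
        for alias in aliases:
            if alias in raw:
                normalised[target] = raw[alias]
                break  # first matching alias wins
    return normalised


idp_a = {"preferred_username": "alice", "email": "alice@uni-a.example", "groups": ["ai-users"]}
idp_b = {"uid": "bob", "mail": "bob@uni-b.example", "isMemberOf": ["ai-users"]}

claims_a = normalise_claims(idp_a)
claims_b = normalise_claims(idp_b)
```

Both results now carry the same three keys, so whatever consumes them downstream only needs one set of mapping rules.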
Clouded by Federated Frustrations
Cloud federations are a great way to provide users with access to compute resources
without requiring them to make local investments in additional hardware. However, managing
the authentication of federated users can be an operational burden, particularly given
the diversity of compute platforms and their authentication requirements.
Picture a scenario where a researcher from an external institute
requires access to a Slurm cluster deployed on a federated cloud.
A conventional process for setting up an authorised external user may
begin with a request for a temporary account in the hosting
institution's LDAP service. On-boarding a user usually entails
further administrative toil, such as a local mailbox, signing of
institutional agreements and periodic revalidation of the local
account.
Alternatively, an institution may treat users authenticated via a
federated partner as equivalent to its own, and accept authorisations granted
by the federation at large. Even then, OpenID Connect's web-based authentication
flow would still require a separate mechanism for managing SSH
public keys and their installation, rotation and revocation.
A new way to automate and streamline the process of registering and
authorising user SSH keys has been developed by our colleagues at
the Bristol Centre for Supercomputing.
BriCS is host to the Isambard-AI service,
a cornerstone of the UK's AI Research Resource Federation.
The BriCS team have developed a certificate authority, Conch, and an SSH connection
manager, Clifton. By building on these components we were able to create an authentication
flow using SSH certificates suitable for federated projects. These certificates differ from the
traditional pairing of private and public SSH keys in a number of ways. Mainly, binding
a user's account to a public SSH key in the form of a certificate removes the need for
public keys to be manually installed in ~/.ssh/authorized_keys in the user's home directory on the
target system, a process that has to start all over again whenever a key is rotated. Instead, the user
presents an SSH key and authenticates their account, after which they are returned a signed SSH certificate
granting them instant access to their permitted services.
That was a brief description of the user-facing side of the process, which doesn't
account for what happens behind the scenes or the roles Conch and Clifton play in it.
An outline of the services and how they fit together follows.
Conch
As the SSH certificate authority (CA), Conch is responsible for signing and issuing SSH
certificates. As such, the SSH CA must be trusted by the identity provider, user and target
client.
Conch communicates with an OIDC provider, which itself is likely configured
to trust authorisations from federated institutional identity services. Once Conch has received
a user's metadata and public SSH key from Clifton, it waits for the user to log in to their
account to prove their identity, the success or failure of which is reported to Conch via OIDC.
If successful, Conch combines the user's metadata and public key and signs them
with its own private key, and the signed SSH certificate is returned to the user.
Conch's CA public key (the public counterpart to the private signing key)
is referenced in the target compute platform's /etc/ssh/sshd_config via the TrustedUserCAKeys option. This
is so that the certificate's signature can be verified, granting the user access.
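The checks a server applies to a user certificate can be modelled in a few lines. This is a deliberately simplified sketch: the `Certificate` class and `accepts` function are hypothetical names, the fingerprint string is a placeholder, and a real SSH server also verifies the CA's signature cryptographically rather than comparing a label.

```python
from dataclasses import dataclass


@dataclass
class Certificate:
    key_id: str          # free-text label recorded in the server logs
    principals: tuple    # account names the certificate may log in as
    valid_after: int     # start of the validity window (Unix time)
    valid_before: int    # end of the validity window (Unix time)
    ca_fingerprint: str  # stands in for "signed by this CA key"


def accepts(cert, login_user, now, trusted_cas):
    """Simplified model of the server-side checks on a user certificate."""
    return (
        cert.ca_fingerprint in trusted_cas          # signed by a trusted CA
        and login_user in cert.principals           # allowed to be this user
        and cert.valid_after <= now < cert.valid_before  # not expired
    )


cert = Certificate("alice", ("alice.proj1",), 100, 200, "SHA256:demo-ca")
trusted = {"SHA256:demo-ca"}
```

Because the certificate carries its own expiry, revoking access is mostly a matter of not issuing the next one, with nothing to clean up in anybody's home directory.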
Clifton
Clifton is installed on the user's system.
Being the SSH connection manager, Clifton acts as the middleman between the user and Conch,
an SSH certificate broker, if you will. The user presents their SSH key to Clifton
wanting the public half signed, and Clifton, on behalf of Conch, requests the user's metadata
and proof of their identity in exchange. This is elegantly done by launching a
browser, or presenting a QR code, pointing to an online authentication portal.
Clifton then forwards all this information to Conch, which, if all the requirements
are met, signs the user's public key. Clifton then returns the signed SSH certificate to the
user.
Unlocking the Cloud for Federated Freedom
Clifton's power is in how it handles and uses a range of custom metadata
fields, finally providing federated institutes with the level of per-user control and permission
customisation that they have been craving.
One example is the ability to assign a list of projects to a user's account as part
of their metadata, which opens up a range of possibilities. By default, an individual
certificate is provided for each project, each with its own unique user defined as
{username}.{project_name}. However, with a little Rust know-how (the programming language,
not the 2013 multiplayer survival game, which was sadly not written in Rust) it is possible to
scope a single account to a dynamic set of projects whose permissions are inherited through
defined group scopes.
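The default naming scheme is easy to picture with a small sketch. The function name is hypothetical and the sample data mirrors the shape of the projects attribute used later in this post; only the {username}.{project_name} pattern itself comes from the behaviour described above.

```python
def project_usernames(short_name, projects):
    """One account name per project, following {username}.{project_name}."""
    return [f"{short_name}.{project}" for project in sorted(projects)]


projects = {
    "proj1": ["slurm.ai.example"],
    "proj2": ["slurm.ai.example2", "random.example"],
}

print(project_usernames("alice", projects))
# prints ['alice.proj1', 'alice.proj2']
```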
At StackHPC, we have taken advantage of this feature to make it work with
the HPC management platform Waldur, providing
a seamless experience for users accessing project-scoped Slurm clusters, with all the stress
of managing authentication and permissions taken care of by Keycloak, Conch and Clifton.
We'll have the headaches, so you don't have to!
Building Bridges
As Conch and Clifton are both still in development, there are a few kinks to iron out when
it comes to installing and configuring them. What follows is a brief outline of the steps
required to get them up and running using Helm.
Note
The following steps assume that you have an external Kubernetes cluster running and Helm
installed, with the path to the cluster's kubeconfig exported in
the KUBECONFIG environment variable.
Before we can deploy Conch, the OIDC provider, in this case Keycloak, needs to be configured to map
a few custom user attributes so that some variables from Keycloak are made available to Conch.
As the user attributes Conch is expecting are not part of the default Keycloak user attributes, they
need to be added as a custom attribute field which can be filled for each user. To make these attributes
appear as fields in the user's profile, they need to be created in Realm Settings -> User profile
-> Attributes -> Create attribute.
Note
Realms are isolated spaces where users, permissions and groups,
amongst other things, are managed within Keycloak.
The two attributes that need to be added are:
- short_name: The user's short name.
- projects: A JSON object keyed by the names of the projects the user is assigned to.
Note
Make sure to set Enabled when to Always for both attributes.
Once configured they should appear as fields in the user's profile, and can be filled in as needed.
However, Keycloak now needs to know to pass these attributes on to Conch. This is done as follows:
1. Create a client scope called something like 'extra' in the Client scopes menu.
2. Set Assigned type to Default, then click on the 'extra' client scope.
3. Within 'extra', in the Mappers tab, click Add mapper and select By configuration.
4. Select User Attribute.
5. Fill in the following fields for short_name:
   - Name: short_name
   - User Attribute: select 'short_name' from the dropdown.
   - Token Claim Name: short_name
   - Claim JSON Type: String
   - Add to ID token: On
   - Add to access token: On
6. Repeat steps 3 and 4, this time filling in the following fields for projects:
   - Name: projects
   - User Attribute: select 'projects' from the dropdown.
   - Token Claim Name: projects
   - Claim JSON Type: JSON
   - Add to ID token: On
   - Add to access token: On
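Once the mappers are in place, it can be reassuring to look inside a token and confirm the two claims are really there. The sketch below decodes the payload segment of a JWT using only the standard library. It does not verify the signature, so it is for debugging only, and the helper names are hypothetical.

```python
import base64
import json


def decode_payload(token):
    """Decode the (unverified) payload segment of a JWT, for inspection only."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore the padding that JWTs strip
    return json.loads(base64.urlsafe_b64decode(payload))


def missing_claims(claims, required=("short_name", "projects")):
    """List the claims Conch is expecting that are absent from the token."""
    return [name for name in required if name not in claims]
```

Pasting a token obtained from Keycloak into `decode_payload` and passing the result to `missing_claims` should return an empty list if the 'extra' scope is doing its job.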
Now we need a Keycloak client to point the authentication to:
- Create a new client in the Clients menu.
- Fill in the following fields:
  - Client ID: conch
  - Valid Redirect URIs: *
  - Valid post logout redirect URIs: *
  - Web origins: *
  - Authentication Flow: Select 'Standard Flow', 'Direct Access Grants' and 'OAuth 2.0 Device Authorization Grant'.
- Click Save.
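A quick way to sanity-check the realm after saving is to look at its OIDC discovery document, which lives at a well-known path under the issuer URL. The sketch below builds that URL and checks an already-fetched metadata dictionary for a device authorisation endpoint. The function names are hypothetical, and whether Clifton relies on this exact endpoint is an assumption based on the grant enabled above.

```python
def discovery_url(issuer):
    """Standard OIDC discovery location for a given issuer URL."""
    return issuer.rstrip("/") + "/.well-known/openid-configuration"


def supports_device_flow(metadata):
    """True if the provider metadata advertises a device authorisation endpoint."""
    return "device_authorization_endpoint" in metadata


url = discovery_url("https://keycloak.example.address.com/realms/example-realm-name")
```

Fetching `url` in a browser and searching for `device_authorization_endpoint` gives the same answer without writing any code at all.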
Note
Don't forget to set the user's new attributes in the user's profile.
The projects attribute should be a JSON object keyed by project name, like so:
{\"proj1\": [\"slurm.ai.example\"], \"proj2\": [\"slurm.ai.example2\", \"random.example\"]}
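Since the attribute is typed in by hand, a small validator can catch mistakes before they reach Conch. The shape checked here (an object mapping each project name to a list of strings) is inferred from the example above rather than taken from a published schema, and the function name is hypothetical.

```python
import json


def parse_projects(raw):
    """Parse the projects attribute and check it maps names to lists of strings."""
    value = json.loads(raw)
    if not isinstance(value, dict):
        raise ValueError("projects must be a JSON object")
    for project, entries in value.items():
        is_string_list = isinstance(entries, list) and all(
            isinstance(entry, str) for entry in entries
        )
        if not is_string_list:
            raise ValueError(f"project {project!r} must map to a list of strings")
    return value


example = '{"proj1": ["slurm.ai.example"], "proj2": ["slurm.ai.example2", "random.example"]}'
parsed = parse_projects(example)
```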
From here, after making sure your KUBECONFIG variable has been exported,
we are ready to configure and deploy Conch using Helm:
- Create a values.yaml file with the following content:
---
config:
issuer: "https://keycloak.example.address.com/realms/example-realm-name"
platforms:
service-ood:
service-login:
alias: "conch.auth"
hostname: target IP address
proxy_jump: can use the same as hostname but cannot be blank
port: 3000
signing_key_dir: "directory/in/hostname/where/keys/are/stored"
log_level: info
replicas:
ssh_signing_key_secret_name: conch-signing-key-secret
- Create signing keys for Conch:
ssh-keygen -q -t ed25519 -f ssh_signing_key -C '' -N ''
- Copy the public key to the hostname target client:
# If running on a local machine
scp ssh_signing_key.pub user@hostname_ip:/etc/ssh/ssh_signing_key.pub
# If running from the target client
cp ssh_signing_key.pub /etc/ssh/ssh_signing_key.pub
- Add the public key to the target client's /etc/ssh/sshd_config:
TrustedUserCAKeys /path/to/ssh_signing_key.pub
- Create a Kubernetes secret with the private signing key:
kubectl create secret generic conch-signing-key-secret --from-file=key=ssh_signing_key
- Deploy Conch using Helm:
helm upgrade conch oci://ghcr.io/isambard-sc/charts/conch --version x.y.z --install --values values.yaml
Note
Avoid using a resolvable domain name for alias as it will cause issues
when it is added to the user's ~/.ssh/config.
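If logins are refused after deployment, one of the first things worth checking is whether the target's sshd_config really names the CA public key. The helper below is a hypothetical sketch that scans the text of a config file for TrustedUserCAKeys lines. It ignores Match blocks and Include files, so treat it as a first look rather than a faithful parser.

```python
def trusted_ca_paths(sshd_config_text):
    """Return the paths named by TrustedUserCAKeys lines, skipping comments."""
    paths = []
    for line in sshd_config_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        parts = stripped.split(None, 1)  # keyword, then the rest of the line
        # Keywords in sshd_config are case-insensitive.
        if len(parts) == 2 and parts[0].lower() == "trustedusercakeys":
            paths.append(parts[1].strip())
    return paths


sample = "Port 22\nTrustedUserCAKeys /etc/ssh/ssh_signing_key.pub\n"
print(trusted_ca_paths(sample))
# prints ['/etc/ssh/ssh_signing_key.pub']
```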
To access Conch externally you may need to configure an Ingress with
a configuration similar to the following:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: conch
  annotations:
    cert-manager.io/cluster-issuer: Only applicable if using cert-manager
    meta.helm.sh/release-name: helm release name for conch
    meta.helm.sh/release-namespace: helm release namespace for conch
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - ca.hostname-IP-address-here.sslip.io  # dashes instead of dots in the IP address, example below
      secretName: conch-signing-key-secret
  rules:
    - host: ca.123-123-123-123.sslip.io
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: conch
                port:
                  number: 80
Next up is Clifton, which is a little simpler:
- Clone the Clifton repository:
git clone https://github.com/isambard-sc/clifton.git
- From inside the Clifton directory build the Clifton binary (a default debug build, which produces the ./target/debug/clifton binary used below):
cargo build
- Create a config.toml file with the following content:
# Should the browser be automatically opened when authenticating
open_browser = true
# Should the QR code be shown when authenticating
show_qr = true
# The URL of the Keycloak realm
issuer_url = "https://keycloak.example.address.com/realms/example-realm-name"
# The OIDC (Keycloak) client ID
client_id = "conch"
# The default location of the identity to use
identity = "/path/to/the/ssh_key"
- From inside the Clifton directory still, run Clifton:
./target/debug/clifton --config-file /path/to/your/clifton/config.toml auth
You should be presented with a QR code to scan, or your browser should open to authenticate. In order to
log in the user must exist in Keycloak under Users.
- Once authenticated, save the ssh config by running:
./target/debug/clifton ssh-config write
This will create a new file, ~/.ssh/config_clifton, and add Include ~/.ssh/config_clifton
to your ~/.ssh/config file.
Congratulations, you are now the proud new owner of an automated SSH certificate distributor
with federation capabilities!
Conch-clusion
Here we have discussed the benefits of federated cloud resources, the challenges faced when
authenticating external users and how these are amplified when considering SSH authentication.
However, we have also demonstrated how, with Conch and Clifton, SSH
certificates can provide a pathway to a more manageable and secure user onboarding experience.
These may be early days, but the potential for these tools to be used in federated cloud projects
certainly looks promising. The main obstacle to wider adoption as it stands is the
lack of native support in cloud management platforms for streamlining the integration of services like
Conch and Clifton, with the main trouble coming from the number of unique user attribute
mappings required for each service. If user attributes were standardised or shared
across a cloud management service like Waldur, then configuring federated SSH authentication would
seemingly become a trivial task.
Acknowledgements
I would like to take a moment to specifically thank and highlight Matt Williams,
who is the mastermind behind both Conch and Clifton. His work has helped collaborative research via
federated cloud take another step closer to being a reality.
Get in touch
If you would like to get in touch we would love to hear from you. Reach out to
us via Twitter, LinkedIn or directly via our contact
page.