Automation, What Is It Good For?
Here at StackHPC we endeavour to automate everything we can when deploying and maintaining OpenStack and related services for our clients. Automation of these tasks is typically achieved with Ansible, which is a popular suite of software that enables provisioning, configuration and management of hardware and services via Infrastructure as Code (IaC). The advantage of using software such as Ansible is that these tasks are performed in an efficient, reliable and repeatable manner.
This blog post will explore automation going on behind the scenes rather than automation that directly interacts with client deployments. The tasks that have been automated include running tests, synchronising with upstream and tagging new commits ready for release, in addition to managing our GitHub organisation as a whole. These tasks are mundane and monotonous, and performing them manually would be a waste of developer time.
There are two automation services that have been utilised in this project. Firstly, GitHub Workflows has been used when the automation concerns itself with the contents and state of a GitHub repository. Secondly, Terraform has been used for configuring our GitHub organisation ensuring that the organisation matches the state described.
Automation Against Repositories
In this section we shall take a closer look at automation that takes place against repositories hosted on GitHub. GitHub provides a service known as GitHub Workflows which allows developers to compose jobs made up of steps or actions. These workflows can be triggered by various events such as git operations (push, merge or tag), the creation of pull requests, or manually via a workflow dispatch. When triggered, a workflow will be picked up either by a public runner maintained by GitHub or by a private runner. Workflows are written in YAML and are broken down into jobs, which are in turn broken down into steps. Steps can use Actions, which provide complex functionality similar to modules within Ansible. These actions are typically written in TypeScript; however, if you already have the desired functionality in a Python or Bash script then you can run that instead. The tasks that have been automated with the help of GitHub Workflows are as follows: test automation, upstream synchronisation and tag & release.
Performing tests is a crucial process within software development as thorough tests can help identify issues early. However, setting up an environment and running these tests can be quite time consuming, so instead we use a GitHub workflow which performs these tests every time someone opens a pull request (PR). This provides immediate insight into the contribution being made, allowing the contributor to make changes to meet the coding style or fix failing unit tests.
Here at StackHPC we deploy OpenStack based clouds and as a result we maintain forks of many OpenStack projects, such as CloudKitty, Kolla Ansible and Barbican. We manage these downstream forks to quickly introduce bug fixes and features before submitting them upstream. However, an issue arises whereby the two repositories fall out of sync and changes need to be pulled down from upstream. As time goes on the gap between downstream and upstream grows larger, making any attempt to catch up more challenging. Therefore, we use a GitHub Workflow which opens separate PRs containing all the changes for each of the currently supported release series (victoria, wallaby, xena, yoga). This workflow is configured to run once per week, on a Monday at 8:15 AM UTC. This is achieved using the support in GitHub Workflows for scheduling workflows with cron.
name: Upstream Sync
'on':
  schedule:
    - cron: '15 8 * * 1'
  workflow_dispatch:
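To make the schedule expression concrete, the following is a minimal Python sketch that works out when '15 8 * * 1' next fires. It is an illustration only and not how GitHub evaluates schedules: the helper names cron_matches and next_run are hypothetical, and only plain numbers and '*' are handled.

```python
from datetime import datetime, timedelta, timezone


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if `when` matches a five-field cron expression.

    Only plain numbers and '*' are supported, which is enough for
    '15 8 * * 1'. Ranges, lists and steps are deliberately left out.
    """
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError("expected five cron fields")
    # Cron counts Sunday as 0, so convert Python's ISO weekday (Mon=1..Sun=7).
    actual = [when.minute, when.hour, when.day, when.month, when.isoweekday() % 7]
    return all(f == "*" or int(f) == a for f, a in zip(fields, actual))


def next_run(expr: str, after: datetime) -> datetime:
    """Step forward one minute at a time until the expression matches."""
    candidate = after.replace(second=0, microsecond=0) + timedelta(minutes=1)
    while not cron_matches(expr, candidate):
        candidate += timedelta(minutes=1)
    return candidate


if __name__ == "__main__":
    # 1 June 2022 was a Wednesday, so the next run falls on the following Monday.
    start = datetime(2022, 6, 1, 12, 0, tzinfo=timezone.utc)
    print(next_run("15 8 * * 1", start))
```

Reading the fields left to right: minute 15, hour 8, any day of the month, any month, and day-of-week 1 (Monday), all in UTC.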
Automating these tasks has freed up developers to work on more important items. Also, with the assistance of automated testing, it has ensured that upstream work is merged in a timely manner without introducing bugs or conflicts.
The final workflow is responsible for tagging commits and publishing releases so they can be consumed within configurations and the StackHPC Release Train. We use git tags to make it easier to specify a version of a particular repository within our Kayobe Configuration. The tags follow the convention of using the stackhpc prefix followed by the latest upstream tag available with one additional version number. For example, in a tag of the form stackhpc/15.x.y.5, the major version number 15 means it belongs to Xena and the trailing .5 means that it is the fifth downstream tag. The workflow is triggered whenever a developer pushes to a given release series branch such as stackhpc/xena. The workflow will then attempt to determine the latest downstream tag and the latest upstream tag we are carrying on that particular branch. If we already have downstream tags then it simply increments the fourth version number, otherwise it starts the count from one. Once a tag has been created and pushed, another job within the workflow publishes a release so that it shows under the releases tab.
This PR was opened by the upstream sync workflow. In addition, the tox tests have completed and reported back.
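The tag numbering rule can be sketched in a few lines of Python. This is an illustration of the convention described above, not the workflow's actual implementation: the function name next_downstream_tag and the example values (upstream tag 15.0.0 and the listed tags) are hypothetical.

```python
import re
from typing import List


def next_downstream_tag(upstream_tag: str, existing_tags: List[str],
                        prefix: str = "stackhpc/") -> str:
    """Return the next downstream tag for the given upstream tag.

    Downstream tags look like '<prefix><upstream_tag>.<n>'. If none exist
    yet for this upstream tag the count starts from one, otherwise the
    highest existing count is incremented.
    """
    pattern = re.compile(
        r"^" + re.escape(prefix) + re.escape(upstream_tag) + r"\.(\d+)$"
    )
    counts = []
    for tag in existing_tags:
        match = pattern.match(tag)
        if match:
            counts.append(int(match.group(1)))
    return f"{prefix}{upstream_tag}.{max(counts, default=0) + 1}"


if __name__ == "__main__":
    # Hypothetical tags already present on a release series branch.
    tags = ["15.0.0", "stackhpc/15.0.0.1", "stackhpc/15.0.0.2", "stackhpc/14.0.0.7"]
    print(next_downstream_tag("15.0.0", tags))
```

Taking the maximum rather than the number of matching tags keeps the result correct even if an earlier downstream tag has been deleted.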
Finally, because workflows must be contained within the repository, and possibly the branch, that you are targeting, these workflows must be disseminated throughout the intended repositories. Unfortunately GitHub does not provide a feature for distributing workflows in this way, so we have created one last workflow which automatically distributes the workflows to where they need to be. Firstly, the workflows are stored within our .github repository, which is a special repository, most notably because it allows you to store workflow templates. In our .github repository we have the workflows themselves under .github/workflows, and under templates we have our workflow templates, each of which simply uses the corresponding workflow. It is these templates that we aim to push to all required repositories and branches. To achieve this we have created a workflow known as source-repo-sync.yml. However, unlike the other workflows, all of its functionality is stored within an Ansible Playbook. This has been done to take advantage of the idempotency that Ansible offers. Below is an Asciinema recording of this workflow/playbook in action.
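As a rough illustration of the idempotency mentioned above, the following Python sketch writes a workflow template into a checkout only when its content differs, and reports whether anything changed. It is not the playbook itself: the function name ensure_file and the file paths are hypothetical.

```python
import tempfile
from pathlib import Path


def ensure_file(destination: Path, content: str) -> bool:
    """Make `destination` contain `content`; return True only if it changed.

    Running this repeatedly with the same input leaves the file untouched
    after the first call, which is the property that makes a sync safe to
    re-run against every repository and branch.
    """
    if destination.exists() and destination.read_text() == content:
        return False
    destination.parent.mkdir(parents=True, exist_ok=True)
    destination.write_text(content)
    return True


if __name__ == "__main__":
    # Hypothetical checkout directory and template content.
    checkout = Path(tempfile.mkdtemp())
    target = checkout / ".github" / "workflows" / "tox.yml"
    print(ensure_file(target, "name: Tox\n"))  # first run writes the file
    print(ensure_file(target, "name: Tox\n"))  # second run is a no-op
```

A sync built this way only needs to commit and push to repositories where the function reported a change.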
From our experience GitHub Workflows provides a powerful yet sometimes limited experience regarding automating tasks surrounding git repositories.
Automation Against Our GitHub Organisation
In this section we shall explore how we automate the management of our GitHub organisation with the use of Terraform and the GitHub provider. Terraform is software designed to deploy and configure cloud infrastructure using purpose-built providers for various cloud services such as Amazon Web Services, Google Cloud and DigitalOcean. Providers also exist for GitHub, OpenStack and Dominos. These providers are built on top of the relevant APIs, enabling Terraform users to configure their infrastructure without needing to understand the API.
Terraform codifies cloud APIs into declarative configuration files.
The GitHub provider covers a significant amount of the API, enabling users to configure various aspects of the GitHub organisation and repositories. Currently we are using the provider to control the following items: branch protection rules, labels, teams, team members and repositories. By using Terraform in this manner we can ensure that important code security techniques such as branch protection rules are enforced, and that employees are members of the correct team, granting them the necessary privileges to carry out their job.
Branch protection rules are an important component of repositories as they ensure that branches can only be created or deleted by specific individuals. In addition, branch protection rules can enforce requirements on users writing commits to the branch, such as preventing direct pushes and instead requiring a review to be performed by the relevant individuals such as code owners. These rules can also require that specific GitHub workflows pass without issue, providing reassurance that code styling is maintained and unit tests pass. Terraform is used within our organisation to automate the deployment of such rules, as applying them manually would be time consuming.
GitHub provides the ability to create teams to allow for structure within the organisation. Whilst this feature allows for refined permissions on a team by team basis, a private area for discussions and integration with codeowners, we only use it for codeowners support. The reason for this is that we use other services such as Slack for communicating within a team, and taking advantage of the team permissions would require restructuring the team layout as we have a near flat structure. However, codeowners support provides us with the ability to label a repository as being the responsibility of a given team. In doing so, that team will be automatically assigned and notified when a pull request is opened, prompting them for review.
One concern we had regarding the automation of repositories was the potential for irretrievable data loss, whereby someone misconfigures the Terraform plan leading to the deletion of repositories. Whilst GitHub provides support for recovering deleted repositories, it is best not to rely upon such a facility and instead prevent the possibility of deletion from occurring in the first place. Fortunately Terraform provides prevent_destroy, a lifecycle argument that signals to Terraform that under no circumstances may this resource be destroyed. Therefore, any plan that calls for a repository to be deleted will fail with an error and will not be applied.
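As a complementary safeguard, separate from the lifecycle setting itself, a CI step could also inspect the plan before it is applied. The sketch below is an assumption about how such a check might look and not something the workflow described here is known to do. It reads the JSON form of a plan (as produced by terraform show -json) and lists any github_repository resources that the plan would delete. The function name repositories_to_delete and the sample plan are hypothetical.

```python
import json
from typing import List


def repositories_to_delete(plan_json: str) -> List[str]:
    """Return addresses of github_repository resources a plan would delete.

    Expects the JSON plan representation, where each entry under
    'resource_changes' carries an 'address', a 'type' and a list of
    planned 'actions'. A replacement shows up as delete plus create,
    so it is reported as well.
    """
    plan = json.loads(plan_json)
    return [
        change["address"]
        for change in plan.get("resource_changes", [])
        if change.get("type") == "github_repository"
        and "delete" in change.get("change", {}).get("actions", [])
    ]


if __name__ == "__main__":
    # Hypothetical, heavily trimmed plan output for illustration.
    sample = json.dumps({
        "resource_changes": [
            {"address": "github_repository.a", "type": "github_repository",
             "change": {"actions": ["delete"]}},
            {"address": "github_team.b", "type": "github_team",
             "change": {"actions": ["update"]}},
        ]
    })
    doomed = repositories_to_delete(sample)
    if doomed:
        raise SystemExit(f"Refusing to continue, plan deletes: {doomed}")
```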
Integrating Terraform into a GitHub workflow is easy with the HashiCorp provided action hashicorp/setup-terraform, which can be used to set up Terraform with the required credentials. We then add steps for linting, validation and producing a plan before applying the changes. Note that when running from a pull request the workflow will not perform any changes until after the pull request has been reviewed, accepted and merged. With the assistance of the HashiCorp guide on GitHub Actions you can also have the workflow report the contents of the plan as a comment on the pull request that triggered the workflow. This comment contains all of the information relating to the actions Terraform will perform if the pull request is accepted and merged. This should provide the reviewer with enough information to make an informed decision about whether to accept the pull request or request changes.
Automated comment left by the stackhpc-ci bot containing information about the changes Terraform will attempt to make if merged.
One final thing worth mentioning is that Terraform works best when it has complete control over the resources it has been told to manage. As all of the repositories that Terraform manages within our organisation were created in the conventional manner, via the GitHub web UI or API, the resources must be imported into the Terraform statefile. The documentation for the GitHub provider contains examples of how to import resources into the statefile; however, by now it should be clear that we choose to automate wherever possible. So, to import all of the existing repositories, teams, labels and branch protection rules, we have developed a Python script called import_resources.py which is capable of importing any missing resources.
This script works by parsing the contents of terraform.tfvars.json, which contains all of the resources we want Terraform to handle. With the contents of this file we can use terraform state list to query whether a resource is already contained within the statefile. If the resource is missing from the statefile then the script will construct and execute a terraform import command, which causes Terraform to query the GitHub API so that it can populate the statefile. The end result is a statefile that contains any and all resources listed within the terraform.tfvars.json file. The script runs within a GitHub workflow whenever a pull request adds existing resources to that file.
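The steps above can be sketched as follows. This is a minimal illustration and not the contents of import_resources.py: the layout of terraform.tfvars.json (a top-level "repositories" list), the resource address github_repository.repositories["<name>"] and the function names are all assumptions made for the example. Only repositories are handled, and the commands are built but not executed.

```python
import json
from typing import Dict, List, Set


def parse_state_list(output: str) -> Set[str]:
    """Turn the text printed by `terraform state list` into a set of addresses."""
    return {line.strip() for line in output.splitlines() if line.strip()}


def missing_imports(tfvars: Dict, state_addresses: Set[str]) -> List[List[str]]:
    """Build one `terraform import` command per repository absent from the state.

    Assumes (hypothetically) that repositories are declared with for_each
    under the resource name 'repositories', keyed by repository name.
    """
    commands = []
    for name in tfvars.get("repositories", []):
        address = f'github_repository.repositories["{name}"]'
        if address not in state_addresses:
            # Commands are argument lists, so no shell quoting is needed
            # if they are later handed to subprocess.run.
            commands.append(["terraform", "import", address, name])
    return commands


if __name__ == "__main__":
    # Hypothetical inputs standing in for the real file and CLI output.
    tfvars = json.loads('{"repositories": ["kayobe", "cloudkitty"]}')
    state = parse_state_list('github_repository.repositories["kayobe"]\n')
    for command in missing_imports(tfvars, state):
        print(" ".join(command))
```

Because resources already present in the state are skipped, the script can be re-run safely: a second run after a successful import produces no commands.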
In this blog post we explored how we automate various tasks surrounding the management of our GitHub repositories and organisation through the use of GitHub workflows, Ansible playbooks and Terraform. The benefits of this approach include the idempotency of Ansible and Terraform, in addition to the reassurance they offer by acting as a single source of truth for what they manage. However, there are also some weaknesses, such as the difficulty of developing and debugging while running against real repositories and our real organisation. To address this, any new features should be tested against a test organisation and test repositories which can be reverted to a desired state, allowing for thorough testing before deploying against production resources.