One of the great advantages of our technology choices is that,
by standardising on Ansible, we have been able
to use a single, simple tool to drive everything we do.
Ansible is not really a programming language, and modularity cannot
be ensured without some amount of programmer discipline. One great
tool in providing a level of modularity and component reuse has
been Ansible Galaxy. Our
OpenStack deployment toolbag
has been steadily growing and we've been thrilled to see others
make use of our components as well. Share and enjoy!
Unfortunately, we are writing this post because of an event today
which, apparently without notice, broke all our builds, and also all
the work of our clients who use our technology.
It's Working Great, What Could Possibly Go Wrong?
We started to notice oddities when updating some of our roles on
Galaxy earlier today. The first thing was that the implicit naming convention
used for git repos such as our new BeeGFS role (under which the
ansible-role- prefix of the repo name is dropped)
was no longer being honoured, so the role name on Galaxy
changed from beegfs to ansible-role-beegfs. As a result,
the role could no longer be found by playbooks that required it.
We fixed this by adding a metadata tag, role_name, which
explicitly sets the name. We did this for each of our 32 roles.
Our repos are long established, many are cloned, some are forked.
We can't simply rename them on a whim.
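As a minimal sketch of that fix, the role_name tag sits under galaxy_info in a role's meta/main.yml. The fragment below assumes the BeeGFS role described above, with a repo named ansible-role-beegfs; the other galaxy_info fields are omitted and stay as they were.

```yaml
# meta/main.yml in the role's git repo (assumed here to be ansible-role-beegfs)
galaxy_info:
  # Explicitly set the name under which Galaxy publishes the role,
  # rather than relying on a name derived from the repo name.
  role_name: beegfs
  # ...remaining galaxy_info fields (author, description, platforms, etc.) unchanged
```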
On pushing the change that sets this metadata tag, every one of our
roles with a hyphenated name was silently converted to using
underscores instead. This may seem innocuous, but the consequence
is that, again, every playbook that referenced these roles - which
is every playbook we write - could no longer retrieve the roles
it required from Ansible Galaxy.
The root cause appears to be the combined effect of two changes.
Ansible has removed the implicit naming convention for the git
repos that back Galaxy roles. Around the same time they have
introduced a newer, stricter naming convention for Galaxy roles
that prevents names containing hyphens. The backwards-compatibility
plans for these two changes are mutually exclusive. Unfortunately
most of our roles fall into both categories.
We are not out of the woods yet, as it appears that the role_name tag
that we now require to explicitly set the correct name for our roles
may also be about to be deprecated. This may leave us
needing to rename all the git repos for our roles.
What about Kayobe?
OpenStack Kayobe is a project that
makes extensive use of Galaxy for reuse and modularity. At the time
of writing Kayobe's CI is also broken by this change, and an extensive
search-and-replace patchset
is required, pending the outcome of our requests for upstream resolution.
What Do Our Clients Need to Do?
In summary, there seem to be a number of tedious but simple changes that
must be applied everywhere:
- All our roles have underscores instead of hyphens in their names from now on.
This appears to be an inevitable change to accommodate forwards compatibility
for future versions of Galaxy. We'd like to see a server-side fix to Galaxy to enable
recognition of either hyphens or underscores, thus enabling a smooth
transition.
- The requirements and role invocations of every playbook that references
these roles will need to be updated to replace occurrences of - with _.
We will commit those changes to our repos, but all clients will need to pull
in the new changes. This should happen automatically when repos are cloned.
- We might not be done with these build-breaking changes yet, although
hopefully there will be a way forward that doesn't break things for users.
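To illustrate the second point above, here is a sketch of the kind of edit involved. The namespace example_namespace and the role name os-networks are hypothetical placeholders chosen for illustration; substitute the roles your playbooks actually reference. The two files are shown in one block, separated by a document marker.

```yaml
# requirements.yml (hypothetical role name, for illustration only)
# Before: - src: example_namespace.os-networks
- src: example_namespace.os_networks
---
# playbook.yml: role invocations need the same rename
- hosts: all
  roles:
    # Before: - role: example_namespace.os-networks
    - role: example_namespace.os_networks
```

After a change like this, re-running ansible-galaxy install -r requirements.yml should fetch the roles under their new names.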
Let's hope this kind of event doesn't happen too often in future...