Enter the automation dream with Ansible and AWX

Dive into the world of automation and see how it can save your sanity.

Automaception

I try to be as laid-back as possible when it comes to my job, which is managing IT infrastructures. This means I paradoxically spend huge amounts of time optimizing my job. Luckily, machines lend themselves all too well to this purpose (the game Factorio takes up more of my time than I wish to disclose). Computers even more so, which is probably half the reason I work in this particular field. So when I was introduced to Ansible, I was sold.

It enabled me to watch computers manage themselves at the press of a button. One can neatly organize the infrastructure configuration into groups and roles, and the community supplies collections to manage a plethora of components. Optimizing the Ansible book, making sure the server park is configured properly and that changes propagate through the servers in an organized manner, is a lovely challenge. Finishing (a part of) this Ansible book is bliss: just run the playbook and the controller does all the work for you. Now I have all the time in the world to lie back and look at the automation.

This, however, will give room for a small itch to grow. It starts in the back of your mind in a sarcastic tone:

Don’t you love having to press buttons so computers will do things?
And at first you don’t understand, but then it adds:
Wouldn’t it be nice if you could automate that?
This gives rise to a vicious circle that will turtle all the way down.

So naturally some genius (well, multiple) thought to automate the automation, and AWX (the upstream project of the Ansible Automation Platform) was born. Of course that is not enough, so we automate the automation that automates the automation: controller configuration. This is a story of one possible avenue you could take when walking down past all the turtles.

Disclaimer: You do not need to know much about AWX and/or the infra.controller_configuration collection to read this article, but I would advise getting familiar with them if you wish to follow in my footsteps. It does help if you know how Ansible works conceptually.

I’m going to skip the genesis of my journey and drop you a few months after the point where the itch in my mind got too loud. At this point the team that I’m a part of manages ~500 virtual and physical servers with one large Ansible book. We were manually running so many playbooks that it drove us nuts. We had just gotten AWX running and were in the middle of migrating to it when my keyboard-loving coworker loudly complained: “I feel I’ve regressed to clicking in AWX’s web interface instead of actually automating something!”
Everybody knew that such a comment meant he was on to something. It turned out he was on to two “somethings”:

Managing the actual Ansible code & projects stored and run within AWX
Managing AWX and the execution of the above-mentioned code

I will focus on the management of AWX itself and leave the migration to another time and/or writer.

Managing AWX itself

This new layer of automation that we deployed has its own configuration, which is stored in a database. You can change this configuration by talking to the AWX API or clicking in its web interface; these, however, are manual actions, and we do not like those. The infra.controller_configuration collection helps us out by providing an Ansible way to manage the AWX configuration. We began to configure AWX and its components through an Ansible playbook. This resulted in a playbook that configures the following components of AWX:

Global AWX settings, e.g. log settings, LDAP settings (including global admins and auditors) and general properties of this AWX instance.
Organizations, Teams and their roles within the organizations.
Execution Environments, because the default only works in a ‘Proof of Concept’.
Container groups, because the default really never works at all.
Credentials, which all come from HashiCorp Vault.
Projects, of course pulled in from Git repos.
Inventories, including variables, also pulled in from Git repos.
Job templates, because you want to deploy changes in stages and not all at once.
Schedules, to actually run those jobs.
Notifications, so that when jobs fail we automatically get a Jira issue and a notification in our chat application.
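
To make the idea of keeping this configuration as code a bit more tangible, here is a minimal Python sketch of the underlying principle: describe the desired state as data and work out which objects would need to be created, updated or left alone. This is not how the infra.controller_configuration collection is implemented; the function `plan_changes`, the dictionaries and the project names are all hypothetical and only serve as illustration.

```python
from typing import Any

# Hypothetical desired state, as it might be kept in a Git repo.
desired_projects: dict[str, dict[str, Any]] = {
    "big-ansible-book": {"scm_type": "git", "scm_branch": "main"},
    "awx-config": {"scm_type": "git", "scm_branch": "main"},
}

# Hypothetical current state, as it might be read back from AWX.
current_projects: dict[str, dict[str, Any]] = {
    "big-ansible-book": {"scm_type": "git", "scm_branch": "develop"},
    "old-project": {"scm_type": "git", "scm_branch": "main"},
}


def plan_changes(
    desired: dict[str, dict[str, Any]],
    current: dict[str, dict[str, Any]],
) -> dict[str, list[str]]:
    """Sort desired objects into 'create', 'update' and 'unchanged'.

    Objects that exist only in the current state are left alone here;
    whether to remove them is a separate decision.
    """
    plan: dict[str, list[str]] = {"create": [], "update": [], "unchanged": []}
    for name, settings in sorted(desired.items()):
        if name not in current:
            plan["create"].append(name)
        elif current[name] != settings:
            plan["update"].append(name)
        else:
            plan["unchanged"].append(name)
    return plan


plan = plan_changes(desired_projects, current_projects)
print(plan)
```

Running the same plan twice against an unchanged AWX would report everything as unchanged, which is the property that makes it safe to run such a playbook over and over.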

If we were still on the first turtle, we would write all the code, integrate it into our big Ansible book and run it alongside the rest. However, we have descended a turtle, so we put all this code in a separate Ansible book, run that and watch AWX take off with the big Ansible book. It was bliss. That bliss was sadly short-lived, because not long after AWX took off, it ‘malfunctioned’. What I mean is, someone changed the AWX configuration described above, deployed the new ‘feature’, but made a mistake and everything came to a grinding halt.

Our team was all too quick to come up with a very nice process to prevent this from happening: someone else should review the changes and approve them, and only then would we integrate the change into our Ansible books. In the case of a major change, we would have two co-workers review the code. This beautiful process has one small flaw: we seemed to be right back at our starting point, where we were spending huge amounts of time double- and triple-checking code. This deeper turtle level did not seem very useful after all. However, if Inception taught us one thing, it is that you can always go down a level. So we traveled down to the next turtle.

And that involves GitOps, GitLab to be exact (read the closing remarks if you are surprised we do not already use GitOps). A GitLab project was created, the Ansible book containing the code for AWX was uploaded and we started building a CI/CD pipeline to relieve us of our manual labor. That pipeline consists of the following jobs:

Checks our YAML files for correct syntax with yamllint.
Checks our Ansible code for pitfalls with ansible-lint.
Checks for potential vulnerabilities in our code, using the automated scanning features of GitLab.
Performs a dry-run of the entire code.
Deploys the change to our staging AWX instance.
Deploys the change to our production AWX instance, only after we approve and schedule it.
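
The control flow of such a pipeline can be sketched in a few lines of Python: stages run in order, the first failure stops everything, and the production deploy is held back until someone approves it. This is only an illustration of the logic, not our actual GitLab configuration; the stage names, the function `run_pipeline` and the stand-in checks are assumptions made for this example.

```python
from typing import Callable

# A stage is a name plus a check that returns True on success.
Stage = tuple[str, Callable[[], bool]]


def run_pipeline(stages: list[Stage], production_approved: bool) -> list[str]:
    """Run stages in order and stop at the first failure.

    The stage named 'deploy-production' only runs after approval.
    Returns one log line per stage that was reached.
    """
    log: list[str] = []
    for name, check in stages:
        if name == "deploy-production" and not production_approved:
            log.append(f"{name}: waiting for approval")
            break
        if not check():
            log.append(f"{name}: failed")
            break
        log.append(f"{name}: ok")
    return log


# Stand-in stages; in a real pipeline each check would invoke a tool
# such as yamllint or ansible-lint and report whether it succeeded.
stages: list[Stage] = [
    ("yamllint", lambda: True),
    ("ansible-lint", lambda: True),
    ("security-scan", lambda: True),
    ("dry-run", lambda: True),
    ("deploy-staging", lambda: True),
    ("deploy-production", lambda: True),
]

log = run_pipeline(stages, production_approved=False)
print("\n".join(log))
```

With `production_approved=False` the run ends at the production stage with a "waiting for approval" entry, which mirrors the one manual action that remains.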

The astute reader will notice that we still have a manual action. While this is true, we no longer have to worry about syntax errors, typos and bugs, and all of this happens without pressing a button. And by using a staging AWX instance we can easily check whether a change breaks anything instead of reasoning about it. On top of not having to run this ourselves or wait for the deploy, our change management is improved in the following ways:

Every change must be linked to a Jira issue.
A majority of the team must approve of the change and a record of this approval is kept.
There is more time to inform end-users of upcoming changes.
Reverting is a piece of cake.
Security/Compliance scans provide extra information to improve our infrastructure.
Code quality reports and test results are automatically made available to our Compliance department.
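
The first two rules are simple enough to express as code. The Python sketch below shows the logic only; in practice such rules are normally enforced through the settings of the Git platform rather than a hand-written script. The Jira key pattern, the function names and the example commit messages are assumptions made for this illustration.

```python
import re

# Assumed Jira key format: an uppercase project key, a dash and a number,
# for example "INFRA-123". Real projects may use a different convention.
JIRA_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def has_jira_issue(commit_message: str) -> bool:
    """True when the message mentions something that looks like a Jira key."""
    return JIRA_KEY.search(commit_message) is not None


def has_majority(approvals: int, team_size: int) -> bool:
    """True when strictly more than half of the team has approved."""
    return approvals * 2 > team_size


def may_merge(commit_message: str, approvals: int, team_size: int) -> bool:
    """A change may merge only when both rules are satisfied."""
    return has_jira_issue(commit_message) and has_majority(approvals, team_size)


print(may_merge("INFRA-123: add nightly schedule", approvals=3, team_size=5))
print(may_merge("add nightly schedule", approvals=5, team_size=5))
```

Note that an exact tie (two approvals in a team of four) does not count as a majority in this sketch.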

Conclusion

This new way of managing Ansible not only provides the team with a more robust way to manage the infrastructure, it also provides more scalability. Besides the advantages for us, the organization has better insight into the status of the underlying infrastructure. But the most important result is of course that I can now finally lie back and watch our servers manage themselves: a lovely feeling, a job well done. My team and I can now spend time thinking about new features or helping the organization to improve itself. If something needs changing, I just adjust the code, commit it to GitLab and watch the magic unfold. A true miracle of technology.
…
“Don’t you love having to press buttons, so the computer will do things?”
…
“Wouldn’t it be nice if you could automate that?”
I’ll never be able to silence that annoying little voice in my head, will I? Or maybe AI can finally silence that voice?

Closing remarks

This post is based on a real implementation done by me for a customer, but parts have been adapted to better suit the narrative. Some remarks on these alterations are:

GitOps was already in use, but was expanded and integrated properly instead of implemented from scratch.
A lot of peripheral matters have been glossed over or left out, otherwise it would have turned into a white-paper.
In my opinion, implementing secret management in AWX is a non-trivial matter that requires a great deal of attention.
The way changes to your Ansible books flow through your projects and jobs is an interesting challenge.
AWX is only supported when deployed with the AWX Operator on Kubernetes. Here be dragons; you’ve been warned. But boy are they cool.
