Introduction
Tools such as CFEngine, Puppet, Chef, and Ansible have radically changed infrastructure automation by providing a structured framework for organizing and sharing configuration management code. Ansible is one of the newer tools in this list. It is widely used to automate the configuration of IT environments and is highly flexible. Although its creators position Ansible as a tool that "provides open-source automation that is simple, flexible, and powerful," it is not always clear what exactly makes Ansible code robust. And as simple as it sounds, it is not always simple in practice, especially in more complex environments and setups.
Within Ansible, roles are a concept that allows you to encapsulate code. Roles contain tasks, variables, handlers, and files, brought together in a modular, reusable unit that, if written correctly, is easy to share between projects and teams. Whether you manage a handful of servers or orchestrate complex (multi-)cloud environments, roles help you write cleaner, more maintainable code. At the same time, there are many challenges, and it is difficult to design a role that you would label as robust and "complete." In this blog, we therefore introduce a number of rules of thumb for what characterizes a robust Ansible role, so that you can write better Ansible roles.
A few rules of thumb
The rules of thumb in this blog are:
- Use a role's main.yml only to include or import tasks.
- All tasks must be optimized for efficiency.
- Data and code must be completely separate.
- Tasks must run on an include basis.
- Roles must handle check mode correctly.
- All tasks must be idempotent.
- Variables must be correctly namespaced and hierarchical.
- Tags must be implemented and documented.
- Roles must be validated.
- Roles must be documented.
Please note: these are rules of thumb. Depending on the use and requirements of a role, some rules may be less relevant or additional rules may be necessary. In general, they are applicable in many situations. Below, we will elaborate on each one, with context and examples.
Illustrations with a role: linux_user
To illustrate the rules of thumb, we will create a role that creates Linux users. We set it up as follows:
# Creation of a basic role structure using ansible-galaxy
ansible-galaxy role init linux_user
# Navigate into the new role
cd linux_user/
# Create a supplementary tasks file
touch tasks/linux_user.yml
Our directory structure now looks like this:
# ls ./*
./README.md
./defaults:
main.yml
./files:
./handlers:
main.yml
./meta:
main.yml
./tasks:
linux_user.yml main.yml
./templates:
./tests:
inventory test.yml
./vars:
main.yml
This gives us a basic structure to work with. In tasks/main.yml, we include the user-related tasks once for every user in linux_users:
# cat tasks/main.yml
---
- name: Include Linux user tasks
  loop: "{{ linux_users }}"
  loop_control:
    loop_var: user
    # Show only the name: the full user dict may contain password hashes
    label: "{{ user.name }}"
  tags:
    - linux_user
    - linux_user_authorized_keys
    - linux_user_create
  when:
    - ansible_facts["os_family"] in linux_user_allowed_os_families
    # Explicit boolean instead of relying on a non-empty list being truthy
    - linux_user_allowed_target_hostgroups | intersect(group_names) | length > 0
  ansible.builtin.include_tasks: linux_user.yml
The included file, tasks/linux_user.yml, contains the actual tasks for creating users and configuring their SSH keys:
# cat tasks/linux_user.yml
---
- name: Create user "{{ user.name }}"
  become: true
  notify:
    - "Send mail notification about user creation"
  tags:
    - linux_user
    - linux_user_create
  ansible.builtin.user:
    # Required
    name: "{{ user.name }}"
    # Optional
    append: "{{ user.append | default(omit) }}"
    comment: "{{ user.comment | default(omit) }}"
    create_home: "{{ user.create_home | default(omit) }}"
    force: "{{ user.force | default(omit) }}"
    group: "{{ user.group | default(omit) }}"
    groups: "{{ user.groups | default(omit) }}"
    hidden: "{{ user.hidden | default(omit) }}"
    home: "{{ user.home | default(omit) }}"
    password: "{{ user.password | default(omit) }}"
    password_expire_account_disable: "{{ user.password_expire_account_disable | default(omit) }}"
    password_expire_max: "{{ user.password_expire_max | default(omit) }}"
    password_expire_min: "{{ user.password_expire_min | default(omit) }}"
    password_expire_warn: "{{ user.password_expire_warn | default(omit) }}"
    password_lock: "{{ user.password_lock | default(omit) }}"
    remove: "{{ user.remove | default(omit) }}"
    shell: "{{ user.shell | default(omit) }}"
    state: "{{ user.state | default(omit) }}"
    system: "{{ user.system | default(omit) }}"
    uid: "{{ user.uid | default(omit) }}"
    umask: "{{ user.umask | default(omit) }}"
    update_password: "{{ user.update_password | default(omit) }}"

- name: Set authorized keys for user "{{ user.name }}"
  become: true
  loop: "{{ user.authorized_keys | default(linux_user_default_authorized_keys) }}"
  loop_control:
    loop_var: key
    label: "{{ key.key }}"
  tags:
    - linux_user
    - linux_user_authorized_keys
  ansible.posix.authorized_key:
    # Required
    key: "{{ key.key }}"
    user: "{{ user.name }}"
    # Optional
    comment: "{{ key.comment | default(omit) }}"
    exclusive: "{{ key.exclusive | default(omit) }}"
    key_options: "{{ key.key_options | default(omit) }}"
    path: "{{ key.path | default(omit) }}"
    state: "{{ key.state | default(omit) }}"
The user creation task above notifies a handler named "Send mail notification about user creation", which is defined in handlers/main.yml:
# cat handlers/main.yml
---
- name: Send mail notification about user creation
  delegate_to: localhost
  when: linux_user_smtp_enabled
  community.general.mail:
    body: "{{ linux_user_smtp_body }}"
    from: "{{ linux_user_smtp_mail_from }}"
    host: "{{ linux_user_smtp_host }}"
    port: "{{ linux_user_smtp_port }}"
    subject: "{{ linux_user_smtp_subject }}"
    to: "{{ linux_user_smtp_mail_to }}"
As you can see, many variables are used. We define them in defaults and vars:
# cat defaults/main.yml
---
# Users
linux_user_default_authorized_keys: []
linux_users: []
# SMTP
linux_user_smtp_enabled: false
linux_user_smtp_body: "User mutations executed on {{ ansible_facts.hostname }}"
linux_user_smtp_host: ""
linux_user_smtp_mail_from: ""
linux_user_smtp_mail_to: ""
linux_user_smtp_port: ""
linux_user_smtp_subject: "User mutations"

# cat vars/main.yml
---
# Control
linux_user_allowed_os_families:
  - "RedHat"
linux_user_allowed_target_hostgroups:
  - "dev_servers"
Rules of thumb in practice
1. Only use main.yml to include/import tasks.
If you use main.yml directly to define many tasks, a role quickly becomes chaotic and difficult to read. A best practice is therefore to use main.yml only for including or importing other tasks, as in the example.
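This pays off as a role grows. As a sketch, a larger version of this role could keep main.yml as a short table of contents. The files validate.yml and bashrc.yml are hypothetical and are not part of the role as built so far. The when and tags keywords are omitted for brevity.

```yaml
# tasks/main.yml (sketch; validate.yml and bashrc.yml are hypothetical)
---
# Static import: processed once, when the playbook is parsed
- name: Import input validation tasks
  ansible.builtin.import_tasks: validate.yml

# Dynamic include: needed here because of the loop over users
- name: Include Linux user tasks
  loop: "{{ linux_users }}"
  loop_control:
    loop_var: user
    label: "{{ user.name }}"
  ansible.builtin.include_tasks: linux_user.yml

- name: Include .bashrc tasks
  loop: "{{ linux_users }}"
  loop_control:
    loop_var: user
    label: "{{ user.name }}"
  ansible.builtin.include_tasks: bashrc.yml
```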
2. Optimize tasks for efficiency
Tasks must be efficient. In our example, we use include_tasks instead of import_tasks. Importing is usually faster, because Ansible processes imports statically when the playbook is parsed, while includes are processed dynamically at runtime.
So why use include_tasks? Because import_tasks does not support a loop on the import statement itself. With import_tasks, the loop over the users has to move into linux_user.yml. Because we also configure authorized_keys, every task in that file then needs its own loop, and the authorized key task needs a nested loop over users and their keys. Depending on the number of users and keys, this can actually be slower than a single runtime loop with include_tasks. For comparison, this is what the import_tasks variant would look like:
# cat tasks/main.yml
---
- name: Import Linux user tasks
  tags:
    - linux_user
    - linux_user_authorized_keys
    - linux_user_create
  when:
    - ansible_facts["os_family"] in linux_user_allowed_os_families
    - linux_user_allowed_target_hostgroups | intersect(group_names) | length > 0
  ansible.builtin.import_tasks: linux_user.yml

# cat tasks/linux_user.yml
---
- name: Create user "{{ user.name }}"
  become: true
  loop: "{{ linux_users }}"
  loop_control:
    loop_var: user
    label: "{{ user.name }}"
  notify:
    - "Send mail notification about user creation"
  tags:
    - linux_user
    - linux_user_create
  ansible.builtin.user:
    # ... (same parameters as in the include_tasks version)
This variant processes the user creation task statically and would indeed be faster if that were the only task. However, we also have a task that configures authorized SSH keys for all users. With the import statement, that task would have to iterate over the users again, because the import itself cannot loop. Iterating over the users twice can ultimately take more time than iterating once at runtime, depending on the number of users and keys. In this situation, include_tasks is therefore the more efficient choice.
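To make the double loop concrete, here is a sketch of the authorized key task in the import_tasks variant. The task can no longer rely on the user variable from the include loop, so it has to build user/key pairs itself, here with the built-in subelements filter. The sketch assumes that every user who needs keys defines authorized_keys explicitly. With skip_missing=True, users without that key are skipped, so the linux_user_default_authorized_keys fallback from the include_tasks version is lost.

```yaml
# tasks/linux_user.yml (import_tasks variant, sketch)
- name: Set authorized keys for users
  become: true
  # subelements yields one [user, key] pair per key of each user
  loop: "{{ linux_users | subelements('authorized_keys', skip_missing=True) }}"
  loop_control:
    label: "{{ item.0.name }}"
  tags:
    - linux_user
    - linux_user_authorized_keys
  ansible.posix.authorized_key:
    user: "{{ item.0.name }}"
    key: "{{ item.1.key }}"
    comment: "{{ item.1.comment | default(omit) }}"
    exclusive: "{{ item.1.exclusive | default(omit) }}"
    key_options: "{{ item.1.key_options | default(omit) }}"
    path: "{{ item.1.path | default(omit) }}"
    state: "{{ item.1.state | default(omit) }}"
```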
3. Completely separate data and code
Keep user data outside the role. Variables must be provided externally (via playbooks, inventory, group_vars/host_vars, or other methods). For example:
---
- name: Run role
  # Must be a host group listed in linux_user_allowed_target_hostgroups
  hosts: dev_servers
  vars:
    linux_user_smtp_enabled: true
    linux_user_smtp_host: "smtp.sue.nl"
    linux_user_smtp_mail_from: "source@sue.nl"
    linux_user_smtp_mail_to: "recipient@sue.nl"
    linux_user_smtp_port: "25"
    linux_users:
      - name: user1
        state: absent
      - name: user2
  roles:
    - role: sue.generic.linux_user
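The same data can also live in the inventory instead of the playbook, which keeps the playbook free of environment-specific values. Below is a minimal sketch that assumes an inventory directory with a dev_servers group. The file location is an assumption. Ansible loads group_vars next to the inventory automatically, so the play itself only needs the hosts line and the role.

```yaml
# inventory/group_vars/dev_servers.yml (assumed location)
---
linux_user_smtp_enabled: true
linux_user_smtp_host: "smtp.sue.nl"
linux_user_smtp_mail_from: "source@sue.nl"
linux_user_smtp_mail_to: "recipient@sue.nl"
linux_user_smtp_port: "25"
linux_users:
  - name: user1
    state: absent
  - name: user2
```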
4. Run tasks on an include basis
Automation is powerful, but it can also cause damage if you accidentally target the wrong machines. In the example, we only include user tasks if certain conditions are met:
when:
  - ansible_facts["os_family"] in linux_user_allowed_os_families
  - linux_user_allowed_target_hostgroups | intersect(group_names) | length > 0
These conditions ensure that the tasks only run on hosts whose OS family is explicitly allowed and that belong to an explicitly allowed host group. By including tasks only where they are needed, rather than running them everywhere and excluding certain machines, we prevent users from being accidentally created on systems they do not need access to. This increases security.
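Whether a host is permitted is therefore decided by its inventory membership. Below is a minimal YAML inventory sketch with hypothetical host names. Only the two dev hosts are in dev_servers. If a play targets all hosts, the user tasks are skipped on db01, because its group_names (["db_servers"]) has no overlap with linux_user_allowed_target_hostgroups.

```yaml
# inventory/hosts.yml (hypothetical host names)
all:
  children:
    dev_servers:
      hosts:
        dev01.example.com:
        dev02.example.com:
    db_servers:
      hosts:
        db01.example.com:
```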
5. Roles must handle check mode correctly
Check mode handling must be considered per task. A run in check mode should approximate the actual run as closely as possible, catching the same errors and producing the same output. Sometimes this means that you have to disable check mode for a specific task by setting check_mode: false.
To illustrate this, suppose we expect a file with our users' passwords to exist on the managed host, and we want to read it with the shell module. There are, of course, better ways to do this, and better ways to store secrets, such as the dynamically rotating secrets described in our blog post "Creating Automatically Rotating Secrets Using Terraform." But using the shell module here clearly shows why check mode should always approximate the actual run.
Now suppose that the password file does not actually exist. We could retrieve the password with something like this:
- name: Retrieve user password
  changed_when: false
  check_mode: false
  failed_when: user_password_result.rc != 0
  register: user_password_result
  vars:
    password_file: "/tmp/password.txt"
  ansible.builtin.shell: |
    if [ -f "{{ password_file }}" ]; then
      cat "{{ password_file }}"
    else
      echo "Password file not found: {{ password_file }}"
      exit 1
    fi
Note that check_mode is set to false here, which means that this task is always executed, even in check mode. Why? Because the shell module is skipped in check mode, which can cause a difference between check mode and the actual run. If the password file does not exist and we run in check mode without this setting, the task does not report an error, because the script is never executed and therefore cannot fail on the missing file. In an actual run, the task fails because the file does not exist. Without check_mode: false on this task, check mode would not accurately reflect the actual run.
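Where modules with native check mode support are available, you often do not need to override check_mode at all. The sketch below is an alternative to the shell task, not the approach used above. It expresses the same check with ansible.builtin.stat, ansible.builtin.assert and ansible.builtin.slurp, which all run normally in check mode. A missing file therefore fails the play in both modes.

```yaml
- name: Check whether the password file exists
  ansible.builtin.stat:
    path: "/tmp/password.txt"
  register: password_file_stat

- name: Fail early if the password file is missing
  ansible.builtin.assert:
    that:
      - password_file_stat.stat.exists
    fail_msg: "Password file not found: /tmp/password.txt"

- name: Read the password file
  ansible.builtin.slurp:
    src: "/tmp/password.txt"
  register: user_password_result
  # Keep the secret out of the output
  no_log: true
# slurp returns base64: use "{{ user_password_result.content | b64decode }}"
```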
6. Make tasks idempotent
All tasks must be able to run more than once and produce the same result if nothing has changed. We call a task with this property idempotent. Usually, you don't have to worry about this, because most modules are idempotent by design. In edge cases, for example when using the command, shell, or lineinfile modules, or when writing your own modules, you have to ensure idempotency explicitly. In our role, for example, we want to add an environment variable to the .bashrc file of the newly created users:
- name: Ensure custom environment variable is set in .bashrc for user "{{ user.name }}"
  become: true
  tags:
    - linux_user
    - linux_user_bashrc
  ansible.builtin.lineinfile:
    path: "{{ user.home | default('/home/' + user.name) }}/.bashrc"
    regexp: '^export MY_CUSTOM_VAR='
    line: 'export MY_CUSTOM_VAR="hello-world"'
    state: present
    create: true
    owner: "{{ user.name }}"
    group: "{{ user.group | default(user.name) }}"
    mode: '0644'
Note that regexp makes this task converge on a single MY_CUSTOM_VAR line. If a matching line already exists, even with a different value, lineinfile replaces it instead of adding another one. Without regexp, lineinfile only skips a line that matches exactly, so changing the value later would leave the old export line next to the new one. A task that appends the line with the shell module, such as the one below, is not idempotent at all: it adds the environment variable to the user's .bashrc file again with every run.
- name: Always append line to .bashrc for user "{{ user.name }}" (NOT idempotent)
  become: true
  tags:
    - linux_user
    - linux_user_bashrc
  ansible.builtin.shell: >-
    echo 'export MY_CUSTOM_VAR="hello-world"'
    >> "{{ user.home | default('/home/' + user.name) }}/.bashrc"
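If no module fits and you really have to use shell or command, you can declare idempotency yourself. Make the command check the current state first, and tell Ansible when something actually changed. Below is a sketch for the same .bashrc line. lineinfile remains the better choice here; the example only illustrates the changed_when technique.

```yaml
- name: Append MY_CUSTOM_VAR to .bashrc if missing for user "{{ user.name }}"
  become: true
  tags:
    - linux_user
    - linux_user_bashrc
  ansible.builtin.shell: |
    bashrc="{{ user.home | default('/home/' + user.name) }}/.bashrc"
    # grep exits non-zero if the line or the file is missing
    if grep -q '^export MY_CUSTOM_VAR=' "$bashrc" 2>/dev/null; then
      echo "unchanged"
    else
      echo 'export MY_CUSTOM_VAR="hello-world"' >> "$bashrc"
      echo "changed"
    fi
  register: bashrc_result
  # Report "changed" only when the script actually appended the line
  changed_when: bashrc_result.stdout == "changed"
```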
7. Variables must be correctly namespaced and hierarchical.
All role variables must be correctly namespaced to prevent unexpected overrides and conflicting variables, especially in larger environments with many variables. This is usually done by prefixing each variable of a role with the name of the role, as we have done in our role (linux_user).
In addition to namespacing, variables must be placed at the right level of the variable hierarchy. All variables except linux_user_allowed_os_families and linux_user_allowed_target_hostgroups are defined in the role's defaults. Defaults have the lowest precedence in the variable hierarchy, which makes them easy to override. For role variables that must keep a fixed value, such as linux_user_allowed_os_families, use vars instead of defaults. Role vars take precedence over inventory and play variables, which makes them much harder to override by accident.
For a complete overview of variable precedence, please refer to the official Ansible documentation on variable precedence.
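Here is a sketch of what this means in practice. Role vars (vars/main.yml) rank above play vars, so the play-level value below does not change which OS families the role accepts. Inside the role, linux_user_allowed_os_families stays ["RedHat"].

```yaml
---
- name: Attempt to widen the allowed OS families
  hosts: dev_servers
  vars:
    # Play vars (and inventory vars) rank below role vars, so this is ignored
    linux_user_allowed_os_families:
      - "RedHat"
      - "Debian"
  roles:
    - role: sue.generic.linux_user
```

Only higher-precedence sources can override the value, for example extra vars passed with `-e '{"linux_user_allowed_os_families": ["RedHat", "Debian"]}'`. That turns an override into a deliberate act rather than an accident.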
8. Implement and document tags
Tags enable targeted runs. In the example, linux_user_create runs only the user creation task, and linux_user_authorized_keys runs only the SSH key configuration. Always document tags so that their purpose is clear.
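Keep in mind that tags on an include_tasks statement apply to the include itself, not to the tasks it loads. For a tag to select a task inside linux_user.yml, both the include and that task need the tag. The sketch below assumes the linux_user_bashrc tag from the idempotency example is added to the role. Without it on the include, a run with --tags linux_user_bashrc would skip the include and never reach the .bashrc task.

```yaml
# tasks/main.yml: the include carries every tag used in linux_user.yml
- name: Include Linux user tasks
  loop: "{{ linux_users }}"
  loop_control:
    loop_var: user
    label: "{{ user.name }}"
  tags:
    - linux_user
    - linux_user_authorized_keys
    - linux_user_bashrc
    - linux_user_create
  when:
    - ansible_facts["os_family"] in linux_user_allowed_os_families
    - linux_user_allowed_target_hostgroups | intersect(group_names) | length > 0
  ansible.builtin.include_tasks: linux_user.yml
```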
9. Validate roles
Always validate your role with linting and testing. ansible-lint is the standard linting tool. For testing there are several options, with Molecule being a well-known choice. (A full explanation is beyond the scope of this blog.)
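Here is a minimal sketch of what a test could check. It assumes a Molecule scenario named default that uses the Ansible verifier with a verify playbook at molecule/default/verify.yml. It also assumes the converge step applied the role with a user named user_1, as in the README example playbook. The getent module fails by default when the key is missing, so the assert mainly documents the expectation.

```yaml
# molecule/default/verify.yml (assumed scenario layout)
---
- name: Verify
  hosts: all
  gather_facts: false
  tasks:
    - name: Look up user_1 in the passwd database
      ansible.builtin.getent:
        database: passwd
        key: user_1

    - name: Assert that user_1 exists
      ansible.builtin.assert:
        that:
          - "'user_1' in ansible_facts.getent_passwd"
```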
10. Document roles
Documentation is often overlooked, but it is just as important as the role itself. A clear README with requirements, variables, an example playbook, and tags makes reuse much easier. Below is an example README for our role:
# cat README.md
# linux_user
Ansible role for creating Linux users
# Requirements
This role uses modules from ansible-core (ansible.builtin) and requires the following collections to be installed:
- ansible.posix
- community.general
# Role Variables
## Defaults
User related defaults:
`linux_user_default_authorized_keys`: []
`linux_users`: []
Each entry in `linux_users` accepts the following keys, which map to the parameters of the [ansible user module](https://docs.ansible.com/ansible/latest/collections/ansible/builtin/user_module.html):
`linux_users.user.name` (required)
`linux_users.user.append`
`linux_users.user.comment`
`linux_users.user.create_home`
`linux_users.user.force`
`linux_users.user.group`
`linux_users.user.groups`
`linux_users.user.hidden`
`linux_users.user.home`
`linux_users.user.password`
`linux_users.user.password_expire_account_disable`
`linux_users.user.password_expire_max`
`linux_users.user.password_expire_min`
`linux_users.user.password_expire_warn`
`linux_users.user.password_lock`
`linux_users.user.remove`
`linux_users.user.shell`
`linux_users.user.state`
`linux_users.user.system`
`linux_users.user.uid`
`linux_users.user.umask`
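`linux_users.user.update_password`
Per-user SSH keys can be set with `linux_users.user.authorized_keys`, a list of entries for the ansible.posix authorized_key module: `key` (required), `comment`, `exclusive`, `key_options`, `path`, and `state`. Users without this list get `linux_user_default_authorized_keys`.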
Optionally, SMTP notifications can be configured:
`linux_user_smtp_enabled`: false
`linux_user_smtp_body`: "User mutations executed on {{ ansible_facts.hostname }}"
`linux_user_smtp_host`: ""
`linux_user_smtp_mail_from`: ""
`linux_user_smtp_mail_to`: ""
`linux_user_smtp_port`: ""
`linux_user_smtp_subject`: "User mutations"
## Variables
`linux_user_allowed_os_families`: ["RedHat"]
`linux_user_allowed_target_hostgroups`: ["dev_servers"]
# Example Playbook
```yaml
---
- name: Run linux_user role
  hosts: dev_servers
  vars:
    linux_user_smtp_enabled: true
    linux_user_smtp_body: "A user mutation has been done in the sue.nl domain!"
    linux_user_smtp_host: "smtp.sue.nl"
    linux_user_smtp_mail_from: "source@sue.nl"
    linux_user_smtp_mail_to: "recipient@sue.nl"
    linux_user_smtp_port: "25"
    linux_user_smtp_subject: "User mutations in sue.nl"
    linux_users:
      - name: user_1
      - name: user_2
        state: absent
      - name: svc_user_1
        create_home: false
        shell: "/bin/bash"
        uid: 15001
      - name: svc_user_2
        authorized_keys:
          - key: "key1"
          - key: "key2"
  roles:
    - role: sue.generic.linux_user
```
# Tags
This role supports multiple tags:
- `linux_user`: runs all tasks of the role
- `linux_user_create`: only create users
- `linux_user_authorized_keys`: only configure authorized keys for a user
# Supported
Tested and working on the following operating systems:
- AlmaLinux 9.5 (Teal Serval)
# License
MIT
# Author Information
- Nathan van Buuren (Sue B.V.)
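Besides the README, Ansible can read a machine-readable role specification from meta/argument_specs.yml (supported since ansible-core 2.11). Ansible validates the role's input against it before running the role, and `ansible-doc -t role` can render it. The same file therefore covers both documentation and validation. Below is a partial sketch covering a few of the role's defaults. Nested `options` for the entries of linux_users are deliberately left out. If you add them, declare every key the role accepts, because keys missing from a suboption spec are reported as unsupported parameters.

```yaml
# meta/argument_specs.yml (partial sketch)
---
argument_specs:
  main:
    short_description: Create and manage Linux users
    options:
      linux_users:
        type: list
        elements: dict
        default: []
        description:
          - Users to manage. Each entry requires a name; other keys map to ansible.builtin.user parameters.
      linux_user_default_authorized_keys:
        type: list
        elements: dict
        default: []
        description:
          - Authorized keys for users that do not define their own authorized_keys.
      linux_user_smtp_enabled:
        type: bool
        default: false
        description:
          - Send a mail notification when the user task reports a change.
      linux_user_smtp_host:
        type: str
        default: ""
        description:
          - SMTP server used for notifications.
```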
Conclusion
And those were the rules of thumb. Again, these are not hard and fast rules, but guidelines to help you write more consistent, robust, and reusable Ansible roles.
Ansible is a powerful automation tool with many applications. Would you like to learn more about writing your own Ansible modules, setting up your Ansible configuration correctly, or when to use Ansible and when not to? Feel free to contact us; we are happy to help.