Introduction

Tools like CFEngine, Puppet, Chef and Ansible have revolutionized infrastructure automation by providing a structured framework for organizing and sharing configuration management code, with Ansible being one of the more recent of these. Ansible is widely used to automate the configuration of IT environments, and it is highly flexible. While the creators of Ansible position it as a tool that ‘offers open-source automation that is simple, flexible, and powerful’, it is not always clear what makes for robust Ansible code, and things are often less simple than advertised, especially in more complex environments and setups.

Within Ansible, roles are the concept used to encapsulate code. A role bundles tasks, variables, handlers, and files into a modular, reusable unit that, if written correctly, can easily be shared across projects and teams. Whether you’re managing a handful of servers or orchestrating complex (multi-)cloud environments, roles enable you to write cleaner, more maintainable code. Writing roles comes with many challenges, however, and it can be quite difficult to design an Ansible role that one would consider robust and ‘complete’. In this blog post, we introduce some rules of thumb for what a robust Ansible role consists of, which will hopefully help you write better Ansible roles.

Some Rules of Thumb

The rules of thumb proposed here are as follows:

The main.yml of a role should only be used for including and importing tasks
All tasks should be optimized for efficiency
Data and code should be fully separated
Tasks should run on an include basis
Roles should handle check mode correctly
All tasks should be idempotent
Variables should be properly namespaced and hierarchized
Tags should be implemented and documented
Roles should be validated
Roles should be documented

Note that these are rules of thumb only. Depending on the usage and requirements of the role, some rules may be left out and others added. Generally speaking, however, these will be applicable. Below, we expand on each of these rules with illustrations and context to show what we mean by them.

Illustrations with a role: linux_user

To illustrate the rules of thumb given above, we will create a role that creates Linux users. We scaffold the role as follows:

# Creation of a basic role structure using ansible-galaxy
ansible-galaxy role init linux_user

# Navigate into the new role
cd linux_user/

# Create a supplementary tasks file
touch tasks/linux_user.yml

Our directory structure now looks like this:

# ls ./*
./README.md

./defaults:
main.yml

./files:

./handlers:
main.yml

./meta:
main.yml

./tasks:
linux_user.yml  main.yml

./templates:

./tests:
inventory  test.yml

./vars:
main.yml

This gives us a basic structure to work with. In the main.yml, we include the tasks file that holds all user-related tasks:

# cat tasks/main.yml
---
- name: Include Linux user tasks
  loop: "{{ linux_users }}"
  loop_control:
    loop_var: user
    label: "{{ user.name }}"
  tags:
    - linux_user
    - linux_user_authorized_keys
    - linux_user_create
  when:
    - ansible_facts["os_family"] in linux_user_allowed_os_families
    - linux_user_allowed_target_hostgroups | intersect(group_names) | length > 0
  ansible.builtin.include_tasks: linux_user.yml

The linux_user.yml file included above contains the actual user creation tasks.

# cat tasks/linux_user.yml
---
- name: Create user "{{ user.name }}"
  become: true
  notify:
    - "Send mail notification about user creation"
  tags:
    - linux_user
    - linux_user_create
  ansible.builtin.user:
    # Required
    name: "{{ user.name }}"

    # Optional
    append: "{{ user.append | default(omit) }}"
    comment: "{{ user.comment | default(omit) }}"
    create_home: "{{ user.create_home | default(omit) }}"
    force: "{{ user.force | default(omit) }}"
    group: "{{ user.group | default(omit) }}"
    groups: "{{ user.groups | default(omit) }}"
    hidden: "{{ user.hidden | default(omit) }}"
    home: "{{ user.home | default(omit) }}"
    password: "{{ user.password | default(omit) }}"
    password_expire_account_disable: "{{ user.password_expire_account_disable | default(omit) }}"
    password_expire_max: "{{ user.password_expire_max | default(omit) }}"
    password_expire_min: "{{ user.password_expire_min | default(omit) }}"
    password_expire_warn: "{{ user.password_expire_warn | default(omit) }}"
    password_lock: "{{ user.password_lock | default(omit) }}"
    remove: "{{ user.remove | default(omit) }}"
    shell: "{{ user.shell | default(omit) }}"
    state: "{{ user.state | default(omit) }}"
    system: "{{ user.system | default(omit) }}"
    uid: "{{ user.uid | default(omit) }}"
    umask: "{{ user.umask | default(omit) }}"
    update_password: "{{ user.update_password | default(omit) }}"

- name: Set authorized keys for user "{{ user.name }}"
  become: true
  loop: "{{ user.authorized_keys | default(linux_user_default_authorized_keys) }}"
  loop_control:
    loop_var: key
    label: "{{ key.key }}"
  tags:
    - linux_user
    - linux_user_authorized_keys
  ansible.posix.authorized_key:
    # Required
    key: "{{ key.key }}"
    user: "{{ user.name }}"

    # Optional
    comment: "{{ key.comment | default(omit) }}"
    exclusive: "{{ key.exclusive | default(omit) }}"
    key_options: "{{ key.key_options | default(omit) }}"
    path: "{{ key.path | default(omit) }}"
    state: "{{ key.state | default(omit) }}"

In the above tasks file, a handler is configured called ‘Send mail notification about user creation’, which looks like this:

# cat handlers/main.yml
---
- name: Send mail notification about user creation
  delegate_to: localhost
  when: linux_user_smtp_enabled
  community.general.mail:
    body: "{{ linux_user_smtp_body }}"
    from: "{{ linux_user_smtp_mail_from }}"
    host: "{{ linux_user_smtp_host }}"
    port: "{{ linux_user_smtp_port }}"
    subject: "{{ linux_user_smtp_subject }}"
    to: "{{ linux_user_smtp_mail_to }}"

As you can see, many variables are used in the above files. These are configured in the defaults and vars files of the role:

# cat defaults/main.yml
---
# Users
linux_user_default_authorized_keys: []
linux_users: []

# SMTP
linux_user_smtp_enabled: false
linux_user_smtp_body: "User mutations executed on {{ ansible_facts.hostname }}"
linux_user_smtp_host: ""
linux_user_smtp_mail_from: ""
linux_user_smtp_mail_to: ""
linux_user_smtp_port: ""
linux_user_smtp_subject: "User mutations"

# cat vars/main.yml
---
# Control
linux_user_allowed_os_families:
  - "RedHat"
linux_user_allowed_target_hostgroups:
  - "dev_servers"

Rules of Thumb in Practice

1. The main.yml of a role should only be used for including and importing tasks

When tasks are defined directly in the main.yml, especially in roles that contain many tasks, the role becomes disorganized, harder to read, and harder to understand. It is therefore good practice to use the main.yml only for including or importing tasks, as we do in the role given above.
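
As a rough sketch of where this leads, a role with several groups of tasks could keep its main.yml as a pure table of contents. The file names create.yml and authorized_keys.yml are hypothetical and are not part of the example role, which uses a single linux_user.yml:

```yaml
# tasks/main.yml (sketch; the included file names are hypothetical)
---
- name: Include user creation tasks
  ansible.builtin.include_tasks: create.yml

- name: Include authorized key tasks
  ansible.builtin.include_tasks: authorized_keys.yml
```

Each included file then holds one coherent group of tasks, which keeps the entry point short enough to read at a glance.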

2. All tasks should be optimized for efficiency

All tasks, for obvious reasons, should be optimized for efficiency. As stated before, the main.yml should only be used for importing or including tasks.

In our example role, we include tasks from linux_user.yml, as opposed to importing them. Importing is often the faster of the two. The difference lies in the way Ansible processes the statement: include_tasks is dynamic and is evaluated at runtime, when the play reaches it, whereas import_tasks is static and is resolved when the playbook is parsed, before any task runs. Importing is faster most of the time, but how much it helps depends heavily on the size of the run.

Why then does our main.yml use include instead of import? Because import_tasks doesn’t support loops on the import statement itself. So, if we wanted to speed up our user creation tasks, we could use import_tasks in the main.yml and iterate over the users inside linux_user.yml, something like the following:

# cat tasks/main.yml
---
- name: Import Linux user tasks
  tags:
    - linux_user
    - linux_user_authorized_keys
    - linux_user_create
  when:
    - ansible_facts["os_family"] in linux_user_allowed_os_families
    - linux_user_allowed_target_hostgroups | intersect(group_names) | length > 0
  ansible.builtin.import_tasks: linux_user.yml

# cat tasks/linux_user.yml
- name: Create user "{{ user.name }}"
  become: true
  loop: "{{ linux_users }}"
  loop_control:
    loop_var: user
    label: "{{ user.name }}"
  notify:
    - "Send mail notification about user creation"
  tags:
    - linux_user
    - linux_user_create
  ansible.builtin.user:
{...}

This preprocesses the user creation task, and it would indeed be faster if user creation were our only task. In our case, however, we also have a task that configures authorized SSH keys for each user. Because the loop is no longer on the import statement, that task would need its own loop over the users as well. Iterating over the users twice with statically imported tasks can cost more time than iterating once at runtime (again depending on the size of the run), so include_tasks is the more efficient choice in this situation and keeps our tasks optimized for efficiency.
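
To make the cost of that second loop concrete, this is roughly what the authorized keys task could look like under import_tasks. It is a sketch, not part of the role: it assumes the built-in subelements filter to pair every user with each of its keys, uses a hypothetical loop variable name (user_key), and unlike the original task it does not fall back to linux_user_default_authorized_keys for users without keys:

```yaml
# tasks/linux_user.yml (sketch of the second loop needed with import_tasks)
- name: Set authorized keys for all users
  become: true
  # subelements yields [user, key] pairs; users without authorized_keys are skipped
  loop: "{{ linux_users | subelements('authorized_keys', skip_missing=True) }}"
  loop_control:
    loop_var: user_key
    label: "{{ user_key.0.name }}"
  tags:
    - linux_user
    - linux_user_authorized_keys
  ansible.posix.authorized_key:
    user: "{{ user_key.0.name }}"
    key: "{{ user_key.1.key }}"
    state: "{{ user_key.1.state | default(omit) }}"
```

Both this task and the user creation task now walk the full user list, which is the duplicated iteration described above.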

3. Data and code should be fully separated

To keep our role fully modular and scalable, all user data should be kept out of the role itself. Depending on your Ansible setup, variables should be passed to the role from the outside in some way. Here is an example of passing vars straight to the role through the playbook that runs it. Note that I published my role in an Ansible Galaxy collection called ‘sue.generic’:

---
- name: Run role
  hosts: localhost
  vars:
    linux_user_smtp_host: "smtp.sue.nl"
    linux_user_smtp_mail_from: "source@sue.nl"
    linux_user_smtp_mail_to: "recipient@sue.nl"
    linux_user_smtp_port: "25"
    linux_users:
      - name: user1
        state: absent
      - name: user2
  roles:
    - role: sue.generic.linux_user
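
Passing vars in the playbook is only one option. As a sketch of another common layout, the same data could live in the inventory's group_vars, so that neither the role nor the playbook contains user data. The file paths below are assumptions about your inventory layout, not something the role prescribes:

```yaml
# group_vars/dev_servers/linux_user.yml (hypothetical path)
---
linux_user_smtp_host: "smtp.sue.nl"
linux_user_smtp_mail_from: "source@sue.nl"
linux_user_smtp_mail_to: "recipient@sue.nl"
linux_user_smtp_port: "25"
linux_users:
  - name: user1
    state: absent
  - name: user2

# playbook.yml (hypothetical file name) then shrinks to:
# ---
# - name: Run role
#   hosts: dev_servers
#   roles:
#     - role: sue.generic.linux_user
```

The role itself stays untouched when users are added or removed; only the data file changes.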

4. Tasks should run on an include basis

An automation tool like Ansible is very effective at running actions such as pushing configuration files, and for that same reason it can be very effective at polluting or destroying your environment. Running the wrong tasks on a machine or set of machines can do serious damage to your IT infrastructure. In our example role, we set the following conditionals on when the Linux user tasks get included:

  when:
    - ansible_facts["os_family"] in linux_user_allowed_os_families
    - linux_user_allowed_target_hostgroups | intersect(group_names) | length > 0

These conditionals limit the tasks to only run against some predefined allowed OS families and some predefined allowed hostgroups of servers. By conditionally including tasks where needed as opposed to running tasks against all machines and excluding certain machines, we prevent accidentally pushing users to systems they don’t need access to, enhancing security.
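
As a sketch of how the hostgroup conditional plays out, consider the following YAML inventory. The host names are made up for illustration. With linux_user_allowed_target_hostgroups set to dev_servers, the intersection with group_names is non-empty only for the dev host, so (provided its OS family is also allowed) the user tasks are included there and skipped on the prod host:

```yaml
# inventory.yml (hypothetical hosts)
all:
  children:
    dev_servers:
      hosts:
        dev01.example.com:   # group_names contains dev_servers: tasks included
    prod_servers:
      hosts:
        prod01.example.com:  # no overlap with the allowed groups: tasks skipped
```

A new host group receives users only after it has been added to the allow list explicitly.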

5. Roles should handle check mode correctly

Check mode handling should be implemented for each task, and check mode should resemble the actual run as closely as possible: catch the same errors and produce the same output. To achieve this, you need to consider the check mode behaviour of each and every task. Sometimes, for example, check_mode should be disabled for a task.

To illustrate this, let’s say that we expect a local file containing our users’ password to exist, and that we want to read this file using the shell module. There are of course better ways to do this, and there are also better ways of storing secrets, like having a dynamically rotating secret as described in our blog post called ‘Creating Automatically Rotating Secrets Using Terraform’, but using the shell module here illustrates the point that check mode should always resemble the actual run.

There is a caveat to this scenario: the file containing the password doesn’t actually exist. We could retrieve the password using something along the lines of:

- name: Retrieve user password
  changed_when: false
  check_mode: false
  failed_when: user_password_result.rc != 0
  register: user_password_result
  vars:
    password_file: "/tmp/password.txt"
  ansible.builtin.shell: |
    if [ -f "{{ password_file }}" ]; then
      cat "{{ password_file }}"
    else
      echo "Password file not found: {{ password_file }}"
      exit 1
    fi

Note that check_mode is set to false here, causing this task to always run, even in check mode. Why? Because the shell module is skipped in check mode, and that can cause a discrepancy between our check mode and the actual run. If the password file doesn’t exist and we run in check mode, the task would not generate an error (the script doesn’t run, so it cannot fail on the missing file). When running without check mode, it would fail because the file doesn’t exist. So if we don’t explicitly set check_mode to false on this task, check mode does not properly resemble the actual run.
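
The same pattern applies to any read-only command whose result later tasks depend on. The sketch below is not part of the role: it assumes the getent command is available on the target and uses a hypothetical register name (linux_user_getent_result). Because the command changes nothing, it is safe to run in check mode, and doing so keeps the registered result defined in both modes:

```yaml
- name: Check whether user "{{ user.name }}" already exists
  check_mode: false      # read-only, so always run it, also in check mode
  changed_when: false    # a lookup never changes the system
  # getent exits with 2 when the key is not found; treat only other codes as errors
  failed_when: linux_user_getent_result.rc not in [0, 2]
  register: linux_user_getent_result
  ansible.builtin.command:
    argv:
      - getent
      - passwd
      - "{{ user.name }}"
```

Only disable check mode like this for tasks that are guaranteed not to modify the system; a task that writes anything would defeat the purpose of a dry run.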

6. All tasks should be idempotent

All tasks should be able to run more than once and have the same outcome if nothing changes. A task that has this property is said to be idempotent. Most of the time, you don’t need to worry about this as most modules will have idempotency handling in them. In edge cases, however, like when using the command, shell or lineinfile modules or when writing your own modules, having idempotency in mind is important. In the case of our role, we might want to add an env var to the newly created users’ .bashrc file:

- name: Ensure custom environment variable is set in .bashrc for user "{{ user.name }}"
  become: true
  tags:
    - linux_user
    - linux_user_bashrc
  ansible.builtin.lineinfile:
    path: "{{ user.home | default('/home/' + user.name) }}/.bashrc"
    regexp: '^export MY_CUSTOM_VAR='
    line: 'export MY_CUSTOM_VAR="hello-world"'
    state: present
    create: true
    owner: "{{ user.name }}"
    group: "{{ user.group | default(user.name) }}"
    mode: '0644'

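Note that by adding regexp, we write the lineinfile task in an idempotent way that also survives value changes: any existing line matching the pattern is replaced, so the export never appears more than once. Without regexp, lineinfile only checks whether the exact line is already present. The task below is therefore not robustly idempotent: whenever the value of the env var changes, a second export line is appended to the users’ .bashrc file instead of the existing one being replaced:

- name: Append line to .bashrc for user "{{ user.name }}" (NOT robustly idempotent)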
  become: true
  tags:
    - linux_user
    - linux_user_bashrc
  ansible.builtin.lineinfile:
    path: "{{ user.home | default('/home/' + user.name) }}/.bashrc"
    line: 'export MY_CUSTOM_VAR="hello-world"'
    insertafter: EOF
    state: present
    create: true
    owner: "{{ user.name }}"
    group: "{{ user.group | default(user.name) }}"
    mode: '0644'
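
The command and shell modules mentioned above need even more care, as they have no built-in notion of desired state at all. The following sketch (not part of the role) appends the line unconditionally on every run and always reports changed; it is shown only as an anti-pattern to avoid:

```yaml
- name: Append env var with shell (anti-pattern, NOT idempotent)
  become: true
  # Every run adds another copy of the line and reports "changed"
  ansible.builtin.shell: >-
    echo 'export MY_CUSTOM_VAR="hello-world"'
    >> "{{ user.home | default('/home/' + user.name) }}/.bashrc"
```

Whenever a dedicated module such as lineinfile exists for the job, it is usually the safer choice.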

7. Variables should be properly namespaced and hierarchized

All role variables should be properly namespaced to avoid unforeseen variable overrides and conflicting variables, especially in bigger environments where many variables are present. This is usually done by prefixing every variable of the role with the role name, as we have done in our role (`linux_user`).

Next to namespacing, the variables need to be properly hierarchized. All of the variables, except for linux_user_allowed_os_families and linux_user_allowed_target_hostgroups, are defined in the role’s defaults, which is the lowest level in the variable hierarchy, making it very easy to override them. As for variables within roles: if it is more important that a variable keeps a certain value, like our linux_user_allowed_os_families var, it should be defined in vars instead of defaults, making it harder to override. For the exact order of variable precedence, refer to the official Ansible documentation on variable precedence.
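
As a sketch of what this hierarchy means in practice: a value in the role’s vars/main.yml is not overridden by inventory or play vars, but it can still be overridden deliberately, for example through role parameters or extra vars, which rank higher in the precedence list. The test_servers group below is hypothetical:

```yaml
- name: Run linux_user role on an additional host group
  hosts: test_servers
  roles:
    - role: sue.generic.linux_user
      # Inline role parameters rank above the role's vars/main.yml
      linux_user_allowed_target_hostgroups:
        - "dev_servers"
        - "test_servers"
```

Overriding a guarded variable this way is an explicit, visible decision in the playbook rather than a side effect of some group_vars file.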

8. Tags should be implemented and documented

All roles should support tagging for fine-grained, targeted runs. In our role, for instance, we can run only the user creation task by using the linux_user_create tag, or only the authorized keys task by using the linux_user_authorized_keys tag. Each tag should be documented so that its purpose is clear.
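
One detail worth knowing when documenting tags: tags set on a dynamic include_tasks statement apply to the include itself, not to the tasks inside the included file, which is why the example role repeats its tags in linux_user.yml. As a sketch of an alternative, the apply keyword of include_tasks can push tags down to the included tasks instead:

```yaml
# tasks/main.yml (sketch; loop and when conditions omitted for brevity)
- name: Include Linux user tasks
  tags:
    - linux_user             # selects the include statement itself
  ansible.builtin.include_tasks:
    file: linux_user.yml
    apply:
      tags:
        - linux_user         # applied to every task in linux_user.yml
```

A targeted run with the role's existing tags is then a matter of passing, for example, `--tags linux_user_create` to ansible-playbook.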

9. Roles should be validated

When finalizing your new role, it should always be validated, at least by linting and testing it. For linting, ansible-lint is most commonly used. For testing, there are multiple options, a popular one being Ansible Molecule. Covering linting and testing in this post would be too extensive, so we only mention them here without examples. If you would like to see examples of linting and testing, however, make sure to reach out to us!

10. Roles should be documented

Our new role isn’t complete without documentation on how to use it. Documentation is often neglected, but it is no less important than properly writing the role itself. There are tools to auto-generate the README.md for an Ansible role, but we provide a custom-written one below:

# cat README.md
# linux_user
Ansible role for creating Linux users

# Requirements
This role requires the following collections to be present:
- ansible.builtin
- ansible.posix
- community.general

# Role Variables
## Defaults
User related defaults:
`linux_user_default_authorized_keys`: []
`linux_users`: []

Each entry in `linux_users` can be configured as follows, where the values come from the [ansible user module](https://docs.ansible.com/ansible/latest/collections/ansible/builtin/user_module.html):
`linux_users.user.name` (required)
`linux_users.user.append`
`linux_users.user.comment`
`linux_users.user.create_home`
`linux_users.user.force`
`linux_users.user.group`
`linux_users.user.groups`
`linux_users.user.hidden`
`linux_users.user.home`
`linux_users.user.password`
`linux_users.user.password_expire_account_disable`
`linux_users.user.password_expire_max`
`linux_users.user.password_expire_min`
`linux_users.user.password_expire_warn`
`linux_users.user.password_lock`
`linux_users.user.remove`
`linux_users.user.shell`
`linux_users.user.state`
`linux_users.user.system`
`linux_users.user.uid`
`linux_users.user.umask`
`linux_users.user.update_password`

Optionally, SMTP notifications can be configured:
`linux_user_smtp_enabled`: false
`linux_user_smtp_body`: "User mutations executed on {{ ansible_facts.hostname }}"
`linux_user_smtp_host`: ""
`linux_user_smtp_mail_from`: ""
`linux_user_smtp_mail_to`: ""
`linux_user_smtp_port`: ""
`linux_user_smtp_subject`: "User mutations"

## Variables
`linux_user_allowed_os_families`: ["RedHat"]
`linux_user_allowed_target_hostgroups`: ["dev_servers"]

# Example Playbook
```yaml
---
- name: Run linux_user role
  hosts: localhost
  vars:
    linux_user_smtp_body: "A user mutation has been done in the sue.nl domain!"
    linux_user_smtp_host: "smtp.sue.nl"
    linux_user_smtp_mail_from: "source@sue.nl"
    linux_user_smtp_mail_to: "recipient@sue.nl"
    linux_user_smtp_port: "25"
    linux_user_smtp_subject: "User mutations in sue.nl"
    linux_users:
      - name: user_1
      - name: user_2
        state: absent
      - name: svc_user_1
        create_home: false
        shell: "/bin/bash"
        uid: 15001
      - name: svc_user_2
        authorized_keys:
          - key: "key1"
          - key: "key2"
  roles:
    - role: sue.generic.linux_user
```

# Tags
This role supports multiple tags:
- `linux_user`: runs all tasks
- `linux_user_create`: only create users
- `linux_user_authorized_keys`: only configure authorized keys for a user

# Supported
Tested and working on the following operating systems:
- AlmaLinux 9.5 (Teal Serval)

# License
MIT

# Author Information
- Nathan van Buuren (Sue B.V.)

Conclusion

And that concludes the rules of thumb. Again, these are by no means fixed rules, but rather guidelines that will hopefully help you write better, more consistent and more useful Ansible roles.

Ansible is an incredibly powerful automation tool that can be used for many purposes. Are you interested in how to write your own Ansible modules, how to manage your Ansible setup properly, when or when not to use Ansible, or any other topic covered in this post? Make sure to reach out to us!

The Art of Writing Ansible Roles
