Ah, software infrastructure and service management… an inexhaustible source of spiritual wisdom and enrichment. It can teach you so much about life, about love… about humanity. One of the many parallels between life and infrastructure work is that it all seems simple at first; the monsters and challenges lurking beneath the surface only reveal themselves once you’ve grown old and weary (which, fortunately, happens rather quickly in the software world: if you work on the same thing for more than a year, you’re practically a fossil already).
Take a moment to consider a common career path in this field. You join a startup as a “DevOps specialist,” start tinkering with a home lab, or set up a website for the local chess club. At first, everything seems wonderfully simple: you plug an old laptop into a forgotten Ethernet port or rent a VPS for a few cents a day, and download the most promising Google result for “good free server operating system please help no Chinese scams no Microsoft.” You install a few packages, explain to your coworkers how to access the system, and wait to see what happens.
Eventually, something goes wrong: the server runs out of memory, a rodent chews through the motherboard, or your server has the misfortune of encountering a “regular user” (see Figure 1). At that moment, you realize that your server and its users might need monitoring after all. You spend a whole night browsing the internet, and by the next day you’ve pieced together a few components into a beautiful observability stack. Maybe it’s a systemd drop-in, maybe Elasticsearch, Grafana, and Metricbeat. Maybe you’ve installed Prometheus and a few exporters. Either way, everything seems fine… until one day…

Figure 1: A typical user who completely ignores your carefully configured VPS and brings it down.

Figure 2: Separation of configuration and runtime environments through the use of tools such as Ansible.
Version Control
Looking at your infrastructure through this lens opens up a number of new options for version control. With Ansible, it’s quite common to keep your configuration in Git and to use GitOps (for example, with GitLab runners) to deploy your systems from that configuration.
Once you’ve separated the configuration from the underlying engine, though, you can keep only the engine’s code in Git (the part that engineers actually need to manage), while the system state itself is stored elsewhere.
To some, this may seem pointless or unnecessarily quirky. And yet, even Kubernetes uses an etcd database to store and manage the configuration of the services it is responsible for. The system state and its objects live in a database, which lets the system use replication and scale far beyond what a set of GitLab runners can provide, while the actual Kubernetes source code can still be modified and versioned separately.
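To make the idea a little more concrete, here is a minimal sketch of state living in a database rather than in Git. Everything in it is an assumption made for illustration: the `ConfigStore` class, the table layout, the key name, and the revision counter are hypothetical, and SQLite merely stands in for whatever database you would actually use.

```python
import json
import sqlite3


class ConfigStore:
    """Toy revisioned key-value store for service configuration (hypothetical)."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS config ("
            "key TEXT PRIMARY KEY, value TEXT NOT NULL, revision INTEGER NOT NULL)"
        )

    def get(self, key):
        """Return (value, revision); a missing key is reported as revision 0."""
        row = self.db.execute(
            "SELECT value, revision FROM config WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            return None, 0
        return json.loads(row[0]), row[1]

    def put(self, key, value, expected_revision):
        """Write only if the key is still at the revision the caller last saw.

        Single-process sketch: a real store would do the check and the write
        atomically on the database side.
        """
        _, current = self.get(key)
        if current != expected_revision:
            return False
        with self.db:  # commits on success, rolls back on error
            self.db.execute(
                "INSERT OR REPLACE INTO config (key, value, revision) VALUES (?, ?, ?)",
                (key, json.dumps(value), current + 1),
            )
        return True


store = ConfigStore()
ok = store.put("services/web", {"replicas": 2}, expected_revision=0)
value, revision = store.get("services/web")
# A writer that still believes the key is at revision 0 is turned away.
stale = store.put("services/web", {"replicas": 5}, expected_revision=0)
```

The revision check is one possible design choice, not a requirement: it keeps two writers from silently overwriting each other, which is roughly the job Git's merge conflicts were doing for you before.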
If your goal is to set up a few services, the “usual way” is perfectly fine. But if you’re working on enabling your company or customers to set up systems independently—similar to Kubernetes or Azure—then the extra flexibility of storing configuration “your own way” can be well worth the extra effort.
Pre-flight Configuration Validation
Looking at infrastructure from this perspective also exposes another important area for design decisions: validating configuration before it is deployed.
In traditional Ansible or Terraform workflows, validation is often performed implicitly during execution itself: you run a plan, execute a playbook, and discover during the deployment phase whether something is correct or not. This works fine on a smaller scale, but becomes fragile as soon as the environment becomes more complex and changes are implemented more quickly and by more teams.
By explicitly separating configuration, deployment, and runtime, you can move validation earlier in the chain. Instead of waiting until a system is actually deployed, you can perform a “pre-flight check” on the projected configuration: a controlled evaluation to determine whether the desired state is consistent, complete, and executable in the first place.
This can range from simple schema validation to semantic checks that verify that resources do not conflict, dependencies are correct, and policies are followed before even a single change is deployed to production. On a larger scale, this becomes essential: errors that only become apparent during deployment are not only costly but can also propagate horizontally through interconnected systems.
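A pre-flight check along these lines could look roughly like the sketch below. The configuration layout (`services`, `name`, `port`, `depends_on`) and the function names are assumptions invented for this example; your platform's model will look different.

```python
def check_schema(config):
    """Structural check: required fields exist and have the expected type."""
    services = config.get("services")
    if not isinstance(services, list):
        return ["'services' must be a list"]
    errors = []
    for i, svc in enumerate(services):
        if not isinstance(svc, dict):
            errors.append(f"services[{i}] must be a mapping")
            continue
        for field, kind in (("name", str), ("port", int), ("depends_on", list)):
            if not isinstance(svc.get(field), kind):
                errors.append(f"services[{i}].{field} must be of type {kind.__name__}")
    return errors


def check_semantics(config):
    """Semantic check: no port conflicts, no dependencies on unknown services."""
    errors = []
    names = {svc["name"] for svc in config["services"]}
    port_owner = {}
    for svc in config["services"]:
        owner = port_owner.setdefault(svc["port"], svc["name"])
        if owner != svc["name"]:
            errors.append(
                f"port {svc['port']} is claimed by both {owner} and {svc['name']}"
            )
        for dep in svc["depends_on"]:
            if dep not in names:
                errors.append(f"{svc['name']} depends on unknown service {dep}")
    return errors


def preflight(config):
    """Run the cheap structural check first; only then look at meaning."""
    errors = check_schema(config)
    if errors:
        return errors
    return check_semantics(config)


proposed = {
    "services": [
        {"name": "web", "port": 8080, "depends_on": ["db"]},
        {"name": "api", "port": 8080, "depends_on": ["cache"]},
        {"name": "db", "port": 5432, "depends_on": []},
    ]
}
problems = preflight(proposed)
for problem in problems:
    print(problem)
```

Here the proposed configuration is rejected before anything is deployed: `api` wants a port that `web` already holds, and it depends on a `cache` service nobody has defined.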
In modern platforms, this principle is reflected in various forms. Kubernetes, for example, performs declarative reconciliation: the desired state is validated before it is stored in etcd, and controllers then continuously compare it with the actual state of the cluster. Other systems build additional layers on top of this, such as policy engines or admission controllers, which block or modify configurations before they are even applied.
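Stripped of everything that makes it hard, the validate-then-compare loop might be sketched as follows. This is not how Kubernetes implements it; the `replicas` field, the action names, and the plain dictionaries standing in for desired and actual state are all assumptions of the example.

```python
def validate(desired):
    """Reject a desired state that the (hypothetical) platform cannot honour."""
    for name, spec in desired.items():
        replicas = spec.get("replicas")
        if not isinstance(replicas, int) or replicas < 0:
            raise ValueError(f"{name}: replicas must be a non-negative integer")


def plan(desired, actual):
    """Return the actions needed to move the actual state to the desired state."""
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name))
    return actions


def reconcile_once(desired, actual):
    """Validate first, then apply whatever differences remain."""
    validate(desired)
    actions = plan(desired, actual)
    for action, name in actions:
        if action == "delete":
            del actual[name]
        else:
            actual[name] = dict(desired[name])
    return actions


desired = {"web": {"replicas": 3}, "db": {"replicas": 1}}
actual = {"web": {"replicas": 2}, "old-job": {"replicas": 1}}

first = reconcile_once(desired, actual)   # creates db, updates web, deletes old-job
second = reconcile_once(desired, actual)  # nothing left to do
```

The second pass returning no actions is the point: reconciliation is meant to be safe to run over and over, so a real system can simply keep looping.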
The result of such a pre-flight layer is not only reliability, but also a shift in responsibility: from “we hope the deployment works” to “we know in advance that this change is consistent with the system model.”
When you connect your infrastructure to a public API, an internal developer portal, or a team responsible for service delivery, the likelihood of errors finding their way into your backend increases significantly. Furthermore, once you’re locked into a single specific way of storing, managing, and representing your configuration, it becomes difficult to validate that configuration for compliance, security, and correctness.
However, if you allow yourself to decide how configuration works for your platform, you can use a wide range of tools—such as OpenAPI specifications—or even your own custom code to validate, modify, or reject configurations based on what you actually want to support within your platform.
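With custom code, "validate, modify, or reject" can be as small as a list of functions that each either return a (possibly adjusted) configuration or refuse it. The sketch below assumes a made-up platform policy: the field names, the default values, the limit of 10 replicas, and the `Rejected` exception are all hypothetical.

```python
class Rejected(Exception):
    """Raised when a submitted configuration is not something the platform supports."""


def apply_defaults(config):
    # Modifying step: fill in values the user is allowed to omit.
    return {"replicas": 1, "public": False, **config}


def require_owner(config):
    # Validating step: someone has to be reachable when this breaks.
    if not config.get("owner"):
        raise Rejected("every service needs an owner")
    return config


def cap_replicas(config):
    # Validating step: only offer what the platform can actually deliver.
    if config["replicas"] > 10:
        raise Rejected("at most 10 replicas are supported")
    return config


PIPELINE = [apply_defaults, require_owner, cap_replicas]


def admit(config):
    """Run a submitted configuration through every step, in order."""
    for step in PIPELINE:
        config = step(config)
    return config


accepted = admit({"name": "web", "owner": "team-chess"})

try:
    admit({"name": "web", "owner": "team-chess", "replicas": 50})
    reason = None
except Rejected as exc:
    reason = str(exc)
```

Because each step is just a function, adding a new rule means appending to `PIPELINE`, and the order of the list decides whether defaults are applied before or after a given check.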
