Research question and methodology
The central research question is: "How can an IaC automation framework for dynamic cloud deployment be designed so that the generated IaC code complies with an application's SLO constraints?"
This question is particularly relevant because designing cloud infrastructure requires in-depth expertise, must satisfy deployment constraints defined by service-level objectives (SLOs), and, despite the emergence of Infrastructure-as-Code (IaC) tools, remains a bottleneck for cloud adoption. The research uses a mixed-methods approach that combines a literature review, theoretical framework design, and empirical evaluation. The evaluation was conducted through controlled experiments with load testing and metric collection in Kubernetes-based deployments.
Research design and techniques
The study introduces a dual-track framework that combines Large Language Models (LLMs) with statistical prediction methods. The approach uses GPT-4o to generate Terraform code and to iteratively modify it based on the specified SLO constraints. In addition, statistical models (primarily polynomial regression) are applied to predict whether the infrastructure will violate its SLOs before actual deployment. The framework supports two modes:
- Manual SLO definition, where developers specify CPU and memory constraints.
- Metric-based SLO creation, whereby the system derives SLOs from baseline performance data using polynomial regression models.
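The metric-based mode can be illustrated with a minimal sketch: fit a polynomial to baseline load/latency observations, then use the fitted curve to predict, before deployment, whether a target load would breach a latency SLO. All numbers, thresholds, and the `predict_violation` helper below are hypothetical illustrations, not the thesis implementation, which operates on real Prometheus metrics.

```python
# Illustrative sketch (hypothetical data): deriving an SLO violation
# predictor from baseline metrics with polynomial regression.
import numpy as np

# Hypothetical baseline observations: offered load (RPS) vs p95 latency (ms).
load = np.array([50, 100, 200, 300, 400, 500], dtype=float)
p95_ms = np.array([40, 45, 60, 90, 140, 230], dtype=float)

# Fit a degree-2 polynomial to capture the non-linear latency growth.
coeffs = np.polyfit(load, p95_ms, deg=2)
model = np.poly1d(coeffs)

def predict_violation(target_rps: float, latency_slo_ms: float) -> bool:
    """Predict, before deployment, whether a target load would breach the SLO."""
    return float(model(target_rps)) > latency_slo_ms

# Example: would pushing to 600 RPS exceed a 250 ms p95 latency SLO?
print(predict_violation(600.0, 250.0))
```

A low-degree polynomial keeps the model interpretable while still capturing the super-linear latency growth that linear extrapolation would miss.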
The evaluation utilizes the Google Microservices Demo application, load generation via Locust benchmarks, and monitoring with Prometheus and Grafana. The tests were conducted over extended periods with varying user loads.
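The measurement loop can be approximated with a self-contained sketch. This uses only the standard library rather than Locust, and a stubbed request in place of the real demo application, but it shows the same shape of evaluation: drive the target with concurrent workers and report throughput and failure rate.

```python
# Minimal stdlib stand-in for a load test: N concurrent workers issue
# requests and the run is summarized as RPS and failure rate.
import concurrent.futures
import time

def fake_request() -> bool:
    """Stand-in for an HTTP call against the deployed service."""
    time.sleep(0.001)  # simulated service latency
    return True        # success

def run_load(workers: int, requests_per_worker: int) -> dict:
    start = time.perf_counter()
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda _: fake_request(),
                                range(workers * requests_per_worker)))
    elapsed = time.perf_counter() - start
    successes = sum(results)
    return {
        "rps": len(results) / elapsed,
        "failure_rate": 1.0 - successes / len(results),
    }

stats = run_load(workers=8, requests_per_worker=25)
print(f"{stats['rps']:.0f} RPS, {stats['failure_rate']:.1%} failures")
```

In the actual evaluation, Locust plays the role of this loop and Prometheus/Grafana collect the per-service metrics that the summary dictionary stands in for here.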
Results: the promise of automation versus constraint considerations
The findings paint a nuanced picture of LLM-driven IaC automation with clear practical limitations. In the metric-based approach—where SLOs are predicted based on observed baseline metrics—the framework achieved up to 79% of the target throughput (476 RPS versus a target of 600 RPS) with no SLO violations after three LLM-driven code adjustments. This significantly outperformed the manual SLO approach, which achieved only 22% of the target throughput (131 RPS against a target of 600 RPS). However, this improved throughput came with measurable drawbacks, such as higher average response times and an increased failure rate compared to the baseline infrastructure.
A crucial finding concerns the behavior of LLMs: the quality of the generated IaC code is highly dependent on prompt design and information structuring. When specific service-level SLOs were provided, the model tended to overemphasize those services and neglect others. The system also appeared to struggle with multi-service reasoning when infrastructure components failed. Furthermore, the LLM's own resource estimates were overly simplistic: resource limits were simply summed without accounting for actual usage, leading to underprovisioned or inconsistently tuned infrastructure.
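The resource-sizing pitfall can be made concrete with a hedged sketch (the service names echo the demo application, but all numbers are hypothetical): naively summing declared limits ignores how much each service actually uses, whereas metric-based sizing with headroom tracks observed demand per service.

```python
# Per-service (declared CPU limit, observed p95 CPU usage), in millicores.
# Hypothetical figures for illustration only.
services = {
    "frontend":        (500, 120),
    "cartservice":     (300, 260),
    "checkoutservice": (200, 40),
}

# Naive estimate in the style criticized above: add up declared limits,
# regardless of what each service really consumes.
naive_total = sum(limit for limit, _ in services.values())

# Metric-based estimate: observed usage plus 30% headroom per service.
HEADROOM = 1.3
usage_based = {name: round(used * HEADROOM)
               for name, (_, used) in services.items()}

print(naive_total)   # 1000 millicores, blind to real demand
print(usage_based)
```

Note how the naive total hides that `cartservice` runs close to its limit while `checkoutservice` is heavily overprovisioned, which is exactly the inconsistent tuning the findings describe.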
Critical implications and limitations
The research shows that effective SLO-aware IaC automation requires more than just LLM-generated code. Metric-based SLO derivation proved superior to manual definitions because it is grounded in actual application behavior rather than expert estimates, reducing both SLO violations and hallucinated infrastructure assumptions. At the same time, fundamental limitations were exposed: LLMs are non-deterministic, meaning identical inputs can lead to different outputs, and context-window limits make it impractical to include large IaC files directly in prompts without careful abstraction.
The main conclusion concerns the interaction between constraint satisfaction and performance optimization. Tighter SLO constraints prevent overload but inherently degrade performance metrics such as throughput and latency compared to unconstrained baseline systems. The research concludes that successful IaC automation must balance three competing objectives: minimizing hallucinations through careful prompt engineering, basing SLOs on empirical data rather than assumptions, and iteratively verifying compliance through load testing rather than relying solely on predictive models.