Research question and methodology
The central research question is: "How can an IaC automation framework for dynamic cloud deployment be designed so that the generated IaC code complies with an application's SLO constraints?"
This question is particularly relevant because designing cloud infrastructure requires in-depth expertise, must satisfy deployment constraints defined by service-level objectives (SLOs), and, despite the emergence of Infrastructure-as-Code (IaC) tools, remains a bottleneck for cloud adoption. The research uses a mixed-methods approach that combines a literature review, theoretical framework design, and empirical evaluation. The evaluation was conducted through controlled experiments with load testing and metric collection in Kubernetes-based deployments.
Research design and techniques
The study introduces a dual-track framework that combines Large Language Models (LLMs) with statistical prediction methods. The approach uses GPT-4o to generate Terraform code and to iteratively modify it based on the specified SLO constraints. In addition, statistical models (primarily polynomial regression) are applied to predict whether the infrastructure will violate its SLOs before actual deployment. The framework supports two modes:
- Manual SLO definition, where developers specify CPU and memory constraints.
- Metric-based SLO creation, whereby the system derives SLOs from baseline performance data using polynomial regression models.
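The metric-based mode can be illustrated with a minimal sketch: fit a polynomial to baseline load/latency observations, then use the fitted curve to predict, before deployment, whether a target load would breach a latency SLO. All numbers, thresholds, and the `predict_violation` helper below are hypothetical illustrations, not the thesis implementation, which operates on real Prometheus metrics.

```python
# Illustrative sketch (hypothetical data): deriving an SLO violation
# predictor from baseline metrics with polynomial regression.
import numpy as np

# Hypothetical baseline observations: offered load (RPS) vs p95 latency (ms).
load = np.array([50, 100, 200, 300, 400, 500], dtype=float)
p95_ms = np.array([40, 45, 60, 90, 140, 230], dtype=float)

# Fit a degree-2 polynomial to capture the non-linear latency growth.
coeffs = np.polyfit(load, p95_ms, deg=2)
model = np.poly1d(coeffs)

def predict_violation(target_rps: float, latency_slo_ms: float) -> bool:
    """Predict, before deployment, whether a target load would breach the SLO."""
    return float(model(target_rps)) > latency_slo_ms

# Example: would pushing to 600 RPS exceed a 250 ms p95 latency SLO?
print(predict_violation(600.0, 250.0))
```

A low-degree polynomial keeps the model interpretable while still capturing the super-linear latency growth that linear extrapolation would miss.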
The evaluation utilizes the Google Microservices Demo application, load generation via Locust benchmarks, and monitoring with Prometheus and Grafana. The tests were conducted over extended periods with varying user loads.
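The measurement loop can be approximated with a self-contained sketch. This uses only the standard library rather than Locust, and a stubbed request in place of the real demo application, but it shows the same shape of evaluation: drive the target with concurrent workers and report throughput and failure rate.

```python
# Minimal stdlib stand-in for a load test: N concurrent workers issue
# requests and the run is summarized as RPS and failure rate.
import concurrent.futures
import time

def fake_request() -> bool:
    """Stand-in for an HTTP call against the deployed service."""
    time.sleep(0.001)  # simulated service latency
    return True        # success

def run_load(workers: int, requests_per_worker: int) -> dict:
    start = time.perf_counter()
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda _: fake_request(),
                                range(workers * requests_per_worker)))
    elapsed = time.perf_counter() - start
    successes = sum(results)
    return {
        "rps": len(results) / elapsed,
        "failure_rate": 1.0 - successes / len(results),
    }

stats = run_load(workers=8, requests_per_worker=25)
print(f"{stats['rps']:.0f} RPS, {stats['failure_rate']:.1%} failures")
```

In the actual evaluation, Locust plays the role of this loop and Prometheus/Grafana collect the per-service metrics that the summary dictionary stands in for here.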
Results: the promise of automation versus constraint considerations
The findings paint a nuanced picture of LLM-driven IaC automation with clear practical limitations. In the metric-based approach—where SLOs are predicted based on observed baseline metrics—the framework achieved up to 79% of the target throughput (476 RPS versus a target of 600 RPS) with no SLO violations after three LLM-driven code adjustments. This significantly outperformed the manual SLO approach, which achieved only 22% of the target throughput (131 RPS against a target of 600 RPS). However, this improved throughput came with measurable drawbacks, such as higher average response times and an increased failure rate compared to the baseline infrastructure.
A crucial finding concerns the behavior of LLMs: the quality of the generated IaC code is highly dependent on prompt design and information structuring. When specific service-level SLOs were provided, the model tended to overemphasize those services and neglect others. The system also appeared to struggle with multi-service reasoning when infrastructure components failed. Furthermore, the LLM's own resource estimates were overly simplistic: resource limits were simply summed without accounting for actual usage, leading to underprovisioned or inconsistently tuned infrastructure.
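The resource-sizing pitfall can be made concrete with a hedged sketch (the service names echo the demo application, but all numbers are hypothetical): naively summing declared limits ignores how much each service actually uses, whereas metric-based sizing with headroom tracks observed demand per service.

```python
# Per-service (declared CPU limit, observed p95 CPU usage), in millicores.
# Hypothetical figures for illustration only.
services = {
    "frontend":        (500, 120),
    "cartservice":     (300, 260),
    "checkoutservice": (200, 40),
}

# Naive estimate in the style criticized above: add up declared limits,
# regardless of what each service really consumes.
naive_total = sum(limit for limit, _ in services.values())

# Metric-based estimate: observed usage plus 30% headroom per service.
HEADROOM = 1.3
usage_based = {name: round(used * HEADROOM)
               for name, (_, used) in services.items()}

print(naive_total)   # 1000 millicores, blind to real demand
print(usage_based)
```

Note how the naive total hides that `cartservice` runs close to its limit while `checkoutservice` is heavily overprovisioned, which is exactly the inconsistent tuning the findings describe.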
Critical implications and limitations
The research shows that effective SLO-aware IaC automation requires more than just LLM-generated code. Metric-based SLO derivation proved superior to manual definitions because it is grounded in actual application behavior rather than expert estimates, reducing both SLO violations and hallucinated infrastructure assumptions. At the same time, fundamental limitations were exposed: LLMs are non-deterministic, meaning identical inputs can lead to different outputs, and context-window limits make it impractical to include large IaC files directly in prompts without careful abstraction.
The main conclusion concerns the interaction between constraint satisfaction and performance optimization. Tighter SLO constraints prevent overload but inherently degrade performance metrics such as throughput and latency compared to unconstrained baseline systems. The research concludes that successful IaC automation must balance three competing objectives: minimizing hallucinations through careful prompt engineering, basing SLOs on empirical data rather than assumptions, and iteratively verifying compliance through load testing rather than relying solely on predictive models.