Research Question and Methodology
The primary research question addressed is: “How can an Infrastructure as Code (IaC) automation framework for dynamic cloud deployment be designed to ensure that generated IaC code adheres to the SLO constraints of an application?” The question is critical because designing cloud infrastructure requires extensive expertise, must satisfy deployment constraints defined by Service Level Objectives (SLOs), and remains a bottleneck in cloud adoption despite the rise of IaC tools. The study employs a mixed-methods approach, combining a literature review, theoretical framework design, and empirical evaluation through controlled experiments with load testing and metric collection on Kubernetes-based deployments.
Research Design and Techniques
The study proposes a two-pronged framework leveraging both Large Language Models (LLMs) and statistical prediction methods. The approach uses GPT-4o to generate and iteratively adjust Terraform code based on specified SLO constraints, while incorporating statistical models—primarily polynomial regression—to predict whether infrastructure will violate specified SLOs before actual deployment. The framework operates in two modes: manual SLO definition, where developers specify CPU and memory constraints, and metric-based SLO creation, where the system derives SLOs from baseline performance data using polynomial regression models. Evaluation employs the Google Microservices Demo application with load generation via Locust benchmarks, monitoring through Prometheus and Grafana, and testing across variable user loads over extended periods.
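The statistical half of the framework can be illustrated with a minimal sketch. Assuming a handful of baseline measurements of CPU utilization at increasing request rates (all numbers here are hypothetical, not taken from the study), a polynomial regression fitted with NumPy can estimate whether a planned load would push utilization past an SLO threshold before anything is deployed:

```python
import numpy as np

# Hypothetical baseline measurements: request rate (RPS) vs. observed CPU utilization (%)
rps = np.array([50, 100, 200, 300, 400])
cpu = np.array([12.0, 21.0, 43.0, 61.0, 84.0])

# Fit a degree-2 polynomial, mirroring the framework's polynomial-regression predictor
coeffs = np.polyfit(rps, cpu, deg=2)
model = np.poly1d(coeffs)

def violates_slo(target_rps: float, cpu_slo: float = 80.0) -> bool:
    """Predict whether the target load would breach the CPU SLO."""
    return float(model(target_rps)) > cpu_slo

print(violates_slo(250))  # load within the measured range
print(violates_slo(450))  # extrapolation just past the measured maximum
```

The prediction is only as good as the baseline data, which is why the study still pairs it with applied load testing.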
Findings: Automation Promise vs. Constraint Trade-offs
The findings reveal a complex picture of LLM-enabled IaC automation with significant practical constraints. In the metric-based approach—which predicts SLOs from observed baseline metrics—the framework achieved up to 79% of target throughput (476 RPS vs. the 600 RPS target) without SLO violations after three LLM-guided code adjustments, significantly outperforming the manual SLO approach, which achieved only 22% of target throughput (131 RPS vs. 600 RPS). However, this improved throughput came at measurable costs: increased average response times and higher failure rates compared to the baseline infrastructure.
A critical finding emerged regarding LLM behavior: the quality of generated IaC code is heavily dependent on prompt design and information structuring. When specific service-level SLOs were provided, the model tended to over-emphasize those services while neglecting others, and the system struggled with multi-service reasoning when infrastructure components failed. Additionally, the LLM’s native predictions of resource usage proved overly simplistic, merely summing resource limits rather than accounting for actual usage patterns, leading to infrastructure that was unnecessarily under-provisioned or inconsistently tuned.
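The over-simplification the authors observed can be made concrete. In the sketch below (service names and numbers are invented for illustration), the naive estimate simply sums declared resource limits—as the LLM's native predictions did—while a usage-grounded estimate aggregates observed consumption, which is typically far lower:

```python
# Hypothetical per-service resource data in millicores (Kubernetes-style units).
services = {
    "frontend":    {"cpu_limit": 500, "cpu_p95_usage": 180},
    "cartservice": {"cpu_limit": 300, "cpu_p95_usage": 90},
    "checkout":    {"cpu_limit": 400, "cpu_p95_usage": 120},
}

# Naive estimate: just sum the declared limits, ignoring real usage patterns.
naive_total = sum(s["cpu_limit"] for s in services.values())

# Usage-grounded estimate: aggregate observed p95 consumption instead.
usage_total = sum(s["cpu_p95_usage"] for s in services.values())

print(naive_total)  # 1200 millicores
print(usage_total)  # 390 millicores
```

The gap between the two totals is exactly the kind of error that led the generated infrastructure to be inconsistently tuned.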
Critical Implications and Limitations
The research demonstrates that effective SLO-aware IaC automation requires more than LLM code generation alone. Metric-based SLO derivation proved superior to manual definitions because it grounds constraints in actual application behavior rather than expert guesses, reducing both violations and hallucinated infrastructure assumptions. However, the research also revealed fundamental limitations: LLMs remain non-deterministic, meaning identical inputs can produce different outputs, and context window constraints made it impractical to include large IaC files in prompts without careful abstraction.
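The metric-based derivation can be sketched in a few lines. Assuming baseline utilization samples collected via Prometheus (the values and the 20% headroom factor below are illustrative assumptions, not the study's actual parameters), an SLO threshold is taken from observed behavior rather than an expert guess:

```python
import numpy as np

# Hypothetical baseline samples of per-pod CPU utilization (%) from Prometheus.
baseline_cpu = np.array([22, 25, 31, 28, 35, 40, 33, 37, 29, 44, 38, 41])

# Derive the SLO threshold from the data: the 95th percentile of baseline
# utilization plus 20% headroom, grounding the constraint in actual behavior.
p95 = np.percentile(baseline_cpu, 95)
cpu_slo = p95 * 1.2

print(cpu_slo)
```

A threshold derived this way cannot hallucinate infrastructure assumptions, because it is computed directly from the application's measured behavior.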
The most significant finding concerns the interplay between constraint satisfaction and performance optimization: tighter SLO constraints, while preventing system overload, inherently reduce throughput and increase latency compared to unconstrained baseline systems. The study concludes that successful IaC automation must balance three competing objectives—minimizing hallucinations through careful prompt engineering, grounding SLOs in empirical data rather than assumptions, and iteratively verifying compliance through applied load testing rather than predictive models alone.
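The iterative verify-and-adjust cycle the study calls for can be sketched as a simple control loop. Everything below is a simulation: `generate_iac`, `deploy_and_load_test`, and the scaling heuristic are hypothetical stand-ins for the real GPT-4o calls, Terraform applies, and Locust runs.

```python
# Simulated iterative loop: generate IaC, load test, check SLO compliance, adjust.
# All functions are stand-ins for the real pipeline (LLM calls, Terraform, Locust).

def generate_iac(cpu_request: int) -> dict:
    """Stand-in for LLM-generated Terraform: just a resource spec."""
    return {"cpu_request_millicores": cpu_request}

def deploy_and_load_test(spec: dict, load_rps: int) -> float:
    """Stand-in for a Locust run: utilization scales with load over capacity."""
    return 100.0 * load_rps / spec["cpu_request_millicores"]

def tune_until_compliant(load_rps: int, cpu_slo: float = 80.0, max_rounds: int = 5) -> dict:
    spec = generate_iac(cpu_request=400)         # initial generation round
    for _ in range(max_rounds):
        utilization = deploy_and_load_test(spec, load_rps)
        if utilization <= cpu_slo:               # SLO satisfied: stop iterating
            return spec
        # Feedback step: request more capacity, as an LLM adjustment round would
        spec = generate_iac(spec["cpu_request_millicores"] * 2)
    return spec

final = tune_until_compliant(load_rps=600)
print(final)  # {'cpu_request_millicores': 800}
```

The point of the loop is the study's conclusion in miniature: compliance is confirmed by applied load testing on each iteration, not by a one-shot prediction.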