Adaptive Vertical Scaling with Granular Degradation Prediction & Contextualized Multi-Armed Bandits

Research question and methodology

The central research question is: "How can we minimize over-allocation of compute resources in cloud-native orchestration platforms without degrading performance?" The question is particularly relevant given that more than 65% of containers deployed on Kubernetes use less than half of their allocated CPU and memory. The research follows a quantitative methodology, analyzing metrics such as CPU utilization, memory utilization, disk I/O, network I/O, CPU throttling, out-of-memory (OOM) errors, and end-to-end latency. It explicitly accounts for multiple dimensions of resource contention beyond CPU and memory alone.
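To make the metric catalog concrete, the sketch below groups one observation window of these signals into a single record with a flat feature vector for downstream models. The field names and units are illustrative assumptions, not the study's actual telemetry schema.

```python
from dataclasses import dataclass, astuple

@dataclass
class TelemetrySnapshot:
    """One observation window of the metrics the study analyzes.

    Field names and units are illustrative, not the study's real schema.
    """
    cpu_util: float          # fraction of the CPU request in use
    mem_util: float          # fraction of the memory request in use
    disk_io_bytes: float     # block I/O volume during the window
    net_io_bytes: float      # network I/O volume during the window
    throttled_periods: int   # CPU throttling events (cgroup CFS stats)
    oom_events: int          # out-of-memory kills observed
    p99_latency_ms: float    # end-to-end tail latency of requests

    def to_features(self) -> list[float]:
        """Flatten into a numeric feature vector for a downstream model."""
        return [float(v) for v in astuple(self)]

snap = TelemetrySnapshot(0.42, 0.65, 1.2e6, 3.4e5, 2, 0, 180.0)
features = snap.to_features()
```

Keeping contention signals such as throttling and I/O volume alongside plain utilization is what lets a model see pressure that CPU and memory percentages alone would miss.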

Research design and techniques

The study introduces a two-phase predictive vertical scaling mechanism that combines kernel-level telemetry with online learning. In the first phase, holistic metrics are collected, ranging from kernel-level run-queue latency and block I/O stalls to container-level CPU and memory usage, and fed into a calibrated Random Forest classifier that outputs a performance degradation likelihood score. In the second phase, a contextual multi-armed bandit algorithm uses this degradation score, together with current utilization metrics, to learn over successive iterations how CPU and memory allocations should be adjusted, striking a balance between resource savings and performance risk.

The framework was implemented on Kubernetes and evaluated against state-of-the-art baselines, including the Kubernetes Vertical Pod Autoscaler (VPA) and SHOWAR.
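On Kubernetes, a bandit-chosen action ultimately becomes a patch to a container's resource requests. The sketch below builds such a patch body; the pod and container names are hypothetical, and whether the change applies in place (rather than via a restart) depends on the cluster supporting in-place pod resource resize.

```python
def build_resize_patch(container: str, cpu_millicores: int, mem_mib: int) -> dict:
    """Build a merge patch setting new CPU/memory requests for one container."""
    return {
        "spec": {
            "containers": [{
                "name": container,
                "resources": {
                    "requests": {
                        "cpu": f"{cpu_millicores}m",
                        "memory": f"{mem_mib}Mi",
                    },
                },
            }],
        },
    }

patch = build_resize_patch("frontend", cpu_millicores=250, mem_mib=512)
# With the official Python client, this could be applied roughly as:
#   kubernetes.client.CoreV1Api().patch_namespaced_pod(
#       name="frontend-abc123", namespace="default", body=patch)
# (hypothetical pod name; in-place application requires cluster support).
```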

Results: resource savings versus performance and stability considerations

The results paint a nuanced picture of the optimization trade-offs. The proposed mechanism eliminated all out-of-memory (OOM) errors in both benchmark applications tested (Google Cloud Online Boutique and Train Ticket), whereas the state-of-the-art baselines exhibited multiple OOM failures. In addition, the mechanism reduced CPU throttling by up to 3x compared to those baselines while allocating a similar amount of CPU.

These gains come with trade-offs, however. The approach exhibits higher end-to-end latency than simpler existing solutions, likely due to the overhead of frequent resource adjustments and kernel-level instrumentation.

Implications and future research

The research highlights how tightly resource efficiency and application stability are coupled in containerized environments. Although the proposed mechanism excels at preventing performance degradation thanks to its predictive capabilities, its monitoring and decision-making components carry non-trivial CPU and memory overhead of their own. The findings show that no single autoscaling configuration is universally optimal: operators must consciously choose where their systems sit on the trade-off curve between aggressive resource optimization and conservative performance protection.

Future research could focus on reducing the overhead of kernel-level instrumentation, optimizing the granularity of scaling actions, and integrating this approach with horizontal autoscaling to achieve more holistic resource management strategies.
