Introduction
Going production-ready
Although containers (and Kubernetes) are extensively deployed across European enterprise environments, questions persist among developers regarding secure, scalable implementation strategies. As such, we will show proven methodologies, critical security considerations and operational pitfalls that distinguish production-grade clusters. May it inspire you to start your Kubernetes journey while following best practices.
Choose an offering
The complexity of Kubernetes orchestration demands systematic approaches to cluster architecture, security policy enforcement and operational procedures. Production environments require careful consideration of etcd (the distributed key-value store) topology, certificate and external secret management, network isolation and monitoring strategies that extend beyond basic container orchestration. European organizations face additional compliance requirements that necessitate comprehensive audit trails, data sovereignty controls and security policy automation through tools like Kyverno. Alternatively, use a managed Kubernetes offering such as Amazon EKS or SUE Managed Kubernetes Services.
Secure by design
Traditional Linux distributions introduce unnecessary attack surface and operational complexity in Kubernetes environments. Purpose-built operating systems like Talos Linux can eliminate SSH access vulnerabilities, shell interfaces and configuration drift while maintaining immutable infrastructure principles. It removes common attack vectors that plague conventional server deployments.
Prevent common mistakes
Kubernetes security failures enable complete infrastructure compromise through container escape and privilege escalation chains. Recent assessments demonstrate that misconfigured RBAC policies and permissive security contexts create attack pathways from containerized workloads directly to host systems. Organizations deploying Kubernetes without comprehensive security hardening expose themselves to lateral movement attacks that can compromise entire infrastructure stacks.
Making the right decisions
We want to address fundamental architectural decisions, security hardening procedures, scalability and operational practices that determine long-term cluster viability. Therefore we will include specific configuration examples, policy implementations and monitoring strategies validated in production environments.
What is Kubernetes, technically, all about?
Going across the country
Kubernetes operates as a distributed container orchestration platform. It enables horizontal scaling across multiple physical nodes or virtual machines, even across physical locations and data centres. Kubernetes abstracts underlying infrastructure complexity through declarative configuration management and automated scheduling of workloads.
Control plane parts
The control plane encompasses critical components including the API server, etcd distributed key-value store, scheduler and controller manager. The kube-apiserver handles authentication, authorization and API request validation before persisting changes to etcd. It processes commands, controller requests and external tool integrations through RESTful API calls that maintain cluster state consistency. The scheduler analyzes resource requirements, node constraints and affinity rules to determine optimal pod placement across available worker nodes.
Cluster truth
etcd serves as the single source of truth for all cluster configuration and runtime information. It stores pod specifications, Service, Secret and ConfigMap definitions, as well as cluster membership data. The etcd cluster normally runs with an odd number of instances (e.g. three, one per control plane node) to enable quorum-based leader election and to maintain consistency if one node is drained for maintenance. All control plane components interact with etcd only through the API server, ensuring transactional consistency and enabling cluster state recovery through snapshots made with etcdctl.
Control mechanisms
The controller manager operates multiple specialized controllers that continuously monitor cluster state and implement corrective actions to maintain desired configurations. These controllers include deployment controllers managing ReplicaSets (of pods), service controllers configuring load balancing and namespace controllers providing isolation between user workloads. Each controller implements watch patterns against the API server, triggering when relevant resources change.
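For illustration, a minimal Deployment (the names and image below are hypothetical) that the deployment controller reconciles into a ReplicaSet of three pods could look like this:
```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend                                      # hypothetical name
  namespace: backend
spec:
  replicas: 3                                        # the controller keeps a ReplicaSet with 3 pods running
  selector:
    matchLabels:
      app: backend
  template:
    metadata:
      labels:
        app: backend
    spec:
      containers:
        - name: backend
          image: registry.example.com/backend:1.0.0  # placeholder image
          ports:
            - containerPort: 8080
```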
Keeping your kubelets healthy
Worker nodes execute the actual container workloads through three primary components: kubelet, kube-proxy and container runtime. The kubelet acts as the node agent responsible for pod lifecycle management, container health and resource reporting to the control plane. This component receives pod specifications from the API server and coordinates with the container runtime through the Container Runtime Interface (CRI) to create, start, stop and monitor containers. The kubelet implements health checks, resource limits and volume mounting while continuously reporting node and pod status back to the control plane.
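To illustrate the health checks, resource limits and volume mounting the kubelet enforces, a pod specification might declare the following (probe endpoints, thresholds and the image are assumptions):
```
apiVersion: v1
kind: Pod
metadata:
  name: backend-probe-demo                           # hypothetical name
spec:
  containers:
    - name: backend
      image: registry.example.com/backend:1.0.0      # placeholder image
      resources:
        requests:
          cpu: 250m
          memory: 256Mi
        limits:
          cpu: 500m
          memory: 512Mi
      livenessProbe:                                  # kubelet restarts the container when this fails
        httpGet:
          path: /healthz                              # assumed endpoint exposed by the application
          port: 8080
        initialDelaySeconds: 10
        periodSeconds: 10
      readinessProbe:                                 # kubelet marks the pod unready when this fails
        httpGet:
          path: /ready                                # assumed endpoint
          port: 8080
        periodSeconds: 5
      volumeMounts:
        - name: cache
          mountPath: /var/cache/app
  volumes:
    - name: cache
      emptyDir: {}
```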
Networking in the Kubernetes world
Container networking relies on the Container Network Interface (CNI) specification to provide pod-to-pod communication across cluster nodes. CNI plugins create network namespaces for pods, allocate IP addresses through IPAM plugins and configure network interfaces to enable container connectivity. Popular CNI implementations include Cilium for eBPF-based networking, Calico for policy-based routing and Flannel for simple overlay networks. The kube-proxy component implements service load balancing, distributing traffic across pod endpoints.
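A plain ClusterIP Service (hypothetical names) is enough to see this load balancing in action: traffic sent to the Service is distributed across all ready pods matching the selector.
```
apiVersion: v1
kind: Service
metadata:
  name: backend              # hypothetical name
  namespace: backend
spec:
  type: ClusterIP
  selector:
    app: backend             # endpoints are all ready pods carrying this label
  ports:
    - name: http
      protocol: TCP
      port: 80               # port exposed by the Service
      targetPort: 8080       # container port on the selected pods
```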
Units of compute
Pods represent the fundamental deployment unit in Kubernetes, encapsulating one or more tightly coupled containers that share network and storage resources. Each pod receives a unique cluster IP address and containers within the pod communicate through localhost interfaces. Pod specifications define container images, resource requirements, environment variables and volume mounts through declarative YAML manifests. Pods typically follow the single container pattern, though multi-container pods support sidecar patterns for logging, monitoring, or proxy functionality.
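As a sketch of the sidecar pattern mentioned above, a pod could combine an application container with a log-shipping sidecar sharing an emptyDir volume (image names are placeholders):
```
apiVersion: v1
kind: Pod
metadata:
  name: app-with-sidecar                              # hypothetical name
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0.0           # placeholder image
      volumeMounts:
        - name: logs
          mountPath: /var/log/app
    - name: log-shipper                               # sidecar reads what the app writes
      image: registry.example.com/log-shipper:1.0.0   # placeholder image
      volumeMounts:
        - name: logs
          mountPath: /var/log/app
          readOnly: true
  volumes:
    - name: logs
      emptyDir: {}                                    # scratch space shared between the containers
```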
Schedulers bring balance
The scheduler algorithm evaluates pod requirements against node capacity, considering CPU, memory, storage and custom resource constraints. Advanced scheduling features include node affinity rules, anti-affinity constraints, taints and tolerations for specialized workloads and topology spread constraints for fault domain distribution. Scheduling decisions balance resource utilization efficiency with application performance requirements and operational constraints.
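A sketch of how these scheduling features appear in a pod specification; the taint key, zone names and labels below are assumptions:
```
apiVersion: v1
kind: Pod
metadata:
  name: spread-demo                                   # hypothetical name
  labels:
    app: backend
spec:
  containers:
    - name: backend
      image: registry.example.com/backend:1.0.0       # placeholder image
  tolerations:
    - key: dedicated                                  # assumed taint key on specialized nodes
      operator: Equal
      value: gpu
      effect: NoSchedule
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values: ["eu-west-1a", "eu-west-1b"]  # assumed zone names
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone        # spread matching pods evenly across zones
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app: backend
```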
Storage provisioned however you like and need
Kubernetes implements storage abstraction through persistent volumes and persistent volume claims, separating storage provisioning from consumption. Storage classes define available storage types with specific performance characteristics, backup policies and provisioners. Dynamic volume provisioning automatically creates storage resources when applications request persistent volume claims, while static provisioning requires manual volume creation by cluster administrators. This architecture enables storage portability across different infrastructure providers while maintaining application-level storage abstractions.
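A minimal sketch of dynamic provisioning, assuming the AWS EBS CSI driver as provisioner (substitute the CSI driver available in your environment):
```
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd                     # hypothetical class name
provisioner: ebs.csi.aws.com         # assumes the AWS EBS CSI driver is installed
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
  namespace: backend
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 20Gi                  # triggers dynamic provisioning when a pod mounts the claim
```
With WaitForFirstConsumer, the volume is only created once a pod using the claim is scheduled, so it is provisioned in the correct fault domain.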
To sidecar or not to sidecar
Service mesh integration extends Kubernetes networking capabilities through sidecar proxy patterns that provide advanced traffic management, security policies and observability features. These solutions intercept pod-to-pod communication to implement mutual TLS authentication, circuit breaking, retry logic and distributed tracing without modifying application code. Some popular technologies to implement a service mesh in Kubernetes clusters are Linkerd, Istio and NGINX Service Mesh. This is in contrast to Cilium, which relies on eBPF to provide similar capabilities at the kernel level, without sidecar proxies.
Getting started
Manage complexity (or choose an offering)
Production-grade Kubernetes deployment requires systematic preparation encompassing infrastructure prerequisites, toolchain configuration, and security baseline establishment. Kubernetes clusters demand substantial computational resources and network infrastructure planning before initial deployment procedures. Organizations frequently underestimate the complexity involved in transitioning from development environments to production-ready clusters capable of handling enterprise workloads.
Minimum requirements
Infrastructure prerequisites establish the foundation for reliable cluster operations through proper hardware dimensioning and network architecture design. Control plane nodes require minimum specifications of 4 CPU cores, 8GB RAM, and high-performance storage for etcd operations, while worker nodes will have to scale to anticipated workload demands. Network requirements include dedicated subnets for pod networking, service load balancing, and cluster management traffic with appropriate firewall configurations permitting required port ranges. Inadequate network planning could create connectivity issues during cluster scaling operations and cross-node communication scenarios.
Choose your utilities
Essential toolchain installation begins with kubectl client configuration, container runtime selection, and cluster management utilities. The kubectl binary serves as the primary interface for cluster administration and requires proper authentication configuration through kubeconfig files or service account tokens. Container runtime selection influences cluster performance characteristics, with containerd providing superior resource efficiency compared to Docker for production deployments. Additional utilities including helm for package management, stern for tailing logs across multiple pods, and k9s for interactive cluster monitoring enhance operational capabilities during initial cluster setup phases.
Initialize your cluster
Cluster initialization approaches vary significantly based on infrastructure targets and operational requirements. Kubeadm provides the standard cluster bootstrapping procedures suitable for bare metal and virtual machine deployments. Cloud provider managed services including SUE Managed K8s Services, Amazon EKS, Google GKE, and Azure AKS abstract infrastructure complexity away. Self-managed approaches using infrastructure automation tools like Terraform and Ansible enable customized cluster configurations while requiring comprehensive operational expertise.
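As a sketch of the kubeadm route, a minimal configuration file might look like this (the Kubernetes version, endpoint and CIDRs are placeholders to adjust to your environment):
```
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
nodeRegistration:
  criSocket: unix:///run/containerd/containerd.sock     # containerd as container runtime
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.30.0                              # assumed version; pin to the release you validated
controlPlaneEndpoint: "k8s-api.example.internal:6443"   # placeholder load-balanced API endpoint
networking:
  podSubnet: 10.244.0.0/16                              # must match the CNI plugin configuration
  serviceSubnet: 10.96.0.0/12
```
It would be applied with `kubeadm init --config kubeadm-config.yaml`, after which additional nodes join the cluster via `kubeadm join`.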
Strong security foundations already in place?
Initial security configuration establishes baseline protection mechanisms before workload deployment begins. Role-based access control (RBAC) configuration defines permission boundaries for cluster operations, preventing unauthorized access to sensitive cluster resources and enforcing namespace isolation; a minimal RBAC example follows the network policy below. Network policy (try `kubectl explain netpol`) implementation restricts pod-to-pod communication patterns, creating microsegmentation boundaries that limit attack surface exposure in compromised container scenarios.
```
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: netpol-backend
  namespace: backend
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
    - Egress
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: postgres
      ports:
        - protocol: TCP
          port: 5432
    - to: # Do NOT forget this second `- to`, to mark the start of an ADDITIONAL rule
        - podSelector:
            matchLabels:
              app: mysql
      ports:
        - protocol: TCP
          port: 3306
```
The above can be interpreted in pseudo-code as: (destination pod has label app=postgres AND port is 5432) OR (destination pod has label app=mysql AND port is 3306)
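The RBAC side of this baseline can be sketched with a namespace-scoped Role and RoleBinding; the names and the ServiceAccount below are hypothetical:
```
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: backend-readonly            # hypothetical name
  namespace: backend
rules:
  - apiGroups: ["", "apps"]
    resources: ["pods", "configmaps", "deployments"]
    verbs: ["get", "list", "watch"] # read-only access within the namespace
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: backend-readonly
  namespace: backend
subjects:
  - kind: ServiceAccount
    name: ci-deployer               # hypothetical ServiceAccount
    namespace: backend
roleRef:
  kind: Role
  name: backend-readonly
  apiGroup: rbac.authorization.k8s.io
```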
Basic runtime tests
Cluster validation procedures verify proper installation and configuration before production workload deployment. Basic connectivity testing ensures control plane components communicate correctly with worker nodes through proper network routing and firewall configurations. Pod scheduling verification confirms that the scheduler algorithm correctly places workloads across available nodes while respecting resource constraints and affinity rules. Storage provisioning tests validate dynamic volume creation through configured storage classes and container storage interface implementations.
Keep your state without drift
Configuration management establishes consistent cluster state through declarative resource definitions and version control integration. Infrastructure as code approaches using tools like Terraform can manage cluster operations such as node provisioning. Kubernetes resource manifests stored in version control repositories enable reproducible deployments and configuration drift detection through GitOps methodologies. In addition, we recommend using Helm charts to provide templated (Go-like) resource definitions that support environment-specific customization while keeping Kubernetes resource definitions consistent.
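GitOps can be implemented with several tools; as one possible sketch, assuming Argo CD, an Application resource ties a Helm chart stored in Git to a target namespace (repository URL, paths and names are placeholders):
```
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: backend                                                # hypothetical application name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/platform/deployments.git  # placeholder repository
    targetRevision: main
    path: charts/backend                                       # Helm chart kept in version control
    helm:
      valueFiles:
        - values-production.yaml
  destination:
    server: https://kubernetes.default.svc
    namespace: backend
  syncPolicy:
    automated:
      prune: true                                              # remove resources deleted from Git
      selfHeal: true                                           # revert configuration drift on the cluster
```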
Everyone gets a dashboard
Monitoring and observability configuration enables cluster health assessment and troubleshooting capabilities from initial deployment. Prometheus or VictoriaMetrics installation provides metrics collection infrastructure for cluster components, node resources and application performance indicators. Grafana dashboard configuration visualizes cluster metrics through predefined templates that highlight critical performance indicators including CPU utilization, memory consumption, and storage capacity trends. Log aggregation through solutions like Fluentd or Fluent Bit centralizes container logs and cluster events for analysis and alerting purposes.
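Assuming the Prometheus Operator (e.g. via the kube-prometheus-stack chart) is used, scrape targets can be declared as ServiceMonitor resources; the labels, namespace and port name below are assumptions:
```
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: backend                      # hypothetical name
  namespace: monitoring
  labels:
    release: kube-prometheus-stack   # assumed label the Prometheus instance selects on
spec:
  namespaceSelector:
    matchNames:
      - backend
  selector:
    matchLabels:
      app: backend                   # selects the Service to scrape
  endpoints:
    - port: http                     # must match a named port on that Service
      path: /metrics
      interval: 30s
```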
You should really test your recovery procedures
Initial backup procedures protect against cluster state loss and enable disaster recovery capabilities. etcd backup automation ensures cluster configuration and resource definitions remain recoverable during control plane failures. Persistent volume backup strategies protect application data stored in dynamically provisioned storage resources. Disaster recovery procedures validate backup restoration processes and document recovery time objectives for critical workloads.
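One way to automate those etcd snapshots is a CronJob pinned to the control plane. This sketch assumes a kubeadm-style layout with etcd certificates under /etc/kubernetes/pki/etcd, an etcd image matching your cluster version and a hostPath backup target; it does not apply to managed or Talos-based control planes.
```
apiVersion: batch/v1
kind: CronJob
metadata:
  name: etcd-backup
  namespace: kube-system
spec:
  schedule: "0 2 * * *"                                # daily at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          hostNetwork: true
          nodeSelector:
            node-role.kubernetes.io/control-plane: ""
          tolerations:
            - key: node-role.kubernetes.io/control-plane
              operator: Exists
              effect: NoSchedule
          restartPolicy: OnFailure
          containers:
            - name: etcdctl
              image: registry.k8s.io/etcd:3.5.12-0     # assumed tag; match your cluster's etcd version
              command:
                - /bin/sh
                - -c
                - >
                  ETCDCTL_API=3 etcdctl
                  --endpoints=https://127.0.0.1:2379
                  --cacert=/etc/kubernetes/pki/etcd/ca.crt
                  --cert=/etc/kubernetes/pki/etcd/server.crt
                  --key=/etc/kubernetes/pki/etcd/server.key
                  snapshot save /backup/etcd-$(date +%Y%m%d%H%M).db
              volumeMounts:
                - name: etcd-certs
                  mountPath: /etc/kubernetes/pki/etcd
                  readOnly: true
                - name: backup
                  mountPath: /backup
          volumes:
            - name: etcd-certs
              hostPath:
                path: /etc/kubernetes/pki/etcd
            - name: backup
              hostPath:
                path: /var/backups/etcd                # assumed backup location; ship it off-node as well
                type: DirectoryOrCreate
```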
Host OS Best Practices
Choose your OS wisely
Purpose-built operating systems designed specifically for container orchestration provide superior security posture and operational efficiency compared to traditional Linux distributions in Kubernetes environments. Talos Linux represents the optimal host operating system architecture for production Kubernetes clusters through its immutable infrastructure design. The operating system boots directly into Kubernetes node functionality without unnecessary services, reducing the attack surface by approximately 80% compared to mainstream Linux distributions such as Ubuntu or CentOS.
Traditional Linux distributions introduce significant operational complexity through package dependency management, configuration drift, and persistent state modifications that compromise cluster reliability. Host hardening procedures on traditional distributions require extensive service disabling, firewall configuration, and security policy implementation that increases maintenance overhead.
Talos Linux implements declarative configuration management through machine configuration files that define complete system state without runtime modifications. The API-driven architecture enables GitOps workflows for infrastructure management, ensuring consistent deployment patterns across cluster nodes while preventing configuration drift through immutable root filesystem design. Automatic security updates occur through complete image replacement rather than package-level patching, eliminating partial update failures and security policy bypasses.
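A heavily abridged sketch of such a machine configuration (Talos v1alpha1 format; hostname, endpoint, disk and subnets are placeholders, and complete configurations are normally generated with `talosctl gen config`):
```
version: v1alpha1
machine:
  type: controlplane                                    # or "worker" for worker nodes
  network:
    hostname: cp-01                                     # placeholder hostname
  install:
    disk: /dev/sda                                      # target disk for the immutable image
cluster:
  clusterName: production                               # placeholder name
  controlPlane:
    endpoint: https://k8s-api.example.internal:6443     # placeholder API endpoint
  network:
    podSubnets:
      - 10.244.0.0/16
    serviceSubnets:
      - 10.96.0.0/12
```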
Talos production deployments benefit from simplified disaster recovery procedures through stateless node architecture where complete host replacement occurs through automated provisioning rather than configuration restoration. Security compliance frameworks including NIST 800-190 and CIS Kubernetes Benchmark requirements align naturally with Talos architecture principles, reducing audit complexity and remediation requirements.
Below is a code example showing three different ways to manage secrets within one configuration file (note the Helm chart templating language, based on Go templating plus the Sprig template library).
Certificate.yaml
```
{{- range .Values.ingress.tls }}
{{- if eq .provider "ckms" }}
# Option 1: automatic certificate and key from a key management store (rotated every 3 months)
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: {{ .secretName }}
  namespace: {{ $.Release.Namespace }}
spec:
  secretName: {{ .secretName }}
  renewBefore: {{ $.Values.certificate.renewBefore }}
  subject:
    organizations:
      - {{ $.Values.certificate.organization }}
    countries:
      - {{ $.Values.certificate.country }}
    organizationalUnits:
      - {{ $.Values.certificate.organizationalUnit }}
  {{ with index .hosts 0 }}
  commonName: {{ . }}
  {{- end }}
  isCA: false
  privateKey:
    algorithm: RSA
    encoding: PKCS1
    size: 4096
    rotationPolicy: Always
  usages:
    - server auth
  dnsNames:
    {{- range .hosts }}
    - {{ . }}
    {{- end }}
  issuerRef:
    name: ckms--clusterissuer
    kind: ClusterIssuer
    group: cert-manager.io
{{- else if eq .provider "vault" }}
# Option 2: certificate and key from Hashicorp Vault key value store
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: {{ .secretName }}
  namespace: {{ $.Release.Namespace }}
spec:
  refreshInterval: "15s"
  secretStoreRef:
    # kubectl get css
    name: {{ $.Values.vaultBackend }}
    kind: ClusterSecretStore
  target:
    name: {{ .secretName }}
    template:
      type: kubernetes.io/tls
  data:
    - secretKey: tls.key
      remoteRef:
        key: otap/{{ $.Release.Namespace }}/{{ $.Release.Name }}/certificates/{{ .secretName }}
        property: key
    - secretKey: tls.crt
      remoteRef:
        key: otap/k8s-tst/{{ $.Release.Namespace }}/{{ $.Release.Name }}/certificates/{{ .secretName }}
        property: certificate
{{- else if eq .provider "file" }}
# Option 3 (for debug): certificate and key from data in values (files)
---
apiVersion: v1
kind: Secret
metadata:
  name: {{ .secretName }}
  namespace: {{ $.Release.Namespace }}
type: kubernetes.io/tls
data:
  tls.key: |
    {{ .key | b64enc }}
  tls.crt: |
    {{ .certificate | b64enc }}
{{- end }}
{{- end }}
```
With the relevant values.yaml section of the Helm chart:
```
ingress:
  enabled: true
  className: "nginx" # Already outdated.. Consider Gateway API from the SIG-Network project
  labels: {}
  annotations: {}
  # depending on annotations: https://cert-manager.io/docs/usage/ingress/#optional-configuration
  # kubernetes.io/tls-acme: "true"
  hosts:
    - host: fqdn
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: app-certificate
      provider: ckms
      hosts:
        - fqdn
```
Need additional Kubernetes expertise?
Ready to deploy Kubernetes at scale, securely and efficiently? We are SUE, your trusted partner in building resilient infrastructure. Let’s turn complexity into capability. Contact us today to start your journey.