Best practices for setting up a Kubernetes cluster

Introduction

Ready for Production
Although containers (and Kubernetes) are widely used in European enterprise environments, developers still have questions about secure and scalable implementation strategies. In this blog, we highlight proven methodologies, critical security considerations, and operational pitfalls that distinguish an experimental setup from a production-grade cluster. We hope this inspires you to begin your Kubernetes journey, guided by best practices.

Choose an offering
The complexity of Kubernetes orchestration requires a systematic approach to cluster architecture, security policy enforcement, and operational procedures. Production environments demand careful decisions about the topology of etcd (the distributed key-value store), certificate and external secret management, network isolation, and monitoring strategies that go beyond basic container orchestration. European organizations also face additional compliance requirements, such as comprehensive audit trails, data sovereignty, and automated security policy enforcement via tools like Kyverno. An alternative is to use a managed Kubernetes offering, such as Amazon EKS or SUE Managed Kubernetes Services.

Secure by design
Traditional Linux distributions introduce unnecessary attack surfaces and operational complexity into Kubernetes environments. Purpose-built operating systems such as Talos Linux eliminate SSH access, shell interfaces, and configuration drift, while maintaining immutable infrastructure principles. This eliminates common attack vectors found in conventional server deployments.

Avoid Common Mistakes in Kubernetes
Kubernetes security vulnerabilities that are exploited via container escape and privilege escalation can lead to complete takeover of the infrastructure. Recent analyses show that misconfigured RBAC policies and overly broad security contexts create attack paths from container workloads to the host system. Organizations that deploy Kubernetes without thorough security hardening are vulnerable to lateral movement, which puts entire infrastructure stacks at risk.
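
To make "overly broad security contexts" concrete, here is a minimal sketch of the opposite: a pod that runs with a deliberately narrow security context. The pod name, namespace, user ID, and image are hypothetical placeholders, and your workloads may need to relax individual settings.

```yaml
# Sketch of a pod with a restrictive security context.
# Name, namespace, UID, and image are hypothetical placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: hardened-example
  namespace: backend
spec:
  automountServiceAccountToken: false    # no API credentials unless the app needs them
  securityContext:
    runAsNonRoot: true                   # refuse to start containers that run as UID 0
    runAsUser: 10001
    seccompProfile:
      type: RuntimeDefault               # apply the container runtime's default syscall filter
  containers:
    - name: app
      image: registry.example.com/app:1.0.0
      securityContext:
        allowPrivilegeEscalation: false  # block setuid-style privilege gains
        readOnlyRootFilesystem: true     # container cannot modify its own image contents
        capabilities:
          drop: ["ALL"]                  # start from zero Linux capabilities
```

Starting from settings like these and loosening them only where an application demonstrably needs it tends to narrow the path from a compromised container to the host.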

Making the Right Decisions
We focus on fundamental architecture-driven decisions, security hardening procedures, scalability, and operational practices that determine the long-term viability of your cluster. That is why we include concrete configuration examples, policy implementations, and monitoring strategies that have been validated in production environments.

What exactly is Kubernetes—from a technical perspective?


Across Multiple Locations
Kubernetes is a distributed container orchestration platform that can scale horizontally across multiple physical nodes or virtual machines, even across different physical locations and data centers. It abstracts the underlying infrastructure complexity through declarative configuration management and automated workload scheduling.


Control plane components
The control plane consists of critical components such as the API server, the distributed etcd key-value store, the scheduler, and the controller manager. The kube-apiserver handles authentication, authorization, and validation of API requests before changes are stored in etcd. The scheduler analyzes resource requirements, node constraints, and affinity rules to optimally place pods across available worker nodes.

The truth of the cluster
Etcd serves as the single source of truth for all cluster configuration and runtime information. It stores pod specs, Kubernetes services, secrets, and ConfigMaps, as well as cluster membership data. Etcd typically runs in an odd number of instances (for example, 3) to enable quorum-based leader election and maintain consistency during maintenance. All control plane components communicate with etcd via the API server, which ensures transactional consistency and enables recovery via etcdctl snapshots.

Control mechanisms
The controller manager contains multiple specialized controllers that continuously monitor the cluster state and take corrective actions to maintain the desired configuration. Examples include deployment controllers for replica sets, service controllers for load balancing, and namespace controllers for isolation. Controllers operate via watch mechanisms on the API server and respond to relevant changes.

Keep your kubelets healthy
Worker nodes run container workloads through three main components: kubelet, kube-proxy, and the container runtime. The kubelet manages the pod lifecycle, container health, and resource reporting to the control plane. Through the Container Runtime Interface (CRI), it starts, stops, and monitors containers and reports status information back.


Container networking uses the Container Network Interface (CNI) to enable pod-to-pod communication across nodes. CNI plugins create network namespaces, assign IP addresses, and configure network interfaces. Popular implementations include Cilium (eBPF-based), Calico (policy-based routing), and Flannel (simple overlay networks). kube-proxy handles service load balancing across pod endpoints.

Compute units
Pods are the fundamental deployment unit in Kubernetes and contain one or more closely collaborating containers that share a network namespace and storage volumes. Each pod is assigned its own IP address within the cluster. In practice, the single-container pattern is most common, with multi-container pods reserved for sidecar patterns such as logging, monitoring, or proxies.
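
As an illustration of the sidecar pattern, the sketch below pairs a web server with a log-tailing container. Both containers mount the same emptyDir volume, and because they run in one pod they also share a network namespace. The pod name and image tags are our own choices for the example, not a recommendation.

```yaml
# Sketch of a two-container pod: an application plus a logging sidecar.
# Pod name and image tags are assumptions made for this example.
apiVersion: v1
kind: Pod
metadata:
  name: web-with-log-sidecar
spec:
  volumes:
    - name: logs
      emptyDir: {}                 # scratch volume that lives as long as the pod
  containers:
    - name: web
      image: nginx:1.27
      volumeMounts:
        - name: logs
          mountPath: /var/log/nginx   # nginx writes its log files here
    - name: log-tailer
      image: busybox:1.36
      # Stream the access log to stdout so a log collector can pick it up.
      command: ["sh", "-c", "tail -F /var/log/nginx/access.log"]
      volumeMounts:
        - name: logs
          mountPath: /var/log/nginx
          readOnly: true
```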

Schedulers ensure balance
The scheduler evaluates pod requirements against node capacity (CPU, memory, storage, and custom resources). Advanced features include node affinity, anti-affinity, taints and tolerations, and topology spread constraints. The goal is to balance efficient resource utilization, performance, and operational constraints.
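
The scheduling features named above can be combined in a single pod template. The following Deployment is a sketch under assumptions: the `dedicated=api` taint, the image, and the resource figures are hypothetical, and the zone label only has an effect if your nodes carry `topology.kubernetes.io/zone`.

```yaml
# Sketch: resource requests, node affinity, a toleration, and topology spread.
# Taint key/value, image, and sizes are hypothetical.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      affinity:
        nodeAffinity:
          # Hard requirement: only schedule onto amd64 nodes.
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: kubernetes.io/arch
                    operator: In
                    values: ["amd64"]
      tolerations:
        # Allow scheduling onto nodes tainted dedicated=api:NoSchedule.
        - key: dedicated
          operator: Equal
          value: api
          effect: NoSchedule
      topologySpreadConstraints:
        # Prefer an even spread of replicas across zones.
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: api
      containers:
        - name: api
          image: registry.example.com/api:1.0.0
          resources:
            requests:          # what the scheduler reserves on the node
              cpu: 250m
              memory: 256Mi
            limits:
              memory: 256Mi
```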

Storage tailored to your needs
Kubernetes abstracts storage through persistent volumes and persistent volume claims. Storage classes define available storage types with specific performance and provisioning characteristics. Dynamic provisioning automatically creates volumes when applications request them. This ensures portability across different infrastructure providers.
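
A minimal sketch of dynamic provisioning looks like this. The provisioner name `csi.example.com` is a placeholder for whichever CSI driver your platform provides, and the class name, namespace, and size are likewise assumptions.

```yaml
# Sketch: a storage class and a claim that triggers dynamic provisioning.
# provisioner is a placeholder; use the name of your installed CSI driver.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: csi.example.com
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer   # provision only once a pod is scheduled
allowVolumeExpansion: true
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
  namespace: backend
spec:
  storageClassName: fast-ssd
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
```

The application only references the claim by name, which is what keeps the manifest portable between providers: moving to another platform means swapping the storage class, not the workload definition.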

Sidecar or not?
Service mesh solutions extend networking capabilities via sidecar proxies for traffic management, security, and observability. They offer features such as mTLS, retries, circuit breaking, and tracing without requiring changes to application code. Well-known solutions include Linkerd, Istio, and NGINX Service Mesh. Cilium takes an alternative approach using eBPF at the kernel level, without sidecar patterns.

Getting started

Managing Complexity (or Choose an Offering)
A production-grade Kubernetes deployment requires systematic preparation, including infrastructure requirements, toolchain configuration, and establishing a solid security baseline. Kubernetes clusters require significant computing power and careful network planning before the first deployment takes place. Organizations often underestimate the complexity of transitioning from development environments to production-ready clusters suitable for enterprise workloads.


Minimum Requirements
Infrastructure requirements form the foundation for reliable cluster operations and start with proper hardware sizing and network architecture. For production use, plan for control plane nodes with at least 4 CPU cores, 8 GB of RAM, and high-performance storage for etcd operations. Size worker nodes according to the expected workload. Network requirements include dedicated subnets for pod networking, service load balancing, and cluster management traffic, as well as firewall rules that open the required ports. Inadequate network planning can lead to connectivity problems during cluster scaling and cross-node communication.

Choose your tools
The essential toolchain starts with configuring the kubectl client, choosing a container runtime, and selecting additional cluster management utilities. The kubectl binary is the primary interface for cluster management and requires proper authentication via kubeconfig files or service account tokens.
The choice of container runtime affects the cluster's performance. In production environments, containerd typically offers better resource efficiency than Docker. Additional tools such as Helm for package management, stern for tailing logs across multiple pods, and k9s for interactive cluster monitoring enhance operational capabilities during the initial setup.

Initialize your cluster
The way you initialize a cluster varies greatly depending on your infrastructure goals and operational requirements. Kubeadm provides a standard bootstrap procedure suitable for bare-metal and virtual machine deployments. Managed cloud services such as SUE Managed K8s Services, Amazon EKS, Google GKE, and Azure AKS abstract away much of the infrastructure complexity.
Self-managed approaches using tools like Terraform and Ansible enable custom configurations but require in-depth operational expertise.
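
For the kubeadm route, the bootstrap settings can be kept in a configuration file instead of command-line flags. The sketch below assumes a Kubernetes 1.30 cluster, a hypothetical load-balanced API endpoint, and commonly used pod and service ranges. Newer kubeadm releases may expect a later config API version than `v1beta3`.

```yaml
# kubeadm-config.yaml (sketch; endpoint, version, and CIDRs are assumptions)
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.30.0
# Stable address (DNS name or load balancer) in front of all control plane nodes.
controlPlaneEndpoint: "k8s-api.example.internal:6443"
networking:
  podSubnet: 10.244.0.0/16      # must match the range your CNI plugin is configured for
  serviceSubnet: 10.96.0.0/12
etcd:
  local:
    dataDir: /var/lib/etcd      # place this on fast, dedicated storage
```

The first control plane node would then be initialized with `kubeadm init --config kubeadm-config.yaml --upload-certs`.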

Are strong security foundations already in place?
The initial security configuration establishes basic protection before workloads are deployed. Role-based access control (RBAC) defines permission boundaries for cluster operations and prevents unauthorized access to sensitive resources and namespaces. NetworkPolicy implementations (see, for example, ‘kubectl explain netpol’) restrict pod-to-pod communication and create microsegmentation, significantly reducing the attack surface in the event of compromised containers.

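apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: netpol-backend
  namespace: backend
spec:
  # Applies to every pod labeled app=backend in the backend namespace.
  podSelector:
    matchLabels:
      app: backend
  # Only egress is restricted; this policy does not affect ingress.
  policyTypes:
    - Egress
  egress:
    # Rule 1: allow TCP 5432 to pods labeled app=postgres in the same namespace.
    - to:
        - podSelector:
            matchLabels:
              app: postgres
      ports:
        - protocol: TCP
          port: 5432
    # Rule 2: allow TCP 3306 to pods labeled app=mysql in the same namespace.
    - to:
        - podSelector:
            matchLabels:
              app: mysql
      ports:
        - protocol: TCP
          port: 3306
  # All other egress from the selected pods, including DNS lookups,
  # is denied unless another policy allows it.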
                                                        


The above logic can be interpreted in pseudo-code as: (destination pod has label app=postgres AND port is 5432) OR (destination pod has label app=mysql AND port is 3306)
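
The RBAC permission boundary mentioned at the start of this section can be sketched in the same way. This example grants read-only access to pods and their logs in the `backend` namespace. The role name and the `backend-developers` group are hypothetical and depend on how your identity provider maps users to groups.

```yaml
# Sketch: namespace-scoped, read-only access to pods and pod logs.
# Role name and group name are hypothetical.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: backend
rules:
  - apiGroups: [""]                  # "" is the core API group
    resources: ["pods", "pods/log"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-reader-binding
  namespace: backend
subjects:
  - kind: Group
    name: backend-developers
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```

A quick way to check the result is `kubectl auth can-i list pods -n backend --as=jane --as-group=backend-developers`, where `jane` is again a placeholder user.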

Basic runtime tests
Validation procedures verify that the cluster is correctly installed and configured before production workloads are deployed. Basic connectivity tests verify that control plane components communicate correctly with worker nodes via routing and firewall rules. Pod scheduling tests confirm that the scheduler places workloads correctly, taking into account resource constraints and affinity rules. Storage provisioning tests verify that dynamic volumes are created correctly using the configured storage classes and CSI implementations.
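
One of these checks can be expressed as a short-lived Job. The sketch below verifies that a pod can be scheduled with resource requests and that cluster DNS resolves the API service. It assumes the default cluster domain `cluster.local`, and the job name and image tag are our own choices.

```yaml
# Sketch: smoke test for scheduling and cluster DNS.
# Assumes the default cluster domain cluster.local.
apiVersion: batch/v1
kind: Job
metadata:
  name: smoke-dns
  namespace: default
spec:
  backoffLimit: 0                  # fail fast instead of retrying
  ttlSecondsAfterFinished: 600     # clean up the Job ten minutes after it ends
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: check
          image: busybox:1.36
          # Use the fully qualified name so the result does not depend on search domains.
          command: ["nslookup", "kubernetes.default.svc.cluster.local"]
          resources:
            requests:
              cpu: 50m
              memory: 32Mi
```

After applying it, `kubectl wait --for=condition=complete job/smoke-dns --timeout=60s` returns successfully only if the lookup worked.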

Maintain state without drift
Configuration management ensures a consistent cluster state through declarative definitions and integration with version control. Infrastructure as Code using tools such as Terraform can manage cluster operations, including node provisioning.
Placing Kubernetes resource manifests under version control enables reproducible deployments and supports drift detection via GitOps methodologies. Additionally, we recommend using Helm charts for template-based (Go-style) resource definitions, which allow you to apply environment-specific configurations while maintaining consistency.
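
Environment-specific configuration with Helm usually comes down to a small override file per environment. The keys below are hypothetical and assume a chart laid out in the style that `helm create` scaffolds. Your chart's values may be named differently.

```yaml
# values-production.yaml (hypothetical override file)
# Only the settings that differ from the chart defaults are listed.
replicaCount: 3
image:
  tag: "1.4.2"
resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    memory: 512Mi
ingress:
  enabled: true
  className: "nginx"
```

It would be layered on top of the defaults with `helm upgrade --install my-app ./chart -f values.yaml -f values-production.yaml`, where later files take precedence over earlier ones.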

A Dashboard for Everyone
Monitoring and observability are essential from day one. Installations of Prometheus or VictoriaMetrics provide metrics for cluster components, node resources, and application performance. Grafana dashboards visualize this data using predefined templates that display critical indicators, such as CPU usage, memory usage, and storage capacity.
Log aggregation with solutions such as Fluentd or Fluent Bit centralizes container logs and cluster events for analysis and alerting.
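
Dashboards are most useful when paired with alerts on the same indicators. The rule file below is a sketch in the Prometheus alerting rule format. It assumes node_exporter is deployed on every node, and the thresholds and durations are illustrative rather than tuned values.

```yaml
# Sketch: Prometheus alerting rules for node disk and memory headroom.
# Assumes node_exporter metrics; thresholds are illustrative.
groups:
  - name: node-capacity
    rules:
      - alert: NodeFilesystemAlmostFull
        # Fraction of free space per filesystem, ignoring tmpfs and overlay mounts.
        expr: |
          node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"}
            / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"} < 0.10
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Less than 10% disk space left on {{ $labels.instance }} ({{ $labels.mountpoint }})"
      - alert: NodeMemoryLow
        expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes < 0.10
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Less than 10% memory available on {{ $labels.instance }}"
```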

Thoroughly test your recovery procedures
Backup strategies protect against loss of cluster state and support disaster recovery. Automated etcd backups ensure that cluster configurations and resource definitions remain recoverable in the event of control plane failures.
Backups of persistent volumes protect application data. Disaster recovery tests validate restore procedures and establish recovery time objectives for critical workloads.
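
As a sketch of an automated etcd backup, the CronJob below takes a nightly snapshot on a control plane node. It assumes a kubeadm-style cluster (certificates under `/etc/kubernetes/pki/etcd`, etcd listening on `127.0.0.1:2379`). The image tag, schedule, and backup path are our assumptions and should be matched to your cluster. On Talos Linux, snapshots are taken through the Talos API instead (`talosctl etcd snapshot`).

```yaml
# Sketch: nightly etcd snapshot on a kubeadm-style control plane node.
# Image tag, schedule, and host paths are assumptions.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: etcd-snapshot
  namespace: kube-system
spec:
  schedule: "0 2 * * *"            # every night at 02:00
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          hostNetwork: true        # reach etcd on the node's loopback address
          nodeSelector:
            node-role.kubernetes.io/control-plane: ""
          tolerations:
            - key: node-role.kubernetes.io/control-plane
              operator: Exists
              effect: NoSchedule
          containers:
            - name: snapshot
              image: registry.k8s.io/etcd:3.5.15-0   # match your cluster's etcd version
              command:
                - etcdctl
                - --endpoints=https://127.0.0.1:2379
                - --cacert=/etc/kubernetes/pki/etcd/ca.crt
                - --cert=/etc/kubernetes/pki/etcd/server.crt
                - --key=/etc/kubernetes/pki/etcd/server.key
                - snapshot
                - save
                - /backup/etcd-snapshot.db
              volumeMounts:
                - name: etcd-certs
                  mountPath: /etc/kubernetes/pki/etcd
                  readOnly: true
                - name: backup
                  mountPath: /backup
          volumes:
            - name: etcd-certs
              hostPath:
                path: /etc/kubernetes/pki/etcd
            - name: backup
              hostPath:
                path: /var/backups/etcd
                type: DirectoryOrCreate
```

The sketch uses a fixed file name for simplicity. In practice the snapshot should be copied off the node and rotated, and a restore should be rehearsed as part of the disaster recovery tests.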

Host OS Best Practices

Choose Your OS Wisely
Purpose-built operating systems for container orchestration offer better security posture and operational efficiency than traditional Linux distributions in Kubernetes environments. Talos Linux is an optimal host OS for production Kubernetes clusters thanks to its immutable infrastructure design. The operating system boots directly into Kubernetes node functionality without unnecessary services, reducing the attack surface by approximately 80% compared to common distributions such as Ubuntu or CentOS.

Traditional Linux distributions introduce additional complexity due to package dependency management, configuration drift, and persistent state changes that affect the reliability of the cluster. Host hardening requires extensive service disabling, firewall configurations, and security policies, which increases the maintenance burden.

Talos Linux uses declarative configuration via machine configuration files that define the entire system state without runtime adjustments. The API-driven architecture supports GitOps workflows for infrastructure management and prevents configuration drift through an immutable root filesystem. Security updates are performed via full image replacements instead of package-level patching, preventing partial updates and policy bypasses.
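
To give an impression of what such a declarative definition looks like, here is a small machine configuration patch. It is a sketch, not a complete machine config: the install disk and sysctl value are assumptions, and the cluster settings reflect a setup in which a CNI such as Cilium is installed separately and replaces kube-proxy.

```yaml
# patch.yaml (sketch of a Talos machine configuration patch)
# Disk, sysctl value, and CNI choice are assumptions for this example.
machine:
  install:
    disk: /dev/sda               # target disk for the Talos installation
  sysctls:
    net.core.somaxconn: "65535"  # kernel settings are declared, not set by hand
cluster:
  network:
    cni:
      name: none                 # do not install the default CNI
  proxy:
    disabled: true               # skip kube-proxy when the CNI replaces it
```

A patch like this can be applied while generating the base configuration, for example with `talosctl gen config my-cluster https://k8s-api.example.internal:6443 --config-patch @patch.yaml`, where the cluster name and endpoint are placeholders. Because the file lives in version control, every node change becomes a reviewable commit.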

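In production environments, Talos benefits from simplified disaster recovery thanks to its stateless node architecture: hosts are completely replaced via automated provisioning instead of manual recovery. Security and compliance frameworks such as NIST 800-190 and the CIS Kubernetes Benchmark align well with this architecture, reducing audit complexity and remediation efforts.

The Helm template below illustrates the certificate and external secret management mentioned in the introduction. Depending on the provider set for each TLS entry, it renders a cert-manager Certificate (ckms), an ExternalSecret that reads from a ClusterSecretStore (vault), or a plain kubernetes.io/tls Secret (file). The values that drive it follow the template.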

{{- range .Values.ingress.tls }}
{{- if eq .provider "ckms" }}
---
# Provider "ckms": cert-manager issues and renews the certificate via a ClusterIssuer.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: {{ .secretName }}
  namespace: {{ $.Release.Namespace }}
spec:
  secretName: {{ .secretName }}
  renewBefore: {{ $.Values.certificate.renewBefore }}
  subject:
    organizations:
      - {{ $.Values.certificate.organization }}
    countries:
      - {{ $.Values.certificate.country }}
    organizationalUnits:
      - {{ $.Values.certificate.organizationalUnit }}
  {{- with index .hosts 0 }}
  commonName: {{ . }}
  {{- end }}
  isCA: false
  privateKey:
    algorithm: RSA
    encoding: PKCS1
    size: 4096
    rotationPolicy: Always
  usages:
    - server auth
  dnsNames:
    {{- range .hosts }}
    - {{ . }}
    {{- end }}
  issuerRef:
    name: ckms--clusterissuer
    kind: ClusterIssuer
    group: cert-manager.io

{{- else if eq .provider "vault" }}
---
# Provider "vault": External Secrets syncs key and certificate from the secret store.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: {{ .secretName }}
  namespace: {{ $.Release.Namespace }}
spec:
  refreshInterval: "15s"
  secretStoreRef: # kubectl get css
    name: {{ $.Values.vaultBackend }}
    kind: ClusterSecretStore
  target:
    name: {{ .secretName }}
    template:
      type: kubernetes.io/tls
  data:
    - secretKey: tls.key
      remoteRef:
        key: otap/{{ $.Release.Namespace }}/{{ $.Release.Name }}/certificates/{{ .secretName }}
        property: key
    - secretKey: tls.crt
      remoteRef:
        key: otap/k8s-tst/{{ $.Release.Namespace }}/{{ $.Release.Name }}/certificates/{{ .secretName }}
        property: certificate

{{- else if eq .provider "file" }}
---
# Provider "file": key and certificate are passed in as chart values.
apiVersion: v1
kind: Secret
metadata:
  name: {{ .secretName }}
  namespace: {{ $.Release.Namespace }}
type: kubernetes.io/tls
data:
  tls.key: {{ .key | b64enc }}
  tls.crt: {{ .certificate | b64enc }}
{{- end }}
{{- end }}

# Helm values consumed by the template above
ingress:
  enabled: true
  className: "nginx"
  labels: {}
  annotations: {}
  hosts:
    - host: fqdn
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: app-certificate
      provider: ckms
      hosts:
        - fqdn
                                                        


Need additional Kubernetes expertise?

Ready to deploy Kubernetes securely and efficiently at scale? We are SUE, your trusted partner in building resilient infrastructure. Let's turn complexity into opportunity. Contact us today and start your Kubernetes journey.


Ready to deploy Kubernetes at scale?

Stefan Behlen

Let's chat!

