Introduction
Going production-ready
Although containers (and Kubernetes) are extensively deployed across European enterprise environments, questions persist among developers regarding secure, scalable implementation strategies. As such, we will show proven methodologies, critical security considerations and operational pitfalls that distinguish production-grade clusters. May it inspire you to start your Kubernetes journey while following best practices.
Choose an offering
The complexity of Kubernetes orchestration demands systematic approaches to cluster architecture, security policy enforcement and operational procedures. Production environments require careful consideration of etcd (the distributed key-value store) topology, certificate and external secret management, network isolation and monitoring strategies that extend beyond basic container orchestration. European organizations face additional compliance requirements that necessitate comprehensive audit trails, data sovereignty controls and security policy automation through tools like Kyverno. Alternatively, use a managed Kubernetes offering such as Amazon EKS or SUE Managed Kubernetes Services.
Secure by design
Traditional Linux distributions introduce unnecessary attack surface and operational complexity in Kubernetes environments. Purpose-built operating systems like Talos Linux can eliminate SSH access vulnerabilities, shell interfaces and configuration drift while maintaining immutable infrastructure principles. It removes common attack vectors that plague conventional server deployments.
Prevent common mistakes
Kubernetes security failures enable complete infrastructure compromise through container escape and privilege escalation chains. Recent assessments demonstrate that misconfigured RBAC policies and permissive security contexts create attack pathways from containerized workloads directly to host systems. Organizations deploying Kubernetes without comprehensive security hardening expose themselves to lateral movement attacks that can compromise entire infrastructure stacks.
Making the right decisions
We want to address fundamental architectural decisions, security hardening procedures, scalability and operational practices that determine long-term cluster viability. Therefore we will include specific configuration examples, policy implementations and monitoring strategies validated in production environments.
What is Kubernetes, technically, all about?
Going across the country
Kubernetes operates as a distributed container orchestration platform. It enables horizontal scaling across multiple physical nodes or virtual machines, even across physical locations and data centres. Kubernetes abstracts underlying infrastructure complexity through declarative configuration management and automated scheduling of workloads.
Control plane parts
The control plane encompasses critical components including the API server, etcd distributed key-value store, scheduler and controller manager. The kube-apiserver handles authentication, authorization and API request validation before persisting changes to etcd. It processes commands, controller requests and external tool integrations through RESTful API calls that maintain cluster state consistency. The scheduler analyzes resource requirements, node constraints and affinity rules to determine optimal pod placement across available worker nodes.
Cluster truth
etcd serves as the single source of truth for all cluster configuration and runtime information. It stores pod specifications, Service, Secret and ConfigMap definitions, as well as cluster membership data. The etcd cluster normally runs with an odd number of instances (e.g. three, one per control plane node) to enable quorum-based leader election and to maintain consistency if one node is drained for maintenance. All control plane components interact with etcd only through the API server, ensuring transactional consistency and enabling cluster state recovery through snapshots made with etcdctl.
Control mechanisms
The controller manager operates multiple specialized controllers that continuously monitor cluster state and implement corrective actions to maintain desired configurations. These controllers include deployment controllers managing ReplicaSets (of pods), service controllers configuring load balancing and namespace controllers providing isolation between user workloads. Each controller implements watch patterns against the API server, triggering when relevant resources change.
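For illustration, a minimal Deployment (the names and image below are hypothetical) that the deployment controller reconciles into a ReplicaSet of three pods could look like this:
```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend                                      # hypothetical name
  namespace: backend
spec:
  replicas: 3                                        # the controller keeps a ReplicaSet with 3 pods running
  selector:
    matchLabels:
      app: backend
  template:
    metadata:
      labels:
        app: backend
    spec:
      containers:
        - name: backend
          image: registry.example.com/backend:1.0.0  # placeholder image
          ports:
            - containerPort: 8080
```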
Keeping your kubelets healthy
Worker nodes execute the actual container workloads through three primary components: kubelet, kube-proxy and container runtime. The kubelet acts as the node agent responsible for pod lifecycle management, container health and resource reporting to the control plane. This component receives pod specifications from the API server and coordinates with the container runtime through the Container Runtime Interface (CRI) to create, start, stop and monitor containers. The kubelet implements health checks, resource limits and volume mounting while continuously reporting node and pod status back to the control plane.
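To illustrate the health checks, resource limits and volume mounting the kubelet enforces, a pod specification might declare the following (probe endpoints, thresholds and the image are assumptions):
```
apiVersion: v1
kind: Pod
metadata:
  name: backend-probe-demo                           # hypothetical name
spec:
  containers:
    - name: backend
      image: registry.example.com/backend:1.0.0      # placeholder image
      resources:
        requests:
          cpu: 250m
          memory: 256Mi
        limits:
          cpu: 500m
          memory: 512Mi
      livenessProbe:                                  # kubelet restarts the container when this fails
        httpGet:
          path: /healthz                              # assumed endpoint exposed by the application
          port: 8080
        initialDelaySeconds: 10
        periodSeconds: 10
      readinessProbe:                                 # kubelet marks the pod unready when this fails
        httpGet:
          path: /ready                                # assumed endpoint
          port: 8080
        periodSeconds: 5
      volumeMounts:
        - name: cache
          mountPath: /var/cache/app
  volumes:
    - name: cache
      emptyDir: {}
```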
Networking in the Kubernetes world
Container networking relies on the Container Network Interface (CNI) specification to provide pod-to-pod communication across cluster nodes. CNI plugins create network namespaces for pods, allocate IP addresses through IPAM plugins and configure network interfaces to enable container connectivity. Popular CNI implementations include Cilium for eBPF-based networking, Calico for policy-based routing and Flannel for simple overlay networks. The kube-proxy component implements service load balancing, distributing traffic across pod endpoints.
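A plain ClusterIP Service (hypothetical names) is enough to see this load balancing in action: traffic sent to the Service is distributed across all ready pods matching the selector.
```
apiVersion: v1
kind: Service
metadata:
  name: backend              # hypothetical name
  namespace: backend
spec:
  type: ClusterIP
  selector:
    app: backend             # endpoints are all ready pods carrying this label
  ports:
    - name: http
      protocol: TCP
      port: 80               # port exposed by the Service
      targetPort: 8080       # container port on the selected pods
```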
Units of compute
Pods represent the fundamental deployment unit in Kubernetes, encapsulating one or more tightly coupled containers that share network and storage resources. Each pod receives a unique cluster IP address and containers within the pod communicate through localhost interfaces. Pod specifications define container images, resource requirements, environment variables and volume mounts through declarative YAML manifests. Pods typically follow the single container pattern, though multi-container pods support sidecar patterns for logging, monitoring, or proxy functionality.
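As a sketch of the sidecar pattern mentioned above, a pod could combine an application container with a log-shipping sidecar sharing an emptyDir volume (image names are placeholders):
```
apiVersion: v1
kind: Pod
metadata:
  name: app-with-sidecar                              # hypothetical name
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0.0           # placeholder image
      volumeMounts:
        - name: logs
          mountPath: /var/log/app
    - name: log-shipper                               # sidecar reads what the app writes
      image: registry.example.com/log-shipper:1.0.0   # placeholder image
      volumeMounts:
        - name: logs
          mountPath: /var/log/app
          readOnly: true
  volumes:
    - name: logs
      emptyDir: {}                                    # scratch space shared between the containers
```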
Schedulers bring balance
The scheduler algorithm evaluates pod requirements against node capacity, considering CPU, memory, storage and custom resource constraints. Advanced scheduling features include node affinity rules, anti-affinity constraints, taints and tolerations for specialized workloads and topology spread constraints for fault domain distribution. Scheduling decisions balance resource utilization efficiency with application performance requirements and operational constraints.
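A sketch of how these scheduling features appear in a pod specification; the taint key, zone names and labels below are assumptions:
```
apiVersion: v1
kind: Pod
metadata:
  name: spread-demo                                   # hypothetical name
  labels:
    app: backend
spec:
  containers:
    - name: backend
      image: registry.example.com/backend:1.0.0       # placeholder image
  tolerations:
    - key: dedicated                                  # assumed taint key on specialized nodes
      operator: Equal
      value: gpu
      effect: NoSchedule
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values: ["eu-west-1a", "eu-west-1b"]  # assumed zone names
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone        # spread matching pods evenly across zones
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app: backend
```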
Storage provisioned however you like and need
Kubernetes implements storage abstraction through persistent volumes and persistent volume claims, separating storage provisioning from consumption. Storage classes define available storage types with specific performance characteristics, backup policies and provisioners. Dynamic volume provisioning automatically creates storage resources when applications request persistent volume claims, while static provisioning requires manual volume creation by cluster administrators. This architecture enables storage portability across different infrastructure providers while maintaining application-level storage abstractions.
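A minimal sketch of dynamic provisioning, assuming the AWS EBS CSI driver as provisioner (substitute the CSI driver available in your environment):
```
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd                     # hypothetical class name
provisioner: ebs.csi.aws.com         # assumes the AWS EBS CSI driver is installed
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
  namespace: backend
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 20Gi                  # triggers dynamic provisioning when a pod mounts the claim
```
With WaitForFirstConsumer, the volume is only created once a pod using the claim is scheduled, so it is provisioned in the correct fault domain.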
To sidecar or not to sidecar
Service mesh integration extends Kubernetes networking capabilities through sidecar proxy patterns that provide advanced traffic management, security policies and observability features. These solutions intercept pod-to-pod communication to implement mutual TLS authentication, circuit breaking, retry logic and distributed tracing without modifying application code. Some popular technologies to implement a service mesh in Kubernetes clusters are Linkerd, Istio and NGINX Service Mesh. This is in contrast to Cilium, which relies on eBPF to provide similar capabilities at the kernel level, without sidecar proxies.
Getting started
Manage complexity (or choose an offering)
Production-grade Kubernetes deployment requires systematic preparation encompassing infrastructure prerequisites, toolchain configuration, and security baseline establishment. Kubernetes clusters demand substantial computational resources and network infrastructure planning before initial deployment procedures. Organizations frequently underestimate the complexity involved in transitioning from development environments to production-ready clusters capable of handling enterprise workloads.
Minimum requirements
Infrastructure prerequisites establish the foundation for reliable cluster operations through proper hardware dimensioning and network architecture design. Control plane nodes require minimum specifications of 4 CPU cores, 8GB RAM, and high-performance storage for etcd operations, while worker nodes will have to scale to anticipated workload demands. Network requirements include dedicated subnets for pod networking, service load balancing, and cluster management traffic with appropriate firewall configurations permitting required port ranges. Inadequate network planning could create connectivity issues during cluster scaling operations and cross-node communication scenarios.
Choose your utilities
Essential toolchain installation begins with kubectl client configuration, container runtime selection, and cluster management utilities. The kubectl binary serves as the primary interface for cluster administration and requires proper authentication configuration through kubeconfig files or service account tokens. Container runtime selection influences cluster performance characteristics, with containerd providing superior resource efficiency compared to Docker for production deployments. Additional utilities including helm for package management, stern for tailing logs across multiple pods, and k9s for interactive cluster monitoring enhance operational capabilities during initial cluster setup phases.
Initialize your cluster
Cluster initialization approaches vary significantly based on infrastructure targets and operational requirements. Kubeadm provides the standard cluster bootstrapping procedures suitable for bare metal and virtual machine deployments. Cloud provider managed services including SUE Managed K8s Services, Amazon EKS, Google GKE, and Azure AKS abstract infrastructure complexity away. Self-managed approaches using infrastructure automation tools like Terraform and Ansible enable customized cluster configurations while requiring comprehensive operational expertise.
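As a sketch of the kubeadm route, a minimal configuration file might look like this (the Kubernetes version, endpoint and CIDRs are placeholders to adjust to your environment):
```
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
nodeRegistration:
  criSocket: unix:///run/containerd/containerd.sock     # containerd as container runtime
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.30.0                              # assumed version; pin to the release you validated
controlPlaneEndpoint: "k8s-api.example.internal:6443"   # placeholder load-balanced API endpoint
networking:
  podSubnet: 10.244.0.0/16                              # must match the CNI plugin configuration
  serviceSubnet: 10.96.0.0/12
```
It would be applied with `kubeadm init --config kubeadm-config.yaml`, after which additional nodes join the cluster via `kubeadm join`.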
Strong security foundations already in place?
Initial security configuration establishes baseline protection mechanisms before workload deployment begins. Role-based access control (RBAC) configuration defines permission boundaries for cluster operations, preventing unauthorized access to sensitive cluster resources and enforcing namespace isolation; a minimal RBAC example follows the network policy below. Network policy (try `kubectl explain netpol`) implementation restricts pod-to-pod communication patterns, creating microsegmentation boundaries that limit attack surface exposure in compromised container scenarios.
```
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: netpol-backend
  namespace: backend
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
    - Egress
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: postgres
      ports:
        - protocol: TCP
          port: 5432
    - to: # Do NOT forget this second `- to`, to mark the start of an ADDITIONAL rule
        - podSelector:
            matchLabels:
              app: mysql
      ports:
        - protocol: TCP
          port: 3306
```
The above can be interpreted in pseudo-code as: (destination pod has label app=postgres AND port is 5432) OR (destination pod has label app=mysql AND port is 3306)
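The RBAC side of this baseline can be sketched with a namespace-scoped Role and RoleBinding; the names and the ServiceAccount below are hypothetical:
```
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: backend-readonly            # hypothetical name
  namespace: backend
rules:
  - apiGroups: ["", "apps"]
    resources: ["pods", "configmaps", "deployments"]
    verbs: ["get", "list", "watch"] # read-only access within the namespace
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: backend-readonly
  namespace: backend
subjects:
  - kind: ServiceAccount
    name: ci-deployer               # hypothetical ServiceAccount
    namespace: backend
roleRef:
  kind: Role
  name: backend-readonly
  apiGroup: rbac.authorization.k8s.io
```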
Basic runtime tests
Cluster validation procedures verify proper installation and configuration before production workload deployment. Basic connectivity testing ensures control plane components communicate correctly with worker nodes through proper network routing and firewall configurations. Pod scheduling verification confirms that the scheduler algorithm correctly places workloads across available nodes while respecting resource constraints and affinity rules. Storage provisioning tests validate dynamic volume creation through configured storage classes and container storage interface implementations.
Keep your state without drift
Configuration management establishes consistent cluster state through declarative resource definitions and version control integration. Infrastructure as code approaches using tools like Terraform can manage cluster operations such as node provisioning. Kubernetes resource manifests stored in version control repositories enable reproducible deployments and configuration drift detection through GitOps methodologies. In addition, we recommend using Helm charts to provide templated (Go-like) resource definitions that support environment-specific customization while keeping Kubernetes resource definitions consistent.
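GitOps can be implemented with several tools; as one possible sketch, assuming Argo CD, an Application resource ties a Helm chart stored in Git to a target namespace (repository URL, paths and names are placeholders):
```
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: backend                                                # hypothetical application name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/platform/deployments.git  # placeholder repository
    targetRevision: main
    path: charts/backend                                       # Helm chart kept in version control
    helm:
      valueFiles:
        - values-production.yaml
  destination:
    server: https://kubernetes.default.svc
    namespace: backend
  syncPolicy:
    automated:
      prune: true                                              # remove resources deleted from Git
      selfHeal: true                                           # revert configuration drift on the cluster
```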
Everyone gets a dashboard
Monitoring and observability configuration enables cluster health assessment and troubleshooting capabilities from initial deployment. Prometheus or VictoriaMetrics installation provides metrics collection infrastructure for cluster components, node resources and application performance indicators. Grafana dashboard configuration visualizes cluster metrics through predefined templates that highlight critical performance indicators including CPU utilization, memory consumption, and storage capacity trends. Log aggregation through solutions like Fluentd or Fluent Bit centralizes container logs and cluster events for analysis and alerting purposes.
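Assuming the Prometheus Operator (e.g. via the kube-prometheus-stack chart) is used, scrape targets can be declared as ServiceMonitor resources; the labels, namespace and port name below are assumptions:
```
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: backend                      # hypothetical name
  namespace: monitoring
  labels:
    release: kube-prometheus-stack   # assumed label the Prometheus instance selects on
spec:
  namespaceSelector:
    matchNames:
      - backend
  selector:
    matchLabels:
      app: backend                   # selects the Service to scrape
  endpoints:
    - port: http                     # must match a named port on that Service
      path: /metrics
      interval: 30s
```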
You should really test your recovery procedures
Initial backup procedures protect against cluster state loss and enable disaster recovery capabilities. etcd backup automation ensures cluster configuration and resource definitions remain recoverable during control plane failures. Persistent volume backup strategies protect application data stored in dynamically provisioned storage resources. Disaster recovery procedures validate backup restoration processes and document recovery time objectives for critical workloads.
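One way to automate those etcd snapshots is a CronJob pinned to the control plane. This sketch assumes a kubeadm-style layout with etcd certificates under /etc/kubernetes/pki/etcd, an etcd image matching your cluster version and a hostPath backup target; it does not apply to managed or Talos-based control planes.
```
apiVersion: batch/v1
kind: CronJob
metadata:
  name: etcd-backup
  namespace: kube-system
spec:
  schedule: "0 2 * * *"                                # daily at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          hostNetwork: true
          nodeSelector:
            node-role.kubernetes.io/control-plane: ""
          tolerations:
            - key: node-role.kubernetes.io/control-plane
              operator: Exists
              effect: NoSchedule
          restartPolicy: OnFailure
          containers:
            - name: etcdctl
              image: registry.k8s.io/etcd:3.5.12-0     # assumed tag; match your cluster's etcd version
              command:
                - /bin/sh
                - -c
                - >
                  ETCDCTL_API=3 etcdctl
                  --endpoints=https://127.0.0.1:2379
                  --cacert=/etc/kubernetes/pki/etcd/ca.crt
                  --cert=/etc/kubernetes/pki/etcd/server.crt
                  --key=/etc/kubernetes/pki/etcd/server.key
                  snapshot save /backup/etcd-$(date +%Y%m%d%H%M).db
              volumeMounts:
                - name: etcd-certs
                  mountPath: /etc/kubernetes/pki/etcd
                  readOnly: true
                - name: backup
                  mountPath: /backup
          volumes:
            - name: etcd-certs
              hostPath:
                path: /etc/kubernetes/pki/etcd
            - name: backup
              hostPath:
                path: /var/backups/etcd                # assumed backup location; ship it off-node as well
                type: DirectoryOrCreate
```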
Host OS Best Practices
Choose your OS wisely
Purpose-built operating systems designed specifically for container orchestration provide superior security posture and operational efficiency compared to traditional Linux distributions in Kubernetes environments. Talos Linux represents the optimal host operating system architecture for production Kubernetes clusters through its immutable infrastructure design. The operating system boots directly into Kubernetes node functionality without unnecessary services, reducing the attack surface by approximately 80% compared to mainstream Linux distributions such as Ubuntu or CentOS.
Traditional Linux distributions introduce significant operational complexity through package dependency management, configuration drift, and persistent state modifications that compromise cluster reliability. Host hardening procedures on traditional distributions require extensive service disabling, firewall configuration, and security policy implementation that increases maintenance overhead.
Talos Linux implements declarative configuration management through machine configuration files that define complete system state without runtime modifications. The API-driven architecture enables GitOps workflows for infrastructure management, ensuring consistent deployment patterns across cluster nodes while preventing configuration drift through immutable root filesystem design. Automatic security updates occur through complete image replacement rather than package-level patching, eliminating partial update failures and security policy bypasses.
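A heavily abridged sketch of such a machine configuration (Talos v1alpha1 format; hostname, endpoint, disk and subnets are placeholders, and complete configurations are normally generated with `talosctl gen config`):
```
version: v1alpha1
machine:
  type: controlplane                                    # or "worker" for worker nodes
  network:
    hostname: cp-01                                     # placeholder hostname
  install:
    disk: /dev/sda                                      # target disk for the immutable image
cluster:
  clusterName: production                               # placeholder name
  controlPlane:
    endpoint: https://k8s-api.example.internal:6443     # placeholder API endpoint
  network:
    podSubnets:
      - 10.244.0.0/16
    serviceSubnets:
      - 10.96.0.0/12
```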
Talos production deployments benefit from simplified disaster recovery procedures through stateless node architecture where complete host replacement occurs through automated provisioning rather than configuration restoration. Security compliance frameworks including NIST 800-190 and CIS Kubernetes Benchmark requirements align naturally with Talos architecture principles, reducing audit complexity and remediation requirements.
Below is a code example showing three different ways to manage secrets within one configuration file (note the Helm chart templating language, based on Go templating plus the Sprig template library).
Certificate.yaml
```
{{- range .Values.ingress.tls }}
{{- if eq .provider "ckms" }}
# Option 1: automatic certificate and key from a key management store (rotated every 3 months)
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: {{ .secretName }}
  namespace: {{ $.Release.Namespace }}
spec:
  secretName: {{ .secretName }}
  renewBefore: {{ $.Values.certificate.renewBefore }}
  subject:
    organizations:
      - {{ $.Values.certificate.organization }}
    countries:
      - {{ $.Values.certificate.country }}
    organizationalUnits:
      - {{ $.Values.certificate.organizationalUnit }}
  {{ with index .hosts 0 }}
  commonName: {{ . }}
  {{- end }}
  isCA: false
  privateKey:
    algorithm: RSA
    encoding: PKCS1
    size: 4096
    rotationPolicy: Always
  usages:
    - server auth
  dnsNames:
    {{- range .hosts }}
    - {{ . }}
    {{- end }}
  issuerRef:
    name: ckms--clusterissuer
    kind: ClusterIssuer
    group: cert-manager.io
{{- else if eq .provider "vault" }}
# Option 2: certificate and key from Hashicorp Vault key value store
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: {{ .secretName }}
  namespace: {{ $.Release.Namespace }}
spec:
  refreshInterval: "15s"
  secretStoreRef:
    # kubectl get css
    name: {{ $.Values.vaultBackend }}
    kind: ClusterSecretStore
  target:
    name: {{ .secretName }}
    template:
      type: kubernetes.io/tls
  data:
    - secretKey: tls.key
      remoteRef:
        key: otap/{{ $.Release.Namespace }}/{{ $.Release.Name }}/certificates/{{ .secretName }}
        property: key
    - secretKey: tls.crt
      remoteRef:
        key: otap/k8s-tst/{{ $.Release.Namespace }}/{{ $.Release.Name }}/certificates/{{ .secretName }}
        property: certificate
{{- else if eq .provider "file" }}
# Option 3 (for debug): certificate and key from data in values (files)
---
apiVersion: v1
kind: Secret
metadata:
  name: {{ .secretName }}
  namespace: {{ $.Release.Namespace }}
type: kubernetes.io/tls
data:
  tls.key: |
    {{ .key | b64enc }}
  tls.crt: |
    {{ .certificate | b64enc }}
{{- end }}
{{- end }}
```
With the relevant values.yaml section of the Helm chart:
```
ingress:
  enabled: true
  className: "nginx" # Already outdated.. Consider Gateway API from the SIG-Network project
  labels: {}
  annotations: {}
  # depending on annotations: https://cert-manager.io/docs/usage/ingress/#optional-configuration
  # kubernetes.io/tls-acme: "true"
  hosts:
    - host: fqdn
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: app-certificate
      provider: ckms
      hosts:
        - fqdn
```
Need additional Kubernetes expertise?
Ready to deploy Kubernetes at scale, securely and efficiently? We are SUE, your trusted partner in building resilient infrastructure. Let’s turn complexity into capability. Contact us today to start your journey.