Cluster Operations
Worker Node & kubelet Reliability
Trace kubelet, container runtime, and CNI signals when pods look healthy but workloads stall.
Program narrative
You spend most of the week inside nodes: cgroup pressure, image pulls, CNI timeouts, and kubelet PLEG (Pod Lifecycle Event Generator) health loops. Labs rotate through deliberate faults so you practice narrowing blame with evidence instead of restarting everything. We also cover graceful drain choreography for enterprise clients who cannot afford surprise eviction storms.
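Much of that evidence-gathering starts with the kubelet's own journal. A minimal runnable sketch, assuming a systemd-managed kubelet that emits the standard "PLEG is not healthy" message — the log excerpt below is a fabricated sample so the commands work anywhere, not a real node's output:

```shell
# On a real node you would capture the live journal instead:
#   journalctl -u kubelet --since "15 min ago" > /tmp/kubelet.log
# Fabricated excerpt for illustration:
cat <<'EOF' > /tmp/kubelet.log
I0101 12:00:00 kubelet.go:1234] "Pod status updated" pod="default/web-0"
E0101 12:03:10 kubelet.go:2045] "PLEG is not healthy" err="pleg was last seen active 3m5s ago; threshold is 3m0s"
EOF
# A nonzero count points at runtime or node pressure, not at the workloads.
grep -c 'PLEG is not healthy' /tmp/kubelet.log
```

The point of the exercise is that one grep over a bounded time window is often enough to shift a conversation from "the pods are flaky" to "the node's runtime stalled".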
Inclusions
- kubelet flags that matter in production versus toy clusters
- Runtime class differences without picking a vendor winner
- Host-level networking checks that complement kubectl
- Node problem detector patterns and when to silence them
- Quality standards for cordon/drain communications
- Pair debugging etiquette under time pressure
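The host-level networking checks in the list above pair a few standard node commands with a look at the CNI configuration directory. A hedged sketch, assuming the common containerd layout with CNI configs under /etc/cni/net.d (the demo writes to a temp dir so it runs anywhere; the directory name and sample config are illustrative):

```shell
# On a real node you would check the standard locations directly:
#   ls /etc/cni/net.d/                              # any network config at all?
#   ip link show                                    # did the CNI create its interfaces?
#   journalctl -u kubelet --since "10 min ago" | grep -i cni
# Portable demo of the config-presence check using a temp dir:
CNI_DIR=$(mktemp -d)
echo '{"cniVersion":"1.0.0","name":"demo","plugins":[]}' > "$CNI_DIR/10-demo.conflist"
if ls "$CNI_DIR"/*.conflist >/dev/null 2>&1; then
  echo "CNI config present"
else
  echo "no CNI config found"
fi
```

An empty config directory is one of the few node faults that kubectl alone will never show you: pods simply stick in ContainerCreating while the API objects look fine.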
Outcomes you can evidence
- Complete a drain plan with risk callouts for StatefulSets
- Produce a kubelet log excerpt that proves root cause
- Ship a one-page postmortem your manager can skim in two minutes
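The drain-plan outcome above centers on the cordon-then-drain sequence and the flags that make its risks explicit. A sketch that prints the plan rather than executing it, so it is safe to run anywhere — the node name is a placeholder, while the kubectl flags are real:

```shell
# Safety flags a drain plan should call out:
#   --ignore-daemonsets       DaemonSet pods are not evicted and would otherwise block the drain
#   --delete-emptydir-data    acknowledges that emptyDir contents on the node are lost
#   --grace-period/--timeout  bound how long the eviction storm can last
NODE="worker-3"   # placeholder node name
CORDON_CMD="kubectl cordon $NODE"
DRAIN_CMD="kubectl drain $NODE --ignore-daemonsets --delete-emptydir-data --grace-period=120 --timeout=5m"
echo "$CORDON_CMD"
echo "$DRAIN_CMD"
```

Cordoning first stops new scheduling onto the node, so the drain only has to move what is already there — the ordering is the choreography the labs rehearse.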
Common questions
Labs default to containerd. If your employer still ships another supported runtime, bring screenshots and we adapt the exercises on the spot.
From our cohorts
“Client in cloud operations — the PLEG loop reproduction finally convinced our vendor it was not ‘just Kubernetes being Kubernetes.’”