Why we rehearse cordon and drain like musicians tuning

We treat node maintenance like a rehearsal studio, not a surprise concert. Before anyone touches production paths, each cohort writes down its assumptions about pod disruption budgets, StatefulSet ordering, and who owns the bridge line. That paperwork feels slow until the first time a kubelet wedges and the room already knows who reads logs and who updates stakeholders.
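One assumption worth writing down is how many pods a drain may evict from each workload at that moment. The following is a minimal Python sketch, not a cluster client: it assumes each budget is expressed as an integer minAvailable (percentages and maxUnavailable are left out), and the workload names and the Budget record are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    """Hypothetical record of one workload's written-down assumption."""
    name: str
    healthy_pods: int    # pods currently ready
    min_available: int   # integer minAvailable from the disruption budget


def allowed_disruptions(budget: Budget) -> int:
    """Evictions the budget tolerates right now; never negative."""
    return max(0, budget.healthy_pods - budget.min_available)


def blocked_workloads(budgets: list[Budget]) -> list[str]:
    """Names of workloads where a drain could not evict even one pod."""
    return [b.name for b in budgets if allowed_disruptions(b) == 0]


if __name__ == "__main__":
    plan = [
        Budget("checkout-api", healthy_pods=3, min_available=2),
        Budget("ledger-db", healthy_pods=3, min_available=3),
    ]
    for b in plan:
        print(f"{b.name}: {allowed_disruptions(b)} eviction(s) allowed")
    print("blocked:", blocked_workloads(plan))
```

A workload that shows up as blocked is the kind of item to settle on paper before the maintenance window, since a drain would wait on it rather than proceed.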

In the lab we inject slow etcd responses while participants still have coffee. The goal is not speedrunning kubectl; it is proving that your drain plan mentions custom metrics hooks your employer actually runs. When a team realizes their monitoring excludes a critical namespace, that is a good morning.
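The monitoring gap described above can be checked mechanically before anyone sits down in the lab. This is a rough sketch that assumes you can export two plain lists, one of namespaces in the cluster and one of namespaces your monitoring covers; the function name, the ignore parameter, and the sample namespaces are all hypothetical.

```python
def unmonitored(cluster_namespaces, monitored_namespaces, ignore=()):
    """Return namespaces present in the cluster but absent from monitoring.

    `ignore` lists namespaces the team has deliberately excluded.
    """
    missing = set(cluster_namespaces) - set(monitored_namespaces) - set(ignore)
    return sorted(missing)


if __name__ == "__main__":
    cluster = ["default", "kube-system", "payments", "batch"]
    monitored = ["default", "kube-system", "batch"]
    gaps = unmonitored(cluster, monitored)
    if gaps:
        print("not covered by monitoring:", ", ".join(gaps))
    else:
        print("every namespace is covered")
```

Keeping the deliberate exclusions in an explicit list makes the difference between "we chose not to watch this" and "we forgot" visible in the drain plan.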

By Thursday we force a swap: the quietest engineer runs comms while the usual lead stays on keyboard. Rotation exposes gaps in runbooks that looked polished when a single hero wrote them. We end with a five-bullet retro that must include one thing we would ship differently next quarter, not vague platitudes about communication.

None of this replaces your internal change management. It gives you language and artifacts external reviewers can follow when they ask why traffic shifted during a maintenance window.
