Async Reconciliation in Kubernetes
Kubernetes doesn’t try to make changes happen immediately and perfectly. Instead, it continuously reconciles what is with what should be. This is the single most important idea in Kubernetes, and once you internalize it, everything else about the system makes more sense.
The reconciliation loop
You declare your desired state: “I want 3 replicas of this pod.” Kubernetes stores that declaration in etcd. Then controllers - dozens of them, each responsible for one resource type - run loops that compare actual state to desired state and take action to close the gap.
    while True:
        actual = get_current_state()
        desired = get_desired_state()
        if actual != desired:
            take_action_to_reconcile()
        sleep(interval)
That’s the whole pattern. Every controller in Kubernetes is a variation of this loop. The Deployment controller watches for Deployments and manages ReplicaSets. The ReplicaSet controller watches for ReplicaSets and manages Pods. The kubelet on each node watches for Pods assigned to it and manages containers. It’s loops all the way down.
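A toy sketch of how those loops chain. The names and the in-memory `state` dict are invented for illustration - the real controllers talk to the API server - but the shape is the same: each controller only manages the layer directly below it, and neither knows the other exists.

```python
# Toy model: a "Deployment controller" that only manages ReplicaSet records,
# and a "ReplicaSet controller" that only manages pod records.
state = {"deployments": {}, "replicasets": {}, "pods": {}}

def deployment_controller():
    # For each Deployment, ensure a matching ReplicaSet record exists
    # with the right replica count.
    for name, spec in state["deployments"].items():
        rs_name = f"{name}-rs"
        if state["replicasets"].get(rs_name) != spec["replicas"]:
            state["replicasets"][rs_name] = spec["replicas"]

def replicaset_controller():
    # For each ReplicaSet, ensure the right number of pod records exist.
    for rs_name, want in state["replicasets"].items():
        have = [p for p in state["pods"] if p.startswith(rs_name)]
        for i in range(len(have), want):
            state["pods"][f"{rs_name}-pod{i}"] = "running"

state["deployments"]["web"] = {"replicas": 3}
deployment_controller()   # creates the ReplicaSet record
replicaset_controller()   # creates the pod records
print(len(state["pods"])) # 3
```

Declaring a Deployment ripples down to pods without any component coordinating the whole chain - each loop just closes its own gap.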
The key insight isn’t the loop itself - it’s what the loop implies about failure handling. Instead of trying to execute a perfect sequence of operations (create this, then that, handle this error, retry that), you just keep nudging reality toward the goal. If something fails, the next loop iteration will try again. If state drifts, the loop will notice and fix it. You don’t need to enumerate every failure mode because the reconciliation handles them all the same way: notice the gap, try to close it.
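That claim is easy to make concrete. The sketch below has no per-error handling at all - a failed action just leaves the gap open, and the next iteration retries it (`create_replica` is a made-up flaky operation standing in for an API timeout or node pressure):

```python
import random

desired = 3
actual = 0
attempts = 0

def create_replica():
    # Stand-in for a flaky operation: fails about half the time.
    global actual, attempts
    attempts += 1
    if random.random() < 0.5:
        raise RuntimeError("transient failure")
    actual += 1

while actual != desired:     # the loop IS the retry logic
    try:
        create_replica()
    except RuntimeError:
        pass                 # do nothing: the next iteration notices the gap

print(actual)  # 3
```

There's no enumeration of failure modes, no backoff policy, no error taxonomy (a real controller would add rate limiting) - yet the system converges, because every failure is handled the same way: the gap persists, so the loop acts again.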
I covered the broader pattern in my post on desired state systems - Kubernetes is just the purest software implementation of an idea that shows up everywhere from thermostats to organizational leadership. But Kubernetes specifically taught me some things about how reconciliation works in practice that the abstract pattern doesn’t capture.
Self-healing as an emergent property
The first time I killed a pod to test recovery, it felt like magic. The pod disappeared and a new one appeared seconds later, without anyone doing anything. But it’s not magic - it’s just the ReplicaSet controller noticing that actual count (2) doesn’t match desired count (3) and creating a pod to close the gap.
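In miniature, the ReplicaSet controller's decision is just a subtraction. This is a sketch, not the real API - `reconcile_replicas` is a hypothetical helper - but it shows that the same comparison handles scale-up, scale-down, and the converged case:

```python
def reconcile_replicas(desired: int, actual_pods: list[str]) -> list[str]:
    """Return the actions needed to close the gap between actual and desired."""
    gap = desired - len(actual_pods)
    if gap > 0:
        return ["create pod"] * gap            # too few: create the difference
    if gap < 0:
        return [f"delete {p}" for p in actual_pods[:-gap]]  # too many: delete extras
    return []                                  # converged: do nothing

print(reconcile_replicas(3, ["a", "b"]))  # ['create pod']
```

Kill a pod and the next pass sees `gap > 0` again. The "magic" is just this arithmetic running on a loop.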
This compounds across controllers. A node goes down? The node lifecycle controller marks it NotReady and, after a grace period, evicts the pods running on it. The ReplicaSet controller notices the pod count is wrong and creates replacements. The scheduler picks new nodes for them. The kubelets on those nodes pull images and start containers. No single component orchestrated this recovery. Each controller just ran its loop and responded to the gap it saw.
This is enormously powerful, but it’s also what makes debugging Kubernetes hard. When something goes wrong, there’s no single execution path to follow. No stack trace that shows you cause and effect. Instead, you’re reading events across multiple controllers, piecing together which loop saw what gap and took what action. The system works by design, but when it doesn’t, the indirection makes root-cause analysis a real pain.
Eventual consistency in practice
When you kubectl apply a change, you’re not waiting for it to complete. You’re declaring intent. The API server accepts your declaration, stores it, and returns. The actual work happens asynchronously, driven by controllers that might not even run for a few seconds.
This means Kubernetes is eventually consistent in a very real sense. There’s a window between when you declare a change and when reality matches. During a rolling deployment, old and new pods coexist. During a config change, some pods have the new config and some have the old. The system is always converging, but it’s rarely fully converged.
In practice, this usually doesn’t matter. Rolling deployments are designed to work with mixed versions. Health checks gate traffic to pods that are ready. The eventual consistency is managed.
Where it bites you is when you depend on ordering or immediacy. “Deploy this change, then run this migration, then flip this flag” - that kind of sequential logic fights the reconciliation model. Each step is eventually consistent, and you can’t assume one completes before the next starts unless you explicitly wait for it. I’ve seen teams build elaborate scripts with kubectl wait and polling loops on top of Kubernetes, essentially building their own synchronization layer over an intentionally async system.
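That synchronization layer usually boils down to a polling loop like this sketch - `get_ready_replicas` is a hypothetical stand-in for whatever API read tells you the state you're waiting on:

```python
import time

def wait_for_convergence(get_ready_replicas, desired, timeout=60.0, interval=1.0):
    """Poll observed state until it matches desired, or give up at timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if get_ready_replicas() == desired:
            return True          # converged within the window
        time.sleep(interval)
    return False                 # still not converged: caller decides what to do
```

This is essentially what `kubectl wait` and `kubectl rollout status` do for you: they turn "declare and walk away" back into "block until done", which is sometimes exactly what a deploy script needs - but it's worth being honest that you're reintroducing synchrony on top of a deliberately async system.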
The operator pattern
The smartest thing Kubernetes did architecturally was making the reconciliation loop extensible. Custom Resource Definitions let you define new desired-state declarations. Operators implement custom controllers that reconcile those declarations.
A database operator, for example, might define a PostgresCluster resource. You declare “I want a 3-node Postgres cluster with streaming replication.” The operator’s controller loop watches for that resource and reconciles: are there 3 nodes? Is replication configured? Is the primary healthy? If any of those answers is no, it takes action. The same pattern that manages pods now manages database clusters, message queues, monitoring stacks - anything you can express as desired state.
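A sketch of what one reconcile pass of such an operator might look like. The resource fields and helper names here are invented for illustration; a real operator would be built on a framework like Kopf or controller-runtime and read state from the Kubernetes API:

```python
def reconcile_postgres_cluster(spec, observe, act):
    """One reconciliation pass: compare observed state to spec, take one step.

    `observe` returns the current cluster status; `act` performs one
    corrective action. Both are injected, which also makes the loop testable.
    """
    status = observe()
    if len(status["nodes"]) < spec["nodes"]:
        act("add_node")                  # close the node-count gap first
    elif not status["replication_configured"]:
        act("configure_replication")
    elif not status["primary_healthy"]:
        act("failover")
    # else: converged - nothing to do
```

Note that it takes one step and returns rather than driving the cluster to completion in a single call: the loop will run again, observe the new state, and decide the next step. That keeps each pass simple and makes the controller resilient to failing partway through.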
This works because the pattern is general enough to fit almost anything. If you can answer “what is the current state?” and “what should the state be?” and “what action closes the gap?”, you can write a controller. The hardest part, in my experience, is getting the state observation right. A controller that misreads current state will “fix” things that aren’t broken or miss things that are. The reconciler is only as good as its view of reality.
What Kubernetes taught me
Before Kubernetes, I thought about system reliability in terms of preventing bad states. Don’t let the wrong thing happen. Validate inputs, handle errors, make operations atomic. Kubernetes taught me a different model: bad states are inevitable, so build systems that detect and correct them. Don’t prevent drift - fix it continuously.
It’s the same philosophy behind CRDTs (design for convergence), behind AP systems (stay available, reconcile later), behind eventual consistency in general. You give up the certainty of “this action completed successfully” and accept “the system will converge toward the right state.” Coming from synchronous request-response systems, that felt wrong at first. But it’s more honest about how things actually work. Things fail. State drifts. The question isn’t whether your system will be in the wrong state - it will - but how quickly it notices.