Conflict-Free Replicated Data Types

Most distributed systems problems boil down to: multiple nodes need to update shared state, and they need to agree on the result. The standard answer is coordination - locks, consensus, leader election. Get everyone to agree before making changes.

CRDTs ask a different question: what if we designed the data structure so agreement is automatic? What if any node could make changes independently, and merging always produces the same result, regardless of order or timing?

The math that makes it work

If your merge function is commutative (order doesn’t matter), associative (grouping doesn’t matter), and idempotent (merging twice is the same as once), then replicas will always converge. No coordination. No conflict resolution. Just algebra.

Marc Shapiro and others formalized this in their 2011 paper using join semilattices - for any two states, there’s always a deterministic “least upper bound.” The constraint limits what data structures you can build, but within those constraints, you get strong eventual consistency without any coordination overhead.

“Without coordination overhead” is doing real work in that sentence. In a world where the CAP theorem forces you to choose between consistency and availability during partitions, CRDTs give you a genuine third option: both, as long as you accept eventual rather than immediate consistency. That’s a better deal than it sounds.

What you can build

The basic CRDT types look almost too simple. A grow-only counter where each node tracks its own count and the total is the sum? That barely feels like a data structure. But the simplicity is the point - each design decision reveals something about what makes the math work and what makes it break.

A G-Counter only goes up. Each node has its own slot. To increment, bump your slot. To merge, take the max of each slot. Max is commutative, associative, idempotent - it satisfies all three properties. The constraint (no decrementing) is what buys you the guarantee.

Want to count down too? A PN-Counter uses two G-Counters - one for increments, one for decrements. The value is the difference. This works, but notice what happened: to support one new operation, you doubled the state. That’s the CRDT tax. Every relaxation of constraints costs you something.

An LWW-Register (last-writer-wins) stores a value with a timestamp. Higher timestamp wins on merge. It works for any value type, which makes it flexible. But concurrent writes don’t get merged - one gets silently dropped. If two users update a profile name at the same time, one wins and the other’s change disappears. Nobody gets an error. Nobody gets notified. The data just… picks one. For a lot of applications this is fine. For some it’s a subtle source of lost updates that are incredibly hard to debug.

The OR-Set (observed-remove set) is where things get really interesting, because add and remove don’t naturally commute. If node A adds element X and node B concurrently removes it, what should the result be? OR-Set resolves this with unique tags on each addition. A remove only removes tags it has observed. So concurrent add and remove? The add survives. The bias is toward keeping things, which is usually the right default - losing data silently is worse than having extra data.

The tombstone problem follows directly from this design. When you remove an element, you can’t just delete the entry - you need to keep a record of the removal (a tombstone), because a future merge might try to bring back an element you’ve already removed. Without the tombstone, you’d have no evidence it was ever deleted. In a high-churn set with lots of adds and removes, these tombstones accumulate. Garbage collecting them safely - figuring out when every replica has seen the removal - requires coordination, which is exactly the thing CRDTs are supposed to avoid. There’s no free lunch.

Two ways to replicate

Operation-based CRDTs broadcast operations: “increment by 1,” “add element X.” Operations need to be commutative. The network needs to deliver each operation at least once, but order doesn’t matter. Small messages, but you need reliable delivery or idempotent operations.

State-based CRDTs periodically ship their entire state to other replicas, which merge it in. Bigger messages, but much more forgiving about the network - duplicate deliveries, lost messages, whatever. Merge is idempotent, so you just keep sending until everyone’s caught up.

In practice most implementations use delta-state CRDTs - ship only the changes since the last sync. Small messages, robust delivery, best of both. This is what you’ll see in production systems.

Where they show up

Collaborative editing is the application everyone reaches for, and it’s a good one. Yjs and Automerge implement text editing as CRDTs - every keystroke is a local operation that eventually merges with everyone else’s keystrokes. Two people editing the same paragraph converge without coordination. Google Docs uses something similar (likely Operational Transformation, an older approach to the same problem, but the spirit is the same).

Redis CRDB uses CRDTs for active-active geo-replication. Write to Redis in Virginia and Redis in Frankfurt simultaneously, and the data converges. Riak was one of the first databases to expose CRDTs directly to developers as native data types.

The connection to the broader distributed systems landscape is direct. CRDTs are what you reach for when the synchronization overhead of consensus is too expensive and you can design your data model around commutative operations. They’re the AP answer to the CAP dilemma: stay available, accept writes everywhere, converge later.

The limits

The core limitation is structural: not everything can be a CRDT. Counters, sets, registers, text sequences - these have natural CRDT representations because the operations can be made to commute. But anything with cross-replica integrity constraints - “this username must be unique,” “this account balance can’t go negative” - fights the model. Enforcing those constraints requires checking state on other replicas, which requires coordination, which is the thing you’re trying to avoid.

There’s a deeper design tension too. CRDTs push conflict resolution into the data structure, which means you’re making resolution decisions at design time rather than at runtime. An OR-Set always biases toward addition. An LWW-Register always takes the latest timestamp. These are reasonable defaults, but they’re baked in. If your application needs context-dependent conflict resolution - “merge these text edits but reject this concurrent price change” - CRDTs at the data structure level might not be the right abstraction.

What I like about CRDTs is the instinct behind them. Instead of fighting the constraints of distributed systems - no global clock, unreliable networks, independent failures - they work with them. Accept that coordination is expensive. Design data structures that don’t need it. It’s the same move Lamport made with logical clocks: the constraints aren’t obstacles to engineer around, they’re the starting point for a different kind of solution.