Logical Clocks in Distributed Systems

In 1978, Leslie Lamport published a paper called “Time, Clocks, and the Ordering of Events in a Distributed System.” It’s one of those papers that reframes a problem so cleanly that once you’ve read it, you can’t think about it the old way anymore.

The insight: stop trying to answer “what time did this happen?” and instead answer “could this event have caused that event?” Physical time is unreliable in a distributed system - clocks drift, networks have variable latency, simultaneity is relative. But causality is a property of the system itself. If A sends a message that B receives, A happened before B. That relationship holds regardless of what any clock says.

Lamport clocks

The mechanism Lamport proposed is almost embarrassingly simple. Each node keeps a counter. Three rules:

  1. Before any local event, increment the counter.
  2. When sending a message, include the counter value.
  3. When receiving a message, set the counter to max(local, received) + 1.

That’s it. Here’s what it looks like:

Node A: 1 ——→ 2 ——→ 3 ——→ 4
                ↘ (msg, ts=2)
Node B: 1 ————————→ 3 ——→ 4
                   (max(1,2)+1)

Node A does some work (counter goes to 1, then 2), sends a message to B with timestamp 2. B’s local counter is at 1, so it sets its counter to max(1, 2) + 1 = 3. From this point on, B’s counter reflects that it has seen A’s state as of timestamp 2.
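The three rules translate almost directly into code. A minimal sketch (the class and method names are mine, not from the paper):

```python
class LamportClock:
    """Minimal Lamport clock: a single counter per node."""

    def __init__(self):
        self.counter = 0

    def tick(self):
        # Rule 1: increment before any local event.
        self.counter += 1
        return self.counter

    def send(self):
        # Rule 2: a message carries the sender's counter after its send event.
        return self.tick()

    def receive(self, msg_timestamp):
        # Rule 3: jump past both our own counter and the message's.
        self.counter = max(self.counter, msg_timestamp) + 1
        return self.counter


# Replaying the diagram above:
a, b = LamportClock(), LamportClock()
a.tick()       # A: 1
ts = a.send()  # A: 2, message carries ts=2
b.tick()       # B: 1
b.receive(ts)  # B: max(1, 2) + 1 = 3
```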

The guarantee: if event A causally precedes event B (A happened before B), then A’s timestamp is less than B’s. Always. But the converse isn’t true - if A’s timestamp is less than B’s, A might have happened before B, or they might be unrelated. The clock captures causality in one direction only.

This is a subtle but important limitation. Lamport clocks give you a partial order - enough to say “A definitely happened before B” when there’s a causal chain, but not enough to say “A and B are concurrent.” Two events with different timestamps might be causally related or might not be. You can’t tell from the timestamps alone.

For a lot of systems, this is perfectly adequate. If all you need is a consistent ordering for a replicated log, Lamport timestamps plus a tiebreaker (like node ID) give you a total order that’s cheap to maintain and respects causality. This is why they show up in consensus protocols and event sourcing systems.
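That tiebreaking scheme is just lexicographic comparison on a (timestamp, node ID) pair. A sketch, with made-up events for illustration:

```python
# Events as (lamport_timestamp, node_id, payload) tuples. Sorting
# lexicographically gives a total order that respects causality: a
# causally-earlier event always has a smaller Lamport timestamp, and
# node_id breaks ties between concurrent events deterministically,
# so every replica derives the same log order.
events = [
    (3, "node-b", "write y"),
    (2, "node-a", "write x"),
    (2, "node-c", "write z"),  # concurrent with node-a's write
]
log = sorted(events)
assert log == [
    (2, "node-a", "write x"),
    (2, "node-c", "write z"),
    (3, "node-b", "write y"),
]
```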

Vector clocks

But what if you need to detect concurrency? What if you need to know, definitively, whether two events are causally related or independent?

Vector clocks solve this. Instead of one counter, each node maintains a vector - one counter per node in the system. When a node does local work, it increments its own entry. When sending a message, it includes the whole vector. On receiving, it takes the component-wise maximum.

Node A does work:     [2, 0, 0]
Node B does work:     [0, 2, 0]
Node C does work:     [0, 0, 1]

A sends to B:
B receives, merges:   [2, 3, 0]  (max of [2,0,0] and [0,2,0], then increment B's entry)

Now compare:
  A [2, 0, 0] vs C [0, 0, 1]
  Neither dominates → A and C are concurrent

  A [2, 0, 0] vs B [2, 3, 0]
  A ≤ B in all components → A happened before B

The rule is clean: if every component of vector V1 is less than or equal to the corresponding component of V2, and the vectors aren't identical, then V1 happened before V2. If neither vector dominates the other, the events are concurrent.
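The merge and the dominance check are each a few lines. A sketch, assuming fixed-size vectors with one entry per node:

```python
def merge(local, received, my_index):
    """On receive: component-wise max, then bump our own entry."""
    merged = [max(a, b) for a, b in zip(local, received)]
    merged[my_index] += 1
    return merged


def compare(v1, v2):
    """Returns "before" if v1 happened before v2, "after" if the
    reverse, "equal" if identical, "concurrent" if neither dominates."""
    less = any(a < b for a, b in zip(v1, v2))
    greater = any(a > b for a, b in zip(v1, v2))
    if less and not greater:
        return "before"
    if greater and not less:
        return "after"
    if not less and not greater:
        return "equal"
    return "concurrent"


# B's merge from the example above:
assert merge([0, 2, 0], [2, 0, 0], my_index=1) == [2, 3, 0]
# The two comparisons from the example above:
assert compare([2, 0, 0], [2, 3, 0]) == "before"      # A vs B
assert compare([2, 0, 0], [0, 0, 1]) == "concurrent"  # A vs C
```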

This is strictly more informative than Lamport clocks. You can now distinguish “happened before” from “concurrent” - something Lamport clocks fundamentally can’t do. Amazon’s Dynamo (the system described in the 2007 paper, an ancestor of DynamoDB) used vector clocks to detect conflicting writes: if two writes have concurrent vector clocks, the system knows there’s a conflict and can surface it to the application for resolution.

The cost is O(n) space per event, where n is the number of nodes. For a cluster of 5 database replicas, that’s fine. For a system with thousands of nodes, it becomes a real problem. Every message carries a growing vector, every comparison is O(n), and if nodes join and leave, you need to manage the vector size. This is why systems at scale often fall back to Lamport clocks with application-level conflict resolution, or use techniques like dotted version vectors that compress the representation.

Hybrid logical clocks

Hybrid logical clocks (HLCs) are the pragmatic middle ground, and they’re what most modern distributed databases actually use.

The problem they solve: Lamport clocks give you causality but no connection to wall-clock time. Vector clocks give you precise causality detection but don’t scale. And in practice, you often want both causality ordering and timestamps that mean something to humans. “This transaction committed at 2:03:47 PM” is useful for debugging, auditing, and reasoning about your system.

An HLC has two components: a physical timestamp (wall-clock time) and a logical counter. The physical part is always as close to real time as possible. The logical counter breaks ties when the physical timestamps are identical or when a received message has a future physical timestamp (because clocks skew).

HLC = (physical_time, logical_counter)

Local event:
  pt = max(local_pt, wall_clock)
  if pt == local_pt: lc = local_lc + 1
  else: lc = 0

Receive message (msg_pt, msg_lc):
  pt = max(local_pt, msg_pt, wall_clock)
  if pt == local_pt == msg_pt: lc = max(local_lc, msg_lc) + 1
  elif pt == local_pt: lc = local_lc + 1
  elif pt == msg_pt: lc = msg_lc + 1
  else: lc = 0

The rules look fiddly, but the effect is clean: the HLC always advances forward, it respects causality (like Lamport clocks), and it stays close to physical time. When clock skew is zero, the logical counter stays at zero and you’re just using wall-clock time. When there is skew, the logical counter absorbs it.
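The update rules above are straightforward to turn into code. A sketch, with the wall clock injectable so the example is deterministic (a real node would read its system clock):

```python
import time


class HybridLogicalClock:
    """Sketch of an HLC following the update rules above."""

    def __init__(self, wall_clock=lambda: int(time.time() * 1000)):
        self.wall_clock = wall_clock
        self.pt = 0  # physical component (ms)
        self.lc = 0  # logical counter

    def local_event(self):
        now = self.wall_clock()
        if now > self.pt:
            # Wall clock has advanced: track it, reset the counter.
            self.pt, self.lc = now, 0
        else:
            # Wall clock hasn't moved past pt: bump the counter instead.
            self.lc += 1
        return (self.pt, self.lc)

    def receive(self, msg_pt, msg_lc):
        now = self.wall_clock()
        new_pt = max(self.pt, msg_pt, now)
        if new_pt == self.pt == msg_pt:
            self.lc = max(self.lc, msg_lc) + 1
        elif new_pt == self.pt:
            self.lc += 1
        elif new_pt == msg_pt:
            self.lc = msg_lc + 1
        else:
            self.lc = 0
        self.pt = new_pt
        return (self.pt, self.lc)


# A node whose wall clock reads 100 receives a message from the
# "future" (pt=105): the physical part jumps forward and the logical
# counter takes over for ordering until real time catches up.
node = HybridLogicalClock(wall_clock=lambda: 100)
node.local_event()                  # (100, 0)
assert node.receive(105, 0) == (105, 1)
assert node.local_event() == (105, 2)
```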

CockroachDB uses HLCs as its primary timestamp mechanism. Every transaction gets an HLC timestamp, and those timestamps drive the MVCC (multi-version concurrency control) system. This lets CockroachDB offer serializable isolation across a distributed cluster without Google’s TrueTime infrastructure - the HLCs provide causal ordering and the physical component keeps timestamps meaningful for things like AS OF SYSTEM TIME queries.

Spanner took a different path with TrueTime - GPS and atomic clocks that give bounded uncertainty intervals. Rather than tracking causality through message passing, Spanner just makes the physical clocks good enough and waits out the uncertainty. It’s arguably more elegant, but requires hardware most teams don’t have access to.

Which clock for which situation

The choice usually comes down to what you need and what you can afford:

Lamport clocks when you need a consistent ordering and causality detection isn’t critical. Cheap, simple, scale to any number of nodes. Good for event logs, message queues, anything where a total order (with arbitrary tiebreaking) is sufficient.

Vector clocks when you need to detect concurrent operations - specifically, when your application needs to know about conflicts rather than silently resolving them. Worth the O(n) cost when n is small and correctness matters more than efficiency.

HLCs when you’re building a database or any system where you need both causal ordering and human-meaningful timestamps. The practical default for modern distributed systems that need to reason about time.

What I like about this whole progression is that each step is a direct response to the limitations of the last. Lamport saw that physical time was the wrong abstraction and invented logical time. Others saw that Lamport clocks couldn’t detect concurrency and added dimensions. HLC designers saw that purely logical time loses practical utility and blended physical time back in. Nobody started from scratch. Each solution preserved what worked before and solved exactly one limitation.

There’s something in there about how good ideas evolve in general. Not by throwing out the previous approach and redesigning from first principles, but by identifying the specific point where it breaks and extending it just enough. Lamport clocks are still in there, underneath vector clocks, underneath HLCs. The layers accumulate. It’s how most robust systems get built - not designed whole, but grown by people who understood exactly where the current thing fell short.