Time Is Wrong Everywhere All at Once

June 04, 2026 · 13 min read

The previous posts in this series covered how humans agree on time, how clocks count it, how physics bends it, whether you can travel through it, and why your brain gets it wrong. This post asks a more mundane but equally maddening question: how do computers agree on what time it is? The answer is that they don’t – not really – and the entire field of distributed systems is, in a sense, the study of what to do about that.

The fundamental problem

Two computers cannot agree on the time.

This sounds like an engineering problem with an engineering solution: just synchronise the clocks. And we do. NTP (Network Time Protocol) has been synchronising clocks across the internet since 1985. A well-configured NTP client can keep its clock within a few milliseconds of UTC. That’s good enough for log files, cron jobs, and displaying the time on your screen.

It’s not good enough for answering the question: “did event A happen before event B?”

Take two servers, Alice and Bob. Alice receives an order at 14:00:00.003 by her clock. Bob processes a cancellation at 14:00:00.001 by his clock. Did the cancellation arrive before the order? If Alice’s clock is 5 milliseconds ahead of Bob’s, the order actually came first – but the timestamps say otherwise. Every distributed system that uses wall-clock timestamps to determine ordering is vulnerable to this. And it’s not a theoretical concern. It’s the kind of bug that causes duplicate charges, lost messages, and inventory discrepancies that take weeks to track down.
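
To make the arithmetic concrete, here is a minimal sketch of the Alice and Bob scenario. The timestamps and the 5 ms skew come from the example above; the helper name `true_time_ms` and the choice of Bob's clock as the reference are assumptions made for illustration.

```python
# Timestamps are milliseconds after 14:00:00, as read from each server's own clock.
ALICE_SKEW_MS = 5  # Alice's clock runs 5 ms ahead of Bob's (from the example)
BOB_SKEW_MS = 0    # assumption: treat Bob's clock as the reference


def true_time_ms(local_ms, skew_ms):
    """Convert a local clock reading to reference time by removing the skew."""
    return local_ms - skew_ms


order = {"event": "order", "local_ms": 3, "skew_ms": ALICE_SKEW_MS}
cancel = {"event": "cancel", "local_ms": 1, "skew_ms": BOB_SKEW_MS}

# Sorting by the raw wall-clock timestamps puts the cancellation first.
by_timestamp = sorted([order, cancel], key=lambda e: e["local_ms"])

# Sorting by skew-corrected time puts the order first.
by_true_time = sorted(
    [order, cancel], key=lambda e: true_time_ms(e["local_ms"], e["skew_ms"])
)

print([e["event"] for e in by_timestamp])  # ['cancel', 'order']
print([e["event"] for e in by_true_time])  # ['order', 'cancel']
```

In a real system nobody knows the skew exactly, so the second sort isn't available. That is the whole problem.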

The problem is fundamental, not technical. Even if you had perfect clocks (you don’t – How Clocks Work explained why), the speed of light imposes an irreducible minimum delay on communication between machines. A signal from London to Sydney takes at least 50 milliseconds. During those 50 milliseconds, events can happen at both ends, and neither machine can know about the other’s events until the signal arrives. There is no way – not with better cables, not with faster processors, not with atomic clocks on every server – to create a globally consistent “now” across a distributed system. Relativity says the same thing about the universe. Computer science says it about networks.
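
The floor on that delay is easy to estimate. The sketch below assumes a great-circle distance of roughly 17,000 km between London and Sydney (an approximation, not a surveyed figure) and that light in fibre travels at about two-thirds of its vacuum speed. Real cables don't follow great circles, so actual latency is higher still.

```python
SPEED_OF_LIGHT_KM_PER_S = 299_792.458
LONDON_SYDNEY_KM = 17_000  # assumption: approximate great-circle distance
FIBRE_FRACTION = 2 / 3     # light in glass fibre travels at roughly 2/3 of c


def one_way_delay_ms(distance_km, speed_km_per_s):
    """Minimum one-way signal delay in milliseconds."""
    return distance_km / speed_km_per_s * 1000


vacuum_ms = one_way_delay_ms(LONDON_SYDNEY_KM, SPEED_OF_LIGHT_KM_PER_S)
fibre_ms = one_way_delay_ms(
    LONDON_SYDNEY_KM, SPEED_OF_LIGHT_KM_PER_S * FIBRE_FRACTION
)

# Roughly 57 ms in vacuum and 85 ms in fibre, before any routing or switching.
print(f"vacuum: {vacuum_ms:.1f} ms, fibre: {fibre_ms:.1f} ms")
```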

Lamport clocks: forgetting what time it is

In 1978, Leslie Lamport published “Time, Clocks, and the Ordering of Events in a Distributed System.” It remains one of the most cited papers in computer science, and its core insight is deceptively simple: you don’t need to know what time it is. You only need to know what happened before what.

A Lamport clock is not a clock in the physical sense. It’s a counter. Every process maintains its own counter. When a process does something, it increments its counter. When it sends a message, it attaches its current counter value. When it receives a message, it sets its counter to the maximum of its own counter and the received value, then increments.

That’s it. No NTP. No atomic clocks. No synchronisation at all. The counter doesn’t represent a time. It represents a position in a causal sequence.
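
Here is a minimal sketch of those rules. The class and method names are my own, and it assumes that sending a message counts as an event (so the counter is incremented before it is attached), which is a common convention rather than the only one.

```python
class LamportClock:
    """A logical clock: a counter, not a measurement of physical time."""

    def __init__(self):
        self.counter = 0

    def tick(self):
        """A local event happened: increment the counter."""
        self.counter += 1
        return self.counter

    def send(self):
        """Sending is treated as an event; the result travels with the message."""
        return self.tick()

    def receive(self, received):
        """Jump past everything the sender had seen, then count the receipt."""
        self.counter = max(self.counter, received) + 1
        return self.counter


alice, bob = LamportClock(), LamportClock()
alice.tick()            # alice: 1
stamp = alice.send()    # alice: 2, and the message carries 2
bob.tick()              # bob: 1
bob.receive(stamp)      # bob: max(1, 2) + 1 = 3
print(stamp, bob.counter)  # 2 3
```

The send (2) is numbered below the receive (3), which is all the scheme promises: a cause never gets a larger number than its effect.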

The rule is: if event A causally precedes event B (A happened before B, and B could have been influenced by A), then A’s counter value is less than B’s. Lamport called this the “happened-before” relation. It’s a partial order – not every pair of events is comparable. If Alice does something and Bob does something at the same time with no communication between them, neither “happened before” the other. They’re concurrent. And that’s fine. The system doesn’t need to order them, because they couldn’t have influenced each other.

It’s like a family tree. Your grandmother happened before you – there’s a clear causal chain. Your cousin in another country did things today that you know nothing about. Neither of you happened “before” the other. You’re concurrent. A family tree doesn’t need to put all the cousins in order. It only needs to know who descended from whom.

Lamport clocks are built around exactly this: causality, not chronology. A smaller counter tells you “A could have caused B, and B certainly didn’t cause A.” They don’t tell you which happened first on a wall clock, because that question, in a distributed system, often has no meaningful answer.

Vector clocks: who knew what when

Lamport clocks have a limitation: if A’s counter is less than B’s, you know A might have caused B, but you can’t be sure. The ordering is consistent with causality but doesn’t perfectly capture it. In 1988, Colin Fidge and Friedemann Mattern independently invented vector clocks, which fix this.

A vector clock is an array of counters, one per process. When process Alice does something, she increments her entry. When she sends a message, she attaches the entire vector. When Bob receives it, he takes the element-wise maximum of his vector and Alice’s, then increments his own entry.

The result: you can look at two vector timestamps and determine not just whether one might have caused the other, but whether they’re definitely concurrent. If every entry in A’s vector is less than or equal to the corresponding entry in B’s vector, and at least one entry is strictly less, then A happened before B. If some entries are greater and some are less, they’re concurrent – neither caused the other.
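
A sketch of both halves, the update rules and the comparison, might look like this. The names `VectorClock` and `compare` are illustrative, and the vector is stored as a dictionary keyed by process name rather than a fixed array. The comparison calls one vector "before" another only when every entry is less than or equal and the vectors are not identical.

```python
class VectorClock:
    """One counter per process; each process only increments its own entry."""

    def __init__(self, owner, processes):
        self.owner = owner
        self.counts = {p: 0 for p in processes}

    def tick(self):
        """Local event (or send): bump our own entry, return a snapshot."""
        self.counts[self.owner] += 1
        return dict(self.counts)

    def receive(self, other):
        """Element-wise maximum with the incoming vector, then bump our entry."""
        for process, n in other.items():
            self.counts[process] = max(self.counts.get(process, 0), n)
        return self.tick()


def compare(a, b):
    """Return 'before', 'after', 'equal' or 'concurrent' for two snapshots."""
    keys = set(a) | set(b)
    a_le_b = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    b_le_a = all(b.get(k, 0) <= a.get(k, 0) for k in keys)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


processes = ["alice", "bob"]
alice = VectorClock("alice", processes)
bob = VectorClock("bob", processes)

a1 = alice.tick()       # {'alice': 1, 'bob': 0}
b1 = bob.tick()         # {'alice': 0, 'bob': 1}
b2 = bob.receive(a1)    # {'alice': 1, 'bob': 2}

print(compare(a1, b1))  # concurrent
print(compare(a1, b2))  # before
```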

It’s like a group chat where everyone keeps a diary. Each diary entry notes what the writer did and the last thing they heard from everyone else. If Alice’s diary says she’s seen Bob’s message #5 and Carol’s message #3, and Bob’s diary says he’s seen Alice’s message #2 and Carol’s message #4, you can reconstruct exactly who knew what when. Two entries are concurrent if neither person had seen the other’s latest update.

Vector clocks are used in real systems. Amazon’s Dynamo (the internal key-value store, described in a 2007 paper, that inspired DynamoDB) used them to detect conflicting writes. Riak, a distributed key-value store, used them for the same purpose. They’re more expensive than Lamport clocks – the vector grows with the number of processes – but they give you something Lamport clocks can’t: a definitive answer about concurrency.

The CAP theorem and the cost of consistency

In 2000, Eric Brewer proposed (and in 2002, Seth Gilbert and Nancy Lynch proved) the CAP theorem: a distributed system can provide at most two of three guarantees:

  • Consistency: every read receives the most recent write.
  • Availability: every request receives a response.
  • Partition tolerance: the system continues to operate even if network messages between nodes are lost or delayed.

Since network partitions happen in real systems (cables get cut, switches fail, datacentres lose connectivity), you effectively have to choose between consistency and availability. You can’t have both when the network is broken.

This is a theorem about time in disguise. “Consistency” means “every node agrees on the current state.” “Current” means “right now.” But “right now” across multiple machines separated by a network is the exact problem we started with. The CAP theorem is, at its heart, a formal proof that the speed of light makes global agreement expensive.

CP systems (consistent, partition-tolerant) sacrifice availability: if the system can’t guarantee that all nodes agree, it refuses to answer rather than give a possibly-stale response. Traditional relational databases in a distributed setting often work this way. Your query might time out, but it won’t give you wrong data.

AP systems (available, partition-tolerant) sacrifice consistency: every node answers every request, even if it means some nodes are serving stale data. Eventually, when the partition heals, the nodes reconcile. This is “eventual consistency” – the system will converge to the correct state, but there’s a window where different nodes disagree. DynamoDB and Cassandra, in their default configurations, and many other NoSQL databases work this way. Your query always gets an answer, but it might not be the latest answer.

The choice between CP and AP is a choice about how to handle the impossibility of shared time. Do you pause and wait for agreement (CP), or do you keep going and sort it out later (AP)?
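
As a toy illustration of that choice (not a model of any real database), here is a single replica that behaves differently during a partition depending on its mode. The `Replica` class, its `partitioned` flag, and the `try_read` helper are all hypothetical.

```python
class Replica:
    """Toy replica holding one value; `mode` is "CP" or "AP"."""

    def __init__(self, mode, value):
        self.mode = mode
        self.value = value
        self.partitioned = False  # True when this replica cannot reach its peers

    def read(self):
        if self.partitioned and self.mode == "CP":
            # CP: refuse to answer rather than risk returning stale data.
            raise TimeoutError("partitioned: cannot confirm this is the latest write")
        # AP (or a healthy network): answer with whatever we have, stale or not.
        return self.value


def try_read(replica):
    """Return the value, or a string describing why the read was refused."""
    try:
        return replica.read()
    except TimeoutError as err:
        return f"error: {err}"


cp = Replica("CP", "v1")
ap = Replica("AP", "v1")
cp.partitioned = ap.partitioned = True  # a newer "v2" may exist elsewhere

print(try_read(cp))  # an error: pause and wait for agreement
print(try_read(ap))  # v1: keep going, possibly stale
```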

Google Spanner: buying time with atomic clocks

In 2012, Google published a paper describing Spanner, a globally distributed database that appears to violate the CAP theorem. It offers strong consistency (every read sees the most recent write) across datacentres on different continents, with high availability. How?

The trick is hardware. Google put GPS receivers and atomic clocks in every datacentre. Not NTP. Not “synchronise to a time server.” Actual atomic clocks – caesium and rubidium oscillators – sitting in the server racks, cross-checked against GPS signals. This gives each datacentre a clock that’s accurate to within about 7 milliseconds of true time, with known uncertainty bounds.

Spanner uses an API called TrueTime, which doesn’t return a single timestamp. It returns an interval: “the current time is definitely between earliest and latest.” The interval is typically a few milliseconds wide. Every transaction gets a timestamp, and the system guarantees that if transaction A’s timestamp is before transaction B’s, then A actually happened before B in real time. If the system isn’t sure about the ordering – if the intervals overlap – it waits until the uncertainty resolves. This is called “commit wait,” and it typically adds a few milliseconds to each transaction.
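
The commit-wait idea can be sketched with a simulated clock. To be clear, this is not Google's API: `SimulatedTrueTime`, `Interval`, and `commit` are hypothetical names, the uncertainty is a fixed constant rather than something measured, and "waiting" is modelled by advancing a fake clock instead of sleeping. The rule it follows is to pick the latest possible current time as the commit timestamp, then wait until even the earliest possible current time has passed it.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    earliest: float
    latest: float


class SimulatedTrueTime:
    """Hypothetical stand-in for TrueTime, driven by a manually advanced clock."""

    def __init__(self, epsilon_ms):
        self.epsilon_ms = epsilon_ms  # assumed fixed uncertainty bound
        self.real_ms = 0.0            # simulated "true" time

    def now(self):
        """The true time is guaranteed to lie somewhere inside this interval."""
        return Interval(self.real_ms - self.epsilon_ms,
                        self.real_ms + self.epsilon_ms)

    def advance(self, ms):
        self.real_ms += ms


def commit(tt, step_ms=1.0):
    """Choose a timestamp, then wait until it is definitely in the past."""
    commit_ts = tt.now().latest
    waited_ms = 0.0
    while tt.now().earliest <= commit_ts:
        tt.advance(step_ms)  # stands in for sleeping
        waited_ms += step_ms
    return commit_ts, waited_ms


tt = SimulatedTrueTime(epsilon_ms=3.5)
commit_ts, waited_ms = commit(tt)
print(commit_ts, waited_ms)  # 3.5 8.0
```

The wait comes out at roughly twice the uncertainty bound, which is why shrinking that bound with better clocks translates directly into faster commits.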

Google is buying consistency with atomic clocks and patience. The speed of light still prevents perfect synchronisation, but by bounding the uncertainty and waiting it out, Spanner creates the illusion of a single global timeline. It’s not cheap – the atomic clocks, the GPS receivers, the global network, the engineering team that maintains all of it – but it works. It’s been running Google’s advertising system (among other things) since 2012.

It’s like a courtroom. Two witnesses disagree about whether the red car or the blue car arrived first. In most distributed systems, you’d have to choose: either stop the trial until you can resolve the disagreement (CP), or let both witnesses testify and live with the inconsistency (AP). Spanner’s approach is different: give both witnesses a clock so precise that their testimony overlaps only slightly, then pause just long enough for the overlap to resolve. The trial continues. The record is consistent. It costs you a good clock and a little patience.

Conflict resolution: when time isn’t enough

Even with perfect clocks, distributed systems face a problem that time alone can’t solve: conflicting writes. Two users edit the same document at the same time. Two processes update the same database row. Two nodes accept contradicting requests during a network partition. What wins?

Last-writer-wins (LWW) is the simplest policy: whichever write has the latest timestamp wins. It’s used widely – Cassandra defaults to it. It’s simple, deterministic, and almost always wrong. If Alice saves a document at 14:00:00.003 and Bob saves a different version at 14:00:00.005, Bob’s version wins and Alice’s changes vanish. Nobody is notified. The data loss is silent. If the clocks are even slightly wrong, the “wrong” write wins. LWW trades correctness for simplicity, and in many cases the trade is terrible.
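
A minimal sketch of LWW shows how little it takes to lose data. The function name, the record layout, and the 4 ms skew on Bob's clock are assumptions for illustration; ties here go to the second argument, which is an arbitrary choice.

```python
def last_writer_wins(a, b):
    """Keep the write with the larger timestamp; the other is silently dropped."""
    return a if a["ts_ms"] > b["ts_ms"] else b


# Timestamps are milliseconds after 14:00:00 on each writer's own clock.
alice_write = {"author": "alice", "ts_ms": 3, "text": "Alice's version"}
bob_write = {"author": "bob", "ts_ms": 5, "text": "Bob's version"}

winner = last_writer_wins(alice_write, bob_write)
print(winner["text"])  # Bob's version

# Hypothetical skew: if Bob's clock runs 4 ms fast, he actually wrote first,
# yet his write still wins and the genuinely later write is lost.
BOB_SKEW_MS = 4
bob_true_ms = bob_write["ts_ms"] - BOB_SKEW_MS
print(bob_true_ms < alice_write["ts_ms"])  # True
```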

CRDTs (Conflict-Free Replicated Data Types) take a fundamentally different approach. Instead of asking “which write happened last?”, they design the data structure so that all writes can be merged without conflict. A CRDT counter, for instance, tracks each node’s increments separately and sums them on read. Two nodes can increment independently, with no communication, and when they eventually sync, the counter is correct. No timestamps needed. No conflict resolution needed. The data type’s mathematical properties guarantee convergence.
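
The counter described above is usually called a grow-only counter, or G-Counter. A minimal sketch, with illustrative class and method names, might look like this. Merging takes the per-node maximum, so it is safe to merge in any order and any number of times.

```python
class GCounter:
    """Grow-only counter CRDT: one slot per node, merged by element-wise max."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}

    def increment(self, n=1):
        """Only ever touch our own slot."""
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + n

    def value(self):
        """The counter's value is the sum of every node's slot."""
        return sum(self.counts.values())

    def merge(self, other):
        """Fold in another replica's state; repeating a merge changes nothing."""
        for node, n in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), n)


a, b = GCounter("a"), GCounter("b")
a.increment()
a.increment()
b.increment()           # the two nodes have not communicated yet

a.merge(b)
b.merge(a)
a.merge(b)              # a repeated merge is harmless
print(a.value(), b.value())  # 3 3
```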

CRDTs work for counters, sets, registers, and certain kinds of collaborative text editing. They don’t work for everything – some operations are inherently conflicting (two users setting the same field to different values), and CRDTs can only merge what the data structure’s rules allow.

Operational transformation (OT) is the older approach to the same problem, used by Google Docs and many other collaborative editors. OT transforms each operation against concurrent operations to produce a consistent result. If Alice inserts a character at position 5 and Bob deletes a character at position 3, the system transforms Alice’s insertion to account for Bob’s deletion: Alice’s insert moves to position 4. The result is the same regardless of the order the operations arrive.
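
The Alice and Bob example can be worked through in code. This sketch handles only single-character inserts and deletes, which is far less than a real OT engine needs; the function names and the sample string are made up for illustration.

```python
def transform_insert_against_delete(insert_pos, delete_pos):
    """Shift an insert left if a concurrent delete removed a character before it."""
    return insert_pos - 1 if delete_pos < insert_pos else insert_pos


def transform_delete_against_insert(delete_pos, insert_pos):
    """Shift a delete right if a concurrent insert landed at or before it."""
    return delete_pos + 1 if insert_pos <= delete_pos else delete_pos


def insert(text, pos, ch):
    return text[:pos] + ch + text[pos:]


def delete(text, pos):
    return text[:pos] + text[pos + 1:]


doc = "abcdefgh"

# Site 1 applies Alice's insert first, then Bob's delete (transformed).
site1 = delete(insert(doc, 5, "X"), transform_delete_against_insert(3, 5))

# Site 2 applies Bob's delete first, then Alice's insert (transformed to 4).
site2 = insert(delete(doc, 3), transform_insert_against_delete(5, 3), "X")

print(site1, site2)  # abceXfgh abceXfgh
```

Both sites end up with the same document even though they applied the operations in opposite orders.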

All of these techniques exist because time – even perfectly synchronised time – isn’t enough to resolve concurrent events. When two things happen at the same time, you need a policy, not a clock.

Logical time in practice

The theoretical framework of Lamport clocks and vector clocks shows up in practical systems, often under different names:

Version vectors in distributed databases (Riak, Dynamo) are vector clocks by another name. Each node maintains a counter, and the vectors are compared to detect conflicts. When a conflict is detected, the system either merges automatically (if it can) or presents both versions to the application for resolution.

Sequence numbers in consensus protocols like Raft and Paxos are, at their core, Lamport clocks. Each proposal gets a monotonically increasing number. The ordering of proposals is determined by these numbers, not by wall-clock time. This is why consensus protocols stay correct even when clocks disagree: they never consult a wall clock to decide ordering, and use timers only to notice that a node has gone quiet.

Log-structured systems – Kafka, event sourcing architectures, blockchain – use an append-only log as their source of truth. The position in the log is the logical time. Event #4,721 happened before event #4,722 because 4,721 < 4,722. No timestamps needed. Each log imposes a total order on its own entries (in Kafka, that means within a partition). This is Lamport’s insight, made concrete.
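
A toy version of the idea fits in a few lines. This is an in-memory sketch with a hypothetical `AppendOnlyLog` class, not a model of Kafka or any other real system: the offset returned by `append` is the only "time" an event has.

```python
class AppendOnlyLog:
    """In-memory append-only log; an event's offset is its logical time."""

    def __init__(self):
        self._entries = []

    def append(self, event):
        """Append an event and return its offset."""
        self._entries.append(event)
        return len(self._entries) - 1

    def read_from(self, offset=0):
        """Return (offset, event) pairs in log order, starting at `offset`."""
        return list(enumerate(self._entries))[offset:]


log = AppendOnlyLog()
first = log.append("order placed")
second = log.append("order cancelled")

# The order is settled by position alone; no clock was consulted.
print(first < second)     # True
print(log.read_from(1))   # [(1, 'order cancelled')]
```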

Even Git uses a form of logical time. A commit’s position in the DAG (directed acyclic graph) determines its causal relationship to other commits. If commit A is an ancestor of commit B, then A happened before B. Two commits where neither is an ancestor of the other, such as the tips of two diverged branches, are concurrent. Git doesn’t care when they were created (the author date is just metadata). It cares about the graph structure. Causality, not chronology.
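
The same happened-before test can be run on any commit graph. The sketch below uses a hand-written dictionary of parent links with made-up commit names; it does not call Git, and `is_ancestor` and `relation` are illustrative helpers. For a real repository, `git merge-base --is-ancestor` answers the ancestry question.

```python
def is_ancestor(parents, a, b):
    """True if commit `a` is reachable from commit `b` by following parent links."""
    stack, seen = [b], set()
    while stack:
        commit = stack.pop()
        if commit == a:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return False


def relation(parents, a, b):
    """Classify two commits as 'same', 'before', 'after' or 'concurrent'."""
    if a == b:
        return "same"
    if is_ancestor(parents, a, b):
        return "before"
    if is_ancestor(parents, b, a):
        return "after"
    return "concurrent"


# Hypothetical history: two branches diverge from `root`, then merge.
parents = {
    "root": [],
    "main1": ["root"],
    "feat1": ["root"],
    "merge": ["main1", "feat1"],
}

print(relation(parents, "root", "feat1"))   # before
print(relation(parents, "main1", "feat1"))  # concurrent
print(relation(parents, "merge", "main1"))  # after
```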

The speed of light is a systems problem

Every problem in this post traces back to the same root cause: information takes time to travel. Light from London to Sydney: at least 50 milliseconds. A packet across a datacentre: maybe 0.5 milliseconds. A signal between two chips on the same board: nanoseconds. The delays are different, but they’re never zero, and as long as they’re not zero, two observers can’t agree on “now.”

The relativity posts made this point about the universe. The speed of light means there’s no universal “now.” Simultaneity is relative. The block universe might be the correct picture: everything already exists, and our experience of “the present” is local and subjective.

Distributed systems live in the same reality, just at a smaller scale. The speed of light in a fibre optic cable (about two-thirds the speed of light in vacuum) means that two servers in different datacentres can never share a “now.” They can get close – Google’s TrueTime gets within milliseconds – but “close” and “exact” are different things, and the gap between them is where bugs live.

Leslie Lamport’s great insight was that you don’t have to solve this problem. You can sidestep it. Stop asking “what time is it?” and start asking “what happened before what?” Stop synchronising clocks and start tracking causality. The universe can’t agree on “now” either. It gets along fine by tracking the causal structure of events – the light cones that determine what can influence what.

Distributed computing reinvented the same solution, decades later, for the same reason. It turns out that the question we started this series with – “what time is it?” – is just as hard for computers as it is for physicists. And the answer, in both domains, is the same: it depends on who’s asking, and what they need to know.

These posts are LLM-aided. Backbone, original writing, and structure by Craig. Research and editing by Craig + LLM. Proof-reading by Craig.