CRDT Basics for Collaboration

TL;DR / CRDTs are data structures that allow multiple replicas to be updated independently and concurrently, then merged automatically into a consistent state without coordination or conflict resolution logic.

How It Works

 ┌───────────────┐
 │   Replica A   │
 │  insert('x')  │─────┐
 │               │     │
 └───────────────┘     │
                       │
                       │
 ┌───────────────┐     │     ┌────────────┐          ┌────────────┐
 │   Replica B   │     │     │ Automatic  │          │ Converged  │
 │ delete(pos=1) │─────┌────→│   Merge    │─────────→│   State    │
 │               │     │     │            │          │            │
 └───────────────┘     │     └────────────┘          └────────────┘
                       │
                       │
 ┌───────────────┐     │
 │   Replica C   │     │
 │  insert('y')  │─────┘
 │               │
 └───────────────┘

Edit diagram

Conflict-free Replicated Data Types come in two flavors: state-based (CvRDTs) and operation-based (CmRDTs). State-based CRDTs synchronize by shipping their entire state and merging via a function that must be commutative, associative, and idempotent -- forming a join-semilattice. Operation-based CRDTs ship individual operations that must be commutative (and the delivery layer must guarantee exactly-once, causal delivery). Both variants guarantee strong eventual consistency: any two replicas that have received the same set of updates will be in identical states, with zero coordination required.

The simplest CRDT is the G-Counter (grow-only counter). Each replica maintains its own count. To increment, a replica bumps its local value. To merge, take the element-wise maximum across all replica entries. The global count is the sum. A PN-Counter extends this with a second G-Counter for decrements. These are the building blocks -- Riak, Redis CRDT, and Akka all implement counter CRDTs.

G-Sets (grow-only sets) merge via union -- trivially conflict-free since elements can only be added. The challenge arises with removal. A 2P-Set (two-phase set) maintains an add-set and a remove-set (tombstone set), but once removed an element can never be re-added. The OR-Set (observed-remove set) solves this by tagging each addition with a unique identifier. Removal only removes observed tags, so a concurrent add with a different tag survives the remove. This is the set variant most collaborative applications actually use.

For collaborative text editing, sequence CRDTs are the critical data structure. Yjs uses a variant called YATA, while Automerge uses an RGA-like structure. The key insight is assigning each character a globally unique, immutable position identifier that establishes a total order. When two users insert at the same position concurrently, the identifier comparison (typically incorporating replica ID as a tiebreaker) deterministically resolves the ordering without any coordination.

LWW-Register (last-writer-wins register) is the simplest approach for single values -- attach a timestamp (logical or physical) and take the value with the highest timestamp on merge. LWW-Map extends this per-field. While technically a CRDT, LWW semantics can lose concurrent updates silently, making it inappropriate when both concurrent values carry user intent.

Performance considerations are real. Sequence CRDTs for text editing carry metadata proportional to the total number of operations ever performed, not the current document size. A document that has seen 100,000 insertions and 99,000 deletions still carries metadata for all 100,000 positions (tombstones). Yjs mitigates this through run-length encoding of consecutive operations and periodic garbage collection when all replicas are known to have converged past a certain point.

The network layer matters enormously. Operation-based CRDTs require causal delivery ordering -- operation B that depends on operation A must not arrive before A. This is typically achieved through vector clocks or Lamport timestamps attached to each operation. State-based CRDTs are more tolerant of network conditions since the merge function is idempotent -- receiving the same state twice is harmless.

Libraries like Yjs, Automerge, and Diamond Types bring CRDTs to JavaScript. Yjs is particularly popular for real-time collaboration, powering editors in production at scale. It provides shared types (Y.Array, Y.Map, Y.Text) that can be bound to UI frameworks and synced over WebSocket, WebRTC, or any transport.

Gotchas

Tombstones cause unbounded memory growth -- deleted elements leave metadata behind; without compaction strategies, long-lived documents accumulate significant overhead
CRDTs guarantee convergence, not user intent -- two users concurrently setting a field to different values will converge, but one value is silently lost; the data structure cannot know which value the users actually wanted
Causal delivery is not optional for op-based CRDTs -- delivering operations out of causal order produces incorrect state; your transport layer must enforce this (vector clocks, not just sequence numbers)
Not everything is a good fit for CRDTs -- operations with global invariants (like unique constraints or balance checks) fundamentally require coordination and cannot be made conflict-free
Merging large state-based CRDTs is expensive -- shipping and merging full state on every sync is bandwidth-heavy; delta-state CRDTs (a hybrid approach) ship only the changes since the last known sync point

Unpacked/Real-time & Streams/CRDT Basics for Collaboration

How It Works

Gotchas