Distributed Algorithms

Problem framing

Distributed databases need routing stability, fast writes, bounded memory, and safe coordination under failure. No single algorithm solves all of this. Production systems combine probabilistic data structures, storage layouts, and consensus primitives to balance correctness, latency, and cost.

This page explains each algorithm listed in ../ddia-learning-app/ddia_app/algorithms and shows how they compose into end-to-end data paths.

Core idea / pattern

| Algorithm | Primary role | Typical placement | Cross-link |
|---|---|---|---|
| Consistent Hashing | Stable key-to-node routing with minimal remapping | Shard routers, partition maps | Storage architecture |
| Vector Clocks | Causality tracking for concurrent updates | Multi-writer replicas and conflict resolution | Consistency models |
| Quorum Replication | Read/write consistency tuning via N, R, W | Replicated key-value and document stores | Replication patterns |
| LSM Tree | Write-optimized storage with compaction | High-ingest NoSQL engines | Database internals |
| Raft Election | Leader election by majority vote | Metadata services and control planes | Coordination and commit |
| Bloom Filter | Probabilistic membership test to avoid wasted reads | LSM read path and cache admission | Storage engine internals |
| HyperLogLog | Approximate distinct counting with fixed memory | Cardinality estimation and analytics | Stream and batch analytics |

Architecture diagram

```mermaid
flowchart LR
  Client[Client] --> Router[Consistent Hash Router]
  Router --> ReplicaA[Replica Set A]
  Router --> ReplicaB[Replica Set B]
  ReplicaA --> Quorum[Quorum Read and Write]
  ReplicaB --> Quorum
  Quorum --> LSM[LSM Storage Engine]
  LSM --> Bloom[Bloom Filter]
  LSM --> HLL[HyperLogLog Counters]
  Quorum --> VC[Vector Clock Metadata]
  Control[Control Plane] --> Raft[Raft Leader Election]
  Raft --> Router
```

Step-by-step flow

  1. Route the key with consistent hashing so scaling events remap only a small key subset.
  2. Write to a replica subset and require quorum acknowledgments for target consistency.
  3. Persist writes into an LSM memtable, then flush immutable SSTables and compact in background.
  4. Use Bloom filters during reads to skip SSTables that definitely do not contain the key.
  5. Attach vector clock metadata when concurrent writes are possible across regions.
  6. Use Raft election for metadata leadership so membership and configuration updates stay coordinated.
  7. Track large-scale distinct metrics with HyperLogLog instead of exact sets.
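Step 1 can be sketched as a hash ring with virtual nodes. This is a minimal illustration, not a production router; the `HashRing` class and its parameters are hypothetical names chosen for this sketch:

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # A stable (non-process-seeded) hash so ring positions survive restarts
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes, vnodes=100):
        # Each physical node owns many virtual positions to smooth the balance
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    def route(self, key: str) -> str:
        # Clockwise ownership: the first vnode at or after the key's hash
        idx = bisect.bisect(self._keys, _hash(key)) % len(self._ring)
        return self._ring[idx][1]
```

Adding a node to a ring like this remaps only roughly 1/N of the keys, which is the "small key subset" property step 1 relies on.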

Visual reference: key routing and quorum

```mermaid
sequenceDiagram
  participant C as Client
  participant R as Hash Router
  participant P1 as Replica 1
  participant P2 as Replica 2
  participant P3 as Replica 3

  C->>R: write(key, value)
  R->>P1: route by consistent hash
  R->>P2: replicate
  R->>P3: replicate
  P1-->>R: ack
  P2-->>R: ack
  R-->>C: write success (W quorum reached)
```
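The quorum math in the diagram can be sketched with an in-memory replica model. `Replica`, `is_strong`, and `quorum_write` are illustrative names; a real store would add timeouts, hinted handoff, and read repair:

```python
class Replica:
    """Minimal in-memory replica; `up` simulates node failure."""

    def __init__(self):
        self.store = {}
        self.up = True

    def put(self, key, value):
        if not self.up:
            return False
        self.store[key] = value
        return True


def is_strong(n: int, r: int, w: int) -> bool:
    # R + W > N guarantees every read quorum overlaps every write quorum
    return r + w > n


def quorum_write(replicas, key, value, w):
    acks = sum(1 for rep in replicas if rep.put(key, value))
    if acks < w:
        raise ValueError(f"quorum failed: {acks} acks, needed {w}")
    return acks
```

With N=3, W=2 the write above succeeds even if one replica is down, which is exactly the trade the sequence diagram shows: P3 never acks, yet the client sees success.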

Visual reference: LSM read with Bloom filter

```mermaid
flowchart LR
  Read[Read key] --> Mem[Check memtable]
  Mem -->|Miss| Bloom1[Bloom filter SSTable L0]
  Bloom1 -->|No| Skip1[Skip table]
  Bloom1 -->|Maybe| Probe1[Probe SSTable]
  Probe1 -->|Miss| Bloom2[Bloom filter SSTable L1]
  Bloom2 -->|No| Skip2[Skip table]
  Bloom2 -->|Maybe| Probe2[Probe SSTable]
  Probe2 --> Result[Return newest value]
```
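The per-SSTable filters in the read path above can be sketched as a bitset probed at several hash positions. Sizes here are illustrative defaults, and this basic form supports no deletion:

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits=1024, hashes=3):
        self.size = size_bits
        self.hashes = hashes
        self.bits = 0  # a Python int used as an arbitrary-width bitset

    def _positions(self, key: str):
        # Derive k independent positions by salting the hash input
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "maybe present"
        return all(self.bits >> pos & 1 for pos in self._positions(key))
```

A "No" answer lets the read path skip the SSTable entirely; a "Maybe" costs one probe, and mis-sizing the filter only raises the false-positive rate, never causes a missed key.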

Visual reference: Raft election

```mermaid
sequenceDiagram
  participant N1 as Node 1
  participant N2 as Node 2
  participant N3 as Node 3
  participant N4 as Node 4
  participant N5 as Node 5

  N1->>N2: RequestVote(term=9)
  N1->>N3: RequestVote(term=9)
  N1->>N4: RequestVote(term=9)
  N1->>N5: RequestVote(term=9)
  N2-->>N1: vote granted
  N3-->>N1: vote granted
  N4-->>N1: vote granted
  N1-->>N1: majority reached, leader elected
```
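The vote-granting rule behind this election can be sketched as below. This is a deliberately reduced model: it omits Raft's log up-to-date check, RPC transport, and timers, and the class and function names are hypothetical:

```python
class RaftNode:
    def __init__(self, node_id: int):
        self.id = node_id
        self.current_term = 0
        self.voted_for = None

    def handle_request_vote(self, term: int, candidate_id: int) -> bool:
        # Reject requests from stale terms outright
        if term < self.current_term:
            return False
        # A higher term resets any vote cast in an older term
        if term > self.current_term:
            self.current_term = term
            self.voted_for = None
        # Grant at most one vote per term
        if self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id
            return True
        return False


def run_election(candidate: RaftNode, peers, term: int) -> bool:
    # Candidate moves to the new term and votes for itself
    candidate.current_term = term
    candidate.voted_for = candidate.id
    votes = 1 + sum(p.handle_request_vote(term, candidate.id) for p in peers)
    majority = (len(peers) + 1) // 2 + 1  # majority of the full cluster
    return votes >= majority
```

The one-vote-per-term rule is what makes a split election safe: two candidates in the same term cannot both reach a majority.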

Algorithm deep dive

| Algorithm | Problem | Pattern | Trade-offs | Failure modes |
|---|---|---|---|---|
| Consistent Hashing | Node adds/removals causing full key reshuffles | Hash ring with virtual nodes and clockwise ownership | Low remap cost vs imperfect balance without enough virtual nodes | Hot partitions from skewed keys or poor vnode configuration |
| Vector Clocks | Concurrent multi-writer updates with unclear causal order | Per-writer counters merged by max and compared by dominance | Causality clarity vs metadata growth with writer count | Unbounded clock size and frequent conflict branches |
| Quorum Replication | Balancing availability and consistency across replicas | Tunable N/R/W with read-repair and anti-entropy | Flexible guarantees vs quorum latency and partial failures | Stale reads when quorum math is weak or repairs lag |
| LSM Tree | High write rates overwhelm random-write storage | Memtable + immutable SSTables + background compaction | Excellent ingest vs read/write amplification from compaction | Compaction backlog, tombstone buildup, and tail-latency spikes |
| Raft Election | No safe coordinator for metadata and critical decisions | Term-based majority voting to elect one leader | Strong leadership safety vs election pauses during instability | Leader flapping and progress stalls under repeated timeouts |
| Bloom Filter | Too many negative disk lookups on read paths | Bitset with multiple hashes for probabilistic membership tests | Tiny memory footprint vs false positives | Bad sizing causes high false-positive rates and wasted I/O |
| HyperLogLog | Exact distinct counts are too expensive at scale | Register-based cardinality estimation from hash leading zeros | Fixed memory vs approximate results | Using low precision where exact billing-grade counts are required |
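The "merged by max and compared by dominance" pattern for vector clocks fits in a few lines. Clocks are modeled here as plain dicts mapping writer id to counter; the function names are illustrative:

```python
def vc_merge(a: dict, b: dict) -> dict:
    # Element-wise max: the merged clock dominates both inputs
    return {w: max(a.get(w, 0), b.get(w, 0)) for w in a.keys() | b.keys()}


def vc_compare(a: dict, b: dict) -> str:
    # Returns "before", "after", "equal", or "concurrent"
    writers = a.keys() | b.keys()
    a_le_b = all(a.get(w, 0) <= b.get(w, 0) for w in writers)
    b_le_a = all(b.get(w, 0) <= a.get(w, 0) for w in writers)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"
```

A "concurrent" result is the signal that two replicas updated independently and the application (or a merge policy) must resolve the conflict; note the metadata cost, one counter per writer, is exactly the growth risk listed above.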

Failure modes

Per-algorithm failure modes are listed in the deep-dive table above. The most operationally significant are hot partitions from skewed keys or poor vnode configuration, compaction backlog and tombstone buildup in LSM engines, stale reads when quorum math is weak, and leader flapping under repeated Raft election timeouts.

Trade-offs

| Algorithm | What it optimizes | What it gives up |
|---|---|---|
| Consistent Hashing | Minimal remapping during rebalancing | Perfectly even distribution without tuning |
| Vector Clocks | Causality visibility | Metadata size and merge complexity |
| Quorum Replication | Tunable consistency/availability balance | Higher latency at stronger quorum levels |
| LSM Tree | High write throughput | Compaction cost and read amplification |
| Raft Election | Safe single-leader coordination | Majority dependency and election pauses |
| Bloom Filter | Small-memory negative lookups | False positives and no deletion in basic form |
| HyperLogLog | Low-memory cardinality estimates | Approximate answers instead of exact counts |
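The fixed-memory-for-approximation trade that HyperLogLog makes is visible in a compact sketch: 2^p registers, each holding only the maximum leading-zero rank seen. The precision `p=10` and the small-range correction follow the standard construction, but treat this as an illustration rather than a production estimator:

```python
import hashlib
import math


class HyperLogLog:
    def __init__(self, p: int = 10):
        self.p = p
        self.m = 1 << p                 # number of registers (fixed memory)
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)  # bias correction, m >= 128

    def add(self, item: str) -> None:
        # 64-bit hash: top p bits pick a register, the rest supply the rank
        x = int.from_bytes(hashlib.sha256(item.encode()).digest()[:8], "big")
        idx = x >> (64 - self.p)
        rest = x & ((1 << (64 - self.p)) - 1)
        rank = (64 - self.p) - rest.bit_length() + 1  # leading zeros + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self) -> float:
        # Normalized harmonic mean of 2^register values
        raw = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Linear counting correction for small cardinalities
            return self.m * math.log(self.m / zeros)
        return raw
```

With p=10 the structure uses 1024 registers regardless of how many distinct items it sees, and the typical relative error is about 1.04/sqrt(1024), roughly 3%.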

Real-world usage

| Production scenario | Algorithm combination | Reason |
|---|---|---|
| Distributed key-value store | Consistent Hashing + Quorum + Vector Clocks | Stable routing with tunable consistency and conflict tracking |
| Write-heavy document store | LSM Tree + Bloom Filter + Compaction | Sustains ingest while containing read cost |
| Cluster metadata service | Raft Election + majority quorum | Keeps leadership and config changes coherent |
| Large-scale analytics | HyperLogLog + exact drill-down on demand | Low-cost approximate cardinality at high scale |