Build Databases from Scratch

Problem framing

A database is not one feature. It is a pipeline that must deliver correctness, durability, query speed, and recovery under failure. To build one from scratch, define the write contract first, then layer on indexing, replication, background maintenance, and observability. Scale mechanics are covered in storage and partitioning patterns.

Core idea / pattern

Build in layers

| Layer | Problem | Pattern | Trade-offs | Failure modes |
|---|---|---|---|---|
| API and parser | Unclear query contract | Define strict request grammar and validation | Flexibility vs deterministic execution | Ambiguous semantics and hard-to-debug behavior |
| Durability | Data loss on crash | Write-ahead log before applying state | Latency overhead for fsync | Torn writes, replay gaps, corrupt logs |
| Storage engine | Slow reads or writes at scale | B+ tree for read locality or LSM for write throughput | Read latency vs write amplification | Compaction storms, fragmentation, cache churn |
| Coordination | Replica divergence | Quorum and leader election | Stronger consistency vs lower availability | Split brain and stale reads |
| Background maintenance | Performance decay over time | Compaction, vacuum, reindexing, repair | Steady overhead vs predictable latency | Write stalls and long-tail spikes |
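
The durability row is the core of the write contract: no state change is applied until its log record is durable. A minimal sketch of the write-ahead pattern, assuming a single JSON-lines log file (the class and file names are illustrative, not any real engine's API):

```python
import json
import os

class WriteAheadLog:
    """Append a record durably before mutating in-memory state."""

    def __init__(self, path):
        self.path = path
        self.f = open(path, "a", encoding="utf-8")

    def append(self, record: dict) -> None:
        self.f.write(json.dumps(record) + "\n")
        self.f.flush()
        os.fsync(self.f.fileno())  # durability point: the fsync latency cost from the table

    def replay(self):
        """On restart, re-apply every complete record; stop at a torn final line."""
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                try:
                    yield json.loads(line)
                except json.JSONDecodeError:
                    break  # torn write at the tail: everything before it is safe

store = {}
wal = WriteAheadLog("wal.log")
wal.append({"op": "set", "key": "a", "value": 1})
store["a"] = 1  # apply only after the log record is durable
```

A production WAL adds per-record checksums and sequence numbers so replay detects torn or corrupt records explicitly rather than relying on parse failures.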

Choose model by workload

| Database type | Primary data model | Storage/index defaults | Best-fit workloads | Main risk |
|---|---|---|---|---|
| Relational | Tables, keys, joins, constraints | WAL + B+ tree + MVCC | OLTP, billing, inventory, ledgers | Contention under heavy multi-row writes |
| Non-relational | Key-value, document, wide-column | Partitioning + replication + LSM | High-ingest events, profiles, session state | Cross-entity consistency complexity |
| Vector | Embedding vectors + metadata filters | ANN index (HNSW or IVF) + metadata store | Semantic search, RAG, recommendation | Recall-latency tuning and stale embeddings |

For algorithm mechanics used by these systems, see distributed algorithms.

Architecture diagram

```mermaid
flowchart LR
  Client[Client] --> API[Query API]
  API --> Planner[Parser and Planner]
  Planner --> Txn[Transaction and Concurrency]
  Txn --> WAL[Write-Ahead Log]
  Txn --> Engine[Storage Engine]
  Engine --> Index[Index Structures]
  Engine --> Data[Data Files]
  Engine --> Bg[Compaction or Vacuum]
  Engine --> Repl[Replication Stream]
  Repl --> Replicas[(Replicas)]
  Index --> ANN[Vector ANN Index]
  Index --> BTree[B+ Tree]
  Index --> LSM[LSM Levels]
```

Step-by-step flow

Relational database build path

  1. Define schema, primary keys, foreign keys, and invariant checks.
  2. Implement WAL append and crash recovery replay.
  3. Build B+ tree indexes for primary and selective secondary access paths.
  4. Add MVCC snapshots and lock manager for concurrent transactions.
  5. Implement optimizer rules for join order, index scans, and predicate pushdown.
  6. Add replication, backup, and point-in-time restore.
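
Step 4's snapshot reads can be sketched with a minimal version chain (hypothetical names; a real engine also needs a lock manager, rollback, and garbage collection of old versions):

```python
import itertools

class MVCCStore:
    """Each key holds a version chain; a reader sees the newest version
    committed at or before its snapshot, never later writes."""

    def __init__(self):
        self.clock = itertools.count(1)  # monotonically increasing commit ids
        self.versions = {}               # key -> list of (commit_id, value)

    def snapshot(self) -> int:
        return next(self.clock)

    def commit_write(self, key, value) -> int:
        cid = next(self.clock)
        self.versions.setdefault(key, []).append((cid, value))
        return cid

    def read(self, key, snap: int):
        visible = [v for cid, v in self.versions.get(key, []) if cid <= snap]
        return visible[-1] if visible else None

db = MVCCStore()
db.commit_write("balance", 100)  # committed before the snapshot
snap = db.snapshot()
db.commit_write("balance", 250)  # committed after: invisible to snap
```

Here `db.read("balance", snap)` returns 100 even though a newer version exists, which is exactly the isolation property the snapshot provides.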
```mermaid
sequenceDiagram
  participant App
  participant SQL as SQL Layer
  participant Txn
  participant WAL
  participant BTree
  participant Disk

  App->>SQL: INSERT/UPDATE
  SQL->>Txn: Validate constraints
  Txn->>WAL: Append log record
  WAL-->>Txn: Durable ack
  Txn->>BTree: Update index pages
  BTree->>Disk: Flush dirty pages
  Txn-->>App: Commit success
```

Non-relational database build path

  1. Pick partition key and replica strategy per entity.
  2. Route writes by partition and append to commit log.
  3. Buffer in memtable and flush immutable SSTables.
  4. Run compaction to merge segments and remove tombstones.
  5. Expose quorum-tunable reads and writes for consistency control.
  6. Run anti-entropy repair between replicas.
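
Steps 3 and 4, memtable flush and compaction, can be sketched in a few lines, using in-memory lists as stand-ins for SSTable files and a toy flush threshold:

```python
MEMTABLE_LIMIT = 2    # tiny threshold for illustration only
TOMBSTONE = object()  # delete marker carried until compaction

memtable = {}
sstables = []         # list of immutable sorted runs, newest last

def put(key, value):
    memtable[key] = value
    if len(memtable) >= MEMTABLE_LIMIT:
        sstables.append(sorted(memtable.items()))  # flush an immutable sorted run
        memtable.clear()

def delete(key):
    put(key, TOMBSTONE)  # deletes are just writes of a marker

def get(key):
    if key in memtable:
        v = memtable[key]
        return None if v is TOMBSTONE else v
    for run in reversed(sstables):  # newest segment wins
        for k, v in run:
            if k == key:
                return None if v is TOMBSTONE else v
    return None

def compact():
    """Merge all segments, keep the newest value per key, drop tombstones."""
    merged = {}
    for run in sstables:  # oldest to newest, so later runs overwrite
        merged.update(run)
    sstables[:] = [sorted((k, v) for k, v in merged.items() if v is not TOMBSTONE)]
```

A real engine flushes runs to disk, keeps a sparse index per SSTable instead of scanning, and runs compaction by level or size tier, but the merge-and-drop-tombstones logic is the same.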
```mermaid
flowchart LR
  Client --> Router[Partition Router]
  Router --> Log[Commit Log]
  Log --> Mem[Memtable]
  Mem --> SST[SSTables]
  SST --> Compact[Compaction]
  Router --> Repl[Replica Set]
  Repl --> Read[Quorum Read]
  Repl --> Write[Quorum Write]
```

Vector database build path

  1. Generate embeddings and persist source metadata.
  2. Normalize vectors and choose ANN index strategy.
  3. Build index partitions and shard by tenant or semantic domain.
  4. At query time, encode text into vector and search top-k neighbors.
  5. Apply metadata filters and rerank candidates for quality.
  6. Rebuild or incrementally update indexes as embeddings drift.
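
Steps 4 and 5 can be illustrated with exact cosine search standing in for the ANN index (an HNSW or IVF index returns approximate neighbors much faster; the corpus and metadata fields here are invented for the example):

```python
import math

def normalize(vec):
    """Scale to unit length so dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

# toy corpus: id -> (normalized embedding, metadata payload)
corpus = {
    "doc1": (normalize([1.0, 0.0]), {"lang": "en"}),
    "doc2": (normalize([0.9, 0.1]), {"lang": "en"}),
    "doc3": (normalize([0.0, 1.0]), {"lang": "de"}),
}

def search(query_vec, k=2, metadata_filter=None):
    """Top-k neighbors by cosine similarity, with optional metadata filter."""
    q = normalize(query_vec)
    hits = []
    for doc_id, (vec, meta) in corpus.items():
        if metadata_filter and any(meta.get(f) != v for f, v in metadata_filter.items()):
            continue  # step 5: filter before ranking
        score = sum(a * b for a, b in zip(q, vec))
        hits.append((score, doc_id))
    return [doc_id for _, doc_id in sorted(hits, reverse=True)[:k]]
```

Filtering before ranking, as here, keeps results correct but can shrink the candidate pool; production systems tune whether filters run pre- or post-ANN search for exactly the recall-latency trade-off named in the model table.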
```mermaid
sequenceDiagram
  participant User
  participant API
  participant Embed as Embedding Model
  participant Index as ANN Index
  participant Meta as Metadata Store

  User->>API: Semantic query
  API->>Embed: Encode text to vector
  Embed-->>API: Query vector
  API->>Index: kNN search
  Index-->>API: Candidate IDs
  API->>Meta: Filter and fetch payload
  Meta-->>API: Matched documents
  API-->>User: Ranked results
```

Failure modes

Each layer carries its own characteristic failures, summarized in the layer table above: torn writes, replay gaps, and corrupt logs at the WAL; compaction storms, fragmentation, and cache churn in the storage engine; split brain and stale reads during coordination failures; and write stalls and long-tail latency spikes from background maintenance.

Trade-offs

| Choice | Benefit | Cost |
|---|---|---|
| Relational constraints everywhere | Strong correctness and clear invariants | Higher write contention and schema migration effort |
| Wide denormalization in NoSQL | Fast reads at high scale | Duplicate data and reconciliation logic |
| Aggressive vector index compression | Lower memory footprint | Potential recall drop |
| Strong quorum settings | Better read-after-write guarantees | Higher latency and lower partition tolerance |
| Frequent background maintenance | Stable read performance | Predictable but persistent resource overhead |
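
The quorum row can be made concrete with the standard overlap rule: with N replicas, read quorum R, and write quorum W, a read is guaranteed to intersect the latest acknowledged write when R + W > N:

```python
def quorum_overlap(n: int, r: int, w: int) -> bool:
    """True when every read quorum intersects every write quorum,
    guaranteeing a read sees the latest acknowledged write."""
    return r + w > n

assert quorum_overlap(3, 2, 2)      # common default: read-after-write, tolerates one slow replica
assert not quorum_overlap(3, 1, 1)  # fastest setting, but reads may be stale
```

Raising R or W buys the stronger guarantee at the latency and availability cost listed in the table, since more replicas must respond before acknowledging.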

Real-world usage

| Product need | Recommended primary store | Why |
|---|---|---|
| Orders, payments, and account balance | Relational database | Strict constraints and transactional integrity |
| User activity feed and high-volume events | Non-relational LSM-based store | High write throughput and horizontal partitioning |
| Semantic document search | Vector database plus metadata store | Fast nearest-neighbor retrieval with filtering |
| Hybrid retrieval applications | Relational metadata + vector index + cache | Combines correctness with semantic relevance |