Build Databases from Scratch

Problem framing

A database is not one feature. It is a pipeline for correctness, durability, query speed, and recovery under failure. To build one from scratch, define the write contract first, then layer indexing, replication, background maintenance, and observability. Scale mechanics are covered in storage and partitioning patterns.

Core idea / pattern

Build in layers

Layer	Problem	Pattern	Trade-offs	Failure modes
API and parser	Unclear query contract	Define strict request grammar and validation	Flexibility vs deterministic execution	Ambiguous semantics and hard-to-debug behavior
Durability	Data loss on crash	Write-ahead log before applying state	Latency overhead for fsync	Torn writes, replay gaps, corrupt logs
Storage engine	Slow reads or writes at scale	B+ tree for read locality or LSM for write throughput	Read latency vs write amplification	Compaction storms, fragmentation, cache churn
Coordination	Replica divergence	Quorum and leader election	Stronger consistency vs lower availability	Split brain and stale reads
Background maintenance	Performance decay over time	Compaction, vacuum, reindexing, repair	Steady overhead vs predictable latency	Write stalls and long tail spikes

Choose model by workload

Database type	Primary data model	Storage/index defaults	Best-fit workloads	Main risk
Relational	Tables, keys, joins, constraints	WAL + B+ tree + MVCC	OLTP, billing, inventory, ledgers	Contention under heavy multi-row writes
Non-relational	Key-value, document, wide-column	Partitioning + replication + LSM	High-ingest events, profiles, session state	Cross-entity consistency complexity
Vector	Embedding vectors + metadata filters	ANN index (HNSW or IVF) + metadata store	Semantic search, RAG, recommendation	Recall-latency tuning and stale embeddings

For algorithm mechanics used by these systems, see distributed algorithms.

Architecture diagram

flowchart LR
  Client[Client] --> API[Query API]
  API --> Planner[Parser and Planner]
  Planner --> Txn[Transaction and Concurrency]
  Txn --> WAL[Write-Ahead Log]
  Txn --> Engine[Storage Engine]
  Engine --> Index[Index Structures]
  Engine --> Data[Data Files]
  Engine --> Bg[Compaction or Vacuum]
  Engine --> Repl[Replication Stream]
  Repl --> Replicas[(Replicas)]
  Index --> ANN[Vector ANN Index]
  Index --> BTree[B+ Tree]
  Index --> LSM[LSM Levels]

Step-by-step flow

Relational database build path

Define schema, primary keys, foreign keys, and invariant checks.
Implement WAL append and crash recovery replay.
Build B+ tree indexes for primary and selective secondary access paths.
Add MVCC snapshots and lock manager for concurrent transactions.
Implement optimizer rules for join order, index scans, and predicate pushdown.
Add replication, backup, and point-in-time restore.

sequenceDiagram
  participant App
  participant SQL as SQL Layer
  participant Txn
  participant WAL
  participant BTree
  participant Disk

  App->>SQL: INSERT/UPDATE
  SQL->>Txn: Validate constraints
  Txn->>WAL: Append log record
  WAL-->>Txn: Durable ack
  Txn->>BTree: Update index pages
  BTree->>Disk: Flush dirty pages
  Txn-->>App: Commit success

Non-relational database build path

Pick partition key and replica strategy per entity.
Route writes by partition and append to commit log.
Buffer in memtable and flush immutable SSTables.
Run compaction to merge segments and remove tombstones.
Expose quorum-tunable reads and writes for consistency control.
Run anti-entropy repair between replicas.

flowchart LR
  Client --> Router[Partition Router]
  Router --> Log[Commit Log]
  Log --> Mem[Memtable]
  Mem --> SST[SSTables]
  SST --> Compact[Compaction]
  Router --> Repl[Replica Set]
  Repl --> Read[Quorum Read]
  Repl --> Write[Quorum Write]

Vector database build path

Generate embeddings and persist source metadata.
Normalize vectors and choose ANN index strategy.
Build index partitions and shard by tenant or semantic domain.
At query time, encode text into vector and search top-k neighbors.
Apply metadata filters and rerank candidates for quality.
Rebuild or incrementally update indexes as embeddings drift.

sequenceDiagram
  participant User
  participant API
  participant Embed as Embedding Model
  participant Index as ANN Index
  participant Meta as Metadata Store

  User->>API: Semantic query
  API->>Embed: Encode text to vector
  Embed-->>API: Query vector
  API->>Index: kNN search
  Index-->>API: Candidate IDs
  API->>Meta: Filter and fetch payload
  Meta-->>API: Matched documents
  API-->>User: Ranked results

Failure modes

Relational: lock contention and long transactions stall hot tables.
Relational: checkpoint lag increases crash recovery time.
Non-relational: bad partition keys create hot shards and uneven storage growth.
Non-relational: compaction backlog increases read amplification and latency variance.
Vector: stale embeddings degrade relevance even when latency is stable.
Vector: ANN parameter mis-tuning causes low recall or excessive memory usage.
All types: untested backup restore paths produce silent data loss risk.

Trade-offs

Choice	Benefit	Cost
Relational constraints everywhere	Strong correctness and clear invariants	Higher write contention and schema migration effort
Wide denormalization in NoSQL	Fast reads at high scale	Duplicate data and reconciliation logic
Aggressive vector index compression	Lower memory footprint	Potential recall drop
Strong quorum settings	Better read-after-write guarantees	Higher latency and lower partition tolerance
Frequent background maintenance	Stable read performance	Predictable but persistent resource overhead

Real-world usage

Product need	Recommended primary store	Why
Orders, payments, and account balance	Relational database	Strict constraints and transactional integrity
User activity feed and high-volume events	Non-relational LSM-based store	High write throughput and horizontal partitioning
Semantic document search	Vector database plus metadata store	Fast nearest-neighbor retrieval with filtering
Hybrid retrieval applications	Relational metadata + vector index + cache	Combines correctness with semantic relevance