Build Databases from Scratch
Problem framing
A database is not one feature. It is a pipeline for correctness, durability, query speed, and recovery under failure. To build one from scratch, define the write contract first, then layer indexing, replication, background maintenance, and observability. Scale mechanics are covered in storage and partitioning patterns.
Core idea / pattern
Build in layers
| Layer | Problem | Pattern | Trade-offs | Failure modes |
|---|---|---|---|---|
| API and parser | Unclear query contract | Define strict request grammar and validation | Flexibility vs deterministic execution | Ambiguous semantics and hard-to-debug behavior |
| Durability | Data loss on crash | Write-ahead log before applying state | Latency overhead for fsync | Torn writes, replay gaps, corrupt logs |
| Storage engine | Slow reads or writes at scale | B+ tree for read locality or LSM for write throughput | Read latency vs write amplification | Compaction storms, fragmentation, cache churn |
| Coordination | Replica divergence | Quorum and leader election | Stronger consistency vs lower availability | Split brain and stale reads |
| Background maintenance | Performance decay over time | Compaction, vacuum, reindexing, repair | Steady overhead vs predictable latency | Write stalls and long tail spikes |
Choose model by workload
| Database type | Primary data model | Storage/index defaults | Best-fit workloads | Main risk |
|---|---|---|---|---|
| Relational | Tables, keys, joins, constraints | WAL + B+ tree + MVCC | OLTP, billing, inventory, ledgers | Contention under heavy multi-row writes |
| Non-relational | Key-value, document, wide-column | Partitioning + replication + LSM | High-ingest events, profiles, session state | Cross-entity consistency complexity |
| Vector | Embedding vectors + metadata filters | ANN index (HNSW or IVF) + metadata store | Semantic search, RAG, recommendation | Recall-latency tuning and stale embeddings |
For algorithm mechanics used by these systems, see distributed algorithms.
Architecture diagram
flowchart LR
Client[Client] --> API[Query API]
API --> Planner[Parser and Planner]
Planner --> Txn[Transaction and Concurrency]
Txn --> WAL[Write-Ahead Log]
Txn --> Engine[Storage Engine]
Engine --> Index[Index Structures]
Engine --> Data[Data Files]
Engine --> Bg[Compaction or Vacuum]
Engine --> Repl[Replication Stream]
Repl --> Replicas[(Replicas)]
Index --> ANN[Vector ANN Index]
Index --> BTree[B+ Tree]
Index --> LSM[LSM Levels]
Step-by-step flow
Relational database build path
- Define schema, primary keys, foreign keys, and invariant checks.
- Implement WAL append and crash recovery replay.
- Build B+ tree indexes for primary and selective secondary access paths.
- Add MVCC snapshots and lock manager for concurrent transactions.
- Implement optimizer rules for join order, index scans, and predicate pushdown.
- Add replication, backup, and point-in-time restore.
sequenceDiagram
participant App
participant SQL as SQL Layer
participant Txn
participant WAL
participant BTree
participant Disk
App->>SQL: INSERT/UPDATE
SQL->>Txn: Validate constraints
Txn->>WAL: Append log record
WAL-->>Txn: Durable ack
Txn->>BTree: Update index pages
BTree->>Disk: Flush dirty pages
Txn-->>App: Commit success
Non-relational database build path
- Pick partition key and replica strategy per entity.
- Route writes by partition and append to commit log.
- Buffer in memtable and flush immutable SSTables.
- Run compaction to merge segments and remove tombstones.
- Expose quorum-tunable reads and writes for consistency control.
- Run anti-entropy repair between replicas.
flowchart LR
Client --> Router[Partition Router]
Router --> Log[Commit Log]
Log --> Mem[Memtable]
Mem --> SST[SSTables]
SST --> Compact[Compaction]
Router --> Repl[Replica Set]
Repl --> Read[Quorum Read]
Repl --> Write[Quorum Write]
Vector database build path
- Generate embeddings and persist source metadata.
- Normalize vectors and choose ANN index strategy.
- Build index partitions and shard by tenant or semantic domain.
- At query time, encode text into vector and search top-k neighbors.
- Apply metadata filters and rerank candidates for quality.
- Rebuild or incrementally update indexes as embeddings drift.
sequenceDiagram
participant User
participant API
participant Embed as Embedding Model
participant Index as ANN Index
participant Meta as Metadata Store
User->>API: Semantic query
API->>Embed: Encode text to vector
Embed-->>API: Query vector
API->>Index: kNN search
Index-->>API: Candidate IDs
API->>Meta: Filter and fetch payload
Meta-->>API: Matched documents
API-->>User: Ranked results
Failure modes
- Relational: lock contention and long transactions stall hot tables.
- Relational: checkpoint lag increases crash recovery time.
- Non-relational: bad partition keys create hot shards and uneven storage growth.
- Non-relational: compaction backlog increases read amplification and latency variance.
- Vector: stale embeddings degrade relevance even when latency is stable.
- Vector: ANN parameter mis-tuning causes low recall or excessive memory usage.
- All types: untested backup restore paths produce silent data loss risk.
Trade-offs
| Choice | Benefit | Cost |
|---|---|---|
| Relational constraints everywhere | Strong correctness and clear invariants | Higher write contention and schema migration effort |
| Wide denormalization in NoSQL | Fast reads at high scale | Duplicate data and reconciliation logic |
| Aggressive vector index compression | Lower memory footprint | Potential recall drop |
| Strong quorum settings | Better read-after-write guarantees | Higher latency and lower partition tolerance |
| Frequent background maintenance | Stable read performance | Predictable but persistent resource overhead |
Real-world usage
| Product need | Recommended primary store | Why |
|---|---|---|
| Orders, payments, and account balance | Relational database | Strict constraints and transactional integrity |
| User activity feed and high-volume events | Non-relational LSM-based store | High write throughput and horizontal partitioning |
| Semantic document search | Vector database plus metadata store | Fast nearest-neighbor retrieval with filtering |
| Hybrid retrieval applications | Relational metadata + vector index + cache | Combines correctness with semantic relevance |