Storage and Data Patterns
Problem framing
Data systems must survive failures, scale throughput, and keep latency predictable under uneven access. Replication protects availability and durability, while sharding distributes load across nodes. Partitioning decisions define where data lives, which directly shapes failure behavior and consistency cost.
Core idea / pattern
Replication, sharding, partitioning
Replication copies data across nodes to improve availability and read capacity. Sharding splits a dataset by key or range so no single node holds all data. Partitioning is the physical placement strategy that binds shards to nodes and zones.
| Technique | Primary goal | Typical risks |
|---|---|---|
| Replication | Availability and durability | Replication lag, write amplification |
| Sharding | Horizontal scale | Hot shards, complex rebalancing |
| Partitioning | Fault isolation | Skewed placement, blast radius overlap |
SQL vs NoSQL architectures
SQL systems emphasize schema, transactions, and strong consistency within a single logical database. NoSQL systems trade rigid schemas and cross-entity transactions for scale and flexible access patterns. The real decision is workload fit, not ideology; use the smallest model that meets the SLA. For data model selection, see database patterns.
| Attribute | SQL | NoSQL |
|---|---|---|
| Consistency | Strong by default | Often tunable |
| Schema | Strict, enforced | Flexible or schema-on-read |
| Scaling | Vertical, or sharded with coordination | Horizontal by design |
| Transactions | Multi-row, ACID | Limited or scoped |
CAP trade-offs
CAP highlights trade-offs during network partitions, not during healthy operation. Under partition, you typically pick availability with eventual consistency or strict consistency with reduced availability. See consistency models for precise guarantees.
Architecture diagram
flowchart LR
App[Application] --> Router[Shard Router]
Router --> ShardA[Shard A Primary]
Router --> ShardB[Shard B Primary]
ShardA --> RepA1[Replica A1]
ShardA --> RepA2[Replica A2]
ShardB --> RepB1[Replica B1]
ShardB --> RepB2[Replica B2]
Animated flow
Step-by-step flow
- The application submits a request with a shard key.
- The router selects the shard based on the partitioning scheme.
- Writes go to the shard primary and replicate to secondaries.
- Reads are served from the primary or a replica based on consistency requirements.
- Background processes rebalance shards when hot spots or growth thresholds appear.
Playground: Shard skew
Failure modes
- Replication lag causes stale reads when clients read from followers.
- Network partitions create split-brain risk without clear leader election.
- Hot shards dominate CPU or storage, causing tail latency spikes.
- Rebalancing moves large amounts of data and can overload the network.
- Misconfigured quorum sizes allow stale reads or lost writes.
Trade-offs
- Strong consistency simplifies reasoning but adds write latency and reduces availability under partition.
- Eventually consistent systems scale and survive partitions, but add reconciliation complexity.
- More replicas improve durability but increase write cost and coordination overhead.
- Range sharding enables efficient scans but risks uneven key distribution.
- Hash sharding balances load but complicates range queries and data locality.
Real-world usage
- SQL systems like PostgreSQL or MySQL are common for transactional workloads with strict correctness.
- NoSQL systems like Cassandra or Dynamo-style stores power high-throughput, multi-region workloads.
- Document stores and wide-column databases are common for flexible schemas and high ingest.
- Cloud-native systems combine strong consistency per shard with global replication.