---
title: "System Design 101: How to Build Systems That Scale"
date: "2026-03-10"
description: "A practical introduction to system design fundamentals — scalability, load balancing, caching, databases, and the trade-offs that define real-world architecture decisions."
tags: ["System Design", "Architecture"]
---
System Design 101: How to Build Systems That Scale
System design is the art of building software systems that are reliable, scalable, and maintainable. Whether you're preparing for a senior engineering interview or designing your next product, these fundamentals apply everywhere.
The Core Problem: Scale
A system that works for 100 users will often break at 10,000. The challenges change:
- Traffic — more requests per second
- Data — more records to store and query
- Geography — users in multiple regions
- Reliability — more nodes means more failure surfaces
Good system design anticipates these challenges before they become crises.
1. Horizontal vs Vertical Scaling
Vertical scaling (scale up) — buy a bigger server. Add more CPU, RAM, faster disk. Simple, but has limits and creates a single point of failure.
Horizontal scaling (scale out) — add more servers. Distribute load across many machines. This is how modern internet-scale systems work.
```
Vertical:             Horizontal:

[Big Server]          [Server] [Server] [Server]
     |                     \      |      /
  [Users]                  [Load Balancer]
                                  |
                               [Users]
```

Most production systems use a combination: moderately powerful instances, but many of them.
2. Load Balancing
When you have multiple servers, you need something to distribute incoming traffic evenly. That's a load balancer.
Common load-balancing strategies:
| Strategy | How it works | Best for |
|---|---|---|
| Round Robin | Requests go to servers in rotation | Stateless apps with similar workloads |
| Least Connections | Route to the server with the fewest active connections | Variable request durations |
| IP Hash | Hash client IP → consistent server | Session persistence |
| Weighted | Some servers get more traffic | Mixed-capacity servers |
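The first two strategies can be sketched in a few lines of plain Python. The `servers` list is hypothetical; real load balancers implement this in the proxy layer, and a real least-connections balancer would decrement the count when each request completes.

```python
import itertools

servers = ["app-1", "app-2", "app-3"]

# Round robin: hand out servers in a fixed rotation.
_rotation = itertools.cycle(servers)

def round_robin() -> str:
    return next(_rotation)

# Least connections: track active connections per server
# and route each new request to the least-loaded one.
active = {s: 0 for s in servers}

def least_connections() -> str:
    target = min(active, key=active.get)
    active[target] += 1  # caller decrements when the request finishes
    return target
```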
In AWS, this is ALB (Application Load Balancer). In Kubernetes, it's a Service with multiple pods. Nginx and HAProxy are popular self-hosted options.
Health Checks
Load balancers continuously health-check backend servers. If a server stops responding, the load balancer stops routing traffic to it. This is a core mechanism for high availability.
3. Caching
Cache = store frequently-accessed data in fast memory to avoid expensive recomputation or database queries.
Levels of Caching
- Browser cache — HTTP cache headers (Cache-Control, ETag)
- CDN — cache static assets geographically close to users
- Application cache — in-process cache (e.g., Python dict, Guava cache in Java)
- Distributed cache — Redis, Memcached — shared across all app servers
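At the application-cache level, Python's standard library already gives you an in-process cache via `functools.lru_cache` (the `expensive_lookup` function here is a stand-in for a slow computation or database query):

```python
from functools import lru_cache

@lru_cache(maxsize=1024)
def expensive_lookup(key: str) -> str:
    # Imagine a slow computation or DB query here.
    return key.upper()

expensive_lookup("a")  # computed
expensive_lookup("a")  # served from the in-process cache
print(expensive_lookup.cache_info().hits)  # 1
```

Note the trade-off versus a distributed cache: each process has its own copy, so hit rates drop as you scale out, and there is no cross-server invalidation.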
Cache Invalidation
"There are only two hard things in Computer Science: cache invalidation and naming things." — Phil Karlton
When underlying data changes, you need to update or invalidate cached copies. Common strategies:
- TTL (Time-to-Live) — expire cache entries after N seconds
- Write-through — update cache on every write
- Write-behind — queue cache writes asynchronously
- Cache-aside (lazy loading) — only populate cache on cache miss
```python
import redis
import json

cache = redis.Redis(host="localhost", port=6379, db=0)

def get_user(user_id: str) -> dict:
    # Try cache first
    cached = cache.get(f"user:{user_id}")
    if cached:
        return json.loads(cached)

    # Cache miss — query the database
    # (db is an application-level database handle)
    user = db.query("SELECT * FROM users WHERE id = ?", user_id)

    # Populate cache with a 5-minute TTL
    cache.setex(f"user:{user_id}", 300, json.dumps(user))
    return user
```

4. Databases
Choosing the right database is one of the most consequential architecture decisions.
Relational Databases (SQL)
PostgreSQL, MySQL, SQLite. Data is structured in tables with defined schemas. ACID transactions guarantee consistency.
Best for: user data, financial records, anything with complex relationships and joins.
```sql
-- Efficient query with proper indexing
SELECT p.title, u.name, COUNT(c.id) AS comment_count
FROM posts p
JOIN users u ON p.user_id = u.id
LEFT JOIN comments c ON c.post_id = p.id
WHERE p.published_at > NOW() - INTERVAL '7 days'
GROUP BY p.id, u.name
ORDER BY comment_count DESC
LIMIT 10;
```

NoSQL Databases
Different data models for different access patterns:
| Type | Example | Best for |
|---|---|---|
| Document | MongoDB, Firestore | Flexible schemas, JSON-like data |
| Key-Value | Redis, DynamoDB | Fast lookups by key, sessions, caches |
| Wide-Column | Cassandra, HBase | High-write, time-series, massive scale |
| Graph | Neo4j | Relationships (social networks, fraud detection) |
Database Scaling Techniques
Read replicas — one primary handles writes; multiple replicas handle reads. Great when your workload is read-heavy (most apps are).
Sharding — partition data across multiple database instances by a shard key (e.g., user_id % N). Enables horizontal scaling but adds query complexity.
Connection pooling — reuse database connections rather than opening a new one per request. PgBouncer for Postgres, HikariCP for Java.
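Shard-key routing as described above can be sketched like this (the connection URLs are hypothetical; note the use of a stable hash, since Python's built-in `hash()` is salted per process):

```python
import hashlib

SHARDS = [
    "postgres://db-0.internal/app",
    "postgres://db-1.internal/app",
    "postgres://db-2.internal/app",
    "postgres://db-3.internal/app",
]

def shard_for(user_id: str) -> str:
    # Stable hash so the same user always maps to the same shard.
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]
```

One caveat with plain hash-mod: adding a shard changes almost every key's mapping, which is why large systems often reach for consistent hashing instead.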
5. Message Queues & Async Processing
Not all work needs to happen synchronously in the request-response cycle. Message queues decouple producers from consumers.
```
[Web Server] → [Queue: email_jobs] → [Email Worker]
                                   → [Email Worker]
                                   → [Email Worker]
```

Benefits:
- Resilience — if email service is down, jobs wait in queue
- Throughput — workers process at their own pace
- Retry — failed jobs can be retried automatically
Popular choices: Kafka (high-throughput streaming), RabbitMQ (rich routing), SQS (AWS managed), Bull/BullMQ (Redis-backed, great for Node.js).
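The producer/queue/worker shape can be sketched with the standard library alone; here `queue.Queue` stands in for the broker, and appending to a list stands in for actually sending an email:

```python
import queue
import threading

email_jobs: "queue.Queue[dict | None]" = queue.Queue()
sent = []

def email_worker() -> None:
    while True:
        job = email_jobs.get()
        if job is None:            # sentinel: shut down this worker
            break
        sent.append(job["to"])     # stand-in for sending the email
        email_jobs.task_done()

# Workers process at their own pace, independent of the producer.
workers = [threading.Thread(target=email_worker) for _ in range(3)]
for w in workers:
    w.start()

# The "web server" enqueues jobs and returns immediately.
for i in range(5):
    email_jobs.put({"to": f"user{i}@example.com"})

email_jobs.join()                  # wait until every job is processed
for _ in workers:
    email_jobs.put(None)
for w in workers:
    w.join()
```

A real broker adds what this sketch lacks: durability across restarts, acknowledgements, and automatic retries.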
6. CAP Theorem
In a distributed system, you can only guarantee two of three:
- Consistency — every read gets the most recent write
- Availability — every request gets a (non-error) response
- Partition tolerance — system works even if nodes can't communicate
Since network partitions are unavoidable in practice, you're really choosing between CP (consistent but may be unavailable during partition) and AP (available but may return stale data).
Examples:
- CP: Zookeeper, HBase, MongoDB (with strong consistency settings)
- AP: Cassandra, DynamoDB (with eventual consistency), CouchDB
7. Designing for Failure
Every component in a distributed system will eventually fail. Good systems assume this and design accordingly.
Circuit breakers — if a downstream service is failing, stop calling it for a while. Prevents cascading failures.
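A minimal circuit-breaker sketch, with illustrative thresholds (production libraries add half-open probing, per-endpoint state, and metrics):

```python
import time

class CircuitBreaker:
    def __init__(self, max_failures: int = 5, reset_after: float = 30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at: float | None = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None          # cooled off: allow one attempt
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()  # trip the breaker
            raise
        self.failures = 0                  # success resets the count
        return result
```

While the breaker is open, callers get an immediate error instead of waiting on a dying service, which is what stops the failure from cascading.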
Retries with exponential backoff — don't hammer a struggling service.
```typescript
async function callWithRetry<T>(
  fn: () => Promise<T>,
  maxRetries = 3
): Promise<T> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt === maxRetries - 1) throw err;
      // Exponential backoff: 100ms, 200ms, 400ms...
      await new Promise((r) => setTimeout(r, 100 * 2 ** attempt));
    }
  }
  throw new Error("unreachable");
}
```

Timeouts — always set timeouts on external calls. Hanging requests exhaust your connection pool.
Bulkheads — isolate failures to a part of the system (like bulkheads in a ship).
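One common way to implement a bulkhead is a bounded concurrency limit per downstream dependency, sketched here with a semaphore (the limit and timeout are illustrative):

```python
import threading

class Bulkhead:
    """Cap concurrent calls to one dependency so a slow downstream
    can't exhaust the whole server's threads and connections."""

    def __init__(self, max_concurrent: int):
        self._sem = threading.BoundedSemaphore(max_concurrent)

    def call(self, fn, *args, timeout: float = 1.0, **kwargs):
        # Fail fast instead of queueing forever when saturated.
        if not self._sem.acquire(timeout=timeout):
            raise RuntimeError("bulkhead full: rejecting call")
        try:
            return fn(*args, **kwargs)
        finally:
            self._sem.release()
```

With one bulkhead per dependency, a hung payments API can saturate only its own slots; the rest of the server keeps serving.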
Putting It Together: A Simple Feed Service
Imagine designing Twitter's feed. The requirements:
- Users can post tweets
- Users follow other users
- Timeline shows latest tweets from followed users
A reasonable starting architecture:
- API servers behind a load balancer, horizontally scaled
- PostgreSQL for users, follows, tweets (with read replicas)
- Redis to cache pre-computed timelines per user
- Kafka for fan-out — when user A posts, emit a message; consumers update followers' cached timelines
- CDN for media (images, videos)
As scale increases, you'd shard the tweets table, add more Kafka consumers, and possibly split the "celebrity accounts with millions of followers" into a separate pull-based timeline path.
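The fan-out-on-write step above can be sketched in memory; plain dicts stand in for the follows table and the Redis timelines, and in production the `fan_out` call would run in a Kafka consumer rather than in the request path:

```python
from collections import defaultdict

# Who follows whom (stand-in for the follows table).
followers = {"alice": ["bob", "carol"]}

# Cached timelines per user (stand-in for Redis lists).
timelines: "defaultdict[str, list[str]]" = defaultdict(list)
TIMELINE_CAP = 800  # keep cached timelines bounded

def fan_out(author: str, tweet: str) -> None:
    # Push the new tweet onto every follower's cached timeline.
    for follower in followers.get(author, []):
        timeline = timelines[follower]
        timeline.insert(0, f"{author}: {tweet}")
        del timeline[TIMELINE_CAP:]  # trim, like Redis LTRIM
```

This also shows why celebrity accounts need the separate pull-based path: fanning one post out to millions of follower timelines is millions of writes.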
Summary
System design is about trade-offs. There's rarely a single right answer — only answers appropriate for your scale, team size, and reliability requirements.
Key principles to internalize:
- Design for failure — assume components will fail
- Scale out over scale up for internet-scale systems
- Cache aggressively — but have a cache invalidation strategy
- Async by default for work that doesn't need to be synchronous
- Pick databases based on access patterns, not familiarity
- Measure everything — you can't optimize what you can't observe