How Amazon/Flipkart Manages 500 Crore Requests


Hey everyone! Welcome back to the tutorial series. Today we are going to learn about one of the MOST fascinating topics in system design - How do companies like Amazon and Flipkart handle 500 Crore (5 Billion) requests?

I am super excited about this topic because understanding how systems scale is what makes you a 10x engineer. Trust me, this is THE most asked system design interview question!

Think about it - during a Big Billion Day sale on Flipkart or Amazon's Great Indian Festival, millions of users are hitting the servers at the SAME time. How does the server NOT crash? How does the website stay fast? Let's find out!

What we will cover:

  • The Problem - Why Single Servers Fail
  • Horizontal Scaling
  • Load Balancer
  • Database Sharding
  • Caching (Redis)
  • Message Queues (Kafka)
  • CDN (Content Delivery Network)
  • Rate Limiting
  • How It All Works Together
  • Interview Questions
  • Key Points to Remember

The 7 Weapons to Handle 500 Crore Requests:
=============================================

┌─────────────────────────────────────────────────────────┐
│                                                          │
│   1. Horizontal Scaling    → Add more servers            │
│   2. Load Balancer         → Distribute traffic evenly   │
│   3. DB Sharding           → Split database across nodes │
│   4. Caching (Redis)       → Store hot data in memory    │
│   5. Queues (Kafka)        → Handle async tasks          │
│   6. CDN                   → Serve static files globally │
│   7. Rate Limiting         → Protect from abuse          │
│                                                          │
└─────────────────────────────────────────────────────────┘

Together, these 7 strategies let Amazon/Flipkart handle
BILLIONS of requests without breaking a sweat!

The Problem - Why Single Servers Fail

Let's start with the most basic question - Why can't a single server handle all the traffic?

Single Server Architecture:
============================

500 Crore requests
        │
        ▼
┌──────────────────┐
│   SINGLE SERVER  │
│                  │
│  CPU: 100% 🔥   │
│  RAM: 100% 🔥   │
│  Disk: 100% 🔥  │
│                  │
│  💀 SERVER DEAD  │
│  💀 WEBSITE DOWN │
│  💀 USERS ANGRY  │
│  💀 COMPANY LOSES│
│     CRORES!      │
└──────────────────┘

A single server can handle maybe 10,000-50,000 requests/second.
500 Crore requests = 5,00,00,00,000 requests
Even spread over a day = ~58,000 requests per SECOND!
During peak (sale) = 5,00,000+ requests per second!

One server? IMPOSSIBLE! 😱
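
A quick back-of-the-envelope check of those numbers, in plain JavaScript, using only the figures above:

```javascript
// 500 Crore = 5 Billion requests
const totalRequests = 500 * 1e7;       // 1 Crore = 10^7, so 500 Crore = 5 * 10^9
const secondsPerDay = 24 * 60 * 60;    // 86,400 seconds

const avgPerSecond = totalRequests / secondsPerDay;
console.log(Math.round(avgPerSecond)); // ~57,870 requests per SECOND, on average

// A single server tops out around 10,000-50,000 req/s,
// so even the AVERAGE load is beyond one server - and peaks are ~10x higher!
```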

So how do Amazon and Flipkart do it? They use a combination of 7 powerful strategies. Let's learn each one!

1. Horizontal Scaling - "Add More Servers!"

This is the foundation of handling massive traffic. Instead of making one server bigger, we add MORE servers!

Q: What is the difference between Vertical and Horizontal Scaling?

Vertical Scaling (Scale UP):
============================

"Make the server BIGGER"

┌──────────────────┐         ┌──────────────────┐
│   Small Server   │         │   BIG SERVER     │
│                  │         │                  │
│  CPU: 4 cores   │  ──→    │  CPU: 64 cores   │
│  RAM: 8 GB      │         │  RAM: 256 GB     │
│  Disk: 500 GB   │         │  Disk: 10 TB     │
└──────────────────┘         └──────────────────┘

Problems:
❌ There's a hardware LIMIT (can't add infinite CPU/RAM)
❌ Very EXPENSIVE (high-end servers cost lakhs!)
❌ Single point of failure (if it crashes, everything is down)
❌ Downtime during upgrades


Horizontal Scaling (Scale OUT):
================================

"Add MORE servers"

┌──────────────┐
│   Server 1   │
│  (App copy)  │
├──────────────┤
│   Server 2   │
│  (App copy)  │
├──────────────┤
│   Server 3   │
│  (App copy)  │
├──────────────┤
│   Server 4   │
│  (App copy)  │
├──────────────┤
│     ...      │
├──────────────┤
│  Server 100  │
│  (App copy)  │
└──────────────┘

Benefits:
✅ No hardware limit (just keep adding servers!)
✅ Cost effective (use many cheap servers)
✅ No single point of failure (one dies, others serve)
✅ Zero downtime during scaling
✅ THIS IS WHAT AMAZON USES!

How Amazon does it:

Amazon's Horizontal Scaling:
============================

Normal Day:
  10 servers → Handle 1 Lakh requests/second

Big Billion Day Sale starts:
  AWS Auto-Scaling kicks in!
  10 servers → 50 servers → 200 servers → 500 servers!

Sale ends:
  500 servers → 200 servers → 50 servers → 10 servers

This is called AUTO-SCALING!
You only pay for what you use!

┌─────────────────────────────────────────────────────┐
│              AUTO-SCALING IN ACTION                   │
│                                                       │
│  Traffic ▲                                           │
│          │         ╱╲                                │
│          │        ╱  ╲     Sale Peak!                │
│          │       ╱    ╲                              │
│          │      ╱      ╲                             │
│          │     ╱        ╲                            │
│          │ ───╱──────────╲───  Normal                │
│          └──────────────────────────→ Time           │
│                                                       │
│  Servers ▲                                           │
│          │         ╱╲                                │
│          │        ╱  ╲     500 servers!              │
│          │       ╱    ╲                              │
│          │ ─────╱──────╲─────  10 servers            │
│          └──────────────────────────→ Time           │
│                                                       │
│  Servers scale UP and DOWN with traffic!              │
└─────────────────────────────────────────────────────┘

But wait - if we have 500 servers, how does a user's request know WHICH server to go to? That's where the Load Balancer comes in!

2. Load Balancer - "The Traffic Police"

A Load Balancer sits in front of all your servers and distributes incoming requests evenly across them. Think of it like a traffic police officer directing cars to different lanes!

Without Load Balancer:
======================

All 500 Crore requests → ONE server → 💀 Dead!


With Load Balancer:
===================

                    500 Crore requests
                           │
                           ▼
                  ┌─────────────────┐
                  │  LOAD BALANCER  │
                  │      (LB)      │
                  └────────┬────────┘
                           │
         ┌─────────────────┼─────────────────┐
         │                 │                 │
         ▼                 ▼                 ▼
┌──────────────┐  ┌──────────────┐  ┌──────────────┐
│   App (A1)   │  │   App (A2)   │  │   App (A3)   │
│  Server 1    │  │  Server 2    │  │  Server 3    │
│              │  │              │  │              │
│ ~166 Crore   │  │ ~166 Crore   │  │ ~166 Crore   │
│ requests     │  │ requests     │  │ requests     │
└──────────────┘  └──────────────┘  └──────────────┘

Each server handles only 1/3 of the traffic!
Add more servers = Each handles even less!

Load Balancing Algorithms:

How LB Decides Which Server Gets the Request:
==============================================

1. ROUND ROBIN (Most Common)
   Request 1 → Server A
   Request 2 → Server B
   Request 3 → Server C
   Request 4 → Server A  (back to start)
   Request 5 → Server B
   ...

2. LEAST CONNECTIONS
   "Send to the server with fewest active connections"
   Server A: 100 connections
   Server B: 50 connections  ← Next request goes HERE!
   Server C: 80 connections

3. IP HASH
   "Same user always goes to same server"
   User IP 1.2.3.4 → Always Server A
   User IP 5.6.7.8 → Always Server B
   (Good for session-based apps)

4. WEIGHTED ROUND ROBIN
   "Powerful servers get more traffic"
   Server A (8 CPU):  40% traffic
   Server B (4 CPU):  30% traffic
   Server C (2 CPU):  15% traffic
   Server D (2 CPU):  15% traffic
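
Round Robin is simple enough to sketch in a few lines of JavaScript. This is a toy version (the server names are placeholders); real load balancers like Nginx implement this in optimized C:

```javascript
// Minimal Round Robin: hand out servers in rotation.
function makeRoundRobin(servers) {
  let next = 0;
  return function pick() {
    const server = servers[next];
    next = (next + 1) % servers.length; // wrap back to the start
    return server;
  };
}

const pick = makeRoundRobin(["Server A", "Server B", "Server C"]);
console.log(pick()); // Server A
console.log(pick()); // Server B
console.log(pick()); // Server C
console.log(pick()); // Server A  (back to start)
```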

Tools used by companies:

Load Balancer   Used By                          Type
─────────────   ──────────────────────────────   ──────────────────────
Nginx           Netflix, Airbnb, WordPress       Software (open-source)
AWS ELB/ALB     Amazon, Flipkart                 Cloud managed
HAProxy         GitHub, Reddit, Stack Overflow   Software (open-source)
Cloudflare      Millions of websites             Cloud / CDN + LB

Now our requests are distributed across servers. But what about the database? If all 500 servers hit ONE database, the database becomes the bottleneck!

3. Database Sharding - "Split the Database!"

Even with 500 app servers, if they ALL read/write to ONE database, that database will crash! Sharding splits the database into smaller pieces!

Q: What is Database Sharding?

A: Sharding is the practice of splitting a large database into smaller, faster, more manageable pieces called shards. Each shard holds a subset of the data!

Without Sharding:
=================

500 App Servers
      │
      ▼ (ALL hitting one DB)
┌──────────────────────┐
│    SINGLE DATABASE   │
│                      │
│  100 Crore users     │
│  500 Crore orders    │
│                      │
│  CPU: 100% 🔥       │
│  Disk I/O: maxed 🔥 │
│  Queries: SLOW 🐌   │
│                      │
│  💀 DATABASE DEAD    │
└──────────────────────┘


With Sharding:
==============

500 App Servers
      │
      ▼
┌──────────────────────┐
│    ROUTING LAYER     │
│    "Which shard?"    │
└──────────┬───────────┘
           │
     ┌─────┼─────┬─────────┐
     │     │     │         │
     ▼     ▼     ▼         ▼
┌────────┐┌────────┐┌────────┐┌────────┐
│Shard A ││Shard B ││Shard C ││Shard D │
│        ││        ││        ││        │
│Users   ││Users   ││Users   ││Users   │
│A-F     ││G-M     ││N-S     ││T-Z     │
│        ││        ││        ││        │
│25 Crore││25 Crore││25 Crore││25 Crore│
│users   ││users   ││users   ││users   │
└────────┘└────────┘└────────┘└────────┘

Each shard handles only 25% of the data!
Queries are 4x FASTER! 🚀

Sharding Strategies:
====================

1. RANGE-BASED SHARDING
   Shard A: Users with ID 1 - 25,00,00,000
   Shard B: Users with ID 25,00,00,001 - 50,00,00,000
   Shard C: Users with ID 50,00,00,001 - 75,00,00,000
   Shard D: Users with ID 75,00,00,001 - 1,00,00,00,000

   Problem: Uneven distribution (new users all go to Shard D)


2. HASH-BASED SHARDING (Most Common)
   shard_number = hash(user_id) % number_of_shards

   hash("user_123") % 4 = 2  → Goes to Shard C
   hash("user_456") % 4 = 0  → Goes to Shard A
   hash("user_789") % 4 = 3  → Goes to Shard D

   Evenly distributed! ✅


3. GEO-BASED SHARDING
   Shard India:  All Indian users
   Shard US:     All American users
   Shard Europe: All European users

   Lower latency! Users connect to nearest shard!
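
Hash-based sharding can be sketched in JavaScript. The hash function below is a toy for illustration; production systems use stronger hashes (and often consistent hashing, so that adding a shard doesn't reshuffle every key):

```javascript
// Hash-based sharding: shard = hash(key) % numShards
// Toy string hash for illustration only.
function simpleHash(str) {
  let h = 0;
  for (const ch of str) {
    h = (h * 31 + ch.charCodeAt(0)) >>> 0; // keep it a 32-bit unsigned int
  }
  return h;
}

function shardFor(userId, numShards) {
  return simpleHash(userId) % numShards;
}

const shards = ["Shard A", "Shard B", "Shard C", "Shard D"];
console.log(shards[shardFor("user_123", 4)]); // same user ALWAYS lands on the same shard
console.log(shards[shardFor("user_456", 4)]);
```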

Real Example - Flipkart:
========================

Flipkart during Big Billion Day:
- 100 Crore users
- 500 Crore requests
- 10 Database shards

Each shard:
- 10 Crore users
- 50 Crore requests
- Fast queries! ✅
- No single DB bottleneck! ✅

But even with sharding, hitting the database for EVERY request is slow. What if we could skip the database entirely for common data? Enter Caching!

4. Caching with Redis - "Memory is 100x Faster than Disk!"

This is one of the MOST powerful strategies! Caching stores frequently accessed data in memory (RAM) instead of hitting the database every time!

Q: Why is caching so powerful?

Speed Comparison:
=================

Database query (disk):  ~10-100 ms  🐌
Redis cache (memory):   ~0.1-1 ms   🚀

Redis is 100x FASTER than database!

If 80% of requests can be served from cache,
you just reduced database load by 80%!

How Caching Works:
==================

WITHOUT CACHE:
                                  ┌──────────┐
User → "Show me iPhone price" ──→ │ DATABASE │ → 50ms → Response
User → "Show me iPhone price" ──→ │ DATABASE │ → 50ms → Response
User → "Show me iPhone price" ──→ │ DATABASE │ → 50ms → Response
                                  └──────────┘
1000 users ask same thing = 1000 DB queries! 😱


WITH CACHE (Redis):
                                  ┌──────────┐
User → "Show me iPhone price" ──→ │ DATABASE │ → 50ms → Response
                                  └──────────┘
                                       │
                                  Store in Redis!
                                       │
                                  ┌──────────┐
User → "Show me iPhone price" ──→ │  REDIS   │ → 0.5ms → Response
User → "Show me iPhone price" ──→ │  REDIS   │ → 0.5ms → Response
User → "Show me iPhone price" ──→ │  REDIS   │ → 0.5ms → Response
                                  └──────────┘
1000 users ask same thing = 1 DB query + 999 cache hits! 🚀

Cache Architecture:
===================

┌──────────┐         ┌──────────┐         ┌──────────┐
│  Client  │────────→│   App    │────────→│  REDIS   │
│          │         │  Server  │         │  CACHE   │
└──────────┘         └────┬─────┘         └────┬─────┘
                          │                    │
                     Cache HIT?          Data found?
                          │                    │
                    ┌─────┴─────┐        ┌─────┴─────┐
                    │           │        │           │
                   YES          NO      YES          NO
                    │           │        │           │
                    ▼           │        ▼           │
              Return from       │   Return from     │
              Redis (fast!)     │   Redis (fast!)   │
                                │                   │
                                │                   ▼
                                │           ┌──────────┐
                                └──────────→│ DATABASE │
                                            └────┬─────┘
                                                 │
                                            Store in Redis
                                            for next time!
                                                 │
                                                 ▼
                                            Return to user

What Amazon/Flipkart caches:

What Gets Cached:
=================

┌─────────────────────────────────────────────────────┐
│  Data Type           │  TTL (Time to Live)          │
├──────────────────────┼──────────────────────────────┤
│  Product details     │  5 minutes                   │
│  Product prices      │  1 minute (changes often)    │
│  User sessions       │  30 minutes                  │
│  Search results      │  2 minutes                   │
│  Homepage data       │  10 minutes                  │
│  Category listings   │  15 minutes                  │
│  User cart           │  24 hours                    │
│  API rate limit count│  1 minute (per window)       │
└──────────────────────┴──────────────────────────────┘

TTL = How long data stays in cache before refreshing

// Redis caching in Node.js (Cache Aside pattern)

const express = require("express");
const Redis = require("ioredis");

const app = express();
const redis = new Redis();
// NOTE: Product is assumed to be a database model (e.g. a Mongoose model)

// Get product - Check cache first!
app.get("/product/:id", async (req, res) => {
    const productId = req.params.id;
    const cacheKey = `product:${productId}`;

    // Step 1: Check Redis cache
    const cached = await redis.get(cacheKey);

    if (cached) {
        console.log("CACHE HIT! Returning from Redis");
        return res.json(JSON.parse(cached));  // ~0.5ms!
    }

    // Step 2: Cache MISS - Query database
    console.log("CACHE MISS! Querying database");
    const product = await Product.findById(productId);  // ~50ms

    // Step 3: Store in Redis for next time (TTL: 5 min = 300 seconds)
    await redis.set(cacheKey, JSON.stringify(product), "EX", 300);

    res.json(product);
});

app.listen(3000);

OUTPUT:
=======

Request 1: CACHE MISS! Querying database  → 50ms
Request 2: CACHE HIT! Returning from Redis → 0.5ms
Request 3: CACHE HIT! Returning from Redis → 0.5ms
Request 4: CACHE HIT! Returning from Redis → 0.5ms
...
Request 1000: CACHE HIT! Returning from Redis → 0.5ms

999 out of 1000 requests served 100x faster! 🚀

Now, what about tasks that don't need an immediate response? Like sending order confirmation emails, processing payments, or updating inventory? We don't want to make the user WAIT for all that!

5. Message Queues (Kafka) - "Do It Later!"

Q: What is a Message Queue?

A: A message queue is a system where one service puts a message (task) into a queue, and another service picks it up later and processes it. The user doesn't have to wait!

Without Message Queue:
======================

User clicks "Place Order"
     │
     ▼
┌─────────────────────────────────────────────┐
│  App Server does EVERYTHING synchronously:  │
│                                             │
│  1. Validate order           → 50ms         │
│  2. Process payment          → 2000ms  🐌   │
│  3. Update inventory         → 100ms        │
│  4. Send confirmation email  → 500ms   🐌   │
│  5. Send SMS notification    → 300ms   🐌   │
│  6. Update analytics         → 200ms        │
│  7. Generate invoice PDF     → 800ms   🐌   │
│                                             │
│  Total: 3950ms (almost 4 seconds!) 😱      │
│  User is WAITING for all of this!           │
└─────────────────────────────────────────────┘


With Message Queue (Kafka):
============================

User clicks "Place Order"
     │
     ▼
┌─────────────────────────────────────────────┐
│  App Server does only the ESSENTIAL part:   │
│                                             │
│  1. Validate order           → 50ms         │
│  2. Process payment          → 2000ms       │
│  3. Push rest to Kafka queue → 5ms          │
│                                             │
│  Total: 2055ms → Response to user! ✅      │
└──────────────────┬──────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────────┐
│              KAFKA QUEUE                     │
│                                             │
│  Message: { orderId: "123", type: "new" }   │
│                                             │
│  ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐          │
│  │ Msg │ │ Msg │ │ Msg │ │ Msg │ ← Queue  │
│  └─────┘ └─────┘ └─────┘ └─────┘          │
│                                             │
└──────────────────┬──────────────────────────┘
                   │
         ┌─────────┼─────────┬──────────┐
         │         │         │          │
         ▼         ▼         ▼          ▼
     ┌─────────┐┌─────────┐┌───────┐┌─────────┐
     │  Email  ││Inventory││  SMS  ││Analytics│
     │ Service ││ Service ││Service││ Service │
     │         ││         ││       ││         │
     │ Sends   ││ Updates ││ Sends ││ Updates │
     │ email   ││ stock   ││ SMS   ││ data    │
     └─────────┘└─────────┘└───────┘└─────────┘

    These process ASYNCHRONOUSLY!
    User doesn't wait for them!
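
The hand-off above can be illustrated with a toy in-memory queue. This is NOT Kafka (Kafka adds persistence, partitions, and replication on top of this idea); it only shows why the producer can respond to the user without waiting for the consumers:

```javascript
// Toy in-memory queue to illustrate the async hand-off.
const queue = [];
const processed = [];

// Producer: the order API pushes a message and returns immediately.
function placeOrder(orderId) {
  queue.push({ orderId, type: "new" }); // ~5ms in the real system
  return `Order ${orderId} placed!`;    // user gets a fast response
}

// Consumer: a background worker drains the queue later.
function runConsumer() {
  while (queue.length > 0) {
    const msg = queue.shift();
    processed.push(`email+sms+inventory for ${msg.orderId}`);
  }
}

console.log(placeOrder("123")); // user is NOT waiting for email/SMS
runConsumer();                  // happens asynchronously in the background
console.log(processed);
```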

Kafka Topics and Partitions:

Kafka Architecture:
===================

                    KAFKA CLUSTER
┌──────────────────────────────────────────────┐
│                                              │
│  Topic: "orders"                             │
│  ┌────────────────────────────────────────┐  │
│  │  Partition A: Order 1, 4, 7, 10...    │  │
│  ├────────────────────────────────────────┤  │
│  │  Partition B: Order 2, 5, 8, 11...    │  │
│  ├────────────────────────────────────────┤  │
│  │  Partition C: Order 3, 6, 9, 12...    │  │
│  └────────────────────────────────────────┘  │
│                                              │
│  Topic: "notifications"                      │
│  ┌────────────────────────────────────────┐  │
│  │  Partition A: Emails                   │  │
│  ├────────────────────────────────────────┤  │
│  │  Partition B: SMS                      │  │
│  ├────────────────────────────────────────┤  │
│  │  Partition C: Push notifications       │  │
│  └────────────────────────────────────────┘  │
│                                              │
└──────────────────────────────────────────────┘

Why Partitions?
- Multiple consumers can read in PARALLEL!
- Partition A → Consumer 1
- Partition B → Consumer 2
- Partition C → Consumer 3
- 3x throughput! 🚀
Why Kafka for Amazon/Flipkart:
==============================

1. MASSIVE THROUGHPUT
   Kafka can handle millions of messages per second!
   Perfect for Big Billion Day scale.

2. DURABILITY
   Messages are stored on disk.
   Even if a consumer crashes, messages are NOT lost.
   Consumer restarts and picks up where it left off.

3. ORDERING
   Messages within a partition are ordered.
   Order 1 is always processed before Order 4 in Partition A.

4. SCALABILITY
   Add more partitions = More parallel consumers!
   Add more brokers = More storage and throughput!

6. CDN (Content Delivery Network) - "Serve from the Nearest Location!"

Q: What is a CDN?

A: A CDN is a network of servers spread across the globe that caches and serves static content (images, CSS, JS, videos) from the location nearest to the user!

Without CDN:
============

User in Mumbai → Request image → Server in US → 300ms 🐌
User in Delhi  → Request image → Server in US → 280ms 🐌
User in Chennai→ Request image → Server in US → 320ms 🐌

ALL requests travel across the ocean to US server!


With CDN:
=========

User in Mumbai  → CDN Server Mumbai  → 20ms 🚀
User in Delhi   → CDN Server Delhi   → 15ms 🚀
User in Chennai → CDN Server Chennai → 18ms 🚀

Content served from NEAREST location!

┌─────────────────────────────────────────────────────┐
│                    WORLD MAP                         │
│                                                      │
│    🌐 CDN Edge                    🌐 CDN Edge       │
│    San Francisco                  London             │
│                                                      │
│              🏢 Origin Server                        │
│              (US - Main)                             │
│                                                      │
│    🌐 CDN Edge        🌐 CDN Edge     🌐 CDN Edge   │
│    Singapore           Mumbai          Tokyo         │
│                                                      │
│  User in Mumbai → CDN Mumbai (20ms) NOT US (300ms)  │
│                                                      │
└─────────────────────────────────────────────────────┘

What CDN serves:

What Goes on CDN:
=================

✅ Product images (iPhone photo, thumbnail)
✅ CSS files (styles.css)
✅ JavaScript bundles (app.js)
✅ Videos (product demos)
✅ Fonts (Google Fonts, custom fonts)
✅ Static HTML pages
✅ PDF files (invoices, brochures)

❌ API responses (dynamic, user-specific)
❌ User data (private)
❌ Cart data (changes frequently)
❌ Payment data (sensitive)
CDN Impact on Amazon:
=====================

Amazon product page loads:
- 1 HTML file
- 5 CSS files
- 10 JS files
- 50 product images
- 2 videos

Without CDN: 68 requests to origin server (US)
             Total load time: ~4 seconds

With CDN:    68 requests to edge server (Mumbai)
             Total load time: ~0.8 seconds

5x FASTER page load! 🚀

And the origin server is FREE to handle API requests
instead of serving static files!
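
The split between "cache on CDN" and "never cache" is usually driven by Cache-Control headers set by the origin server. A small sketch of that decision (the header values are illustrative, not Amazon's actual policy):

```javascript
// Decide the Cache-Control header by content type:
// long-lived caching for static assets, none for dynamic API responses.
function cacheControlFor(path) {
  const staticAsset = /\.(png|jpe?g|css|js|woff2?|mp4|pdf)$/.test(path);
  if (staticAsset) {
    return "public, max-age=31536000, immutable"; // 1 year: CDN + browser cache it
  }
  return "no-store"; // dynamic/user-specific: the CDN must not cache it
}

console.log(cacheControlFor("/assets/app.9f3c2a.js")); // cached at the edge
console.log(cacheControlFor("/api/cart"));             // always hits the origin
```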

CDN Provider   Used By            Edge Locations
────────────   ────────────────   ───────────────
CloudFront     Amazon, Flipkart   450+ worldwide
Cloudflare     Shopify, Discord   300+ worldwide
Akamai         Facebook, Apple    4000+ worldwide
Fastly         GitHub, Stripe     90+ worldwide

7. Rate Limiting - "Slow Down, You're Too Fast!"

Even with all these strategies, we need to protect the system from abuse. Rate limiting controls how many requests a single user or IP can make!

Why Rate Limiting at Scale:
===========================

Problem 1: Bot Attacks
  Bot tries to buy 10,000 iPhones in 1 second
  during flash sale → Other real users can't buy!

Problem 2: Scraping
  Competitor scrapes ALL product prices
  → 10 Lakh requests per minute → Server overload!

Problem 3: DDoS
  Attacker sends 1 Crore fake requests
  → Server busy handling fake traffic → Real users suffer!

Solution: RATE LIMITING!
Rate Limiting at Amazon Scale:
==============================

┌──────────────────────────────────────────────────────┐
│              RATE LIMITING RULES                      │
├──────────────────────────────────────────────────────┤
│                                                       │
│  API Endpoint         │ Limit        │ Window         │
│  ─────────────────────┼──────────────┼───────────     │
│  GET /products        │ 100 req      │ per minute     │
│  POST /cart/add       │ 30 req       │ per minute     │
│  POST /order/place    │ 5 req        │ per minute     │
│  POST /login          │ 5 req        │ per 15 min     │
│  GET /search          │ 60 req       │ per minute     │
│                                                       │
│  Exceeded? → 429 Too Many Requests                    │
│  "Please slow down and try again!"                    │
└──────────────────────────────────────────────────────┘

Different limits for different endpoints!
/order/place has STRICT limit (prevent bot purchases)
/products has RELAXED limit (browsing is fine)
Rate Limiting with Redis:
=========================

Why Redis for Rate Limiting?
→ Redis is IN-MEMORY = Super fast counter!
→ Redis has built-in TTL (auto-expire keys)
→ Redis is shared across ALL app servers!

How it works:

User makes request
     │
     ▼
┌──────────────────────────────────────────┐
│  App Server checks Redis:                │
│                                          │
│  Key: "ratelimit:user_123:products"      │
│  Value: 45 (requests made so far)        │
│  TTL: 32 seconds remaining              │
│                                          │
│  Is 45 < 100 (limit)?                   │
│  YES → Allow request, increment to 46   │
│  NO  → Return 429 Too Many Requests     │
└──────────────────────────────────────────┘
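
That fixed-window counting logic can be sketched in-process. In production the counter lives in Redis (INCR + EXPIRE) so that ALL app servers share it; this single-process version only shows the counting:

```javascript
// Fixed-window rate limiter (single-process sketch).
// In production, replace the Map with Redis INCR + EXPIRE.
function makeRateLimiter(limit, windowMs) {
  const windows = new Map(); // key -> { count, resetAt }
  return function allow(key, now = Date.now()) {
    let w = windows.get(key);
    if (!w || now >= w.resetAt) {
      w = { count: 0, resetAt: now + windowMs }; // new window; old counter "expired"
      windows.set(key, w);
    }
    w.count += 1;
    return w.count <= limit; // false -> respond with 429 Too Many Requests
  };
}

const allow = makeRateLimiter(5, 60000); // 5 req/min, like POST /order/place
for (let i = 1; i <= 6; i++) {
  console.log(`Request ${i}:`, allow("user_123") ? "OK" : "429 Too Many Requests");
}
// Requests 1-5 pass, request 6 gets 429.
```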

How It ALL Works Together - The Complete Architecture

Now let's see how ALL 7 strategies work together to handle 500 Crore requests!

Complete Architecture - Amazon/Flipkart Scale:
===============================================

                         ┌─────────────┐
                         │   USERS     │
                         │ (Crores!)   │
                         └──────┬──────┘
                                │
                                ▼
                    ┌───────────────────────┐
        ┌──────────│         CDN           │
        │          │    (Static files)     │
        │          │  Images, CSS, JS      │
        │          └───────────┬───────────┘
        │                      │
        │              (Only API requests
        │               pass through)
        │                      │
        │                      ▼
        │          ┌───────────────────────┐
        │          │    RATE LIMITER       │
        │          │  "Too fast? → 429"   │
        │          └───────────┬───────────┘
        │                      │
        │                      ▼
        │          ┌───────────────────────┐
        │          │    LOAD BALANCER      │
        │          │       (LB)           │
        │          └───────────┬───────────┘
        │                      │
        │          ┌───────────┼───────────┐
        │          │           │           │
        │          ▼           ▼           ▼
        │   ┌──────────┐┌──────────┐┌──────────┐
        │   │ App (A1) ││ App (A2) ││ App (A3) │
        │   │ Server 1 ││ Server 2 ││ Server 3 │
        │   └────┬─────┘└────┬─────┘└────┬─────┘
        │        │           │           │
        │        └─────┬─────┴───────────┘
        │              │
        │        ┌─────┴─────┐
        │        │           │
        │        ▼           ▼
        │  ┌──────────┐ ┌──────────────────────┐
        │  │  REDIS   │ │      DATABASE        │
        │  │  CACHE   │ │     (Sharded)        │
        │  │          │ │                      │
        │  │ Cache    │ │ ┌────┐┌────┐┌────┐  │
        │  │ HIT? ────┼→│ │ S1 ││ S2 ││ S3 │  │
        │  │ Return!  │ │ └────┘└────┘└────┘  │
        │  └──────────┘ └──────────────────────┘
        │        │
        │        │ (Async tasks)
        │        ▼
        │  ┌──────────────────────────┐
        │  │      KAFKA QUEUE         │
        │  │                          │
        │  │ ┌────┐┌────┐┌────┐      │
        │  │ │ P1 ││ P2 ││ P3 │      │
        │  │ └──┬─┘└──┬─┘└──┬─┘      │
        │  └────┼─────┼─────┼────────┘
        │       │     │     │
        │       ▼     ▼     ▼
        │  ┌──────┐┌─────┐┌──────────┐
        │  │Email ││ SMS ││Analytics │
        │  └──────┘└─────┘└──────────┘
        │
        └──→ Images, CSS, JS served from CDN edge (20ms!)


Request Flow Example:
=====================

1. User opens Flipkart app
   → CDN serves HTML, CSS, JS, images (20ms)

2. User searches "iPhone 15"
   → Rate limiter: OK (under limit)
   → Load Balancer → App Server A2
   → Redis cache: HIT! Return cached results (1ms)

3. User clicks on iPhone 15 product
   → Rate limiter: OK
   → LB → App Server A1
   → Redis cache: MISS
   → Database Shard B: Fetch product (15ms)
   → Store in Redis for next user
   → Return to user

4. User clicks "Buy Now"
   → Rate limiter: OK (under 5/min for orders)
   → LB → App Server A3
   → Process payment (synchronous)
   → Push to Kafka: "Send email, SMS, update inventory"
   → Return "Order Placed!" to user (fast!)
   → Kafka consumers handle rest in background

Interview Questions - Quick Fire!

Q: How would you design a system to handle millions of requests per second?

"I would use a combination of horizontal scaling with auto-scaling to add/remove servers based on traffic, a load balancer to distribute requests evenly, Redis caching to serve 80% of reads from memory, database sharding to split data across multiple nodes, Kafka message queues for async processing, a CDN for static content, and rate limiting to protect from abuse. This is exactly how companies like Amazon and Flipkart handle billions of requests."

Q: What is the difference between Vertical and Horizontal Scaling?

"Vertical scaling means making a single server more powerful (more CPU, RAM), but it has hardware limits and is expensive. Horizontal scaling means adding more servers to handle the load. It's cheaper, has no theoretical limit, and provides fault tolerance. Companies like Amazon use horizontal scaling with auto-scaling groups."

Q: What is Database Sharding?

"Sharding is splitting a large database into smaller pieces called shards, each holding a subset of data. For example, users A-M go to Shard 1, N-Z go to Shard 2. It distributes the load, reduces query time, and removes the single database bottleneck. Common strategies include hash-based sharding, range-based sharding, and geo-based sharding."

Q: Why use Redis for caching? Why not just query the database?

"Redis stores data in-memory (RAM), which is 100x faster than disk-based database queries. For data that is read frequently but changes rarely (like product details, search results), caching in Redis means most requests never hit the database. This dramatically reduces database load and response times from 50ms to under 1ms."

Q: What is Kafka and why use it?

"Kafka is a distributed message streaming platform. It decouples services by letting producers push messages to topics, and consumers process them asynchronously. For example, when a user places an order, the API immediately responds while Kafka handles email notifications, inventory updates, and analytics in the background. It supports millions of messages per second with guaranteed ordering and durability."

Q: What is a CDN and how does it help?

"A CDN (Content Delivery Network) is a network of servers distributed globally that caches static content like images, CSS, JS near the user. Instead of fetching an image from a server in US (300ms), a user in Mumbai gets it from a CDN edge in Mumbai (20ms). This reduces latency, offloads the origin server, and makes the application faster globally."

Q: What is a Load Balancer and what algorithms does it use?

"A load balancer distributes incoming traffic across multiple servers to prevent any single server from being overwhelmed. Common algorithms include Round Robin (requests go to servers in rotation), Least Connections (send to the server with fewest active connections), IP Hash (same user always goes to the same server), and Weighted Round Robin (powerful servers get more traffic)."

Q: Why is Rate Limiting important at scale?

"Rate limiting controls how many requests a user/IP can make in a time window. At scale, it protects against bot attacks (buying all stock in a flash sale), DDoS attacks (flooding servers with fake traffic), and scraping (competitors extracting data). Redis is commonly used for rate limiting because it provides fast, shared counters across all servers."

Q: What is the Cache Aside pattern?

"Cache Aside (also called Lazy Loading) is a caching strategy where: 1) The application first checks the cache, 2) If data exists (cache hit), return it immediately, 3) If data doesn't exist (cache miss), query the database, store the result in cache, then return it. This ensures the cache is populated on-demand and only contains data that's actually being requested."

Q: What happens if a cache server goes down?

"If Redis goes down, all requests fall back to the database, which may cause a sudden spike in database load - this is called a Cache Stampede. To prevent this: use Redis Cluster with replication (if one node dies, replica takes over), implement circuit breakers, use cache warming on startup, and set different TTLs to prevent all keys from expiring at once."

Quick Recap

Strategy             What It Does                           Impact
──────────────────   ────────────────────────────────────   ───────────────────────────────────
Horizontal Scaling   Add more servers, auto-scale           Handle unlimited traffic
Load Balancer        Distribute requests evenly             No single server overloaded
DB Sharding          Split database into pieces             Faster queries, no DB bottleneck
Caching (Redis)      Store hot data in memory               100x faster reads, 80% less DB load
Queues (Kafka)       Process tasks asynchronously           Faster response, decoupled services
CDN                  Serve static files from nearest edge   5x faster page loads globally
Rate Limiting        Block excessive requests               Protection from bots and DDoS

Key Points to Remember

  • Horizontal Scaling = Add more servers (scale OUT), not bigger servers (scale UP)
  • Auto-Scaling = Servers automatically increase during peak, decrease after
  • Load Balancer = Traffic police, distributes requests using Round Robin, Least Connections etc.
  • DB Sharding = Split database by hash, range, or geography
  • Hash-based sharding = shard = hash(key) % num_shards (most even distribution)
  • Redis Cache = In-memory store, 100x faster than database, use Cache Aside pattern
  • Cache HIT = Data found in cache (fast!) | Cache MISS = Not in cache, query DB
  • TTL = Time To Live, how long data stays in cache before refresh
  • Kafka = Message queue for async tasks (email, SMS, analytics)
  • Kafka Partitions = Enable parallel consumption, increase throughput
  • CDN = Edge servers worldwide, serve images/CSS/JS from nearest location
  • CDN for static only, NOT for dynamic API responses or user data
  • Rate Limiting = Different limits for different endpoints (strict for orders, relaxed for browsing)
  • Redis for Rate Limiting = Fast shared counters across all servers
  • Cache Stampede = When cache goes down, all requests hit DB. Prevent with Redis Cluster + replication
  • No single strategy is enough - use ALL 7 together for true scale!

What's Next?

Now you understand how companies like Amazon and Flipkart handle billions of requests! In the next episode, we can explore:

  • Microservices Architecture - How Amazon splits into 1000+ services
  • Kubernetes - Container orchestration at scale
  • Database Replication - Master-Slave and Read Replicas
  • System Design Interview Preparation

Keep coding, keep learning! See you in the next one!