What is a CDN?

Layman's Analogy

Imagine a bookstore in New York that sells a best-selling book. Now, people in Tokyo and Bangalore also want this book. Shipping it from New York every time is slow and expensive.

To solve this, the publisher creates mini-warehouses (distribution centers) in Tokyo and Bangalore. These warehouses keep a local copy of the book.

Now, customers from Tokyo and Bangalore get the book faster, cheaper, and without overloading the New York store.

That’s a CDN — a global network of servers that store cached versions of web content closer to users.


Technical Definition

A Content Delivery Network (CDN) is a geographically distributed network of proxy servers and data centers. The goal is to deliver static and dynamic content (images, videos, CSS, JS, HTML) quickly by caching it closer to the user's location.

When a user makes a request:

  • If the CDN has the content → it serves it locally (called a cache hit)
  • If not → it fetches from the origin server, caches it, and then serves it (called a cache miss)
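The hit/miss behaviour above can be sketched as a small pull-through cache. This is only an illustration of the idea, not how any particular CDN is implemented; `EdgeCache`, `fake_origin`, and the path used are hypothetical names chosen for the example.

```python
from typing import Callable, Dict, List, Tuple


class EdgeCache:
    """Toy pull-through cache standing in for one CDN edge server."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin      # hypothetical origin fetcher
        self._store: Dict[str, bytes] = {}   # path -> cached body

    def get(self, path: str) -> Tuple[bytes, str]:
        if path in self._store:
            return self._store[path], "HIT"  # served locally
        body = self._fetch(path)             # cache miss: go to the origin
        self._store[path] = body             # cache fill for the next request
        return body, "MISS"


origin_calls: List[str] = []


def fake_origin(path: str) -> bytes:
    """Stand-in for the origin server; records every request it receives."""
    origin_calls.append(path)
    return f"content of {path}".encode()


edge = EdgeCache(fake_origin)
first = edge.get("/main.js")    # first request reaches the origin
second = edge.get("/main.js")   # second request is answered from the edge
```

Only the first request reaches `fake_origin`; the second is answered from the edge's own store, which is the load reduction the text describes.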

Example Scenario: Global Full Stack App

Imagine a full-stack app hosted in Oregon, USA. The user base is in:

  • 🇺🇸 America (fast access)
  • 🇯🇵 Japan (slow)
  • 🇮🇳 India (slow)

Problems Without CDN:

  • High latency for Asia (200ms+ round-trip)
  • Heavy bandwidth usage on origin server
  • Poor user experience (slow loading time)
  • Origin server can become overloaded or crash under regional traffic spikes

What CDN Fixes:

  • Japan & India users get content from nearby CDN edge servers (e.g., Tokyo, Mumbai)
  • Static files can load roughly 3–4x faster, depending on distance and network conditions
  • Reduces origin server load by serving cached content
  • Helps with DDoS protection, SSL termination, and content routing

How CDN Works – End-to-End Flow

User -> DNS resolves to nearest edge node (via Geo DNS or Anycast)
    |
CDN checks if content is cached
    |
[Cache Hit] -> Serve from edge
    |
[Cache Miss] -> Fetch from origin -> cache it -> serve

CDNs use TTL (Time-to-Live) for caching duration. After that, they revalidate or refetch.
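A TTL check can be sketched as a comparison between an entry's age and its allowed lifetime. The `CachedEntry` type and the timestamps below are assumptions made for illustration; real CDNs derive the TTL from response headers and their own configuration.

```python
from dataclasses import dataclass


@dataclass
class CachedEntry:
    body: bytes
    stored_at: float   # seconds, on whatever clock the caller uses
    ttl: float         # seconds the entry may be served without asking the origin


def is_fresh(entry: CachedEntry, now: float) -> bool:
    """True while the entry is younger than its TTL."""
    return (now - entry.stored_at) < entry.ttl


entry = CachedEntry(body=b"logo", stored_at=1000.0, ttl=3600.0)
fresh_after_10_min = is_fresh(entry, now=1000.0 + 600)     # still within 1 hour
fresh_after_2_hours = is_fresh(entry, now=1000.0 + 7200)   # past the TTL
```

Once `is_fresh` returns `False`, the edge would revalidate with the origin or refetch the content before serving it again.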


CDN vs Redis – What's the Difference?

| Feature | CDN (e.g. Cloudflare, Akamai) | Redis (e.g. in-memory distributed cache) |
| --- | --- | --- |
| Scope | Global network serving static files | In-memory store for application data |
| Use Case | Faster delivery of JS, CSS, images, HTML | Fast retrieval of computed/query data |
| Where it runs | Edge locations (globally) | Inside VMs, clusters, or nodes |
| Protocol | HTTP/S | TCP/IP (binary/text protocols) |
| Auto population | Pull model (on cache miss) | Push model (manual caching by app) |

CDN is for website/static content delivery, while Redis is for low-latency data access in apps.


When to Upload Content to CDN?

There are two modes:

  1. Push CDN (manual upload):

    • Developer uploads static assets to CDN in advance.
    • Used for large file distribution (e.g., game assets).
    • Examples: Amazon CloudFront with an S3 bucket as origin (assets are uploaded to S3 in advance), Bunny CDN.
  2. Pull CDN (lazy caching):

    • CDN fetches from the origin on demand and caches it.
    • Common for modern SPAs (React, Vue, etc.)
    • Examples: Cloudflare, Akamai, Fastly

In both, cache invalidation is controlled via:

  • TTL headers
  • Versioning of files (e.g., main.v3.js)
  • Cache purge APIs
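File versioning is often automated by embedding a hash of the file's content in its name, so that any change produces a new URL that no cache has seen before. The sketch below assumes bare file names and a hypothetical helper `versioned_name`; build tools implement the same idea in their own ways.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_name(filename: str, content: bytes, digest_len: int = 8) -> str:
    """Return a name of the form '<stem>.<hash><suffix>' derived from the content."""
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    path = PurePosixPath(filename)
    return f"{path.stem}.{digest}{path.suffix}"


old = versioned_name("main.js", b"console.log('v1');")
new = versioned_name("main.js", b"console.log('v2');")
same = versioned_name("main.js", b"console.log('v1');")
```

Because unchanged content always maps to the same name, such files can be cached with a very long TTL, while a changed file is simply requested under its new name.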

Why Not Just Deploy Apps in Each Region?

It sounds logical but comes with tradeoffs:

| Option | Pros | Cons |
| --- | --- | --- |
| Multiple App Deployments | Faster access | Harder sync, costlier, requires DB/data sync |
| CDN | Simple, fast for static content | Doesn’t help with DB/data-driven pages directly |

Deploying to each region requires:

  • Multi-region database replication
  • Failover handling
  • Session and consistency logic

As a rough rule of thumb, a CDN solves 80% of the problem with 10% of the effort.


Popular CDN Providers

| Provider | Features | Link |
| --- | --- | --- |
| Cloudflare | DDoS protection, free tier, edge workers | cloudflare.com |
| Akamai | Enterprise-grade CDN, advanced analytics | akamai.com |
| Fastly | Real-time config, edge logic | fastly.com |
| Amazon CloudFront | Integrated with AWS infra | aws.amazon.com/cloudfront |
| Bunny CDN | Affordable, high-speed | bunny.net |

When to Use a CDN?

Ideal Situations:

  • Static-heavy websites (marketing, blogs, SPAs)
  • Global traffic distribution
  • APIs serving cached responses
  • Media delivery (videos, images, fonts)
  • Protecting app from DDoS

Not Ideal:

  • Real-time dynamic pages with no cache headers
  • Private or sensitive data delivery
  • Heavily personalized content per user

Summary – CDN in One Line:

A CDN is a global cache layer that accelerates and protects static content delivery across regions, without needing to deploy the app everywhere.

The walkthrough below goes step by step through how CDNs and edge servers work behind the scenes, and why a request to something like mySampleApp.com doesn't always go directly to the main application.


Scene: User Accesses mySampleApp.com

Step 1: DNS Lookup Happens First

When a user in India opens a browser and types https://mySampleApp.com, the browser doesn’t instantly contact the origin server (e.g., hosted in Oregon, USA). Instead:

  1. A DNS query is sent to resolve mySampleApp.com into an IP address.
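  2. If a CDN is configured, the domain's DNS records point at the CDN (typically through a CNAME record or by using the CDN provider's nameservers), so the lookup returns the address of a CDN edge rather than the origin.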
  3. The CDN (e.g., Cloudflare, Akamai, Fastly) uses GeoDNS or Anycast routing to route the user to the nearest edge server (say, in Mumbai).

Step 2: Nearest Edge Server (CDN Node) Handles It

The edge server is a local server that is part of the CDN provider’s global network. Its job is:

  • To serve cached static files (like JS, CSS, HTML, images).
  • To reduce distance between user and content.
  • To act as a shield between the user and origin server.

So a request reaches the main app/server only when the edge cannot answer it from its cache.


Step 3: Cache Hit or Miss

  • Cache Hit: If the edge server in Mumbai already has the latest version of the file, it serves it immediately.
  • Cache Miss: If not, it pulls from the origin server (e.g., in Oregon), caches it, and then sends it to the user.

The next time another user in India requests the same file → boom! Cache hit.


Step 4: Full App Loading

Now think of a React/Vue frontend:

  • All the static files (JS, CSS, HTML) are cached on edge servers.
  • When the user loads mySampleApp.com, the files come from Mumbai CDN.
  • The actual API calls, like fetching user data or transactions, may still go to the origin server unless the APIs are also behind a caching CDN (which is possible with modern CDNs + reverse proxies).

What Is an Edge Server?

  • A physical server in a CDN provider's PoP (Point of Presence) close to the user.
  • It acts like a local mirror of your app's static assets or even API data.
  • Handles requests faster than sending them halfway around the world.

How Does the Call Reach the Nearest Edge Server?

Here's the technical magic:

| Mechanism | Description |
| --- | --- |
| GeoDNS | DNS resolution based on geographical IP lookup (resolves to nearest CDN node). |
| Anycast IPs | All CDN edge servers advertise the same IP; routing protocols like BGP direct the user to the nearest one in network terms, which is usually also the geographically closest. |
| CDN Proxy | Once resolved, the IP points to the CDN edge node → it proxies to origin if needed. |
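The GeoDNS idea of resolving to the nearest node can be sketched as a distance comparison. This is a simplification: real GeoDNS services rely on IP geolocation databases, and Anycast relies on BGP path selection rather than geographic distance. The PoP list and the approximate coordinates below are assumptions for illustration.

```python
import math
from typing import Dict, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in degrees

# Hypothetical PoP list with approximate coordinates.
EDGE_LOCATIONS: Dict[str, Coord] = {
    "mumbai": (19.08, 72.88),
    "tokyo": (35.68, 139.69),
    "oregon": (45.52, -122.68),
}


def haversine_km(a: Coord, b: Coord) -> float:
    """Great-circle distance between two points on Earth, in kilometres."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def nearest_edge(user_location: Coord) -> str:
    """Pick the PoP with the smallest great-circle distance to the user."""
    return min(EDGE_LOCATIONS,
               key=lambda name: haversine_km(user_location, EDGE_LOCATIONS[name]))


bangalore = (12.97, 77.59)
osaka = (34.69, 135.50)
seattle = (47.61, -122.33)
choices = [nearest_edge(city) for city in (bangalore, osaka, seattle)]
```

With these inputs, a user in Bangalore is mapped to the Mumbai PoP, one in Osaka to Tokyo, and one in Seattle to Oregon.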

Is the Application Controlling This?

No. The CDN and DNS system handle all routing transparently. The application isn't writing logic like "if user is in Japan, serve from Tokyo".

This is all handled via:

  • CDN configuration
  • Cache-control headers
  • DNS settings

Summary in 1 Line

Users don’t talk directly to the app server; their requests are routed to the nearest edge server via DNS and Anycast, and the edge server handles static assets or proxies to origin for fresh data.

Flow of Request (CDN + Edge Server + App)

  1. User enters mySampleApp.com in the browser.

  2. The DNS resolves to a CDN edge server closest to the user's region (using GeoDNS or Anycast routing).

  3. The CDN edge server (nearest one):

    • Checks if the static content (JS, CSS, HTML, images) is already cached.
    • If cached → immediately serves it (cache hit).
    • If not cached → fetches it from the origin app server, then caches it for the next request (cache miss).
  4. For API calls or dynamic content (e.g., /api/orders, /api/user/123):

    • These requests are usually not cached: they either use a separate API hostname that bypasses the CDN, or pass through the edge to the backend app without being stored.
    • The backend app might be hosted in one region (e.g., Oregon), or across regions in a multi-region setup.
    • The backend server talks to the database, processes, and returns dynamic content to the frontend.

Summary of What Happens and When

| Content Type | Served From | Notes |
| --- | --- | --- |
| Static files (JS, CSS, HTML, images) | CDN edge servers (cached) | Fastest access, reduces origin load |
| First-time static file request | CDN fetches from origin → caches it | Cache miss followed by cache fill |
| API/data calls (user-specific/dynamic) | Origin app/backend server | Not cached by CDN unless specifically configured |
| CDN TTL expires | CDN re-fetches from origin | Ensures freshness |

Extra Insight: Can API Calls Be Cached?

Yes, but only:

  • If the data is neither user-specific nor changing very frequently, e.g., /api/products/top-sellers.
  • With correct HTTP cache headers (Cache-Control, ETag, etc.).
  • With CDN rules like those in Cloudflare, Akamai, or Fastly.

For most real-time or user-specific data, caching is avoided to ensure data accuracy.


1. How to Control What Gets Cached

A. By File Type (Default CDN Behavior)

Most CDNs automatically cache static file types like:

  • .js, .css, .png, .jpg, .html, .svg, .woff, etc.

B. Manual CDN Rules (Configuration)

  • On platforms like Cloudflare, AWS CloudFront, or Akamai, rules can be defined:

    • Which paths or extensions to cache
    • What headers to ignore or respect
    • How long to cache (TTL)
    • Whether to cache API responses or not

Example (CloudFront Rule):

/static/* → cache for 7 days
/api/* → do not cache
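Path-based rules like these can be thought of as an ordered list of patterns, each with a TTL. The sketch below is not CloudFront's configuration format; it is a hypothetical in-code equivalent, and the choice to leave unmatched paths uncached is an assumption.

```python
from fnmatch import fnmatchcase
from typing import List, Optional, Tuple

# (pattern, TTL in seconds); None means "do not cache". First match wins.
CACHE_RULES: List[Tuple[str, Optional[int]]] = [
    ("/static/*", 7 * 24 * 3600),   # cache for 7 days
    ("/api/*", None),               # do not cache
]


def ttl_for(path: str) -> Optional[int]:
    """Return the TTL of the first matching rule, or None if nothing should be cached."""
    for pattern, ttl in CACHE_RULES:
        if fnmatchcase(path, pattern):
            return ttl
    return None  # assumption: paths without a rule are not cached


static_ttl = ttl_for("/static/js/main.js")
api_ttl = ttl_for("/api/orders")
other_ttl = ttl_for("/index.html")
```

Here `static_ttl` is 604800 seconds (7 days), while the API path and the unmatched path both come back as `None`.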


2. By Code – Using HTTP Cache Headers

This is the developer-controlled method via backend responses or web server config:

| Header | Purpose |
| --- | --- |
| Cache-Control | Defines caching policy (e.g., max-age=3600, no-store) |
| ETag | A unique content identifier for conditional caching |
| Expires | Explicit expiration date/time |
| Vary | Indicates which headers CDN should consider when caching (e.g., Vary: User-Agent) |
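The effect of Vary is easiest to see in how a cache builds its lookup key: the request headers named in Vary become part of the key, so different header values are stored as separate entries. The function `cache_key` below is a hypothetical sketch of that idea, not the key format of any specific CDN.

```python
from typing import Mapping, Tuple


def cache_key(method: str, url: str,
              request_headers: Mapping[str, str],
              vary: str) -> Tuple[str, ...]:
    """Build a cache key from the URL plus every request header named in Vary."""
    lowered = {k.lower(): v for k, v in request_headers.items()}
    names = sorted(h.strip().lower() for h in vary.split(",") if h.strip())
    varied = tuple(f"{name}={lowered.get(name, '')}" for name in names)
    return (method.upper(), url) + varied


gzip_key = cache_key("GET", "/main.js", {"Accept-Encoding": "gzip"}, "Accept-Encoding")
br_key = cache_key("GET", "/main.js", {"Accept-Encoding": "br"}, "Accept-Encoding")
plain_key = cache_key("GET", "/main.js", {"Accept-Encoding": "gzip"}, "")
```

This is also why varying on a high-cardinality header such as User-Agent tends to fragment the cache: almost every distinct value creates its own entry.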

Example: Static File Response Headers

Cache-Control: public, max-age=86400
ETag: "v1.3.4"
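On the server side, headers like these pair naturally with conditional requests: if the client sends back the ETag it already holds, the server can answer 304 Not Modified with no body. The sketch below is a simplified illustration with hypothetical helper names; it derives the ETag from a content hash (the example above uses a version string instead) and ignores weak validators and multi-value If-None-Match headers.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive a quoted ETag from the body, so a content change yields a new tag."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes,
            if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a static file request."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "public, max-age=86400"}
    if if_none_match == etag:
        return 304, headers, b""   # the client's copy is still valid
    return 200, headers, body


css = b"body { margin: 0; }"
status1, headers1, body1 = respond(css, None)               # first visit
status2, headers2, body2 = respond(css, headers1["ETag"])   # revalidation
status3, _, body3 = respond(b"body { margin: 1px; }", headers1["ETag"])  # file changed
```

The revalidation round trip still costs latency, but it avoids re-sending an unchanged body.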

Example: API Response (no caching)

Cache-Control: no-store, no-cache, must-revalidate
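How a shared cache such as a CDN might interpret these directives can be sketched with a small parser. This is a deliberately reduced model of HTTP caching rules, with hypothetical function names; real caches handle many more directives and edge cases.

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control value into {directive: argument or None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def shared_cache_ttl(value: str) -> Optional[int]:
    """Seconds a shared cache may serve the response without contacting the
    origin, or None if it may not do so at all."""
    d = parse_cache_control(value)
    if "no-store" in d or "private" in d or "no-cache" in d:
        return None
    for name in ("s-maxage", "max-age"):   # s-maxage wins for shared caches
        arg = d.get(name)
        if arg is not None:
            return int(arg)
    return None


static_ttl = shared_cache_ttl("public, max-age=86400")
api_ttl = shared_cache_ttl("no-store, no-cache, must-revalidate")
cdn_ttl = shared_cache_ttl("public, max-age=60, s-maxage=600")
```

The last case shows a common pattern: a short `max-age` for browsers combined with a longer `s-maxage` for the CDN.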

3. Manual Upload or Push (for CDN-backed storage)

For CDNs integrated with object storage like:

  • S3 + CloudFront
  • Azure Blob + Azure CDN
  • Google Cloud Storage + Cloud CDN

Assets can be uploaded to the storage bucket, from which the CDN pulls them on demand (pull-based), or pushed explicitly to edge servers (push-based).


4. Cache Invalidation / Purging

Sometimes, content needs to be updated before TTL expires:

  • Invalidate specific paths/files via CDN panel or CLI
  • Use API-based purge (e.g., Fastly, Akamai, Cloudflare offer this)
  • Use versioned file names (main-v2.css) so the browser/CDN sees it as new
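Conceptually, a purge removes matching entries from the cache so that the next request becomes a cache miss. The sketch below models this on an in-memory dictionary; it does not call any provider's purge API, and the wildcard syntax is an assumption for illustration.

```python
from fnmatch import fnmatchcase
from typing import Dict, List


def purge(cache: Dict[str, bytes], pattern: str) -> List[str]:
    """Remove every cached path matching the pattern; return what was removed."""
    doomed = sorted(path for path in cache if fnmatchcase(path, pattern))
    for path in doomed:
        del cache[path]
    return doomed


cache = {
    "/static/main.css": b"old css",
    "/static/app.js": b"old js",
    "/index.html": b"<html>",
}
removed = purge(cache, "/static/*")
remaining = sorted(cache)
```

After the purge, only `/index.html` is left in the cache, and the two static files would be fetched from the origin again on their next request.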

Quick Scenario Flow

  • Backend serves static assets with headers: Cache-Control: max-age=3600
  • CDN caches these for 1 hour (controlled by header)
  • Developer updates main.js, uploads main-v2.js, updates index.html
  • CDN receives a request for the new file name → cache miss → fetches and caches that version too

Key Takeaway:

  • CDN behavior is a contract between:

    • Developer-defined headers
    • CDN rules
    • File paths/types

It’s fully controllable from both code and infrastructure config.