Web Request, DNS, and CDN Caching Reference Guide
DNS Resolution
DNS (Domain Name System) translates human-readable domain names into IP addresses so browsers can reach the correct server.
Resolution Process
- Browser/OS cache: Checks whether the IP is already known.
- Local DNS resolver: If the name is not cached, the OS queries a recursive resolver, typically the one provided by the ISP or a public resolver.
- Recursive lookups: On a resolver cache miss, the resolver queries the DNS hierarchy in turn:
  - Root servers: Point to the appropriate TLD servers (e.g., .com, .net).
  - TLD servers: Direct to the authoritative servers for the domain.
  - Authoritative servers: Provide the actual IP address of the domain.
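The resolution steps above can be sketched with in-memory tables. This is a minimal illustration, not a real resolver: the `ROOT`, `TLD`, and `AUTHORITATIVE` dictionaries, the server names, and the `resolve` function are all hypothetical, and the addresses come from documentation-only ranges.

```python
# Hypothetical in-memory stand-ins for the real DNS hierarchy.
ROOT = {"com": "tld-com"}                                   # root -> TLD server
TLD = {"tld-com": {"example.com": "ns.example.com"}}        # TLD -> authoritative server
AUTHORITATIVE = {"ns.example.com": {"www.example.com": "192.0.2.10"}}

local_cache = {}  # stands in for the browser/OS cache


def resolve(hostname):
    """Return (ip, steps), following cache -> root -> TLD -> authoritative."""
    if hostname in local_cache:
        return local_cache[hostname], ["cache"]
    labels = hostname.split(".")
    tld_server = ROOT[labels[-1]]              # root points to the TLD server
    domain = ".".join(labels[-2:])
    auth_server = TLD[tld_server][domain]      # TLD points to the authoritative server
    ip = AUTHORITATIVE[auth_server][hostname]  # authoritative server returns the IP
    local_cache[hostname] = ip                 # later lookups stop at the cache
    return ip, ["root", "tld", "authoritative"]
```

Calling `resolve("www.example.com")` twice walks the full chain the first time and answers from `local_cache` the second time.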
Key Points
- ICANN oversees global domain management.
- Registries manage TLDs; registrars sell domains to end users.
- Name servers can be operated by hosting companies, CDNs, or managed DNS providers.
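DNS Records
- A/AAAA: Map a domain or subdomain directly to an IP address (A for IPv4, AAAA for IPv6).
- CNAME: Aliases one domain name to another domain name, not to an IP.
- Hostnames served through a CDN or managed hosting are often pointed at the provider with a CNAME, while hosts served directly from a VPS use A/AAAA records.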
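To make the difference between the record types concrete, here is a small sketch of a lookup that follows CNAME aliases until it reaches an address record. The `RECORDS` table, the hostnames, and the `lookup` helper are illustrative assumptions, not real zone data or a real DNS library API.

```python
# Hypothetical zone data: (name, record type) -> value.
RECORDS = {
    ("www.example.com", "CNAME"): "example.cdn-provider.test",
    ("example.cdn-provider.test", "A"): "203.0.113.7",
    ("vps.example.com", "A"): "192.0.2.10",
    ("vps.example.com", "AAAA"): "2001:db8::10",
}


def lookup(name, rtype="A", max_hops=8):
    """Follow CNAME aliases until a record of the requested type is found."""
    for _ in range(max_hops):
        if (name, rtype) in RECORDS:
            return RECORDS[(name, rtype)]
        if (name, "CNAME") in RECORDS:
            name = RECORDS[(name, "CNAME")]  # alias: restart the lookup at the target
            continue
        return None                          # no such name or record type
    raise RuntimeError("CNAME chain too long")
```

Here `lookup("www.example.com")` goes through the CDN alias before reaching an address, while `lookup("vps.example.com", "AAAA")` is answered directly.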
Diagram: DNS Resolution Flow
User Browser
|
v
OS/Browser Cache
|
v
Local DNS Resolver
|
v
Root Server -> TLD Server -> Authoritative Server
|
v
Return IP to Browser
Origin Servers and CDN Basics
- Origin server: Your VPS or main web server where content resides.
- CDN (Content Delivery Network): Network of edge servers that cache content globally.
- Benefits of using a CDN:
  - Reduced latency
  - Load balancing
  - Failover support
Edge Servers
- Independent servers with local caches.
- Serve content to users closest to them.
- Distributed caching means a cache miss at one edge doesn't imply a global miss.
Multi-Origin Setup
- CDNs can be configured with multiple origins.
- Edge servers choose the best origin based on:
  - Geography (closest origin)
  - Latency measurements
  - Availability/failover
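One way to combine the three selection criteria is sketched below. The `Origin` fields, the region labels, and the ordering rule (healthy first, then same region, then lowest latency) are assumptions for illustration; real CDNs expose their own configuration for this.

```python
from dataclasses import dataclass


@dataclass
class Origin:
    name: str
    region: str        # hypothetical region label, e.g. "eu" or "na"
    latency_ms: float  # most recent latency measurement from this edge
    healthy: bool      # result of the latest health check


def choose_origin(origins, edge_region):
    """Pick a healthy origin, preferring the edge's region, then lowest latency."""
    candidates = [o for o in origins if o.healthy]  # availability/failover
    if not candidates:
        return None
    # False sorts before True, so same-region origins win; latency breaks ties.
    return min(candidates, key=lambda o: (o.region != edge_region, o.latency_ms))
```

If the same-region origin fails its health check, the function falls back to the next best healthy origin rather than returning nothing.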
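Stale-While-Revalidate (SWR)
- Allows an edge to serve stale content while asynchronously fetching fresh content from the origin.
- Improves perceived latency, because users do not wait for revalidation, but every revalidation still reaches the origin, so total origin requests are not reduced.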
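The behavior above can be sketched with a simulated clock. The `SwrCache` class, its parameters, and the explicit `run_revalidations` step are hypothetical; a real edge would perform the refresh in the background rather than through a separate call.

```python
class SwrCache:
    def __init__(self, fetch, ttl, swr_window):
        self.fetch = fetch            # callable(key) -> value; stands in for an origin request
        self.ttl = ttl                # seconds an entry counts as fresh
        self.swr_window = swr_window  # extra seconds a stale entry may still be served
        self.store = {}               # key -> (value, stored_at)
        self.pending = []             # keys awaiting revalidation

    def get(self, key, now):
        entry = self.store.get(key)
        if entry is not None:
            value, stored_at = entry
            age = now - stored_at
            if age <= self.ttl:
                return value, "fresh"
            if age <= self.ttl + self.swr_window:
                if key not in self.pending:
                    self.pending.append(key)  # schedule a refresh, do not wait for it
                return value, "stale"
        value = self.fetch(key)               # miss or too stale: fetch synchronously
        self.store[key] = (value, now)
        return value, "miss"

    def run_revalidations(self, now):
        """Simulate the asynchronous refresh of every pending key."""
        while self.pending:
            key = self.pending.pop()
            self.store[key] = (self.fetch(key), now)
```

Note that the stale response costs the user nothing extra, but the refresh still calls `fetch`, which is why the origin request count does not drop.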
Diagram: CDN Edge Request Flow
User Browser
|
v
CDN Edge Server
  |-- Cache Hit --> Serve Content
  |-- Cache Miss --> Fetch from Origin
        |-- Primary Origin Available? --> Yes: Serve & Cache
        |-- Primary Origin Down --> Alternate Origin: Serve & Cache
CDN Edge Nodes and Caching Behavior
- Edge nodes: Separate physical/virtual servers distributed globally.
- Each node caches content independently.
- A cache hit rate of around 85% is typical; it stays below 100% because of:
  - Distributed cache architecture (each node must warm its own cache)
  - Invalidations and purges
  - Low TTL content
  - Dynamic routing to different nodes
- Multi-origin fallback ensures content delivery even if one origin is unavailable.
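As a rough worked example of what a hit rate means for origin load, the helpers below compute a hit ratio from counters and the number of requests that still reach the origin. The function names and the traffic figures are illustrative assumptions; only the 85% figure comes from the text above.

```python
def hit_ratio(hits, misses):
    """Fraction of requests answered from cache."""
    total = hits + misses
    return hits / total if total else 0.0


def origin_requests(total_requests, ratio):
    """Requests that miss the cache and are forwarded to an origin."""
    return round(total_requests * (1 - ratio))
```

Under these assumptions, one million requests at an 85% hit rate still send about 150,000 requests to the origin.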
Cache Lifecycle
- Cache miss triggers fetch from origin.
- Content is stored in the edge node cache.
- Subsequent requests served from cache until TTL expires or cache is invalidated.
Diagram: Distributed Edge Cache Example
User A (London) --> Edge Node 1: Cache Miss --> Origin (UK) --> Cached at Node 1
User B (London) --> Edge Node 2: Cache Miss --> Origin (UK or fallback) --> Cached at Node 2
Edge caches operate independently; one node's cache doesn't affect others.
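The cache lifecycle and the independence of edge nodes can be sketched together. The `EdgeNode` class, the `origin_fetch` callable, and the integer clock are hypothetical simplifications; real nodes also handle eviction, validation headers, and concurrency.

```python
class EdgeNode:
    def __init__(self, name, origin_fetch, ttl):
        self.name = name
        self.origin_fetch = origin_fetch  # callable(path) -> body; stands in for the origin
        self.ttl = ttl
        self.cache = {}                   # path -> (body, expires_at), private to this node

    def get(self, path, now):
        entry = self.cache.get(path)
        if entry is not None and now < entry[1]:
            return entry[0], "HIT"
        body = self.origin_fetch(path)             # miss or expired: go to the origin
        self.cache[path] = (body, now + self.ttl)  # store for subsequent requests
        return body, "MISS"

    def invalidate(self, path):
        """Drop a cached object before its TTL expires."""
        self.cache.pop(path, None)
```

Two `EdgeNode` instances sharing one origin each record their own miss for the same path, mirroring the two London users in the diagram.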
Surrogate Keys
- Surrogate keys: Labels/tags assigned to cached objects for targeted invalidation.
- Multiple keys per object allowed.
- Useful for grouping content logically (e.g., category, author, event).
- Invalidation by key allows selective cache purges without affecting unrelated content.
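A minimal sketch of key-based invalidation follows. The `TaggedCache` class and the key names are hypothetical, and the index is deliberately simple: it does not clean up old tags when an object is stored again with different keys.

```python
from collections import defaultdict


class TaggedCache:
    def __init__(self):
        self.objects = {}               # url -> body
        self.by_key = defaultdict(set)  # surrogate key -> urls tagged with it

    def put(self, url, body, keys):
        """Cache an object and index it under each of its surrogate keys."""
        self.objects[url] = body
        for key in keys:
            self.by_key[key].add(url)

    def purge_key(self, key):
        """Remove every cached object tagged with key; return how many were removed."""
        removed = 0
        for url in self.by_key.pop(key, set()):
            if self.objects.pop(url, None) is not None:
                removed += 1
        return removed
```

Purging a hypothetical `author-ann` key removes both of that author's posts in one call and leaves unrelated pages cached.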
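Surrogate Key Management
- Granularity: Avoid keys that are too broad (a purge evicts unrelated content) or too narrow (a change requires many separate purges).
- Naming conventions: Consistency is crucial for automation.
- Automation: Dynamically generate keys based on content metadata.
- Multi-origin/multi-edge consistency: Ensure keys match across all origins and edge nodes.
- Interaction with SWR and TTL: Key-based purges remove content that is known to have changed, while TTL and SWR bound how long everything else can remain stale.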
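Automated key generation with a consistent naming convention might look like the sketch below. The `<field>-<value>` format, the metadata field names, and both helper functions are assumptions for illustration, not a convention required by any particular CDN.

```python
import re


def slug(value):
    """Lowercase and collapse runs of non-alphanumerics into a single hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", str(value).lower()).strip("-")


def surrogate_keys(meta):
    """Build '<field>-<value>' keys from the metadata fields that are present."""
    keys = []
    for field in ("category", "author", "event"):
        if meta.get(field):
            keys.append(f"{field}-{slug(meta[field])}")
    return keys
```

Because every origin derives keys from the same metadata with the same function, a purge for a given key name matches the same objects regardless of which origin served them.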