Caching Strategies and Performance¶
Caching is one of the most effective performance optimizations. Understanding cache patterns, invalidation, and the full caching hierarchy is critical for architecture decisions.
Cache Hierarchy¶
| Layer | Latency | Scope | Example |
|---|---|---|---|
| L1/L2/L3 CPU cache | 1-10 ns | Single core/CPU | Hardware |
| In-process cache | ~100 ns | Single instance | HashMap, Guava, Caffeine |
| Distributed cache | ~1 ms | Cross-service | Redis, Memcached |
| CDN cache | 1-50 ms | Global edge | Cloudflare, CloudFront |
| Database cache | 1-10 ms | Single DB | Materialized views |
Cache Patterns¶
Cache-Aside (Lazy Loading) - Most Common¶
Application checks cache first. On miss, reads from DB, populates cache.

- Pro: simple, application controls caching logic
- Con: stale data until TTL or explicit invalidation; cache miss = slow first request
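A minimal cache-aside sketch. The dict-based cache and `db_query` are stand-ins for a real cache (e.g. Redis) and database; TTL handling is simplified to an expiry timestamp per entry.

```python
import time

CACHE: dict = {}          # key -> (value, expires_at)
TTL_SECONDS = 300

def db_query(user_id):
    # Placeholder for a real database read.
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(user_id):
    key = f"user:{user_id}"
    entry = CACHE.get(key)
    if entry and entry[1] > time.time():   # cache hit, not expired
        return entry[0]
    value = db_query(user_id)              # cache miss: read from DB
    CACHE[key] = (value, time.time() + TTL_SECONDS)  # populate cache
    return value
```

The application owns all caching decisions here, which is the pattern's defining trait; the cache stays a dumb key-value store.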
Read-Through¶
Cache sits between app and DB. On miss, cache itself loads from DB.

- Pro: simpler application code
- Con: cache must understand data source
Write-Through¶
Every write goes to cache AND DB synchronously.

- Pro: strong consistency, no stale data
- Con: higher write latency
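A write-through sketch using plain dicts as stand-ins for the cache and the backing store; the point is that both are updated in the same call, so reads never observe stale data.

```python
DB: dict = {}
CACHE: dict = {}

def write_through(key, value):
    DB[key] = value       # synchronous write to the source of truth
    CACHE[key] = value    # cache updated in the same request -> no staleness

def read(key):
    if key in CACHE:
        return CACHE[key]
    value = DB.get(key)   # fall back to DB, e.g. after a cache restart
    if value is not None:
        CACHE[key] = value
    return value
```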
Write-Behind (Write-Back)¶
Writes go to cache immediately, flushed to DB asynchronously.

- Pro: lowest write latency
- Con: risk of data loss if cache fails before flush
Refresh-Ahead¶
Proactively refreshes entries before expiration. Reduces cache miss latency for hot data. Requires access pattern prediction.
Cache Invalidation Strategies¶
| Strategy | Mechanism | Trade-off |
|---|---|---|
| TTL | Entries expire after fixed duration | Simple but serves stale data within window |
| Event-driven | Invalidate on data change (events, CDC) | Fresh data but more complex |
| Versioned keys | Include version in key (user:123:v5) | No explicit invalidation, old entries expire |
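The versioned-key row can be sketched as follows: bumping the version makes all old entries unreachable without any explicit delete, and they simply age out via TTL. In practice the version counter would itself live in the cache or DB; here it is an in-memory dict for illustration.

```python
CACHE: dict = {}
USER_VERSIONS: dict = {}   # user_id -> current cache-key version

def cache_key(user_id):
    version = USER_VERSIONS.get(user_id, 1)
    return f"user:{user_id}:v{version}"

def invalidate(user_id):
    # No delete needed: bump the version so all future reads miss.
    USER_VERSIONS[user_id] = USER_VERSIONS.get(user_id, 1) + 1
```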
Cache Stampede Prevention¶
When a popular key expires, many requests simultaneously hit the DB.

- Locking - only one request refreshes, others wait
- Probabilistic early expiration - add jitter so keys expire at different times
- Pre-computation - refresh before expiration
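Probabilistic early expiration can be sketched with the "XFetch" approach: each reader may recompute slightly before expiry, with probability rising as the deadline nears, so refreshes spread out instead of stampeding. `delta` is the last observed recompute cost and `beta` tunes eagerness; names are illustrative.

```python
import math
import random
import time

def should_refresh(expires_at, delta, beta=1.0, now=None):
    """Return True if this reader should recompute the cached value."""
    now = time.time() if now is None else now
    if now >= expires_at:
        return True                      # already expired: must refresh
    # -log(rand) is exponentially distributed, so the effective deadline
    # is pulled earlier by a random amount proportional to delta * beta.
    return now - delta * beta * math.log(random.random()) >= expires_at
```

Readers for whom `should_refresh` returns False keep serving the cached value, so at most a few clients recompute near the deadline rather than all of them at once.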
Redis vs Memcached¶
| Feature | Redis | Memcached |
|---|---|---|
| Data types | Strings, hashes, lists, sets, sorted sets, streams | Key-value only |
| Persistence | RDB snapshots, AOF log | None |
| Clustering | Redis Cluster (built-in) | Client-side (consistent hashing) |
| Pub/Sub | Yes | No |
| Threading | Single-threaded command execution (I/O threads since 6.0) | Multi-threaded |
| Best for | Complex data types, persistence, pub/sub | Simple key-value caching, multithreaded throughput |
Redis common uses: session store, rate limiting, leaderboards, real-time analytics.
Caching at Different Layers¶
- CDN - static assets (images, CSS, JS), Cache-Control headers, ETags
- API Gateway - cache API responses, TTL per endpoint
- Application - object caching, query result caching
- Page/fragment - rendered HTML for CMS, product pages
Load Balancing¶
Algorithms¶
- Round Robin / Weighted Round Robin - equal or proportional distribution
- Least Connections - send to least busy server
- IP Hash - sticky sessions (same client, same server)
- Consistent Hashing - for cache distribution
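Consistent hashing can be sketched with a small hash ring. Each node gets several virtual points so keys spread evenly, and removing a node only remaps the keys that hashed to its points. An illustrative sketch, not a production implementation; class and parameter names are made up.

```python
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes, vnodes=100):
        self.ring = []                      # sorted (point, node) pairs
        for node in nodes:
            for i in range(vnodes):         # vnodes virtual points per node
                point = self._hash(f"{node}#{i}")
                bisect.insort(self.ring, (point, node))

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def get_node(self, key):
        # Walk clockwise to the first point at or after the key's hash.
        point = self._hash(key)
        idx = bisect.bisect(self.ring, (point, "")) % len(self.ring)
        return self.ring[idx][1]
```

Because only the segment owned by a removed node moves, cache hit rates survive topology changes far better than with naive `hash(key) % n` sharding.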
L4 vs L7¶
| Layer | Level | Capabilities |
|---|---|---|
| L4 | TCP/UDP | Faster, no content inspection |
| L7 | HTTP | Content-based routing, SSL termination, caching |
Health Checks¶
- Active - probe backend periodically
- Passive - monitor real traffic errors
- Remove unhealthy backends from pool
Session Affinity¶
Sticky sessions simplify stateful apps but reduce even distribution. Prefer stateless backends with external session store (Redis).
Performance Optimization Checklist¶
- Profile first - measure, identify bottlenecks, optimize biggest impact
- Database - indexing, EXPLAIN ANALYZE, N+1 resolution, connection pooling, read replicas
- Connection pooling - reuse DB/HTTP connections, configure pool size per concurrency
- Compression - gzip/brotli for text content (JSON, HTML, CSS, JS)
- Async processing - heavy work to background queues, return job ID immediately
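The compression point is easy to see on repetitive JSON, where gzip often shrinks payloads several-fold; that is why gzip/brotli is enabled for text content but skipped for already-compressed images. A rough illustration:

```python
import gzip
import json

# Repetitive JSON payload, typical of list endpoints.
payload = json.dumps(
    [{"id": i, "status": "active"} for i in range(500)]
).encode()

compressed = gzip.compress(payload)
print(len(payload), len(compressed))   # compressed is much smaller
```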
Gotchas¶
- Cache invalidation is one of the two hardest problems in computer science (alongside naming things)
- Stale cache after deploy - new code reads data in new format, cache has old format. Clear cache on deploy or use versioned keys
- Cache warming - cold cache after restart means all requests hit DB. Pre-warm critical keys
- Distributed cache network - 1ms per Redis call adds up. Batch operations with pipelines
- CDN cache poisoning - caching error responses at the CDN serves errors to all users. Set Cache-Control: no-store for error responses
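The round-trip cost of per-key calls can be modeled with a toy counter: N naive GETs pay N round trips while one pipelined batch pays roughly one. `FakeCache` below only counts round trips; with redis-py the equivalent batching uses `r.pipeline()`.

```python
class FakeCache:
    """Counts network round trips instead of doing real I/O."""

    def __init__(self):
        self.data = {}
        self.round_trips = 0

    def get(self, key):
        self.round_trips += 1          # one round trip per call
        return self.data.get(key)

    def pipeline_get(self, keys):
        self.round_trips += 1          # whole batch in a single round trip
        return [self.data.get(k) for k in keys]

cache = FakeCache()
keys = [f"user:{i}" for i in range(100)]
for k in keys:
    cache.get(k)                       # 100 round trips
naive = cache.round_trips
cache.pipeline_get(keys)               # 1 additional round trip
print(naive, cache.round_trips - naive)
```

At ~1 ms per Redis round trip, the naive loop costs ~100 ms versus ~1 ms pipelined, which is the difference the gotcha is warning about.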
See Also¶
- distributed systems fundamentals - Latency numbers, replication
- queueing theory - Why load causes nonlinear degradation
- rest api advanced - HTTP caching headers, ETag, Cache-Control
- quality attributes reliability - Availability tools including auto-scaling