TL;DR: Cache invalidation is the problem of knowing when cached data is stale, addressed through three fundamental approaches: time-based expiry, event-based signals, and version-based fingerprinting.
How It Works
               ┌───────────────┐
               │ Cache: stale? │
               └───────┬───────┘
                       │
      ┌────────────────┼─────────────────┐
      │                │                 │
      ↓                ↓                 ↓
┌────────────┐  ┌─────────────┐  ┌───────────────┐
│ Time-Based │  │ Event-Based │  │ Version-Based │
└────────────┘  └─────────────┘  └───────────────┘
 max-age=3600    on mutation ->   content hash
                 invalidate       changes URL
Phil Karlton's famous observation that cache invalidation is one of the two hard problems in computer science still holds because no single strategy works universally. Every approach trades off between freshness (how quickly you see updated data), efficiency (how few unnecessary requests you make), and complexity (how much infrastructure you need).
Time-Based Invalidation (TTL)
The simplest strategy: data is considered valid for a fixed duration after it was cached. HTTP's Cache-Control: max-age=3600 tells the browser to reuse the cached response for 3600 seconds without contacting the server at all. After expiry, the cache must revalidate.
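The mechanism can be sketched as a small in-memory cache where each entry carries an expiry timestamp. This is an illustrative sketch (the `TtlCache` name and the injected clock are assumptions, not a real library API):

```typescript
// Minimal TTL cache sketch. Entries carry an expiry timestamp;
// reads past the expiry behave as a miss, forcing revalidation.
// A clock function is injected so expiry can be tested deterministically.
type Clock = () => number;

class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  constructor(private ttlMs: number, private now: Clock = Date.now) {}

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Returns undefined for both "never cached" and "expired" --
  // the caller cannot distinguish them, which is exactly TTL's limitation.
  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }
}
```

Note that nothing in the cache knows whether the underlying data actually changed; the expiry is purely a function of elapsed time.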
TTL works well for data that changes on predictable schedules or where brief staleness is acceptable. A news site's homepage might use a 60-second TTL -- you might see a headline one minute late, but the server handles far fewer requests. API responses for configuration data that changes hourly are another natural fit.
The weakness is obvious: TTL has no relationship to when data actually changes. If you set a 1-hour TTL and the data changes at minute 2, clients see stale data for 58 minutes. If the data never changes, clients still revalidate every hour. You are either stale too long or checking too often.
Short TTLs mitigate staleness at the cost of higher origin load. Long TTLs reduce load but increase staleness. There is no TTL value that is correct for data with irregular update patterns.
Event-Based Invalidation
Instead of guessing when data will change, event-based invalidation uses explicit signals. When the underlying data mutates, a notification propagates to all caches that hold copies. This is the model behind CDN purging APIs, database change notifications, and pub/sub cache invalidation.
In a frontend context, this manifests as cache busting on mutation. When a user updates their profile, the API response includes a cache-invalidation signal (either directly, or through a websocket/SSE channel) telling the client to discard the cached profile. GraphQL clients like Apollo and Relay implement this through normalized caches -- when a mutation updates entity X, all queries containing entity X are automatically invalidated.
The implementation complexity is significant. You need a reliable notification channel, a mapping from data changes to affected cache entries, and handling for missed events (network disconnections, crashed workers). Distributed systems face the additional challenge of propagation delay across nodes.
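The core of that mapping can be sketched as a cache that records which entities each cached result contains, so a mutation event can discard exactly the affected entries. The names (`QueryCache`, `onMutation`) are illustrative, not any particular library's API:

```typescript
// Event-based invalidation sketch: an index from entity IDs to the
// cache keys whose results contain them. A mutation event for an entity
// invalidates every query that touched it -- the model behind
// normalized GraphQL caches, simplified.
class QueryCache {
  private data = new Map<string, unknown>();
  private entityToKeys = new Map<string, Set<string>>();

  // Store a result and record which entities it contains.
  set(key: string, value: unknown, entities: string[]): void {
    this.data.set(key, value);
    for (const e of entities) {
      if (!this.entityToKeys.has(e)) this.entityToKeys.set(e, new Set());
      this.entityToKeys.get(e)!.add(key);
    }
  }

  get(key: string): unknown {
    return this.data.get(key);
  }

  // The mutation event: drop every cached query that contained the entity.
  onMutation(entity: string): void {
    for (const key of this.entityToKeys.get(entity) ?? []) this.data.delete(key);
    this.entityToKeys.delete(entity);
  }
}
```

The hard parts the sketch omits are exactly the ones named above: delivering `onMutation` reliably across a network, and recovering when events are missed.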
Version-Based Invalidation
Version-based strategies attach a fingerprint to cached data and use it to detect changes. HTTP ETags are the protocol-level implementation: the server generates a hash of the response body, sends it as the ETag header, and the client sends If-None-Match on subsequent requests. If the hash matches, the server responds with 304 Not Modified (no body). If not, it sends the full 200 response.
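The server side of that exchange can be sketched as follows; the `respond` handler shape is hypothetical, but the hash-compare-304 logic is the ETag mechanism itself:

```typescript
import { createHash } from "node:crypto";

// ETag revalidation sketch: hash the body, compare against the client's
// If-None-Match value, and answer 304 with no body on a match.
function respond(
  body: string,
  ifNoneMatch?: string,
): { status: number; etag: string; body?: string } {
  // Strong ETag derived from the response content (truncated for brevity).
  const etag = `"${createHash("sha256").update(body).digest("hex").slice(0, 16)}"`;
  if (ifNoneMatch === etag) return { status: 304, etag }; // client copy still valid
  return { status: 200, etag, body };                     // send the full response
}
```

The bandwidth saving comes from the 304 path: the client still pays a round trip, but not the body transfer.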
For static assets, content-based hashing (e.g., app.a1b2c3.js) is the gold standard. The filename itself is the version. You serve these files with Cache-Control: immutable, max-age=31536000 -- cached forever, never revalidated. When the content changes, the filename changes, which is a new URL and therefore a new cache entry. The old entry is eventually evicted naturally.
This eliminates both the staleness problem (new content gets a new URL immediately) and the revalidation overhead (existing entries are never checked). The cost is build tooling complexity: you need content hashing in your build pipeline and a way to update references (HTML, manifests, import maps) to point to the new filenames.
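The build step that produces those filenames amounts to hashing the file content into the name. A minimal sketch (the `hashedFilename` helper is hypothetical; real bundlers do this as part of emitting assets):

```typescript
import { createHash } from "node:crypto";

// Content-addressed filename sketch: derive the name from a hash of the
// content, so changed content yields a changed name -- a brand-new URL
// and therefore a brand-new cache entry.
function hashedFilename(name: string, content: string): string {
  const hash = createHash("sha256").update(content).digest("hex").slice(0, 8);
  const dot = name.lastIndexOf(".");
  return `${name.slice(0, dot)}.${hash}${name.slice(dot)}`;
}
```

Because the hash is deterministic, unchanged files keep their names across builds and stay cached; only genuinely changed files get new URLs.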
Combining Strategies
Production systems rarely use a single strategy. A typical pattern: static assets use content-hashed filenames (version-based, immutable caching). HTML documents use short TTLs (time-based, 5 minutes) combined with ETags (version-based revalidation after TTL expiry). API data uses event-based invalidation through normalized client caches, with a TTL fallback for when the real-time channel disconnects.
The stale-while-revalidate directive bridges time-based and version-based: serve the cached response immediately (even if the TTL has expired) while revalidating in the background. This gives users instant responses while ensuring the cache converges to fresh data within one additional request cycle.
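The client-side behavior can be sketched like so; `SwrCache`, `fetcher`, and `staleMs` are illustrative names, and a real implementation would also deduplicate concurrent refreshes:

```typescript
// stale-while-revalidate sketch: serve the cached value immediately,
// and if it is past its stale time, refresh it in the background so the
// next read sees fresh data.
type Entry<V> = { value: V; storedAt: number };

class SwrCache<V> {
  private entries = new Map<string, Entry<V>>();
  constructor(
    private staleMs: number,
    private fetcher: (key: string) => Promise<V>,
    private now: () => number = Date.now,
  ) {}

  async get(key: string): Promise<V> {
    const entry = this.entries.get(key);
    if (!entry) {
      // Cold miss: must wait for the network once.
      const value = await this.fetcher(key);
      this.entries.set(key, { value, storedAt: this.now() });
      return value;
    }
    if (this.now() - entry.storedAt >= this.staleMs) {
      // Stale hit: serve the old value now, converge in the background.
      void this.fetcher(key).then((value) =>
        this.entries.set(key, { value, storedAt: this.now() }),
      );
    }
    return entry.value;
  }
}
```

Only the cold miss ever blocks on the network; every subsequent read returns instantly, at the cost of being at most one refresh cycle behind.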
Frontend-Specific Patterns
React Query, SWR, and similar libraries implement sophisticated multi-layer invalidation. They maintain an in-memory cache keyed by query parameters, with configurable staleTime (TTL for considering data fresh) and cacheTime (TTL for keeping stale data in memory). Mutations can explicitly invalidate specific query keys (event-based). Background refetching on window focus provides an implicit revalidation trigger.
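The staleTime/cacheTime distinction reduces to classifying an entry by its age. A sketch of that state machine (option names mirror the libraries above; the `entryState` function itself is illustrative):

```typescript
// Entry lifecycle sketch for query libraries:
// - within staleTime:  fresh   -> serve from memory, no refetch
// - past staleTime:    stale   -> serve from memory, schedule a refetch
// - past cacheTime:    evicted -> entry is gone, a full refetch is required
type State = "fresh" | "stale" | "evicted";

function entryState(
  storedAt: number,
  now: number,
  staleTime: number,
  cacheTime: number,
): State {
  const age = now - storedAt;
  if (age >= cacheTime) return "evicted";
  if (age >= staleTime) return "stale";
  return "fresh";
}
```

The key insight is that "stale" still serves instantly from memory; only "evicted" forces the user to wait on the network again.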
Gotchas
- TTL-only caching for mutable data guarantees periodic staleness -- the gap between data change and TTL expiry is an unavoidable window of stale responses. For data where staleness has consequences (prices, inventory), TTL alone is insufficient.
- Content-hashed filenames require updating all references atomically -- if the HTML referencing app.old.js is cached while app.new.js is deployed, users fetch the old HTML pointing to a now-missing old bundle. This is why HTML should never be cached immutably.
- CDN purge APIs are eventually consistent -- requesting a purge does not guarantee immediate global invalidation. Edge nodes may serve stale content for seconds to minutes after a purge request.
- Normalized cache invalidation in GraphQL clients only works for entities the client has previously fetched -- if a mutation affects a list query the client has not run, that query is not invalidated because the client does not know the mutation is relevant.
- Browser back/forward cache (bfcache) bypasses all HTTP caching headers -- pages restored from bfcache may show data that was stale even before the user navigated away.