Why you can't just put a CDN in front of GraphQL

9 min read Kay Schecker
#graphql #caching #edge #performance #cdn

It’s one of the most obvious ideas the moment a GraphQL API starts buckling under load: “Can’t we just put a CDN in front of it?” The question comes up on almost every team scaling GraphQL: in architecture reviews, in Stack Overflow threads, in Slack. It sounds reasonable: CDNs made REST fast and cheap to scale, so surely the same trick works one layer up. Then you try it, and within an hour you understand why GraphQL caching is its own product category, with its own tooling and processes, instead of a config flag. That’s what this post is about.

The short version: GraphQL breaks almost every assumption that HTTP caching is built on, and it does so for good reason: the same flexibility is what makes GraphQL so powerful in the first place. Some of the resulting problems you can solve in an afternoon. One of them, invalidation, is genuinely hard, and it’s where most home-grown attempts quietly fall apart. But it is solvable with the right tooling.

The first challenge: one endpoint, one verb, everything in the body

HTTP caching is keyed on the request. A CDN looks at the method and the URL, maybe a header or two, and decides whether it has seen this exact thing before. GET /users/1 is trivially cacheable: the URL is the cache key, and Cache-Control tells the CDN how long to keep it.

GraphQL throws that model out. You have a single endpoint, usually /graphql, and almost everything goes through POST with the actual query sitting in the request body:

POST /graphql
Content-Type: application/json

{
  "query": "{ user(id: 1) { name posts { title } } }"
}

To a stock CDN, every request to your API looks identical: same method, same URL. It has no idea that one body asks for a user’s name and the next triggers a 50-table aggregation. It can’t tell two requests apart, so it can’t safely cache either. The thing that made REST cacheable, namely a meaningful URL, is gone.
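To make the collision concrete, here is a minimal sketch of a cache that keys requests the way a URL-based cache does. The `Request` class, the `naive_cache_key` helper, and the `salesReport` field are illustrative assumptions, not any particular CDN's behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    method: str
    url: str
    body: str


def naive_cache_key(req: Request) -> str:
    # Hypothetical URL-based key: method and URL only, the body is never inspected.
    return f"{req.method} {req.url}"


cheap = Request("POST", "/graphql", '{"query": "{ user(id: 1) { name } }"}')
expensive = Request("POST", "/graphql", '{"query": "{ salesReport { total } }"}')

# Two very different queries, one cache key.
same_key = naive_cache_key(cheap) == naive_cache_key(expensive)
```

Here `same_key` is true even though the bodies differ, which is why a cache keyed this way would have to either serve the wrong response or refuse to cache at all.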

So before you can cache anything, you have to teach the cache to look inside the request.

Making GraphQL cacheable at all (the easy 80%)

This part is mostly solved, and if you only need read caching you can get a long way:

Normalize the query into a stable key. Two requests that ask for the same data can look different on the wire: different whitespace, reordered fields, variables inline versus separated. You parse the query, canonicalize it (sort fields, extract variables, strip noise), and hash the result. Now logically identical requests collapse to the same cache key.
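A rough sketch of the key-building step, under loud assumptions: it only strips insignificant whitespace, commas, and comments with a simplified regex lexer and canonicalizes the order of variables. It does not sort fields or lift inline arguments into variables; for that you would parse the query into an AST with a real GraphQL parser. Block strings are not handled, and `normalize_query` and `cache_key` are hypothetical names.

```python
import hashlib
import json
import re
from typing import Optional

# Simplified GraphQL lexer. Whitespace and commas match nothing here and are
# therefore dropped, which mirrors how GraphQL treats them as insignificant.
_TOKEN = re.compile(
    r'"(?:\\.|[^"\\])*"'                      # string literal (no block strings)
    r'|#[^\n\r]*'                             # comment, filtered out below
    r'|\.\.\.'                                # spread
    r'|[A-Za-z_][A-Za-z0-9_]*'                # name
    r'|-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?'      # int or float
    r'|[!$&()\[\]{}:=@|]'                     # punctuators
)


def normalize_query(query: str) -> str:
    """Collapse a query to its significant tokens, separated by single spaces."""
    tokens = [t for t in _TOKEN.findall(query) if not t.startswith("#")]
    return " ".join(tokens)


def cache_key(query: str, variables: Optional[dict] = None) -> str:
    """Hash the normalized query together with variables in sorted key order."""
    payload = json.dumps(
        {"query": normalize_query(query), "variables": variables or {}},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

With this, `{ user(id: 1) { name } }` and the same query spread over several lines with a trailing comment hash to the same key, while `user(id: 2)` does not.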

Get it past the CDN layer. Because bodies and POST are awkward to cache, you lean on GET requests plus persisted queries: the client sends a hash of a known query instead of the full text, and that hash becomes part of a cacheable URL. Automatic Persisted Queries (APQ) are the common flavor.
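As a sketch of what the client side can look like, the helper below builds a GET URL in the query-string shape used by Apollo-style Automatic Persisted Queries, where the SHA-256 of the query text travels in an `extensions` parameter. The endpoint and the `persisted_query_url` name are assumptions for illustration.

```python
import hashlib
import json
from urllib.parse import urlencode


def persisted_query_url(endpoint: str, query: str, variables: dict) -> str:
    """Build a cacheable GET URL that carries the query hash, not the query text."""
    sha = hashlib.sha256(query.encode("utf-8")).hexdigest()
    params = {
        # Sorted keys so equal variables always produce the same URL.
        "variables": json.dumps(variables, sort_keys=True, separators=(",", ":")),
        "extensions": json.dumps(
            {"persistedQuery": {"version": 1, "sha256Hash": sha}},
            separators=(",", ":"),
        ),
    }
    return f"{endpoint}?{urlencode(params)}"


url = persisted_query_url(
    "https://api.example.com/graphql",  # hypothetical endpoint
    "{ user(id: 1) { name } }",
    {},
)
```

In the APQ flow, a server that has not seen the hash yet answers with a "not found" error and the client retries once with the full query text, after which the short URL works on its own.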

Assign TTLs from the schema. Not all data ages the same way. A product catalog can be stale for minutes; an account balance cannot. You annotate types and fields with a max age and let the cache apply per-type, per-field TTLs instead of one blunt number for the whole response.
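One common policy when you cache whole responses is that a response lives only as long as its shortest-lived type or field. The sketch below assumes that policy; the `MAX_AGE` table stands in for annotations you would normally declare in the schema, and its type names and numbers are made up.

```python
# Hypothetical max-age hints in seconds, keyed by "Type" or "Type.field".
MAX_AGE = {
    "Product": 300,
    "Account": 60,
    "Account.balance": 0,
}
DEFAULT_MAX_AGE = 0  # unannotated types are treated as uncacheable


def response_ttl(data) -> int:
    """Return the smallest max-age of any type or field found in the response."""
    ages = []

    def walk(node):
        if isinstance(node, list):
            for item in node:
                walk(item)
        elif isinstance(node, dict):
            typename = node.get("__typename")
            if typename is not None:
                ages.append(MAX_AGE.get(typename, DEFAULT_MAX_AGE))
            for field, value in node.items():
                hint = f"{typename}.{field}"
                if typename is not None and hint in MAX_AGE:
                    ages.append(MAX_AGE[hint])  # field-level override
                walk(value)

    walk(data)
    return min(ages) if ages else DEFAULT_MAX_AGE
```

A response containing only `Product` objects gets 300 seconds; add an `Account` and it drops to 60; include `Account.balance` and it is not cached at all.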

Do those three things and you have a working read cache. This is the part vendors demo and the part you can build yourself. It is not the part that hurts.

The hard part: invalidation

TTL-only caching forces an unpleasant trade-off. Short TTLs mean low hit rates; you’re barely caching. Long TTLs mean you serve stale data, and “the dashboard showed the old number for five minutes” is the kind of bug that can erode trust fast. What you actually want is to cache aggressively and drop a cached response the instant the underlying data changes.

Here’s where GraphQL’s flexibility turns against you. In REST, a write maps cleanly to a resource: PUT /users/1 invalidates the cache entry for /users/1. There’s a URL to purge. In GraphQL there’s no URL, and worse: a single mutation can invalidate fragments of many unrelated cached responses. Change one user’s name and you’ve potentially staled every cached query that embedded that user: a profile page, a comment list, a search result, an admin table, each cached under a different key.

You can’t purge by URL because there isn’t one. So you invalidate by what’s inside the responses instead.

The standard approach is surrogate keys (also called cache tags). When you cache a response, you walk it and record every entity it contains, typically __typename + id, e.g. User:1, Post:42. Those become tags attached to the cached entry. When a mutation comes through, you figure out which entities it touched and purge every cached entry tagged with them. User:1 changes → drop everything tagged User:1, regardless of which query produced it.
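A minimal in-memory sketch of that idea, assuming every cached object carries `__typename` and `id`. The `TaggedCache` class and `extract_tags` helper are hypothetical, and a real index would also clean up stale tag entries and live somewhere shared rather than in a dict.

```python
from collections import defaultdict


def extract_tags(node, tags=None):
    """Walk a response and collect a 'Type:id' tag for every entity in it."""
    tags = set() if tags is None else tags
    if isinstance(node, list):
        for item in node:
            extract_tags(item, tags)
    elif isinstance(node, dict):
        if "__typename" in node and "id" in node:
            tags.add(f'{node["__typename"]}:{node["id"]}')
        for value in node.values():
            extract_tags(value, tags)
    return tags


class TaggedCache:
    def __init__(self):
        self._entries = {}                      # cache key -> response
        self._keys_by_tag = defaultdict(set)    # tag -> cache keys

    def put(self, key, response):
        self._entries[key] = response
        for tag in extract_tags(response):
            self._keys_by_tag[tag].add(key)

    def get(self, key):
        return self._entries.get(key)

    def purge(self, tag) -> int:
        """Drop every entry tagged with `tag`; return how many were removed."""
        removed = 0
        for key in self._keys_by_tag.pop(tag, set()):
            if self._entries.pop(key, None) is not None:
                removed += 1
        return removed
```

Cache a profile query and a comment list that both embed `User:1`, call `purge("User:1")`, and both entries disappear while an unrelated catalog response stays put.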

Figure: Surrogate-key invalidation. One mutation, updateUser(id:1), names User:1; the edge proxy purges every cached response tagged with it (profile query, comment list, search results), across all locations, no matter which query produced it.

Conceptually clean. The trouble is in everything the clean version ignores.

Where it gets nasty

This is the part nobody puts in the marketing copy:

  • Lists and pagination. A mutation creates a new post. The new entity has an ID that doesn’t appear in any cached response yet, so tag-based purging can’t find the lists it should now appear in. “Invalidate the entity” doesn’t help when the problem is “a collection should have grown.” You end up needing coarser, list-level invalidation, which claws back some of the hit rate you fought for.
  • Mutations that don’t return the changed entity. Your tagging logic learns which entities to purge by inspecting payloads. A mutation that returns just { success: true } tells you nothing about what it changed. Now you’re guessing, or you’re forcing schema conventions on every mutation.
  • Derived and aggregate fields. commentCount, computed totals, or “is this in stock” depend on entities that aren’t directly named in the response. The thing that changed and the field that’s now wrong don’t share an ID.
  • Authorization and personalization. The same query returns different data per viewer. Cache it naively and you leak one user’s data to another. That’s a security bug, not a performance one. Cache it correctly and your key space explodes by user, gutting your hit rate. Deciding what’s safely shared versus per-viewer is a design problem with no universal answer.
  • Partial responses and errors. GraphQL can return data and errors in the same 200 response. Do you cache a partial result? A response that succeeded for three fields and failed for one? There’s no HTTP status to lean on.
  • Schema changes. Deploy a new schema and yesterday’s cached responses may no longer match today’s shape. Your cache has to know when its assumptions expired.
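For the list case in the first bullet, one possible shape of "coarser, list-level invalidation" is a collection tag per type that is purged whenever an entity of that type is created or deleted. The sketch below is an assumption about how you might wire that up; the `Type:LIST` naming and the `Change` record are invented for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class Change:
    typename: str
    id: str
    kind: str  # "create", "update", or "delete"


def list_tag(typename: str) -> str:
    # Hypothetical naming convention for a per-type collection tag.
    return f"{typename}:LIST"


def tags_for_list_response(typename: str, items: Iterable[dict]) -> Set[str]:
    """Tag a cached list with its members plus the collection tag."""
    tags = {list_tag(typename)}
    tags.update(f"{typename}:{item['id']}" for item in items)
    return tags


def tags_to_purge(change: Change) -> Set[str]:
    """Entity tag always; collection tag when membership may have changed."""
    tags = {f"{change.typename}:{change.id}"}
    if change.kind in ("create", "delete"):
        tags.add(list_tag(change.typename))
    return tags
```

Creating `Post:43` now purges every cached list of posts even though no cached response mentions that ID yet. The cost is the one named above: every create or delete flushes all lists of that type, and updates that change sort order or filter membership would need the same treatment.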

The good news: none of these is insurmountable. But together they’re why “we’ll just add caching to our GraphQL gateway” turns into a quarter of work.

Why the edge makes it harder, and why it’s still worth it

Everything above is true even with a single, central cache. Push it to the edge, meaning many points of presence close to users, and you inherit a distributed-systems problem on top of the GraphQL problem.

Now your cache state is spread across many locations. A purge isn’t an instant local delete; it has to propagate, and during that window different users can get different answers. A write in one region races against a read in another. The index that maps entities to cached responses has to be consistent enough to be trustworthy and fast enough to consult on the hot path, because at the edge you can’t burn real compute per request without giving back the latency you came for.

So why bother? Because when it works, you move read traffic off your origin entirely and serve it from near the user, and your database stops being the thing that falls over at peak. The payoff is real. It’s just that the gap between “naive TTL cache” and “correct, invalidating, edge-distributed cache” is exactly where the engineering lives.

What good looks like

Whether you build this or buy it, these are the properties worth insisting on:

  • Schema-driven cache configuration. Cacheability, TTLs, and scoping declared alongside the schema, so that they are reviewable, versioned, and diffable rather than scattered across ad-hoc rules.
  • Entity-level surrogate keys with a real purge API. Tagging plus a way to say “drop everything touching User:1” on demand, not just on a timer.
  • Explicit auth scoping. A deliberate decision about what’s shared versus per-viewer, enforced by default. Personalized data should never be cacheable by accident.
  • Observability. Hit rate, staleness, and purge lag, visible. You cannot tune an invalidation strategy you can’t measure, and “it feels faster” is not a metric.
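To illustrate what “enforced by default” can mean for auth scoping, here is a small sketch in which everything is per-viewer unless it is explicitly declared public. It scopes by operation name for brevity, whereas a schema-driven setup would declare scope on types and fields; the operation names and the `scoped_cache_key` helper are hypothetical.

```python
import hashlib
from typing import Optional

# Hypothetical allow-list: only these operations may be shared across viewers.
PUBLIC_OPERATIONS = {"ProductCatalog", "SearchProducts"}


def scoped_cache_key(
    operation_name: str, query_key: str, viewer_id: Optional[str]
) -> Optional[str]:
    """Return a cache key, or None when the response must not be cached."""
    if operation_name in PUBLIC_OPERATIONS:
        return f"public:{query_key}"
    if viewer_id is None:
        # Private by default and no known viewer: refuse to cache.
        return None
    viewer_hash = hashlib.sha256(viewer_id.encode("utf-8")).hexdigest()[:16]
    return f"viewer:{viewer_hash}:{query_key}"
```

The design choice is that forgetting to annotate something costs you hit rate, not a data leak: an unlisted operation gets a per-viewer key, and an unlisted operation without a viewer is not cached at all.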

Closing

GraphQL caching is easy to start and hard to finish. The cacheable-key part is a weekend; correct invalidation across personalized, paginated, edge-distributed responses is why this becomes its own product category and not a feature you flip on in passing. If you’ve shipped a GraphQL API and watched your origin strain under read traffic, you’ve felt the pull toward the edge, and probably the pain of doing it right.

That’s exactly the problem we’re building GraphPilot to solve. It’s not open yet; if you run GraphQL in production and want early access, or want to shape it as a design partner, you can join the waitlist. I’d genuinely rather hear which of the edge cases above is hurting you than guess, so if you want to compare notes, I’m discussing this over on dev.to. Just jump into the comments there.
