Streaming SSR

TL;DR: Streaming SSR sends HTML to the browser in chunks as each part renders on the server, rather than waiting for the entire page to complete before sending anything.

How It Works

 ┌──────────┐        ┌────────────┐          ┌────────────┐
 │  Server  │───┐    │  <head> +  │          │  Browser   │
 └──────────┘   │───→│  Nav HTML  │          └────────────┘
                │    └────────────┘
                │                            Paints each
                │                            chunk as it
                │    ┌────────────┐          arrives via
                │    │    Main    │          Transfer-
                │───→│  Content   │          Encoding:
                │    └────────────┘          chunked
                │
                │
                │    ┌────────────┐
                │    │  Footer +  │
                └───→│  </html>   │
                     └────────────┘


Traditional server-side rendering is a blocking operation. The server fetches all data, renders the complete HTML string, and only then sends the response. If a database query takes 500ms, the user stares at a blank screen for those 500ms — even though the header, navigation, and layout could have been sent immediately.

Streaming SSR changes this by using HTTP chunked transfer encoding to send HTML as it becomes available. The server flushes each rendered section to the browser immediately, and the browser progressively paints the page as chunks arrive.

The Mechanics of Streaming

The HTTP/1.1 Transfer-Encoding: chunked header (or HTTP/2 data frames) enables the server to send a response in multiple pieces without knowing the total content length upfront. The server renders the <head> tag and top-of-page content first, flushes it, then continues rendering. Each flush sends another chunk of HTML that the browser can parse and display immediately.
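
The chunk framing itself fits in a few lines. The `chunkEncode` helper below is hypothetical and purely illustrative; real HTTP servers (including Node's `http` module) apply this framing automatically whenever no Content-Length header is set.

```javascript
// Hypothetical helper illustrating HTTP/1.1 chunked framing:
// each chunk is prefixed by its byte length in hex, and a
// zero-length chunk terminates the response body.
function chunkEncode(parts) {
  const framed = parts.map(
    (part) => Buffer.byteLength(part).toString(16) + '\r\n' + part + '\r\n'
  );
  return framed.join('') + '0\r\n\r\n';
}

// Two server-side flushes become two independently parseable chunks:
const wire = chunkEncode([
  '<!doctype html><html><head><title>Shop</title></head><body>',
  '<main>Products</main></body></html>',
]);
```

Because each chunk carries its own length, the browser can parse and paint every chunk the moment it arrives, with no knowledge of how much HTML is still to come.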

React 18 introduced renderToPipeableStream (Node.js) and renderToReadableStream (Web Streams) specifically for this purpose. These APIs produce a Node.js stream or Web stream that emits HTML chunks as React renders each component. When a component wrapped in <Suspense> is still loading its data, React sends a fallback placeholder and continues streaming the rest of the page. When the data resolves, React sends a small <script> tag that swaps the placeholder with the real content inline — no full-page re-render needed.
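
Wired into a plain Node server, the API looks roughly like this. This is a sketch, not a complete setup: `App` is an assumed root component containing `<Suspense>` boundaries, and bootstrapping options (client scripts, nonces) are omitted.

```javascript
// Sketch: React 18 streaming SSR with renderToPipeableStream.
const { createServer } = require('node:http');
const React = require('react');
const { renderToPipeableStream } = require('react-dom/server');
const App = require('./App'); // hypothetical root component

createServer((req, res) => {
  const { pipe } = renderToPipeableStream(React.createElement(App), {
    // Fires once everything outside Suspense boundaries has rendered:
    // flush the shell now, stream the rest as data resolves.
    onShellReady() {
      res.setHeader('Content-Type', 'text/html');
      pipe(res);
    },
    // The shell itself failed, so nothing has been sent yet and a
    // real 500 status is still possible.
    onShellError() {
      res.statusCode = 500;
      res.end('<h1>Something went wrong</h1>');
    },
  });
}).listen(3000);
```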

Suspense and Out-of-Order Streaming

The real power of streaming SSR emerges with Suspense boundaries. Consider a page with a header, a product listing (fast), and a recommendations section (slow, depends on ML service). Without streaming, the entire page waits for the slow recommendations call. With streaming:

  1. Server sends <head>, header, and product listing immediately.
  2. Server sends a Suspense fallback (loading spinner HTML) for recommendations.
  3. User sees the header and products within 50ms.
  4. When the recommendations data resolves 800ms later, the server streams an inline <script> that replaces the fallback with the real HTML.

This out-of-order completion is critical. The browser displays fast content immediately, and slow content arrives asynchronously — all during the initial page load, before any client JavaScript has executed.
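
The fallback-then-swap step can be sketched with two helpers that build the relevant chunks. These are simplified, hypothetical stand-ins for what React actually emits (React's inline script additionally handles ordering and error cases):

```javascript
// Step 2: stream a placeholder with a known id.
function fallbackChunk(id, fallbackHtml) {
  return `<div id="${id}">${fallbackHtml}</div>`;
}

// Step 4: once the data resolves, stream the real HTML inside a
// hidden <template>, plus a tiny script that swaps it into place.
function swapChunk(id, realHtml) {
  return (
    `<template id="${id}-content">${realHtml}</template>` +
    `<script>document.getElementById("${id}").outerHTML=` +
    `document.getElementById("${id}-content").innerHTML;</script>`
  );
}
```

The browser executes the swap script as soon as that chunk is parsed, so the spinner is replaced without any framework code having loaded yet.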

TTFB and FCP Improvements

Streaming SSR dramatically improves Time to First Byte (TTFB) because the server can start sending the response before data fetching completes. First Contentful Paint (FCP) also improves because the browser can begin rendering the initial HTML while waiting for subsequent chunks.

The improvement is most significant when a page depends on several slow, independent data sources. Without streaming, the slowest fetch gates the entire response; with streaming, each fetch resolves on its own schedule and its rendered HTML is flushed to the browser as soon as it completes.
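
A minimal sketch of that flush-as-resolved behavior, with made-up section names and a generic `write` callback standing in for the response stream:

```javascript
// Each section fetches independently; its HTML is flushed the
// moment its own data resolves, not when the slowest one does.
async function streamSections(write, sections) {
  await Promise.all(
    sections.map(async ({ id, fetch }) => {
      const html = await fetch();
      write(`<section id="${id}">${html}</section>`);
    })
  );
}
```

Calling this with a 5ms "products" fetch and an 800ms "recommendations" fetch flushes the products markup first, regardless of the order the sections appear in the array.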

Framework Support

Next.js App Router uses streaming SSR by default for all server-rendered pages. Remix supports streaming via deferred data in loaders (using Single Fetch, which replaced the earlier defer() API). SolidStart and Qwik City also support streaming natively. In all cases, Suspense boundaries define the streaming units — each boundary can resolve and stream independently.

Infrastructure Requirements

Streaming requires infrastructure that supports chunked responses. Most Node.js servers (Express, Fastify, Hono) support this natively. However, some reverse proxies, CDN layers, and serverless platforms buffer the entire response before forwarding it, which defeats the purpose. Standard AWS Lambda invocations, for example, buffer the full response; streaming requires enabling Lambda's response streaming mode. Cloudflare Workers and Vercel Edge Functions support streaming out of the box.

Gotchas

  • Response headers must be sent with the first chunk. You cannot set cookies or redirect after streaming has started. All headers, and the status code, must be decided before the first flush.
  • Error handling changes fundamentally. If a component throws after streaming has begun, you cannot send a 500 status code — the 200 was already sent. Frameworks typically inject a client-side redirect script instead.
  • Some CDNs and proxies buffer responses, negating streaming benefits. Verify that your entire infrastructure chain supports chunked transfer encoding.
  • Head management is tricky. If a deeply nested component needs to inject a <meta> tag or CSS link into <head>, that head has already been sent. You need inline <style> or late-injection strategies.
  • Streaming makes caching more complex. Traditional full-page caching stores one complete response. Streaming responses are harder to cache at the edge without specialized support.
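
For the error-handling gotcha above, the usual recovery looks something like the following. Since the 200 is already on the wire, the server appends a chunk that redirects on the client instead of changing the status; the helper and the "/error" path are hypothetical, for illustration only.

```javascript
// Once streaming has begun the status code is fixed, so a late
// failure is handled in-band: stream a script that sends the
// client to an error page.
function errorRecoveryChunk(errorPath) {
  return `<script>window.location.replace(${JSON.stringify(errorPath)});</script>`;
}
```

JSON.stringify doubles as an escaper here, so a path containing quotes cannot break out of the script.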