Streaming structured data from an HTTP endpoint

Mar 20, 2025

I was trying to stream LLM responses through a FastAPI backend to show real-time updates, and it took more time than I anticipated. I'm writing this up in case someone else runs into a similar problem; hopefully this guide helps you avoid the same pitfalls.

What is HTTP streaming?

HTTP streaming involves sending data in small, sequential chunks over a standard HTTP response, allowing the client to receive updates in real time. One common approach is using server-sent events (SSE), where the response has a Content-Type of text/event-stream. In this pattern, data is often formatted as JSON strings and sent as plain text using the SSE message format.
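For example, two consecutive SSE messages are just plain text on the wire, each a data: line terminated by a blank line (the payloads here are purely illustrative):

data: {"message": "Hello"}

data: {"message": "world"}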

FastAPI endpoint for streaming text data as SSE

Here’s a FastAPI endpoint that streams text data (mocking the LLM response):

import asyncio
import json

from fastapi import APIRouter
from fastapi.responses import StreamingResponse

router = APIRouter()


async def number_generator():
    # Emit ten SSE-formatted messages, one every 100 ms, to mock an LLM token stream.
    for counter in range(10):
        await asyncio.sleep(0.1)
        data = json.dumps({"number": counter})
        # Each SSE message is a "data: ..." line terminated by a blank line.
        yield f"data: {data}\n\n".encode("utf-8")


@router.get("/stream")
async def stream_numbers():
    return StreamingResponse(
        number_generator(),
        media_type="text/event-stream",
        headers={
            "Connection": "keep-alive",
            "Cache-Control": "no-cache",
            "X-Accel-Buffering": "no",  # tell reverse proxies (e.g. Nginx) not to buffer
        },
    )
 

Note: The header "X-Accel-Buffering": "no" is used to disable response buffering by reverse proxies like Nginx. Without this, the proxy might wait to collect a large chunk of data before sending it to the client, which defeats the purpose of real-time streaming.

This endpoint uses StreamingResponse to send each chunk to the client as soon as it is produced, rather than waiting for the full response. The number_generator function creates the live stream by yielding one SSE-formatted JSON message at a time, pausing briefly between messages to mimic an LLM emitting tokens.

Consuming the stream on the client

The client can consume this stream either with EventSource or by reading the response body directly as a ReadableStream. Below is an example using the Fetch API and a ReadableStream to handle SSE-formatted messages; a simpler EventSource version appears at the end.

const res = await fetch(`/stream`, {
  headers: {
    Accept: "text/event-stream",
  },
});
if (!res.body) throw new Error("No response body");

const reader = res.body.getReader();
const decoder = new TextDecoder();
let buffer = "";
let done = false;

while (!done) {
  const { value, done: readerDone } = await reader.read();
  done = readerDone;

  if (value) {
    // A chunk can contain several SSE messages or end mid-message, so append it
    // to a buffer and keep any trailing partial message for the next read.
    buffer += decoder.decode(value, { stream: true });
    const parts = buffer.split("\n\n");
    buffer = parts.pop() ?? "";

    const messages = parts.filter((msg) => msg.startsWith("data:"));

    for (const message of messages) {
      const jsonData = message.slice("data:".length).trim();
      try {
        const parsedData = JSON.parse(jsonData);
        // The mock endpoint sends {"number": ...}; a real LLM backend would send
        // its own shape, e.g. {"message": {"content": "..."}}.
        const content = parsedData.number ?? "";
        setMessageStream((prev) => prev + content); // e.g. a React state setter
      } catch (e) {
        console.error("Failed to parse JSON:", e);
      }
    }
  }
}

The code above fetches the /stream endpoint and reads the response body as a ReadableStream. Each chunk is decoded and appended to a buffer, the buffer is split into complete SSE messages (keeping any trailing partial message for the next read), and each message's JSON payload is parsed before updating the message stream.
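If you don't need custom request headers or manual control over the read loop, the browser's built-in EventSource API is a simpler alternative: it strips the SSE framing and reconnects automatically. Here's a minimal sketch against the same /stream endpoint (illustrative only, not a drop-in implementation):

const source = new EventSource("/stream");

source.onmessage = (event) => {
  // event.data is the payload after "data:", with the SSE framing already removed.
  const parsedData = JSON.parse(event.data);
  setMessageStream((prev) => prev + (parsedData.number ?? ""));
};

source.onerror = () => {
  // EventSource reconnects by default; close it once the server ends the stream.
  source.close();
};

The trade-off is control: EventSource only supports GET requests and can't send custom headers, which is why the Fetch-based reader above is often a better fit for authenticated LLM endpoints.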

Streaming data from an LLM in real time can feel a bit tricky at first, but once the backend's streaming response and the frontend's reader fit together, it becomes a powerful pattern. Hopefully this guide makes things a little easier if you're setting up something similar.