Streaming

Streaming lets you receive the model's response token-by-token as it is generated, rather than waiting for the entire response to complete. This greatly reduces perceived latency: the user starts seeing output as soon as the first tokens are generated, instead of waiting several seconds for the full response.

Why Use Streaming

  • Faster time to first output. The user sees the first tokens as soon as they are generated, without waiting for the rest of the response.

  • Better UX for long responses. Progressive rendering feels more responsive than a loading spinner.

  • Real-time applications. Chat interfaces, live coding assistants, and interactive tools all benefit from streaming.

  • Memory efficiency. Process tokens incrementally without buffering the entire response.

How It Works

Set stream: true in your request. The API responds with a stream of Server-Sent Events (SSE) instead of a single JSON response.

Each event is a line that starts with data: followed by a JSON chunk, and events are separated by a blank line. The stream ends with data: [DONE].

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"The"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" capital"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" of"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" France"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" is"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" Paris."},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]

Key Differences from Non-streaming

| Aspect        | Non-streaming        | Streaming                                         |
|---------------|----------------------|---------------------------------------------------|
| Response type | Single JSON object   | Stream of SSE events                              |
| Object type   | chat.completion      | chat.completion.chunk                             |
| Message field | message              | delta (incremental)                               |
| Content       | Complete string      | Token-by-token fragments                          |
| Usage stats   | Included in response | Included in the final chunk (with stream_options) |

The delta Object

In streaming, each chunk contains a delta instead of a message. The delta holds only the new content since the last chunk:

  • First chunk: delta has role: "assistant" (and optionally the first content token).

  • Middle chunks: delta has content with the next token(s).

  • Final chunk: delta is empty {}, and finish_reason is set (e.g., "stop" or "tool_calls").
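To see how the three kinds of chunk fit together, here is a minimal sketch that folds parsed chunks (the JSON after each data: prefix, decoded with json.loads) into a running state holding role, content, and finish_reason. The function name merge_delta is illustrative, and the sketch only follows the first choice (index 0).

```python
from typing import Any


def merge_delta(state: dict[str, Any], chunk: dict[str, Any]) -> dict[str, Any]:
    """Fold one parsed chat.completion.chunk into a running state dict."""
    for choice in chunk.get("choices", []):
        if choice.get("index", 0) != 0:
            continue  # this sketch only follows the first choice
        delta = choice.get("delta", {})
        if "role" in delta:
            state["role"] = delta["role"]  # first chunk
        if delta.get("content"):
            # Middle chunks: append the new fragment to what we have so far.
            state["content"] = state.get("content", "") + delta["content"]
        if choice.get("finish_reason") is not None:
            state["finish_reason"] = choice["finish_reason"]  # final chunk
    return state
```

Feeding it the sample stream above chunk by chunk ends with role "assistant", content "The capital of France is Paris.", and finish_reason "stop".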

Getting Usage Statistics

By default, streaming responses do not include token usage. To receive usage data, set stream_options:

The final chunk before [DONE] will include a usage field:
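The request side and the response side can be sketched together. This assumes the API follows OpenAI's convention of stream_options: {"include_usage": true}; confirm the exact flag name in this API's reference. In that convention the usage chunk has an empty choices list and every other chunk carries "usage": null, so the helper keeps the last non-null value. The helper names and the model argument are placeholders.

```python
from typing import Any, Iterable, Optional


def build_stream_request(model: str, messages: list[dict[str, str]]) -> dict[str, Any]:
    """Request body that asks for a usage chunk at the end of the stream."""
    return {
        "model": model,
        "messages": messages,
        "stream": True,
        # Assumed OpenAI-style flag; confirm the name in this API's reference.
        "stream_options": {"include_usage": True},
    }


def find_usage(chunks: Iterable[dict[str, Any]]) -> Optional[dict[str, Any]]:
    """Return the usage object from a stream of parsed chunks, or None."""
    usage = None
    for chunk in chunks:
        # Other chunks carry "usage": null; the usage chunk has empty choices.
        if chunk.get("usage") is not None:
            usage = chunk["usage"]
    return usage
```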

Code Examples

cURL

The -N flag disables output buffering so you see tokens as they arrive.

Python (OpenAI SDK -- Synchronous)
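A minimal sketch, assuming the official openai Python package (v1 or later), where client.chat.completions.create(..., stream=True) returns an iterator of chunk objects. The helper print_stream is hypothetical; it takes the client as a parameter so the same code works with any OpenAI-compatible base URL.

```python
from typing import Any


def print_stream(client: Any, model: str, messages: list[dict[str, str]]) -> None:
    """Stream a chat completion and print each token as it arrives.

    `client` is assumed to be an openai.OpenAI instance, or anything exposing
    the same chat.completions.create(...) interface.
    """
    stream = client.chat.completions.create(
        model=model,
        messages=messages,
        stream=True,
    )
    for chunk in stream:
        if not chunk.choices:
            continue  # e.g. a usage-only chunk with an empty choices list
        token = chunk.choices[0].delta.content
        if token:  # None on the role-only and final chunks
            print(token, end="", flush=True)
    print()
```

For example, build the client with OpenAI(base_url=..., api_key=...), filling in this API's base URL and your key, then call print_stream(client, "your-model", [{"role": "user", "content": "What is the capital of France?"}]). The model name is a placeholder.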

Python (OpenAI SDK -- Async)
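The async version has the same shape, assuming openai.AsyncOpenAI: create() is awaited, and the result is consumed with async for. The generator name stream_tokens is hypothetical.

```python
from typing import Any, AsyncIterator


async def stream_tokens(
    client: Any, model: str, messages: list[dict[str, str]]
) -> AsyncIterator[str]:
    """Yield content tokens from a streamed chat completion.

    `client` is assumed to be an openai.AsyncOpenAI instance.
    """
    # With the async client, create() must be awaited; it returns an async iterator.
    stream = await client.chat.completions.create(
        model=model, messages=messages, stream=True
    )
    async for chunk in stream:
        # Skip usage-only chunks (no choices) and chunks without content.
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content
```

For example, inside a coroutine started with asyncio.run, iterate with async for token in stream_tokens(AsyncOpenAI(), "your-model", messages) and print each token with end="" and flush=True. The model name is a placeholder.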

Python (requests -- Manual SSE Parsing)

For cases where you cannot use the OpenAI SDK:
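A sketch using the third-party requests package. It assumes the endpoint takes an OpenAI-style chat completions JSON body, authenticates with a Bearer token, and puts each event's JSON on a single data: line as in the sample stream above; multi-line data fields are not handled. The URL, the timeout values, and both function names are placeholders.

```python
import json
from typing import Any, Iterable, Iterator


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[dict[str, Any]]:
    """Parse SSE lines into JSON chunks, stopping at the [DONE] sentinel."""
    for line in lines:
        # Skip blank separators, comments (":...") and non-data fields.
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        yield json.loads(data)


def stream_chat(url: str, api_key: str, payload: dict[str, Any]) -> Iterator[str]:
    """POST a streaming request with requests and yield content tokens."""
    import requests  # third-party; imported here so iter_sse_chunks needs only stdlib

    with requests.post(
        url,
        headers={"Authorization": f"Bearer {api_key}"},
        json={**payload, "stream": True},
        stream=True,  # read the body incrementally instead of all at once
        timeout=(10, 60),  # placeholder (connect, read) timeouts in seconds
    ) as response:
        response.raise_for_status()
        # Decode as UTF-8 ourselves: requests may assume ISO-8859-1 for a
        # text/event-stream response that declares no charset.
        lines = (raw.decode("utf-8") for raw in response.iter_lines())
        for chunk in iter_sse_chunks(lines):
            for choice in chunk.get("choices", []):
                content = choice.get("delta", {}).get("content")
                if content:
                    yield content
```

Keeping the parser separate from the HTTP call makes it easy to test against recorded lines and to reuse with a different HTTP client.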

JavaScript (OpenAI SDK)

JavaScript (fetch -- Browser/Edge)

For browser-based applications or edge runtimes where the OpenAI SDK is not available:

React (Next.js with Vercel AI SDK)

For Next.js applications, the Vercel AI SDK provides a streamlined experience:

Streaming with Function Calls

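When the model makes a tool call during streaming, the chunks contain delta.tool_calls instead of delta.content. Each tool-call fragment carries an index that identifies which call it belongs to; the function name typically arrives in the first fragment, and the arguments arrive as pieces of a JSON string that must be concatenated before they can be parsed.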
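A minimal sketch that rebuilds complete calls from parsed chunks, assuming OpenAI-style fragments: each entry in delta.tool_calls has an index, the first fragment for a call carries its id and function.name, and later fragments append to function.arguments. The arguments are parsed with json.loads only after the stream ends, because the intermediate strings are not valid JSON. The function name collect_tool_calls is hypothetical.

```python
import json
from typing import Any, Iterable


def collect_tool_calls(chunks: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    """Rebuild complete tool calls from streamed delta.tool_calls fragments."""
    calls: dict[int, dict[str, Any]] = {}
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            for part in choice.get("delta", {}).get("tool_calls") or []:
                # Fragments belonging to the same call share an index.
                call = calls.setdefault(
                    part["index"], {"id": None, "name": None, "arguments": ""}
                )
                if part.get("id"):
                    call["id"] = part["id"]
                function = part.get("function") or {}
                if function.get("name"):
                    call["name"] = function["name"]  # usually sent once, up front
                call["arguments"] += function.get("arguments") or ""
    # The argument string is only valid JSON once all fragments are joined.
    return [
        {
            "id": call["id"],
            "name": call["name"],
            "arguments": json.loads(call["arguments"] or "{}"),
        }
        for _, call in sorted(calls.items())
    ]
```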

Error Handling

Connection Errors

Streaming connections can be interrupted by network issues. Always handle connection errors and implement reconnection logic:
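A sketch of retrying with exponential backoff. It assumes a broken stream cannot be resumed, so each retry re-sends the whole request and discards the partial text; if you already rendered those tokens, clear them before the retry starts. start_stream is a hypothetical callable that sends the request and yields content tokens, and retry_on should list the exceptions your HTTP client actually raises (for example openai.APIConnectionError or requests.exceptions.ConnectionError, neither of which is the built-in ConnectionError).

```python
import time
from typing import Callable, Iterable


def stream_with_retry(
    start_stream: Callable[[], Iterable[str]],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> str:
    """Run a streaming request, restarting it from scratch after a dropped connection."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        attempt += 1
        pieces: list[str] = []  # output from a failed attempt is thrown away
        try:
            for token in start_stream():
                pieces.append(token)
            return "".join(pieces)
        except retry_on:
            if attempt >= max_attempts:
                raise  # out of attempts: surface the last error
            # Exponential backoff: 1s, 2s, 4s, ... with the default base_delay.
            time.sleep(base_delay * 2 ** (attempt - 1))
```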

Incomplete Streams

If the stream ends without a [DONE] event, or you need to know why generation stopped, check the finish_reason of the last chunk you received:

  • "stop" -- normal completion.

  • "length" -- hit the max_tokens limit. Increase max_tokens or continue the conversation.

  • "tool_calls" -- the model wants to call a function. Handle the tool call and continue.

  • null -- stream was interrupted. Retry the request.
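These checks can be sketched as a small classifier over parsed chunks. It keeps the last non-null finish_reason it sees, since a usage-only chunk with empty choices may follow the final one, and treats a stream with no finish_reason at all as interrupted. The Outcome names are illustrative; a finish_reason not listed here (the API may define others) raises ValueError instead of being silently ignored.

```python
from enum import Enum
from typing import Any, Iterable, Optional


class Outcome(Enum):
    COMPLETE = "stop"            # normal completion
    TRUNCATED = "length"         # hit max_tokens: raise it or continue
    TOOL_CALL = "tool_calls"     # run the tool, then continue the conversation
    INTERRUPTED = "interrupted"  # no finish_reason arrived: retry


def stream_outcome(chunks: Iterable[dict[str, Any]]) -> Outcome:
    """Classify how a stream ended from its parsed chunks."""
    finish_reason: Optional[str] = None
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            # Keep the last non-null value; a usage-only chunk may come after it.
            if choice.get("finish_reason") is not None:
                finish_reason = choice["finish_reason"]
    if finish_reason is None:
        return Outcome.INTERRUPTED
    return Outcome(finish_reason)  # ValueError for reasons not listed above
```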

Timeouts

For long-running streams, configure appropriate timeouts:
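HTTP clients expose their own settings for this, such as the timeout argument of the openai Python client or the (connect, read) timeout tuple in requests, where the read timeout bounds the wait between received bytes rather than the total duration. To also cap the total length of a stream, a wrapper like the hypothetical with_deadline below can check elapsed time between chunks. Because the check only runs when a chunk arrives, it cannot interrupt a read that is blocked, so it complements the client's read timeout rather than replacing it.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def with_deadline(
    chunks: Iterable[T],
    total_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[T]:
    """Yield chunks, raising TimeoutError once the stream has run too long overall."""
    deadline = clock() + total_seconds  # measured from the first iteration
    for chunk in chunks:
        # Checked between chunks only; a blocked read is the HTTP client's job.
        if clock() > deadline:
            raise TimeoutError(f"stream exceeded {total_seconds}s")
        yield chunk
```

The clock parameter exists so the wrapper can be tested with a fake clock; in normal use, leave it at time.monotonic.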

Collecting the Full Response

If you need the complete response text (e.g., for logging or saving to a database), accumulate it during streaming:
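One way to do this without delaying the display is to wrap the token stream so every token is both forwarded and recorded, then join the pieces once at the end. TokenRecorder is a hypothetical helper; its finished flag distinguishes a fully consumed stream from one that was abandoned or failed partway, which matters if you only want to save complete responses.

```python
from typing import Iterable, Iterator


class TokenRecorder:
    """Pass tokens through for display while keeping a copy of the full text."""

    def __init__(self, tokens: Iterable[str]) -> None:
        self._tokens = tokens
        self._pieces: list[str] = []
        self.finished = False

    def __iter__(self) -> Iterator[str]:
        for token in self._tokens:
            self._pieces.append(token)  # keep a copy for logging or storage
            yield token  # and forward it for rendering
        self.finished = True  # only reached if the stream was fully consumed

    @property
    def text(self) -> str:
        # Joining once avoids repeated string concatenation during the stream.
        return "".join(self._pieces)
```

For example, iterate over a TokenRecorder to render output, then pass its text to your logging or storage code once finished is true.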

