Streaming

Applies to: Developer · Team · Enterprise — Last reviewed 2026-06-29

Set stream: true in the request to receive the response as Server-Sent Events (SSE). Tokens arrive incrementally, one chunk per event, and the stream ends with a data: [DONE] line. Provider fallback can still occur before the first token is sent.

Stream with the SDK

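# Assumes `client` is an SDK client that has already been configured.
stream = client.chat.completions.create(
    model="{{default_model}}",
    messages=[{"role": "user", "content": "Stream a haiku."}],
    max_tokens={{max_tokens}},
    stream=True,
)
for chunk in stream:
    # content may be empty on some chunks, so fall back to an empty string.
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)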

curl -N {{api_base}}/chat/completions \
  -H "Authorization: Bearer $GOTOAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"{{default_model}}","messages":[{"role":"user","content":"Stream a haiku."}],"stream":true,"max_tokens":{{max_tokens}}}'
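
The SDK loop above prints each delta as it arrives. If the complete text is also needed once the stream ends, the deltas can be accumulated along the way. The following is a minimal sketch, not part of the SDK: collect_stream and fake_chunk are hypothetical helper names, the chunk shape is assumed to match the SDK example (choices[0].delta.content), and the stand-in chunks exist only so the sketch runs without a network call.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(stream: Iterable) -> str:
    """Join the content deltas of a streamed response into one string."""
    parts = []
    for chunk in stream:
        # Assumption: a chunk might carry no choices at all; skip it if so.
        if not chunk.choices:
            continue
        # Content may be None on some chunks, so fall back to "".
        parts.append(chunk.choices[0].delta.content or "")
    return "".join(parts)


def fake_chunk(text):
    """Hypothetical stand-in for an SDK chunk, used only for this demo."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo = [fake_chunk("Hel"), fake_chunk(None), fake_chunk("lo")]
print(collect_stream(demo))  # prints: Hello
```

With a real stream, the same function would take the object returned by the create call with stream=True in place of demo.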

Wire format

data: {"choices":[{"delta":{"content":"Hel"}}]}

data: {"choices":[{"delta":{"content":"lo"}}]}

data: [DONE]

Billing is still finalized only after the stream completes: usage and the charge are recorded once the full response is known.
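
When reading the stream without the SDK, the wire format above can be parsed line by line. The sketch below is illustrative only: iter_content is a hypothetical helper name, and it assumes each event is a single data: line holding a JSON object shaped like the examples above. It does not implement the full SSE specification (multi-line data fields, comments, event names).

```python
import json
from typing import Iterable, Iterator


def iter_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield content deltas from SSE lines until the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end of stream
        event = json.loads(payload)
        for choice in event.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                yield content


sample = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_content(sample)))  # prints: Hello
```

Because the function accepts any iterable of strings, it can be fed the decoded lines of an HTTP response body as they arrive rather than a prebuilt list.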

Next steps