Streaming (SSE)

How to consume token-by-token streaming responses.

Last updated: 2026-05-19

Add "stream": true to any chat completion request. The response is then delivered as a Server-Sent Events stream (text/event-stream) in the OpenAI-compatible chunk format.

Wire format

Each event is a single line that starts with data: followed by a JSON payload, and events are separated by a blank line. The stream ends with data: [DONE].

data: {"id":"chatcmpl-1","object":"chat.completion.chunk","model":"apertus-70b-instruct","choices":[{"index":0,"delta":{"role":"assistant"}}]}

data: {"id":"chatcmpl-1","object":"chat.completion.chunk","model":"apertus-70b-instruct","choices":[{"index":0,"delta":{"content":"Ciao"}}]}

data: {"id":"chatcmpl-1","object":"chat.completion.chunk","model":"apertus-70b-instruct","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":"stop"}],"usage":{"prompt_tokens":15,"completion_tokens":2}}

data: [DONE]
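
If you are not using an SDK, the wire format above can be parsed by hand. The following is a minimal sketch, not a reference implementation: the helper names iter_sse_payloads and collect_text are hypothetical, and the sample lines are copied from the events shown above. It assumes each event fits on one data: line, as in those samples.

```python
import json

def iter_sse_payloads(lines):
    """Yield decoded JSON payloads from an iterable of SSE text lines."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        # Blank separator lines and any non-data lines are skipped.
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # end-of-stream sentinel
        yield json.loads(data)

def collect_text(lines):
    """Join all delta.content fragments; return (text, usage or None)."""
    parts, usage = [], None
    for chunk in iter_sse_payloads(lines):
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
        usage = chunk.get("usage") or usage
    return "".join(parts), usage

# Sample events taken from the wire-format example above.
sample = [
    'data: {"id":"chatcmpl-1","object":"chat.completion.chunk","model":"apertus-70b-instruct","choices":[{"index":0,"delta":{"role":"assistant"}}]}',
    "",
    'data: {"id":"chatcmpl-1","object":"chat.completion.chunk","model":"apertus-70b-instruct","choices":[{"index":0,"delta":{"content":"Ciao"}}]}',
    "",
    'data: {"id":"chatcmpl-1","object":"chat.completion.chunk","model":"apertus-70b-instruct","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":"stop"}],"usage":{"prompt_tokens":15,"completion_tokens":2}}',
    "",
    "data: [DONE]",
]

text, usage = collect_text(sample)
print(text, usage)
```

The first chunk carries only the role, so it contributes no text. The usage object is taken from whichever chunk includes it.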

curl

curl -N https://api.siati.ai/v1/chat/completions \
  -H "Authorization: Bearer $SIATI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "apertus-70b-instruct",
    "stream": true,
    "stream_options": {"include_usage": true},
    "messages": [{"role": "user", "content": "Conta da 1 a 10."}]
  }'

The -N flag disables curl's output buffering, which is essential for seeing tokens as they arrive.

Python (openai SDK)

from openai import OpenAI
import os

client = OpenAI(
    base_url="https://api.siati.ai/v1",
    api_key=os.environ["SIATI_API_KEY"],
)

stream = client.chat.completions.create(
    model="apertus-70b-instruct",
    messages=[{"role": "user", "content": "Conta da 1 a 10."}],
    stream=True,
    stream_options={"include_usage": True},
)

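for chunk in stream:
    # With include_usage, OpenAI-compatible streams may send a usage-only
    # chunk whose choices list is empty, so guard before indexing.
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)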

JavaScript / TypeScript (fetch + ReadableStream)

const res = await fetch("https://api.siati.ai/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.SIATI_API_KEY}`,
    "Content-Type":  "application/json",
  },
  body: JSON.stringify({
    model: "apertus-70b-instruct",
    stream: true,
    messages: [{ role: "user", content: "Conta da 1 a 10." }],
  }),
});

const reader = res.body!.getReader();
const decoder = new TextDecoder();
let buf = "";
read: while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  buf += decoder.decode(value, { stream: true });
  const lines = buf.split("\n");
  // The last element may be an incomplete line; keep it for the next read.
  buf = lines.pop() ?? "";
  for (const line of lines) {
    if (!line.startsWith("data: ")) continue;
    const data = line.slice(6).trim();
    // Leave both loops at the end-of-stream sentinel.
    if (data === "[DONE]") break read;
    const chunk = JSON.parse(data);
    const delta = chunk.choices?.[0]?.delta?.content;
    if (delta) process.stdout.write(delta);
  }
}

Reconnection

SSE itself supports reconnection: the browser's EventSource resends the ID of the last event it received in the Last-Event-ID header. This API does not currently resume a stream mid-completion (resumption is not part of the OpenAI spec either). On disconnect, the request is closed and you are billed only for the tokens that were actually generated up to that point.
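
Because a dropped stream cannot be resumed, a client has to decide what to do with the text it already received. The sketch below shows one possible approach, assuming you want to keep the partial output and know whether the stream finished. The helper consume_with_partial is hypothetical, and the exception types are assumptions; an SDK typically raises its own connection error classes, which you would add to the tuple.

```python
def consume_with_partial(fragments, on_text, errors=(ConnectionError, TimeoutError)):
    """Forward text fragments to on_text; return (text_so_far, completed)."""
    received = []
    try:
        for text in fragments:
            received.append(text)
            on_text(text)
    except errors:
        # The stream dropped: keep what arrived and report it as incomplete.
        return "".join(received), False
    return "".join(received), True

def flaky():
    """Stand-in for a token stream that disconnects part-way through."""
    yield "1, 2"
    yield ", 3"
    raise ConnectionError("socket closed")

shown = []
partial, completed = consume_with_partial(flaky(), shown.append)
print(repr(partial), completed)
```

If completed is False, the only way to get the rest is to send the original request again, which starts generation from the beginning.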
