Streaming (SSE)
How to consume token-by-token streaming responses.
Last updated: 2026-05-19
Add "stream": true to any chat completion request. The response is then delivered as Server-Sent Events (Content-Type: text/event-stream), in the same chunk format as the OpenAI API.
Wire format
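Each event line starts with data: followed by a JSON payload. In the sample below, the first chunk carries the role, the following chunks carry content deltas, and the last chunk carries finish_reason and the usage counts. The stream ends with data: [DONE].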
data: {"id":"chatcmpl-1","object":"chat.completion.chunk","model":"apertus-70b-instruct","choices":[{"index":0,"delta":{"role":"assistant"}}]}
data: {"id":"chatcmpl-1","object":"chat.completion.chunk","model":"apertus-70b-instruct","choices":[{"index":0,"delta":{"content":"Ciao"}}]}
data: {"id":"chatcmpl-1","object":"chat.completion.chunk","model":"apertus-70b-instruct","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":"stop"}],"usage":{"prompt_tokens":15,"completion_tokens":2}}
data: [DONE]
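The wire format above can also be parsed without an SDK. The following is a minimal sketch, assuming each event fits on a single data: line as in the sample; iter_sse_payloads and collect_text are hypothetical helper names, not part of the API.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple


def iter_sse_payloads(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the decoded JSON payload of each data: line until [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # end-of-stream sentinel, nothing after it is read
        yield json.loads(data)


def collect_text(lines: Iterable[str]) -> Tuple[str, Optional[dict]]:
    """Concatenate delta.content across chunks; return (text, usage or None)."""
    parts = []
    usage = None
    for chunk in iter_sse_payloads(lines):
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
        if chunk.get("usage"):
            usage = chunk["usage"]
    return "".join(parts), usage
```

Any iterable of text lines works as input, for example the decoded lines of an HTTP response body read incrementally.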
curl
curl -N https://api.siati.ai/v1/chat/completions \
  -H "Authorization: Bearer $SIATI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "apertus-70b-instruct",
    "stream": true,
    "stream_options": {"include_usage": true},
    "messages": [{"role": "user", "content": "Conta da 1 a 10."}]
  }'
The -N flag disables curl's output buffering, which is essential to see tokens as they arrive.
Python (openai SDK)
import os

from openai import OpenAI

# Point the OpenAI SDK at api.siati.ai instead of the default endpoint.
client = OpenAI(
    base_url="https://api.siati.ai/v1",
    api_key=os.environ["SIATI_API_KEY"],
)

stream = client.chat.completions.create(
    model="apertus-70b-instruct",
    messages=[{"role": "user", "content": "Conta da 1 a 10."}],
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in stream:
    # Guard against chunks with no choices (for example a usage-only chunk).
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
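When the full reply and the token counts are needed rather than live printing, the same loop can accumulate them. This is a sketch under assumptions: consume is a hypothetical helper, it relies only on the choices, delta.content, and usage attributes used above, and fake_chunk builds offline stand-ins shaped like the sample in "Wire format" so the block runs without a network call.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Optional, Tuple


def consume(stream: Iterable[Any]) -> Tuple[str, Optional[Any]]:
    """Return (full_text, usage) from an iterable of streamed chunks."""
    parts = []
    usage = None
    for chunk in stream:
        # Usage may ride on the last content chunk or on a chunk of its own.
        if getattr(chunk, "usage", None) is not None:
            usage = chunk.usage
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta:
            parts.append(delta)
    return "".join(parts), usage


def fake_chunk(content=None, usage=None):
    """Offline stand-in for an SDK chunk (hypothetical, for illustration)."""
    choice = SimpleNamespace(delta=SimpleNamespace(content=content))
    return SimpleNamespace(choices=[choice], usage=usage)


demo = [
    fake_chunk(),  # role-only chunk: no content
    fake_chunk("Ciao"),
    fake_chunk("!", usage={"prompt_tokens": 15, "completion_tokens": 2}),
]
text, usage = consume(demo)
print(text, usage)
```

With the real SDK, the stream object returned by client.chat.completions.create(..., stream=True) would be passed to consume in place of demo.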
JavaScript / TypeScript (fetch + ReadableStream)
const res = await fetch("https://api.siati.ai/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.SIATI_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "apertus-70b-instruct",
    stream: true,
    messages: [{ role: "user", content: "Conta da 1 a 10." }],
  }),
});
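if (!res.ok || !res.body) throw new Error(`Request failed: HTTP ${res.status}`);

const reader = res.body.getReader();
const decoder = new TextDecoder();
let buf = "";
let finished = false;
while (!finished) {
  const { value, done } = await reader.read();
  if (done) break;
  buf += decoder.decode(value, { stream: true });
  // A network chunk can end mid-line: keep the trailing partial line in buf
  // and parse only the complete lines.
  const lines = buf.split("\n");
  buf = lines.pop() ?? "";
  for (const line of lines) {
    if (!line.startsWith("data: ")) continue;
    const data = line.slice(6).trim();
    if (data === "[DONE]") {
      finished = true; // end-of-stream sentinel
      break;
    }
    const chunk = JSON.parse(data);
    const delta = chunk.choices?.[0]?.delta?.content;
    if (delta) process.stdout.write(delta); // Node.js output
  }
}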
Reconnection
SSE itself has built-in reconnection support: the browser's EventSource resends the ID of the last event it received in the Last-Event-ID header. We don't currently resume mid-stream, however (mid-completion resumption is not part of the OpenAI spec either). On disconnect, the request is closed and the partial work is billed: you pay only for the tokens that were actually generated.
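Because a stream cannot be resumed, a client that needs the full reply has to send a new request after a disconnect. The sketch below shows one way to do that, under assumptions: stream_with_retry and start_stream are hypothetical names, start_stream is any callable that issues a fresh request and yields text deltas, and ConnectionError stands in for whatever exception your HTTP client raises on a dropped connection.

```python
import time
from typing import Callable, Iterable


def stream_with_retry(
    start_stream: Callable[[], Iterable[str]],
    max_attempts: int = 3,
    base_delay: float = 0.5,
) -> str:
    """Return the full text, restarting from scratch after a dropped stream."""
    for attempt in range(1, max_attempts + 1):
        parts = []  # output of a failed attempt is discarded, not resumed
        try:
            for delta in start_stream():
                parts.append(delta)
            return "".join(parts)
        except ConnectionError:
            if attempt == max_attempts:
                raise
            # Exponential backoff before issuing a brand-new request.
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")
```

Each retry is a separate request, so the tokens generated before a disconnect are billed in addition to those of the attempt that succeeds, and the regenerated text may differ from the partial output already seen.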