Chat sessions (mobile / app contract)

Persistent conversations with auto-title, SSE streaming, and stop-on-demand. The contract used by the iOS/Android apps.

Last updated: 2026-05-24

Chat sessions API

The developer API at api.siati.ai (Bearer key, stateless /v1/chat/completions) is the OpenAI-compatible entry point. For mobile apps and persistent chat UIs, we expose a second, session-oriented API at my.siati.ai/api/v1/chat/sessions/* with:

Stateful conversations stored server-side (no need to ship history every request)
Server-Sent Events streaming with chunk-level delivery
Auto-titled conversations from the first user message
Per-message metrics (TTFT, latency, tok/s, prompt/completion tokens)
Stop-generation endpoint for interrupting long responses

Authentication is JWT Bearer (issued at app login), not API key.

Base URL & auth

Base URL:  https://my.siati.ai/api/v1
Auth:      Authorization: Bearer <jwt>

Obtain the JWT from POST /api/v1/auth/login with email and password; the returned token is valid for 30 days. The token is stateless, and the app keeps it in secure storage (Keychain on iOS, KeyStore on Android).
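As a minimal sketch of the base URL and auth header described here, the helpers below build the request headers and full URLs. The names `authHeaders` and `apiUrl` are hypothetical, not part of the API; sending `Content-Type` on every request mirrors the full client example and is an assumption rather than a documented requirement.

```typescript
// Base URL as documented in this section.
const BASE_URL = 'https://my.siati.ai/api/v1';

// Hypothetical helper: headers for a sessions API call.
// Pass acceptSse = true for the streaming messages endpoint.
function authHeaders(jwt: string, acceptSse = false): Record<string, string> {
  if (jwt.trim() === '') throw new Error('missing JWT');
  return {
    Authorization: `Bearer ${jwt}`,
    'Content-Type': 'application/json',
    Accept: acceptSse ? 'text/event-stream' : 'application/json',
  };
}

// Hypothetical helper: joins a path such as "/chat/sessions" onto the base URL.
function apiUrl(path: string): string {
  return BASE_URL + (path.startsWith('/') ? path : '/' + path);
}
```

For example, `apiUrl('/chat/sessions')` yields the list endpoint, and `authHeaders(jwt, true)` yields the headers for a streaming request.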

Endpoints

List conversations

GET /chat/sessions

Returns the user's last 200 non-archived conversations (no messages, just metadata).

[
  {
    "id": "019e4c12-3a4b-...",
    "title": "Teorema di Pitagora",
    "model": "apertus-70b-instruct",
    "created_at": "2026-05-24T08:12:00+00:00",
    "updated_at": "2026-05-24T08:14:32+00:00",
    "messages": []
  }
]

Get conversation with messages

GET /chat/sessions/{id}

Returns the full conversation including all messages.

{
  "id": "019e4c12-3a4b-...",
  "title": "Teorema di Pitagora",
  "model": "apertus-70b-instruct",
  "created_at": "...",
  "updated_at": "...",
  "messages": [
    {
      "id": "...",
      "role": "user",
      "content": "Spiegami il teorema di Pitagora",
      "created_at": "...",
      "prompt_tokens": 0,
      "completion_tokens": 0,
      "latency_ms": 0
    },
    {
      "id": "...",
      "role": "assistant",
      "content": "Il teorema di Pitagora afferma…",
      "created_at": "...",
      "prompt_tokens": 24,
      "completion_tokens": 187,
      "latency_ms": 0
    }
  ]
}
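The response shape above could be typed on the client roughly as follows. This is a sketch derived only from the sample payload: the interface names and the `totalTokens` helper are hypothetical, and `role` is typed as a plain string because only `user` and `assistant` appear in the sample.

```typescript
// Field names copied from the sample payload; the type names are hypothetical.
interface ChatMessage {
  id: string;
  role: string; // "user" or "assistant" in the sample; other values are not documented
  content: string;
  created_at: string;
  prompt_tokens: number;
  completion_tokens: number;
  latency_ms: number;
}

interface Conversation {
  id: string;
  title: string;
  model: string;
  created_at: string;
  updated_at: string;
  messages: ChatMessage[]; // empty in the list endpoint, populated here
}

// Hypothetical helper: sums the per-message token metrics of a conversation.
function totalTokens(conv: Conversation): { prompt: number; completion: number } {
  let prompt = 0;
  let completion = 0;
  for (const m of conv.messages) {
    prompt += m.prompt_tokens;
    completion += m.completion_tokens;
  }
  return { prompt, completion };
}
```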

Create conversation

POST /chat/sessions
Content-Type: application/json

{
  "title": "My new chat",       // optional, defaults to "Nuova conversazione"
  "model": "apertus-70b-instruct" // optional, defaults to qwen2.5:1.5b
}

Returns the new conversation (201 Created). Use the returned id for subsequent messages.
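A client might check the 201 status and extract the id before sending messages. The sketch below assumes only what this section states (201 Created, an `id` field in the response); the function names are hypothetical.

```typescript
// Hypothetical helper: validates the create response and returns the new id.
function conversationIdFromCreate(status: number, payload: unknown): string {
  if (status !== 201) {
    throw new Error(`expected 201 Created, got ${status}`);
  }
  if (typeof payload !== 'object' || payload === null) {
    throw new Error('response body is not an object');
  }
  const id = (payload as { id?: unknown }).id;
  if (typeof id !== 'string' || id === '') {
    throw new Error('response has no id');
  }
  return id;
}

// Hypothetical helper: path of the messages endpoint for a conversation,
// relative to the base URL.
function messagesPath(id: string): string {
  return `/chat/sessions/${encodeURIComponent(id)}/messages`;
}
```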

Rename / archive

PATCH /chat/sessions/{id}
Content-Type: application/json

{
  "title": "Renamed conversation"
}

DELETE /chat/sessions/{id}

DELETE is a hard delete: the conversation and all of its messages are removed (FK cascade). To soft-archive instead, use the dashboard's "Archivia" action.

Send a message (the main one)

POST /chat/sessions/{id}/messages
Content-Type: application/json
Accept: text/event-stream

{
  "content": "Spiegami il teorema di Pitagora",
  "model":   "apertus-70b-instruct",
  "tier":    "fast",            // optional: slow|medium|fast|ludicrous
  "backend": "spark"            // optional: family name (mac-mini, l40-b, spark, bigguy, inference-vm)
}

Response: text/event-stream (SSE). Chunks arrive in order:

data: {"type":"delta","text":"Il teorema "}

data: {"type":"delta","text":"di Pitagora "}

data: {"type":"delta","text":"afferma che "}

...

data: {"type":"done","prompt_tokens":24,"completion_tokens":187,"latency_ms":4823,"outcome":"ok","title_job_queued":true}

Event types:

`type`	Fields	Meaning
`delta`	`text`	Append `text` to the assistant message buffer
`done`	`prompt_tokens`, `completion_tokens`, `latency_ms`, `outcome`, `title_job_queued`	Stream complete. Save the accumulated assistant message.
`error`	`code`, `message`	Upstream/inference error. `outcome` will be `error` on the next `done`.
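The three event types map naturally onto a discriminated union. The sketch below parses one SSE frame and folds events into a running state; the type and function names are hypothetical, and the type of `code` is an assumption since the table does not specify it.

```typescript
type DeltaEvent = { type: 'delta'; text: string };
type DoneEvent = {
  type: 'done';
  prompt_tokens: number;
  completion_tokens: number;
  latency_ms: number;
  outcome: string;
  title_job_queued: boolean;
};
// Assumption: `code` may be a string or a number; the docs do not say.
type StreamErrorEvent = { type: 'error'; code: string | number; message: string };
type StreamEvent = DeltaEvent | DoneEvent | StreamErrorEvent;

// Parses a single frame ("data: {...}"); returns null for anything else.
function parseFrame(frame: string): StreamEvent | null {
  if (!frame.startsWith('data: ')) return null;
  const ev = JSON.parse(frame.slice(6)) as { type?: unknown };
  if (ev.type === 'delta' || ev.type === 'done' || ev.type === 'error') {
    return ev as StreamEvent;
  }
  return null; // unknown event types are ignored
}

interface StreamState {
  text: string;      // accumulated assistant message
  finished: boolean; // a `done` event was seen
  failed: boolean;   // an `error` event or outcome "error" was seen
}

// Folds one event into the state without mutating the previous state.
function applyEvent(state: StreamState, ev: StreamEvent): StreamState {
  switch (ev.type) {
    case 'delta':
      return { ...state, text: state.text + ev.text };
    case 'error':
      return { ...state, failed: true };
    case 'done':
      return { ...state, finished: true, failed: state.failed || ev.outcome === 'error' };
  }
}
```

Keeping the reducer pure makes it easy to unit-test the stream handling separately from the network code.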

About `title_job_queued`

When the response is for the first user message of a conversation, the server queues a background job (GenerateConversationTitle) that uses Qwen 2.5 7B to write a concise title. The job typically completes in 1-2 seconds after the done event.

The flag title_job_queued: true in the done event tells you to poll GET /chat/sessions/{id} once after ~2 seconds to pick up the new title. After that, the title is stable.
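The poll-once rule can be isolated in a small helper, sketched here. The name `scheduleTitleRefresh` and the injectable `schedule` parameter are hypothetical; the 2000 ms default simply follows the "~2 seconds" guidance above.

```typescript
// Schedules a single title refresh if the done event asks for one.
// `refresh` is whatever the app does to re-fetch GET /chat/sessions/{id}.
// `schedule` is injectable so the logic can be tested without real timers.
function scheduleTitleRefresh(
  ev: { title_job_queued?: boolean },
  refresh: () => void,
  schedule: (fn: () => void, ms: number) => void = (fn, ms) => { setTimeout(fn, ms); },
  delayMs = 2000,
): boolean {
  if (!ev.title_job_queued) return false; // nothing queued, keep the current title
  schedule(refresh, delayMs);
  return true;
}
```

The return value tells the caller whether a refresh was scheduled, which can be handy for showing a placeholder title in the meantime.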

About `tier` and `backend`

Both optional. If omitted, defaults are:

tier: inherited from the previous turn (or slow for a new conversation)
backend: router's choice — picks the least-loaded family that serves the model

When you pass backend, you scope the routing to a hardware family (e.g. mac-mini = all Mac mini Ollama backends; the router still load-balances within that family).
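To let the server defaults apply, a client can simply leave `tier` and `backend` out of the body. The sketch below builds the message body that way and rejects tier values outside the documented set. `buildMessageBody` is a hypothetical name, and treating `model` as required is an assumption based on the request example.

```typescript
// Tier values as listed in the request example.
const TIERS = ['slow', 'medium', 'fast', 'ludicrous'] as const;

interface SendMessageInput {
  content: string;
  model: string;    // assumption: always sent, as in the documented example
  tier?: string;    // omit to inherit from the previous turn
  backend?: string; // omit to let the router pick a family
}

// Hypothetical helper: builds the JSON body for POST /chat/sessions/{id}/messages.
function buildMessageBody(input: SendMessageInput): Record<string, string> {
  if (input.content.trim() === '') {
    throw new Error('content must not be empty'); // the server would answer 422
  }
  const body: Record<string, string> = { content: input.content, model: input.model };
  if (input.tier !== undefined) {
    if (!(TIERS as readonly string[]).includes(input.tier)) {
      throw new Error(`unknown tier: ${input.tier}`);
    }
    body.tier = input.tier;
  }
  if (input.backend !== undefined) body.backend = input.backend;
  return body;
}
```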

Stop generation

POST /stop/{id}

This endpoint lives on the chat webapp's domain (chat.siati.ai/stop/...), not under the API base URL. It is deliberately kept outside the main API surface so that it can be called fire-and-forget from a separate connection while a long-running messages POST is still streaming.

The server writes a Redis flag chat:stop:{id}, which the streaming loop polls on every chunk. The model finishes the current chunk and then breaks; the assistant message is saved with whatever was generated up to that point, suffixed with \n\n_⏹ Generazione interrotta dall'utente._.
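Because an interrupted message is stored with the suffix quoted above, a client that reloads history may want to detect and strip it. The sketch below does that and also builds the stop URL; the helper names are hypothetical, and the `https` scheme on chat.siati.ai is an assumption.

```typescript
// Suffix the server appends to an interrupted assistant message (quoted from this section).
const STOP_SUFFIX = "\n\n_⏹ Generazione interrotta dall'utente._";

// Hypothetical helper: separates the generated text from the stop marker.
function splitStopped(content: string): { text: string; stopped: boolean } {
  if (content.endsWith(STOP_SUFFIX)) {
    return { text: content.slice(0, -STOP_SUFFIX.length), stopped: true };
  }
  return { text: content, stopped: false };
}

// Hypothetical helper: URL for POST /stop/{id}. Assumes the https scheme.
function stopUrl(id: string): string {
  return `https://chat.siati.ai/stop/${encodeURIComponent(id)}`;
}
```

A UI could then render `text` normally and show its own localized "stopped" badge when `stopped` is true.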

Full client example (TypeScript)

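// `jwt` is the token returned by POST /auth/login; updateUI, saveMessage,
// updateTitle and showError are app-supplied callbacks.
const BASE = 'https://my.siati.ai/api/v1';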
const headers = {
  'Authorization': `Bearer ${jwt}`,
  'Content-Type':  'application/json',
  'Accept':        'application/json',
};

// 1) Create a session
const conv = await fetch(`${BASE}/chat/sessions`, {
  method: 'POST', headers,
  body: JSON.stringify({ model: 'apertus-70b-instruct' }),
}).then(r => r.json());

// 2) Send a message and consume the SSE stream
const resp = await fetch(`${BASE}/chat/sessions/${conv.id}/messages`, {
  method: 'POST',
  headers: { ...headers, 'Accept': 'text/event-stream' },
  body: JSON.stringify({
    content: 'Spiegami il teorema di Pitagora con un esempio.',
    model:   'apertus-70b-instruct',
    tier:    'fast',
  }),
});

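// Non-2xx responses (401, 404, 422, 429, 503) are not SSE streams; fail early.
if (!resp.ok || !resp.body) throw new Error(`Send failed: HTTP ${resp.status}`);
const reader = resp.body.getReader();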
const decoder = new TextDecoder();
let buf = '';
let assistantText = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buf += decoder.decode(value, { stream: true });

  // SSE frames: 'data: {…}\n\n'
  let frameEnd;
  while ((frameEnd = buf.indexOf('\n\n')) !== -1) {
    const frame = buf.slice(0, frameEnd);
    buf = buf.slice(frameEnd + 2);
    if (!frame.startsWith('data: ')) continue;
    const ev = JSON.parse(frame.slice(6));

    if (ev.type === 'delta') {
      assistantText += ev.text;
      updateUI(assistantText);
    }
    if (ev.type === 'done') {
      saveMessage(conv.id, assistantText, ev);
      if (ev.title_job_queued) {
        // Poll once after ~2s to pick up the auto-title
        setTimeout(async () => {
          const fresh = await fetch(`${BASE}/chat/sessions/${conv.id}`, { headers }).then(r => r.json());
          updateTitle(fresh.title);
        }, 2000);
      }
    }
    if (ev.type === 'error') {
      showError(ev.message);
    }
  }
}

Errors

HTTP	When
`401`	JWT missing/expired/invalid
`404`	Conversation not found or not owned by you
`422`	Validation failed (e.g. `content` empty or too long)
`429`	Rate limit hit for your plan/tier (see Rate limits)
`503`	No backend healthy for the requested `(model, tier)`

Errors during SSE streaming are emitted as data: {"type":"error", …} followed by data: {"type":"done", …, "outcome":"error"}. The HTTP status is still 200 (the response started successfully).
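One possible way to act on the status codes in the table is a small mapping from status to client behavior. The action names below are a hypothetical client-side policy, not something the API defines; in particular, retrying on 429 and 503 is an assumption.

```typescript
// Hypothetical client-side reactions to the documented HTTP errors.
type ErrorAction = 'reauthenticate' | 'not_found' | 'fix_input' | 'retry_later' | 'unknown';

function actionForStatus(status: number): ErrorAction {
  switch (status) {
    case 401:
      return 'reauthenticate'; // JWT missing, expired, or invalid
    case 404:
      return 'not_found';      // conversation missing or not owned by the user
    case 422:
      return 'fix_input';      // e.g. empty or too-long content
    case 429: // rate limit hit
    case 503: // no healthy backend for the requested (model, tier)
      return 'retry_later';
    default:
      return 'unknown';
  }
}
```

Errors that occur after the stream has started arrive as events with HTTP 200, so they need to be handled in the event loop rather than through this mapping.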

Differences vs developer API

	Developer API (`api.siati.ai`)	Sessions API (`my.siati.ai/api/v1`)
Auth	Bearer API key	Bearer JWT
State	Stateless	Server-side conversations
Format	OpenAI-compatible	Custom, simplified
Streaming	OpenAI SSE chunks	`{type:delta,text}` / `{type:done,…}`
Auto-title	No	Yes, via background job
Stop-on-demand	No (close connection)	`POST /stop/{id}`
History management	You manage	Server manages
Use case	Backend integration, OpenAI SDK migration	Mobile apps, web chat UIs

For a chat app, use this sessions API. For backend code that already speaks OpenAI, use the developer API.
