Chat completions#
POST /v1/chat/completions — primary endpoint, 100% OpenAI-compatible.
Basic shape#
from openai import OpenAI

client = OpenAI(
    base_url="...",  # your API base URL
    api_key="...",   # your API key
)

client.chat.completions.create(
    model="siati/llama-3.1-405b",
    messages=[
        {"role": "system", "content": "You are a Swiss assistant."},
        {"role": "user", "content": "Explain nFADP in 3 bullets."},
    ],
    temperature=0.7,
    max_tokens=500,
)
Supported parameters#
| Parameter | Type | Default | Notes |
|---|---|---|---|
| model | string | required | See model catalog. |
| messages | array | required | Alternating system/user/assistant. |
| temperature | float | 1.0 | 0 = deterministic, 2 = very creative. |
| top_p | float | 1.0 | Nucleus sampling; alternative to temperature. |
| max_tokens | int | model default | Output cap. If omitted, the model decides. |
| stream | bool | false | See streaming. |
| tools | array | – | Function calling; see tool use. |
| response_format | object | – | {"type": "json_object"} for JSON output. |
| seed | int | – | For reproducibility (best-effort). |
| stop | string or array | – | Stop sequences. |
| presence_penalty | float | 0 | -2 to +2. |
| frequency_penalty | float | 0 | -2 to +2. |
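The response_format parameter above forces the model to emit parseable JSON. A minimal sketch of a JSON-mode request, assuming `client` is an OpenAI-compatible client configured for this API (the prompt, model name, and stand-in response are illustrative):

```python
import json

# Request body for JSON mode; response_format comes from the
# parameter table above, the other values are examples.
params = {
    "model": "siati/llama-3.1-405b",
    "messages": [
        {"role": "system", "content": "Reply with a single JSON object."},
        {"role": "user", "content": 'Name two Swiss cantons as {"cantons": [...]}.'},
    ],
    "response_format": {"type": "json_object"},
    "temperature": 0,
}
# resp = client.chat.completions.create(**params)
# data = json.loads(resp.choices[0].message.content)

# In JSON mode the content string parses without cleanup; a stand-in:
data = json.loads('{"cantons": ["Zurich", "Vaud"]}')
```

Setting temperature to 0 alongside JSON mode is a common pairing when the output feeds a parser rather than a reader.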
Multi-turn conversation#
history = [{"role": "system", "content": "You are a Swiss legal assistant."}]

while True:
    user_input = input("> ")
    if not user_input:
        break
    history.append({"role": "user", "content": user_input})
    resp = client.chat.completions.create(
        model="siati/mistral-small-24b",
        messages=history,
    )
    answer = resp.choices[0].message.content
    print(answer)
    history.append({"role": "assistant", "content": answer})
The model has no memory across calls — you must resend the full history each turn. Cost: you pay for all input tokens on every round.
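Because the full history is resent every turn, input-token cost grows with session length. One common mitigation is to trim old turns while always keeping the system prompt; a minimal sketch, where `trim_history` and the message cap are assumptions, not part of this API:

```python
def trim_history(history, max_messages=12):
    """Keep the system prompt plus the most recent turns.

    history: list of {"role": ..., "content": ...} dicts, oldest first.
    """
    if len(history) <= max_messages:
        return history
    system = [m for m in history if m["role"] == "system"]
    recent = [m for m in history if m["role"] != "system"]
    keep = max_messages - len(system)
    if keep <= 0:
        return system
    # Drop the oldest non-system turns; keep the newest `keep` of them.
    return system + recent[-keep:]
```

Call `history = trim_history(history)` before each `create(...)` call. For higher-fidelity context you could instead summarize dropped turns into a single message, at the cost of an extra model call.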
Response shape#
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1714723200,
  "model": "siati/llama-3.1-405b",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "..."},
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 42,
    "completion_tokens": 100,
    "total_tokens": 142
  }
}
finish_reason can be: stop (natural end), length (hit max_tokens),
tool_calls (the model requested a function call), or content_filter (rare; only severe violations).
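Callers typically branch on finish_reason to detect truncation before trusting the output. A minimal sketch over the response shape documented above (the handling policy is an example, not prescribed by the API):

```python
def check_truncated(response: dict) -> bool:
    """Return True if the first choice stopped because it hit max_tokens."""
    choice = response["choices"][0]
    if choice["finish_reason"] == "length":
        # Output was cut off; callers might retry with a larger
        # max_tokens or continue generation in a follow-up turn.
        return True
    return False

# Example using the documented response shape:
truncated_response = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "..."},
            "finish_reason": "length",
        }
    ],
}
```

With the SDK's typed response objects you would read `resp.choices[0].finish_reason` instead of dict keys; the branching logic is the same.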