siati.ai docs

Live data

Models catalog

Models currently served by the fleet. Updated automatically.

Auto-generated from model_cards. Last query: 2026-06-01T18:36:10+00:00.

Apertus 70B

Tiers: medium, fast

apertus-70b-instruct

Swiss sovereign LLM. 70B parameters, multilingual (IT/EN/DE/FR/RM). Default model for chat and RAG.

Hardware
L40B (L40S 46 GB) + DGX Spark (GB10 Grace+Blackwell 128 GB)
Status
online
cURL example
bash
curl https://api.siati.ai/v1/chat/completions \
  -H "Authorization: Bearer $SIATI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "apertus-70b-instruct",
    "messages": [{"role": "user", "content": "Ciao!"}]
  }'
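Every example in this catalog hard-codes the prompt inside a single-quoted JSON string. To send arbitrary text, such as a prompt containing quotes or line breaks, the request body must stay valid JSON. A minimal sketch, assuming Bash 4 or later; `chat_payload` is a hypothetical helper, not part of any siati.ai tooling:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a chat-completions JSON body for one user message.
# Escapes backslashes, double quotes, newlines, carriage returns and tabs;
# other control characters are not handled in this sketch.
chat_payload() {
  local model="$1" msg="$2"
  msg=${msg//\\/\\\\}        # \  -> \\  (must run first)
  msg=${msg//\"/\\\"}        # "  -> \"
  msg=${msg//$'\n'/\\n}      # newline -> \n
  msg=${msg//$'\r'/\\r}      # carriage return -> \r
  msg=${msg//$'\t'/\\t}      # tab -> \t
  printf '{"model": "%s", "messages": [{"role": "user", "content": "%s"}]}\n' \
    "$model" "$msg"
}
```

It plugs into the example above as `-d "$(chat_payload apertus-70b-instruct "$PROMPT")"`, where `$PROMPT` is a placeholder. Only the characters most likely to appear in prompts are escaped; where `jq` is installed, `jq -n --arg m "$MODEL" --arg c "$PROMPT" '{model: $m, messages: [{role: "user", content: $c}]}'` covers every case and is the safer choice.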

Mistral Large 2

Tiers: fast, ludicrous

mistral-large-2

Mistral Large 2 (123B), AWQ INT4. European model (Mistral AI, Paris). Strong at reasoning, multilingual (including Italian), and code.

Hardware
BigGuy GPU0+1 (2× RTX 6000 Pro Blackwell, tensor parallelism TP=2, 194 GB VRAM total)
Status
online
cURL example
bash
curl https://api.siati.ai/v1/chat/completions \
  -H "Authorization: Bearer $SIATI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral-large-2",
    "messages": [{"role": "user", "content": "Ciao!"}]
  }'
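Several models in this catalog are served as AWQ INT4, i.e. about half a byte per weight. As a back-of-the-envelope check against the hardware listed for each card, the weight footprint is parameters × bits ÷ 8. A minimal sketch; `weights_gb` is a hypothetical helper, and the estimate deliberately ignores quantization metadata (group scales and zero points), the KV cache and activations, so real memory use is higher:

```bash
#!/usr/bin/env bash
# Hypothetical helper: approximate size of quantized weights in GB (10^9 bytes).
#   GB ~= parameters in billions * bits per weight / 8
# Weights only: AWQ scales/zero points, KV cache and activations are not counted.
weights_gb() {
  LC_ALL=C awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

weights_gb 123 4   # Mistral Large 2, AWQ INT4 -> 61.5
weights_gb 72 4    # Qwen 2.5 72B, AWQ INT4   -> 36.0
```

For Mistral Large 2 this gives roughly 61.5 GB of weights against 194 GB across the two GPUs, and for Qwen 2.5 72B about 36 GB on a single 97 GB GPU; the difference is headroom for the KV cache and activations.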

Qwen 2.5 72B

Tiers: fast, ludicrous

qwen2.5-72b-instruct

Qwen 2.5 72B Instruct, AWQ INT4. One of the strongest dense open-weight models in the ~70B class, with excellent multilingual, code, and math performance.

Hardware
BigGuy GPU2 (1× RTX 6000 Pro Blackwell 97 GB)
Status
online
cURL example
bash
curl https://api.siati.ai/v1/chat/completions \
  -H "Authorization: Bearer $SIATI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5-72b-instruct",
    "messages": [{"role": "user", "content": "Ciao!"}]
  }'

DeepSeek-R1 Distill 70B

Tiers: fast, ludicrous

deepseek-r1-distill-70b

DeepSeek-R1 distilled into Llama 70B. Reasoning model with a visible chain of thought (<think>...</think>). Excellent for math, code, and problem solving.

Hardware
BigGuy GPU3 (1× RTX 6000 Pro Blackwell 97 GB)
Status
online
cURL example
bash
curl https://api.siati.ai/v1/chat/completions \
  -H "Authorization: Bearer $SIATI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1-distill-70b",
    "messages": [{"role": "user", "content": "Ciao!"}]
  }'
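Replies from deepseek-r1-distill-70b carry the reasoning inside <think>...</think> before the final answer, so a client usually needs to separate the two, for instance to show only the answer. A minimal sketch in Bash, assuming the reply text has already been extracted from the JSON response; `trim`, `think_part` and `answer_part` are hypothetical helper names. It cuts at the first closing tag, so it also works if the opening <think> tag is absent from the returned text:

```bash
#!/usr/bin/env bash
# Hypothetical helpers for splitting an R1-style reply.

# Trim leading and trailing whitespace (spaces, tabs, newlines).
trim() {
  local s="$1"
  s=${s#"${s%%[![:space:]]*}"}   # remove leading whitespace
  s=${s%"${s##*[![:space:]]}"}   # remove trailing whitespace
  printf '%s\n' "$s"
}

# Reasoning: text before the first </think>, minus any opening <think> tag.
# Prints nothing when the reply contains no </think>.
think_part() {
  local reply="$1" tmp
  case $reply in
    *"</think>"*) ;;
    *) return 0 ;;
  esac
  tmp=${reply%%"</think>"*}      # keep only what precedes the first </think>
  tmp=${tmp#*"<think>"}          # drop the opening tag if present
  trim "$tmp"
}

# Final answer: text after the first </think>, or the whole reply if absent.
answer_part() {
  local reply="$1"
  trim "${reply#*"</think>"}"
}
```

With an OpenAI-compatible response shape (an assumption), the reply text can be extracted with `jq -r '.choices[0].message.content'` and passed to `answer_part`.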

Qwen 2.5 7B

Tiers: slow, medium

qwen2.5:7b-instruct-q4_K_M

Fast model for drafting and for the slow and medium tiers. 7B parameters, 4-bit quantized.

Hardware
2× Mac mini Apple Silicon (Lugano)
Status
online
cURL example
bash
curl https://api.siati.ai/v1/chat/completions \
  -H "Authorization: Bearer $SIATI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5:7b-instruct-q4_K_M",
    "messages": [{"role": "user", "content": "Ciao!"}]
  }'

Qwen 2.5 1.5B

Tier: slow

qwen2.5:1.5b

Compact model for testing and for generating chat titles. Responds quickly even on CPU.

Hardware
Test VM (CPU-only)
Status
online
cURL example
bash
curl https://api.siati.ai/v1/chat/completions \
  -H "Authorization: Bearer $SIATI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5:1.5b",
    "messages": [{"role": "user", "content": "Ciao!"}]
  }'

BGE-M3 (Embeddings)

bge-m3

Multilingual 1024-dimensional embeddings for RAG. License: Apache 2.0. Served via Text Embeddings Inference (TEI).

Hardware
NVIDIA L40S 46 GB (L40A) — Lugano
Status
online
cURL example
bash
# Assumption: embeddings are requested through an OpenAI-compatible /v1/embeddings
# route; an embeddings model does not answer chat messages.
curl https://api.siati.ai/v1/embeddings \
  -H "Authorization: Bearer $SIATI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-m3",
    "input": "Ciao!"
  }'
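In RAG, the 1024-dimensional vectors returned by bge-m3 are compared by cosine similarity to find the passages closest to a query. A minimal sketch in Bash and awk; `cosine_sim` is a hypothetical helper that takes two vectors as comma-separated numbers. Flattening an API response into that form (for example with `jq -r '.data[0].embedding | map(tostring) | join(",")'`) assumes an OpenAI-compatible response shape:

```bash
#!/usr/bin/env bash
# Hypothetical helper: cosine similarity of two comma-separated vectors.
# Fails (exit status 1) on a dimension mismatch or a zero vector.
cosine_sim() {
  LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, ","); m = split(b, y, ",")
    if (n != m) { print "dimension mismatch" > "/dev/stderr"; exit 1 }
    dot = 0; na = 0; nb = 0
    for (i = 1; i <= n; i++) {
      dot += x[i] * y[i]   # inner product
      na  += x[i] * x[i]   # squared norm of a
      nb  += y[i] * y[i]   # squared norm of b
    }
    if (na == 0 || nb == 0) { print "zero vector" > "/dev/stderr"; exit 1 }
    printf "%.4f\n", dot / (sqrt(na) * sqrt(nb))
  }'
}
```

If the vectors are already L2-normalized, cosine similarity equals the plain dot product and the norm computation could be skipped; it is kept here so the helper also works on unnormalized input.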

BGE Reranker v2 m3

bge-reranker-v2-m3

Cross-encoder for the second (reranking) stage of RAG retrieval. +15–25% recall@5 over first-stage retrieval alone. License: Apache 2.0.

Hardware
NVIDIA L40S 46 GB (L40A) — Lugano
Status
online
cURL example
bash
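# Assumption: reranking goes through a /v1/rerank route that takes a query and a
# list of documents; a cross-encoder scores pairs and does not answer chat messages.
curl https://api.siati.ai/v1/rerank \
  -H "Authorization: Bearer $SIATI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-reranker-v2-m3",
    "query": "What is the capital of Switzerland?",
    "documents": ["Bern is the federal city.", "Lugano is in Ticino."]
  }'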
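The reranker's job is to re-order the candidates returned by the first stage (for example a vector search over bge-m3 embeddings) so that only the best few reach the prompt. A minimal sketch of that final selection step, assuming the reranker scores have already been written one per line next to each passage as `score<TAB>passage`; the function name `rerank_top_k` and this input format are hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "score<TAB>passage" lines on stdin and print the
# k passages with the highest reranker score, best first.
rerank_top_k() {
  local k="$1"
  # -g parses general numbers, including scores like 3.5e-05; -r sorts descending.
  LC_ALL=C sort -k1,1gr | head -n "$k" | cut -f2-
}
```

Sorting on the first field only keeps passages containing spaces intact, and `cut -f2-` strips the score so the output can be pasted directly into the context of a chat request.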