Live data
Models catalog
Models currently served by the fleet. Updated automatically.
Auto-generated from model_cards. Last query: 2026-06-01T18:36:10+00:00.
Apertus 70B
apertus-70b-instruct
Swiss sovereign LLM. 70B parameters, multilingual (IT/EN/DE/FR/RM). Default for chat and RAG.
- Hardware: L40B (L40S 46 GB) + DGX Spark (GB10 Grace+Blackwell 128 GB)
- Status: online
cURL example
curl https://api.siati.ai/v1/chat/completions \
-H "Authorization: Bearer $SIATI_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "apertus-70b-instruct",
"messages": [{"role": "user", "content": "Ciao!"}]
}'
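The same request can be issued from Python. The sketch below uses only the standard library and mirrors the cURL call above (same URL, headers, and JSON body). The helper names `build_chat_request` and `chat` are hypothetical, and the response parsing assumes an OpenAI-style reply shape (`choices[0].message.content`), which this catalog does not show.

```python
import json
import os
import urllib.request

# Same endpoint as the cURL example above.
API_URL = "https://api.siati.ai/v1/chat/completions"


def build_chat_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the cURL example."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def chat(model: str, prompt: str) -> str:
    """Send the request and return the first reply.

    ASSUMPTION: the response follows the OpenAI chat-completions shape;
    adjust the parsing if the gateway returns something different.
    """
    req = build_chat_request(model, prompt, os.environ["SIATI_API_KEY"])
    with urllib.request.urlopen(req, timeout=60) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Any model identifier from this catalog can be passed as `model`, for example `chat("apertus-70b-instruct", "Ciao!")`.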
Mistral Large 2
mistral-large-2
Mistral Large 2 (123B), AWQ INT4. European model (Mistral AI, Paris). Strong reasoning, multilingual (including Italian), and code.
- Hardware: BigGuy GPU0+1 (2× RTX 6000 Pro Blackwell, TP=2, 194 GB VRAM)
- Status: online
cURL example
curl https://api.siati.ai/v1/chat/completions \
-H "Authorization: Bearer $SIATI_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "mistral-large-2",
"messages": [{"role": "user", "content": "Ciao!"}]
}'
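The hardware line for this model can be sanity-checked with rough arithmetic. The sketch below estimates the weight footprint only, from the parameter count and bits per weight stated in the card. It is a lower bound, not a measurement: it ignores quantization scales, any layers kept at higher precision, the KV cache, and activations. The function name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate weight footprint in GB (10^9 bytes): params * bits / 8."""
    return params_billion * bits_per_weight / 8


# Mistral Large 2: 123B parameters at 4 bits per weight (AWQ INT4).
int4_total = weight_memory_gb(123, 4)   # 61.5 GB of weights
int4_per_gpu = int4_total / 2           # 30.75 GB per GPU with TP=2

# The same model at 16 bits per weight, for comparison.
fp16_total = weight_memory_gb(123, 16)  # 246.0 GB, above the 194 GB listed

print(int4_total, int4_per_gpu, fp16_total)
```

Under these assumptions the INT4 weights fit comfortably in the listed 194 GB, leaving room for the KV cache, whereas unquantized 16-bit weights would not fit at all.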
Qwen 2.5 72B
qwen2.5-72b-instruct
Qwen 2.5 72B Instruct, AWQ INT4. A leading dense open-weight model in the ~70B class, with excellent multilingual, code, and math performance.
- Hardware: BigGuy GPU2 (1× RTX 6000 Pro Blackwell 97 GB)
- Status: online
cURL example
curl https://api.siati.ai/v1/chat/completions \
-H "Authorization: Bearer $SIATI_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen2.5-72b-instruct",
"messages": [{"role": "user", "content": "Ciao!"}]
}'
DeepSeek-R1 Distill 70B
deepseek-r1-distill-70b
DeepSeek-R1 distilled into Llama-70B. Reasoning with visible chain-of-thought (<think>...</think>). Excellent for math, code, and problem solving.
- Hardware: BigGuy GPU3 (1× RTX 6000 Pro Blackwell 97 GB)
- Status: online
cURL example
curl https://api.siati.ai/v1/chat/completions \
-H "Authorization: Bearer $SIATI_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "deepseek-r1-distill-70b",
"messages": [{"role": "user", "content": "Ciao!"}]
}'
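Because this model emits its chain-of-thought inline between `<think>` and `</think>`, clients often want to separate the reasoning from the final answer before displaying it. A minimal sketch, assuming the tags arrive as literal text in the reply content (the helper `split_reasoning` is hypothetical and not part of the API):

```python
import re

# Non-greedy match so several <think> blocks are handled separately;
# DOTALL lets the reasoning span multiple lines.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a reply containing <think>...</think>."""
    reasoning = "\n".join(part.strip() for part in _THINK_RE.findall(text))
    answer = _THINK_RE.sub("", text).strip()
    return reasoning, answer


reply = "<think>2 + 2 equals 4.</think>\nThe answer is 4."
reasoning, answer = split_reasoning(reply)
print(answer)  # The answer is 4.
```

A reply without any `<think>` block is returned unchanged as the answer, with an empty reasoning string.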
Qwen 2.5 7B
qwen2.5:7b-instruct-q4_K_M
Fast model for drafting and the slow/medium tiers. 7B parameters, 4-bit quantized.
- Hardware: 2× Mac mini Apple Silicon (Lugano)
- Status: online
cURL example
curl https://api.siati.ai/v1/chat/completions \
-H "Authorization: Bearer $SIATI_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen2.5:7b-instruct-q4_K_M",
"messages": [{"role": "user", "content": "Ciao!"}]
}'
Qwen 2.5 1.5B
qwen2.5:1.5b
Compact model for testing and chat-title generation. Responds quickly even on CPU.
- Hardware: VM test (CPU-only)
- Status: online
cURL example
curl https://api.siati.ai/v1/chat/completions \
-H "Authorization: Bearer $SIATI_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen2.5:1.5b",
"messages": [{"role": "user", "content": "Ciao!"}]
}'
BGE-M3 (Embeddings)
bge-m3
Multilingual 1024-dimensional embeddings for RAG. Apache 2.0. Served via Text-Embeddings-Inference.
- Hardware: NVIDIA L40S 46 GB (L40A), Lugano
- Status: online
cURL example
curl https://api.siati.ai/v1/chat/completions \
-H "Authorization: Bearer $SIATI_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "bge-m3",
"messages": [{"role": "user", "content": "Ciao!"}]
}'
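An embedding model returns vectors rather than chat turns, so clients typically send raw text and compare the resulting vectors. The sketch below is written under a loud assumption: that the gateway also exposes an OpenAI-style `/v1/embeddings` route accepting `{"model", "input"}`. This catalog does not list that route, so check it before relying on it. The helpers `build_embeddings_request` and `cosine_similarity` are hypothetical names.

```python
import json
import math
import urllib.request

# ASSUMPTION: OpenAI-style embeddings route; not confirmed by this catalog.
EMBEDDINGS_URL = "https://api.siati.ai/v1/embeddings"


def build_embeddings_request(texts: list[str], api_key: str,
                             model: str = "bge-m3") -> urllib.request.Request:
    """Build (but do not send) an embeddings request for a batch of texts."""
    payload = {"model": model, "input": texts}
    return urllib.request.Request(
        EMBEDDINGS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

In a RAG pipeline, the query vector and each 1024-dimensional document vector would be compared with `cosine_similarity` (or an equivalent operation in a vector store) to select first-stage candidates.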
BGE Reranker v2 m3
bge-reranker-v2-m3
Cross-encoder for the second stage of RAG retrieval. Improves recall@5 by 15–25%. Apache 2.0.
- Hardware: NVIDIA L40S 46 GB (L40A), Lugano
- Status: online
cURL example
curl https://api.siati.ai/v1/chat/completions \
-H "Authorization: Bearer $SIATI_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "bge-reranker-v2-m3",
"messages": [{"role": "user", "content": "Ciao!"}]
}'
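The second-stage role described in this card can be sketched as follows: a first stage (for example, embedding search) returns a broad candidate list, and the reranker scores each query and document pair to keep only the best few. The scoring function below is a toy word-overlap stand-in, not the cross-encoder itself. In practice `score` would call bge-reranker-v2-m3, whose request route and payload are not shown in this catalog. All names here are hypothetical.

```python
from typing import Callable, Sequence


def rerank(query: str, candidates: Sequence[str],
           score: Callable[[str, str], float], top_k: int = 5) -> list[str]:
    """Second-stage retrieval: score every (query, doc) pair and keep the top_k."""
    scored = [(score(query, doc), doc) for doc in candidates]
    # Stable sort, so ties keep their first-stage order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def overlap_score(query: str, doc: str) -> float:
    """Toy stand-in for the cross-encoder: fraction of query words found in doc."""
    q_words = set(query.lower().split())
    d_words = set(doc.lower().split())
    return len(q_words & d_words) / len(q_words) if q_words else 0.0


candidates = ["the cat sat", "swiss sovereign llm", "llm hosting in lugano"]
print(rerank("swiss llm", candidates, overlap_score, top_k=2))
# ['swiss sovereign llm', 'llm hosting in lugano']
```

Passing the scorer as a parameter keeps the pipeline logic independent of how the reranker is reached, so the toy function can be swapped for a real call without changing `rerank`.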