Models catalog · Docs

Apertus 70B

medium fast

apertus-70b-instruct

Swiss sovereign LLM. 70B parameters, multilingual (IT/EN/DE/FR/RM). Default for chat and RAG.

Hardware: Swiss sovereign infrastructure · Lugano (CH)
Status: online

cURL example

curl https://api.siati.ai/v1/chat/completions \
  -H "Authorization: Bearer $SIATI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "apertus-70b-instruct",
    "messages": [{"role": "user", "content": "Ciao!"}]
  }'
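
For callers who script against the API, the same request can be sketched in Python with only the standard library. The endpoint, headers, and body are copied from the cURL example above; the helper names `build_chat_request` and `send` are illustrative, and the shape of the reply is not documented on this page, so `send` simply returns the decoded JSON.

```python
import json
import os
import urllib.request

API_URL = "https://api.siati.ai/v1/chat/completions"  # endpoint from the cURL example


def build_chat_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the cURL example."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON reply (reply shape is not documented here)."""
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


req = build_chat_request(
    "apertus-70b-instruct", "Ciao!", os.environ.get("SIATI_API_KEY", "")
)
# reply = send(req)  # uncomment to perform the network call
```

Swapping the first argument for any other model ID in this catalog should give the equivalent of that model's cURL example.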

Gemma 4 26B

vision medium fast

gemma-4-26b

Google multimodal model (text + images) with tool calling and robust JSON output. Mixture-of-experts (MoE) with 26B total parameters and 4B active, so it is fast.

Hardware: Swiss sovereign infrastructure · Lugano (CH)
Status: online
Image input: up to 2 per request · PNG, JPEG, WebP, GIF · max 8 MB
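
The image limits above can be checked on the client before uploading. The following is a minimal sketch, not the server's actual validation. It assumes the 8 MB limit applies per image and means 8 MiB, which this page does not specify, and the helper names `sniff_mime` and `check_images` are hypothetical.

```python
from typing import List, Optional

MAX_IMAGES = 2               # "up to 2 per request"
MAX_BYTES = 8 * 1024 * 1024  # assumption: "8 MB" read as 8 MiB, applied per image


def sniff_mime(data: bytes) -> Optional[str]:
    """Guess the image type from its leading magic bytes; None if not an accepted format."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "image/gif"
    return None


def check_images(images: List[bytes]) -> List[str]:
    """Return a list of problems; an empty list means the images fit the documented limits."""
    problems = []
    if len(images) > MAX_IMAGES:
        problems.append(f"too many images: {len(images)} > {MAX_IMAGES}")
    for i, data in enumerate(images):
        if sniff_mime(data) is None:
            problems.append(f"image {i}: unsupported format")
        if len(data) > MAX_BYTES:
            problems.append(f"image {i}: {len(data)} bytes exceeds {MAX_BYTES}")
    return problems
```

Running `check_images` on the raw file contents before building the request catches oversized or unsupported files without a round trip to the API.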

cURL example

curl https://api.siati.ai/v1/chat/completions \
  -H "Authorization: Bearer $SIATI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma-4-26b",
    "messages": [{"role": "user", "content": "Ciao!"}]
  }'

Image example

Send the image either as a base64 data URI or as a public URL; in the latter case we download it ourselves. URLs that point to private network addresses are rejected.

# base64 -w0 (GNU coreutils) disables line wrapping; on macOS use: base64 -i foto.jpg
B64=$(base64 -w0 foto.jpg)
curl https://api.siati.ai/v1/chat/completions \
  -H "Authorization: Bearer $SIATI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma-4-26b",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Cosa c'\''è in questa foto?"},
        {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,'"$B64"'"}}
      ]
    }]
  }'
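
The two ways of passing an image can also be sketched in Python. The message layout below is copied from the cURL example; the helper names are illustrative. `looks_private` is only a rough client-side approximation of the private-address rejection: it inspects IP literals and does not resolve hostnames, and the server's actual rule is not documented here.

```python
import base64
import ipaddress
from urllib.parse import urlparse


def to_data_uri(image_bytes: bytes, mime: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data URI."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def image_message(question: str, image_url: str) -> dict:
    """Build one user message with a text part and an image part, as in the cURL example."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }


def looks_private(url: str) -> bool:
    """True if the URL host is a literal private, loopback, or link-local IP address."""
    host = urlparse(url).hostname or ""
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # a DNS name, not an IP literal; not resolved in this sketch
    return ip.is_private or ip.is_loopback or ip.is_link_local
```

Pass either `to_data_uri(open("foto.jpg", "rb").read())` or a public URL as the second argument of `image_message`. Building the body in a script also avoids shell limits on command-line argument length, which a multi-megabyte base64 string passed to `-d` can exceed on some systems.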

Qwen 2.5 32B

medium fast

qwen2.5:32b

Capable 32B model for reasoning and complex tasks. Runs on a sovereign AMD GPU in Lugano.

Hardware: Swiss sovereign infrastructure · Lugano (CH)
Status: online

cURL example

curl https://api.siati.ai/v1/chat/completions \
  -H "Authorization: Bearer $SIATI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5:32b",
    "messages": [{"role": "user", "content": "Ciao!"}]
  }'

Qwen 2.5 14B

slow medium fast

qwen2.5:14b

Balanced 14B model, fast (~50 tok/s), for high-quality chat and drafting.

Hardware: Swiss sovereign infrastructure · Lugano (CH)
Status: online

cURL example

curl https://api.siati.ai/v1/chat/completions \
  -H "Authorization: Bearer $SIATI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5:14b",
    "messages": [{"role": "user", "content": "Ciao!"}]
  }'
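
To put the ~50 tok/s figure quoted for this model in perspective, a back-of-the-envelope estimate of streaming time is output length divided by decode rate. The sketch below assumes a steady rate and ignores prompt processing, queueing, and network latency, so treat the numbers as rough lower bounds.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float = 50.0) -> float:
    """Time to stream `output_tokens` at a steady decode rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


for n in (100, 500, 2000):
    # 100 tokens -> 2 s, 500 tokens -> 10 s, 2000 tokens -> 40 s
    print(f"{n} tokens: about {generation_seconds(n):.0f} s")
```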

Qwen 2.5 1.5B

slow medium fast

qwen2.5:1.5b

Compact model for testing and chat title generation. Responds quickly even on CPU.

Hardware: Swiss sovereign infrastructure · Lugano (CH)
Status: online

cURL example

curl https://api.siati.ai/v1/chat/completions \
  -H "Authorization: Bearer $SIATI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5:1.5b",
    "messages": [{"role": "user", "content": "Ciao!"}]
  }'

BGE-M3 (Embeddings)

bge-m3

Multilingual 1024-dimensional embeddings for RAG. Apache 2.0 license. Served via Text-Embeddings-Inference.

Hardware: Swiss sovereign infrastructure · Lugano (CH)
Status: online

cURL example

# Assumption: an OpenAI-style /v1/embeddings route with an "input" field; this page does not confirm the path.
curl https://api.siati.ai/v1/embeddings \
  -H "Authorization: Bearer $SIATI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-m3",
    "input": "Ciao!"
  }'
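
Once query and document embeddings are available, a common way to use them for RAG is to rank documents by cosine similarity to the query. The sketch below illustrates that technique with plain Python lists; it is not a description of how this service performs retrieval, and the helper names are illustrative. Real bge-m3 vectors would have 1024 components, while the usage example uses 2 for readability.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for zero vectors
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float], doc_vecs: List[Sequence[float]], k: int = 5) -> List[int]:
    """Indices of the k documents most similar to the query, best first."""
    order = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return order[:k]
```

For example, `top_k([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]], k=2)` ranks the second document first and the third document second.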

BGE Reranker v2 m3

bge-reranker-v2-m3

Cross-encoder for the second stage of RAG retrieval. +15–25% recall@5. Apache 2.0 license.

Hardware: Swiss sovereign infrastructure · Lugano (CH)
Status: online

cURL example

curl https://api.siati.ai/v1/chat/completions \
  -H "Authorization: Bearer $SIATI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-reranker-v2-m3",
    "messages": [{"role": "user", "content": "Ciao!"}]
  }'
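
The role of a second-stage reranker and the recall@5 metric mentioned above can be illustrated with a small sketch. A first stage returns candidate passages, a scorer re-orders them, and recall@k measures what fraction of the relevant items appear in the top k. The `word_overlap` scorer is a hypothetical stand-in so the code runs offline; in practice the score would come from the cross-encoder, whose request format is not documented on this page.

```python
from typing import Callable, List, Sequence, Set


def recall_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int = 5) -> float:
    """Fraction of relevant items that appear among the first k ranked items."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & relevant_ids)
    return hits / len(relevant_ids)


def rerank(
    query: str,
    candidates: List[str],
    score: Callable[[str, str], float],
    top_n: int = 5,
) -> List[str]:
    """Re-order first-stage candidates by a (query, document) relevance score."""
    return sorted(candidates, key=lambda doc: score(query, doc), reverse=True)[:top_n]


def word_overlap(query: str, doc: str) -> float:
    """Hypothetical stand-in scorer: share of query words found in the document."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0
```

Comparing `recall_at_k` on the first-stage order with the value after `rerank` is one way to measure the gain a reranker brings on your own data.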