Models#

Curated catalog of open-weight models served on hardware we own in Switzerland. No fine-tuning on your prompts, no hop outside the country.

Live catalog#

| Model ID | Tier | Hardware | Context | Use case |
|---|---|---|---|---|
| siati/llama-3.1-405b | fast | NVIDIA Blackwell B6000-Pro × 4 (TP=4, INT4) | 128K | Frontier-class reasoning, drop-in for GPT-4o |
| siati/mistral-small-24b | fast | NVIDIA RTX 5090 (FP8 dynamic) | 32K | Chat, summarization, coding (multilingual IT/EN/FR/DE) |
| siati/bge-m3 | embeddings | NVIDIA L4 24GB | 8K | RAG, semantic search, multilingual |
| qwen2.5:7b-instruct-q4_K_M | slow | Apple Silicon (M-series, Metal) × 2 | 32K | Batch jobs, lightweight chat, low energy |

Coming soon#

  • siati/qwen-72b (medium tier) — production-grade reasoning
  • siati/qwen-32b-coder (medium tier) — code generation specialist
  • siati/xtts-v2 (tts tier) — multilingual text-to-speech
  • Whisper-class STT for audio transcription (medical visit transcripts, legal hearings, podcast captioning)

Need a model not in the catalog? Tell us — we curate based on actual customer demand.

Tiers explained#

| Tier | PAYG pricing | Free quota | Pro (19 CHF) | Max (49 CHF) |
|---|---|---|---|---|
| embeddings | 0.04 CHF / 1M | | | |
| slow | 0.40 CHF / 1M | 100K/day | 2M/day | 6M/day |
| medium | 1.50 CHF / 1M | | 2M/day | 6M/day |
| fast | 4.00 CHF / 1M | | PAYG | PAYG |

The slow, medium, and embeddings tiers are included in subscriptions, subject to a daily cap. The fast tier (including the 405B model) always requires PAYG credits.
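
As a rough illustration of the PAYG arithmetic, the sketch below estimates a cost from a token count. It assumes that "1M" in the pricing column means one million tokens and that billing is linear with no rounding or minimum charge. The function name `estimate_payg_cost` is hypothetical and not part of any SDK.

```python
from decimal import Decimal

# PAYG prices in CHF per 1M units, copied from the tier table.
# ASSUMPTION: "1M" means one million tokens.
PAYG_CHF_PER_MILLION = {
    "embeddings": Decimal("0.04"),
    "slow": Decimal("0.40"),
    "medium": Decimal("1.50"),
    "fast": Decimal("4.00"),
}


def estimate_payg_cost(tier: str, tokens: int) -> Decimal:
    """Return the estimated PAYG cost in CHF for `tokens` tokens on `tier`."""
    if tokens < 0:
        raise ValueError("tokens must be non-negative")
    try:
        price = PAYG_CHF_PER_MILLION[tier]
    except KeyError:
        raise ValueError(f"unknown tier: {tier!r}") from None
    # Linear pricing: (tokens / 1,000,000) * price per million.
    return price * Decimal(tokens) / Decimal(1_000_000)


# Example: 2.5M tokens on the fast tier at 4.00 CHF / 1M.
print(estimate_payg_cost("fast", 2_500_000))
```

Using `Decimal` rather than floats keeps the currency arithmetic exact, which matters when many small requests are summed.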

Programmatic listing#

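from openai import OpenAI

# Point the OpenAI SDK at the API endpoint; replace the key with your own.
client = OpenAI(base_url="https://api.siati.ai/v1", api_key="siati_...")

# List the models visible to this key and print each ID with its owner.
models = client.models.list()
for m in models.data:
    print(m.id, m.owned_by)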

The call returns only the models your key can access, filtered by your plan tier and PAYG balance.
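
Because the list is filtered per key, client code may want to fall back to another model when a preferred one is not available. The following is a minimal sketch, assuming you have already collected the model IDs from `client.models.list()`. The helper `pick_model` and the preference order are illustrative only and not part of the API.

```python
from typing import Iterable, Optional, Sequence


def pick_model(available_ids: Iterable[str], preferences: Sequence[str]) -> Optional[str]:
    """Return the first preferred model ID that the key can access, or None."""
    available = set(available_ids)
    for model_id in preferences:
        if model_id in available:
            return model_id
    return None


# Example preference order using IDs from the live catalog (hypothetical ranking).
PREFERENCES = [
    "siati/llama-3.1-405b",
    "siati/mistral-small-24b",
    "qwen2.5:7b-instruct-q4_K_M",
]

# In real use: available = [m.id for m in client.models.list().data]
available = ["siati/bge-m3", "qwen2.5:7b-instruct-q4_K_M"]
chosen = pick_model(available, PREFERENCES)
print(chosen)
```

Returning `None` instead of raising lets the caller decide whether a missing model is an error or a cue to prompt the user to top up PAYG credits.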

Public catalog endpoint#

https://api.siati.ai/api/v1/public/models/cards returns JSON with display metadata (blurb, hardware, status). We use it internally for the homepage and this page.

curl https://api.siati.ai/api/v1/public/models/cards | jq
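
The same endpoint can be read from Python with the standard library. The response schema is not documented here, so the sketch below makes a loud assumption: that the body is a JSON array of objects, each with an `id` and a `status` key. Check the actual payload before relying on it. The function names are hypothetical.

```python
import json
import urllib.request
from collections import defaultdict

CARDS_URL = "https://api.siati.ai/api/v1/public/models/cards"


def fetch_cards(url: str = CARDS_URL, timeout: float = 10.0):
    """Download and decode the public catalog JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def group_by_status(cards):
    """Group card IDs by their status field.

    ASSUMPTION: `cards` is a list of dicts with "id" and "status" keys.
    Non-dict entries are skipped; missing keys get placeholder values.
    """
    groups = defaultdict(list)
    for card in cards:
        if not isinstance(card, dict):
            continue
        status = card.get("status", "unknown")
        groups[status].append(card.get("id", "<no id>"))
    return dict(groups)


# Usage (performs a network request):
#   print(group_by_status(fetch_cards()))
```

Keeping the download and the grouping in separate functions means the grouping logic can be checked against a saved payload without touching the network.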