Skip to content

HTTP API

Two endpoints do the real work; everything else is convenience. If your client speaks OpenAI or Anthropic, it already speaks OdyssAI-X.

OdyssAI-X exposes a public inference surface (/v1/*, always open) and an admin surface (/admin/*, open on a trusted LAN, token-gated when you choose). Telemak serves the same /v1/* surface from a single Mac.

POST /v1/chat/completions OpenAI dialect
POST /v1/messages Anthropic dialect (+ /v1/messages/count_tokens)

Every engine in the stack speaks both. Clients use OpenAI by default and Anthropic for Claude-style flows (Claude Code, the Anthropic SDK). The schemas are the standard ones — point an existing SDK at the base URL and it works:

Terminal window
export OPENAI_BASE_URL="http://<server>:8000/v1"
export OPENAI_API_KEY="dummy" # no key required on a LAN
GET /v1/models currently servable models
GET /v1/models?include_unloaded=true + on-disk inventory (x_odyssai.ready=false)

By default only servable models are listed: loaded local pools, enrolled Telemak clusters, published cloud aliases, and CoeOS when enabled. Each entry carries an x_odyssai block with the real backend, the concrete model behind an alias, and per-model capabilities.

On top of the standard schemas, OdyssAI-X reads three optional fields:

FieldTypeEffect
session_idstringScopes the KV prefix cache to a conversation. Reuse the same id across turns and the shared prefix is prefilled once — big TTFT win. Clients attach one per chat.
enable_thinkingboolToggles the reasoning block on reasoner models (default true on those that ship it).
reasoning_effortminimal / low / medium / highBudget for the reasoning block. minimal roughly halves completion tokens on always-think models.

The usage block in OpenAI responses adds prompt_tokens_details.cached_tokens — the prefix-cache hit count. Clients surface it (Companion shows Cached: N tok (XX%)).

The model field accepts several kinds of id, all resolved by the orchestrator:

model valueRoutes to
a cluster name (e.g. default, argo)the loaded pool on that cluster
a Telemak cluster id (e.g. telecode)the enrolled Telemak, over HTTP proxy
a cloud alias (e.g. or:claude-haiku)the configured cloud provider
CoeOSthe benchmark-composed router → the best model per skill

The caller doesn’t need to know the topology — it picks a published id, the orchestrator hides the rest.

/admin/* controls the cluster. A few you’ll actually use:

Terminal window
GET /admin/clusters # registered clusters + status
POST /admin/<cluster>/load # load a model {"model": "...", "sharding"?: "pipeline"}
POST /admin/<cluster>/unload # free the pool
GET /admin/settings PUT /admin/settings # server-wide defaults
POST /admin/sync/rsync # push a model from one node to others
GET /health # {"status":"idle|busy", "version": "..."}

/admin/* is open by default (trusted-LAN, single operator). Set ODYSSAI_X_ADMIN_TOKEN to require Authorization: Bearer <token> if you expose the engine beyond your LAN; /v1/* stays public regardless.

Cloud providers are first-class. Add one in the dashboard (Settings → Cloud providers → paste an OpenRouter / Anthropic / OpenAI key) and aliases like or:claude-haiku appear in /v1/models instantly, callable through the same /v1/chat/completions. There is no LiteLLM to install — it exists only as a legacy fallback rail.