HTTP API

Two endpoints do the real work; everything else is convenience. If your client speaks OpenAI or Anthropic, it already speaks OdyssAI-X.

OdyssAI-X exposes a public inference surface (/v1/*, always open) and an admin surface (/admin/*, open on a trusted LAN, token-gated when you choose). Telemak serves the same /v1/* surface from a single Mac.

The two endpoints that matter

POST /v1/chat/completions     OpenAI dialect
POST /v1/messages             Anthropic dialect (+ /v1/messages/count_tokens)

Every engine in the stack speaks both. Clients use OpenAI by default and Anthropic for Claude-style flows (Claude Code, the Anthropic SDK). The schemas are the standard ones — point an existing SDK at the base URL and it works:

export OPENAI_BASE_URL="http://<server>:8000/v1"
export OPENAI_API_KEY="dummy"          # no key required on a LAN

Listing models

GET /v1/models                     currently servable models
GET /v1/models?include_unloaded=true   + on-disk inventory (x_odyssai.ready=false)

By default only servable models are listed: loaded local pools, enrolled Telemak clusters, published cloud aliases, and CoeOS when enabled. Each entry carries an x_odyssai block with the real backend, the concrete model behind an alias, and per-model capabilities.

Three extensions

On top of the standard schemas, OdyssAI-X reads three optional fields:

Field	Type	Effect
`session_id`	string	Scopes the KV prefix cache to a conversation. Reuse the same id across turns and the shared prefix is prefilled once — big TTFT win. Clients attach one per chat.
`enable_thinking`	bool	Toggles the reasoning block on reasoner models (default `true` on those that ship it).
`reasoning_effort`	`minimal` / `low` / `medium` / `high`	Budget for the reasoning block. `minimal` roughly halves completion tokens on always-think models.

The usage block in OpenAI responses adds prompt_tokens_details.cached_tokens — the prefix-cache hit count. Clients surface it (Companion shows Cached: N tok (XX%)).

Aliases and routing

The model field accepts several kinds of id, all resolved by the orchestrator:

`model` value	Routes to
a cluster name (e.g. `default`, `argo`)	the loaded pool on that cluster
a Telemak cluster id (e.g. `telecode`)	the enrolled Telemak, over HTTP proxy
a cloud alias (e.g. `or:claude-haiku`)	the configured cloud provider
`CoeOS`	the benchmark-composed router → the best model per skill

The caller doesn’t need to know the topology — it picks a published id, the orchestrator hides the rest.

Admin endpoints

/admin/* controls the cluster. A few you’ll actually use:

GET  /admin/clusters                       # registered clusters + status
POST /admin/<cluster>/load                 # load a model  {"model": "...", "sharding"?: "pipeline"}
POST /admin/<cluster>/unload               # free the pool
GET  /admin/settings    PUT /admin/settings    # server-wide defaults
POST /admin/sync/rsync                     # push a model from one node to others
GET  /health                               # {"status":"idle|busy", "version": "..."}

/admin/* is open by default (trusted-LAN, single operator). Set ODYSSAI_X_ADMIN_TOKEN to require Authorization: Bearer <token> if you expose the engine beyond your LAN; /v1/* stays public regardless.

Cloud passthrough

Cloud providers are first-class. Add one in the dashboard (Settings → Cloud providers → paste an OpenRouter / Anthropic / OpenAI key) and aliases like or:claude-haiku appear in /v1/models instantly, callable through the same /v1/chat/completions. There is no LiteLLM to install — it exists only as a legacy fallback rail.