Glossary

One place for the words. If a term in the guide is unclear, it’s defined here.

The stack

OdyssAI — the whole stack: an engine layer (OdyssAI-X, Telemak), an experience layer (Companion), on Apple Silicon, local-first.

OdyssAI-X (codename Odysseus) — the distributed MLX engine + orchestrator. Routes inference across 1–5 Macs; OpenAI- and Anthropic-compatible. Runs on Docker, holds no weights itself.

Telemak — the mono-Mac runtime. A native Swift .app in the menu bar serving the same APIs from one machine. No Docker, no Python.

Companion — the chat client. Conversations, projects, memory, skills, MCP. Ships no model of its own.

Némo — the agent persona inside Companion. Not a model — the orchestrator, memory, and voice above whatever model you’ve routed.

Engine & cluster

Orchestrator / Serveur — the OdyssAI-X process (a Mac mini) that routes requests to engine nodes. Never runs inference itself.

Engine node — a Mac Studio running the MLX runner, holding (a shard of) a model.

Backend — how cluster nodes exchange tensors: ring (TCP, safe default) or jaccl (RDMA over Thunderbolt 5, faster, with a queue-pair quirk).

Sharding — how a model is split: tensor parallel (slice every layer; needs KV-heads divisible by node count) or pipeline parallel (split layers; required for big MoEs).

Capability contract — the /.well-known/inference-engine.json document a client reads to learn an engine’s capabilities.

Cloud alias — a published cloud model, e.g. or:claude-haiku, callable through the same API.

CoeOS

CoeOS — a benchmark-composed virtual model. Routes each request to the model proven best at its skill. For agents.

Skill axis — a benchmarked category CoeOS routes on (python, legal_rgpd, creative, …).

Decider — the small model that classifies a request into an axis.

TMB Settings — the curated file mapping each axis to its best model, derived from the TMB benchmarks.

Companion surfaces

Gateway / hybrid / legacy — the three rails Companion uses to reach a model; gateway (direct to the engine) is the default, legacy (LiteLLM) a fallback.

Agent mode — a per-conversation toggle; on = the model gets the tool catalog (fs, RAG, web, MCP).

Memory — LightRAG knowledge graphs (user / project / company tiers) retrieved per turn. Distinct from a document corpus (bulk RAG over files).

OdyRAG — the dashboard that builds and manages the LightRAG memory graphs.

Skill — a markdown instruction package (agentskills.io spec) the agent loads on demand.

MCP server — a remote Model Context Protocol server whose tools merge into the agent’s toolbox.

Agents token — an hms_… token that lets an external agent call into Companion’s memory and tools.

Slash command — /help, /comfyui, /hermes, /pi, /exit — go beyond plain chat from the composer.

Inference fields

session_id — scopes the KV prefix cache to a conversation (cross-turn speed).

enable_thinking — toggles the reasoning block on reasoner models.

reasoning_effort — minimal → high, the reasoning budget.

TTFT — time to first token. Cached — prefix-cache hits.