Glossary
One place for the words. If a term in the guide is unclear, it’s defined here.
The stack
Section titled “The stack”OdyssAI — the whole stack: an engine layer (OdyssAI-X, Telemak), an experience layer (Companion), on Apple Silicon, local-first.
OdyssAI-X (codename Odysseus) — the distributed MLX engine + orchestrator. Routes inference across 1–5 Macs; OpenAI- and Anthropic-compatible. Runs on Docker, holds no weights itself.
Telemak — the mono-Mac runtime. A native Swift .app in the menu bar serving
the same APIs from one machine. No Docker, no Python.
Companion — the chat client. Conversations, projects, memory, skills, MCP. Ships no model of its own.
Némo — the agent persona inside Companion. Not a model — the orchestrator, memory, and voice above whatever model you’ve routed.
Engine & cluster
Section titled “Engine & cluster”Orchestrator / Serveur — the OdyssAI-X process (a Mac mini) that routes requests to engine nodes. Never runs inference itself.
Engine node — a Mac Studio running the MLX runner, holding (a shard of) a model.
Backend — how cluster nodes exchange tensors: ring (TCP, safe default)
or jaccl (RDMA over Thunderbolt 5, faster, with a queue-pair quirk).
Sharding — how a model is split: tensor parallel (slice every layer; needs KV-heads divisible by node count) or pipeline parallel (split layers; required for big MoEs).
Capability contract — the /.well-known/inference-engine.json document a
client reads to learn an engine’s capabilities.
Cloud alias — a published cloud model, e.g. or:claude-haiku, callable
through the same API.
CoeOS — a benchmark-composed virtual model. Routes each request to the model proven best at its skill. For agents.
Skill axis — a benchmarked category CoeOS routes on (python, legal_rgpd,
creative, …).
Decider — the small model that classifies a request into an axis.
TMB Settings — the curated file mapping each axis to its best model, derived from the TMB benchmarks.
Companion surfaces
Section titled “Companion surfaces”Gateway / hybrid / legacy — the three rails Companion uses to reach a model; gateway (direct to the engine) is the default, legacy (LiteLLM) a fallback.
Agent mode — a per-conversation toggle; on = the model gets the tool catalog (fs, RAG, web, MCP).
Memory — LightRAG knowledge graphs (user / project / company tiers) retrieved per turn. Distinct from a document corpus (bulk RAG over files).
OdyRAG — the dashboard that builds and manages the LightRAG memory graphs.
Skill — a markdown instruction package (agentskills.io spec) the agent loads on demand.
MCP server — a remote Model Context Protocol server whose tools merge into the agent’s toolbox.
Agents token — an hms_… token that lets an external agent call into
Companion’s memory and tools.
Slash command — /help, /comfyui, /hermes, /pi, /exit — go beyond
plain chat from the composer.
Inference fields
Section titled “Inference fields”session_id — scopes the KV prefix cache to a conversation (cross-turn speed).
enable_thinking — toggles the reasoning block on reasoner models.
reasoning_effort — minimal → high, the reasoning budget.
TTFT — time to first token. Cached — prefix-cache hits.
Read next
Section titled “Read next”- Welcome — the stack in one screen.
- Getting started — pick your path.