Polish, no sovereignty.
Quality, modern agents, the user experience of a real AI OS. But your data leaves for the vendor, you're locked to their catalog, and every request is one more GDPR liability.
European · open-source · local-first
A complete European AI ecosystem: distributed inference, orchestration, and an enterprise AI OS on top. Local when it matters. Cloud when you choose. Your models, your data, your hardware.
/01 · Problem
Quality, modern agents, the user experience of a real AI OS. But your data leaves for the vendor, you're locked to their catalog, and every request is one more GDPR liability.
Your hardware, your weights. But too often a chat box and nothing else: no real memory, no agents, models capped by a single machine's RAM. A trip back to 2023.
Between data leakage and local tinkering, a third way was missing.
/02 · Solution
Frontier-grade AI experience, on infrastructure fully under control. Sovereignty and capability in one system. Local when it matters. Cloud when you choose.
Inference and memory live on hardware you own. Prompts, files, and the knowledge base never leave the LAN unless you say so.
An Apple Silicon cluster runs the largest open-weight models — MiniMax M3, Qwen 3.5 397B, GLM-5.2, … — in their best quantization.
Three-level knowledge base, semantic routing, local agents, sessions that survive. An interface you can live in.
/03 · Capacity
OdyssAI-X distributes inference across 1 to 6 Apple Silicon nodes over Thunderbolt 5 RDMA. One Mac Studio runs a serious model. Six run what the cloud runs.
94.4%
Qwen 3.5 397B BF16 on a 4-node OdyssAI-X cluster, evaluated against Claude Opus as the frontier ceiling. Within one point of the cloud frontier.
Built directly on Apple MLX and mlx-distributed — the frameworks Apple ships for Apple Silicon. Tensor and pipeline parallelism, RDMA over Thunderbolt 5, OpenAI- and Anthropic-compatible surfaces.
Because BF16 is what the model was trained at, and Q4 is a convenience, not a truth. Quantization is not lossless.
Because a 35B model on a single laptop is not a frontier experience — it is a polite invitation to get one.
GLM 5.2 Cloud (480/500, 19 June 2026) beats Opus 4.7 on the T01 benchmark. See the general scoreboard →
/04 · Stack
One engine that runs the cluster. One AI OS that holds the memory. Both built around standard wire protocols so you can plug your own clients in.
Distributed inference and orchestration on Apple Silicon. Up to six nodes, the largest open-weight models in their best quantization. Built on MLX + mlx-distributed, RDMA over Thunderbolt 5.
For operators and teams who want frontier-class compute without giving up standard clients.
The enterprise cognitive layer: conversations, projects, files, agents, and a three-level knowledge base — individual, team, organization. Memory is not a feature. It is a foundation.
Conversations remember. Teams share ground. The company keeps its institutional context — without giving any of it away.
Also For a single Mac, Telemak provides the same engine as a single-machine runtime. A native Swift binary, 1.5× to 2× faster than Python runtimes. Telemak docs →
Also A routing layer, CoeOS, exposes the whole fleet as one OpenAI-compatible model. It classifies each request by skill and serves the model our benchmarks proved best at it, local or cloud. Built for agents and orchestrators, not chat. Every response names the model it used. CoeOS docs →
/05 · Experience
You shouldn't have to choose between privacy and polish, between local models and modern agents.
Three structured levels — personal, team, company — queried in parallel by a dedicated RAG engine. About 4,500 selected tokens, not a 12,000-token wall.
— organisationalOpen-weight local models when sovereignty matters. Cloud models when you decide they're the right tool. Routing is automatic; the catalog is yours.
— optionalLong-running work, streaming, cancellation, recovery. State designed for real use, not the demo path.
— durableAgentic workflows that are visible, permissioned, interruptible, logged. No magic fog. No fake autonomy.
— visible/06 · Internals
The honest version. Open a block if you want to know how. Skip if you just want to know it works.
Most AI memory injects everything in bulk, without structure. Companion uses a graph-based retrieval
engine — LightRAG with Qwen3 embeddings. At every turn, all three levels
(personal, team, organization) are queried in parallel and selected, not concatenated.
About 4,500 relevant tokens instead of an undifferentiated 12,000-token wall. The model receives only what is relevant to this question, from this person, in this team.
On top: an append-only Decision Log. What was decided, why, when to revisit. Injected silently into context. The AI already knows what was decided before you ask.
No model picker. Every message passes through a small embedding model — about six milliseconds — and lands in its bucket: conversation, analysis, code. The right model takes the turn.
A simple question no longer wastes a 397 B reasoner. A hard problem no longer falls on a 35 B conversational. A code snippet no longer gets a poet.
Cost routers exist — round-robin, cheapest-token, latency-aware. They route on the bill, not on the intent. Semantic routing requires several capable models in the same hand, a fast embedder running locally, and the editorial courage to remove the picker. Three things the cloud is not selling.
The chat reasons; the agents execute. A slash command opens a terminal inside the conversation: read files, write files, run shell. A small local daemon bridges the agent runtime to Companion. The cloud never sees your file system.
/hermes — the terminal. /pi — the analyst. /omnigent — the
autonomous coder. Each agent is a slash command. Each runs on the user's machine, not in someone
else's cloud. Need to generate an image? Just type /comfyui to call
OdyssAI-imager.
Tools stop being separate apps with separate windows. They become commands you call from the place you were already thinking.
/07 · Europe
Data residency is not a setting. There is no cloud to reside in.
Built in Europe, for European regulatory reality. Fully compatible with Mistral models, alongside the best open-weight models worldwide. A European frontier model, on European hardware, end to end.
/08 · Status
The only question an enterprise truly asks: will it still be running on Monday? Yes.
The index of a full corpus into OdyRAG — four identical models running in parallel, four inferences every fifteen seconds.
/09 · Contact
Cluster deployments, partnerships, integrations, press, or simple technical curiosity. Answered from Europe, in French, English, or Spanish.