European · open-source · local-first

The best open-weight models. No compromise.

A complete European AI ecosystem: distributed inference, orchestration, and an enterprise AI OS on top. Local when it matters. Cloud when you choose. Your models, your data, your hardware.

/01 · Problem

Adopting AI in the enterprise means giving something up.

Option A · Proprietary cloud

Polish, no sovereignty.

Quality, modern agents, the user experience of a real AI OS. But your data leaves for the vendor, you're locked to their catalog, and every request is one more GDPR liability.

Option B · Local AI as it stands

Sovereignty, no polish.

Your hardware, your weights. But too often a chat box and nothing else: no real memory, no agents, models capped by a single machine's RAM. A trip back to 2023.

Between data leakage and local tinkering, a third way was missing.

/02 · Solution

OdyssAI removes the tradeoff.

Frontier-grade AI experience, on infrastructure fully under control. Sovereignty and capability in one system. Local when it matters. Cloud when you choose.

Pillar 01

Sovereign by design.

Inference and memory live on hardware you own. Prompts, files, and the knowledge base never leave the LAN unless you say so.

Pillar 02

Frontier-grade, local.

An Apple Silicon cluster runs the largest open-weight models — MiniMax M3, Qwen 3.5 397B, GLM-5.2, … — in their best quantization.

Pillar 03

A modern AI OS.

Three-level knowledge base, semantic routing, local agents, sessions that survive. An interface you can live in.

/03 · Capacity

Local AI is not capped at 35B on 36 GB.

OdyssAI-X distributes inference across 1 to 6 Apple Silicon nodes over Thunderbolt 5 RDMA. One Mac Studio runs a serious model. Six run what the cloud runs.

94.4%

Qwen 3.5 397B BF16 on a 4-node OdyssAI-X cluster, evaluated against Claude Opus as the frontier ceiling. Within one point of the cloud frontier.

Built directly on Apple MLX and mlx-distributed — the frameworks Apple ships for Apple Silicon. Tensor and pipeline parallelism, RDMA over Thunderbolt 5, OpenAI- and Anthropic-compatible surfaces.

Because BF16 is what the model was trained at, and Q4 is a convenience, not a truth. Quantization is not lossless.

Because a 35B model on a single laptop is not a frontier experience — it is a polite invitation to get one.

GLM 5.2 Cloud (480/500, 19 June 2026) beats Opus 4.7 on the T01 benchmark. See the general scoreboard →

/04 · Stack

Two products. One sovereign stack.

One engine that runs the cluster. One AI OS that holds the memory. Both built around standard wire protocols so you can plug your own clients in.

04.a · Cluster engine
OdyssAI-X
carries the weight.

Distributed inference and orchestration on Apple Silicon. Up to six nodes, the largest open-weight models in their best quantization. Built on MLX + mlx-distributed, RDMA over Thunderbolt 5.

For operators and teams who want frontier-class compute without giving up standard clients.

Surface OpenAI · Anthropic
Ground Apple Silicon cluster
04.b · Enterprise AI OS
Companion
gives the experience.

The enterprise cognitive layer: conversations, projects, files, agents, and a three-level knowledge base — individual, team, organization. Memory is not a feature. It is a foundation.

Conversations remember. Teams share ground. The company keeps its institutional context — without giving any of it away.

Surface Web · MCP · IDE
Holds Knowledge · Teams · Decisions

Also For a single Mac, Telemak provides the same engine as a single-machine runtime. A native Swift binary, 1.5× to 2× faster than Python runtimes. Telemak docs →

Also A routing layer, CoeOS, exposes the whole fleet as one OpenAI-compatible model. It classifies each request by skill and serves the model our benchmarks proved best at it, local or cloud. Built for agents and orchestrators, not chat. Every response names the model it used. CoeOS docs →

/05 · Experience

Local AI should not feel like a downgrade.

You shouldn't have to choose between privacy and polish, between local models and modern agents.

→ 01

Knowledge, not just memory.

Three structured levels — personal, team, company — queried in parallel by a dedicated RAG engine. About 4,500 selected tokens, not a 12,000-token wall.

— organisational
→ 02

Models you can choose.

Open-weight local models when sovereignty matters. Cloud models when you decide they're the right tool. Routing is automatic; the catalog is yours.

— optional
→ 03

Sessions that survive.

Long-running work, streaming, cancellation, recovery. State designed for real use, not the demo path.

— durable
→ 04

Tools without theatre.

Agentic workflows that are visible, permissioned, interruptible, logged. No magic fog. No fake autonomy.

— visible

/06 · Internals

Three mechanisms that make the difference.

The honest version. Open a block if you want to know how. Skip if you just want to know it works.

I.01 The knowledge layer.

Most AI memory injects everything in bulk, without structure. Companion uses a graph-based retrieval engine — LightRAG with Qwen3 embeddings. At every turn, all three levels (personal, team, organization) are queried in parallel and selected, not concatenated.

About 4,500 relevant tokens instead of an undifferentiated 12,000-token wall. The model receives only what is relevant to this question, from this person, in this team.

On top: an append-only Decision Log. What was decided, why, when to revisit. Injected silently into context. The AI already knows what was decided before you ask.

I.02 Semantic routing.

No model picker. Every message passes through a small embedding model — about six milliseconds — and lands in its bucket: conversation, analysis, code. The right model takes the turn.

A simple question no longer wastes a 397 B reasoner. A hard problem no longer falls on a 35 B conversational. A code snippet no longer gets a poet.

Cost routers exist — round-robin, cheapest-token, latency-aware. They route on the bill, not on the intent. Semantic routing requires several capable models in the same hand, a fast embedder running locally, and the editorial courage to remove the picker. Three things the cloud is not selling.

I.03 Local agents.

The chat reasons; the agents execute. A slash command opens a terminal inside the conversation: read files, write files, run shell. A small local daemon bridges the agent runtime to Companion. The cloud never sees your file system.

/hermes — the terminal. /pi — the analyst. /omnigent — the autonomous coder. Each agent is a slash command. Each runs on the user's machine, not in someone else's cloud. Need to generate an image? Just type /comfyui to call OdyssAI-imager.

Tools stop being separate apps with separate windows. They become commands you call from the place you were already thinking.

/07 · Europe

Sovereign infrastructure for a sovereign continent.

Data residency is not a setting. There is no cloud to reside in.

Built in Europe, for European regulatory reality. Fully compatible with Mistral models, alongside the best open-weight models worldwide. A European frontier model, on European hardware, end to end.

/08 · Status

144,000 documents, back to back. Not a single interruption.

The only question an enterprise truly asks: will it still be running on Monday? Yes.

The index of a full corpus into OdyRAG — four identical models running in parallel, four inferences every fifteen seconds.

OdyRAG

Under the hood

OdyRAG — our RAG. An improved LightRAG with an easy-to-use web UI.

OdyRAG docs →

/09 · Contact

Let's talk.

Cluster deployments, partnerships, integrations, press, or simple technical curiosity. Answered from Europe, in French, English, or Spanish.

Send a message