Memory

The model forgets the moment the context window scrolls. Némo doesn’t — it retrieves what matters and slips it in beside your question.

Companion gives the assistant a memory that survives conversations. It is not the model’s context window, and it is not a raw transcript dump. Memory is a LightRAG knowledge layer: living graphs that Companion retrieves from semantically, per turn, and injects as context. The engine sees opaque tokens; the source of truth stays in Companion.

Three tiers

Memory has three scopes, queried in parallel on each turn:

Tier	What it holds	Shared with
User	What you’ve told the assistant about you, your work, your preferences — across every chat.	just you
Project	Facts scoped to one project, kept out of the others.	the project
Company	A single shared graph everyone in the org reads — standards, vocabulary, common context.	everyone (opt-in)

The user and project tiers live in the nemo-memory LightRAG that ships with the Serveur. The company tier is a separate dedicated LightRAG you point Companion at (one shared graph, read by all). Leave its URL empty and the company tier is simply off.

How retrieval works

On each turn Companion semantically queries the tiers and assembles a small RAG block — only what’s relevant to this question. That block rides with your question, at the end of the sequence, not in the system prompt.

Why at the end: a per-turn block placed in the system prompt would invalidate the KV prefix of the whole conversation on every turn (measured: 8% cache hit, 113 s TTFT on a 6.9k-token history). Riding with the question keeps the stable prefix cached and the retrieval fresh.

Two modes

A per-user setting controls how memory is fetched:

Mode	What it does
Advanced (default)	Full LightRAG retrieval across the tiers — the per-turn RAG block above.
Basic	Only the stable, hand-kept wiki/vault — no semantic retrieval, even when the service is up. Cheaper, fully predictable.

Toggle it in Settings → Memory.

The toggles

Per conversation and per project:

Global wiki (conversation) — whether your stable user memory is injected at all.
Project wiki (project) — whether the project’s dedicated memory is active for chats in it.
Read-only — a sub-toggle that lets a conversation read your global memory without the agent appending to it.

Off means nothing injected — absence is the signal.

Curating it

Let the agent append — the assistant writes to memory on its own via companion_remember when you tell it something worth keeping.
Edit by hand — open the memory view to add, correct, or delete. You own it; nothing persists unless you keep it.

What gets injected, and where you see it

When memory is on, Companion shows exactly what it pulled in this turn — the stable wiki block and the per-turn retrieval — so “what leaves” is never a mystery. Empty tiers inject nothing.

Memory is not the document corpus. Memory (LightRAG) is the small, living layer — “what do we already know about you / this project / the company”. A large document corpus (RAG over your files) is a separate organ for “find me the passage”. They answer different questions.

A benchmarking note

For reproducible model benchmarks, turn memory off — an injected block changes the prompt and the token count, which would skew the comparison. Memory is for working with the assistant, not for measuring a model.