Memory
The model forgets the moment the context window scrolls. Némo doesn’t — it retrieves what matters and slips it in beside your question.
Companion gives the assistant a memory that survives conversations. It is not the model’s context window, and it is not a raw transcript dump. Memory is a LightRAG knowledge layer: living graphs that Companion retrieves from semantically, per turn, and injects as context. The engine sees opaque tokens; the source of truth stays in Companion.
Three tiers
Memory has three scopes, queried in parallel on each turn:
| Tier | What it holds | Shared with |
|---|---|---|
| User | What you’ve told the assistant about you, your work, your preferences — across every chat. | just you |
| Project | Facts scoped to one project, kept out of the others. | the project |
| Company | A single shared graph everyone in the org reads — standards, vocabulary, common context. | everyone (opt-in) |
The user and project tiers live in the `nemo-memory` LightRAG that ships with the Serveur. The company tier is a separate, dedicated LightRAG that you point Companion at (one shared graph, read by all). Leave its URL empty and the company tier is simply off.
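As a rough illustration of "queried in parallel" and "empty URL means off", here is a minimal sketch. It is not Companion's actual code: the `Tier` class, the `Retriever` signature, `query_tiers`, and the `http://nemo-memory.local` address are all hypothetical names made up for this example, and the network call is replaced by a stand-in.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass(frozen=True)
class Tier:
    """Hypothetical shape: a tier name plus the URL of the LightRAG behind it."""
    name: str            # "user", "project" or "company"
    url: Optional[str]   # empty or None means the tier is off


# Hypothetical retriever signature: (url, tier name, question) -> snippets.
Retriever = Callable[[str, str, str], Awaitable[list[str]]]


async def query_tiers(tiers: list[Tier], question: str,
                      retrieve: Retriever) -> dict[str, list[str]]:
    """Query every configured tier concurrently; skip tiers without a URL."""
    active = [t for t in tiers if t.url]
    results = await asyncio.gather(
        *(retrieve(t.url, t.name, question) for t in active)
    )
    # Tiers that returned nothing are left out entirely.
    return {t.name: hits for t, hits in zip(active, results) if hits}


async def fake_retrieve(url: str, tier: str, question: str) -> list[str]:
    # Stand-in for a real network call to a LightRAG service.
    await asyncio.sleep(0)
    return [f"{tier} fact relevant to: {question}"]


def demo() -> dict[str, list[str]]:
    tiers = [
        Tier("user", "http://nemo-memory.local"),     # made-up address
        Tier("project", "http://nemo-memory.local"),  # same service, other scope
        Tier("company", ""),                          # URL left empty: tier off
    ]
    return asyncio.run(query_tiers(tiers, "which linter do we use?", fake_retrieve))
```

Calling `demo()` returns entries for the user and project tiers only; the company tier never gets queried because its URL is empty.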
How retrieval works
On each turn Companion semantically queries the tiers and assembles a small RAG block containing only what’s relevant to this question. That block rides with your question, at the end of the sequence, not in the system prompt.
Why at the end: a per-turn block placed in the system prompt would invalidate the cached KV prefix of the whole conversation on every turn (measured: 8% cache hit rate and 113 s time to first token on a 6.9k-token history). Riding with the question keeps the stable prefix cached and the retrieval fresh.
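The prefix argument can be seen in a toy model. The sketch below treats a prompt as a list of segments (a coarse stand-in for tokens) and counts how many leading segments two consecutive turns share, which is what a prefix cache could reuse. The function names, the placeholder system prompt, and the sample messages are assumptions for illustration; it does not reproduce the measured figures above, and it assumes the history stores the bare question without its retrieval block.

```python
SYSTEM = "You are a helpful assistant."  # placeholder system prompt


def build_sequence(history: list[str], question: str,
                   rag_block: str, placement: str) -> list[str]:
    """Return the prompt as a list of segments (a coarse stand-in for tokens)."""
    if placement == "system":
        # The per-turn block is glued onto the system prompt.
        return [SYSTEM + "\n" + rag_block, *history, question]
    if placement == "end":
        # The per-turn block rides with the question at the end.
        return [SYSTEM, *history, rag_block + "\n" + question]
    raise ValueError(f"unknown placement: {placement}")


def shared_prefix(a: list[str], b: list[str]) -> int:
    """Number of leading segments two prompts have in common."""
    count = 0
    for left, right in zip(a, b):
        if left != right:
            break
        count += 1
    return count


def reusable_segments(placement: str) -> int:
    """Segments of turn 1 that turn 2 can reuse under a given placement."""
    history = ["user: hi", "assistant: hello",
               "user: plan my week", "assistant: done"]
    turn_1 = build_sequence(history, "user: what is due Friday?",
                            "[memory] deadline notes", placement)
    history_2 = history + ["user: what is due Friday?", "assistant: the report"]
    turn_2 = build_sequence(history_2, "user: who reviews it?",
                            "[memory] reviewer notes", placement)
    return shared_prefix(turn_1, turn_2)
```

With `"system"` placement the very first segment changes between turns, so nothing is reusable. With `"end"` placement the system prompt and all four earlier messages stay identical, so five segments are shared.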
Two modes
A per-user setting controls how memory is fetched:
| Mode | What it does |
|---|---|
| Advanced (default) | Full LightRAG retrieval across the tiers — the per-turn RAG block above. |
| Basic | Only the stable, hand-kept wiki/vault — no semantic retrieval, even when the service is up. Cheaper, fully predictable. |
Toggle it in Settings → Memory.
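One way to picture the difference between the two modes is a small dispatch function. This is a sketch under assumptions, not the product's implementation: `MemoryMode`, `build_memory_context`, and the `retrieve` callback are hypothetical names, and the sketch assumes that Advanced adds the per-turn retrieval on top of the stable wiki block, as the injection view described on this page suggests.

```python
from enum import Enum
from typing import Callable


class MemoryMode(Enum):
    ADVANCED = "advanced"  # stable wiki plus semantic retrieval (assumed)
    BASIC = "basic"        # stable wiki only


def build_memory_context(mode: MemoryMode, question: str, stable_wiki: str,
                         retrieve: Callable[[str], list[str]]) -> list[str]:
    """Assemble the memory blocks for one turn according to the user's mode."""
    blocks: list[str] = []
    if stable_wiki:
        blocks.append(stable_wiki)
    if mode is MemoryMode.ADVANCED:
        # Only Advanced touches the retrieval service.
        blocks.extend(retrieve(question))
    return blocks


def demo() -> tuple[list[str], list[str], list[str]]:
    calls: list[str] = []

    def fake_retrieve(q: str) -> list[str]:
        # Stand-in for a LightRAG query; records that it was called.
        calls.append(q)
        return ["retrieved: " + q]

    basic = build_memory_context(MemoryMode.BASIC, "q", "wiki", fake_retrieve)
    advanced = build_memory_context(MemoryMode.ADVANCED, "q", "wiki", fake_retrieve)
    return basic, advanced, calls
```

In the demo, the retrieval stand-in is called once, for the Advanced turn only. Basic never reaches it, which matches "no semantic retrieval, even when the service is up".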
The toggles
Per conversation and per project:
- Global wiki (conversation) — whether your stable user memory is injected at all.
- Project wiki (project) — whether the project’s dedicated memory is active for chats in it.
- Read-only — a sub-toggle that lets a conversation read your global memory without the agent appending to it.
Off means nothing injected — absence is the signal.
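The toggle rules can be sketched as two small functions. Everything named here (`MemoryToggles`, `blocks_to_inject`, `may_append_to_global`) is hypothetical, and one behavior is an assumption the page does not state: that the agent cannot append to global memory while the global wiki toggle is off.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MemoryToggles:
    """Hypothetical container for the three toggles described above."""
    global_wiki: bool   # conversation level: inject stable user memory?
    project_wiki: bool  # project level: is the project's memory active?
    read_only: bool     # sub-toggle: read global memory without appending


def blocks_to_inject(toggles: MemoryToggles, global_memory: str,
                     project_memory: Optional[str]) -> list[str]:
    """Return the memory blocks for this chat; off or empty yields nothing."""
    blocks: list[str] = []
    if toggles.global_wiki and global_memory:
        blocks.append(global_memory)
    if toggles.project_wiki and project_memory:
        blocks.append(project_memory)
    # No "memory disabled" placeholder is added: absence is the signal.
    return blocks


def may_append_to_global(toggles: MemoryToggles) -> bool:
    """Assumption: appending needs the global wiki on and read-only off."""
    return toggles.global_wiki and not toggles.read_only
```

For example, `blocks_to_inject(MemoryToggles(False, False, False), "g", "p")` returns an empty list, and a conversation with read-only set still receives the global block while `may_append_to_global` returns `False`.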
Curating it
- Let the agent append: the assistant writes to memory on its own via `companion_remember` when you tell it something worth keeping.
- Edit by hand: open the memory view to add, correct, or delete. You own it; nothing persists unless you keep it.
What gets injected, and where you see it
When memory is on, Companion shows exactly what it pulled in this turn (the stable wiki block and the per-turn retrieval), so “what leaves” is never a mystery. Empty tiers inject nothing.
Memory is not the document corpus. Memory (LightRAG) is the small, living layer — “what do we already know about you / this project / the company”. A large document corpus (RAG over your files) is a separate organ for “find me the passage”. They answer different questions.
A benchmarking note
For reproducible model benchmarks, turn memory off: an injected block changes the prompt and the token count, which would skew the comparison. Memory is for working with the assistant, not for measuring a model.
Read next
- The chat window: where memory toggles live.
- Agents tokens: exposing this memory to external agents.
- Getting started: turning memory on.