Telemak — performance

Numbers from real hardware, not a spec sheet. Your mileage moves with the model, the quant, and how much you ask it to write.

A measured data point

On an M3 Ultra with 96 GB, Telemak (native Swift) serving Qwen3.6-35B-A3B, an 8-bit MoE, reaches ~66 tok/s decode, with a cold time to first token (TTFT) of around 3.9 s. That’s the peak-throughput configuration in our own benchmarks: a MoE with few active parameters on a single big-memory Mac is where Telemak shines.

For comparison, an 8-bit dense 31 B model on a single node runs at roughly 18 tok/s. A dense model reads every parameter for every token, so it decodes more slowly than a MoE of the same total size, which touches only its active parameters.

Class	Example	Rough decode
MoE, few active params	Qwen3.6-35B-A3B 8-bit	~60–66 tok/s
Dense mid-size	a 31 B dense 8-bit	~18 tok/s
Big MoE	80–122 B MoE 8-bit	lower (memory-bound, with more weight data to read)

These are single-stream interactive numbers. Treat them as the shape of what to expect, then measure your own model on your own Mac.
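To see how output length changes the wait, here is a minimal sketch that adds decode time to TTFT. The function name `response_seconds` is ours, not part of Telemak, and the model is deliberately crude: it treats decode speed as constant and ignores prompt length, which also affects TTFT.

```python
def response_seconds(ttft_s: float, decode_tok_s: float, output_tokens: int) -> float:
    """Estimate wall-clock time for one reply: time to first token plus decode time.

    Assumption: decode speed stays constant for the whole reply.
    """
    if decode_tok_s <= 0:
        raise ValueError("decode_tok_s must be positive")
    return ttft_s + output_tokens / decode_tok_s


# Figures quoted above for the MoE configuration: ~3.9 s cold TTFT, ~66 tok/s decode.
for n in (100, 500, 2000):
    print(f"{n:>5} tokens: ~{response_seconds(3.9, 66.0, n):.1f} s")
```

With those figures, a 500-token reply takes roughly 11.5 s from a cold start, and most of the total is decode time once replies get long. Swap in your own measured TTFT and tok/s to get numbers for your model.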

The wired-memory budget

Telemak keeps models in wired memory so the GPU always has them. The practical ceiling is your RAM minus headroom for macOS and anything else running. Telemak tracks each model’s wired_limit_mb and won’t let the OS page weights out — which is exactly what keeps tok/s stable instead of collapsing when memory gets tight.

A rough fit guide (leave ~10–20% headroom):

RAM	Comfortable load
64 GB	one 35 B MoE 8-bit, or two ~30 B co-loaded (chat + embedder)
96–128 GB	a 70–80 B MoE 8-bit
256 GB	a 122 B MoE 8-bit
512 GB	a 200 B+ MoE (mixed-quant)
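The fit guide can be approximated with back-of-the-envelope arithmetic: weight size is roughly parameters times bits per weight divided by 8, and the budget is RAM minus headroom. The sketch below is an illustration only. The helper names and the 15% default are assumptions, it counts weights alone (no KV cache, activations, or runtime overhead), and it is not how Telemak derives wired_limit_mb.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB: parameters x bits / 8 (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(ram_gb: float, models: list[tuple[float, float]], headroom: float = 0.15) -> bool:
    """True if the summed weight estimate fits in RAM after reserving `headroom`.

    `models` is a list of (params_billion, bits_per_weight) pairs.
    """
    budget_gb = ram_gb * (1 - headroom)
    needed_gb = sum(weights_gb(p, b) for p, b in models)
    return needed_gb <= budget_gb


print(fits(64, [(35, 8)]))    # 35 GB of weights against a 54.4 GB budget
print(fits(256, [(122, 8)]))  # 122 GB of weights against a 217.6 GB budget
print(fits(64, [(122, 8)]))   # 122 GB of weights cannot fit in 64 GB
```

Treat a passing result as "worth trying", not a guarantee. Long contexts grow the KV cache, so a model that fits on paper can still run tight in practice.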

The KV-cache win

The first turn of a long conversation pays full prefill. The second turn reuses the on-disk KV prefix (~/.telemak/sessions/) and comes back much faster — in practice several times faster on a multi-thousand-token context. This is why keeping a model warm and a conversation going beats reloading.
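The saving comes from skipping prefill for tokens the cache already covers. The sketch below shows the general prefix-reuse idea with made-up token IDs. It is not Telemak's implementation, whose session format and matching logic are not described here, and the function names are hypothetical.

```python
def shared_prefix_len(cached: list[int], prompt: list[int]) -> int:
    """Count how many leading tokens of `prompt` match the cached sequence."""
    n = 0
    for a, b in zip(cached, prompt):
        if a != b:
            break
        n += 1
    return n


def tokens_to_prefill(cached: list[int], prompt: list[int]) -> int:
    """Tokens that still need prefill once the shared prefix is reused."""
    return len(prompt) - shared_prefix_len(cached, prompt)


# Illustrative token IDs: a 4000-token first turn, then 200 new tokens on turn two.
turn1 = list(range(4000))
turn2 = turn1 + list(range(9000, 9200))

print(tokens_to_prefill([], turn1))     # 4000: nothing cached yet, full prefill
print(tokens_to_prefill(turn1, turn2))  # 200: only the new tokens need prefill
```

Reuse only works while the earlier tokens stay identical. If something near the start of the conversation changes, such as an edited system prompt, the shared prefix shrinks and more of the context has to be prefilled again.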

Reliability

Telemak runs in production on M3 Ultra and M3 Max machines. A recorded endurance mark: 96 h 44 min non-stop, 23 549 inferences, zero failed requests, with two 30 B models co-loaded (73.9 / 96 GB wired). The LaunchAgent restarts the daemon on crash and across reboots, so uptime is the daemon’s job, not yours.
