
Context Management

Fermi’s context management is the core feature that enables long sessions. Instead of hitting context limits and performing a blind reset, the system monitors usage, compresses strategically, and only resets as a last resort. The agent can inspect its own context distribution and surgically summarize selected blocks — down to a single tool call result.

As context grows, the system injects guidance at two thresholds:

| Level | Default trigger | What the agent sees |
|---|---|---|
| Level 1 | 60% of budget | A nudge to call show_context and consider summarizing older groups |
| Level 2 | 80% of budget | A stronger prompt to summarize immediately before auto-compact triggers |

Hysteresis prevents oscillation — once a hint fires, context must drop meaningfully before the hint can fire again.
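The two-level hint logic with hysteresis can be sketched as follows. This is a minimal illustration, not Fermi's implementation: the `HintTracker` class and the `REARM_MARGIN` value are hypothetical, since the docs only say that context must "drop meaningfully" before a hint can fire again.

```python
from dataclasses import dataclass, field

# Default triggers from the table above, as fractions of the context budget.
LEVEL_TRIGGERS = {1: 0.60, 2: 0.80}

# Hypothetical re-arm margin: how far below a trigger usage must fall
# before that level may fire again. The real value is not documented here.
REARM_MARGIN = 0.10


@dataclass
class HintTracker:
    """Tracks which hint levels are armed and decides when to emit one."""
    armed: dict = field(default_factory=lambda: {1: True, 2: True})

    def update(self, usage: float):
        """Return the highest hint level to emit for this usage, or None."""
        fired = None
        for level, trigger in sorted(LEVEL_TRIGGERS.items()):
            if usage >= trigger:
                if self.armed[level]:
                    # First crossing since the level was armed: fire once.
                    self.armed[level] = False
                    fired = level
            elif usage < trigger - REARM_MARGIN:
                # Usage dropped well below the trigger: allow it to fire again.
                self.armed[level] = True
        return fired
```

With this sketch, usage hovering between 58% and 62% produces a single Level 1 hint rather than one on every crossing; the hint re-arms only after usage falls below 50%.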

The agent has two tools for fine-grained context control:

show_context returns a self-contained context map listing every context group with its ID, size, and type. It injects nothing into the existing conversation, which preserves the prompt cache; the map itself tells the agent what each context ID covers.

summarize_context operates on groups of spatially contiguous context IDs. For each group, the agent writes a distilled summary that preserves decisions, key facts, code references, and unresolved issues; the summary then takes the place of the original content in the visible context.

The key property: this is append-only. Original content is never deleted — summaries are appended, and the system dynamically determines what is visible based on what has been summarized. This means summarization is safe and reversible at the system level.
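One way to picture the append-only property is a log where both entries and summaries are only ever appended, and the visible context is computed on demand. The sketch below is an assumption about the general shape, not Fermi's data model: `ContextLog`, `Entry`, and `Summary` are hypothetical names, and context IDs are modeled as simple list indices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    ctx_id: int
    text: str


@dataclass(frozen=True)
class Summary:
    first_id: int  # first context ID covered (inclusive)
    last_id: int   # last context ID covered (inclusive)
    text: str


class ContextLog:
    """Append-only store: entries and summaries are added, never removed."""

    def __init__(self):
        self.entries = []
        self.summaries = []

    def append(self, text: str) -> int:
        ctx_id = len(self.entries)
        self.entries.append(Entry(ctx_id, text))
        return ctx_id

    def summarize(self, first_id: int, last_id: int, text: str) -> None:
        # Covers one contiguous ID range; the originals stay in self.entries.
        self.summaries.append(Summary(first_id, last_id, text))

    def visible(self) -> list:
        """Derive the visible view: summaries stand in for covered entries."""
        out = []
        emitted = set()
        for entry in self.entries:
            covering = None
            for idx, s in enumerate(self.summaries):
                if s.first_id <= entry.ctx_id <= s.last_id:
                    covering = idx  # a later summary takes precedence
            if covering is None:
                out.append(entry.text)
            elif covering not in emitted:
                emitted.add(covering)
                out.append(f"[summary] {self.summaries[covering].text}")
        return out
```

Because `visible()` is derived rather than stored, ignoring a summary brings the original entries back into view, which is what makes summarization reversible in this model.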

summarize_context targets specific ranges. For a whole-window reset when the context limit is reached, the system uses auto-compact (a separate mechanism, also exposed as the /compact user command).

When context reaches critical levels despite hints and summarization, auto-compact fires at one of two points:

| Trigger | Default threshold | When it fires |
|---|---|---|
| Before-turn | 85% | Before processing the next user message |
| Mid-turn | 90% | After a tool call result pushes context over the limit |

Auto-compact produces a continuation prompt — a comprehensive briefing that lets the agent resume exactly where it left off. After compact, the context window contains only:

  • The continuation prompt
  • The current plan snapshot (if a plan is active)
  • AGENTS.md files (persistent memory)

Before-turn compact is interruptible: pressing Ctrl+C cancels the compact and preserves the original context.
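The two auto-compact triggers and the post-compact window can be sketched as below. The function names, the `phase` strings, and the use of `>=` at the boundary are assumptions made for illustration; only the 85% and 90% defaults and the list of surviving items come from the text above.

```python
from typing import List, Optional

# Default thresholds from the table above, as fractions of the budget.
BEFORE_TURN_THRESHOLD = 0.85
MID_TURN_THRESHOLD = 0.90


def compact_trigger(used_tokens: int, budget_tokens: int, phase: str) -> Optional[str]:
    """Return which auto-compact trigger applies, or None.

    `phase` is a hypothetical label: "before_turn" when a new user message
    is about to be processed, "after_tool_result" inside a turn.
    """
    usage = used_tokens / budget_tokens
    if phase == "before_turn" and usage >= BEFORE_TURN_THRESHOLD:
        return "before_turn"
    if phase == "after_tool_result" and usage >= MID_TURN_THRESHOLD:
        return "mid_turn"
    return None


def context_after_compact(
    continuation_prompt: str,
    plan_snapshot: Optional[str],
    agents_md: List[str],
) -> List[str]:
    """Assemble what remains in the window after a compact."""
    window = [continuation_prompt]
    if plan_snapshot is not None:
        window.append(plan_snapshot)  # only present when a plan is active
    window.extend(agents_md)          # persistent memory survives compact
    return window
```

The higher mid-turn threshold in this sketch reflects the table: a compact in the middle of a turn is deferred until a tool result actually pushes usage past 90%.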

| | /summarize (user) | summarize_context tool (agent) |
|---|---|---|
| Trigger | User runs the slash command | Agent decides autonomously (or prompted by hints) |
| Selection | Interactive picker: choose start/end turn range | Agent picks context IDs after inspecting the map |
| Focus | Optional focus prompt (“Keep the auth details”) | Agent writes the summary directly |
| Granularity | Turn-level ranges | Can target individual tool results |

The effective context size can be restricted without switching models. In ~/.fermi/settings.json (or <project>/.fermi/settings.json for per-project override):

{
  "context_budget_percent": 70
}

This sets the effective budget to 70% of the model’s maximum context length. All threshold calculations (hints, compact) operate against this budget. Useful when you want to leave headroom for large tool results.

You can also set it per-session via the CLI: fermi -c context_budget_percent=70.
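As a worked example of how the budget scales every threshold, the sketch below computes token counts for a hypothetical model with a 200,000-token maximum context. The function names and the default of 100 are assumptions; the percentages are the documented defaults.

```python
# Documented default thresholds, as percentages of the effective budget.
DEFAULT_THRESHOLDS = {
    "hint_level_1": 60,
    "hint_level_2": 80,
    "compact_before_turn": 85,
    "compact_mid_turn": 90,
}


def effective_budget(model_max_tokens: int, context_budget_percent: int = 100) -> int:
    """Effective budget in tokens; a default of 100 is assumed here."""
    return model_max_tokens * context_budget_percent // 100


def threshold_tokens(model_max_tokens: int, context_budget_percent: int) -> dict:
    """Token count at which each hint or compact threshold is reached."""
    budget = effective_budget(model_max_tokens, context_budget_percent)
    return {name: budget * pct // 100 for name, pct in DEFAULT_THRESHOLDS.items()}


# Hypothetical 200,000-token model with context_budget_percent = 70:
# the budget is 140,000 tokens, so Level 1 fires at 84,000 and
# mid-turn compact at 126,000.
print(threshold_tokens(200_000, 70))
```

Under these assumptions, at least 60,000 tokens of the model's window (the 30% outside the budget) stay free as headroom for large tool results.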

/summarize opens an interactive picker:

  1. Select start turn — pick where summarization begins
  2. Select end turn — pick where it ends
  3. Focus prompt (optional) — instructions about what to preserve

The selected range is converted to context IDs and summarized.

/summarize

/compact performs a full context reset with a continuation summary. Optionally provide instructions:

/compact
/compact Preserve the DB schema decisions

Two AGENTS.md files are folded into the system prompt (so they’re present every turn) and survive compact:

  • ~/.fermi/AGENTS.md — Global preferences across all projects
  • <project>/AGENTS.md — Project-specific patterns and conventions

They’re read at session init and on reload (e.g. after editing AGENTS.md or running /reload). The agent reads these for context and can write to them to save long-term knowledge. Use AGENTS.md to store architectural decisions, coding conventions, known constraints, and preferred approaches.
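Loading the two memory files could look roughly like the sketch below. It is an illustration only: `load_agents_memory`, the `[global]` and `[project]` labels, and the global-then-project ordering are hypothetical, and how Fermi actually merges the files into the system prompt is not specified here.

```python
from pathlib import Path


def load_agents_memory(home: Path, project: Path) -> str:
    """Read whichever AGENTS.md files exist and join them into one block."""
    candidates = [
        ("global", home / ".fermi" / "AGENTS.md"),  # preferences across projects
        ("project", project / "AGENTS.md"),         # project-specific conventions
    ]
    sections = []
    for label, path in candidates:
        if path.is_file():
            text = path.read_text(encoding="utf-8").strip()
            sections.append(f"[{label}] {text}")
    # Missing files are skipped, so either file is optional in this sketch.
    return "\n\n".join(sections)
```

Calling this once at session init and again on reload mirrors the lifecycle described above, where edits to AGENTS.md take effect after a reload.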

  • Let the system work. In most sessions the three automatic layers (threshold hints, agent-driven summarization, and auto-compact) handle everything.
  • Use /summarize after exploration. Once you have conclusions from a long investigation, summarize the exploration to free space for execution.
  • Provide a focus prompt. Telling the summarizer what matters makes compression more effective.
  • Adjust context_budget_percent if you routinely hit limits with large files or many tool results.
  • Write to AGENTS.md for knowledge that should persist across sessions — the agent can do this on your behalf.