Context Management

Fermi’s context management is the core feature that enables long sessions. Instead of hitting context limits and performing a blind reset, the system monitors usage, compresses strategically, and only resets as a last resort. The agent can inspect its own context distribution and surgically summarize selected blocks — down to a single tool call result.

Three Layers

1. Hint Compression

As context grows, the system injects guidance at two thresholds:

Level	Default trigger	What the agent sees
Level 1	50% of budget	A nudge to call `show_context` and consider summarizing older groups
Level 2	75% of budget	A stronger prompt to summarize immediately before auto-compact triggers

Hysteresis prevents oscillation — once a hint fires, context must drop meaningfully before the hint can fire again.

Tune (or disable) these triggers with the /summarize_hint command — /summarize_hint on | off | <level1> <level2>, where the two integers must satisfy 0 < level1 < level2 < 85. The change persists in settings.

2. Agent-Initiated Summarization

The agent has two tools for fine-grained context control:

`show_context`

Returns a self-contained context map showing all context groups with their IDs, sizes, and types. It injects nothing into the existing conversation (preserving the prompt cache) — the map itself tells the agent what each context ID covers.

`summarize_context`

Operates on groups of spatially contiguous context IDs. For each group, the agent writes a distilled summary that preserves decisions, key facts, code references, and unresolved issues — then the original content is replaced by the summary.

The key property: this is append-only. Original content is never deleted — summaries are appended, and the system dynamically determines what is visible based on what has been summarized. This means summarization is safe and reversible at the system level.

summarize_context targets specific ranges. For whole-window reset when the context limit is reached, the system uses auto-compact (a separate mechanism, also exposed as the /compact user command).

3. Auto-Compact

When context reaches critical levels despite hints and summarization:

Trigger	Default threshold	When it fires
Before-turn	85%	Before processing the next user message
Mid-turn	90%	After a tool call result pushes context over the limit

Auto-compact produces a continuation prompt — a comprehensive briefing that lets the agent resume exactly where it left off. After compact, the context window contains only:

The continuation prompt
The current plan snapshot (if a plan is active)
AGENTS.md files (persistent memory)

Before-turn compact is interruptible: pressing Ctrl+C cancels the compact and preserves the original context.

User vs. Agent Summarization

	`/summarize` (user)	`summarize_context` tool (agent)
Trigger	User runs the slash command	Agent decides autonomously (or prompted by hints)
Selection	Interactive picker: choose start/end turn range	Agent picks context IDs after inspecting the map
Focus	Optional focus prompt (“Keep the auth details”)	Agent writes the summary directly
Granularity	Turn-level ranges	Can target individual tool results

Context Budget

The effective context size can be restricted without switching models. In ~/.fermi/settings.json (or <project>/.fermi/settings.json for per-project override):

{
  "context_budget_percent": 70
}

This sets the effective budget to 70% of the model’s maximum context length. All threshold calculations (hints, compact) operate against this budget. Useful when you want to leave headroom for large tool results.

You can also set it per-session via the CLI: fermi -c context_budget_percent=70.

Manual Intervention

`/summarize`

Opens an interactive picker:

Select start turn — pick where summarization begins
Select end turn — pick where it ends
Focus prompt (optional) — instructions about what to preserve

The selected range is converted to context IDs and summarized.

/summarize

`/compact`

Full context reset with a continuation summary. Optionally provide instructions:

/compact
/compact Preserve the DB schema decisions

AGENTS.md — Persistent Memory

Two AGENTS.md files are folded into the system prompt (so they’re present every turn) and survive compact:

~/.fermi/AGENTS.md — Global preferences across all projects
<project>/AGENTS.md — Project-specific patterns and conventions

They’re read at session init and whenever the agent’s reload tool runs (e.g. after the agent edits AGENTS.md). The agent reads these for context and can write to them to save long-term knowledge. Use AGENTS.md to store architectural decisions, coding conventions, known constraints, and preferred approaches.

Practical Tips

Let the system work. In most sessions the three automatic layers handle everything.
Use /summarize after exploration. Once you have conclusions from a long investigation, summarize the exploration to free space for execution.
Provide a focus prompt. Telling the summarizer what matters makes compression more effective.
Adjust context_budget_percent if you routinely hit limits with large files or many tool results.
Write to AGENTS.md for knowledge that should persist across sessions — the agent can do this on your behalf.