11. Enforce the context ceiling with a transcript-measured PreToolUse hook
Date: 2026-06-25
Status
Accepted
Context
CLAUDE.md states a 40% Context Window Rule and the context-loading-protocol skill instructs the orchestrator to keep total context under that ceiling and to load agents/skills on demand rather than speculatively. Until now nothing enforced it: the rule lived only as guidance the orchestrator was asked to self-apply, and the protocol's own budget step is an estimate (baseline + history estimate + file sizes). If the orchestrator ignored the rule, context grew unchecked until the harness's own summarization kicked in — independent of the protocol.
Three design questions had non-obvious answers:
- How to measure context occupancy. Options:
  - Model self-estimate. Ask the orchestrator to track its own fill. Rejected: a model has no reliable readout of its own context size; this is exactly the proxy the rule already fails on.
  - Transcript usage. The session transcript's assistant-message usage records input_tokens, cache_read_input_tokens, and cache_creation_input_tokens; the prompt side of the most recent turn is the ground-truth occupancy the harness saw. A hook can read it from transcript_path. Chosen: it is real, not estimated, which is what makes the ceiling enforceable rather than advisory. (Same transcript-usage source cost-meter.sh already relies on.)
- Warn or block. Options:
  - Block by default. Strongest, but a mis-detected window (see below) would hard-block constantly, and blocking is hostile when the model cannot remove tokens itself.
  - Warn by default, opt-in block. Chosen: mirrors destructive-guard.sh / /careful (warn on exit 0, block on exit 2). DEV_TEAM_CONTEXT_STRICT=on escalates to a hard block. A warn-default also means a wrong window only over-nudges, never bricks a session.
- What to gate. Options:
  - All context-growing tools (incl. Read/Grep/Glob). Broad, but noisy on frequent Reads and dangerous to block: you need Reads to summarize and recover.
  - Capability loads (Agent + Skill). Chosen: these are exactly the "don't load speculatively" operations the protocol governs, far less frequent than Read, and safe to block because recovery never needs a fresh agent. Recovery skills (/context-summarization, /context-loading-protocol, /continue, /review-summary, /session-review) are whitelisted so the path back under budget can't deadlock.
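The transcript-usage measurement can be sketched roughly as follows. This is a minimal illustration, not the hook's actual code: the function name latest_occupancy is hypothetical, and the JSONL layout it assumes (one JSON object per line, with assistant entries carrying their counters under .message.usage) is an assumption about the transcript format rather than something this ADR specifies. It requires jq and prints nothing when it cannot measure.

```shell
#!/usr/bin/env bash
# Sketch only: sum the prompt-side token counters from the most recent
# assistant usage record in a JSONL transcript.
# ASSUMPTION: entries look like {"type":"assistant","message":{"usage":{...}}}.

latest_occupancy() {
  local transcript="$1"
  # Fail open: print nothing if the transcript or jq is unavailable.
  [ -r "$transcript" ] || return 0
  command -v jq >/dev/null 2>&1 || return 0
  # Only the tail is read, so the cost stays bounded on long sessions.
  tail -n 400 "$transcript" | jq -rs '
    [ .[]
      | select(.type? == "assistant")
      | .message.usage?
      | select(. != null) ]
    | last
    | if . == null then empty
      else (.input_tokens // 0)
         + (.cache_read_input_tokens // 0)
         + (.cache_creation_input_tokens // 0)
      end
  ' 2>/dev/null || true
}
```

A caller would pass the transcript_path it received and treat empty output as "occupancy unknown, allow the call", which keeps the measurement consistent with the fail-open stance.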
A known limitation surfaced during design: the transcript reports the base model id (claude-opus-4-8) with no [1m] suffix and carries no context-window field, so a 1M-context model cannot be auto-distinguished from the 200K base. The window must therefore be resolvable out-of-band. A model-id→window map was considered and rejected: every current Claude model shares the same 200000-token base window (so the map would carry no information), and pinning claude-* snapshot ids in plugin source is forbidden by repo policy (see ADR 8). The window is resolved from an env var with a 200000 default instead.
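As an illustration of the out-of-band resolution described above, a minimal sketch might look like this. The function name resolve_window is hypothetical, and falling back to the default on a malformed or non-positive value is an assumption made here to stay fail-open; the ADR only states that the variable is read with a 200000 default.

```shell
#!/usr/bin/env bash
# Sketch only: resolve the context window from the environment.
# ASSUMPTION: malformed or non-positive values fall back to the default.

resolve_window() {
  local window="${DEV_TEAM_CONTEXT_WINDOW:-200000}"
  case "$window" in
    ''|*[!0-9]*) window=200000 ;;   # empty or contains a non-digit
  esac
  [ "$window" -gt 0 ] 2>/dev/null || window=200000
  printf '%s\n' "$window"
}
```

The choice of window changes the verdict substantially: 90000 occupied tokens is 45% of a 200000 window (over a 40% ceiling) but only 9% of a 1000000 window.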
Decision
Add hooks/context-ceiling-guard.sh, a PreToolUse hook registered on Agent and Skill. It computes occupancy from the latest transcript usage line, resolves the context window from DEV_TEAM_CONTEXT_WINDOW (else a 200000 default), and compares occupancy/window to a ceiling (DEV_TEAM_CONTEXT_CEILING_PCT, default 40). At or above the ceiling it warns to stderr (default, deduped by 5-point bucket per session to avoid spam) or blocks (exit 2) under DEV_TEAM_CONTEXT_STRICT=on. It whitelists recovery skills, and is fail-open throughout (DEV_TEAM_CONTEXT_CEILING=off, missing/unparseable transcript, missing jq, or any error → exit 0). On a 1M-context model the operator sets DEV_TEAM_CONTEXT_WINDOW=1000000.
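The comparison and the warn-versus-block choice could be sketched as below. The function names and the warning text are hypothetical, and integer (floor) percentage arithmetic is an assumption; the environment variables, the default ceiling of 40, the 5-point bucket, and the exit codes come from the decision above. How the last-warned bucket is persisted per session is not specified here, so the sketch only computes the bucket.

```shell
#!/usr/bin/env bash
# Sketch only: ceiling check with warn-by-default and opt-in block.
# Function names and message wording are illustrative, not the hook's own.

# Integer percentage of the window currently occupied (floor division).
occupancy_pct() {
  local occupancy="$1" window="$2"
  echo $(( occupancy * 100 / window ))
}

# 5-point bucket used to dedupe warnings: 43 -> 40, 47 -> 45.
warn_bucket() {
  echo $(( $1 / 5 * 5 ))
}

# Returns 0 to allow the tool call, 2 to block it.
check_ceiling() {
  local occupancy="$1" window="$2"
  local ceiling="${DEV_TEAM_CONTEXT_CEILING_PCT:-40}"
  local pct
  # Kill switch: the guard is disabled entirely.
  [ "${DEV_TEAM_CONTEXT_CEILING:-on}" = "off" ] && return 0
  # Any arithmetic failure (e.g. a zero window) fails open.
  pct="$(occupancy_pct "$occupancy" "$window" 2>/dev/null)" || return 0
  [ "$pct" -ge "$ceiling" ] 2>/dev/null || return 0
  echo "context at ${pct}% of window (ceiling ${ceiling}%)" >&2
  [ "${DEV_TEAM_CONTEXT_STRICT:-off}" = "on" ] && return 2
  return 0
}
```

With the defaults, check_ceiling 90000 200000 prints a warning and returns 0; the same call under DEV_TEAM_CONTEXT_STRICT=on returns 2, and with a 1000000 window it returns 0 silently because occupancy is only 9%.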
Consequences
- The 40% rule is now a measured backstop, not just prose — capability loads over budget surface a concrete nudge (or block) tied to real occupancy.
- The window default (200000) over-nudges on a 1M-context session until DEV_TEAM_CONTEXT_WINDOW is set; warn-by-default keeps that harmless, and the caveat is documented in the skill, CLAUDE.md, and docs/agent-architecture.md.
- The hook reads tail -n 400 of the transcript per gated call; bounded, but adds a small jq cost to Agent/Skill dispatches (not to Read/Grep/Glob).
- Behavior is configurable and reversible via env vars; no schema or contract changes. Tests live in tests/hooks/context_ceiling_guard.bats.