Session-review harness (#131)¶
/session-review mines ground-truth Claude Code session transcripts
(~/.claude/projects/<slug>/*.jsonl) to suggest plugin improvements that reduce
re-work, cut token usage, and improve accuracy.
It fills a blind spot. The plugin already measures quality from two angles, both with gaps:
/agent-eval+evals/grade agents on a synthetic fixture corpus — proves an agent can detect a planted issue, says nothing about real behaviour./harness-audit+metrics/analyse effectiveness from self-reported task logs — sparse, and only what the model chose to record about itself.
Neither reads what actually happened: per-turn token usage, tool errors,
failed edits, user corrections, and skill/agent attribution. /session-review
does.
Three stages (the model never reads raw transcripts)¶
| Stage | Component | What it does |
|---|---|---|
| 1. Extract | scripts/session_extract.py (#127) |
Deterministic, zero model tokens. Distills MBs of JSONL into a KB digest capturing all four signal classes equally (token / rework / accuracy / utilization). Privacy: metrics only — never prompt or code content. |
| 2. Analyze | agents/session-analysis.md + skills/session-review/SKILL.md (#128) |
A focused agent reads only the digest and maps aggregated patterns to probable plugin causes. |
| 3. Suggest | reports/session-review-<date>.md (#128) |
Ranked recommendations, each tagged {token \| rework \| accuracy}, naming the target artifact and handing off — never auto-applying. |
Hand-off, not auto-apply¶
| Suggestion | Handed to |
|---|---|
| Config / prompt / convention fix | /feedback-learning |
| Effort re-banding | /harness-audit + .claude/model-ladder.json |
| New / changed detection rule | /agent-eval |
| Token-heavy skill / agent | token-efficiency-review |
Trend persistence (#129)¶
Each run appends a metrics-only record to metrics/session-digest.jsonl so
/harness-audit can consume real-session data alongside the self-reported task
logs. Schema and the harness-audit join are documented in
eval-system.md → "Session-review trend digest".
OSS complements (#130)¶
For continuous quantitative monitoring, reach for ccusage, native
OpenTelemetry, or claude-code-log — they cover what /session-review does not.
/session-review covers the plugin-specific qualitative suggestions they
cannot, since they don't know this plugin's agents and skills. See
session-review-oss-complements.md.