M TODOS.md => TODOS.md +110 -0
@@ 646,6 646,116 @@ Shipped in v0.6.5. TemplateContext in gen-skill-docs.ts bakes skill name into pr
**Priority:** P3
**Depends on:** Telemetry data showing freeze hook fires in real /investigate sessions
+## Context Intelligence
+
+### Context recovery preamble
+
+**What:** Add ~10 lines of prose to the preamble telling the agent to re-read gstack artifacts (CEO plans, design reviews, eng reviews, checkpoints) after compaction or context degradation.
+
+**Why:** gstack skills produce valuable artifacts stored at `~/.gstack/projects/$SLUG/`. When Claude's auto-compaction fires, it preserves a generic summary but doesn't know these artifacts exist. The plans and reviews that shaped the current work silently vanish from context, even though they're still on disk. This is the thing nobody else in the Claude Code ecosystem is solving, because nobody else has gstack's artifact architecture.
+
+**Context:** Inspired by Anthropic's `claude-progress.txt` pattern for long-running agents. Also informed by claude-mem's "progressive disclosure" approach. See `docs/designs/SESSION_INTELLIGENCE.md` for the broader vision. CEO plan: `~/.gstack/projects/garrytan-gstack/ceo-plans/2026-03-31-session-intelligence-layer.md`.
+
+**Effort:** S (human: ~30 min / CC: ~5 min)
+**Priority:** P1
+**Depends on:** None
+**Key files:** `scripts/resolvers/preamble.ts`
+
+### Session timeline
+
+**What:** Append one-line JSONL entry to `~/.gstack/projects/$SLUG/timeline.jsonl` after every skill run (timestamp, skill, branch, outcome). `/retro` renders the timeline.
+
+**Why:** Makes AI-assisted work history visible. `/retro` can show "this week: 3 /review, 2 /ship, 1 /investigate." Provides the observability layer for the session intelligence architecture.
+
+**Effort:** S (human: ~1h / CC: ~5 min)
+**Priority:** P1
+**Depends on:** None
+**Key files:** `scripts/resolvers/preamble.ts`, `retro/SKILL.md.tmpl`
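+
+A timeline entry might look like this (field names are illustrative, not a settled schema):
+
+```json
+{"ts":"2026-04-02T14:31:09Z","skill":"review","branch":"feature-auth","outcome":"approved with 2 findings"}
+```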
+
+### Cross-session context injection
+
+**What:** When a new gstack session starts on a branch with recent checkpoints or plans, the preamble prints a one-line summary: "Last session: implemented JWT auth, 3/5 tasks done." Agent knows where you left off before reading any files.
+
+**Why:** Claude starts every session fresh. This one-liner orients the agent immediately. Similar to claude-mem's SessionStart hook pattern but simpler and integrated.
+
+**Effort:** S (human: ~2h / CC: ~10 min)
+**Priority:** P2
+**Depends on:** Context recovery preamble
+
+### /checkpoint skill
+
+**What:** Manual skill to snapshot current working state: what's being done and why, files being edited, decisions made (and rationale), what's done vs. remaining, critical types/signatures. Saved to `~/.gstack/projects/$SLUG/checkpoints/<timestamp>.md`.
+
+**Why:** Useful before stepping away from a long session, before known-complex operations that might trigger compaction, when handing off context to a different agent or workspace, or when coming back to a project after days away.
+
+**Effort:** M (human: ~1 week / CC: ~30 min)
+**Priority:** P2
+**Depends on:** Context recovery preamble
+**Key files:** New `checkpoint/SKILL.md.tmpl`, `scripts/gen-skill-docs.ts`
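+
+A checkpoint file might follow a skeleton like this (sections are a sketch, not a fixed schema):
+
+```markdown
+# Checkpoint — 2026-04-02T14:31Z — feature-auth
+
+## Doing
+Migrating session handling to JWT; refresh-token rotation in progress.
+
+## Decisions
+- RS256 over HS256 so other services can verify without the signing key.
+
+## Remaining
+- [ ] Refresh endpoint
+- [ ] Revocation list
+
+## Critical signatures
+`verifyToken(token: string): Promise<Claims>`
+```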
+
+### Session Intelligence Layer design doc
+
+**What:** Write `docs/designs/SESSION_INTELLIGENCE.md` describing the architectural vision: gstack as the persistent brain that survives Claude's ephemeral context. Every skill writes to `~/.gstack/projects/$SLUG/`, preamble re-reads, `/retro` rolls up.
+
+**Why:** Connects context recovery, health, checkpoint, and timeline features into a coherent architecture. Nobody else in the ecosystem is building this.
+
+**Effort:** S (human: ~2h / CC: ~15 min)
+**Priority:** P1
+**Depends on:** None
+
+## Health
+
+### /health — Project Health Dashboard
+
+**What:** Skill that runs type-check, lint, test suite, and dead code scan, then reports a composite 0-10 health score with breakdown by category. Tracks over time in `~/.gstack/health/<project-slug>/` for trend detection. Optionally integrates CodeScene MCP for deeper complexity/cohesion/coupling analysis.
+
+**Why:** No quick way to get "state of the codebase" before starting work. CodeScene peer-reviewed research shows AI-generated code increases static analysis warnings by 30%, code complexity by 41%, and change failure rates by 30%. Users need guardrails. Like `/qa` but for code quality rather than browser behavior.
+
+**Context:** Reads CLAUDE.md for project-specific commands (platform-agnostic principle). Runs checks in parallel. `/retro` can pull from health history for trend sparklines.
+
+**Effort:** M (human: ~1 week / CC: ~30 min)
+**Priority:** P1
+**Depends on:** None
+**Key files:** New `health/SKILL.md.tmpl`, `scripts/gen-skill-docs.ts`
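+
+The report might render along these lines (scores, weights, and wording are illustrative):
+
+```
+Health: 7/10
+  compile    10/10   tsc clean
+  lint        6/10   8 warnings (3 new this branch)
+  tests       8/10   142 passed, 3 skipped
+  dead code   5/10   23 unused exports
+```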
+
+### /health as /ship gate
+
+**What:** If health score exists and drops below a configurable threshold, `/ship` warns before creating the PR: "Health dropped from 8/10 to 5/10 this branch — 3 new lint warnings, 1 test failure. Ship anyway?"
+
+**Why:** Quality gate that prevents shipping degraded code. Configurable threshold so it's not blocking for teams that don't use `/health`.
+
+**Effort:** S (human: ~1h / CC: ~5 min)
+**Priority:** P2
+**Depends on:** /health skill
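+
+One possible shape for the configurable threshold (key names are assumptions, not a settled schema):
+
+```json
+{
+  "ship": {
+    "minHealthScore": 7,
+    "onDrop": "warn"
+  }
+}
+```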
+
+## Swarm
+
+### Swarm primitive — reusable multi-agent dispatch
+
+**What:** Extract Review Army's dispatch pattern into a reusable resolver (`scripts/resolvers/swarm.ts`). Wire into `/ship` for parallel pre-ship checks (type-check + lint + test in parallel sub-agents). Make available to `/qa`, `/investigate`, `/health`.
+
+**Why:** Review Army proved parallel sub-agents work brilliantly (5 agents = 835K tokens of working memory vs. 167K for one). The pattern is locked inside `review-army.ts`. Other skills need it too. Claude Code Agent Teams (official, Feb 2026) validates the team-lead-delegates-to-specialists pattern. Gartner: multi-agent inquiries surged 1,445% in one year.
+
+**Context:** Start with the specific `/ship` use case. Extract shared parts only after 2+ consumers reveal what config parameters are actually needed. Avoid premature abstraction. Can leverage existing WorktreeManager for isolation.
+
+**Effort:** L (human: ~2 weeks / CC: ~2 hours)
+**Priority:** P2
+**Depends on:** None
+**Key files:** `scripts/resolvers/review-army.ts`, new `scripts/resolvers/swarm.ts`, `ship/SKILL.md.tmpl`, `lib/worktree.ts`
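+
+A minimal sketch of what the extracted resolver's surface could look like (names and shapes are assumptions, not the final API):
+
+```typescript
+// Sketch of a possible swarm resolver API for scripts/resolvers/swarm.ts.
+interface SwarmTask {
+  name: string;
+  run: () => Promise<string>; // each sub-agent reports a one-line outcome
+}
+
+// Fan out all tasks in parallel, fan in a name -> outcome map,
+// mirroring Review Army's dispatch pattern.
+async function swarm(tasks: SwarmTask[]): Promise<Map<string, string>> {
+  const entries = await Promise.all(
+    tasks.map(async (t) => [t.name, await t.run()] as const),
+  );
+  return new Map(entries);
+}
+
+// Hypothetical /ship usage: three pre-ship checks as parallel sub-agents.
+const outcomes = await swarm([
+  { name: "type-check", run: async () => "clean" },
+  { name: "lint", run: async () => "2 warnings" },
+  { name: "test", run: async () => "142 passed" },
+]);
+console.log(outcomes.get("type-check")); // prints "clean"
+```
+
+Whether worktree isolation lives inside `swarm()` or stays the caller's job is exactly the kind of question best deferred until the second consumer exists.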
+
+## Refactoring
+
+### /refactor-prep — Pre-Refactor Token Hygiene
+
+**What:** Skill that detects project language/framework, runs appropriate dead code detection (knip/ts-prune for TS/JS, vulture/autoflake for Python, staticcheck/deadcode for Go, cargo udeps for Rust), strips dead imports/exports/props/console.logs, and commits cleanup separately.
+
+**Why:** Dirty codebases accelerate context compaction. Dead imports, unused exports, and orphaned code eat tokens that contribute nothing to the work while pushing the session toward compaction mid-refactor. Cleaning first buys back 20%+ of context budget. Reports lines removed and estimated token savings.
+
+**Effort:** M (human: ~1 week / CC: ~30 min)
+**Priority:** P2
+**Depends on:** None
+**Key files:** New `refactor-prep/SKILL.md.tmpl`, `scripts/gen-skill-docs.ts`
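+
+Typical invocations per ecosystem (common defaults; exact flags vary by tool version):
+
+```
+npx knip                                     # TS/JS: unused files, exports, deps
+npx ts-prune                                 # TS: unused exports
+vulture .                                    # Python: dead code
+autoflake --remove-all-unused-imports -r .   # Python: dead imports
+staticcheck ./...                            # Go: includes unused-code checks
+deadcode ./...                               # Go: golang.org/x/tools/cmd/deadcode
+cargo +nightly udeps                         # Rust: unused dependencies
+```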
+
## Factory Droid
### Browse MCP server for Factory Droid
A docs/designs/SESSION_INTELLIGENCE.md => docs/designs/SESSION_INTELLIGENCE.md +135 -0
@@ 0,0 1,135 @@
+# Session Intelligence Layer
+
+## The Problem
+
+Claude Code's context window is ephemeral. Every session starts fresh. When
+auto-compaction fires at ~167K tokens, it preserves a generic summary but
+destroys file reads, reasoning chains, and intermediate decisions.
+
+gstack already produces valuable artifacts that survive on disk: CEO plans,
+eng reviews, design reviews, QA reports, learnings. These files contain
+decisions, constraints, and context that shaped the current work. But Claude
+doesn't know they exist. After compaction, the plans and reviews that
+informed every decision silently vanish from context.
+
+The ecosystem is working on this. claude-mem (9K+ stars) captures tool usage
+and injects context into future sessions. Claude HUD shows real-time agent
+status. Anthropic's own `claude-progress.txt` pattern uses a progress file
+that agents read at the start of each session.
+
+Nobody is solving the specific problem of making **skill-produced artifacts**
+survive compaction. Because nobody else has gstack's artifact architecture.
+
+## The Insight
+
+gstack already writes structured artifacts to `~/.gstack/projects/$SLUG/`:
+- CEO plans: `ceo-plans/`
+- Design reviews: `design-reviews/`
+- Eng reviews: `eng-reviews/`
+- Learnings: `learnings.jsonl`
+- Skill usage: `../analytics/skill-usage.jsonl`
+
+The missing piece is not storage. It's awareness. The preamble needs to tell
+the agent: "These files exist. They contain decisions you've already made.
+After compaction, re-read them."
+
+## The Architecture
+
+```
+ ┌─────────────────────────────────────┐
+ │ Claude Context Window │
+ │ (ephemeral, ~167K token limit) │
+ │ │
+ │ Compaction fires ──► summary only │
+ └──────────────┬──────────────────────┘
+ │
+ reads on start / after compaction
+ │
+ ┌──────────────▼──────────────────────┐
+ │ ~/.gstack/projects/$SLUG/ │
+ │ (persistent, survives everything) │
+ │ │
+ │ ceo-plans/ ← /plan-ceo-review
+ │ eng-reviews/ ← /plan-eng-review
+ │ design-reviews/ ← /plan-design-review
+ │ checkpoints/ ← /checkpoint (new)
+ │ timeline.jsonl ← every skill (new)
+ │ learnings.jsonl ← /learn
+ └─────────────────────────────────────┘
+ │
+ rolled up weekly
+ │
+ ┌──────────────▼──────────────────────┐
+ │ /retro │
+ │ Timeline: 3 /review, 2 /ship, ... │
+ │ Health trends: compile 8/10 (↑2) │
+ │ Learnings applied: 4 this week │
+ └─────────────────────────────────────┘
+```
+
+## The Features
+
+### Layer 1: Context Recovery (preamble, all skills)
+~10 lines of prose in the preamble. After compaction or context degradation,
+the agent checks `~/.gstack/projects/$SLUG/` for recent plans, reviews, and
+checkpoints. Lists the directory, reads the most recent file.
+
+Cost: near-zero. Benefit: every skill's plans/reviews survive compaction.
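+
+The preamble addition might read something like this (wording is a sketch):
+
+```
+If your context was recently compacted, or you are missing details about the
+current work, check ~/.gstack/projects/$SLUG/ before proceeding: list
+ceo-plans/, eng-reviews/, design-reviews/, and checkpoints/, then read the
+most recent file in each that relates to the current branch. These artifacts
+record decisions already made; do not re-ask or re-derive them.
+```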
+
+### Layer 2: Session Timeline (preamble, all skills)
+Every skill appends a one-line JSONL entry to `timeline.jsonl`: timestamp,
+skill name, branch, key outcome. `/retro` renders it.
+
+Makes the project's AI-assisted work history visible. "This week: 3 /review,
+2 /ship, 1 /investigate across branches feature-auth and fix-billing."
+
+### Layer 3: Cross-Session Injection (preamble, all skills)
+When a new session starts on a branch with recent artifacts, the preamble
+prints a one-liner: "Last session: implemented JWT auth, 3/5 tasks done.
+Plan: ~/.gstack/projects/$SLUG/checkpoints/latest.md"
+
+The agent knows where you left off before reading any files.
+
+### Layer 4: /checkpoint (opt-in skill)
+Manual snapshot of working state: what's being done, files being edited,
+decisions made, what's remaining. Useful before stepping away, before
+complex operations, for workspace handoffs, or when coming back after days.
+
+### Layer 5: /health (opt-in skill)
+Code quality dashboard: type-check, lint, test suite, dead code scan.
+Composite 0-10 score. Tracks over time. `/retro` shows trends. `/ship`
+gates on configurable threshold.
+
+## The Compounding Effect
+
+Each feature is independently useful. Together, they create something
+that compounds:
+
+Session 1: /plan-ceo-review produces a plan. Saved to disk.
+Session 2: Agent reads the plan after preamble. Doesn't re-ask decisions.
+Session 3: /checkpoint saves progress. Timeline shows 2 /review, 1 /ship.
+Session 4: Compaction fires mid-refactor. Agent re-reads the checkpoint.
+ Recovers key decisions, types, remaining work. Continues.
+Session 5: /retro rolls up the week. Health trend: 6/10 → 8/10.
+ Timeline shows 12 skill invocations across 3 branches.
+
+The project's AI history is no longer ephemeral. It persists, compounds,
+and makes every future session smarter. That's the session intelligence
+layer.
+
+## What This Is Not
+
+- Not a replacement for Claude's built-in compaction (that handles session
+ state; we handle gstack artifacts)
+- Not a full memory system like claude-mem (that handles cross-session
+ memory via SQLite; we handle structured skill artifacts)
+- Not a database or service (just markdown files on disk)
+
+## Research Sources
+
+- [Anthropic: Effective harnesses for long-running agents](https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents)
+- [Anthropic: Effective context engineering](https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents)
+- [claude-mem](https://github.com/thedotmack/claude-mem)
+- [Claude HUD](https://github.com/jarrodwatts/claude-hud)
+- [CodeScene: Agentic AI coding best practices](https://codescene.com/blog/agentic-ai-coding-best-practice-patterns-for-speed-with-quality)
+- [Post-compaction recovery via git-persisted state (Beads)](https://dev.to/jeremy_longshore/building-post-compaction-recovery-for-ai-agent-workflows-with-beads-207l)