feat: multi-agent support — gstack works on Codex, Gemini CLI, and Cursor (v0.9.0) (#226)
* refactor: host-aware gen-skill-docs + --host codex generation
Refactor gen-skill-docs.ts for multi-agent support:
- Add Host type, HostPaths interface, HOST_PATHS config
- Decompose generatePreamble() into 7 composable sub-functions
- Replace all hardcoded .claude/skills/gstack paths with ctx.paths
- Replace static findTemplates() list with dynamic filesystem scan
- Add --host codex|agents flag (aliases, same output)
- Add processTemplate host routing to .agents/skills/gstack-*/
- Add codexSkillName() with double-prefix prevention
- Add transformFrontmatter() — keeps only name + description for Codex
- Add extractHookSafetyProse() — converts hooks to inline advisory
- Add body text path rewriting for remaining hardcoded paths
- Exclude /codex skill from Codex generation (self-referential)
Claude output is unchanged (verified via --dry-run).
SKILL.md is an open standard: .agents/skills/ works on Codex, Gemini CLI, and Cursor.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: generate Codex/Gemini/Cursor skills into .agents/skills/
Generated 21 skill files for the open SKILL.md standard:
- Output: .agents/skills/gstack-*/SKILL.md (one per skill)
- Frontmatter: name + description only (no allowed-tools/version)
- No .claude/skills/ paths in any generated file
- /codex skill excluded (Claude wrapper, self-referential on Codex)
- Hook skills (careful/freeze/guard) get inline safety prose
- Build script generates both hosts: bun run build
Supported agents (all read .agents/skills/):
- Codex CLI
- Gemini CLI
- Cursor
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: dual-host setup + find-browse for Codex/Gemini/Cursor
- setup: add --host codex|claude|auto flag, install to ~/.codex/skills/
when targeting Codex, auto-detect installed agents
- find-browse: priority chain .codex > .agents > .claude (both
workspace-local and global)
- dev-setup/teardown: create .agents/skills/gstack symlinks for dev mode
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: Codex generation tests + CI + docs for multi-agent support
Tests (28 new):
- Codex output path routing, frontmatter validation (name+description only)
- No .claude/skills/ path leaks in Codex output (regression guard)
- /codex skill exclusion, hook→prose conversion, multiline YAML
- --host agents alias, dynamic template discovery
- Codex skill validation + $B command validation
- find-browse priority chain verification
- Replace static ALL_SKILLS list with dynamic filesystem scan
CI:
- Add Codex freshness check to skill-docs workflow
Docs:
- AGENTS.md: Codex-facing project instructions
- README: multi-agent installation section
- CONTRIBUTING: dual-host development workflow
- CHANGELOG: v0.9.0 multi-agent support entry
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: Codex E2E test harness — verify skills work on Codex CLI
New test infrastructure:
- CodexSessionRunner: spawns codex exec, parses JSONL stream, returns
structured results (output, reasoning, toolCalls, tokens)
- JSONL parser ported from Python (codex/SKILL.md.tmpl) to TypeScript
- Temp HOME skill installation for Codex discovery testing
E2E tests (gated behind EVALS=1 + codex + OPENAI_API_KEY):
- codex-discover-skill: installs skill, verifies Codex finds it
- codex-review-findings: runs gstack-review via Codex, validates output
Integrates with existing eval infrastructure:
- Diff-based test selection via touchfiles
- Eval persistence via EvalCollector
- bun run test:codex / test:codex:all convenience scripts
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: bump VERSION to 0.9.0 to match CHANGELOG
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: Codex sidecar paths + setup installs generated skills
Two bugs found by Codex adversarial review:
1. Sidecar path mismatch: generated Codex skills referenced
.agents/skills/gstack-review/checklist.md but setup creates
sidecars at .agents/skills/gstack/review/. Fixed path rewriter
to emit .agents/skills/gstack/review/ (matching setup layout).
2. Setup installed Claude-format source dirs for Codex global
install instead of the generated Codex-format skills. Split
link_skill_dirs into link_claude_skill_dirs (source dirs for
Claude) and link_codex_skill_dirs (generated .agents/skills/
gstack-* dirs for Codex).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: comprehensive Codex path rewriting + setup install tests
17 new tests covering:
- Sidecar path rewriting: .claude/skills/review → .agents/skills/gstack/review/
(catches the bug where checklist.md was unreachable at gstack-review/)
- All 4 path rewrite rules tested individually across all skills
- Greptile triage sidecar path correctness
- Ship skill sidecar paths for pre-landing review
- Claude output regression guard: zero Codex paths in any Claude skill
- Setup script validation: separate link functions for Claude vs Codex,
link_codex_skill_dirs reads from .agents/skills/, create_agents_sidecar
links runtime assets (bin, browse, review, qa)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: regenerate Codex skills after investigate rename merge
Remove stale gstack-debug, add gstack-investigate, regenerate all
Codex skills to pick up changes merged from main (investigate rename,
platform-agnostic templates, review helpers).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: Codex E2E uses ~/.codex/ auth, not OPENAI_API_KEY
- Remove OPENAI_API_KEY gate from test prerequisites
- Copy real ~/.codex/ auth config into temp HOME so codex can authenticate
- Increase review test timeout to 540s (codex does thorough 60+ tool call reviews)
- Document in CLAUDE.md that Codex uses its own auth config
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
feat: SKILL.md template system, 3-tier testing, DX tools (v0.3.3) (#41)
* refactor: extract command registry to commands.ts, add SNAPSHOT_FLAGS metadata
- NEW: browse/src/commands.ts — command sets + COMMAND_DESCRIPTIONS + load-time validation (zero side effects)
- server.ts imports from commands.ts instead of declaring sets inline
- snapshot.ts: SNAPSHOT_FLAGS array drives parseSnapshotArgs (metadata-driven, no duplication)
- All 186 existing tests pass
* feat: SKILL.md template system with auto-generated command references
- SKILL.md.tmpl + browse/SKILL.md.tmpl with {{COMMAND_REFERENCE}} and {{SNAPSHOT_FLAGS}} placeholders
- scripts/gen-skill-docs.ts generates SKILL.md from templates (supports --dry-run)
- Build pipeline runs gen:skill-docs before binary compilation
- Generated files have AUTO-GENERATED header, committed to git
* test: Tier 1 static validation — 34 tests for SKILL.md command correctness
- test/helpers/skill-parser.ts: extracts $B commands from code blocks, validates against registry
- test/skill-parser.test.ts: 13 parser/validator unit tests
- test/skill-validation.test.ts: 13 tests validating all SKILL.md files + registry consistency
- test/gen-skill-docs.test.ts: 8 generator tests (categories, sorting, freshness)
* feat: DX tools (skill:check, dev:skill) + Tier 2 E2E test scaffolding
- scripts/skill-check.ts: health summary for all SKILL.md files (commands, templates, freshness)
- scripts/dev-skill.ts: watch mode for template development
- test/helpers/session-runner.ts: Agent SDK wrapper for E2E skill tests
- test/skill-e2e.test.ts: 2 E2E tests + 3 stubs (auto-skip inside Claude Code sessions)
- E2E tests must run from plain terminal: SKILL_E2E=1 bun test test/skill-e2e.test.ts
* ci: SKILL.md freshness check on push/PR + TODO updates
- .github/workflows/skill-docs.yml: fails if generated SKILL.md files are stale
- TODO.md: add E2E cost tracking and model pinning to future ideas
* fix: restore rich descriptions lost in auto-generation
- Snapshot flags: add back value hints (-d <N>, -s <sel>, -o <path>)
- Snapshot flags: restore parenthetical context (@e refs, @c refs, etc.)
- Commands: is → includes valid states enum
- Commands: console → notes --errors filter behavior
- Commands: press → lists common keys (Enter, Tab, Escape)
- Commands: cookie-import-browser → describes picker UI
- Commands: dialog-accept → specifies alert/confirm/prompt
- Tips: restore → arrow (was downgraded to ->)
* test: quality evals for generated SKILL.md descriptions
Catches the exact regressions we shipped and caught in review:
- Snapshot flags must include value hints (-d <N>, -s <sel>, -o <path>)
- is command must list all valid states (visible/hidden/enabled/...)
- press command must list example keys (Enter, Tab, Escape)
- console command must describe --errors behavior
- Snapshot -i must mention @e refs, -C must mention @c refs
- All descriptions must be >= 8 chars (no empty stubs)
- Tips section must use → not ->
* feat: LLM-as-judge evals for SKILL.md documentation quality
4 eval tests using Anthropic API (claude-haiku, ~$0.01-0.03/run):
- Command reference table: clarity/completeness/actionability >= 4/5
- Snapshot flags section: same thresholds
- browse/SKILL.md overall quality
- Regression: generated version must score >= hand-maintained baseline
Requires ANTHROPIC_API_KEY. Auto-skips without it.
Run: bun run test:eval (or ANTHROPIC_API_KEY=sk-... bun test test/skill-llm-eval.test.ts)
* chore: bump version to 0.3.3, update changelog
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: add ARCHITECTURE.md, update CLAUDE.md and CONTRIBUTING.md
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: conductor.json lifecycle hooks + .env propagation across worktrees
bin/dev-setup now copies .env from main worktree so API keys carry
over to Conductor workspaces automatically. conductor.json wires up
setup and archive hooks.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: complete CHANGELOG for v0.3.3 (architecture, conductor, .env)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
feat: v0.3.2 — project-local state, diff-aware QA, Greptile integration (#36)
* fix: cookie import picker returns JSON instead of HTML
jsonResponse() was defined at module scope but referenced `url` which
only existed as a parameter of handleCookiePickerRoute(). Every API call
crashed, the catch block also crashed, and Bun returned a default HTML
page that the frontend couldn't parse as JSON.
Thread port via corsOrigin() helper and options objects. Add route-level
tests to prevent this class of bug from shipping again.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add help command to browse server
Agents that don't have SKILL.md loaded (or misread flags) had no way to
self-discover the CLI. The help command returns a formatted reference of
all commands and snapshot flags.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: version-aware find-browse with META signal protocol
Agents in other workspaces found stale browse binaries that were missing
newer flags. find-browse now compares the local binary's git SHA against
origin/main via git ls-remote (4hr cache), and emits META:UPDATE_AVAILABLE
when behind. SKILL.md setup checks parse META signals and prompt the user
to update.
- New compiled binary: browse/dist/find-browse (TypeScript, testable)
- Bash shim at browse/bin/find-browse delegates to compiled binary
- .version file written at build time with git commit SHA
- Build script compiles both browse and find-browse binaries
- Graceful degradation: offline, missing .version, corrupt cache all skip check
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: clean up .bun-build temp files after compile
bun build --compile leaves ~58MB temp files in the working directory.
Add rm -f .*.bun-build to the build script to clean up after each build.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: make help command reachable by removing it from META_COMMANDS
help was in META_COMMANDS, so it dispatched to handleMetaCommand() which
threw "Unknown meta command: help". Removing it from the set lets the
dedicated else-if handler in handleCommand() execute correctly.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: bump version and changelog (v0.3.2)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add shared Greptile comment triage reference doc
Shared reference for fetching, filtering, and classifying Greptile
review comments on GitHub PRs. Used by both /review and /ship skills.
Includes parallel API fetching, suppressions check, classification
logic, reply APIs, and history file writes.
* feat: make /review and /ship Greptile-aware
/review: Step 2.5 fetches and classifies Greptile comments, Step 5
resolves them with AskUserQuestion for valid issues and false positives.
/ship: Step 3.75 triages Greptile comments between pre-landing review
and version bump. Adds Greptile Review section to PR body in Step 8.
Re-runs tests if any Greptile fixes are applied.
* feat: add Greptile batting average to /retro
Reads ~/.gstack/greptile-history.md, computes signal ratio
(valid catches vs false positives), includes in metrics table,
JSON snapshot, and Code Quality Signals narrative.
* docs: add Greptile integration section to README
Personal endorsement, two-layer review narrative, full UX walkthrough
transcript, skills table updates. Add Greptile training feedback loop
to TODO.md future ideas.
* feat: add local dev mode for testing skills from within the repo
bin/dev-setup creates .claude/skills/gstack symlink to the working tree
so Claude Code discovers skills locally. bin/dev-teardown cleans up.
DEVELOPING_GSTACK.md documents the workflow.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: narrow gitignore to .claude/skills/ instead of all .claude/
Avoids ignoring legitimate Claude Code config like settings.json or CLAUDE.md.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: rename DEVELOPING_GSTACK.md to CONTRIBUTING.md
Rewritten as a contributor-friendly guide instead of a dry plan doc.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: explain why dev-setup is needed in CONTRIBUTING.md quick start
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add browser interaction guidance to CLAUDE.md
Prevents Claude from using mcp__claude-in-chrome__* tools instead of /browse.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add shared config module for project-local browse state
Centralizes path resolution (git root detection, state dir, log paths) into
config.ts. Both cli.ts and server.ts import from it, eliminating duplicated
PORT_OFFSET/BROWSE_PORT/STATE_FILE logic.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: rewrite port selection to use random ports
Replace CONDUCTOR_PORT magic offset and 9400-9409 scan with random port
10000-60000. Atomic state file writes, log paths from config module,
binaryVersion field for auto-restart on update.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: move browse state from /tmp to project-local .gstack/
CLI now uses config module for state paths, passes BROWSE_STATE_FILE to
spawned server. Adds version mismatch auto-restart, legacy /tmp cleanup
with PID verification, and removes stale global install fallback.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: update crash log path reference to .gstack/
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* test: add config tests and update CLI lifecycle test
14 new tests for config resolution, ensureStateDir, readVersionHash,
resolveServerScript, and version mismatch detection. Remove obsolete
CONDUCTOR_PORT/BROWSE_PORT filtering from commands.test.ts.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: update BROWSER.md and TODO.md for project-local state
Replace /tmp paths with .gstack/, remove CONDUCTOR_PORT docs, document
random port selection and per-project isolation. Add server bundling TODO.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: update README, CHANGELOG, and CONTRIBUTING for v0.3.2
- README: replace Conductor-aware language with project-local isolation,
add Greptile setup note
- CHANGELOG: comprehensive v0.3.2 entry with all state management changes
- CONTRIBUTING: add instructions for testing branches in other repos
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add diff-aware mode to /qa — auto-tests affected pages from branch diff
When on a feature branch, /qa now reads git diff main, identifies affected
pages/routes from changed files, and tests them automatically. No URL required.
The most natural flow: write code, /ship, /qa.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: update CHANGELOG for complete v0.3.2 coverage
Add missing entries: diff-aware QA mode, Greptile integration,
local dev mode, crash log path fix, README/SKILL.md updates.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>