# gstack development

## Commands

```bash
bun install              # install dependencies
bun test                 # run free tests (browse + snapshot + skill validation)
bun run test:evals       # run paid evals: LLM judge + Agent SDK E2E (~$4/run)
bun run dev              # run CLI in dev mode, e.g. bun run dev goto https://example.com
bun run build            # gen docs + compile binaries
bun run gen:skill-docs   # regenerate SKILL.md files from templates
bun run skill:check      # health dashboard for all skills
bun run dev:skill        # watch mode: auto-regen + validate on change
```

`test:evals` requires `ANTHROPIC_API_KEY` and must be run from a plain terminal, not inside Claude Code (nested Agent SDK sessions hang).

## Project structure

```
gstack/
├── browse/                        # Headless browser CLI (Playwright)
│   ├── src/                       # CLI + server + commands
│   │   ├── commands.ts            # Command registry (single source of truth)
│   │   └── snapshot.ts            # SNAPSHOT_FLAGS metadata array
│   ├── test/                      # Integration tests + fixtures
│   └── dist/                      # Compiled binary
├── scripts/                       # Build + DX tooling
│   ├── gen-skill-docs.ts          # Template → SKILL.md generator
│   ├── skill-check.ts             # Health dashboard
│   └── dev-skill.ts               # Watch mode
├── test/                          # Skill validation + eval tests
│   ├── helpers/                   # skill-parser.ts, session-runner.ts, llm-judge.ts
│   ├── fixtures/                  # Ground truth JSON, planted-bug fixtures, eval baselines
│   ├── skill-validation.test.ts   # Tier 1: static validation (free, <1s)
│   ├── gen-skill-docs.test.ts     # Tier 1: generator quality (free, <1s)
│   ├── skill-llm-eval.test.ts     # Tier 3: LLM-as-judge (~$0.15/run)
│   └── skill-e2e.test.ts          # Tier 2: Agent SDK E2E (~$3.85/run)
├── ship/                          # Ship workflow skill
├── review/                        # PR review skill
├── plan-ceo-review/               # /plan-ceo-review skill
├── plan-eng-review/               # /plan-eng-review skill
├── retro/                         # Retrospective skill
├── setup                          # One-time setup: build binary + symlink skills
├── SKILL.md                       # Generated from SKILL.md.tmpl (don't edit directly)
├── SKILL.md.tmpl                  # Template: edit this, run gen:skill-docs
└── package.json                   # Build scripts for browse
```
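The tree marks `browse/src/commands.ts` as the single source of truth and `SKILL.md` as generated from `SKILL.md.tmpl`. A minimal sketch of why that pairing keeps docs from drifting, assuming a hypothetical registry shape: `CommandSpec`, `COMMANDS`, `renderCommandTable`, `renderTemplate`, and the `{{COMMAND_REFERENCE}}` placeholder are all illustrative names, not the real contents of `commands.ts` or `gen-skill-docs.ts`.

```typescript
// Hypothetical shape: the real browse/src/commands.ts may look different.
interface CommandSpec {
  name: string;        // command name as typed on the CLI
  usage: string;       // argument synopsis shown in docs
  description: string; // one-line summary
}

// Illustrative registry. Only `goto` appears in this README; the second
// entry is a hypothetical placeholder.
const COMMANDS: readonly CommandSpec[] = [
  { name: "goto", usage: "<url>", description: "Navigate to a URL" },
  { name: "example", usage: "[arg]", description: "Placeholder entry (hypothetical)" },
];

// Render the registry as a markdown table, the kind of block a generator
// could splice into SKILL.md so the docs are always derived from the code.
function renderCommandTable(commands: readonly CommandSpec[]): string {
  const header = "| Command | Usage | Description |\n| --- | --- | --- |";
  const rows = commands.map(
    (c) => `| \`${c.name}\` | \`${c.usage}\` | ${c.description} |`,
  );
  return [header, ...rows].join("\n");
}

// Expand every occurrence of a (hypothetical) placeholder in a template.
const PLACEHOLDER = "{{COMMAND_REFERENCE}}";

function renderTemplate(template: string, commands: readonly CommandSpec[]): string {
  return template.split(PLACEHOLDER).join(renderCommandTable(commands));
}
```

With this split, adding a command means touching only the registry; rerunning the generator updates every template that references the placeholder.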
## SKILL.md workflow

SKILL.md files are **generated** from `.tmpl` templates. To update docs:

1. Edit the `.tmpl` file (e.g. `SKILL.md.tmpl` or `browse/SKILL.md.tmpl`)
2. Run `bun run gen:skill-docs` (or `bun run build`, which does it automatically)
3. Commit both the `.tmpl` and generated `.md` files

To add a new browse command: add it to `browse/src/commands.ts` and rebuild.

To add a snapshot flag: add it to `SNAPSHOT_FLAGS` in `browse/src/snapshot.ts` and rebuild.

## Browser interaction

When you need to interact with a browser (QA, dogfooding, cookie setup), use the `/browse` skill or run the browse binary directly via `$B`. NEVER use `mcp__claude-in-chrome__*` tools: they are slow, unreliable, and not what this project uses.

## Deploying to the active skill

The active skill lives at `~/.claude/skills/gstack/`. After making changes:

1. Push your branch
2. Fetch and reset in the skill directory: `cd ~/.claude/skills/gstack && git fetch origin && git reset --hard origin/main`
3. Rebuild: `cd ~/.claude/skills/gstack && bun run build`

Or copy the binary directly: `cp browse/dist/browse ~/.claude/skills/gstack/browse/dist/browse`
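Step 3 of the SKILL.md workflow only holds if the committed `.md` is exactly what the generator would produce from its `.tmpl`. A minimal sketch of that invariant as a check, assuming the generator can be modeled as a pure `render(template)` function; `DocPair` and `findStaleDocs` are hypothetical names, and this is not necessarily what `skill:check` does.

```typescript
// Hypothetical pairing of a template with its committed generated file.
interface DocPair {
  mdPath: string;    // generated file, e.g. "SKILL.md"
  template: string;  // current contents of the matching .tmpl file
  committed: string; // current contents of the generated file
}

// Return the generated files whose committed contents differ from a fresh
// render of their template: either the .tmpl was edited without
// regenerating, or the .md was edited by hand.
function findStaleDocs(
  pairs: readonly DocPair[],
  render: (template: string) => string,
): string[] {
  return pairs
    .filter((p) => render(p.template) !== p.committed)
    .map((p) => p.mdPath);
}
```

Because the check compares whole strings, any hand edit to a generated file shows up as stale, which matches the "don't edit directly" rule for `SKILL.md`.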