~cytrogen/gstack

ref: 942df42161f1d73709bc27af636ddc7112f9016f gstack/CLAUDE.md -rw-r--r-- 3.5 KiB
942df421 — Garry Tan simplify: one command for evals — bun run test:evals a month ago

#gstack development

#Commands

bun install          # install dependencies
bun test             # run free tests (browse + snapshot + skill validation)
bun run test:evals   # run paid evals: LLM judge + Agent SDK E2E (~$4/run)
bun run dev <cmd>    # run CLI in dev mode, e.g. bun run dev goto https://example.com
bun run build        # gen docs + compile binaries
bun run gen:skill-docs  # regenerate SKILL.md files from templates
bun run skill:check  # health dashboard for all skills
bun run dev:skill    # watch mode: auto-regen + validate on change

test:evals requires ANTHROPIC_API_KEY and must be run from a plain terminal (not inside Claude Code — nested Agent SDK sessions hang).

#Project structure

gstack/
├── browse/          # Headless browser CLI (Playwright)
│   ├── src/         # CLI + server + commands
│   │   ├── commands.ts  # Command registry (single source of truth)
│   │   └── snapshot.ts  # SNAPSHOT_FLAGS metadata array
│   ├── test/        # Integration tests + fixtures
│   └── dist/        # Compiled binary
├── scripts/         # Build + DX tooling
│   ├── gen-skill-docs.ts  # Template → SKILL.md generator
│   ├── skill-check.ts     # Health dashboard
│   └── dev-skill.ts       # Watch mode
├── test/            # Skill validation + eval tests
│   ├── helpers/     # skill-parser.ts, session-runner.ts, llm-judge.ts
│   ├── fixtures/    # Ground truth JSON, planted-bug fixtures, eval baselines
│   ├── skill-validation.test.ts  # Tier 1: static validation (free, <1s)
│   ├── gen-skill-docs.test.ts    # Tier 1: generator quality (free, <1s)
│   ├── skill-llm-eval.test.ts   # Tier 3: LLM-as-judge (~$0.15/run)
│   └── skill-e2e.test.ts         # Tier 2: Agent SDK E2E (~$3.85/run)
├── ship/            # Ship workflow skill
├── review/          # PR review skill
├── plan-ceo-review/ # /plan-ceo-review skill
├── plan-eng-review/ # /plan-eng-review skill
├── retro/           # Retrospective skill
├── setup            # One-time setup: build binary + symlink skills
├── SKILL.md         # Generated from SKILL.md.tmpl (don't edit directly)
├── SKILL.md.tmpl    # Template: edit this, run gen:skill-docs
└── package.json     # Build scripts for browse

#SKILL.md workflow

SKILL.md files are generated from .tmpl templates. To update docs:

  1. Edit the .tmpl file (e.g. SKILL.md.tmpl or browse/SKILL.md.tmpl)
  2. Run bun run gen:skill-docs (or bun run build which does it automatically)
  3. Commit both the .tmpl and generated .md files

To add a new browse command: add it to browse/src/commands.ts and rebuild. To add a snapshot flag: add it to SNAPSHOT_FLAGS in browse/src/snapshot.ts and rebuild.

#Browser interaction

When you need to interact with a browser (QA, dogfooding, cookie setup), use the /browse skill or run the browse binary directly via $B <command>. NEVER use mcp__claude-in-chrome__* tools — they are slow, unreliable, and not what this project uses.

#Deploying to the active skill

The active skill lives at ~/.claude/skills/gstack/. After making changes:

  1. Push your branch
  2. Fetch and reset in the skill directory: cd ~/.claude/skills/gstack && git fetch origin && git reset --hard origin/main
  3. Rebuild: cd ~/.claude/skills/gstack && bun run build

Or copy the binary directly: cp browse/dist/browse ~/.claude/skills/gstack/browse/dist/browse