/plan-design-review opens your site and reviews it like a senior product designer: typography, spacing, hierarchy, color, responsive, interactions, and AI slop detection. Get letter grades (A-F) per category, a dual headline "Design Score" + "AI Slop Score", and a structured first impression that doesn't pull punches.
/qa-design-review runs the same designer's eye audit, then iteratively fixes design issues in your source code with atomic style(design): commits and before/after screenshots. CSS-safe by default, with a stricter self-regulation heuristic tuned for styling changes.
DESIGN.md baseline. Finally know how many fonts you're actually using.
design-baseline.json. Next run auto-compares: per-category grade deltas, new findings, resolved findings. Watch your design score improve over time.
{{DESIGN_METHODOLOGY}} resolver to gen-skill-docs.ts: shared design audit methodology injected into both /plan-design-review and /qa-design-review templates, following the {{QA_METHODOLOGY}} pattern.
~/.gstack-dev/plans/ as a local plans directory for long-range vision docs (not checked in). CLAUDE.md and TODOS.md updated.
/setup-design-md to TODOS.md (P2) for interactive DESIGN.md creation from scratch.
/review and /ship used to print informational findings (dead code, test gaps, N+1 queries) and then ignore them. Now every finding is acted on: obvious mechanical fixes are applied automatically, and genuinely ambiguous issues are batched into a single question instead of 8 separate prompts. You see [AUTO-FIXED] file:line Problem → what was done for each auto-fix.
review/checklist.md) so both /review and /ship stay in sync.
$B js "const x = await fetch(...); return x.status" now works. The js command used to wrap everything as an expression, so const, semicolons, and multi-line code all broke.
The js command now detects statements and uses a block wrapper, just like eval already did.
@e3 [option] "Admin" in a snapshot and runs click @e3, gstack now auto-selects that option instead of hanging on an impossible Playwright click. The right thing just happens.
<option> via CSS selector used to time out with a cryptic Playwright error. Now you get: "Use 'browse select' instead of 'click' for dropdown options."
review/checklist.md: the canonical AUTO-FIX vs ASK classification.
Fix-First Heuristic exists in checklist and is referenced by review + ship.
needsBlockWrapper() and wrapForEvaluate() helpers in read-commands.ts, shared by both js and eval commands (DRY).
getRefRole() to BrowserManager: exposes the ARIA role for ref selectors without changing the resolveRef return type.
[role=option] refs to selectOption() via parent <select>, with a DOM tagName check to avoid blocking custom listbox components.
/gstack-upgrade always checks for real. Running /gstack-upgrade directly now bypasses the cache and does a fresh check against GitHub. No more "you're already on the latest" when you're not.
last-update-check cache TTL: 60 min for UP_TO_DATE, 720 min for UPGRADE_AVAILABLE.
--force flag to bin/gstack-update-check (deletes the cache file before checking).
--force busts UP_TO_DATE cache, --force busts UPGRADE_AVAILABLE cache, 60-min TTL boundary test with utimesSync.
/document-release skill. Run it after /ship but before merging: it reads every doc file in your project, cross-references the diff, and updates README, ARCHITECTURE, CONTRIBUTING, CHANGELOG, and TODOS to match what you actually shipped. Risky changes get surfaced as questions; everything else is automatic.
_SESSIONS >= 3 conditional.
_BRANCH detection to preamble bash block (git branch --show-current with fallback).
$B js "await fetch(...)" now just works. Any await expression in $B js or $B eval is automatically wrapped in an async context. No more SyntaxError: await is only valid in async functions.
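As a rough illustration of the wrapping rules in these entries (statements get a block wrapper, await triggers an async context, and comments are stripped before looking for await), here is a TypeScript sketch. The names needsBlockWrapper, hasAwait, and wrapForEvaluate appear in this changelog, but the bodies below are assumed regex heuristics, not the code in read-commands.ts; stripComments is a hypothetical helper.

```typescript
// Hypothetical sketch; not gstack's read-commands.ts.

/** Remove block and line comments (naive: does not understand string literals). */
function stripComments(code: string): string {
  return code.replace(/\/\*[\s\S]*?\*\//g, "").replace(/\/\/.*$/gm, "");
}

/** True if `await` appears as a whole word outside comments. */
function hasAwait(code: string): boolean {
  return /\bawait\b/.test(stripComments(code));
}

/** True if the code looks like statements rather than a single expression. */
function needsBlockWrapper(code: string): boolean {
  const body = stripComments(code).trim();
  if (body.includes("\n")) return true; // multi-line: treat as statements
  if (/;\s*\S/.test(body)) return true; // several statements on one line
  // A leading statement keyword cannot live inside an expression wrapper.
  return /^(const|let|var|if|for|while|return|throw|try|function|class)\b/.test(body);
}

/** Build a source string that evaluates the user's code and yields its result. */
function wrapForEvaluate(code: string): string {
  const asyncKw = hasAwait(code) ? "async " : "";
  if (needsBlockWrapper(code)) {
    // Block wrapper: the code must use an explicit `return` to produce a value.
    return `(${asyncKw}() => {\n${code}\n})()`;
  }
  // Expression wrapper: the expression's value is returned directly.
  // Newlines keep a trailing // comment from swallowing the closing parens.
  const expr = code.trim().replace(/;$/, "");
  return `(${asyncKw}() => (\n${expr}\n))()`;
}

// Usage: evaluate locally to see what each wrapper yields.
console.log(eval(wrapForEvaluate("1 + 2")));
console.log(eval(wrapForEvaluate("const x = 2; return x * 3")));
```

The output string is meant for something like Playwright's page.evaluate(), which accepts a string and awaits a returned Promise. A production version would need a real tokenizer: this sketch misreads // inside string literals and does not handle a comment placed after a trailing semicolon.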
Single-line eval files return values directly; multi-line files use explicit return.
/ship, /review, /qa, and /plan-ceo-review detect which branch your PR actually targets instead of assuming main. Stacked branches, Conductor workspaces targeting feature branches, and repos using master all just work now.
/retro works on any default branch. Repos using master, develop, or other default branch names are detected automatically; no more empty retros caused by the wrong branch name.
{{BASE_BRANCH_DETECT}} placeholder for skill authors: drop it into any template and get 3-step branch detection (PR base → repo default → fallback) for free.
hasAwait() helper with comment-stripping to avoid false positives on // await in eval files.
(...), multi-line → block {...} with explicit return.
.tmpl files for git commands with hardcoded main.
REPORT_DIR shell variable, simplified port detection to prose.
gstack-config set gstack_contributor true) and gstack automatically writes up what went wrong: what you were doing, what broke, and repro steps. Next time something annoys you, the bug report is already written. Fork gstack and fix it yourself.
{{UPDATE_CHECK}} to {{PREAMBLE}} across all 11 skill templates: one startup block now handles update check, session tracking, contributor mode, and question formatting.
/qa-only): report-only QA mode that finds and documents bugs without making fixes. Hand off a clean bug report to your team without the agent touching your code.
/qa now runs a find-fix-verify cycle: discover bugs, fix them, commit, re-navigate to confirm the fix took. One command to go from broken to shipped.
/plan-eng-review writes test-plan artifacts that /qa picks up automatically. Your engineering review now feeds directly into QA testing with no manual copy-paste.
{{QA_METHODOLOGY}} DRY placeholder: a shared QA methodology block injected into both /qa and /qa-only templates.
The shared block keeps /qa and /qa-only in sync when you update testing standards.
generateCommentary() engine: interprets comparison deltas so you don't have to. It flags regressions, notes improvements, and produces an overall efficiency summary.
bun run eval:list now shows Turns and Duration per run. Spot expensive or slow runs instantly.
bun run eval:summary shows average turns/duration/cost per test across runs. Identify which tests are costing you the most over time.
judgePassed() unit tests: extracted and tested the pass/fail judgment logic.
resolveRef() now checks element count to detect stale refs after page mutations. SPA navigation no longer causes 30-second timeouts on missing elements.
formatComparison() now shows per-test turns and duration deltas alongside cost.
printSummary() shows turns and duration columns.
eval-store.test.ts: fixed a pre-existing _partial file assertion bug.
bin/gstack-config CLI: a simple get/set/list interface for ~/.gstack/config.yaml. Used by update-check and the upgrade skill for persistent settings (auto_upgrade, update_check).
update_check: false config option to disable checks entirely. Snooze resets when a new version is released.
auto_upgrade: true in config or GSTACK_AUTO_UPGRADE=1 env var to skip the upgrade prompt and update automatically.
/gstack-upgrade now detects and updates local vendored copies in the current project after upgrading the primary install.
/gstack-upgrade instead of long paste commands.
Write tool permission for config editing.
TODO.md (roadmap) and TODOS.md (near-term) into one file organized by skill/component with P0-P4 priority ordering and a Completed section.
/ship Step 5.5: TODOS.md management. Auto-detects completed items from the diff, marks them done with version annotations, and offers to create or reorganize TODOS.md if it is missing or unstructured.
/plan-ceo-review, /plan-eng-review, /retro, /review, and /qa now read TODOS.md for project context.
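The config rules above can be sketched in TypeScript, assuming ~/.gstack/config.yaml holds flat key: value lines. All function names here are hypothetical, and the sketch does not claim to match how bin/gstack-config actually parses the file; it only encodes the rules these entries state (update_check: false disables checks; auto_upgrade: true or GSTACK_AUTO_UPGRADE=1 skips the prompt).

```typescript
// Hypothetical sketch of flat "key: value" config handling; not bin/gstack-config.

type Config = Map<string, string>;

/** Parse lines such as "auto_upgrade: true", skipping blanks and # comments. */
function parseConfig(text: string): Config {
  const config: Config = new Map();
  for (const raw of text.split("\n")) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue;
    const idx = line.indexOf(":");
    if (idx === -1) continue; // not a key: value line
    config.set(line.slice(0, idx).trim(), line.slice(idx + 1).trim());
  }
  return config;
}

/** Set (or overwrite) a key and serialize the whole config back to text. */
function setConfig(config: Config, key: string, value: string): string {
  config.set(key, value);
  return Array.from(config, ([k, v]) => `${k}: ${v}`).join("\n") + "\n";
}

/** update_check: false turns update checks off entirely. */
function updateChecksEnabled(config: Config): boolean {
  return config.get("update_check") !== "false";
}

/** Either auto_upgrade: true in the file or GSTACK_AUTO_UPGRADE=1 skips the prompt. */
function shouldAutoUpgrade(
  config: Config,
  env: Record<string, string | undefined>,
): boolean {
  return env.GSTACK_AUTO_UPGRADE === "1" || config.get("auto_upgrade") === "true";
}

// Usage with the real process environment:
const cfg = parseConfig("auto_upgrade: false\nupdate_check: true\n");
console.log(updateChecksEnabled(cfg), shouldAutoUpgrade(cfg, process.env));
```

Checking the environment variable first lets a single run opt into auto-upgrade without editing the file.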
/retro adds a Backlog Health metric (open counts, P0/P1 items, churn).
review/TODOS-format.md: the canonical TODO item format referenced by /ship and /plan-ceo-review to prevent format drift (DRY).
greptile-triage.md for fixes (inline diff), already-fixed (what was done), and false positives (evidence + suggested re-rank). Replaces vague one-line replies.
**Suggested re-rank:** when Greptile miscategorizes issue severity.
TODOS-format.md references across skills.
.gitignore append failures silently swallowed: ensureStateDir() bare catch {} replaced with ENOENT-only silence; non-ENOENT errors (EACCES, ENOSPC) logged to .gstack/browse-server.log.
TODO.md deleted; all items merged into TODOS.md.
/ship Step 3.75 and /review Step 5 now reference reply templates and escalation detection from greptile-triage.md.
/ship Step 6 commit ordering includes TODOS.md in the final commit alongside VERSION + CHANGELOG.
/ship Step 8 PR body includes a TODOS section.
screenshot command now supports element crop via CSS selector or @ref (screenshot "#hero" out.png, screenshot @e3 out.png), region clip (screenshot --clip x,y,w,h out.png), and viewport-only mode (screenshot --viewport out.png). Uses Playwright's native locator.screenshot() and page.screenshot({ clip }). Full page remains the default.
~/.gstack-dev/e2e-live.json), per-run log directory (~/.gstack-dev/e2e-runs/{runId}/), progress.log, per-test NDJSON transcripts, persistent failure transcripts. All I/O non-fatal.
bun run eval:watch: a live terminal dashboard that reads the heartbeat + partial eval file every 1s. Shows completed tests, the current test with turn/tool info, stale detection (>10min), and --tail for progress.log.
savePartial() writes _partial-e2e.json after each test completes. Crash-resilient: partial results survive killed runs. Never cleaned up.
exit_reason, timeout_at_turn, last_tool_call fields in eval JSON.
These fields enable jq queries for automated fix loops.
is_error detection: claude -p can return subtype: "success" with is_error: true on API failures. Now correctly classified as error_api.
parseNDJSON() pure function for real-time E2E progress from claude -p --output-format stream-json --verbose.
~/.gstack-dev/evals/ with auto-comparison against the previous run.
eval:list, eval:compare, eval:summary for inspecting eval history.
.tmpl templates: plan-ceo-review, plan-eng-review, retro, review, and ship now use the {{UPDATE_CHECK}} placeholder, a single source of truth for the update check preamble.
claude -p (~$3.85/run), Tier 3: LLM-as-judge (~$0.15/run). Gated by EVALS=1.
test/helpers/skill-parser.ts: getRemoteSlug() for git remote detection.
find-browse indirection with explicit browse/dist/browse path in SKILL.md setup blocks.
|| true to prevent non-zero exit when no update is available.
{{BROWSE_SETUP}} placeholder.
{{UPDATE_CHECK}} and {{BROWSE_SETUP}} placeholders in gen-skill-docs.ts. All browse-using skills generate from a single source of truth.
generateHelpText() auto-generated from COMMAND_DESCRIPTIONS (replaces hand-maintained help text).
.tmpl files with {{COMMAND_REFERENCE}} and {{SNAPSHOT_FLAGS}} placeholders, auto-generated from source code at build time. Structurally prevents command drift between docs and code.
browse/src/commands.ts): single source of truth for all browse commands with categories and enriched descriptions. Zero side effects, safe to import from build scripts and tests.
SNAPSHOT_FLAGS array in browse/src/snapshot.ts): a metadata-driven parser replaces the hand-coded switch/case.
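A TypeScript sketch of what "metadata-driven" can mean for snapshot flags: one array describes every flag, and both the parser and the help text iterate over it instead of a switch/case. The names SNAPSHOT_FLAGS and parseSnapshotArgs come from this changelog; the FlagSpec fields, the three example flags, and snapshotFlagsHelp are hypothetical and are not the contents of browse/src/snapshot.ts.

```typescript
// Hypothetical flag metadata; field names and flags are illustrative only.
interface FlagSpec {
  short: string;       // flag as typed, e.g. "-i"
  key: string;         // property set on the parsed options
  takesValue: boolean; // whether the next argv entry is the flag's value
  description: string; // reused when generating docs and help text
}

const SNAPSHOT_FLAGS: FlagSpec[] = [
  { short: "-i", key: "interactive", takesValue: false, description: "Interactive elements only" },
  { short: "-c", key: "compact", takesValue: false, description: "Compact output" },
  { short: "-d", key: "depth", takesValue: true, description: "Limit tree depth" },
];

type SnapshotOptions = Record<string, string | boolean>;

/** Parse argv by looking each flag up in SNAPSHOT_FLAGS; no per-flag switch/case. */
function parseSnapshotArgs(args: string[]): SnapshotOptions {
  const opts: SnapshotOptions = {};
  for (let i = 0; i < args.length; i++) {
    const spec = SNAPSHOT_FLAGS.find((f) => f.short === args[i]);
    if (!spec) throw new Error(`Unknown snapshot flag: ${args[i]}`);
    if (spec.takesValue) {
      const value = args[++i];
      if (value === undefined) throw new Error(`${spec.short} requires a value`);
      opts[spec.key] = value;
    } else {
      opts[spec.key] = true;
    }
  }
  return opts;
}

/** Generate help text from the same metadata the parser uses. */
function snapshotFlagsHelp(): string {
  return SNAPSHOT_FLAGS.map(
    (f) => `${f.short}${f.takesValue ? " <value>" : ""}  ${f.description}`,
  ).join("\n");
}

console.log(parseSnapshotArgs(["-i", "-d", "3"]));
console.log(snapshotFlagsHelp());
```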
Adding a flag in one place updates the parser, docs, and tests.
$B commands from SKILL.md code blocks, validates against command registry and snapshot flag metadata
SKILL_E2E=1 env var (~$0.50/run)
ANTHROPIC_API_KEY
bun run skill:check: health dashboard showing all skills, command counts, validation status, and template freshness
bun run dev:skill: watch mode that regenerates and validates SKILL.md on every template or source file change
.github/workflows/skill-docs.yml): runs gen:skill-docs on push/PR and fails if generated output differs from committed files
bun run gen:skill-docs script for manual regeneration
bun run test:eval for LLM-as-judge evals
test/helpers/skill-parser.ts: extracts and validates $B commands from Markdown
test/helpers/session-runner.ts: Agent SDK wrapper with error pattern scanning and transcript saving
conductor.json): lifecycle hooks for workspace setup/teardown
.env propagation: bin/dev-setup copies .env from the main worktree into Conductor workspaces automatically
.env.example template for API key configuration
gen:skill-docs before compiling binaries
parseSnapshotArgs is metadata-driven (iterates SNAPSHOT_FLAGS instead of switch/case)
server.ts imports command sets from commands.ts instead of declaring inline
.tmpl instead)
jsonResponse() referenced url out of scope, crashing every API call
help command routed correctly (was unreachable due to META_COMMANDS dispatch ordering)
~/.claude/skills/gstack fallback from resolveServerScript()
/tmp/ to .gstack/
/qa on a feature branch auto-analyzes the git diff, identifies affected pages/routes, detects the running app on localhost, and tests only what changed. No URL needed.
.gstack/ inside the project root (detected via git rev-parse --show-toplevel).
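The state layout described in this release can be sketched in TypeScript with Node's fs APIs: the project root comes from git rev-parse --show-toplevel, files live under .gstack/, and state is written atomically (.json.tmp, then rename). The fallback to the current directory outside a git repo and the names findProjectRoot, resolveStatePaths, writeStateAtomic, and readState are assumptions, not browse/src/config.ts.

```typescript
import { execFileSync } from "node:child_process";
import { existsSync, mkdirSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { dirname, join } from "node:path";

/** Project root via git; falls back to cwd outside a repo (assumed behavior). */
function findProjectRoot(cwd: string = process.cwd()): string {
  try {
    return execFileSync("git", ["rev-parse", "--show-toplevel"], {
      cwd,
      encoding: "utf8",
      stdio: ["ignore", "pipe", "ignore"],
    }).trim();
  } catch {
    return cwd;
  }
}

interface StatePaths {
  stateDir: string;
  stateFile: string;
  consoleLog: string;
  networkLog: string;
  dialogLog: string;
}

/** All state lives under <root>/.gstack/ instead of /tmp. */
function resolveStatePaths(root: string): StatePaths {
  const stateDir = join(root, ".gstack");
  return {
    stateDir,
    stateFile: join(stateDir, "browse.json"),
    consoleLog: join(stateDir, "browse-console.log"),
    networkLog: join(stateDir, "browse-network.log"),
    dialogLog: join(stateDir, "browse-dialog.log"),
  };
}

/** Write JSON to <file>.tmp, then rename it over the target. */
function writeStateAtomic(stateFile: string, data: unknown): void {
  mkdirSync(dirname(stateFile), { recursive: true });
  const tmp = `${stateFile}.tmp`;
  writeFileSync(tmp, JSON.stringify(data, null, 2));
  renameSync(tmp, stateFile);
}

/** Read state, or null if nothing has been written yet. */
function readState(stateFile: string): unknown {
  if (!existsSync(stateFile)) return null;
  return JSON.parse(readFileSync(stateFile, "utf8"));
}

console.log(resolveStatePaths(findProjectRoot()).stateFile);
```

The temp file sits next to the target because rename replaces a file atomically only when both paths are on the same filesystem (POSIX semantics). A concurrent reader therefore sees either the old state or the new one, never a half-written file.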
No more /tmp state files.
browse/src/config.ts): centralizes path resolution for the CLI and server and eliminates duplicated port/state logic
binaryVersion SHA; CLI auto-restarts the server when the binary is rebuilt
/tmp/browse-server*.json files, verifying PID ownership before sending signals
/review and /ship fetch and triage Greptile bot comments; /retro tracks Greptile batting average across weeks
bin/dev-setup symlinks skills from the repo for in-place development; bin/dev-teardown restores the global install
help command: agents can self-discover all commands and snapshot flags
find-browse with META signal protocol: detects stale binaries and prompts agents to update
browse/dist/find-browse compiled binary with git SHA comparison against origin/main (4hr cached)
.version file written at build time for binary version tracking
.gstack/browse.json (was /tmp/browse-server.json)
.gstack/browse-{console,network,dialog}.log (was /tmp/browse-*.log)
.json.tmp → rename (prevents partial reads)
BROWSE_STATE_FILE to spawned server (server derives all paths from it)
META:UPDATE_AVAILABLE
/qa SKILL.md now describes four modes (diff-aware, full, quick, regression) with diff-aware as the default on feature branches
jsonResponse/errorResponse use options objects to prevent positional parameter confusion
browse and find-browse binaries, cleans up .bun-build temp files
CONDUCTOR_PORT magic offset (browse_port = CONDUCTOR_PORT - 45600)
~/.claude/skills/gstack/browse/src/server.ts
DEVELOPING_GSTACK.md (renamed to CONTRIBUTING.md)
cookie-import-browser command: decrypt and import cookies from real Chromium browsers (Comet, Chrome, Arc, Brave, Edge)
--domain flag for non-interactive use
/setup-browser-cookies skill for Claude Code integration
/qa skill with 6-phase workflow (Initialize, Authenticate, Orient, Explore, Document, Wrap up)
browse/bin/find-browse: DRY binary discovery using git rev-parse --show-toplevel
upload <sel> <file1> [file2...]
is visible|hidden|enabled|disabled|checked|editable|focused
<sel>
snapshot -a)
snapshot -D)
snapshot -C)
wait --networkidle / --load / --domcontentloaded flags
console --errors filter (error + warning only)
cookie-import <json-file> with auto-fill domain from page URL
/browse installs: compiled binary now resolves server.ts from its own directory instead of assuming a global install exists
setup rebuilds stale binaries (not just missing ones) and exits non-zero if the build fails
chain command swallowing real errors from write commands (e.g. navigation timeout reported as "Unknown meta command")
ln -snf in setup to avoid creating nested symlinks on upgrade
git fetch && git reset --hard instead of git pull for upgrades (handles force-pushes)
/retro)
Initial release.
/plan-ceo-review, /plan-eng-review, /review, /ship, /browse
setup script for binary compilation and skill symlinking