feat: await support in browse js/eval + contributor mode v2 (#104)
* feat: support await in $B js and eval commands
Auto-wrap await expressions in async IIFE context so
$B js "await fetch(...)" works without SyntaxError.
- hasAwait() strips comments before detection
- js: expression wrapping (async()=>(expr))()
- eval: smart wrapping — single-line=expression, multi-line=block
- 6 new unit tests covering async, false-positive, and return semantics
* feat: redesign contributor mode — periodic reflection with 0-10 rating
Replace passive "report when things break" with active reflection:
- Rate gstack experience 0-10 at workflow step boundaries
- Historical calibration example (await bug) anchors the reporting bar
- "What would make this a 10" field focuses on actionable improvements
- Removed category lists in favor of judgment-based assessment
* test: add deterministic contributor mode preamble validation
40 new skill-validation tests (4 checks × 10 skills) verify:
- 0-10 rating scale present
- Calibration example present
- "What would make this a 10" field present
- Periodic reflection (not per-command)
Update existing E2E contributor eval for new report format.
* chore: bump version and changelog (v0.4.2)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: improve contributor mode + qa-quick E2E reliability
Contributor mode:
- Add "do not truncate" directive to template — agent was stopping
after "My rating" without completing Steps/Raw output/What would
make this a 10 sections
- Restore assertions for Steps to reproduce and Date footer
QA quick:
- Make test server URL prominent: top of prompt, explicit "already
running" and "do NOT discover ports" instructions
- Bump session timeout 180s→240s and test timeout 240s→300s
- Set B= at top of prompt (was buried in prose)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: use flexible assertions for contributor mode E2E
Agent writes thorough reports with creative section names
("Repro Steps" vs "Steps to reproduce"). Match intent not formatting:
- /repro|steps to reproduce/ for reproduction steps
- /date.*2026/ for date footer presence
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: add E2E eval failure blame protocol
"Not related to our changes" is an extraordinary claim that requires
extraordinary proof. When evals fail during /ship:
1. Run the same eval on main — prove it fails there too
2. If it passes on main, it IS your change — trace the blame
3. If you can't verify, say "unverified" not "pre-existing"
Added to CLAUDE.md and as a comment in skill-e2e.test.ts.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: update CONTRIBUTING.md and BROWSER.md for v0.4.2
CONTRIBUTING.md: update contributor mode description — now describes
periodic 0-10 reflection loop instead of passive friction detection.
BROWSER.md: add js/eval async documentation — await expressions are
auto-wrapped in async context, single-line eval returns values directly.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: restore v0.4.2 changelog entries lost during cherry-pick conflict
The base branch detection entries from main were dropped when resolving
the CHANGELOG conflict — should have merged both sets, not replaced.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
fix: dynamic base branch detection across all SKILL templates (v0.3.10) (#81)
* feat: add {{BASE_BRANCH_DETECT}} resolver to gen-skill-docs
DRY placeholder for dynamic base branch detection across PR-targeting
skills. Detects via gh pr view (existing PR base) → gh repo view
(repo default) → fallback to main.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: ship skill detects base branch instead of hardcoding main
Replaces ~14 hardcoded 'main' references with dynamic detection via
{{BASE_BRANCH_DETECT}}. Fixes stacked branches and Conductor workspaces
targeting non-main branches. Adds --base <base> to gh pr create.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: review, qa, plan-ceo-review detect base branch dynamically
Same pattern as ship: replaces hardcoded 'main' with {{BASE_BRANCH_DETECT}}.
Also cleans up qa bash-isms (REPORT_DIR variable, port chaining).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: retro detects default branch instead of hardcoding origin/main
Retro queries commit history (not PR targets), so uses simpler detection:
gh repo view defaultBranchRef. Replaces ~11 origin/main refs with
origin/<default>.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: add explicit cross-step references in gstack-upgrade template
Bash blocks are self-contained, but cross-block variable references
(INSTALL_DIR from Step 2) were implicit. Adds prose making them explicit.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs+test: SKILL authoring guidance + regression tests
Adds "Writing SKILL templates" section to CLAUDE.md explaining that
templates are prompts, not scripts. Adds validation test catching
hardcoded 'main' in git commands, and resolver content test.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: update ARCHITECTURE + CONTRIBUTING for new placeholders
Add {{BASE_BRANCH_DETECT}} to ARCHITECTURE.md placeholder list.
Cross-reference CLAUDE.md template authoring guidance from CONTRIBUTING.md.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v0.3.10)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: add missing blank line between resolver functions
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: add 3 E2E smoke tests for base branch detection
- /review: verifies Step 0 detection + git diff against detected base
- /ship: truncated dry-run (Steps 0-1 only, no push/PR), asserts no
destructive actions
- /retro: verifies default branch detection for git log queries
Covers the {{BASE_BRANCH_DETECT}} resolver path (review), the ship
template's dual abort check, and retro's inline detection pattern.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v0.4.2)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
feat: contributor mode, session awareness, recommendation format (#90)
* feat: contributor mode, session awareness, universal RECOMMENDATION format
- Rename {{UPDATE_CHECK}} → {{PREAMBLE}} across all 10 skill templates
- Add session tracking (touch ~/.gstack/sessions/$PPID, count active sessions)
- ELI16 mode when 3+ concurrent sessions detected (re-ground user on context)
- Contributor mode: auto-file field reports to ~/.gstack/contributor-logs/
- Universal AskUserQuestion format: context → question → RECOMMENDATION → options
- Update plan-ceo-review and plan-eng-review to reference preamble baseline
- Add vendored symlink awareness section to CLAUDE.md
- Rewrite CONTRIBUTING.md with contributor workflow and cross-project testing
- Add tests for contributor mode and session awareness in generated output
- Add E2E eval for contributor mode report filing
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add Enum & Value Completeness to /review critical checklist
New CRITICAL review category that traces new enum values, status strings,
and type constants through every consumer outside the diff. Catches the
class of bugs where a new value is added but not handled in all switch/case
chains, allowlists, or frontend-backend contracts.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: bump v0.4.1, user-facing changelog, update qa-only template and architecture docs
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: add CHANGELOG style guide — user-facing, sell the feature
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: rewrite v0.4.1 changelog to be user-facing and sell the features
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add evals for RECOMMENDATION format, session awareness, and enum completeness
Free tests (Tier 1): RECOMMENDATION format + session awareness in all
preamble SKILL.md files, enum completeness checklist structure and CRITICAL
classification.
E2E eval: /review catches missed enum handlers when a new status value
is added but not handled in case/switch and notify methods.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add E2E eval for session awareness ELI16 mode
Stubs _SESSIONS=4, gives agent a decision point on feature/add-payments
branch, verifies the output re-grounds the user with project, branch,
context, and RECOMMENDATION — the ELI16 mode behavior for 3+ sessions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: contributor mode eval marked FAIL due to expected browse error
The test intentionally runs a nonexistent binary to trigger contributor
mode. The session runner's browse error detection catches "no such file
or directory...browse" and sets browseErrors, causing recordE2E to mark
passed=false. Override passed to check only exitReason since the browse
error is the expected scenario.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
feat: smart update check with auto-upgrade, snooze backoff, config CLI (v0.3.9) (#62)
* feat: add bin/gstack-config CLI for reading/writing ~/.gstack/config.yaml
Simple get/set/list interface for persistent gstack configuration.
Used by update-check and upgrade skill for auto_upgrade and update_check settings.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: smart update check with 12h cache, snooze backoff, config disable
- Reduce cache TTL from 24h to 12h for faster update detection
- Add exponential snooze backoff: 24h → 48h → 1 week (resets on new version)
- Add update_check: false config option to disable checks entirely
- Clear snooze file on just-upgraded
- 14 new tests covering snooze levels, expiry, corruption, and config paths
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: upgrade skill with auto-upgrade, 4-option prompt, vendored sync
- Auto-upgrade mode via config or GSTACK_AUTO_UPGRADE=1 env var
- 4-option AskUserQuestion: upgrade once, always, not now, never
- Step 4.5: sync local vendored copy after upgrading primary install
- Snooze write with escalating backoff on "Not now"
- Update preamble text in gen-skill-docs for new upgrade flow
- Regenerate all SKILL.md files
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: simplify upgrade instructions, move auto-upgrade to completed
README now points to /gstack-upgrade instead of long paste commands.
Auto-upgrade TODO moved to Completed section (v0.3.8).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: bump version and changelog (v0.3.9)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
feat: TODOS-aware skills, 2-tier Greptile replies, gitignore fix (#61)
* fix: log non-ENOENT errors in ensureStateDir() instead of silently swallowing
Replace bare catch {} with ENOENT-only silence. Non-ENOENT errors (EACCES,
ENOSPC) are now logged to .gstack/browse-server.log. Includes test for
permission-denied scenario with chmod 444.
* feat: merge TODO.md + TODOS.md into unified backlog with shared format reference
Merge TODO.md (roadmap) and TODOS.md (near-term) into one file organized by
skill/component with P0-P4 priority ordering and Completed section. Add shared
review/TODOS-format.md for canonical format. Add static validation tests.
* feat: add 2-tier Greptile reply system with escalation detection
Add reply templates (Tier 1 friendly, Tier 2 firm), explicit escalation
detection algorithm, and severity re-ranking guidance to greptile-triage.md.
* feat: cross-skill TODOS awareness + Greptile template refs in all skills
/ship Step 5.5: auto-detect completed TODOs, offer reorganization.
/review Step 5.5: cross-reference PR against open TODOs.
/plan-ceo-review, /plan-eng-review: TODOS context in planning.
/retro: Backlog Health metric. /qa: bug TODO context in diff-aware mode.
All Greptile-aware skills now reference reply templates and escalation detection.
* chore: bump version and changelog (v0.3.8)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: update CONTRIBUTING.md for v0.3.8 changes
Clarify test tier cost table (Tier 3 standalone vs combined), add TODOS.md
to "Things to know", mention Greptile triage in ship workflow description.
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Merge pull request #55 from garrytan/v0.3.6-qa-upgrades
feat: E2E observability + eval infrastructure + all skills templated
fix: update check preamble exits 1 when up to date — convert all skills to .tmpl
The `[ -n "$_UPD" ] && echo "$_UPD"` line in 5 skills was missing `|| true`,
causing exit code 1 when the update check finds no update (empty $_UPD).
Fix: convert ship/, review/, plan-ceo-review/, plan-eng-review/, retro/ to
.tmpl templates using {{UPDATE_CHECK}} placeholder (same as browse/qa/etc).
All 9 skills now generated from templates — preamble changes propagate everywhere.
Also: regenerates qa/SKILL.md which had drifted from its template, adds 12 tests
validating the update check preamble exits 0 in all skills, removes completed TODO.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Merge remote-tracking branch 'origin/main' into v0.3.6-qa-upgrades
# Conflicts:
# test/skill-e2e.test.ts
feat: daily update check + /gstack-upgrade skill (v0.3.4) (#42)
* feat: add daily update check script + /gstack-upgrade skill
bin/gstack-update-check: pure bash, checks VERSION against remote once/day,
outputs UPGRADE_AVAILABLE or JUST_UPGRADED. Uses ~/.gstack/ for state.
gstack-upgrade/SKILL.md: new skill with inline upgrade flow for all preambles.
Detects global-git, local-git, vendored installs. Shows What's New from CHANGELOG.
browse/test/gstack-update-check.test.ts: 10 test cases covering all branch paths.
* refactor: remove version check from find-browse, simplify to binary locator
Delete checkVersion(), readCache(), writeCache(), fetchRemoteSHA(),
resolveSkillDir(), CacheEntry interface, REPO_URL/CACHE_PATH/CACHE_TTL
constants, and META output from find-browse.ts.
Version checking is now handled by bin/gstack-update-check (previous commit).
* feat: add update check preamble to all 9 skills
Every skill now runs bin/gstack-update-check on invocation. If an upgrade
is available, reads gstack-upgrade/SKILL.md inline upgrade flow.
Also adds AskUserQuestion to 5 skills that lacked it (gstack root, browse,
qa, retro, setup-browser-cookies) and Bash to plan-eng-review.
Simplifies qa and setup-browser-cookies setup blocks (removes META parsing).
* chore: bump version and changelog (v0.3.4)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: remove unused import + add corrupt cache test
Address pre-landing review findings:
- Remove unused mkdirSync import from gstack-update-check.test.ts
- Add Path I test: corrupt cache file falls through to remote fetch
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
feat: dual greptile-history paths (per-project + global)
- Suppressions read from ~/.gstack/projects/{slug}/greptile-history.md
- Triage outcomes write to both per-project and global files
- greptile-triage.md: remote-slug derivation, dual-write instructions
- review/SKILL.md + ship/SKILL.md: updated save path references
- TODO: add smart default QA tier (P2, S)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
feat: v0.3.2 — project-local state, diff-aware QA, Greptile integration (#36)
* fix: cookie import picker returns JSON instead of HTML
jsonResponse() was defined at module scope but referenced `url` which
only existed as a parameter of handleCookiePickerRoute(). Every API call
crashed, the catch block also crashed, and Bun returned a default HTML
page that the frontend couldn't parse as JSON.
Thread port via corsOrigin() helper and options objects. Add route-level
tests to prevent this class of bug from shipping again.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add help command to browse server
Agents that don't have SKILL.md loaded (or misread flags) had no way to
self-discover the CLI. The help command returns a formatted reference of
all commands and snapshot flags.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: version-aware find-browse with META signal protocol
Agents in other workspaces found stale browse binaries that were missing
newer flags. find-browse now compares the local binary's git SHA against
origin/main via git ls-remote (4hr cache), and emits META:UPDATE_AVAILABLE
when behind. SKILL.md setup checks parse META signals and prompt the user
to update.
- New compiled binary: browse/dist/find-browse (TypeScript, testable)
- Bash shim at browse/bin/find-browse delegates to compiled binary
- .version file written at build time with git commit SHA
- Build script compiles both browse and find-browse binaries
- Graceful degradation: offline, missing .version, corrupt cache all skip check
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: clean up .bun-build temp files after compile
bun build --compile leaves ~58MB temp files in the working directory.
Add rm -f .*.bun-build to the build script to clean up after each build.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: make help command reachable by removing it from META_COMMANDS
help was in META_COMMANDS, so it dispatched to handleMetaCommand() which
threw "Unknown meta command: help". Removing it from the set lets the
dedicated else-if handler in handleCommand() execute correctly.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: bump version and changelog (v0.3.2)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add shared Greptile comment triage reference doc
Shared reference for fetching, filtering, and classifying Greptile
review comments on GitHub PRs. Used by both /review and /ship skills.
Includes parallel API fetching, suppressions check, classification
logic, reply APIs, and history file writes.
* feat: make /review and /ship Greptile-aware
/review: Step 2.5 fetches and classifies Greptile comments, Step 5
resolves them with AskUserQuestion for valid issues and false positives.
/ship: Step 3.75 triages Greptile comments between pre-landing review
and version bump. Adds Greptile Review section to PR body in Step 8.
Re-runs tests if any Greptile fixes are applied.
* feat: add Greptile batting average to /retro
Reads ~/.gstack/greptile-history.md, computes signal ratio
(valid catches vs false positives), includes in metrics table,
JSON snapshot, and Code Quality Signals narrative.
* docs: add Greptile integration section to README
Personal endorsement, two-layer review narrative, full UX walkthrough
transcript, skills table updates. Add Greptile training feedback loop
to TODO.md future ideas.
* feat: add local dev mode for testing skills from within the repo
bin/dev-setup creates .claude/skills/gstack symlink to the working tree
so Claude Code discovers skills locally. bin/dev-teardown cleans up.
DEVELOPING_GSTACK.md documents the workflow.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: narrow gitignore to .claude/skills/ instead of all .claude/
Avoids ignoring legitimate Claude Code config like settings.json or CLAUDE.md.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: rename DEVELOPING_GSTACK.md to CONTRIBUTING.md
Rewritten as a contributor-friendly guide instead of a dry plan doc.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: explain why dev-setup is needed in CONTRIBUTING.md quick start
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add browser interaction guidance to CLAUDE.md
Prevents Claude from using mcp__claude-in-chrome__* tools instead of /browse.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add shared config module for project-local browse state
Centralizes path resolution (git root detection, state dir, log paths) into
config.ts. Both cli.ts and server.ts import from it, eliminating duplicated
PORT_OFFSET/BROWSE_PORT/STATE_FILE logic.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: rewrite port selection to use random ports
Replace CONDUCTOR_PORT magic offset and 9400-9409 scan with random port
10000-60000. Atomic state file writes, log paths from config module,
binaryVersion field for auto-restart on update.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: move browse state from /tmp to project-local .gstack/
CLI now uses config module for state paths, passes BROWSE_STATE_FILE to
spawned server. Adds version mismatch auto-restart, legacy /tmp cleanup
with PID verification, and removes stale global install fallback.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: update crash log path reference to .gstack/
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* test: add config tests and update CLI lifecycle test
14 new tests for config resolution, ensureStateDir, readVersionHash,
resolveServerScript, and version mismatch detection. Remove obsolete
CONDUCTOR_PORT/BROWSE_PORT filtering from commands.test.ts.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: update BROWSER.md and TODO.md for project-local state
Replace /tmp paths with .gstack/, remove CONDUCTOR_PORT docs, document
random port selection and per-project isolation. Add server bundling TODO.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: update README, CHANGELOG, and CONTRIBUTING for v0.3.2
- README: replace Conductor-aware language with project-local isolation,
add Greptile setup note
- CHANGELOG: comprehensive v0.3.2 entry with all state management changes
- CONTRIBUTING: add instructions for testing branches in other repos
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add diff-aware mode to /qa — auto-tests affected pages from branch diff
When on a feature branch, /qa now reads git diff main, identifies affected
pages/routes from changed files, and tests them automatically. No URL required.
The most natural flow: write code, /ship, /qa.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: update CHANGELOG for complete v0.3.2 coverage
Add missing entries: diff-aware QA mode, Greptile integration,
local dev mode, crash log path fix, README/SKILL.md updates.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>