fix: gstack-slug bash compatibility — source to eval (#354)
* fix: replace source <(gstack-slug) with eval for bash compatibility
Under bash with set -euo pipefail, source <(cmd) process substitution
doesn't reliably set variables in the caller's scope. The variables
stay empty and -u (nounset) crashes the script. eval "$(cmd)" works
correctly in both bash and zsh.
Fixes: gstack-review-read, gstack-review-log, gstack-slug comment,
gen-skill-docs.ts resolver functions, and regression tests.
* chore: bump version and changelog (v0.11.4.0)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
fix: atomic review log helpers + platform-agnostic templates (v0.8.5) (#209)
* fix: add gstack-review-log and gstack-review-read atomic helpers
Branch names with `/` break review log filepaths when Claude Code runs
multi-line bash blocks as separate shell invocations. These two scripts
encapsulate the full operation in a single command.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: replace multi-line eval+mkdir+echo blocks with atomic helpers
- Review log writes now use gstack-review-log (single command)
- Review dashboard reads now use gstack-review-read (single command)
- Remaining source+mkdir blocks use && chaining for variable persistence
- Regenerated all SKILL.md files
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: remove Rails-isms — platform-agnostic templates and checklist
- review/checklist.md: multi-framework examples (Rails/Node/Python/Django)
- plan-ceo-review: framework-agnostic grep + generic error table
- plan-eng-review: "corresponding test" not "JS or Rails test"
- CLAUDE.md: Platform-agnostic design principle + Testing section
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: update tests for gstack-review-log/read helpers
- codex review log test: check for gstack-review-log instead of reviews.jsonl
- dashboard resolver tests: check for gstack-review instead of reviews.jsonl
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v0.8.5)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fix: security hardening + issue triage (v0.8.3) (#205)
* fix: check for bun before running setup (#147)
Users without bun installed got a cryptic "command not found" error.
Now prints a clear message with install instructions.
Closes #147
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: block SSRF via URL validation in browse commands (#17)
Adds validateNavigationUrl() that blocks non-HTTP(S) schemes (file://,
javascript:, data:) and cloud metadata endpoints (169.254.169.254,
metadata.google.internal). Applied to goto, diff, and newTab commands.
Localhost and private IPs remain allowed for local dev QA.
Closes #17
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: replace eval $(gstack-slug) with source <(...) (#133)
Eliminates unnecessary use of eval across all skill templates and
generated files. source <(...) has identical behavior without the
shell injection surface. Also hardens gstack-diff-scope usage.
Closes #133
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: rename /debug to /investigate to avoid Claude Code conflict (#190)
Claude Code has a built-in /debug command that shadows the gstack skill.
Renaming to /investigate which better reflects the systematic root-cause
investigation methodology.
Closes #190
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: add unit tests for path validation helpers
validateOutputPath() and validateReadPath() are security-critical
functions with zero test coverage. Adds 14 tests covering safe paths,
traversal attacks, and prefix collision edge cases.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v0.8.3)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: update /debug → /investigate references in docs
CLAUDE.md, README.md, and docs/skills.md still referenced the old
/debug skill name after the rename.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: harden URL validation against hostname bypasses (Codex P1)
Codex review found that metadata IPs could be reached via hex
(0xA9FEA9FE), decimal (2852039166), octal, trailing dot, and IPv6
bracket forms. Now normalizes hostnames before checking the blocklist
and probes numeric IP representations via URL constructor.
Also moves URL validation before page allocation in newTab() to
prevent zombie tabs on rejection (Codex P3).
5 new test cases for bypass variants.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
feat: use AskUserQuestion for dirty working tree (v0.7.4) (#200)
* feat: use AskUserQuestion for dirty working tree check
Replace hard exit 1 with interactive AskUserQuestion prompt offering
commit/stash/abort options when /qa or /design-review detects a dirty
working tree.
* chore: bump version and changelog (v0.7.4)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
feat: natural language skill routing + proactive suggestions (v0.7.1) (#195)
* feat: add trigger phrases to /debug and /office-hours
These two skills had zero "Use when asked to..." phrases, making them
completely invisible to natural language. Users saying "debug this" or
"brainstorm an idea" would get no skill invocation.
* feat: add proactive triggers to all workflow skills
Every skill now has "Proactively suggest when..." language so Claude
surfaces skills at natural moments — not just when the user says
specific trigger phrases.
* feat: lifecycle map + proactive preference system
Root gstack description now includes a developer workflow guide mapping
12 stages to skills. Preamble reads proactive preference via gstack-config.
Users can opt out with "stop suggesting things" and re-enable with
"be proactive again" — natural language toggle, no CLI needed.
* test: 11 journey-stage E2E routing tests + trigger phrase validation
Each test simulates a real development stage (ideation, plan review,
debug, QA, ship, retro...) with realistic project context and verifies
the right skill fires from natural language alone. 11/11 pass.
* chore: bump version and changelog (v0.7.1)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
feat: Test Bootstrap + Regression Tests + Coverage Audit (v0.6.0) (#136)
* feat: test bootstrap, regression tests, coverage audit, retro test health
- Add {{TEST_BOOTSTRAP}} resolver to gen-skill-docs.ts
- Add Phase 8e.5 regression test generation to /qa and /qa-design-review
- Add Step 3.4 test coverage audit with quality scoring to /ship
- Add test health tracking to /retro
- Add 2 E2E evals (bootstrap + coverage audit)
- Add 26 validation tests
- Update ARCHITECTURE.md placeholder table
- Add 2 P3 TODOs (CI/CD non-GitHub, auto-upgrade weak tests)
* chore: bump version and changelog (v0.6.0)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: make coverage audit trace actual codepaths, not just syntax patterns
Step 3.4 now instructs Claude to read full files, trace data flow through
every branch, diagram the execution, and check each branch against tests.
Phase 8e.5 regression tests now trace the bug's codepath before writing
the test, catching adjacent edge cases.
* feat: coverage audit now maps user flows, interactions, and error states
Step 3.4 now covers the full picture: code branches AND user-facing behavior.
Maps user flows (complete journey through the feature), interaction edge cases
(double-click, back button, stale state, slow connection), error states
(what does the user actually see?), and boundary states (zero results,
10k results, max-length input). Coverage diagram splits into Code Path
Coverage and User Flow Coverage sections with separate percentages.
* fix: raise test gen cap to 20, add validation tests for user flow coverage
- Raise Step 3.4 test generation cap from 10 to 20 (code + user flow combined)
- Add 3 validation tests: codepath tracing, user flow mapping, diagram sections
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
feat: Review Readiness Dashboard + gstack-slug helper (v0.5.1) (#130)
* feat: add bin/gstack-slug helper + migrate all inline SLUG computation
Extract the opaque SLUG sed pipeline into a shared 5-line shell script.
Replace 8 inline copies across templates with eval $(gstack-slug).
Sanitizes branch names (/ → -) to prevent subdirectory creation.
* feat: review readiness dashboard — track CEO/Eng/Design reviews per branch
Each review skill logs its result to JSONL. A shared {{REVIEW_DASHBOARD}}
placeholder displays run counts, timestamps, and a CLEARED TO SHIP verdict.
/ship pre-flight reads the dashboard and prompts when reviews are missing.
* chore: bump version and changelog (v0.5.1)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
fix: dynamic base branch detection across all SKILL templates (v0.3.10) (#81)
* feat: add {{BASE_BRANCH_DETECT}} resolver to gen-skill-docs
DRY placeholder for dynamic base branch detection across PR-targeting
skills. Detects via gh pr view (existing PR base) → gh repo view
(repo default) → fallback to main.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: ship skill detects base branch instead of hardcoding main
Replaces ~14 hardcoded 'main' references with dynamic detection via
{{BASE_BRANCH_DETECT}}. Fixes stacked branches and Conductor workspaces
targeting non-main branches. Adds --base <base> to gh pr create.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: review, qa, plan-ceo-review detect base branch dynamically
Same pattern as ship: replaces hardcoded 'main' with {{BASE_BRANCH_DETECT}}.
Also cleans up qa bash-isms (REPORT_DIR variable, port chaining).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: retro detects default branch instead of hardcoding origin/main
Retro queries commit history (not PR targets), so uses simpler detection:
gh repo view defaultBranchRef. Replaces ~11 origin/main refs with
origin/<default>.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: add explicit cross-step references in gstack-upgrade template
Bash blocks are self-contained, but cross-block variable references
(INSTALL_DIR from Step 2) were implicit. Adds prose making them explicit.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs+test: SKILL authoring guidance + regression tests
Adds "Writing SKILL templates" section to CLAUDE.md explaining that
templates are prompts, not scripts. Adds validation test catching
hardcoded 'main' in git commands, and resolver content test.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: update ARCHITECTURE + CONTRIBUTING for new placeholders
Add {{BASE_BRANCH_DETECT}} to ARCHITECTURE.md placeholder list.
Cross-reference CLAUDE.md template authoring guidance from CONTRIBUTING.md.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v0.3.10)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: add missing blank line between resolver functions
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: add 3 E2E smoke tests for base branch detection
- /review: verifies Step 0 detection + git diff against detected base
- /ship: truncated dry-run (Steps 0-1 only, no push/PR), asserts no
destructive actions
- /retro: verifies default branch detection for git log queries
Covers the {{BASE_BRANCH_DETECT}} resolver path (review), the ship
template's dual abort check, and retro's inline detection pattern.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v0.4.2)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
feat: contributor mode, session awareness, recommendation format (#90)
* feat: contributor mode, session awareness, universal RECOMMENDATION format
- Rename {{UPDATE_CHECK}} → {{PREAMBLE}} across all 10 skill templates
- Add session tracking (touch ~/.gstack/sessions/$PPID, count active sessions)
- ELI16 mode when 3+ concurrent sessions detected (re-ground user on context)
- Contributor mode: auto-file field reports to ~/.gstack/contributor-logs/
- Universal AskUserQuestion format: context → question → RECOMMENDATION → options
- Update plan-ceo-review and plan-eng-review to reference preamble baseline
- Add vendored symlink awareness section to CLAUDE.md
- Rewrite CONTRIBUTING.md with contributor workflow and cross-project testing
- Add tests for contributor mode and session awareness in generated output
- Add E2E eval for contributor mode report filing
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add Enum & Value Completeness to /review critical checklist
New CRITICAL review category that traces new enum values, status strings,
and type constants through every consumer outside the diff. Catches the
class of bugs where a new value is added but not handled in all switch/case
chains, allowlists, or frontend-backend contracts.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: bump v0.4.1, user-facing changelog, update qa-only template and architecture docs
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: add CHANGELOG style guide — user-facing, sell the feature
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: rewrite v0.4.1 changelog to be user-facing and sell the features
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add evals for RECOMMENDATION format, session awareness, and enum completeness
Free tests (Tier 1): RECOMMENDATION format + session awareness in all
preamble SKILL.md files, enum completeness checklist structure and CRITICAL
classification.
E2E eval: /review catches missed enum handlers when a new status value
is added but not handled in case/switch and notify methods.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add E2E eval for session awareness ELI16 mode
Stubs _SESSIONS=4, gives agent a decision point on feature/add-payments
branch, verifies the output re-grounds the user with project, branch,
context, and RECOMMENDATION — the ELI16 mode behavior for 3+ sessions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: contributor mode eval marked FAIL due to expected browse error
The test intentionally runs a nonexistent binary to trigger contributor
mode. The session runner's browse error detection catches "no such file
or directory...browse" and sets browseErrors, causing recordE2E to mark
passed=false. Override passed to check only exitReason since the browse
error is the expected scenario.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
feat: QA restructure, browser ref staleness, eval efficiency metrics (v0.4.0) (#83)
* feat: browser ref staleness detection via async count() validation
resolveRef() now checks element count to detect stale refs after page
mutations (e.g. SPA navigation). RefEntry stores role+name metadata
for better diagnostics. 3 new snapshot tests for staleness detection.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: qa-only skill, qa fix loop, plan-to-QA artifact flow
Add /qa-only (report-only, Edit tool blocked), restructure /qa with
find-fix-verify cycle, add {{QA_METHODOLOGY}} DRY placeholder for
shared methodology. /plan-eng-review now writes test-plan artifacts
to ~/.gstack/projects/<slug>/ for QA consumption.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: eval efficiency metrics — turns, duration, commentary across all surfaces
Add generateCommentary() for natural-language delta interpretation,
per-test turns/duration in comparison and summary output, judgePassed
unit tests, 3 new E2E tests (qa-only, qa fix loop, plan artifact).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: bump version and changelog (v0.4.0)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: update ARCHITECTURE, BROWSER, CONTRIBUTING, README for v0.4.0
- ARCHITECTURE: add ref staleness detection section, update RefEntry type
- BROWSER: add ref staleness paragraph to snapshot system docs
- CONTRIBUTING: update eval tool descriptions with commentary feature
- README: fix missing qa-only in project-local uninstall command
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: add user-facing benefit descriptions to v0.4.0 changelog
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
feat: TODOS-aware skills, 2-tier Greptile replies, gitignore fix (#61)
* fix: log non-ENOENT errors in ensureStateDir() instead of silently swallowing
Replace bare catch {} with ENOENT-only silence. Non-ENOENT errors (EACCES,
ENOSPC) are now logged to .gstack/browse-server.log. Includes test for
permission-denied scenario with chmod 444.
* feat: merge TODO.md + TODOS.md into unified backlog with shared format reference
Merge TODO.md (roadmap) and TODOS.md (near-term) into one file organized by
skill/component with P0-P4 priority ordering and Completed section. Add shared
review/TODOS-format.md for canonical format. Add static validation tests.
* feat: add 2-tier Greptile reply system with escalation detection
Add reply templates (Tier 1 friendly, Tier 2 firm), explicit escalation
detection algorithm, and severity re-ranking guidance to greptile-triage.md.
* feat: cross-skill TODOS awareness + Greptile template refs in all skills
/ship Step 5.5: auto-detect completed TODOs, offer reorganization.
/review Step 5.5: cross-reference PR against open TODOs.
/plan-ceo-review, /plan-eng-review: TODOS context in planning.
/retro: Backlog Health metric. /qa: bug TODO context in diff-aware mode.
All Greptile-aware skills now reference reply templates and escalation detection.
* chore: bump version and changelog (v0.3.8)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: update CONTRIBUTING.md for v0.3.8 changes
Clarify test tier cost table (Tier 3 standalone vs combined), add TODOS.md
to "Things to know", mention Greptile triage in ship workflow description.
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Merge remote-tracking branch 'origin/main' into v0.3.6-qa-upgrades
# Conflicts:
# test/skill-e2e.test.ts
fix: browse binary discovery broken for agents (v0.3.5) (#44)
* fix: replace find-browse with direct path in SKILL.md setup blocks
Agents were skipping the find-browse binary and guessing bin/browse
(wrong path). Now the setup block explicitly checks browse/dist/browse
with workspace-local priority, global fallback.
Also adds || true to update check to prevent misleading exit code 1.
Adds {{UPDATE_CHECK}} and {{BROWSE_SETUP}} template placeholders to
gen-skill-docs.ts so all skills share a single source of truth.
* refactor: convert qa/ and setup-browser-cookies/ to .tmpl templates
Replaces hardcoded update check and find-browse blocks with
{{UPDATE_CHECK}} and {{BROWSE_SETUP}} placeholders. Both skills
are now generated from templates via gen-skill-docs.
* test: add e2e and LLM eval tests for SKILL.md setup block
- 3 Agent SDK e2e tests: happy path, NEEDS_SETUP, non-git-repo
- LLM eval: setup block clarity + actionability >= 4
- New error pattern: 'no such file or directory.*browse'
These tests catch the exact failure mode where agents can't discover
the browse binary via SKILL.md instructions.
* chore: bump version and changelog (v0.3.5)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>