feat: test coverage gate + plan completion audit + auto-verification (v0.11.13.0) (#428) * feat: test coverage gate + plan completion audit + auto-verification Three new gates in /ship and /review: 1. Test coverage gate: configurable thresholds (60%/80% default), hard stop below minimum with user override 2. Plan completion audit: discovers plan file, extracts actionable items, cross-references against diff, gates on NOT DONE items 3. Auto-verification: invokes /qa-only inline with plan's verification section, conditional on localhost reachability Also: coverage warning in /review, plan completion data in /retro, shared plan file discovery helper (DRY), ship metrics logging. * chore: regenerate SKILL.md files * chore: bump version and changelog (v0.11.13.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
feat: remove trigger guard + proactive opt-out prompt (#457) * fix: telemetry source tagging + duration guards Add --source, --error-message, --failed-step flags to gstack-telemetry-log. Source tagging (live vs test via GSTACK_TELEMETRY_SOURCE env) prevents E2E tests from polluting production data. Duration guards cap unreasonable values (>24h or negative → null). Partial cherry-pick from garrytan/community-mode — non-breaking parts only. Skips install_fingerprint rename (needs schema migration). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: remove trigger guard + proactive opt-out prompt Remove "MANUAL TRIGGER ONLY" injection from all skill descriptions. This frees 59 chars per skill from the 1024-char Codex description budget and lets skills auto-fire based on semantic matching. Merge auto-fire control into the existing `proactive` setting — when false, Claude won't auto-invoke skills or suggest them. Users are prompted once about this preference (chains after the telemetry prompt, fires on second skill run). Also trims the root gstack description by removing the skill catalog (already in the body), saving ~500 chars. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.11.16.0) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
feat: worktree isolation for E2E tests + infrastructure elegance (v0.11.12.0) (#425) * refactor: extract gen-skill-docs into modular resolver architecture Break the 3000-line monolith into 10 domain modules under scripts/resolvers/: types, constants, preamble, utility, browse, design, testing, review, codex-helpers, and index. Each module owns one domain of template generation. The preamble module introduces a 4-tier composition system (T1-T4) so skills only pay for the preamble sections they actually need, reducing token usage for lightweight skills by ~40%. Adds a token budget dashboard that prints after every generation run showing per-skill and total token counts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: tiered preamble — skills only pay for what they use Tag all 23 templates with preamble-tier (T1-T4). Lightweight skills like /browse and /benchmark get a minimal preamble (~40% fewer tokens), while review skills get the full stack. Regenerate all SKILL.md files. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: migrate eval storage to project-scoped paths Move eval results and E2E run artifacts from ~/.gstack-dev/evals/ to ~/.gstack/projects/$SLUG/evals/ so each project's eval history lives alongside its other gstack data. Falls back to legacy path if slug detection fails. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: sync package.json version with VERSION after merge Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add WorktreeManager for isolated test environments Reusable platform module (lib/worktree.ts) that creates git worktrees for test isolation and harvests useful changes as patches. Includes SHA-256 dedup, original SHA tracking for committed change detection, and automatic gitignored artifact copying (.agents/, browse/dist/). 12 unit tests covering lifecycle, harvest, dedup, and error handling. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: integrate worktree isolation into E2E test infrastructure Add createTestWorktree(), harvestAndCleanup(), and describeWithWorktree() helpers to e2e-helpers.ts. Add harvest field to EvalTestEntry for eval-store integration. Register lib/worktree.ts as a global touchfile. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: run Gemini and Codex E2E tests in worktrees Switch both test suites from cwd: ROOT to worktree isolation. Gemini (--yolo) no longer pollutes the working tree. Codex (read-only) gets worktree for consistency. Useful changes are harvested as patches for cherry-picking. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: skip symlinks in copyDirSync to prevent infinite recursion Adversarial review caught that .claude/skills/gstack may be a symlink back to the repo root, causing copyDirSync to recurse infinitely when copying gitignored artifacts into worktrees. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: bump version and changelog (v0.11.12.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: relax session-awareness assertion to accept structured options The LLM consistently presents well-formatted A/B choices with pros/cons but doesn't always use the exact string "RECOMMENDATION". Accept case-insensitive "recommend", "option a", "which do you want", or "which approach" as equivalent signals of a structured recommendation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>