feat: natural language skill routing + proactive suggestions (v0.7.1) (#195)
* feat: add trigger phrases to /debug and /office-hours
These two skills had zero "Use when asked to..." phrases, making them
completely invisible to natural language. Users saying "debug this" or
"brainstorm an idea" would get no skill invocation.
* feat: add proactive triggers to all workflow skills
Every skill now has "Proactively suggest when..." language so Claude
surfaces skills at natural moments — not just when the user says
specific trigger phrases.
* feat: lifecycle map + proactive preference system
Root gstack description now includes a developer workflow guide mapping
12 stages to skills. Preamble reads proactive preference via gstack-config.
Users can opt out with "stop suggesting things" and re-enable with
"be proactive again" — natural language toggle, no CLI needed.
* test: 11 journey-stage E2E routing tests + trigger phrase validation
Each test simulates a real development stage (ideation, plan review,
debug, QA, ship, retro...) with realistic project context and verifies
the right skill fires from natural language alone. 11/11 pass.
* chore: bump version and changelog (v0.7.1)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
feat: interactive /plan-design-review + CEO invokes designer + 100% coverage (v0.6.4) (#149)
* refactor: rename qa-design-review → design-review
The "qa-" prefix was confusing — this is the live-site design audit with
fix loop, not a QA-only report. Rename directory and update all references
across docs, tests, scripts, and skill templates.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: interactive /plan-design-review + CEO invokes designer
Rewrite /plan-design-review from report-only grading to an interactive
plan-fixer that rates each design dimension 0-10, explains what a 10
looks like, and edits the plan to get there. Parallel structure with
/plan-ceo-review and /plan-eng-review — one issue = one AskUserQuestion.
CEO review now detects UI scope and invokes the designer perspective
when the plan has frontend/UX work, so you get design review
automatically when it matters.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: validation + touchfile entries for 100% coverage
Add design-consultation to command/snapshot flag validation. Add 4
skills to contributor mode validation (plan-design-review,
design-review, design-consultation, document-release). Add 2 templates
to hardcoded branch check. Register touchfile entries for 10 new
LLM-judge tests and 1 new E2E test.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: LLM-judge for 10 skills + gstack-upgrade E2E
Add LLM-judge quality evals for all uncovered skills using a DRY
runWorkflowJudge helper with section marker guards. Add real E2E
test for gstack-upgrade using mock git remote (replaces test.todo).
Add plan-edit assertion to plan-design-review E2E.
14/15 skills now at full coverage. setup-browser-cookies remains
deferred (needs real browser).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: add bisect commit style to CLAUDE.md
All commits should be single logical changes, split before pushing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v0.6.4.0)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
feat: design review lite in /review and /ship + gstack-diff-scope (v0.6.3) (#142)
* feat: gstack-diff-scope helper + design review checklist
bin/gstack-diff-scope categorizes branch changes into SCOPE_FRONTEND,
SCOPE_BACKEND, SCOPE_PROMPTS, SCOPE_TESTS, SCOPE_DOCS, SCOPE_CONFIG.
review/design-checklist.md is a 20-item code-level checklist with
HIGH/MEDIUM/LOW confidence tags for detecting design anti-patterns
from source code.
* feat: integrate design review lite into /review and /ship
Add generateDesignReviewLite() resolver, insert {{DESIGN_REVIEW_LITE}}
partial in review Step 4.5 and ship Step 3.5. Update dashboard to
recognize design-review-lite entries. Ship pre-flight uses
gstack-diff-scope for smarter design review recommendations.
* test: E2E eval for design review lite detection
Planted CSS/HTML fixtures with 7 design anti-patterns. E2E test
verifies /review catches >= 4 of 7 (Papyrus font, 14px body text,
outline:none, !important, purple gradient, generic hero copy,
3-column feature grid).
* chore: bump version and changelog (v0.6.3.0)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
feat: diff-based test selection for E2E and LLM-judge evals (v0.6.1.0) (#139)
* feat: diff-based test selection for E2E and LLM-judge evals
Each test declares file dependencies in a TOUCHFILES map. The test runner
checks git diff against the base branch and only runs tests whose
dependencies were modified. Global touchfiles (session-runner, eval-store,
gen-skill-docs) trigger all tests.
New scripts: test:e2e:all, test:evals:all, eval:select
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: bump version and changelog (v0.6.1.0)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: plan-design-review-audit eval — bump turns to 30, add efficiency hints
The test was flaky at 20 turns because the agent reads a 300-line SKILL.md,
navigates, extracts design data, and writes a report. Added hints to skip
preamble/batch commands/write early while still testing the real SKILL.md.
Now completes in ~13 turns consistently.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>