feat: design review lite in /review and /ship + gstack-diff-scope (v0.6.3) (#142)
* feat: gstack-diff-scope helper + design review checklist
bin/gstack-diff-scope categorizes branch changes into SCOPE_FRONTEND,
SCOPE_BACKEND, SCOPE_PROMPTS, SCOPE_TESTS, SCOPE_DOCS, SCOPE_CONFIG.
review/design-checklist.md is a 20-item code-level checklist with
HIGH/MEDIUM/LOW confidence tags for detecting design anti-patterns
from source code.
* feat: integrate design review lite into /review and /ship
Add generateDesignReviewLite() resolver, insert {{DESIGN_REVIEW_LITE}}
partial in review Step 4.5 and ship Step 3.5. Update dashboard to
recognize design-review-lite entries. Ship pre-flight uses
gstack-diff-scope for smarter design review recommendations.
* test: E2E eval for design review lite detection
Planted CSS/HTML fixtures with 7 design anti-patterns. E2E test
verifies /review catches >= 4 of 7 (Papyrus font, 14px body text,
outline:none, !important, purple gradient, generic hero copy,
3-column feature grid).
* chore: bump version and changelog (v0.6.3.0)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
feat: diff-based test selection for E2E and LLM-judge evals (v0.6.1.0) (#139)
* feat: diff-based test selection for E2E and LLM-judge evals
Each test declares file dependencies in a TOUCHFILES map. The test runner
checks git diff against the base branch and only runs tests whose
dependencies were modified. Global touchfiles (session-runner, eval-store,
gen-skill-docs) trigger all tests.
New scripts: test:e2e:all, test:evals:all, eval:select
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: bump version and changelog (v0.6.1.0)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: plan-design-review-audit eval — bump turns to 30, add efficiency hints
The test was flaky at 20 turns because the agent reads a 300-line SKILL.md,
navigates, extracts design data, and writes a report. Added hints to skip
preamble/batch commands/write early while still testing the real SKILL.md.
Now completes in ~13 turns consistently.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>