feat: QA restructure, browser ref staleness, eval efficiency metrics (v0.4.0) (#83)
* feat: browser ref staleness detection via async count() validation
resolveRef() now checks element count to detect stale refs after page
mutations (e.g. SPA navigation). RefEntry stores role+name metadata
for better diagnostics. 3 new snapshot tests for staleness detection.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: qa-only skill, qa fix loop, plan-to-QA artifact flow
Add /qa-only (report-only, Edit tool blocked), restructure /qa with
find-fix-verify cycle, add {{QA_METHODOLOGY}} DRY placeholder for
shared methodology. /plan-eng-review now writes test-plan artifacts
to ~/.gstack/projects/<slug>/ for QA consumption.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: eval efficiency metrics — turns, duration, commentary across all surfaces
Add generateCommentary() for natural-language delta interpretation,
per-test turns/duration in comparison and summary output, judgePassed
unit tests, 3 new E2E tests (qa-only, qa fix loop, plan artifact).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: bump version and changelog (v0.4.0)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: update ARCHITECTURE, BROWSER, CONTRIBUTING, README for v0.4.0
- ARCHITECTURE: add ref staleness detection section, update RefEntry type
- BROWSER: add ref staleness paragraph to snapshot system docs
- CONTRIBUTING: update eval tool descriptions with commentary feature
- README: fix missing qa-only in project-local uninstall command
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: add user-facing benefit descriptions to v0.4.0 changelog
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
feat: QA test plan tiers with per-page risk scoring
Rewrite qa/SKILL.md to v2.0:
- Smart test plan generation with Quick/Standard/Exhaustive tiers
- Per-page risk heuristics (forms=HIGH, CSS=LOW, tests=SKIP)
- Reports persist to ~/.gstack/projects/{slug}/qa-reports/
- QA run index with bidirectional links between reports
- Report metadata: branch, commit, PR, tier
- Auto-open preference saved to ~/.gstack/config.json
- PR comment integration via gh
- file:// link output on completion
feat: Phase 3.5 — cookie import, QA testing, team retro (v0.3.1) (#29)
* Phase 2: Enhanced browser — dialog handling, upload, state checks, snapshots
- CircularBuffer O(1) ring buffer for console/network/dialog (was O(n) array+shift)
- Async buffer flush with Bun.write() (was appendFileSync)
- Dialog auto-accept/dismiss with buffer + prompt text support
- File upload command (upload <sel> <file...>)
- Element state checks (is visible/hidden/enabled/disabled/checked/editable/focused)
- Annotated screenshots with ref labels overlaid (-a flag)
- Snapshot diffing against previous snapshot (-D flag)
- Cursor-interactive element scan for non-ARIA clickables (-C flag)
- Snapshot scoping depth limit (-d N flag)
- Health check with page.evaluate + 2s timeout
- Playwright error wrapping — actionable messages for AI agents
- Fix useragent — context recreation preserves cookies/storage/URLs
- wait --networkidle / --load / --domcontentloaded flags
- console --errors filter (error + warning only)
- cookie-import <json-file> with auto-fill domain from page URL
- 166 integration tests (was ~63)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Phase 2: Rewrite SKILL.md as QA playbook + command reference
Reorient SKILL.md files from raw command reference to QA-first playbook
with 10 workflow patterns (test user flows, verify deployments, dogfood
features, responsive layouts, file upload, forms, dialogs, compare pages).
Compact command reference tables at the bottom.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Phase 3: /qa skill — systematic QA testing with health scores
New /qa skill for systematic web app QA testing. Three modes:
- full: 5-10 documented issues with screenshots and repro steps
- quick: 30-second smoke test with health score
- regression: compare against saved baseline
Includes issue taxonomy (7 categories, 4 severity levels), structured
report template, health score rubric (weighted across 7 categories),
framework detection guidance (Next.js, Rails, WordPress, SPA).
Also adds browse/bin/find-browse (DRY binary discovery using git
rev-parse), .gstack/ to .gitignore, and updated TODO roadmap.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Bump to v0.3.0 — Phase 2 + Phase 3 changelog
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: cookie-import-browser — Chromium cookie decryption module + tests
Pure logic module for reading and decrypting cookies from macOS Chromium
browsers (Comet, Chrome, Arc, Brave, Edge). Supports v10 AES-128-CBC
encryption with macOS Keychain access, PBKDF2 key derivation, and
per-browser key caching. 18 unit tests with encrypted cookie fixtures.
* feat: cookie picker web UI + route handler
Two-panel dark-theme picker served from the browse server. Left panel
shows source browser domains with search and import buttons. Right panel
shows imported domains with trash buttons. No cookie values exposed.
6 API endpoints, importedDomains Set tracking, inline clearCookies.
* feat: wire cookie-import-browser into browse server
Add cookie-picker route dispatch (no auth, localhost-only), add
cookie-import-browser to WRITE_COMMANDS and CHAIN_WRITE, add serverPort
property to BrowserManager, add write command with two modes (picker UI
vs --domain direct import), update CLI help text.
* chore: /setup-browser-cookies skill + docs (Phase 3.5)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: bump version and changelog (v0.3.1)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* security: redact sensitive values from command output (PR #21)
type no longer echoes text (reports character count), cookie redacts
value with ****, header redacts Authorization/Cookie/X-API-Key/X-Auth-Token,
storage set drops value, forms redacts password fields. Prevents secrets
from persisting in LLM transcripts. 7 new tests.
Credit: fredluz (PR #21)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* security: path traversal prevention for screenshot/pdf/eval (PR #26)
Add validateOutputPath() for screenshot/pdf/responsive (restricts to
/tmp and cwd) and validateReadPath() for eval (blocks .. sequences and
absolute paths outside safe dirs). 7 new tests.
Credit: Jah-yee (PR #26)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: auto-install Playwright Chromium in setup (PR #22)
Setup now verifies Playwright can launch Chromium, and auto-installs
it via `bunx playwright install chromium` if missing. Exits non-zero
if build or Chromium launch fails.
Credit: AkbarDevop (PR #22)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* security: fix path validation bypass, CORS restriction, cookie-import path check
- startsWith('/tmp') matched '/tmpevil' — now requires trailing slash
- CORS Access-Control-Allow-Origin changed from * to http://127.0.0.1:<port>
- cookie-import now validates file paths (was missing validateReadPath)
- 3 new tests for prefix collision and cookie-import path traversal
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: address review informational issues + add regression tests
- Add cookie-import to CHAIN_WRITE set for chain command routing
- Add path validation to snapshot -a -o output path
- Fix package.json version to match 0.3.1
- Use crypto.randomUUID() for temp DB paths (unpredictable filenames)
- Add regression tests for chain cookie-import and snapshot path validation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: add /qa, /setup-browser-cookies to README + update BROWSER.md
- Add /qa and /setup-browser-cookies to skills table, install/update/uninstall blurbs
- Add dedicated README sections for both new skills with usage examples
- Update demo workflow to show cookie import → QA → browse flow
- Update BROWSER.md: cookie import commands, new source files, test count (203)
- Update skill count from 6 to 8
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: team-aware /retro v2.0 — per-person praise and growth opportunities
- Identify current user via git config, orient narrative as "you" vs teammates
- Add per-author metrics: commits, LOC, focus areas, commit type mix, sessions
- New "Your Week" section with personal deep-dive for whoever runs the command
- New "Team Breakdown" with per-person praise and growth opportunities
- Track AI-assisted commits via Co-Authored-By trailers
- Personal + team shipping streaks
- Tone: praise like a 1:1, growth like investment advice, never compare negatively
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: add Conductor parallel sessions section to README
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>