~cytrogen/gstack (5155fe3a2883b4eecb50b77664224b1e86a5a9bc): TODO.md

#TODO — gstack roadmap

[x] Annotated screenshots (--annotate flag, ref labels overlaid on screenshot)
[x] Snapshot diffing (--diff flag, unified diff against previous snapshot)
[x] Dialog handling (auto-accept/dismiss, dialog buffer, prevents browser lockup)
[x] File upload (upload )
[x] Cursor-interactive elements (-C flag, cursor:pointer/onclick/tabindex scan)
[x] Element state checks (is visible/hidden/enabled/disabled/checked/editable/focused)
[x] CircularBuffer — O(1) ring buffer for console/network/dialog (was O(n) array+shift)
[x] Async buffer flush with Bun.write() (was appendFileSync)
[x] Health check with page.evaluate('1') + 2s timeout
[x] Playwright error wrapping — actionable messages for AI agents
[x] Fix useragent — context recreation preserves cookies/storage/URLs
[x] DRY: getCleanText exported, command sets in chain updated
[x] 148 integration tests (was ~63)

[x] /qa SKILL.md — 6-phase workflow: Initialize → Authenticate → Orient → Explore → Document → Wrap up
[x] Issue taxonomy reference (7 categories: visual, functional, UX, content, performance, console, accessibility)
[x] Severity classification (critical/high/medium/low)
[x] Exploration checklist per page
[x] Report template (structured markdown with per-issue evidence)
[x] Repro-first philosophy: every issue gets evidence before moving on
[x] Two evidence tiers: interactive bugs (multi-step screenshots), static bugs (single annotated screenshot)
[x] Key guidance: 5-10 well-documented issues per session, depth over breadth, write incrementally
[x] Three modes: full (systematic), quick (30-second smoke test), regression (compare against baseline)
[x] Framework detection guidance (Next.js, Rails, WordPress, SPA)
[x] Health score rubric (7 categories, weighted average)
[x] wait --networkidle / wait --load / wait --domcontentloaded
[x] console --errors (filter to error/warning only)
[x] cookie-import <json-file> (bulk cookie import with auto-fill domain)
[x] browse/bin/find-browse (DRY binary discovery across skills)
[ ] Video recording (deferred to Phase 5 — recreateContext destroys page state)

[ ] Bundle server.ts into compiled binary (eliminate resolveServerScript() fallback chain entirely) (P2, M)
[ ] v20 encryption format support (AES-256-GCM) — future Chromium versions may change from v10
[ ] Sessions (isolated browser instances with separate cookies/storage/history)
[ ] State persistence (save/load cookies + localStorage to JSON files)
[ ] Auth vault (encrypted credential storage, referenced by name, LLM never sees passwords)
[ ] Video recording (record start/stop — needs sessions for clean context lifecycle)
[ ] retro + browse: deployment health tracking
- Screenshot production state
- Check perf metrics (page load times)
- Count console errors across key pages
- Track trends over retro window

[ ] Linux/Windows cookie decryption (GNOME Keyring / kwallet / DPAPI)
[ ] Trend tracking across QA runs — compare baseline.json over time, detect regressions (P2, S)
[ ] CI/CD integration — /qa as GitHub Action step, fail PR if health score drops (P2, M)
[ ] Accessibility audit mode — --a11y flag for focused accessibility testing (P3, S)
[ ] Greptile training feedback loop — export suppression patterns to Greptile team for model improvement (P3, S)
[ ] E2E test cost tracking — track cumulative API spend, warn if over threshold (P3, S)
[ ] E2E model pinning — pin E2E tests to claude-sonnet-4-6 for cost efficiency, add retry:2 for flaky LLM (P2, XS)
[ ] Smart default QA tier — after a few runs, check index.md for user's usual tier pick, skip the question (P2, S)

Browser is the nervous system — every skill should be able to see, interact with, and verify the web
Skills are the product; the browser enables them
One repo, one install, entire AI engineering workflow
Bun compiled binary matches Rust CLI performance for this use case (bottleneck is Chromium, not CLI parsing)
Accessibility tree snapshots use ~200-400 tokens vs ~3000-5000 for full DOM — critical for AI context efficiency
Locator map approach for refs: store Map<string, Locator> on BrowserManager, no DOM mutation, no CSP issues
Snapshot scoping (-i, -c, -d, -s flags) is critical for performance on large pages
All new commands follow existing pattern: add to command set, add switch case, return string