~cytrogen/gstack

ref: 78e519e3b763680ba483aa606d7e2cfbadb1952f gstack/scripts/eval-compare.ts -rw-r--r-- 3.0 KiB
7d266661 — Garry Tan a month ago
Merge pull request #55 from garrytan/v0.3.6-qa-upgrades

feat: E2E observability + eval infrastructure + all skills templated
ed802d0c — Garry Tan a month ago
feat: eval CLI tools + docs cleanup

Add eval:list, eval:compare, eval:summary CLI scripts for exploring
eval history from ~/.gstack-dev/evals/. eval:compare reuses the shared
comparison functions from eval-store.ts.
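
A minimal sketch of what such a thin wrapper could look like. The contents of eval-store.ts are not shown here, so the `EvalResult` and `Comparison` shapes, the signatures of `compareEvalResults` and `formatComparison`, and the `runCompare` helper are all assumptions for illustration; only the two function names come from the commit message.

```typescript
// Hypothetical data shapes; the real types in eval-store.ts may differ.
interface EvalTest { name: string; passed: boolean; costUsd: number }
interface EvalResult { branch: string; tests: EvalTest[] }
interface Comparison { fixed: string[]; regressed: string[]; costDeltaUsd: number }

// Stand-in for the shared compareEvalResults (assumed signature).
// Reports tests whose pass/fail status changed between two runs.
function compareEvalResults(before: EvalResult, after: EvalResult): Comparison {
  const prior = new Map<string, boolean>();
  for (const t of before.tests) prior.set(t.name, t.passed);

  const fixed: string[] = [];
  const regressed: string[] = [];
  for (const t of after.tests) {
    const was = prior.get(t.name);
    if (was === undefined) continue; // test exists only in the newer run
    if (!was && t.passed) fixed.push(t.name);
    if (was && !t.passed) regressed.push(t.name);
  }

  const total = (r: EvalResult): number =>
    r.tests.reduce((sum, t) => sum + t.costUsd, 0);
  return { fixed, regressed, costDeltaUsd: total(after) - total(before) };
}

// Stand-in for the shared formatComparison (assumed signature).
function formatComparison(c: Comparison): string {
  return [
    `fixed: ${c.fixed.join(", ") || "none"}`,
    `regressed: ${c.regressed.join(", ") || "none"}`,
    `cost delta: ${c.costDeltaUsd.toFixed(2)} USD`,
  ].join("\n");
}

// The wrapper itself (hypothetical name): no comparison logic of its own,
// it only chains the two shared functions.
function runCompare(before: EvalResult, after: EvalResult): string {
  return formatComparison(compareEvalResults(before, after));
}
```

Keeping the comparison logic in the shared module means the CLI script only has to load two stored runs and print the formatted result.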

- eval:list: sorted table with branch/tier/cost filters
- eval:compare: thin wrapper around compareEvalResults + formatComparison
- eval:summary: aggregate stats, flaky test detection, branch rankings
- Remove unused @anthropic-ai/claude-agent-sdk from devDependencies
- Update CLAUDE.md: streaming docs, eval CLI commands, remove Agent SDK refs
- Add GH Actions eval upload (P2) and web dashboard (P3) to TODOS.md
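
The flaky test detection mentioned for eval:summary can be sketched as follows. The commit message does not say how flakiness is defined, so this assumes a test is flaky when it has both passed and failed across the stored runs; `RunRecord` and `findFlakyTests` are hypothetical names, not part of the repository.

```typescript
// Hypothetical record of one eval run: test name -> whether it passed.
interface RunRecord {
  branch: string;
  results: Record<string, boolean>;
}

// Assumed definition of "flaky": at least one pass and at least one
// failure for the same test name across the given runs.
function findFlakyTests(runs: RunRecord[]): string[] {
  const tallies = new Map<string, { pass: number; fail: number }>();

  for (const run of runs) {
    for (const name of Object.keys(run.results)) {
      const tally = tallies.get(name) ?? { pass: 0, fail: 0 };
      if (run.results[name]) {
        tally.pass += 1;
      } else {
        tally.fail += 1;
      }
      tallies.set(name, tally);
    }
  }

  // Keep only tests with mixed outcomes; sort for stable output.
  return Array.from(tallies.entries())
    .filter(([, t]) => t.pass > 0 && t.fail > 0)
    .map(([name]) => name)
    .sort();
}
```

A test that fails in every run is consistently broken rather than flaky, so it is deliberately excluded under this definition.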

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>