~cytrogen/gstack

9eb74deb — Garry Tan a month ago
feat: inline /office-hours — no more "another window" (v0.11.3.1) (#352)

* feat: inline /office-hours invocation — no more "another window"

BENEFITS_FROM now uses read-and-follow pattern (same as /autoplan) to run
/office-hours inline. Removes handoff note save infrastructure from
plan-ceo-review template. Keeps handoff note check for backward compat.

* chore: bump version and changelog (v0.11.3.1)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
fdd45188 — Garry Tan a month ago
fix: gstack-slug bash compatibility — source to eval (#354)

* fix: replace source <(gstack-slug) with eval for bash compatibility

Under bash with set -euo pipefail, source <(cmd) process substitution
doesn't reliably set variables in the caller's scope. The variables
stay empty and -u (nounset) crashes the script. eval "$(cmd)" works
correctly in both bash and zsh.

Fixes: gstack-review-read, gstack-review-log, gstack-slug comment,
gen-skill-docs.ts resolver functions, and regression tests.

* chore: bump version and changelog (v0.11.4.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
5aee6db7 — Garry Tan a month ago
feat: Codex second opinion in /office-hours (v0.11.4.0) (#353)

* feat: add Codex second opinion to /office-hours (Phase 3.5)

New generateCodexSecondOpinion resolver that adds an opt-in cross-model
cold read between premise challenge and alternatives generation.
Codex independently reviews the session's problem statement, answers,
and premises without seeing Claude's reasoning.

Includes: temp file prompt assembly (shell injection safe), two mode-
specific prompt variants (startup/builder), cross-model synthesis,
premise revision check, and 7 unit tests.

* chore: regenerate office-hours SKILL.md files

* chore: bump version and changelog (v0.11.4.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
4cd4d11c — Garry Tan a month ago
feat: design outside voices — cross-model design critique (v0.11.3.0) (#347)

* feat(gen-skill-docs): add design outside voices + hard rules resolvers

Add generateDesignOutsideVoices() — parallel Codex + Claude subagent
dispatch for cross-model design critique with litmus scorecard synthesis.
Branches per skillName (plan-design-review, design-review, design-consultation)
with task-specific reasoning effort (high for analytical, medium for creative).

Add generateDesignHardRules() — OpenAI Frontend Skill hard rules + gstack
AI slop blacklist unified into one shared block with classifier step
(landing page vs app UI vs hybrid).

Extract AI_SLOP_BLACKLIST constant from inline prose in generateDesignMethodology()
for DRY. Extend generateDesignReviewLite() with lightweight Codex block.
Extend generateDesignSketch() with outside voices opt-in after wireframe.

Source: OpenAI "Designing Delightful Frontends with GPT-5.4" (Mar 2026)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(design skills): add outside voices + hard rules to all design templates

Insert {{DESIGN_OUTSIDE_VOICES}} in plan-design-review (between Step 0D
and Pass 1), design-review (between Phase 6 and Phase 7), and
design-consultation (between Phase 2 and Phase 3).

Insert {{DESIGN_HARD_RULES}} in plan-design-review Pass 4 and design-review
Phase 3 checklist.

DESIGN_REVIEW_LITE in /ship and /review now includes a Codex design voice
block with litmus checks.

DESIGN_SKETCH in /office-hours now includes outside voices opt-in after
wireframe approval.

Regenerated all SKILL.md files (both Claude and Codex hosts).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add resolver tests + touchfiles for design outside voices

Add 18 test cases across 4 new describe blocks:
- DESIGN_OUTSIDE_VOICES: host guard, skillName branching, reasoning effort
- DESIGN_HARD_RULES: classifier, 3 rule sets, slop blacklist, OpenAI criteria
- DESIGN_SKETCH extended: outside voices step, original wireframe preserved
- DESIGN_REVIEW_LITE extended: Codex block, codex host exclusion

Update touchfiles: add scripts/gen-skill-docs.ts to design skill E2E
test dependencies for accurate diff-based test selection.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.11.3.0)

Design outside voices — parallel Codex + Claude subagent for cross-model
design critique with litmus scorecard synthesis. OpenAI hard rules + gstack
slop blacklist unified. Classifier for landing page vs app UI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: generate .agents/ on demand in tests (not checked in since v0.11.2.0)

.agents/ is gitignored since v0.11.2.0 — tests that read Codex-host
SKILL.md files now generate them on demand via `bun run gen-skill-docs.ts
--host codex` before reading. Fixes test failures on fresh clones.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
b7a3bf10 — Garry Tan a month ago
fix: Codex compatibility — 1024-char cap, duplicate skills, repo-local installs, kiro support (v0.11.2.0) (#346)

* fix: cap gstack skill descriptions for codex (#251)

Compresses SKILL.md.tmpl root description to <1024 chars (Codex token limit).
Adds description-length validation test. Includes /autoplan in compressed
skill list (added since PR was branched).

Co-authored-by: cweill <cweill@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: skip sidecar dir in Codex skill linking (#269)

Adds guard to skip .agents/skills/gstack in link_codex_skill_dirs() —
it's a runtime asset sidecar, not a standalone skill. Prevents duplicate
skill discovery and symlink overwriting.

Fixes #261

Co-authored-by: mvanhorn <mvanhorn@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: generate .agents directory at setup time instead of shipping duplicates (#308)

Removes 14K+ lines of committed generated Codex skill files from git.
.agents/ is now gitignored and generated at setup time via
`bun run gen:skill-docs --host codex`. Updates CI workflow to validate
generation instead of checking committed file freshness.

Co-authored-by: cskwork <cskwork@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: avoid duplicate Codex skill discovery (#236)

Adds migrate_direct_codex_install() to move old direct installs from
~/.codex/skills/gstack to ~/.gstack/repos/gstack. Adds
create_codex_runtime_root() to expose only runtime assets (bin/, browse/,
review files) via symlinks instead of symlinking the entire repo.

Fixes #235

Co-authored-by: shichangs <shichangs@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: support repo-local Codex installs (#317)

Changes gen-skill-docs.ts to use dynamic $GSTACK_ROOT/$GSTACK_BIN/$GSTACK_BROWSE
variables in generated Codex preambles instead of hardcoded ~/.codex/ paths.
Renames GSTACK_DIR → SOURCE_GSTACK_DIR/INSTALL_GSTACK_DIR throughout setup for
clarity. Supports both global (~/.codex/skills/) and repo-local (.agents/skills/)
Codex installs.

Co-authored-by: pengwk <pengwk@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add --host kiro support to setup script (#309)

Adds Kiro CLI as a supported agent platform. Setup detects kiro-cli,
copies+sed-rewrites SKILL.md paths from Codex/Claude to Kiro format,
and symlinks runtime assets (bin/, browse/).

Co-authored-by: AnshulDesai <AnshulDesai@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add sidecar skip, GSTACK_ROOT, and kiro coverage (T1-T3)

Adds 3 tests identified during CEO/Eng review:
- T1: link_codex_skill_dirs() contains sidecar skip guard
- T2: generated Codex preambles use dynamic $GSTACK_ROOT paths
- T3: setup supports --host kiro with INSTALL_KIRO and sed rewrites

Also fixes existing test to expect kiro in --host case statement.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: review fixes — ETHOS.md, runtime root, repo-local guard, kiro assets, upgrade paths

Paranoid 4-pass review found 7 issues, all fixed:
- Add ETHOS.md to create_codex_runtime_root
- Clean old real dirs (not just symlinks) on upgrade
- Skip runtime root for repo-local installs (prevent self-referential symlinks)
- Add review/, ETHOS.md, gstack-upgrade/ to Kiro install
- Update gstack-upgrade to detect ~/.gstack/repos/ and .agents/skills/
- Guard --host without value from silent exit
- Fix Kiro sed patterns + timeout instruction in gen-skill-docs.ts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.11.2.0)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: remove last tracked .agents/ file from git index

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: cweill <cweill@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: mvanhorn <mvanhorn@users.noreply.github.com>
Co-authored-by: cskwork <cskwork@users.noreply.github.com>
Co-authored-by: shichangs <shichangs@users.noreply.github.com>
Co-authored-by: pengwk <pengwk@users.noreply.github.com>
Co-authored-by: AnshulDesai <AnshulDesai@users.noreply.github.com>
264c1ca2 — Garry Tan a month ago
feat: plan files always show review status (v0.11.1.1) (#345)

* feat: plan files always show review status via preamble footer

Add Plan Status Footer to generateCompletionStatus() in the preamble.
When in plan mode before ExitPlanMode, Claude writes a GSTACK REVIEW
REPORT section to the plan file — either populated from review logs
or a "NO REVIEWS YET" placeholder. Skips if a review skill already
wrote a richer report.

* chore: bump version and changelog (v0.11.1.1)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
cc9e6f8f — Garry Tan a month ago
feat: /retro global — cross-project AI coding retrospective (v0.10.2.0) (#316)

* feat: gstack-global-discover — cross-tool AI session discovery

Standalone script that scans Claude Code, Codex CLI, and Gemini CLI
session directories, resolves each session's working directory to a git
repo, deduplicates by normalized remote URL, and outputs structured JSON.

- Reads only first 4-8KB of session files (avoids OOM on large transcripts)
- Only counts JSONL files modified within the time window (accurate counts)
- Week windows midnight-aligned like day windows for consistency
- 16 tests covering URL normalization, CLI behavior, and output structure

* feat: /retro global — cross-project retro using discovery engine

Adds Global Retrospective Mode to the /retro skill. When invoked as
`/retro global`, skips the repo-scoped retro and instead uses
gstack-global-discover to find all AI coding sessions across all tools,
then runs git log on each discovered repo for a unified cross-project
retrospective with global shipping streak and context-switching metrics.

* chore: bump version and changelog (v0.9.9.0)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: sync documentation with shipped changes

Update README /retro description to mention global mode.
Add bin/ directory to CLAUDE.md project structure.

* feat: /retro global adds per-project personal contributions breakdown

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: regenerate SKILL.md files after main merge

* chore: bump version and changelog (v0.10.2.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: test coverage catalog — shared audit across plan/ship/review (v0.10.1.0) (#259)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: /retro global shareable personal card — screenshot-ready stats

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: regenerate Codex/agents SKILL.md for retro shareable card

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: widen retro global card — never truncate repo names

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: retro global card — left border only, drop unreliable right border

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
cf3582c6 — Garry Tan a month ago
fix: community security + stability fixes (wave 1) (#325)

* feat: add /cso skill — OWASP Top 10 + STRIDE security audit

* fix: harden gstack-slug against shell injection via eval

Whitelist safe characters (a-zA-Z0-9._-) in SLUG and BRANCH output
to prevent shell metacharacter injection when used with eval.

Only affects self-hosted git servers with lax naming rules — GitHub
and GitLab enforce safe characters already. Defense-in-depth.

* fix(security): sanitize gstack-slug output against shell injection

The gstack-slug script is consumed via eval $(gstack-slug) throughout
skill templates. If a git remote URL contains shell metacharacters
like $(), backticks, or semicolons, they would be executed by eval.

Fix: strip all characters except [a-zA-Z0-9._-] from both SLUG and
BRANCH before output. This preserves normal values while neutralizing
any injection payload in malicious remote URLs.

Before: eval $(gstack-slug) with remote "foo/bar$(rm -rf /)" → executes rm
After:  eval $(gstack-slug) with remote "foo/bar$(rm -rf /)" → SLUG=foo-barrm-rf-

* fix(security): redact sensitive values in storage command output

The browse `storage` command dumps all localStorage and sessionStorage
as JSON. This can expose tokens, API keys, JWTs, and session credentials
in QA reports and agent transcripts.

Fix: redact values where the key matches sensitive patterns (token,
secret, key, password, auth, jwt, csrf) or the value starts with known
credential prefixes (eyJ for JWT, sk- for Stripe, ghp_ for GitHub, etc.).

Redacted values show length to aid debugging: [REDACTED — 128 chars]

* fix(browse): kill old server before restart to prevent orphaned chromium processes

When the health check fails or the server connection drops, `ensureServer()`
and `sendCommand()` would call `startServer()` without first killing the
previous server process. This left orphaned `chrome-headless-shell` renderer
processes running at ~120% CPU each.

After several reconnect cycles (e.g. pages that crash during hydration or
trigger hard navigations via `window.location.href`), dozens of zombie
chromium processes accumulate and exhaust system resources.

Fix: call `killServer()` on the stale PID before spawning a new server in
both the `ensureServer()` unhealthy path and the `sendCommand()` connection-
lost retry path.

Fixes #294

* Fix YAML linter error: nested mapping in compact sequence entries

Having "Run: bun" inside a plain scalar is not allowed per YAML spec which states: Plain scalars must never contain the “: ” and “ #” character combinations.

This simple fix switches to block scalars (|) to eliminate the ambiguity without changing runtime behavior.

* fix(security): add Azure metadata endpoint to SSRF blocklist

Add metadata.azure.internal to BLOCKED_METADATA_HOSTS alongside the
existing AWS/GCP endpoints. Closes the coverage gap identified in #125.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add coverage for storage redaction

Test key-based redaction (auth_token, api_key), value-based redaction
(JWT prefix, GitHub PAT prefix), pass-through for normal keys, and
length preservation in redacted output.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add community PR triage process to CONTRIBUTING.md

Document the wave-based PR triage pattern used for batching community
contributions. References PR #205 (v0.8.3) as the original example.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: adjust test key names to avoid redaction pattern collision

Rename testKey→testData and normalKey→displayName in storage tests
to avoid triggering #238's SENSITIVE_KEY regex (which matches 'key').
Also generate Codex variant of /cso skill.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update project documentation for v0.9.10.0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: zero-noise /cso security audits with FP filtering (v0.11.0.0)

Absorb Anthropic's security-review false positive filtering into /cso:
- 17 hard exclusions (DOS, test files, log spoofing, SSRF path-only,
  regex injection, race conditions unless concrete, etc.)
- 9 precedents (React XSS-safe, env vars trusted, client-side code
  doesn't need auth, shell scripts need concrete untrusted input path)
- 8/10 confidence gate — below threshold = don't report
- Independent sub-agent verification for each finding
- Exploit scenario requirement per finding
- Framework-aware analysis (Rails CSRF, React escaping, Angular sanitization)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: consolidate CHANGELOG — merge /cso launch + community wave into v0.11.0.0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: rewrite README — lead with Karpathy quote, cut LinkedIn phrases, add /cso

Opens with the revolution (Karpathy, Steinberger/OpenClaw), keeps credentials
and LOC numbers, cuts filler phrases, adds hater bait, restores hiring block,
removes bloated "What's new" section, adds /cso to skills table and install.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(cso): adversarial review fixes — FP filtering, prompt injection, language coverage

- Exclusion #10: test files must verify not imported by non-test code
- Exclusion #13: distinguish user-message AI input from system-prompt injection
- Exclusion #14: ReDoS in user-input regex IS a real CVE class, don't exclude
- Add anti-manipulation rule: ignore audit-influencing instructions in codebase
- Fix confidence gate: remove contradictory 7-8 tier, hard cutoff at 8
- Fix verifier anchoring: send only file+line, not category/description
- Add Go, PHP, Java, C#, Kotlin to grep patterns (was 4 languages, now 8)
- Add GraphQL, gRPC, WebSocket endpoint detection to attack surface mapping

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(docs): correct skill counts, add /autoplan to README tables

Skill count was wrong in 3 places (said 19+7=26, said 25, actual is 28).
Added /autoplan to specialist table. Fixed troubleshooting skills list
to include all skills added since v0.7.0.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(browse): DNS rebinding protection for SSRF blocklist

validateNavigationUrl is now async — resolves hostname to IP and checks
against blocked metadata IPs. Prevents DNS rebinding where evil.com
initially resolves to a safe IP, then switches to 169.254.169.254.
All callers updated to await. Tests updated for async assertions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(browse): lockfile prevents concurrent server start races

Adds exclusive lockfile (O_CREAT|O_EXCL) around ensureServer to prevent
TOCTOU race where two CLI invocations could both kill the old server and
start new ones, leaving an orphaned chromium process. Second caller now
waits for the first to finish starting.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(browse): improve storage redaction — word-boundary keys + more value prefixes

Key regex: use underscore/dot/hyphen boundaries instead of \b (which treats
_ as word char). Now correctly redacts auth_token, session_token while
skipping keyboardShortcuts, monkeyPatch, primaryKey.

Value regex: add AWS (AKIA), Stripe (sk_live_, pk_live_), Anthropic (sk-ant-),
Google (AIza), Sendgrid (SG.), Supabase (sbp_) prefixes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: migrate all remaining eval callers to source, fix stale CHANGELOG claim

5 templates and 2 bin scripts still used eval $(gstack-slug). All now use
source <(gstack-slug). Updated gstack-slug comment to match. Fixed v0.8.3
CHANGELOG entry that falsely claimed eval was fully eliminated — it was
the output sanitization that made it safe, not a calling convention change.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(docs): add /autoplan to install instructions, regen skill docs

The install instruction blocks and troubleshooting section were missing
/autoplan. All three skill list locations now include the complete 28-skill
set. Regenerated codex/agents SKILL.md files to match template changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update project documentation for v0.11.0.0

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs(cso): add disclaimer — not a substitute for professional security audits

LLMs can miss subtle vulns and produce false negatives. For production
systems with sensitive data, hire a real firm. /cso is a first pass,
not your only line of defense. Disclaimer appended to every report.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Arun Kumar Thiagarajan <arunkt.bm14@gmail.com>
Co-authored-by: Tyrone Robb <tyrone.robb@icloud.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Orkun Duman <orkun1675@gmail.com>
d0300d4a — Garry Tan a month ago
fix: /autoplan — prevent analysis compression (v0.10.2.0) (#329)

* fix: prevent /autoplan from compressing review sections to one-liners

Adds explicit auto-decide contract, per-phase execution checklists,
pre-gate verification, and test review emphasis.

* chore: bump version and changelog (v0.10.2.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
7ff0f84b — Garry Tan a month ago
feat: test coverage catalog — shared audit across plan/ship/review (v0.10.1.0) (#259)

* refactor: extract {{TEST_COVERAGE_AUDIT}} shared resolver

DRY extraction of the test coverage audit methodology into a shared
generator function with three explicit placeholders:
- TEST_COVERAGE_AUDIT_PLAN (plan-eng-review)
- TEST_COVERAGE_AUDIT_SHIP (ship)
- TEST_COVERAGE_AUDIT_REVIEW (review)

Shared across all modes: codepath tracing, ASCII diagram format,
quality scoring rubric, E2E test decision matrix, regression rule,
and test framework detection via CLAUDE.md.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: plan-eng-review uses shared test coverage audit

Replace the thin 6-line Section 3 test review with the full shared
methodology via {{TEST_COVERAGE_AUDIT_PLAN}}. Plan mode now:
- Traces every codepath with full ASCII diagrams
- Adds missing tests to the plan (not just "check for tests")
- Writes test plan artifact for /qa consumption
- Includes E2E/eval recommendations and regression detection

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: ship uses shared test coverage audit

Replace 135 lines of inline Step 3.4 methodology with
{{TEST_COVERAGE_AUDIT_SHIP}}. Functionally identical output plus:
- E2E test decision matrix (marks paths needing E2E vs unit)
- Eval recommendations for LLM prompt changes
- Regression detection iron rule
- Test framework detection via CLAUDE.md first
- Test plan artifact for /qa consumption

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: /review Step 4.75 test coverage diagram

Add codepath tracing to the pre-landing review via
{{TEST_COVERAGE_AUDIT_REVIEW}}. Review mode:
- Produces ASCII coverage diagram (same methodology as plan/ship)
- Generates tests for gaps via Fix-First (ASK user)
- Subsumes Pass 2 "Test Gaps" checklist category
- Gaps are INFORMATIONAL findings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: mode differentiation + regression guard for coverage audit

10 new tests verifying the three TEST_COVERAGE_AUDIT placeholders:
- All modes share: codepath tracing, E2E matrix, regression rule
- Plan mode: adds to plan + artifact, no ship-specific content
- Ship mode: auto-generates + before/after count + coverage summary
- Review mode: Fix-First ASK + INFORMATIONAL, no artifact
- Regression guard: ship SKILL.md preserves all key phrases

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: extract shared coverage audit fixture + review E2E

- Extract billing.ts fixture into coverage-audit-fixture.ts (DRY)
- Refactor ship-coverage-audit E2E to use shared fixture
- Add review-coverage-audit E2E for Step 4.75
- Update touchfiles: both E2Es depend on shared fixture

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: strengthen E2E assertions for coverage audit tests

The coverage audit E2E tests (ship + review) were only asserting
exitReason === 'success' and readCalls > 0 — they passed even
if the agent produced no coverage diagram. Add assertion that
the output contains either GAP or TESTED markers.

Found during /review.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: plan mode traces the plan, not the git diff

Codex adversarial review caught that plan-eng-review was inheriting
"git diff origin/<base>...HEAD" from the shared resolver, but plan mode
reviews a plan document, not a code diff. Plan mode now says:
"Trace every codepath in the plan" and "Read the plan document."

Ship and review modes keep the git diff instruction.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.9.5.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: test coverage catalog + failure triage (merged branches) (#285)

* feat: add bin/gstack-repo-mode — solo vs collaborative detection with caching

Detects whether a repo is solo-dev (one person does 80%+ of recent commits)
or collaborative. Uses 90-day git shortlog window with 7-day cache in
~/.gstack/projects/{SLUG}/repo-mode.json. Config override via
`gstack-config set repo_mode solo|collaborative` takes precedence over
the heuristic. Minimum 5 commits required to classify (otherwise unknown).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: test failure ownership triage — see something say something

Adds two new preamble sections to all gstack skills:
- Repo Ownership Mode: explains solo vs collaborative behavior
- See Something, Say Something: proactive issue flagging principle

Adds {{TEST_FAILURE_TRIAGE}} template variable (opt-in, used by /ship):
- Classifies test failures as in-branch vs pre-existing
- Solo mode defaults to "investigate and fix now"
- Collaborative mode offers "blame + assign GitHub issue" option
- Also offers P0 TODO and skip options

/ship Step 3 now triages test failures instead of hard-stopping on all
failures. In-branch failures still block shipping. Pre-existing failures
get user-directed triage based on repo mode.

Adds P2 TODO for gstack notes system (deferred lightweight reminder).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: regenerate SKILL.md files for Claude and Codex hosts

All 22 Claude skills and 21 Codex skills regenerated with new preamble
sections (Repo Ownership Mode, See Something Say Something) and
{{TEST_FAILURE_TRIAGE}} resolved in ship/SKILL.md.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: validate repo mode values to prevent shell injection

Codex adversarial review found that unvalidated config/cache values
could be injected into shell via source <(gstack-repo-mode). Added
validate_mode() that only allows solo|collaborative|unknown — anything
else becomes "unknown". Prevents persistent code execution through
malicious config.yaml or tampered cache JSON.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: shell injection via branch names + feature-branch sampling bias

Codex code review found two issues:

P1: eval $(gstack-slug) in gstack-repo-mode executes branch names as
shell. Branch names like foo$(touch${IFS}pwned) are valid git refs and
would execute arbitrary commands. Fix: compute SLUG directly with sed
instead of eval'ing gstack-slug output.

P2: git shortlog HEAD only sees current branch history. On feature
branches that haven't merged main recently, other contributors disappear
from the sample. Fix: use git shortlog on the default branch
(origin/main) instead of HEAD.

Also improved blame lookup in collaborative triage to check both the
test file and the production code it covers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: broaden codex-host stripping test to accommodate triage section

"Investigate and fix" now appears in TEST_FAILURE_TRIAGE (not just the
Codex review step). Use CODEX_REVIEWS config string as a more specific
marker for detecting the Codex review step in Codex-hosted skills.

* fix: replace template placeholder in TODOS.md with readable text

{{TEST_FAILURE_TRIAGE}} is template syntax but TODOS.md is not processed
by gen-skill-docs — replaced with human-readable reference.

* chore: bump version and changelog (v0.9.5.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add bin/ directory to project structure in CLAUDE.md

* test: add triage resolver unit tests, plan-eng coverage audit E2E, and triage E2E

- TEST_FAILURE_TRIAGE resolver: 6 unit tests verifying all triage steps (T1-T4),
  REPO_MODE branching, and safety default for ambiguous failures
- plan-eng-coverage-audit E2E: tests /plan-eng-review coverage audit codepath
  (gap identified during eng review — existed on neither branch)
- ship-triage E2E: planted-bug fixture with in-branch (truncate null) and
  pre-existing (divide-by-zero) failures; verifies correct classification
- Touchfile entries for diff-based test selection

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: regenerate stale Codex SKILL.md for retro

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: gstack-repo-mode handles repos without origin remote

Split `git remote get-url origin` into a separate variable with `|| true`
so the script doesn't crash under `set -euo pipefail` in local-only repos.
Falls back to REPO_MODE=unknown gracefully.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: REPO_MODE defaults to unknown when helper emits nothing

Changed preamble from `source <(...) || REPO_MODE=unknown` (which doesn't
catch empty output) to `source <(...) || true` followed by
`REPO_MODE=${REPO_MODE:-unknown}`. Regenerated all SKILL.md files.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: triage E2E runs both test files in subprocesses

math.test.js called process.exit(1) which killed the runner before
string.test.js could execute. Changed test runner to use child_process
so each test runs independently and both failure classes are exercised.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: gstack-repo-mode handles repos without origin remote

Fall back through origin/main → origin/master → HEAD when
git symbolic-ref refs/remotes/origin/HEAD is not set. Prevents
shortlog crash in repos where origin/HEAD isn't configured.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: triage E2E runs both test files in subprocesses

Add assertions verifying both math.test.js (pre-existing failure) and
string.test.js (in-branch failure) actually executed during triage.
Prevents false passes where only one failure class is exercised.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: REPO_MODE defaults to unknown when helper emits nothing

- Remove head -20 truncation that biased solo classification by
  dropping low-volume contributors from the denominator
- Use atomic write (mktemp + mv) for cache to prevent concurrent
  preamble reads from seeing partial JSON

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add test coverage catalog to CHANGELOG + update project structure

- CHANGELOG: add 6 entries for coverage audit, review Step 4.75, E2E
  recommendations, regression iron rule, failure triage, repo-mode fix
- CLAUDE.md: add missing skill directories (autoplan, benchmark, canary,
  codex, land-and-deploy, setup-deploy) to project structure

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.10.1.0)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: CHANGELOG rules — branch-scoped versions, never fold into old entries

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
407b1569 — Garry Tan a month ago
feat: /autoplan — auto-review pipeline (v0.10.0.0) (#327)

* feat: /autoplan skill — auto-review pipeline with decision audit trail

Thin orchestrator that reads CEO, design, and eng review skills from disk
and runs them at full depth with auto-decisions using 6 encoded principles.
Surfaces taste decisions at a final approval gate.

* chore: wire /autoplan into routing, touchfiles, and validation tests

* chore: bump version and changelog (v0.10.0.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
dbd98aff — Garry Tan a month ago
feat: harden /office-hours diagnostic rigor (v0.9.9.0) (#307)

* feat: harden /office-hours diagnostic rigor — anti-sycophancy + pushback patterns

Address user feedback that gstack compromises toward founder input while
dbs-diagnosis maintains strict self-consistency. Five changes:

1. Hardened Response Posture — "direct to discomfort" replaces "not cruel"
2. Anti-Sycophancy Rules — banned phrases + evidence-based position-taking
3. Pushback Patterns — 5 worked BAD/GOOD examples for common founder evasions
4. Post-Q1 Framing Check — challenges language precision and hidden assumptions
5. Gated Escape Hatch — 2 more questions before skip, double-refusal respected

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.9.9.0)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2c0d4b39 — Garry Tan a month ago
docs: v0.9.8.0 — deploy pipeline docs + pre-merge readiness gate (#306)

* docs: v0.9.8.0 — deploy pipeline + E2E performance + pre-merge gate

CHANGELOG: added v0.9.8.0 entry covering /land-and-deploy, /canary,
/benchmark, /setup-deploy, /review perf pass, E2E model pinning,
and 3 test fixes.

README: added 4 new skills to tables and install instructions,
updated specialist/tool counts (18+7), added deploy pipeline to
"What's new" section.

/land-and-deploy: added Step 3.5 pre-merge readiness gate that
checks review dashboard, E2E results, free tests, and doc-release
status before merging. Uses AskUserQuestion for explicit confirmation.

VERSION: 0.9.7.0 → 0.9.8.0
TODOS: updated deploy pipeline to Completed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: comprehensive pre-merge readiness gate in /land-and-deploy

Step 3.5 now checks 5 dimensions before allowing merge:

1. Review staleness — compares review commit hash against HEAD,
   flags if significant code changes happened after last review
2. Tests — runs free tests inline, checks today's E2E and LLM
   eval results from ~/.gstack-dev/evals/
3. PR body accuracy — compares PR description against actual
   commits, flags missing features or stale descriptions
4. Document-release — checks if CHANGELOG/VERSION were updated
   when new features are present in the diff
5. Full readiness report — ASCII dashboard with warnings/blockers,
   explicit AskUserQuestion confirmation required before merge

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
00bc482f — Garry Tan a month ago
feat: /land-and-deploy, /canary, /benchmark + perf review (v0.7.0) (#183)

* feat: add /canary, /benchmark, /land-and-deploy skills (v0.7.0)

Three new skills that close the deploy loop:
- /canary: standalone post-deploy monitoring with browse daemon
- /benchmark: performance regression detection with Web Vitals
- /land-and-deploy: merge PR, wait for deploy, canary verify production

Incorporates patterns from community PR #151.

Co-Authored-By: HMAKT99 <HMAKT99@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add Performance & Bundle Impact category to review checklist

New Pass 2 (INFORMATIONAL) category catching heavy dependencies
(moment.js, lodash full), missing lazy loading, synchronous scripts,
CSS @import blocking, fetch waterfalls, and tree-shaking breaks.

Both /review and /ship automatically pick this up via checklist.md.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add {{DEPLOY_BOOTSTRAP}} resolver + deployed row in dashboard

- New generateDeployBootstrap() resolver auto-detects deploy platform
  (Vercel, Netlify, Fly.io, GH Actions, etc.), production URL, and
  merge method. Persists to CLAUDE.md like test bootstrap.
- Review Readiness Dashboard now shows a "Deployed" row from
  /land-and-deploy JSONL entries (informational, never gates shipping).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: mark 3 TODOs completed, bump v0.7.0, update CHANGELOG

Superseded by /land-and-deploy:
- /merge skill — review-gated PR merge
- Deploy-verify skill
- Post-deploy verification (ship + browse)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: /setup-deploy skill + platform-specific deploy verification

- New /setup-deploy skill: interactive guided setup for deploy configuration.
  Detects Fly.io, Render, Vercel, Netlify, Heroku, Railway, GitHub Actions,
  and custom deploy scripts. Writes config to CLAUDE.md with custom hooks
  section for non-standard setups.

- Enhanced deploy bootstrap: platform-specific URL resolution (fly.toml app
  → {app}.fly.dev, render.yaml → {service}.onrender.com, etc.), deploy
  status commands (fly status, heroku releases), and custom deploy hooks
  section in CLAUDE.md for manual/scripted deploys.

- Platform-specific deploy verification in /land-and-deploy Step 6:
  Strategy A (GitHub Actions polling), Strategy B (platform CLI: fly/render/heroku),
  Strategy C (auto-deploy: vercel/netlify), Strategy D (custom hooks from CLAUDE.md).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: E2E + LLM-judge evals for deploy skills

- 4 E2E tests: land-and-deploy (Fly.io detection + deploy report),
  canary (monitoring report structure), benchmark (perf report schema),
  setup-deploy (platform detection → CLAUDE.md config)
- 4 LLM-judge evals: workflow quality for all 4 new skills
- Touchfile entries for diff-based test selection (E2E + LLM-judge)
- 460 free tests pass, 0 fail

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: harden E2E tests — server lifecycle, timeouts, preamble budget, skip flaky

Cross-cutting fixes:
- Pre-seed ~/.gstack/.completeness-intro-seen and ~/.gstack/.telemetry-prompted
  so preamble doesn't burn 3-7 turns on lake intro + telemetry in every test
- Each describe block creates its own test server instance instead of sharing
  a global that dies between suites

Test fixes (5 tests):
- /qa quick: own server instance + preamble skip
- /review SQL injection: timeout 90→180s, maxTurns 15→20, added assertion
  that review output actually mentions SQL injection
- /review design-lite: maxTurns 25→35 + preamble skip (now detects 7/7)
- ship-base-branch: both timeouts 90→150/180s + preamble skip
- plan-eng artifact: clean stale state in beforeAll, maxTurns 20→25

Skipped (4 flaky/redundant tests):
- contributor-mode: tests prompt compliance, not skill functionality
- design-consultation-research: WebSearch-dependent, redundant with core
- design-consultation-preview: redundant with core test
- /qa bootstrap: too ambitious (65 turns, installs vitest)

Also: preamble skip added to qa-only, qa-fix-loop, design-consultation-core,
and design-consultation-existing prompts. Updated touchfiles entries and
touchfiles.test.ts. Added honest comment to codex-review-findings.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: redesign 6 skipped/todo E2E tests + add test.concurrent support

Redesigned tests (previously skipped/todo):
- contributor-mode: pre-fail approach, 5 turns/30s (was 10 turns/90s)
- design-consultation-research: WebSearch-only, 8 turns/90s (was 45/480s)
- design-consultation-preview: preview HTML only, 8 turns/90s (was 30/480s)
- qa-bootstrap: bootstrap-only, 12 turns/90s (was 65/420s)
- /ship workflow: local bare remote, 15 turns/120s (was test.todo)
- /setup-browser-cookies: browser detection smoke, 5 turns/45s (was test.todo)

Added testConcurrentIfSelected() helper for future parallelization.
Updated touchfiles entries for all 6 re-enabled tests.

Target: 0 skip, 0 todo, 0 fail across all E2E tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: relax contributor-mode assertions — test structure not exact phrasing

* perf: enable test.concurrent for 31 independent E2E tests

Convert 18 skill-e2e, 11 routing, and 2 codex tests from sequential
to test.concurrent. Only design-consultation tests (4) remain sequential
due to shared designDir state. Expected ~6x speedup on Teams high-burst.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add --concurrent flag to bun test + convert remaining 4 sequential tests

bun's test.concurrent only works within a describe block, not across
describe blocks. Adding --concurrent to the CLI command makes ALL tests
concurrent regardless of describe boundaries. Also converted the 4
design-consultation tests to concurrent (each already independent).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* perf: split monolithic E2E test into 8 parallel files

Split test/skill-e2e.test.ts (3442 lines) into 8 category files:
- skill-e2e-browse.test.ts (7 tests)
- skill-e2e-review.test.ts (7 tests)
- skill-e2e-qa-bugs.test.ts (3 tests)
- skill-e2e-qa-workflow.test.ts (4 tests)
- skill-e2e-plan.test.ts (6 tests)
- skill-e2e-design.test.ts (7 tests)
- skill-e2e-workflow.test.ts (6 tests)
- skill-e2e-deploy.test.ts (4 tests)

Bun runs each file in its own worker = 10 parallel workers
(8 split + routing + codex). Expected: 78 min → ~12 min.

Extracted shared helpers to test/helpers/e2e-helpers.ts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* perf: bump default E2E concurrency to 15

* perf: add model pinning infrastructure + rate-limit telemetry to E2E runner

Default E2E model changed from Opus to Sonnet (5x faster, 5x cheaper).
Session runner now accepts `model` option with EVALS_MODEL env var override.
Added timing telemetry (first_response_ms, max_inter_turn_ms) and wall_clock_ms
to eval-store for diagnosing rate-limit impact. Added EVALS_FAST test filtering.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve 3 E2E test failures — tmpdir race, wasted turns, brittle assertions

plan-design-review-plan-mode: give each test its own tmpdir to eliminate
race condition where concurrent tests pollute each other's working directory.

ship-local-workflow: inline ship workflow steps in prompt instead of having
agent read 700+ line SKILL.md (was wasting 6 of 15 turns on file I/O).

design-consultation-core: replace exact section name matching with fuzzy
synonym-based matching (e.g. "Colors" matches "Color", "Type System"
matches "Typography"). All 7 sections still required, LLM judge still hard fail.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* perf: pin quality tests to Opus, add --retry 2 and test:e2e:fast tier

~10 quality-sensitive tests (planted-bug detection, design quality judge,
strategic review, retro analysis) explicitly pinned to Opus. ~30 structure
tests default to Sonnet for 5x speed improvement.

Added --retry 2 to all E2E scripts for flaky test resilience.
Added test:e2e:fast script that excludes 8 slowest tests for quick feedback.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: mark E2E model pinning TODO as shipped

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add SKILL.md merge conflict directive to CLAUDE.md

When resolving merge conflicts on generated SKILL.md files, always merge
the .tmpl templates first, then regenerate — never accept either side's
generated output directly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add DEPLOY_BOOTSTRAP resolver to gen-skill-docs

The land-and-deploy template referenced {{DEPLOY_BOOTSTRAP}} but no resolver
existed, causing gen-skill-docs to fail. Added generateDeployBootstrap() that
generates the deploy config detection bash block (check CLAUDE.md for persisted
config, auto-detect platform from config files, detect deploy workflows).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: regenerate SKILL.md files after DEPLOY_BOOTSTRAP fix

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: move prompt temp file outside workingDirectory to prevent race condition

The .prompt-tmp file was written inside workingDirectory, which gets deleted
by afterAll cleanup. With --concurrent --retry, afterAll can interleave with
retries, causing "No such file or directory" crashes at 0s (seen in
review-design-lite and office-hours-spec-review).

Fix: write prompt file to os.tmpdir() with a unique suffix so it survives
directory cleanup. Also convert review-design-lite from describeE2E to
describeIfSelected for proper diff-based test selection.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add --retry 2 --concurrent flags to test:evals scripts for consistency

test:evals and test:evals:all were missing the retry and concurrency flags
that test:e2e already had, causing inconsistent behavior between the two
script families.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: HMAKT99 <HMAKT99@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
8321115a — Garry Tan a month ago
feat: plan file review report + enriched JSONL logging (v0.9.7.0) (#303)

* feat: plan file review report — markdown table appended to plan files

Adds {{PLAN_FILE_REVIEW_REPORT}} template resolver that instructs review
skills to write a structured markdown table (with Trigger/Why/Status/Findings
columns) to the plan file itself, so review status is visible to anyone
reading the plan — not just in conversation output.

Integrated into plan-ceo-review, plan-eng-review, plan-design-review, and
codex skill templates.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: enrich JSONL review logs for accurate plan file report

CEO reviews now log scope_proposed/accepted/deferred counts,
eng reviews log total issues_found, design reviews log initial_score
for before→after tracking, and codex reviews log findings_fixed.

Report generator references these fields directly instead of
requiring agents to reconstruct from partial data. Also fixes
footer replacement to handle mid-file sections robustly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.9.7.0)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
6c69febf — Garry Tan a month ago
feat: auto-scaled adversarial review (v0.9.5.0) (#297)

* feat: auto-scaled adversarial review by diff size

Replace config-driven Codex review step with automatic adversarial review
that scales by diff size: small (<50 lines) skips adversarial, medium
(50-199) gets cross-model adversarial, large (200+) gets all 4 passes.

Adds Claude adversarial subagent as fallback when Codex unavailable.
Review log uses new "adversarial-review" skill name with source/tier fields.
Dashboard updated to read both adversarial-review and legacy codex-review.

* chore: regenerate SKILL.md files for auto-scaled adversarial review

* chore: bump version and changelog (v0.9.5.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: allow user override of adversarial review tier

Users can now say "paranoid review", "run all passes", "full adversarial",
etc. to force the large tier regardless of diff size.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
f075cb75 — Garry Tan a month ago
feat: Search Before Building — builder ethos + skill integrations (v0.9.5.0) (#298)

* feat: ETHOS.md — gstack builder philosophy

Standalone document capturing the four principles: The Golden Age,
Boil the Lake, Search Before Building, and Build for Yourself.

Introduces the three-layer knowledge framework (tried-and-true,
new-and-popular, first-principles) and the Eureka Moment concept —
when first-principles reasoning reveals conventional wisdom is wrong.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: Search Before Building preamble section + CLAUDE.md

Add generateSearchBeforeBuildingSection(ctx) to gen-skill-docs.ts.
Every workflow skill now gets a compact router section covering:
- Three layers of knowledge (tried-and-true, new-and-popular, first-principles)
- Eureka moment format and jq-based JSONL logging
- WebSearch fallback clause
- ETHOS.md reference via ctx.paths.skillRoot resolver

Also adds compact "Search before building" section to CLAUDE.md.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: skill-specific Search Before Building integrations

8 template changes:
- /office-hours: Phase 2.75 Landscape Awareness (WebSearch + three-layer synthesis)
- /plan-eng-review: Step 0 search check with layer provenance annotations
- /investigate: external pattern search + search escalation on hypothesis failure
- /plan-ceo-review: Landscape Check before scope challenge
- /review: search-before-recommending for fix patterns
- /qa-only: WebSearch in allowed-tools
- /design-consultation: three-layer synthesis backport in Phase 2 Step 3
- /retro: eureka moment tracking from ~/.gstack/analytics/eureka.jsonl

All search steps include WebSearch fallback clause.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: v0.9.5.0 — Builder Ethos (CHANGELOG + VERSION + TODOS)

ETHOS.md + Search Before Building across all workflow skills.
Deferred: first-time intro flow (blocked on blog post).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address Codex review — sanitize search, privacy gate, ETHOS.md sidecar

Three fixes from adversarial Codex review:
- /investigate: sanitize error messages before searching (strip hostnames,
  IPs, file paths, SQL, customer data). Skip search if unsanitizable.
- /office-hours: add privacy gate before landscape search. Use generalized
  category terms, never the user's specific product name or stealth idea.
- setup: link ETHOS.md into .agents/skills/gstack/ sidecar so workspace-
  local Codex sessions can find the builder philosophy.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: sanitize Phase 2 external pattern search in /investigate

The Phase 2 external search also sent raw error messages to WebSearch.
Apply same sanitization rule as Phase 3 search escalation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: sync documentation with shipped changes

- ARCHITECTURE.md: preamble now handles 5 things (add Search Before Building)
- CLAUDE.md: add ETHOS.md to project structure tree
- README.md: add ETHOS.md to docs table

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
709bed9f — Garry Tan a month ago
feat: CEO review handoff context for /office-hours chaining (v0.9.5.0) (#288)

* feat: CEO review saves handoff context for /office-hours chaining

When /plan-ceo-review suggests running /office-hours, it now saves a
handoff note with system audit findings and discussion context. On
re-invocation, the note is auto-discovered and used to avoid redundant
questions. Addresses Codex review feedback: no step tracking or resume
logic — just context as additional input (same pattern as design docs).

* fix: remove PR size nagging from /retro codex variant

Sync the Codex SKILL.md variant with the retro template changes from
v0.9.4.1 — removes "flag these with file counts" for XL PRs and
reframes PR size discussion as neutral data.

* chore: bump version and changelog (v0.9.5.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
1f4b6fd7 — Garry Tan a month ago
fix: remove PR size nagging from /retro (v0.9.4.1) (#264)

The retro template no longer flags XL PRs as problems or recommends
splitting them. PR size distribution is still reported as neutral data.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
9811ed37 — Garry Tan a month ago
feat: default codex reviews in /ship and /review (v0.9.4.0) (#256)

* feat: default codex reviews in /ship and /review with xhigh reasoning

Codex code reviews are now opt-in-once-then-always-on via a one-time
adoption prompt. When enabled, both review + adversarial run automatically
on every /ship and /review — no more choosing between them.

Key changes:
- New {{CODEX_REVIEW_STEP}} resolver centralizes Codex review logic (DRY)
- Three-state config: enabled/not-set/disabled via gstack-config
- P1 findings default to "Investigate and fix" instead of "Ship anyway"
- All reasoning bumped to xhigh (review, adversarial, consult)
- Codex review step stripped from codex-host variants (no self-invocation)
- Ship "Never ask" rule updated to accurately list quality-gate stops
- Error handling for auth, timeout, empty response (all non-blocking)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: update touchfiles test for plan-ceo-review-benefits dependency

The merge from main added plan-ceo-review-benefits to E2E_TOUCHFILES,
which means plan-ceo-review/SKILL.md now selects 3 tests, not 2.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: default codex reviews in /ship and /review (v0.9.4.0)

Codex code reviews now run automatically — both review + adversarial
challenge — with a one-time opt-in prompt for new users. All modes use
xhigh reasoning. Codex-host builds strip the step to prevent recursion.

Fixes from Codex review: TMPERR properly defined, stderr captured for
both review and adversarial, error handling before log persist, commit
hash included in review log for staleness tracking.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Next