~cytrogen/gstack

ref: fdd45188ff0534bec500d4d7eb6f999a310f6a61 gstack/design-consultation d---------
fdd45188 — Garry Tan a month ago
fix: gstack-slug bash compatibility — source to eval (#354)

* fix: replace source <(gstack-slug) with eval for bash compatibility

Under bash with set -euo pipefail, source <(cmd) process substitution
doesn't reliably set variables in the caller's scope. The variables
stay empty and -u (nounset) crashes the script. eval "$(cmd)" works
correctly in both bash and zsh.

Fixes: gstack-review-read, gstack-review-log, gstack-slug comment,
gen-skill-docs.ts resolver functions, and regression tests.

* chore: bump version and changelog (v0.11.4.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
4cd4d11c — Garry Tan a month ago
feat: design outside voices — cross-model design critique (v0.11.3.0) (#347)

* feat(gen-skill-docs): add design outside voices + hard rules resolvers

Add generateDesignOutsideVoices() — parallel Codex + Claude subagent
dispatch for cross-model design critique with litmus scorecard synthesis.
Branches per skillName (plan-design-review, design-review, design-consultation)
with task-specific reasoning effort (high for analytical, medium for creative).

Add generateDesignHardRules() — OpenAI Frontend Skill hard rules + gstack
AI slop blacklist unified into one shared block with classifier step
(landing page vs app UI vs hybrid).

Extract AI_SLOP_BLACKLIST constant from inline prose in generateDesignMethodology()
for DRY. Extend generateDesignReviewLite() with lightweight Codex block.
Extend generateDesignSketch() with outside voices opt-in after wireframe.

Source: OpenAI "Designing Delightful Frontends with GPT-5.4" (Mar 2026)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(design skills): add outside voices + hard rules to all design templates

Insert {{DESIGN_OUTSIDE_VOICES}} in plan-design-review (between Step 0D
and Pass 1), design-review (between Phase 6 and Phase 7), and
design-consultation (between Phase 2 and Phase 3).

Insert {{DESIGN_HARD_RULES}} in plan-design-review Pass 4 and design-review
Phase 3 checklist.

DESIGN_REVIEW_LITE in /ship and /review now includes a Codex design voice
block with litmus checks.

DESIGN_SKETCH in /office-hours now includes outside voices opt-in after
wireframe approval.

Regenerated all SKILL.md files (both Claude and Codex hosts).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add resolver tests + touchfiles for design outside voices

Add 18 test cases across 4 new describe blocks:
- DESIGN_OUTSIDE_VOICES: host guard, skillName branching, reasoning effort
- DESIGN_HARD_RULES: classifier, 3 rule sets, slop blacklist, OpenAI criteria
- DESIGN_SKETCH extended: outside voices step, original wireframe preserved
- DESIGN_REVIEW_LITE extended: Codex block, codex host exclusion

Update touchfiles: add scripts/gen-skill-docs.ts to design skill E2E
test dependencies for accurate diff-based test selection.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.11.3.0)

Design outside voices — parallel Codex + Claude subagent for cross-model
design critique with litmus scorecard synthesis. OpenAI hard rules + gstack
slop blacklist unified. Classifier for landing page vs app UI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: generate .agents/ on demand in tests (not checked in since v0.11.2.0)

.agents/ is gitignored since v0.11.2.0 — tests that read Codex-host
SKILL.md files now generate them on demand via `bun run gen-skill-docs.ts
--host codex` before reading. Fixes test failures on fresh clones.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
264c1ca2 — Garry Tan a month ago
feat: plan files always show review status (v0.11.1.1) (#345)

* feat: plan files always show review status via preamble footer

Add Plan Status Footer to generateCompletionStatus() in the preamble.
When in plan mode before ExitPlanMode, Claude writes a GSTACK REVIEW
REPORT section to the plan file — either populated from review logs
or a "NO REVIEWS YET" placeholder. Skips if a review skill already
wrote a richer report.

* chore: bump version and changelog (v0.11.1.1)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
7ff0f84b — Garry Tan a month ago
feat: test coverage catalog — shared audit across plan/ship/review (v0.10.1.0) (#259)

* refactor: extract {{TEST_COVERAGE_AUDIT}} shared resolver

DRY extraction of the test coverage audit methodology into a shared
generator function with three explicit placeholders:
- TEST_COVERAGE_AUDIT_PLAN (plan-eng-review)
- TEST_COVERAGE_AUDIT_SHIP (ship)
- TEST_COVERAGE_AUDIT_REVIEW (review)

Shared across all modes: codepath tracing, ASCII diagram format,
quality scoring rubric, E2E test decision matrix, regression rule,
and test framework detection via CLAUDE.md.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: plan-eng-review uses shared test coverage audit

Replace the thin 6-line Section 3 test review with the full shared
methodology via {{TEST_COVERAGE_AUDIT_PLAN}}. Plan mode now:
- Traces every codepath with full ASCII diagrams
- Adds missing tests to the plan (not just "check for tests")
- Writes test plan artifact for /qa consumption
- Includes E2E/eval recommendations and regression detection

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: ship uses shared test coverage audit

Replace 135 lines of inline Step 3.4 methodology with
{{TEST_COVERAGE_AUDIT_SHIP}}. Functionally identical output plus:
- E2E test decision matrix (marks paths needing E2E vs unit)
- Eval recommendations for LLM prompt changes
- Regression detection iron rule
- Test framework detection via CLAUDE.md first
- Test plan artifact for /qa consumption

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: /review Step 4.75 test coverage diagram

Add codepath tracing to the pre-landing review via
{{TEST_COVERAGE_AUDIT_REVIEW}}. Review mode:
- Produces ASCII coverage diagram (same methodology as plan/ship)
- Generates tests for gaps via Fix-First (ASK user)
- Subsumes Pass 2 "Test Gaps" checklist category
- Gaps are INFORMATIONAL findings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: mode differentiation + regression guard for coverage audit

10 new tests verifying the three TEST_COVERAGE_AUDIT placeholders:
- All modes share: codepath tracing, E2E matrix, regression rule
- Plan mode: adds to plan + artifact, no ship-specific content
- Ship mode: auto-generates + before/after count + coverage summary
- Review mode: Fix-First ASK + INFORMATIONAL, no artifact
- Regression guard: ship SKILL.md preserves all key phrases

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: extract shared coverage audit fixture + review E2E

- Extract billing.ts fixture into coverage-audit-fixture.ts (DRY)
- Refactor ship-coverage-audit E2E to use shared fixture
- Add review-coverage-audit E2E for Step 4.75
- Update touchfiles: both E2Es depend on shared fixture

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: strengthen E2E assertions for coverage audit tests

The coverage audit E2E tests (ship + review) were only asserting
exitReason === 'success' and readCalls > 0 — they passed even
if the agent produced no coverage diagram. Add assertion that
the output contains either GAP or TESTED markers.

Found during /review.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: plan mode traces the plan, not the git diff

Codex adversarial review caught that plan-eng-review was inheriting
"git diff origin/<base>...HEAD" from the shared resolver, but plan mode
reviews a plan document, not a code diff. Plan mode now says:
"Trace every codepath in the plan" and "Read the plan document."

Ship and review modes keep the git diff instruction.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.9.5.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: test coverage catalog + failure triage (merged branches) (#285)

* feat: add bin/gstack-repo-mode — solo vs collaborative detection with caching

Detects whether a repo is solo-dev (one person does 80%+ of recent commits)
or collaborative. Uses 90-day git shortlog window with 7-day cache in
~/.gstack/projects/{SLUG}/repo-mode.json. Config override via
`gstack-config set repo_mode solo|collaborative` takes precedence over
the heuristic. Minimum 5 commits required to classify (otherwise unknown).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: test failure ownership triage — see something say something

Adds two new preamble sections to all gstack skills:
- Repo Ownership Mode: explains solo vs collaborative behavior
- See Something, Say Something: proactive issue flagging principle

Adds {{TEST_FAILURE_TRIAGE}} template variable (opt-in, used by /ship):
- Classifies test failures as in-branch vs pre-existing
- Solo mode defaults to "investigate and fix now"
- Collaborative mode offers "blame + assign GitHub issue" option
- Also offers P0 TODO and skip options

/ship Step 3 now triages test failures instead of hard-stopping on all
failures. In-branch failures still block shipping. Pre-existing failures
get user-directed triage based on repo mode.

Adds P2 TODO for gstack notes system (deferred lightweight reminder).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: regenerate SKILL.md files for Claude and Codex hosts

All 22 Claude skills and 21 Codex skills regenerated with new preamble
sections (Repo Ownership Mode, See Something Say Something) and
{{TEST_FAILURE_TRIAGE}} resolved in ship/SKILL.md.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: validate repo mode values to prevent shell injection

Codex adversarial review found that unvalidated config/cache values
could be injected into shell via source <(gstack-repo-mode). Added
validate_mode() that only allows solo|collaborative|unknown — anything
else becomes "unknown". Prevents persistent code execution through
malicious config.yaml or tampered cache JSON.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: shell injection via branch names + feature-branch sampling bias

Codex code review found two issues:

P1: eval $(gstack-slug) in gstack-repo-mode executes branch names as
shell. Branch names like foo$(touch${IFS}pwned) are valid git refs and
would execute arbitrary commands. Fix: compute SLUG directly with sed
instead of eval'ing gstack-slug output.

P2: git shortlog HEAD only sees current branch history. On feature
branches that haven't merged main recently, other contributors disappear
from the sample. Fix: use git shortlog on the default branch
(origin/main) instead of HEAD.

Also improved blame lookup in collaborative triage to check both the
test file and the production code it covers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: broaden codex-host stripping test to accommodate triage section

"Investigate and fix" now appears in TEST_FAILURE_TRIAGE (not just the
Codex review step). Use CODEX_REVIEWS config string as a more specific
marker for detecting the Codex review step in Codex-hosted skills.

* fix: replace template placeholder in TODOS.md with readable text

{{TEST_FAILURE_TRIAGE}} is template syntax but TODOS.md is not processed
by gen-skill-docs — replaced with human-readable reference.

* chore: bump version and changelog (v0.9.5.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add bin/ directory to project structure in CLAUDE.md

* test: add triage resolver unit tests, plan-eng coverage audit E2E, and triage E2E

- TEST_FAILURE_TRIAGE resolver: 6 unit tests verifying all triage steps (T1-T4),
  REPO_MODE branching, and safety default for ambiguous failures
- plan-eng-coverage-audit E2E: tests /plan-eng-review coverage audit codepath
  (gap identified during eng review — existed on neither branch)
- ship-triage E2E: planted-bug fixture with in-branch (truncate null) and
  pre-existing (divide-by-zero) failures; verifies correct classification
- Touchfile entries for diff-based test selection

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: regenerate stale Codex SKILL.md for retro

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: gstack-repo-mode handles repos without origin remote

Split `git remote get-url origin` into a separate variable with `|| true`
so the script doesn't crash under `set -euo pipefail` in local-only repos.
Falls back to REPO_MODE=unknown gracefully.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: REPO_MODE defaults to unknown when helper emits nothing

Changed preamble from `source <(...) || REPO_MODE=unknown` (which doesn't
catch empty output) to `source <(...) || true` followed by
`REPO_MODE=${REPO_MODE:-unknown}`. Regenerated all SKILL.md files.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: triage E2E runs both test files in subprocesses

math.test.js called process.exit(1) which killed the runner before
string.test.js could execute. Changed test runner to use child_process
so each test runs independently and both failure classes are exercised.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: gstack-repo-mode handles repos without origin remote

Fall back through origin/main → origin/master → HEAD when
git symbolic-ref refs/remotes/origin/HEAD is not set. Prevents
shortlog crash in repos where origin/HEAD isn't configured.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: triage E2E runs both test files in subprocesses

Add assertions verifying both math.test.js (pre-existing failure) and
string.test.js (in-branch failure) actually executed during triage.
Prevents false passes where only one failure class is exercised.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: REPO_MODE defaults to unknown when helper emits nothing

- Remove head -20 truncation that biased solo classification by
  dropping low-volume contributors from the denominator
- Use atomic write (mktemp + mv) for cache to prevent concurrent
  preamble reads from seeing partial JSON

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add test coverage catalog to CHANGELOG + update project structure

- CHANGELOG: add 6 entries for coverage audit, review Step 4.75, E2E
  recommendations, regression iron rule, failure triage, repo-mode fix
- CLAUDE.md: add missing skill directories (autoplan, benchmark, canary,
  codex, land-and-deploy, setup-deploy) to project structure

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.10.1.0)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: CHANGELOG rules — branch-scoped versions, never fold into old entries

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
f075cb75 — Garry Tan a month ago
feat: Search Before Building — builder ethos + skill integrations (v0.9.5.0) (#298)

* feat: ETHOS.md — gstack builder philosophy

Standalone document capturing the four principles: The Golden Age,
Boil the Lake, Search Before Building, and Build for Yourself.

Introduces the three-layer knowledge framework (tried-and-true,
new-and-popular, first-principles) and the Eureka Moment concept —
when first-principles reasoning reveals conventional wisdom is wrong.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: Search Before Building preamble section + CLAUDE.md

Add generateSearchBeforeBuildingSection(ctx) to gen-skill-docs.ts.
Every workflow skill now gets a compact router section covering:
- Three layers of knowledge (tried-and-true, new-and-popular, first-principles)
- Eureka moment format and jq-based JSONL logging
- WebSearch fallback clause
- ETHOS.md reference via ctx.paths.skillRoot resolver

Also adds compact "Search before building" section to CLAUDE.md.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: skill-specific Search Before Building integrations

8 template changes:
- /office-hours: Phase 2.75 Landscape Awareness (WebSearch + three-layer synthesis)
- /plan-eng-review: Step 0 search check with layer provenance annotations
- /investigate: external pattern search + search escalation on hypothesis failure
- /plan-ceo-review: Landscape Check before scope challenge
- /review: search-before-recommending for fix patterns
- /qa-only: WebSearch in allowed-tools
- /design-consultation: three-layer synthesis backport in Phase 2 Step 3
- /retro: eureka moment tracking from ~/.gstack/analytics/eureka.jsonl

All search steps include WebSearch fallback clause.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: v0.9.5.0 — Builder Ethos (CHANGELOG + VERSION + TODOS)

ETHOS.md + Search Before Building across all workflow skills.
Deferred: first-time intro flow (blocked on blog post).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address Codex review — sanitize search, privacy gate, ETHOS.md sidecar

Three fixes from adversarial Codex review:
- /investigate: sanitize error messages before searching (strip hostnames,
  IPs, file paths, SQL, customer data). Skip search if unsanitizable.
- /office-hours: add privacy gate before landscape search. Use generalized
  category terms, never the user's specific product name or stealth idea.
- setup: link ETHOS.md into .agents/skills/gstack/ sidecar so workspace-
  local Codex sessions can find the builder philosophy.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: sanitize Phase 2 external pattern search in /investigate

The Phase 2 external search also sent raw error messages to WebSearch.
Apply same sanitization rule as Phase 3 search escalation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: sync documentation with shipped changes

- ARCHITECTURE.md: preamble now handles 5 things (add Search Before Building)
- CLAUDE.md: add ETHOS.md to project structure tree
- README.md: add ETHOS.md to docs table

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
91bea066 — Garry Tan a month ago
fix: plan mode exception for review log + telemetry writes (v0.9.0.1) (#234)

* fix: plan mode exception for review log + telemetry writes

Add explicit plan-mode exception notes to review log sections in all
3 plan review skill templates and the telemetry section in gen-skill-docs.ts.
When Claude runs in plan mode, it self-censors bash writes — but review
logging and telemetry write to ~/.gstack/ (user metadata, not project
files). The preamble already writes to the same directory successfully.
The exception note gives Claude a reasoning chain: safety argument,
precedent, and consequence of skipping.

* chore: regenerate Codex/agents SKILL.md files with plan-mode exception

* chore: bump version and changelog (v0.9.0.1)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: community-first telemetry opt-in with anonymous fallback

Default opt-in is now "Help gstack get better!" (community mode with
stable device ID). If declined, offers anonymous mode as a softer
alternative before fully off.

* chore: regenerate SKILL.md files with community-first telemetry prompt

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
3b22fc39 — Garry Tan a month ago
feat: opt-in usage telemetry + community intelligence platform (v0.8.6) (#210)

* feat: add gstack-telemetry-log and gstack-analytics scripts

Local telemetry infrastructure for gstack usage tracking.
gstack-telemetry-log appends JSONL events with skill name, duration,
outcome, session ID, and platform info. Supports off/anonymous/community
privacy tiers. gstack-analytics renders a personal usage dashboard
from local data.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add telemetry preamble injection + opt-in prompt + epilogue

Extends generatePreamble() with telemetry start block (config read,
timer, session ID, .pending marker), opt-in prompt (gated by
.telemetry-prompted), and epilogue instructions for Claude to log
events after skill completion. Adds 5 telemetry tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: regenerate all SKILL.md files with telemetry blocks

Automated regeneration from gen-skill-docs.ts changes. All skills
now include telemetry start block, opt-in prompt, and epilogue.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add Supabase schema, edge functions, and SQL views

Telemetry backend infrastructure: telemetry_events table with RLS
(insert-only), installations table for retention tracking,
update_checks for install pings. Edge functions for update-check
(version + ping), telemetry-ingest (batch insert), and
community-pulse (weekly active count). SQL views for crash
clustering and skill co-occurrence sequences.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add telemetry-sync, community-dashboard, and integration tests

gstack-telemetry-sync: fire-and-forget JSONL → Supabase sync with
privacy tier field stripping, batch limits, and cursor tracking.
gstack-community-dashboard: CLI tool querying Supabase for skill
popularity, crash clusters, and version distribution.
19 integration tests covering all telemetry scripts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: session-specific .pending markers + crash_clusters view fix

Addresses Codex review findings:
- .pending race condition: use .pending-$SESSION_ID instead of
  shared .pending file to prevent concurrent session interference
- crash_clusters view: add total_occurrences and anonymous_occurrences
  columns since anonymous tier has no installation_id
- Added test: own session pending marker is not finalized

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: dual-attempt update check with Supabase install ping

Fires a parallel background curl to Supabase during the slow-path
version fetch. Logs upgrade_prompted event only on fresh fetches
(not cached replays) to avoid overcounting. GitHub remains the
primary version source — Supabase ping is fire-and-forget.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: integrate telemetry usage stats into /retro output

Retro now reads ~/.gstack/analytics/skill-usage.jsonl and includes
gstack usage metrics (skill run counts, top skills, success rate)
in the weekly retrospective output.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: move 'Skill usage telemetry' to Completed in TODOS.md

Implemented in this branch: local JSONL logging, opt-in prompt,
privacy tiers, Supabase backend, community dashboard, /retro
integration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: wire Supabase credentials and expose tables via Data API

Add supabase/config.sh with project URL and publishable key (safe to
commit — RLS restricts to INSERT only). Update telemetry-sync,
community-dashboard, and update-check to source the config and
include proper auth headers for the Supabase REST API.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add SELECT RLS policies to migration for community dashboard reads

All telemetry data is anonymous (no PII), so public reads via the
publishable key are safe. Needed for the community dashboard to
query skill popularity, crash clusters, and version distribution.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.8.6)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: analytics backward-compatible with old JSONL format

Handle old-format events (no event_type field) alongside new format.
Skip hook_fire events. Fix grep -c whitespace issues and unbound
variable errors.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: map JSONL field names to Postgres columns in telemetry-sync

Local JSONL uses short names (v, ts, sessions) but the Supabase
table expects full names (schema_version, event_timestamp,
concurrent_sessions). Add sed mapping during field stripping.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address Codex adversarial findings — cursor, opt-out, queries

- Sync cursor now advances on HTTP 2xx (not grep for "inserted")
- Update-check respects telemetry opt-out before pinging Supabase
- Dashboard queries use correct view column names (total_occurrences)
- Sync strips old-format "repo" field to prevent privacy leak

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add Privacy & Telemetry section to README

Transparent disclosure of what telemetry collects, what it never sends,
how to opt out, and a link to the schema so users can verify.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
c0f3c3a9 — Garry Tan a month ago
fix: security hardening + issue triage (v0.8.3) (#205)

* fix: check for bun before running setup (#147)

Users without bun installed got a cryptic "command not found" error.
Now prints a clear message with install instructions.

Closes #147

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: block SSRF via URL validation in browse commands (#17)

Adds validateNavigationUrl() that blocks non-HTTP(S) schemes (file://,
javascript:, data:) and cloud metadata endpoints (169.254.169.254,
metadata.google.internal). Applied to goto, diff, and newTab commands.
Localhost and private IPs remain allowed for local dev QA.

Closes #17

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: replace eval $(gstack-slug) with source <(...) (#133)

Eliminates unnecessary use of eval across all skill templates and
generated files. source <(...) has identical behavior without the
shell injection surface. Also hardens gstack-diff-scope usage.

Closes #133

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: rename /debug to /investigate to avoid Claude Code conflict (#190)

Claude Code has a built-in /debug command that shadows the gstack skill.
Renaming to /investigate which better reflects the systematic root-cause
investigation methodology.

Closes #190

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add unit tests for path validation helpers

validateOutputPath() and validateReadPath() are security-critical
functions with zero test coverage. Adds 14 tests covering safe paths,
traversal attacks, and prefix collision edge cases.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.8.3)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update /debug → /investigate references in docs

CLAUDE.md, README.md, and docs/skills.md still referenced the old
/debug skill name after the rename.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: harden URL validation against hostname bypasses (Codex P1)

Codex review found that metadata IPs could be reached via hex
(0xA9FEA9FE), decimal (2852039166), octal, trailing dot, and IPv6
bracket forms. Now normalizes hostnames before checking the blocklist
and probes numeric IP representations via URL constructor.

Also moves URL validation before page allocation in newTab() to
prevent zombie tabs on rejection (Codex P3).

5 new test cases for bypass variants.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
d8523301 — Garry Tan a month ago
feat: /codex skill — multi-AI second opinion + proactive suggestions (#197)

* feat: /codex skill — multi-AI second opinion (review, challenge, consult)

Three modes: code review with pass/fail gate, adversarial challenge mode,
and conversational consult with session continuity. First multi-AI skill
in gstack, wrapping OpenAI's Codex CLI.

* feat: integrate /codex into /review, /ship, /plan-eng-review + dashboard

/review offers Codex second opinion after completing its own review.
/ship offers Codex review as optional gate before pushing.
/plan-eng-review offers Codex plan critique after scope challenge.
Review Readiness Dashboard shows Codex Review as optional row.

* chore: bump version and changelog (v0.8.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: codex skill validation (12 stub tests) + E2E eval test

Stub tests (free tier): verify template content — three modes, gate verdict,
session continuity, cost tracking, cross-model comparison, binary discovery,
error handling, mktemp usage, and integrations into /review, /ship, /plan-eng-review.

E2E test (paid tier): runs /codex review on vulnerable fixture repo via
session-runner, verifies output contains findings and GATE verdict.

* fix: codex auth error message — use codex login, not OPENAI_API_KEY

Codex authenticates via ChatGPT OAuth (codex login), not an env var.

* feat: codex uses high reasoning effort by default

gpt-5.2-codex is the only model available with ChatGPT login.
All commands now use model_reasoning_effort="high" for maximum
depth — the whole point is a thorough second opinion.

* feat: crank codex reasoning to xhigh (maximum)

* feat: per-mode reasoning (high for review/consult, xhigh for challenge) + web search

Review and consult use high reasoning — thorough but not slow.
Challenge (adversarial) uses xhigh — maximum depth for breaking code.
All modes enable web_search_cached so Codex can look up docs/APIs.

* refactor: don't hardcode model — use codex default (always latest)

* feat: JSONL output for codex challenge + consult modes

Use --json flag to parse codex's JSONL events, extracting reasoning
traces ([codex thinking]), tool calls ([codex ran]), and token counts.
This gives richer output than the -o flag alone — you can see what
codex thought through before its answer.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: only persist codex-review log when code review actually ran

Don't write a codex-review entry to reviews.jsonl when only the
adversarial challenge (option B) was selected — there's no gate
verdict to record, and a false entry misleads the Review Readiness
Dashboard into thinking a code review happened.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add codex plan review option to /plan-eng-review

After scope challenge (Step 0), offer to have Codex independently
review the plan with a brutally honest tech reviewer persona.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: update e2e test for codex skill

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: codex integration bugs — plan content, review persistence, quoting, stderr

- plan-eng-review: Codex now reads the plan file itself instead of inlining
  content as a CLI arg (avoids ARG_MAX for large plans)
- review: add missing echo to persist codex-review results to reviews.jsonl
- codex: consult mode uses $TMPERR (mktemp) instead of hardcoded stderr path
- codex + review: quote $SLUG/$BRANCH_SLUG in review log paths
- codex: scope plan lookup to current project, warn on cross-project fallback

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add .context/ to .gitignore to prevent session ID leaks

Codex consult mode stores session IDs in .context/codex-session-id.
Without this ignore rule, session IDs could leak into commits.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: proactive skill suggestions + opt-out + trigger phrase tests

- Preamble reads proactive config via gstack-config
- Root SKILL.md.tmpl has lifecycle map (stage → skill suggestion)
- Users can opt out ("stop suggesting") / opt in ("be proactive again")
- Restored trigger phrase validation tests (16 skills × "Use when" check)
- Added missing "Use when" trigger phrases to /debug and /office-hours

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update changelog for v0.8.0 — add proactive suggestions note

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
c4f679d8 — Garry Tan a month ago
feat: safety hook skills + skill usage telemetry (v0.7.1) (#189)

* feat: add /careful, /freeze, /guard, /unfreeze safety hook skills

Four new on-demand skills using Claude Code's PreToolUse hooks:
- /careful: warns before destructive commands (rm -rf, DROP TABLE, force-push, etc.)
- /freeze: blocks file edits outside a specified directory
- /guard: composes both into one command
- /unfreeze: clears freeze boundary without ending session

Pure bash hook scripts with Python fallback for JSON edge cases.
Safe exceptions for build artifacts (node_modules, dist, .next, etc.).
Hook fire telemetry logs pattern name only (never command content).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add skill usage telemetry to preamble

TemplateContext system passes skill name through resolver pipeline so
each generated SKILL.md gets its own name baked into the telemetry line.
Appends to ~/.gstack/analytics/skill-usage.jsonl on every invocation.

Covers 14 preamble-using skills + 4 hook skills (inline telemetry).
JSONL format: {"skill":"ship","ts":"...","repo":"my-project"}

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add analytics CLI for skill usage stats

bun run analytics reads ~/.gstack/analytics/skill-usage.jsonl and shows
top skills, per-repo breakdown, hook fire stats, and daily timeline.
Supports --period 7d/30d/all. Handles missing/empty/malformed data.

22 unit tests cover parsing, filtering, formatting, and edge cases.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add skills-used-this-week to /retro

Retro Step 2 now reads skill-usage.jsonl and shows which gstack skills
were used during the retro window. Follows the same pattern as the
Greptile signal and Backlog Health metrics — read file, filter by date,
aggregate, present. Skips silently if no analytics data exists.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add hook script and telemetry tests

32 unit tests for check-careful.sh covering all 8 destructive patterns,
safe exceptions, Python fallback, and malformed input handling.
7 unit tests for check-freeze.sh covering boundary enforcement,
trailing slash edge case, and missing state file.
Telemetry tests verify per-skill name correctness in generated output.
Adds careful/freeze/guard/unfreeze/document-release to ALL_SKILLS.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version to 0.6.5 + changelog + mark TODOs shipped

Safety hook skills and skill usage telemetry shipped.
Analytics CLI and /retro integration included.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: /debug auto-freezes edits to the module being debugged

Add PreToolUse hooks (Edit/Write) to debug/SKILL.md.tmpl that reference
the existing freeze/bin/check-freeze.sh. After Phase 1 investigation,
/debug locks edits to the narrowest affected directory.

Graceful degradation: if freeze script is unavailable, scope lock is
skipped. Users can run /unfreeze to remove the restriction.

Deferred 6 enhancements to TODOS.md, gated on telemetry showing the
freeze hook actually fires in real debugging sessions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
4fe0ce9c — Garry Tan a month ago
feat: natural language skill routing + proactive suggestions (v0.7.1) (#195)

* feat: add trigger phrases to /debug and /office-hours

These two skills had zero "Use when asked to..." phrases, making them
completely invisible to natural language. Users saying "debug this" or
"brainstorm an idea" would get no skill invocation.

* feat: add proactive triggers to all workflow skills

Every skill now has "Proactively suggest when..." language so Claude
surfaces skills at natural moments — not just when the user says
specific trigger phrases.

* feat: lifecycle map + proactive preference system

Root gstack description now includes a developer workflow guide mapping
12 stages to skills. Preamble reads proactive preference via gstack-config.
Users can opt out with "stop suggesting things" and re-enable with
"be proactive again" — natural language toggle, no CLI needed.

* test: 11 journey-stage E2E routing tests + trigger phrase validation

Each test simulates a real development stage (ideation, plan review,
debug, QA, ship, retro...) with realistic project context and verifies
the right skill fires from natural language alone. 11/11 pass.

* chore: bump version and changelog (v0.7.1)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
6000af45 — Garry Tan a month ago
feat: founder discovery engine + /debug skill — v0.7.0 (#185)

* feat: add escalation protocol to preamble — all skills get DONE/BLOCKED/NEEDS_CONTEXT

Every skill now reports completion status (DONE, DONE_WITH_CONCERNS, BLOCKED,
NEEDS_CONTEXT) and has escalation rules: 3 failed attempts → STOP, security
uncertainty → STOP, scope exceeds verification → STOP.

"It is always OK to stop and say 'this is too hard for me.'"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add verification gate to /ship (Step 6.5) — no push without fresh evidence

Before pushing, re-verify tests if code changed during review fixes.
Rationalization prevention: "Should work now" → RUN IT.
"I'm confident" → Confidence is not evidence.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add scope drift detection + verification of claims to /review

Step 1.5: Before reviewing code quality, check if the diff matches stated
intent. Flags scope creep and missing requirements (INFORMATIONAL).

Step 5 addition: Every review claim must cite evidence — "this pattern is
safe" needs a line reference, "tests cover this" needs a test name.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: mandatory implementation alternatives + design doc lookup in /plan-ceo-review

Step 0C-bis: Every plan must consider 2-3 approaches (minimal viable vs ideal
architecture) before mode selection. RECOMMENDATION required.

Pre-Review System Audit now checks ~/.gstack/projects/ for /brainstorm design
docs (branch-filtered with fallback).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: design doc lookup in /plan-eng-review + fix branch name sanitization

Step 0 now checks ~/.gstack/projects/ for /brainstorm design docs
(branch-filtered with fallback, reads Supersedes: for revision context).

Fix: branch names with '/' (e.g. garrytan/better-process) now get
sanitized via tr '/' '-' in test plan artifact filenames.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: new /brainstorm and /debug skills

/brainstorm: Socratic design exploration before planning. Context gathering,
clarifying questions (smart-skip), related design discovery (keyword grep),
premise challenge, forced alternatives, design doc artifact with lineage
tracking (Supersedes: field). Writes to ~/.gstack/projects/$SLUG/.

/debug: Systematic root-cause debugging. Iron Law: no fixes without root
cause investigation. Pattern analysis, hypothesis testing with 3-strike
escalation, structured DEBUG REPORT output.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: structural tests for new skills + escalation protocol assertions

Add brainstorm + debug to skillsWithUpdateCheck and skillsWithPreamble arrays.
Add structural tests: brainstorm (Phase 1-6, Design Doc, Supersedes, Smart-skip),
debug (Iron Law, Root Cause, Pattern Analysis, Hypothesis, DEBUG REPORT, 3-strike).
Add escalation protocol tests (DONE_WITH_CONCERNS, BLOCKED, NEEDS_CONTEXT) for
all preamble skills.

Also: 2 new TODOs (design docs → Supabase sync, /plan-design-review skill),
update CLAUDE.md project structure with new skill directories.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.6.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: rename /brainstorm → /office-hours across references

Update CHANGELOG, CLAUDE.md, TODOS, design-consultation, plan-ceo-review,
and gen-skill-docs to reference the new office-hours skill name.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: YC Office Hours — dual-mode product diagnostic + builder brainstorm

Rewrite /office-hours with two modes:

Startup mode: six forcing questions (Demand Reality, Status Quo, Desperate
Specificity, Narrowest Wedge, Observation & Surprise, Future-Fit) that push
founders toward radical honesty about demand, users, and product decisions.
Includes smart routing by product stage, intrapreneurship adaptation, and
YC apply CTA for strong-signal founders.

Builder mode: generative brainstorming for side projects, hackathons,
learning, and open source. Enthusiastic collaborator tone, design thinking
questions, no business interrogation.

Mode is determined by an explicit question in Phase 1 — no guessing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add 14 assertions for YC Office Hours content coverage

Validates dual-mode structure (Startup/Builder), all six forcing questions,
builder brainstorming content, intrapreneurship adaptation, YC apply CTA,
and operating principles for both modes. 192 tests total, all passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update project documentation for v0.6.1

- README.md: added /office-hours and /debug to skills table, updated
  skill count from 13 to 15, added both to install instructions
- docs/skills.md: added /office-hours and /debug deep dive sections
- CLAUDE.md: updated office-hours description to reflect dual-mode
- CONTRIBUTING.md: updated skill count from 13 to 15
- CHANGELOG.md: added YC Office Hours and /debug entries to 0.6.0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: founder discovery engine in /office-hours (v0.7.0)

Turn /office-hours into a YC founder discovery engine. Every session now
ends with three beats: signal reflection (specific callbacks to what the
user said), "One more thing." transition, and a personal plea from Garry
Tan with three tiers based on founder signal strength. Top tier uses
AskUserQuestion to ask directly and opens ycombinator.com/apply?ref=gstack.

Adds Phase 4.5 (Founder Signal Synthesis), "What I noticed about how you
think" section to both design doc templates, anti-slop GOOD/BAD examples,
and emotional targets per tier.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add validation assertions for founder discovery engine

8 new assertions covering: YC apply CTA with ref=gstack tracking,
"What I noticed" design doc section, golden age framing, Garry Tan
personal plea, founder signal synthesis phase, three-tier decision
rubric, anti-slop GOOD/BAD examples, "One more thing" transition beat.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update project documentation for v0.7.0

VERSION: 0.6.4.1 → 0.7.0
CHANGELOG: new entry — Office Hours Gets Personal
README: updated /office-hours and /plan-design-review descriptions
docs/skills.md: updated /office-hours table + deep dive section
TODOS.md: added /yc-prep skill TODO (P2)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remove duplicate Install section, fix stale skills lists, deduplicate CHANGELOG entries

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
bc86a665 — Garry Tan a month ago
feat: add trigger phrases to skill descriptions for better model matching (v0.6.4.1) (#169)

* feat: add trigger phrases to skill descriptions for better model matching

Anthropic's skill best practices: "the description field is not a summary —
it's when to trigger." Add explicit "Use when asked to..." phrases to 12 skill
descriptions so Claude's auto-discovery works with natural language requests
like "deploy this" or "check my diff", not just explicit /slash-commands.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add on-demand hooks and telemetry to TODOS.md

Captures two ideas from Anthropic's skill best practices post:
- /careful, /freeze, /guard on-demand hook skills (P3)
- Skill usage telemetry via preamble JSONL append (P3)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.6.4.1)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: exclude internal details from CHANGELOG style guide

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
9d47619e — Garry Tan a month ago
feat: Completeness Principle — Boil the Lake (v0.6.1) (#140)

* feat: Completeness Principle — Boil the Lake (WIP, pre-merge)

Add Completeness Principle to all skill preambles, dual-time estimates,
compression table, anti-pattern gallery, Lake Score, and completeness
gaps review category. VERSION/CHANGELOG will be rebased after merge.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: update stale version reference in TODOS.md (v0.5.3 → v0.6.1)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update CHANGELOG date + README for v0.6.1 features

- Add date to CHANGELOG 0.6.1 entry
- Add Completeness Principle to README intro
- Add SELECTIVE EXPANSION mode to CEO review section
- Add test bootstrap mention to /ship section
- Fix uninstall command missing design-consultation in project uninstall
- Add "recommends shortcuts" and "no tests" to Without gstack list

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: split README into lean intro + docs/ directory (gh CLI pattern)

README: 875 → 243 lines. Keeps intro, skill table, demo, install, and
troubleshooting. All per-skill deep dives, Greptile integration guide,
and contributor mode docs moved to docs/ directory.

- docs/skills.md — full philosophy and examples for all 13 skills
- docs/greptile.md — Greptile setup and triage workflow
- docs/contributor-mode.md — how to enable and use contributor mode
- README now links to docs/ via Documentation table
- Updated skill table entries with latest features (fix-first, regression
  tests, test health, completeness gaps)
- Updated demo transcript with AUTO-FIXED, coverage audit, regression test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: remove "competitor" language, rewrite README in Garry's voice

Replace "browses competitors" with "knows the landscape" / "what's out
there" throughout all user-facing copy. Trim README from 243 to 167
lines — tighter, more opinionated, less listicle energy. Remove
Completeness Principle from README top (it lives in CLAUDE.md and the
skill preambles where Claude actually reads it).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: rewrite README in Garry's raw voice — AGI era, L8 factory, real stories

The README now sounds like Garry, not a product page. Leads with the
live experiment, the 16k LOC/day reality, the real-life coding stories
(Austin, hospital bedside). Highlights the newest unlocks (design at
the heart, /qa parallelism, smart review routing, test bootstrap).
Closes with an open invitation — free MIT, fork it, let's all ride
the wave together.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add Garry's bonafides to README intro — Palantir, Posterous, YC, 600k LOC

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add real /retro numbers — 140k lines, 362 commits across 3 projects

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add "in the last 60 days" timeframe to 600k LOC claim

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add GitHub contribution graphs — 2026 vs 2013 side by side

Same person, different era. 2013: 772 contributions building Bookface.
2026: 1,237 contributions and accelerating. The difference is the tooling.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: clarify /retro stats are from last 7 days

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add designer/PM/eng manager roles to intro

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: remove Josh/L8 reference from README

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: move demo up, make it dramatically more impressive

Show the actual architecture diagram, auto-fixed issues, 100% coverage,
regression test generation. Punch line: "That is not a copilot. That is
a team."

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: remove "My journey" section — intro already covers it

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: prefix all skill commands with You: in demo transcript

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: collapse You/Claude lines in demo — no gap between command and response

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: clarify plan mode flow in demo — approve, exit, Claude implements

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: move /ship to end of demo — review → QA → ship is the real flow

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add /plan-design-review to demo, tighten CEO response

Shorter CEO reply, compressed eng diagram, added design audit with
AI Slop score. Seven commands now: plan → eng → build → design →
review → QA → ship.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: move design review before implementation — it's part of planning

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: reorder demo — design before eng, after CEO

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: remove URL from /plan-design-review in demo

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add [...] annotations showing what actually happens at each step

Each step now shows what the agent does under the hood: 8 expansion
proposals cherry-picked, 80-item design audit, ASCII diagrams for
every flow, 2400 lines written in 8 minutes, real browser QA, bug
found and fixed. Makes the demo feel real, not abstract.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: rename Contributor Mode to How to Contribute in docs table

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add Coinbase, Instacart, Rippling to YC bonafides

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add "one or two people in a garage" to founder story

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add skill table to top of skills.md with anchor links

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: consolidate — roll contributor-mode into CONTRIBUTING, greptile into skills

- docs/contributor-mode.md → merged into CONTRIBUTING.md (session awareness section)
- docs/greptile.md → merged into docs/skills.md (Greptile integration section)
- Reordered docs table: Skills > Architecture > Browser > Contributing > Changelog

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
c99757b5 — Garry Tan a month ago
feat: /design-consultation — risk-taking, visual research, ambitious preview (v0.5.2) (#131)

* feat: /design-consultation — risk-taking thesis, visual research, ambitious preview

Add SAFE/RISK breakdown to design proposals so users see which choices
match category conventions vs. which are deliberate creative departures.

Wire browse binary for visual competitive research — agent browses
competitor sites, takes screenshots, and analyzes fonts/colors/spacing
with graceful degradation to WebSearch-only or built-in knowledge.

Upgrade preview page instructions to render realistic product mockups
(dashboards, marketing pages, settings forms) instead of just swatches.

Rewrite README section with the thesis: "coherence is table stakes —
the real question is where you take risks."

* chore: bump version and changelog (v0.5.2)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: restore SKILL.md files to match main

Prior commit included SKILL.md files regenerated from stale templates.
Restore to match origin/main content.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
73b00b4e — Garry Tan a month ago
feat: Review Readiness Dashboard + gstack-slug helper (v0.5.1) (#130)

* feat: add bin/gstack-slug helper + migrate all inline SLUG computation

Extract the opaque SLUG sed pipeline into a shared 5-line shell script.
Replace 8 inline copies across templates with eval $(gstack-slug).
Sanitizes branch names (/ → -) to prevent subdirectory creation.

* feat: review readiness dashboard — track CEO/Eng/Design reviews per branch

Each review skill logs its result to JSONL. A shared {{REVIEW_DASHBOARD}}
placeholder displays run counts, timestamps, and a CLEARED TO SHIP verdict.
/ship pre-flight reads the dashboard and prompts when reviews are missing.

* chore: bump version and changelog (v0.5.1)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
4a77cc2c — Garry Tan a month ago
feat: /plan-design-review + /qa-design-review skills (v0.5.0) (#102)

* feat: add {{DESIGN_METHODOLOGY}} resolver and register design review skills

Add generateDesignMethodology() to gen-skill-docs.ts with 10-category, 80-item
design audit checklist. Register plan-design-review and qa-design-review templates
in findTemplates(). Add both skills to skill-check.ts SKILL_FILES. Add command
and snapshot flag validation tests for both skills in skill-validation.test.ts.

* feat: add /plan-design-review and /qa-design-review skills

/plan-design-review: report-only designer audit with letter grades, AI slop
scoring, structured first impression, design system extraction, DESIGN.md
inference and export offer. Never modifies code.

/qa-design-review: same audit, then iterative fix loop with style(design):
commits, CSS-safe WTF heuristic, before/after screenshots, final re-audit.

* chore: bump version and changelog (v0.5.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: update README, ARCHITECTURE for design review skills (v0.5.0)

- Update skill count to 11, add /plan-design-review and /qa-design-review
  to skill table, install/uninstall commands, and demo walkthrough
- Add narrative sections: "senior designer mode" and "designer who codes mode"
  with compelling examples showing AI Slop detection and design system inference
- Add {{DESIGN_METHODOLOGY}} to ARCHITECTURE.md placeholder table
- Extend demo to show full plan→eng→review→ship→qa→design-review pipeline

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: regenerate design review SKILL.md files after merge from main

Picks up BASE_BRANCH_DETECT resolver and updated contributor mode from main.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add /design-consultation skill — design consultant that creates DESIGN.md

6-phase consultant flow: product context → competitive research (WebSearch) →
complete coherent proposal → drill-downs on demand → font+color preview page →
write DESIGN.md + update CLAUDE.md. Opinionated recommendations grounded in
product context, not menu-driven forms.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add E2E tests for design skill family (7 tests + LLM quality judge)

Tests 1-4: /design-consultation (core flow, research integration, existing
DESIGN.md handling, font+color preview generation).
Tests 5-6: /plan-design-review (audit report, DESIGN.md export).
Test 7: /qa-design-review (audit + fix loop).
LLM judge validates font blacklist compliance, coherence, and AI slop avoidance.
Also adds plan-design-review + qa-design-review to ALL_SKILLS test array.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: mark /design-consultation as shipped in TODOS.md

Renamed from /setup-design-md to reflect the consultant approach.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>