fix: security hardening + issue triage (v0.8.3) (#205) * fix: check for bun before running setup (#147) Users without bun installed got a cryptic "command not found" error. Now prints a clear message with install instructions. Closes #147 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: block SSRF via URL validation in browse commands (#17) Adds validateNavigationUrl() that blocks non-HTTP(S) schemes (file://, javascript:, data:) and cloud metadata endpoints (169.254.169.254, metadata.google.internal). Applied to goto, diff, and newTab commands. Localhost and private IPs remain allowed for local dev QA. Closes #17 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: replace eval $(gstack-slug) with source <(...) (#133) Eliminates unnecessary use of eval across all skill templates and generated files. source <(...) has identical behavior without the shell injection surface. Also hardens gstack-diff-scope usage. Closes #133 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: rename /debug to /investigate to avoid Claude Code conflict (#190) Claude Code has a built-in /debug command that shadows the gstack skill. Renaming to /investigate which better reflects the systematic root-cause investigation methodology. Closes #190 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: add unit tests for path validation helpers validateOutputPath() and validateReadPath() are security-critical functions with zero test coverage. Adds 14 tests covering safe paths, traversal attacks, and prefix collision edge cases. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.8.3) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update /debug → /investigate references in docs CLAUDE.md, README.md, and docs/skills.md still referenced the old /debug skill name after the rename. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: harden URL validation against hostname bypasses (Codex P1) Codex review found that metadata IPs could be reached via hex (0xA9FEA9FE), decimal (2852039166), octal, trailing dot, and IPv6 bracket forms. Now normalizes hostnames before checking the blocklist and probes numeric IP representations via URL constructor. Also moves URL validation before page allocation in newTab() to prevent zombie tabs on rejection (Codex P3). 5 new test cases for bypass variants. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
docs: rewrite README + skills docs, auto-invoke /document-release (v0.8.4) (#207) * docs: add 6 missing skills to proactive suggestion list Add /codex, /careful, /freeze, /guard, /unfreeze, /gstack-upgrade to the root SKILL.md.tmpl proactive suggestion list so Claude suggests them at the appropriate workflow stages. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add 6 new skill entries + browse handoff to docs - docs/skills.md: add /codex, /careful, /freeze, /guard, /unfreeze, /gstack-upgrade to skill table with deep-dive sections. Group safety skills into one "Safety & Guardrails" section. Add browse handoff subsection to /browse deep-dive. - BROWSER.md: add handoff/resume to command reference table + section. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add power tools section + update skill lists in README - Update prose: "Fifteen specialists and six power tools" - Add power tools table after sprint specialists: /codex, /careful, /freeze, /guard, /unfreeze, /gstack-upgrade - Update all 4 skill list locations (install Step 1, Step 2, troubleshooting CLAUDE.md example) to include all 21 skills Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add v0.7-v0.8.2 features to README "What's new" section Add paragraphs for browse handoff, /codex multi-AI review, safety guardrails (/careful, /freeze, /guard), proactive skill suggestions, and /ship auto-invoking /document-release. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: auto-invoke /document-release after /ship PR creation Add Step 8.5 to /ship that automatically reads document-release/SKILL.md and executes the doc update workflow after creating the PR. This prevents documentation drift — /ship now keeps docs current without a separate command. Completes P1 TODO: "Auto-invoke /document-release from /ship" Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.8.4) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
docs: frame skills as sprint process, rewrite /office-hours examples (#188) * docs: rewrite /office-hours examples with real session showing premise challenge and reframe Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: anonymize /office-hours examples — remove identifying details Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: tighten See it work example — keep reframe hook, compress details Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: soften user pain description in See it work example Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: reorder skills tables and sections to match sprint workflow Think → plan → review → test → ship → reflect → utilities. /office-hours is now first in both tables and on the page. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: frame skills as a sprint process, not a tool collection Think → Plan → Build → Review → Test → Ship → Reflect. Each skill feeds into the next. 10-15 parallel sprints is the practical max. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
feat: founder discovery engine + /debug skill — v0.7.0 (#185) * feat: add escalation protocol to preamble — all skills get DONE/BLOCKED/NEEDS_CONTEXT Every skill now reports completion status (DONE, DONE_WITH_CONCERNS, BLOCKED, NEEDS_CONTEXT) and has escalation rules: 3 failed attempts → STOP, security uncertainty → STOP, scope exceeds verification → STOP. "It is always OK to stop and say 'this is too hard for me.'" Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add verification gate to /ship (Step 6.5) — no push without fresh evidence Before pushing, re-verify tests if code changed during review fixes. Rationalization prevention: "Should work now" → RUN IT. "I'm confident" → Confidence is not evidence. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add scope drift detection + verification of claims to /review Step 1.5: Before reviewing code quality, check if the diff matches stated intent. Flags scope creep and missing requirements (INFORMATIONAL). Step 5 addition: Every review claim must cite evidence — "this pattern is safe" needs a line reference, "tests cover this" needs a test name. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: mandatory implementation alternatives + design doc lookup in /plan-ceo-review Step 0C-bis: Every plan must consider 2-3 approaches (minimal viable vs ideal architecture) before mode selection. RECOMMENDATION required. Pre-Review System Audit now checks ~/.gstack/projects/ for /brainstorm design docs (branch-filtered with fallback). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: design doc lookup in /plan-eng-review + fix branch name sanitization Step 0 now checks ~/.gstack/projects/ for /brainstorm design docs (branch-filtered with fallback, reads Supersedes: for revision context). Fix: branch names with '/' (e.g. garrytan/better-process) now get sanitized via tr '/' '-' in test plan artifact filenames. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: new /brainstorm and /debug skills /brainstorm: Socratic design exploration before planning. Context gathering, clarifying questions (smart-skip), related design discovery (keyword grep), premise challenge, forced alternatives, design doc artifact with lineage tracking (Supersedes: field). Writes to ~/.gstack/projects/$SLUG/. /debug: Systematic root-cause debugging. Iron Law: no fixes without root cause investigation. Pattern analysis, hypothesis testing with 3-strike escalation, structured DEBUG REPORT output. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: structural tests for new skills + escalation protocol assertions Add brainstorm + debug to skillsWithUpdateCheck and skillsWithPreamble arrays. Add structural tests: brainstorm (Phase 1-6, Design Doc, Supersedes, Smart-skip), debug (Iron Law, Root Cause, Pattern Analysis, Hypothesis, DEBUG REPORT, 3-strike). Add escalation protocol tests (DONE_WITH_CONCERNS, BLOCKED, NEEDS_CONTEXT) for all preamble skills. Also: 2 new TODOs (design docs → Supabase sync, /plan-design-review skill), update CLAUDE.md project structure with new skill directories. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.6.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: rename /brainstorm → /office-hours across references Update CHANGELOG, CLAUDE.md, TODOS, design-consultation, plan-ceo-review, and gen-skill-docs to reference the new office-hours skill name. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: YC Office Hours — dual-mode product diagnostic + builder brainstorm Rewrite /office-hours with two modes: Startup mode: six forcing questions (Demand Reality, Status Quo, Desperate Specificity, Narrowest Wedge, Observation & Surprise, Future-Fit) that push founders toward radical honesty about demand, users, and product decisions. Includes smart routing by product stage, intrapreneurship adaptation, and YC apply CTA for strong-signal founders. Builder mode: generative brainstorming for side projects, hackathons, learning, and open source. Enthusiastic collaborator tone, design thinking questions, no business interrogation. Mode is determined by an explicit question in Phase 1 — no guessing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: add 14 assertions for YC Office Hours content coverage Validates dual-mode structure (Startup/Builder), all six forcing questions, builder brainstorming content, intrapreneurship adaptation, YC apply CTA, and operating principles for both modes. 192 tests total, all passing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update project documentation for v0.6.1 - README.md: added /office-hours and /debug to skills table, updated skill count from 13 to 15, added both to install instructions - docs/skills.md: added /office-hours and /debug deep dive sections - CLAUDE.md: updated office-hours description to reflect dual-mode - CONTRIBUTING.md: updated skill count from 13 to 15 - CHANGELOG.md: added YC Office Hours and /debug entries to 0.6.0 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: founder discovery engine in /office-hours (v0.7.0) Turn /office-hours into a YC founder discovery engine. Every session now ends with three beats: signal reflection (specific callbacks to what the user said), "One more thing." transition, and a personal plea from Garry Tan with three tiers based on founder signal strength. Top tier uses AskUserQuestion to ask directly and opens ycombinator.com/apply?ref=gstack. Adds Phase 4.5 (Founder Signal Synthesis), "What I noticed about how you think" section to both design doc templates, anti-slop GOOD/BAD examples, and emotional targets per tier. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: add validation assertions for founder discovery engine 8 new assertions covering: YC apply CTA with ref=gstack tracking, "What I noticed" design doc section, golden age framing, Garry Tan personal plea, founder signal synthesis phase, three-tier decision rubric, anti-slop GOOD/BAD examples, "One more thing" transition beat. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update project documentation for v0.7.0 VERSION: 0.6.4.1 → 0.7.0 CHANGELOG: new entry — Office Hours Gets Personal README: updated /office-hours and /plan-design-review descriptions docs/skills.md: updated /office-hours table + deep dive section TODOS.md: added /yc-prep skill TODO (P2) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: remove duplicate Install section, fix stale skills lists, deduplicate CHANGELOG entries Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
feat: interactive /plan-design-review + CEO invokes designer + 100% coverage (v0.6.4) (#149) * refactor: rename qa-design-review → design-review The "qa-" prefix was confusing — this is the live-site design audit with fix loop, not a QA-only report. Rename directory and update all references across docs, tests, scripts, and skill templates. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: interactive /plan-design-review + CEO invokes designer Rewrite /plan-design-review from report-only grading to an interactive plan-fixer that rates each design dimension 0-10, explains what a 10 looks like, and edits the plan to get there. Parallel structure with /plan-ceo-review and /plan-eng-review — one issue = one AskUserQuestion. CEO review now detects UI scope and invokes the designer perspective when the plan has frontend/UX work, so you get design review automatically when it matters. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: validation + touchfile entries for 100% coverage Add design-consultation to command/snapshot flag validation. Add 4 skills to contributor mode validation (plan-design-review, design-review, design-consultation, document-release). Add 2 templates to hardcoded branch check. Register touchfile entries for 10 new LLM-judge tests and 1 new E2E test. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: LLM-judge for 10 skills + gstack-upgrade E2E Add LLM-judge quality evals for all uncovered skills using a DRY runWorkflowJudge helper with section marker guards. Add real E2E test for gstack-upgrade using mock git remote (replaces test.todo). Add plan-edit assertion to plan-design-review E2E. 14/15 skills now at full coverage. setup-browser-cookies remains deferred (needs real browser). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add bisect commit style to CLAUDE.md All commits should be single logical changes, split before pushing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.6.4.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
feat: Completeness Principle — Boil the Lake (v0.6.1) (#140) * feat: Completeness Principle — Boil the Lake (WIP, pre-merge) Add Completeness Principle to all skill preambles, dual-time estimates, compression table, anti-pattern gallery, Lake Score, and completeness gaps review category. VERSION/CHANGELOG will be rebased after merge. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: update stale version reference in TODOS.md (v0.5.3 → v0.6.1) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update CHANGELOG date + README for v0.6.1 features - Add date to CHANGELOG 0.6.1 entry - Add Completeness Principle to README intro - Add SELECTIVE EXPANSION mode to CEO review section - Add test bootstrap mention to /ship section - Fix uninstall command missing design-consultation in project uninstall - Add "recommends shortcuts" and "no tests" to Without gstack list Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: split README into lean intro + docs/ directory (gh CLI pattern) README: 875 → 243 lines. Keeps intro, skill table, demo, install, and troubleshooting. All per-skill deep dives, Greptile integration guide, and contributor mode docs moved to docs/ directory. - docs/skills.md — full philosophy and examples for all 13 skills - docs/greptile.md — Greptile setup and triage workflow - docs/contributor-mode.md — how to enable and use contributor mode - README now links to docs/ via Documentation table - Updated skill table entries with latest features (fix-first, regression tests, test health, completeness gaps) - Updated demo transcript with AUTO-FIXED, coverage audit, regression test Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: remove "competitor" language, rewrite README in Garry's voice Replace "browses competitors" with "knows the landscape" / "what's out there" throughout all user-facing copy. Trim README from 243 to 167 lines — tighter, more opinionated, less listicle energy. Remove Completeness Principle from README top (it lives in CLAUDE.md and the skill preambles where Claude actually reads it). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: rewrite README in Garry's raw voice — AGI era, L8 factory, real stories The README now sounds like Garry, not a product page. Leads with the live experiment, the 16k LOC/day reality, the real-life coding stories (Austin, hospital bedside). Highlights the newest unlocks (design at the heart, /qa parallelism, smart review routing, test bootstrap). Closes with an open invitation — free MIT, fork it, let's all ride the wave together. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add Garry's bonafides to README intro — Palantir, Posterous, YC, 600k LOC Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add real /retro numbers — 140k lines, 362 commits across 3 projects Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add "in the last 60 days" timeframe to 600k LOC claim Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add GitHub contribution graphs — 2026 vs 2013 side by side Same person, different era. 2013: 772 contributions building Bookface. 2026: 1,237 contributions and accelerating. The difference is the tooling. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: clarify /retro stats are from last 7 days Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add designer/PM/eng manager roles to intro Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: remove Josh/L8 reference from README Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: move demo up, make it dramatically more impressive Show the actual architecture diagram, auto-fixed issues, 100% coverage, regression test generation. Punch line: "That is not a copilot. That is a team." Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: remove "My journey" section — intro already covers it Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: prefix all skill commands with You: in demo transcript Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: collapse You/Claude lines in demo — no gap between command and response Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: clarify plan mode flow in demo — approve, exit, Claude implements Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: move /ship to end of demo — review → QA → ship is the real flow Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add /plan-design-review to demo, tighten CEO response Shorter CEO reply, compressed eng diagram, added design audit with AI Slop score. Seven commands now: plan → eng → build → design → review → QA → ship. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: move design review before implementation — it's part of planning Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: reorder demo — design before eng, after CEO Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: remove URL from /plan-design-review in demo Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add [...] annotations showing what actually happens at each step Each step now shows what the agent does under the hood: 8 expansion proposals cherry-picked, 80-item design audit, ASCII diagrams for every flow, 2400 lines written in 8 minutes, real browser QA, bug found and fixed. Makes the demo feel real, not abstract. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: rename Contributor Mode to How to Contribute in docs table Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add Coinbase, Instacart, Rippling to YC bonafides Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add "one or two people in a garage" to founder story Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add skill table to top of skills.md with anchor links Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: consolidate — roll contributor-mode into CONTRIBUTING, greptile into skills - docs/contributor-mode.md → merged into CONTRIBUTING.md (session awareness section) - docs/greptile.md → merged into docs/skills.md (Greptile integration section) - Reordered docs table: Skills > Architecture > Browser > Contributing > Changelog Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>