~cytrogen/gstack

a0328be0 — Garry Tan 9 days ago
feat: always-on adversarial review + scope drift + plan mode design tools (v0.14.3.0) (#694)

* feat: always-on adversarial review + scope drift resolver + cross-model tension format

Rewrite generateAdversarialStep() to remove LOC-based tier skipping. Every review
now runs both Claude adversarial subagent and Codex adversarial challenge. OLD_CFG
only gates Codex passes, not Claude. Add generateScopeDrift() shared resolver.
Fix cross-model tension AskUserQuestion to include RECOMMENDATION + Completeness.

* feat: add scope drift to /ship, extract from /review template

/ship gets {{SCOPE_DRIFT}} at Step 3.48 + PR body slot. /review replaces
hardcoded scope drift with {{SCOPE_DRIFT}} + {{PLAN_COMPLETION_AUDIT_REVIEW}}.

* feat: plan mode safe operations — browse, design, codex allowed in plan mode

Add preamble section declaring $B, $D, codex, and ~/.gstack/ writes as
plan-mode-safe. Unblocks design skills during planning.

* test: update adversarial + add scope drift assertions

Rename adversarial tests to reflect always-on behavior. Remove tier
threshold assertions. Add scope drift content assertions for both
/review and /ship generated SKILL.md files.

* chore: bump version and changelog (v0.14.3.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
a1a93361 — Garry Tan 10 days ago
feat: sidebar CSS inspector + per-tab agents (v0.13.9.0) (#650)

* feat: CDP inspector module — persistent sessions, CSS cascade, style modification

New browse/src/cdp-inspector.ts with full CDP inspection engine:
- inspectElement() via CSS.getMatchedStylesForNode + DOM.getBoxModel
- modifyStyle() via CSS.setStyleTexts with headless page.evaluate fallback
- Persistent CDP session lifecycle (create, reuse, detach on nav, re-create)
- Specificity sorting, overridden property detection, UA rule filtering
- Modification history with undo support
- formatInspectorResult() for CLI output

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: browse server inspector endpoints + inspect/style/cleanup/prettyscreenshot CLI

Server endpoints: POST /inspector/pick, GET /inspector, POST /inspector/apply,
POST /inspector/reset, GET /inspector/history, GET /inspector/events (SSE).
CLI commands: inspect (CDP cascade), style (live CSS mod), cleanup (page clutter
removal), prettyscreenshot (clean screenshot pipeline).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: sidebar CSS inspector — element picker, box model, rule cascade, quick edit

Extension changes for the visual CSS inspector:
- inspector.js: element picker with hover highlight, CSS selector generation,
  basic mode fallback (getComputedStyle + CSSOM), page alteration handlers
- inspector.css: picker overlay styles (blue highlight + tooltip)
- background.js: inspector message routing (picker <-> server <-> sidepanel)
- sidepanel: Inspector tab with box model viz (gstack palette), matched rules
  with specificity badges, computed styles, click-to-edit quick edit,
  Send to Agent/Code button, empty/loading/error states

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: document inspect, style, cleanup, prettyscreenshot browse commands

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: auto-track user-created tabs and handle tab close

browser-manager.ts changes:
- context.on('page') listener: automatically tracks tabs opened by the user
  (Cmd+T, right-click open in new tab, window.open). Previously only
  programmatic newTab() was tracked, so user tabs were invisible.
- page.on('close') handler in wirePageEvents: removes closed tabs from the
  pages map and switches activeTabId to the last remaining tab.
- syncActiveTabByUrl: match Chrome extension's active tab URL to the correct
  Playwright page for accurate tab identity.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: per-tab agent isolation via BROWSE_TAB environment variable

Prevents parallel sidebar agents from interfering with each other's tab context.

Three-layer fix:
- sidebar-agent.ts: passes BROWSE_TAB=<tabId> env var to each claude process,
  per-tab processing set allows concurrent agents across tabs
- cli.ts: reads process.env.BROWSE_TAB and includes tabId in command request body
- server.ts: handleCommand() temporarily switches activeTabId when tabId is present,
  restores after command completes (safe: Bun event loop is single-threaded)

Also: per-tab agent state (TabAgentState map), per-tab message queuing,
per-tab chat buffers, verbose streaming narration, stop button endpoint.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: sidebar per-tab chat context, tab bar sync, stop button, UX polish

Extension changes:
- sidepanel.js: per-tab chat history (tabChatHistories map), switchChatTab()
  swaps entire chat view, browserTabActivated handler for instant tab sync,
  stop button wired to /sidebar-agent/stop, pollTabs renders tab bar
- sidepanel.html: updated banner text ("Browser co-pilot"), stop button markup,
  input placeholder "Ask about this page..."
- sidepanel.css: tab bar styles, stop button styles, loading state fixes
- background.js: chrome.tabs.onActivated sends browserTabActivated to sidepanel
  with tab URL for instant tab switch detection

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: per-tab isolation, BROWSE_TAB pinning, tab tracking, sidebar UX

sidebar-agent.test.ts (new tests):
- BROWSE_TAB env var passed to claude process
- CLI reads BROWSE_TAB and sends tabId in body
- handleCommand accepts tabId, saves/restores activeTabId
- Tab pinning only activates when tabId provided
- Per-tab agent state, queue, concurrency
- processingTabs set for parallel agents

sidebar-ux.test.ts (new tests):
- context.on('page') tracks user-created tabs
- page.on('close') removes tabs from pages map
- Tab isolation uses BROWSE_TAB not system prompt hack
- Per-tab chat context in sidepanel
- Tab bar rendering, stop button, banner text

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve merge conflicts — keep security defenses + per-tab isolation

Merged main's security improvements (XML escaping, prompt injection defense,
allowed commands whitelist, --model opus, Write tool, stderr capture) with
our branch's per-tab isolation (BROWSE_TAB env var, processingTabs set,
no --resume). Updated test expectations for expanded system prompt.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.13.9.0)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add inspector message types to background.js allowlist

Pre-existing bug found by Codex: ALLOWED_TYPES in background.js was missing
all inspector message types (startInspector, stopInspector, elementPicked,
pickerCancelled, applyStyle, toggleClass, injectCSS, resetAll, inspectResult).
Messages were silently rejected, making the inspector broken on ALL pages.

Also: separate executeScript and insertCSS into individual try blocks in
injectInspector(), store inspectorMode for routing, and add content.js
fallback when script injection fails (CSP, chrome:// pages).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: basic element picker in content.js for CSP-restricted pages

When inspector.js can't be injected (CSP, chrome:// pages), content.js
provides a basic picker using getComputedStyle + CSSOM:
- startBasicPicker/stopBasicPicker message handlers
- captureBasicData() with ~30 key CSS properties, box model, matched rules
- Hover highlight with outline save/restore (never leaves artifacts)
- Click uses e.target directly (no re-querying by selector)
- Sends inspectResult with mode:'basic' for sidebar rendering
- Escape key cancels picker and restores outlines

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: cleanup + screenshot buttons in sidebar inspector toolbar

Two action buttons in the inspector toolbar:
- Cleanup (🧹): POSTs cleanup --all to server, shows spinner, chat
  notification on success, resets inspector state (element may be removed)
- Screenshot (📸): POSTs screenshot to server, shows spinner, chat
  notification with saved file path

Shared infrastructure:
- .inspector-action-btn CSS with loading spinner via ::after pseudo-element
- chat-notification type in addChatEntry() for system messages
- package.json version bump to 0.13.9.0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: inspector allowlist, CSP fallback, cleanup/screenshot buttons

16 new tests in sidebar-ux.test.ts:
- Inspector message allowlist includes all inspector types
- content.js basic picker (startBasicPicker, captureBasicData, CSSOM,
  outline save/restore, inspectResult with mode basic, Escape cleanup)
- background.js CSP fallback (separate try blocks, inspectorMode, fallback)
- Cleanup button (POST /command, inspector reset after success)
- Screenshot button (POST /command, notification rendering)
- Chat notification type and CSS styles

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update project documentation for v0.13.9.0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: cleanup + screenshot buttons in chat toolbar (not just inspector)

Quick actions toolbar (🧹 Cleanup, 📸 Screenshot) now appears above the chat
input, always visible. Both inspector and chat buttons share runCleanup() and
runScreenshot() helper functions. Clicking either set shows loading state on
both simultaneously.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: chat toolbar buttons, shared helpers, quick-action-btn styles

Tests that chat toolbar exists (chat-cleanup-btn, chat-screenshot-btn,
quick-actions container), CSS styles (.quick-action-btn, .quick-action-btn.loading),
shared runCleanup/runScreenshot helper functions, and cleanup inspector reset.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: aggressive cleanup heuristics — overlays, scroll unlock, blur removal

Massively expanded CLEANUP_SELECTORS with patterns from uBlock Origin and
Readability.js research:
- ads: 30+ selectors (Google, Amazon, Outbrain, Taboola, Criteo, etc.)
- cookies: OneTrust, Cookiebot, TrustArc, Quantcast + generic patterns
- overlays (NEW): paywalls, newsletter popups, interstitials, push prompts,
  app download banners, survey modals
- social: follow prompts, share tools
- Cleanup now defaults to --all when no args (sidebar button fix)
- Uses !important on all display:none (overrides inline styles)
- Unlocks body/html scroll (overflow:hidden from modal lockout)
- Removes blur/filter effects (paywall content blur)
- Removes max-height truncation (article teaser truncation)
- Collapses empty ad placeholder whitespace (empty divs after ad removal)
- Skips gstack-ctrl indicator in sticky removal

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: disable action buttons when disconnected, no error spam

- setActionButtonsEnabled() toggles .disabled class on all cleanup/screenshot
  buttons (both chat toolbar and inspector toolbar)
- Called with false in updateConnection when server URL is null
- Called with true when connection established
- runCleanup/runScreenshot silently return when disconnected instead of
  showing 'Not connected' error notifications
- CSS .disabled style: pointer-events:none, opacity:0.3, cursor:not-allowed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: cleanup heuristics, button disabled state, overlay selectors

17 new tests:
- cleanup defaults to --all on empty args
- CLEANUP_SELECTORS overlays category (paywall, newsletter, interstitial)
- Major ad networks in selectors (doubleclick, taboola, criteo, etc.)
- Major consent frameworks (OneTrust, Cookiebot, TrustArc, Quantcast)
- !important override for inline styles
- Scroll unlock (body overflow:hidden)
- Blur removal (paywall content blur)
- Article truncation removal (max-height)
- Empty placeholder collapse
- gstack-ctrl indicator skip in sticky cleanup
- setActionButtonsEnabled function
- Buttons disabled when disconnected
- No error spam from cleanup/screenshot when disconnected
- CSS disabled styles for action buttons

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: LLM-based page cleanup — agent analyzes page semantically

Instead of brittle CSS selectors, the cleanup button now sends a prompt to
the sidebar agent (which IS an LLM). The agent:
1. Runs deterministic $B cleanup --all as a quick first pass
2. Takes a snapshot to see what's left
3. Analyzes the page semantically to identify remaining clutter
4. Removes elements intelligently, preserving site branding

This means cleanup works correctly on any site without site-specific selectors.
The LLM understands that "Your Daily Puzzles" is clutter, "ADVERTISEMENT" is
junk, but the SF Chronicle masthead should stay.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: aggressive cleanup heuristics + preserve top nav bar

Deterministic cleanup improvements (used as first pass before LLM analysis):
- New 'clutter' category: audio players, podcast widgets, sidebar puzzles/games,
  recirculation widgets (taboola, outbrain, nativo), cross-promotion banners
- Text-content detection: removes "ADVERTISEMENT", "Article continues below",
  "Sponsored", "Paid content" labels and their parent wrappers
- Sticky fix: preserves the topmost full-width element near viewport top (site
  nav bar) instead of hiding all sticky/fixed elements. Sorts by vertical
  position, preserves the first one that spans >80% viewport width.

Tests: clutter category, ad label removal, nav bar preservation logic.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: LLM-based cleanup architecture, deterministic heuristics, sticky nav

22 new tests covering:
- Cleanup button uses /sidebar-command (agent) not /command (deterministic)
- Cleanup prompt includes deterministic first pass + agent snapshot analysis
- Cleanup prompt lists specific clutter categories for agent guidance
- Cleanup prompt preserves site identity (masthead, headline, body, byline)
- Cleanup prompt instructs scroll unlock and $B eval removal
- Loading state management (async agent, setTimeout)
- Deterministic clutter: audio/podcast, games/puzzles, recirculation
- Ad label text patterns (ADVERTISEMENT, Sponsored, Article continues)
- Ad label parent wrapper hiding for small containers
- Sticky nav preservation (sort by position, first full-width near top)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: prevent repeat chat message rendering on reconnect/replay

Root cause: server persists chat to disk (chat.jsonl) and replays on restart.
Client had no dedup, so every reconnect re-rendered the entire history.
Messages from an old HN session would repeat endlessly on the SF Chronicle tab.

Fix: renderedEntryIds Set tracks which entry IDs have been rendered. addChatEntry
skips entries already in the set. Entries without an id (local notifications)
bypass the check. Clear chat resets the set.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: agent stops when done, no focus stealing, opus for prompt injection safety

Three fixes for sidebar agent UX:
- System prompt: "Be CONCISE. STOP as soon as the task is done. Do NOT keep
  exploring or doing bonus work." Prevents agent from endlessly taking
  screenshots and highlighting elements after answering the question.
- switchTab(id, opts): new bringToFront option. Internal tab pinning
  (BROWSE_TAB) uses bringToFront: false so agent commands never steal
  window focus from the user's active app.
- Keep opus model (not sonnet) for prompt injection resistance on untrusted
  web pages. Remove Write from allowedTools (agent only needs Bash for $B).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: agent conciseness, focus stealing, opus model, switchTab opts

Tests for the three UX fixes:
- System prompt contains STOP/CONCISE/Do NOT keep exploring
- sidebar agent uses opus (not sonnet) for prompt injection resistance
- switchTab has bringToFront option, defaults to true (opt-out)
- handleCommand tab pinning uses bringToFront: false (no focus steal)
- Updated stale tests: switchTab signature, allowedTools excludes Write,
  narration -> conciseness, tab pinning restore calls

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: sidebar CSS interaction E2E — HN comment highlight round-trip

New E2E test (periodic tier, ~$2/run) that exercises the full sidebar
agent pipeline with CSS interaction:
1. Agent navigates to Hacker News
2. Clicks into the top story's comments
3. Reads comments and identifies the most insightful one
4. Highlights it with a 4px solid orange outline via style injection

Tests: navigation, snapshot, text reading, LLM judgment, CSS modification.
Requires real browser + real Claude (ANTHROPIC_API_KEY).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: sidebar CSS E2E test — correct idle timeout (ms not s), pipe stdio

Root cause of test failure: BROWSE_IDLE_TIMEOUT is in milliseconds, not
seconds. '600' = 0.6 seconds, server died immediately after health check.
Fixed to '600000' (10 minutes).

Also: use 'pipe' stdio instead of file descriptors (closing fds kills child
on macOS/bun), catch ConnectionRefused on poll retry, 4 min poll timeout
for the multi-step opus task.

Test passes: agent navigates to HN, reads comments, identifies most
insightful one, highlights it with orange CSS, stops. 114s, $0.00.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
7911b7b9 — Garry Tan 10 days ago
fix: force comparison board as default variant chooser (v0.14.1.0) (#658)

* fix: force comparison board as default variant chooser

The comparison board ($D compare --serve) was being skipped in favor of
showing variants inline + AskUserQuestion "which do you prefer?" — a
degraded experience missing rating controls, comments, and remix buttons.

Changes:
- Replace "show inline" instruction with "do NOT show inline, proceed to
  comparison board" in plan-design-review/SKILL.md.tmpl
- Add CRITICAL RULE: never use AskUserQuestion as the variant chooser
- Change DESIGN_SHOTGUN_LOOP resolver to AskUserQuestion-first wait with
  polling fallback (affects all 3 consumer skills)
- Fix board URL from /design-board.html (404) to / (correct)
- Improve serve-failure fallback to show variants inline via Read tool

* chore: bump version and changelog (v0.14.1.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
b2b380bf — Garry Tan 10 days ago
docs: update README and skill deep dives for all 31 skills (#656)

* docs: update README with /design-html and /learn skills, sync all skill lists

- Added /design-html to sprint table (Pretext-native HTML from approved mockups)
- Added /learn to sprint table (project learnings management)
- Synced all 5 skill list locations (install step 1, step 2, troubleshooting,
  sprint table, power tools) to include all 31 skills
- Updated intro count from 20 to 23 specialists
- Updated Codex section skill count from 29 to 31
- Expanded "Design is at the heart" paragraph with full pipeline:
  consultation → shotgun → design-html → review

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add 9 missing skills to deep dives doc

Added table entries + full deep dive sections for:
- /design-shotgun (design exploration with comparison board)
- /design-html (Pretext-native HTML from approved mockups)
- /land-and-deploy (merge + deploy + canary verification)
- /canary (post-deploy monitoring loop)
- /benchmark (performance regression detection)
- /autoplan (auto-review pipeline)
- /learn (project learnings management)
- /connect-chrome (headed Chrome with side panel)
- /setup-deploy (one-time deploy configuration)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
8151fcd5 — Garry Tan 10 days ago
feat: /design-html skill — Pretext-native HTML from approved mockups (v0.14.0.0) (#653)

* feat: /design-html skill — Pretext-native HTML from approved mockups

New skill that takes approved design-shotgun mockups and generates
production-quality HTML with Pretext for computed text layout. Text
reflows on resize, heights adjust to content, zero hardcoded CSS.

Includes vendored Pretext bundle (30KB), smart API routing per design
type, AskUserQuestion refinement loop, framework detection, and
3-viewport verification screenshots.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: integrate /design-html into design skill pipeline

- design-shotgun: Step 6 option B now chains to /design-html
- design-consultation: suggests /design-html after shipping DESIGN.md
  (conditional on screen-level output, not tokens-only)
- plan-design-review: expanded chaining to include /design-shotgun
  and /design-html alongside review skills

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: update plan-design-review chaining test for design skills

plan-design-review now chains to /design-shotgun and /design-html
in addition to review skills. Update the assertion to match.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add gstack keyword to design-html description for validation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.14.0.0)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
403637f0 — Garry Tan 10 days ago
feat: rotating founder resources in /office-hours closing (v0.13.10.0) (#652)

* feat: rotating founder resources in /office-hours closing

Add Beat 3.5 with 34 curated resources (5 Garry Tan videos, 2 YC Backstory,
9 Lightcone Podcast, 8 Startup School, 10 PG essays) that rotate contextually
each session. Includes dedup log to avoid repeats, analytics logging, and
browser-open offers. Also adds chmod +x safety net to build script.

* chore: bump version and changelog (v0.13.10.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
66c09644 — Garry Tan 10 days ago
feat: composable skills — INVOKE_SKILL resolver + factoring infrastructure (v0.13.7.0) (#644)

* feat: add parameterized resolver support to gen-skill-docs

Extend the placeholder regex from {{WORD}} to {{WORD:arg1:arg2}},
enabling parameterized resolvers like {{INVOKE_SKILL:plan-ceo-review}}.

- Widen ResolverFn type to accept optional args?: string[]
- Update RESOLVERS record to use ResolverFn type
- Both replacement and unresolved-check regexes updated
- Fully backward compatible: existing {{WORD}} patterns unchanged

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add INVOKE_SKILL resolver for composable skill loading

New composition.ts resolver module that emits prose instructing Claude
to read another skill's SKILL.md and follow it, skipping preamble
sections. Supports optional skip= parameter for additional sections.

Usage: {{INVOKE_SKILL:plan-ceo-review}} or
       {{INVOKE_SKILL:plan-ceo-review:skip=Outside Voice}}

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: use frontmatter name: for skill symlinks and Codex paths

Patch all 3 name-derivation paths to read name: from SKILL.md
frontmatter instead of relying solely on directory basenames.
This enables directory names that differ from invocation names
(e.g., run-tests/ directory with name: test).

- setup: link_claude_skill_dirs reads name: via grep, falls back to basename
- gen-skill-docs.ts: codexSkillName uses frontmatter name for Codex output paths
- gen-skill-docs.ts: moved frontmatter extraction before Codex path logic

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: extract CHANGELOG_WORKFLOW resolver from /ship

Move changelog generation logic into a reusable resolver. The resolver
is changelog-only (no version bump per Codex review recommendation).
Adds voice rules inline. /ship Step 5 now uses {{CHANGELOG_WORKFLOW}}.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: use INVOKE_SKILL resolver for plan-ceo-review office-hours fallback

Replace inline skill loading prose (read file, skip sections) with
{{INVOKE_SKILL:office-hours}} in the mid-session detection path.
The BENEFITS_FROM prerequisite offer is unchanged (separate use case).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: BENEFITS_FROM resolver delegates to INVOKE_SKILL

Eliminate duplicated skip-list logic by having generateBenefitsFrom
call generateInvokeSkill internally. The wrapper (AskUserQuestion,
design doc re-check) stays in BENEFITS_FROM. The loading instructions
(read file, skip sections, error handling) come from INVOKE_SKILL.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add resolver tests for INVOKE_SKILL, CHANGELOG_WORKFLOW, parameterized args

12 new tests covering:
- INVOKE_SKILL: template placeholder, default skip list, error handling,
  BENEFITS_FROM delegation
- CHANGELOG_WORKFLOW: content, cross-check, voice guidance, format
- Parameterized resolver infra: colon-separated args processing,
  no unresolved placeholders across all generated SKILL.md files

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.13.7.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: journey routing tests — CLAUDE.md routing rules + stronger descriptions

Three journey E2E tests (ideation, ship, debug) were failing because
Claude answered directly instead of invoking the Skill tool. Root cause:
skill descriptions in system-reminder are too weak to override Claude's
default behavior for tasks it can handle natively.

Fix has two parts:
1. CLAUDE.md routing rules in test workdir — Claude weighs project-level
   instructions higher than skill description metadata
2. "Proactively invoke" (not "suggest") in office-hours, investigate,
   ship descriptions — reinforces the routing signal

10/10 journey tests now pass (was 7/10).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: one-time CLAUDE.md routing injection prompt

Add a preamble section that checks if the project's CLAUDE.md has
skill routing rules. If not (and user hasn't declined), asks once
via AskUserQuestion to inject a "## Skill routing" section.

Root cause: skill descriptions in system-reminder metadata are too
weak to reliably trigger proactive Skill tool invocation. CLAUDE.md
project instructions carry higher weight in Claude's decision making.

- Preamble bash checks for "## Skill routing" in CLAUDE.md
- Stores decline in gstack-config (routing_declined=true)
- Only asks once per project (HAS_ROUTING check + config check)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: annotated config file + routing injection tests

gstack-config now writes a documented header on first config creation
with every supported key explained (proactive, telemetry, auto_upgrade,
skill_prefix, routing_declined, codex_reviews, skip_eng_review, etc.).
Users can edit ~/.gstack/config.yaml directly, anytime.

Also fixes grep to use ^KEY: anchoring so commented header lines don't
shadow real config values.

Tests added:
- 7 new gstack-config tests (annotated header, no duplication, comment
  safety, routing_declined get/set/reset)
- 6 new gen-skill-docs tests (preamble routing injection: bash checks,
  config reads, AskUserQuestion, decline persistence, routing rules)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump to v0.13.9.0, separate CHANGELOG from main's releases

Split our branch's changes into a new 0.13.9.0 entry instead of
jamming them into 0.13.7.0 which already landed on main as
"Community Wave."

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: clarify branch-scoped VERSION/CHANGELOG after merging main

Add explicit rules: merging main doesn't mean adopting main's version.
Branch always gets its own entry on top with a higher version number.
Three-point checklist after every merge.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: put our 0.13.9.0 entry on top of CHANGELOG

Newest version goes on top. Our branch lands next, so our entry
must be above main's 0.13.8.0.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: restore missing 0.13.7.0 Community Wave entry

Accidentally dropped the 0.13.7.0 entry when reordering.
All entries now present: 0.13.9.0 > 0.13.8.0 > 0.13.7.0 > 0.13.6.0.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add CHANGELOG integrity check rule

After any edit that moves/adds/removes entries, grep for version
headers and verify no gaps or duplicates before committing.
Prevents accidentally dropping entries during reordering.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3cda8dee — Garry Tan 10 days ago
fix: security audit round 2 (v0.13.4.0) (#640)

* fix: chrome-cdp localhost-only binding

Restrict Chrome CDP to localhost by adding --remote-debugging-address=127.0.0.1
and --remote-allow-origins to prevent network-accessible debugging sessions.

Clears 1 Socket anomaly (Chrome CDP session exposure).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: extension sender validation + message type allowlist

Add sender.id check and ALLOWED_TYPES allowlist to the Chrome extension's
message handler. Defense-in-depth against message spoofing from external
extensions or future externally_connectable changes.

Clears 2 Socket anomalies (extension permissions).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: checksum-verified bun install

Replace unverified curl|bash bun installation with checksum-verified
download-then-execute pattern. The install script is downloaded, sha256
verified against a known hash, then executed. Preserves the Bun-native
install path without adding a Node/npm dependency.

Clears Snyk W012 + 3 Socket anomalies.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: content trust boundary markers in browse output

Wrap page-content commands (text, html, links, forms, accessibility,
console, dialog, snapshot) with --- BEGIN/END UNTRUSTED EXTERNAL CONTENT ---
markers. Covers direct commands (server.ts), chain sub-commands, and
snapshot output (meta-commands.ts).

Adds PAGE_CONTENT_COMMANDS set and wrapUntrustedContent() helper in
commands.ts (single source of truth, DRY). Expands the SKILL.md trust
warning with explicit processing rules for agents.

Clears Snyk W011 (third-party content exposure).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: harden trust boundary markers against escape attacks

- Sanitize URLs in markers (remove newlines, cap at 200 chars) to prevent
  marker injection via history.pushState
- Escape marker strings in content (zero-width space) so malicious pages
  can't forge the END marker to break out of the untrusted block
- Wrap resume command snapshot with trust boundary markers
- Wrap diff command output with trust boundary markers
- Wrap watch stop last snapshot with trust boundary markers

Found by cross-model adversarial review (Claude + Codex).

* chore: bump version and changelog (v0.13.4.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: gitignore .factory/ and remove from tracking

Factory Droid support was removed in this branch. The .factory/ directory
was re-added by merging main (which had v0.13.5.0 Factory support).
Gitignore it so it stays out.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
cdd6f786 — Garry Tan 10 days ago
feat: community wave — 7 fixes, relink, sidebar Write, discoverability (v0.13.5.0) (#641)

* test: add 16 failing tests for 6 community fixes

Tests-first for all fixes in this PR wave:
- #594 discoverability: gstack tag in descriptions, 120-char first line
- #573 feature signals: ship/SKILL.md Step 4 detection
- #510 context warnings: no preemptive warnings in generated files
- #474 Safety Net: no find -delete in generated files
- #467 telemetry: JSONL writes gated by _TEL conditional
- #584 sidebar: Write in allowedTools, stderr capture
- #578 relink: prefixed/flat symlinks, cleanup, error, config hook

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: replace find -delete with find -exec rm for Safety Net (#474)

-delete is a non-POSIX extension that fails on Safety Net environments.
-exec rm {} + is POSIX-compliant and works everywhere.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: gate local JSONL writes by telemetry setting (#467)

When telemetry is off, nothing is written anywhere — not just remote,
but local JSONL too. Clean trust contract: off means off everywhere.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remove preemptive context warnings from plan-eng-review (#510)

The system handles context compaction automatically. Preemptive warnings
waste tokens and create false urgency. Skills should not warn about
context limits — just describe the compression priority order.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add (gstack) tag to skill descriptions for discoverability (#594)

Every SKILL.md.tmpl description now contains "gstack" on the last line,
making skills findable in Claude Code's command palette. First-line hooks
stay under 120 chars. Split ship description to fix wrapping.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: auto-relink skill symlinks on prefix config change (#578)

New bin/gstack-relink creates prefixed (gstack-*) or flat symlinks
based on skill_prefix config. gstack-config auto-triggers relink
when skill_prefix changes. Setup guards against recursive calls
with GSTACK_SETUP_RUNNING env var.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add feature signal detection to version bump heuristic (#573)

/ship Step 4 now checks for feature signals (new routes, migrations,
test+source pairs, feat/ branches) when deciding version bumps.
PATCH requires no feature signals. MINOR asks the user if any signal
is detected or 500+ lines changed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: sidebar Write tool, stderr capture, cross-platform URL opener (#584)

Add Write to sidebar allowedTools (both sidebar-agent.ts and server.ts).
Write doesn't expand attack surface beyond what Bash already provides.
Replace empty stderr handler with buffer capture for better error
diagnostics. New bin/gstack-open-url for cross-platform URL opening.

Does NOT include Search Before Building intro flow (deferred).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: update sidebar-security test for Write tool addition

The fallback allowedTools string now includes Write, matching the
sidebar-agent.ts change from commit 68dc957.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.13.5.0)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: prevent gstack-relink from double-prefixing gstack-upgrade

gstack-relink now checks if a skill directory is already named gstack-*
before prepending the prefix. Previously, setting skill_prefix=true would
create gstack-gstack-upgrade, breaking the /gstack-upgrade command.

Matches setup script behavior (setup:260) which already has this guard.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: add double-prefix fix to changelog

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: remove .factory/ from git tracking and add to .gitignore

Generated Factory Droid skills are build output, same as .agents/.
They should not be committed to the repo.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ae0a9ad1 — Garry Tan 10 days ago
feat: GStack Learns — per-project self-learning infrastructure (v0.13.4.0) (#622)

* feat: learnings + confidence resolvers — cross-skill memory infrastructure

Three new resolvers for the self-learning system:
- LEARNINGS_SEARCH: tells skills to load prior learnings before analysis
- LEARNINGS_LOG: tells skills to capture discoveries after completing work
- CONFIDENCE_CALIBRATION: adds 1-10 confidence scoring to all review findings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: learnings bin scripts — append-only JSONL read/write

gstack-learnings-log: validates JSON, auto-injects timestamp, appends to
~/.gstack/projects/$SLUG/learnings.jsonl. Append-only (no mutation).

gstack-learnings-search: reads/filters/dedupes learnings with confidence
decay (observed/inferred lose 1pt/30d), cross-project discovery, and
"latest winner" resolution per key+type.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: learnings count in preamble output

Every skill now prints "LEARNINGS: N entries loaded" during preamble,
making the compounding loop visible to the user.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: integrate learnings + confidence into 9 skill templates

Add {{LEARNINGS_SEARCH}}, {{LEARNINGS_LOG}}, and {{CONFIDENCE_CALIBRATION}}
placeholders to review, ship, plan-eng-review, plan-ceo-review, office-hours,
investigate, retro, and cso templates. Regenerated all SKILL.md files.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: /learn skill — manage project learnings

New skill for reviewing, searching, pruning, and exporting what gstack
has learned across sessions. Commands: /learn, /learn search, /learn prune,
/learn export, /learn stats, /learn add.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: self-learning roadmap — 5-release design doc

Covers: R1 GStack Learns (v0.14), R2 Review Army (v0.15), R3 Smart Ceremony
(v0.16), R4 /autoship (v0.17), R5 Studio (v0.18). Inspired by Compound
Engineering, adapted to GStack's architecture.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: learnings bin script unit tests — 13 tests, free

Tests gstack-learnings-log (valid/invalid JSON, timestamp injection,
append-only) and gstack-learnings-search (dedup, type/query/limit filters,
confidence decay, user-stated no-decay, malformed JSONL skip).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.13.4.0)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: learnings resolver + bin script edge case tests — 21 new tests, free

Adds gen-skill-docs coverage for LEARNINGS_SEARCH, LEARNINGS_LOG, and
CONFIDENCE_CALIBRATION resolvers. Adds bin script edge cases: timestamp
preservation, special characters, files array, sort order, type grouping,
combined filtering, missing fields, confidence floor at 0.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: sync package.json version with VERSION file (0.13.4.0)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: gitignore .factory/ — generated output, not source

Same pattern as .claude/skills/ and .agents/. These SKILL.md files are
generated from .tmpl templates by gen:skill-docs --host factory.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: /learn E2E — seed 3 learnings, verify agent surfaces them

Seeds N+1 query pattern, stale cache pitfall, and rubocop preference
into learnings.jsonl, then runs /learn and checks that at least 2/3
appear in the agent's output. Gate tier, ~$0.25/run.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
66894601 — Garry Tan 11 days ago
chore: gitignore .factory and remove tracked files (v0.13.5.1) (#642)

* chore: gitignore .factory and remove tracked files

The .factory/ directory contains generated skill definitions for
Factory Droid compatibility. These should not be tracked in git,
same as .claude/skills/ and .agents/.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: bump version and changelog (v0.13.5.1)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
484cf1fb — Garry Tan 11 days ago
feat: Factory Droid compatibility — works across Claude Code, Codex, and Factory (v0.13.5.0) (#621)

* refactor: extract processExternalHost() shared helper for multi-host generation

Refactor the Codex-specific output routing block in gen-skill-docs.ts into
a shared processExternalHost() function. Both Codex and future external hosts
(Factory Droid) will use this helper for output routing, symlink loop detection,
frontmatter transformation, path rewrites, and metadata generation.

- Rename codexSkillName() to externalSkillName() everywhere
- Extract ExternalHostConfig interface with per-host settings
- Codex output is byte-identical (verified via --dry-run)
- Skip /codex skill for all non-Claude hosts (not just codex)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add Factory Droid host type, preamble, and co-author trailer

- Add 'factory' to Host union type with .factory/skills/gstack paths
- Extend preamble runtime root detection for Factory ($HOME/.factory/)
- Add GSTACK_DESIGN env var to preamble (was missing for Codex too)
- Add Factory Droid co-author trailer for git commits

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: Factory Droid generation, --host all, and host-aware frontmatter

- Add --host factory (alias: --host droid) to gen-skill-docs
- Add --host all: generates for claude, codex, and factory in one invocation
  with fault-tolerant per-host error handling (only fails if claude fails)
- Factory frontmatter: name + description + user-invocable: true
- Factory sensitive skills: disable-model-invocation: true (from sensitive: field)
- Claude: strips sensitive: field from output (only Factory uses it)
- Factory tool name translation: Claude tool names → generic phrasing
- Replace chained gen:skill-docs calls with --host all in package.json build

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: sensitive frontmatter for Factory Droid auto-invocation safety

Add sensitive: true to 6 skill templates with side effects that Factory
Droids shouldn't auto-invoke (ship, land-and-deploy, guard, careful,
freeze, unfreeze). The field is:
- Factory: emitted as disable-model-invocation: true
- Claude/Codex: stripped from output by transformFrontmatter()

Also fix Claude host path: call transformFrontmatter() for Claude to
strip the sensitive: field from Claude output.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: gstack-platform-detect binary for multi-host debugging

Bash script that prints a table of installed AI coding agents (Claude,
Codex, Factory Droid, Kiro) with versions, skill paths, and gstack
installation status. Useful for debugging multi-host setups.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: Factory Droid support in setup script

- Add factory to --host values (auto-detected via command -v droid)
- Add .factory/ skill doc generation step alongside .agents/
- Add create_factory_runtime_root() and link_factory_skill_dirs()
  helpers mirroring the Codex equivalents
- Factory install section creates ~/.factory/skills/ with symlinks

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: Factory Droid awareness in skill-check and uninstall

- skill-check.ts: add Factory skills validation and freshness check
- gstack-uninstall: add Factory artifact cleanup (~/.factory/skills/gstack*
  and per-project .factory/ sidecar)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: Factory Droid generation + --host all test suites

Add 13 new tests:
- Factory output paths, frontmatter (user-invocable, disable-model-invocation)
- Sensitive vs non-sensitive skill classification
- Path rewrites (no .claude/skills/ in Factory output)
- /codex skill exclusion, openai.yaml absence
- Factory keeps Codex integration blocks (for second opinions)
- --host droid alias, --dry-run freshness, preamble paths
- --host all generates for all 3 hosts
- Setup script host validation updated for factory

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: Factory Droid install instructions + CI freshness check

- README: add Factory Droid section with install instructions and
  restart note (Factory requires restart to rescan skills)
- CI: add Factory skill doc freshness verification to skill-docs.yml

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: generated Factory Droid skill output (.factory/skills/)

29 skills generated for Factory Droid with:
- user-invocable: true on all skills
- disable-model-invocation: true on 6 sensitive skills
- .factory/skills/ paths (no .claude/skills/ references)
- $GSTACK_ROOT env vars for runtime root detection
- Tool name translation (Claude tool names → generic phrasing)

Committed to git for CI freshness checks and direct consumption.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: add Factory Droid P1 TODO for browse MCP server

Add 3 TODOs under new ## Factory Droid section:
- P1: Browse MCP server (Option B, deeper Factory integration)
- P3: .agent/skills/ dual output for cross-agent compatibility
- P3: Custom Droid definitions alongside skills

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.13.5.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ea7dbc9a — Garry Tan 11 days ago
fix: sidebar prompt injection defense (v0.13.4.0) (#611)

* fix: sidebar prompt injection defense — XML framing, command allowlist, arg plumbing

Three security fixes for the Chrome sidebar:

1. XML-framed prompts with trust boundaries and escape of < > & in user
   messages to prevent tag injection attacks.

2. Bash command allowlist in system prompt — only browse binary commands
   ($B goto, $B click, etc.) allowed. All other bash commands forbidden.

3. Fix sidebar-agent.ts ignoring queued args — server-side --model and
   --allowedTools changes were silently dropped because the agent rebuilt
   args from scratch instead of using the queue entry.

Also defaults sidebar to Opus (harder to manipulate).

12 new tests covering XML escaping, command allowlist, Opus default,
trust boundary instructions, and arg plumbing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.13.4.0)

ML prompt injection defense design doc + P0 TODO for follow-up PR.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: clear stale worktree and claude session on sidebar reconnect

loadSession() was restoring worktreePath and claudeSessionId from prior
crashes. The worktree directory no longer existed (deleted on cleanup)
and --resume with a dead session ID caused claude to fail silently.

Now validates worktree exists on load and clears stale claude session
IDs on every server restart.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
cd66fc2f — Garry Tan 12 days ago
fix: 6 critical fixes + community PR guardrails (v0.13.2.0) (#602)

* fix(security): commit bun.lock to pin dependency versions

Remove bun.lock from .gitignore and commit the lockfile. Every bun install
now uses exact pinned versions instead of resolving floating ^ ranges from
npm fresh. Closes the supply-chain vector from #566.

Co-Authored-By: boinger <boinger@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: gstack-slug falls back to dirname/unknown when git context is absent

Add || true to git commands and fallback defaults so gstack-slug works
outside git repos. Prevents unbound variable crash that kills every
review skill when no git context exists.

Co-Authored-By: collinstraka-clov <collinstraka-clov@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: setup auto-selects default after 10s timeout to prevent CI hangs

Add -t 10 to the read command in the skill-prefix prompt. In CI, Docker,
and Conductor workspaces where a TTY exists but nobody is watching, the
prompt now auto-selects short names after 10 seconds instead of blocking
forever.

Co-Authored-By: stedfn <stedfn@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: browse CLI Windows lockfile — use string flag instead of numeric constants

Bun compiled binaries on Windows don't handle numeric fs.constants
correctly. The string flag 'wx' is semantically identical to
O_CREAT | O_EXCL | O_WRONLY per Node docs and works on all platforms.

Fixes #599

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add ~/.gstack/projects/ to plan file search path

/office-hours writes design docs to ~/.gstack/projects/$SLUG/ but /ship
and /review only searched ~/.claude/plans, ~/.codex/plans, and .gstack/plans.
Add the project-scoped directory as the first search location so plan
validation finds design docs created by the standard workflow.

Fixes #591

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: autoplan dual-voice — sequential foreground execution instead of broken parallel

Background subagents don't inherit tool permissions in Claude Code, so the
Claude subagent in dual-voice mode was silently failing on every invocation.
Every autoplan run was degrading to single-reviewer mode without warning.

Change all three phases (CEO, Design, Eng) from "simultaneously" to
sequential foreground execution: Claude subagent first (Agent tool,
foreground), then Codex (Bash). Both complete before the consensus table.

Fixes #497

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: regenerate SKILL.md files from updated templates

Regenerated from autoplan/SKILL.md.tmpl (dual-voice fix) and
scripts/resolvers/review.ts (plan search path fix).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add community PR guardrails — protect ETHOS.md and voice

Add explicit CLAUDE.md rule requiring AskUserQuestion before accepting
any community PR that touches ETHOS.md, removes promotional material,
or changes Garry's voice. No exceptions, no auto-merging.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.13.2.0)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: gen-skill-docs detects symlink loop, skips codex write that overwrites Claude SKILL.md

When .agents/skills/gstack is symlinked to the repo root (vendored dev mode),
gen-skill-docs --host codex was writing the Codex-transformed SKILL.md through
the symlink, overwriting the Claude version. This caused SKILL.md and
agents/openai.yaml to silently revert to Codex paths after every build.

Now detects when the codex output path resolves to the same real file as the
Claude output and skips the write. Content is still generated for token budget
tracking. The openai.yaml write is also skipped for the same symlink case.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve all 7 test failures — version sync, zsh glob guard, symlink-aware codex tests

1. package.json version synced with VERSION file (0.13.3.0)
2. design-shotgun/SKILL.md.tmpl: added setopt +o nomatch guard to
   bash block with variant-*.png glob
3. Codex generation tests: skip skills where .agents/skills/{name}
   is a symlink back to repo root (vendored dev mode). These can't
   have proper codex content since gen-skill-docs skips the write
   to avoid overwriting the Claude SKILL.md.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: boinger <boinger@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: collinstraka-clov <collinstraka-clov@users.noreply.github.com>
Co-authored-by: stedfn <stedfn@users.noreply.github.com>
247fc3ba — Garry Tan 12 days ago
feat: user sovereignty — AI models recommend, users decide (v0.13.2.0) (#603)

* feat: user sovereignty — AI models recommend, users decide

When Claude and Codex agree on a scope change, they now present it to the
user instead of auto-incorporating it. Adds User Sovereignty as the third
core principle in ETHOS.md. Fixes the cross-model tension template in
review.ts to present both perspectives neutrally instead of judging. Adds
User Challenge category to autoplan with proper contract updates (intro,
important rules, audit trail, gate handling). Adds Outside Voice Integration
Rule to CEO and eng review templates.

* chore: regenerate SKILL.md files from updated templates

* chore: bump version and changelog (v0.13.2.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: proper gstack description in openai.yaml + block Codex from rewriting it

Codex kept overwriting agents/openai.yaml with a browse-only description.
Two fixes: (1) better description covering full PM/dev/eng/CEO/QA scope,
(2) add agents/ to the filesystem boundary so Codex stops modifying it.

* chore: regenerate SKILL.md files with updated filesystem boundary

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
7450b516 — Garry Tan 12 days ago
fix: security audit remediation — 12 fixes, 20 tests (v0.13.1.0) (#595)

* fix: remove auth token from /health, secure extension bootstrap (CRITICAL-02 + HIGH-03)

- Remove token from /health response (was leaked to any localhost process)
- Write .auth.json to extension dir for Manifest V3 bootstrap
- sidebar-agent reads token from state file via BROWSE_STATE_FILE env var
- Remove getToken handler from extension (token via health broadcast)
- Extension loads token before first health poll to prevent race condition

* fix: require auth on cookie-picker data routes (CRITICAL-01)

- Add Bearer token auth gate on all /cookie-picker/* data/action routes
- GET /cookie-picker HTML page stays unauthenticated (UI shell)
- Token embedded in served HTML for picker's fetch calls
- CORS preflight now allows Authorization header

* fix: add state file TTL and plaintext cookie warning (HIGH-02)

- Add savedAt timestamp to state save output
- Warn on load if state file older than 7 days
- Auto-delete stale state files (>7 days) on server startup
- Warning about plaintext cookie storage in save message

* fix: innerHTML XSS in extension content script and sidepanel (MEDIUM-01)

- content.js: replace innerHTML with createElement/textContent for ref panel
- sidepanel.js: escape entry.command with escapeHtml() in activity feed
- Both found by security audit + Codex adversarial red team

* fix: symlink bypass in validateReadPath (MEDIUM-02)

- Always resolve to absolute path first (fixes relative path bypass)
- Use realpathSync to follow symlinks before boundary check
- Throw on non-ENOENT realpathSync failures (explicit over silent)
- Resolve SAFE_DIRECTORIES through realpathSync (macOS /tmp → /private/tmp)
- Resolve directory part for non-existent files (ENOENT with symlinked parent)

* fix: freeze hook symlink bypass and prefix collision (MEDIUM-03)

- Add POSIX-portable path resolution (cd + pwd -P, works on macOS)
- Fix prefix collision: /project-evil no longer matches /project freeze dir
- Use trailing slash in boundary check to require directory boundary

* fix: shell script injection in gstack-config and telemetry (MEDIUM-04)

- gstack-config: validate keys (alphanumeric+underscore only)
- gstack-config: use grep -F (fixed string) instead of -E (regex)
- gstack-config: escape sed special chars in values, drop newlines
- gstack-telemetry-log: sanitize REPO_SLUG and BRANCH via json_safe()

* test: 20 security tests for audit remediation

- server-auth: verify token removed from /health, auth on /refs, /activity/*
- cookie-picker: auth required on data routes, HTML page unauthenticated
- path-validation: symlink bypass blocked, realpathSync failure throws
- gstack-config: regex key rejected, sed special chars preserved
- state-ttl: savedAt timestamp, 7-day TTL warning
- telemetry: branch/repo with quotes don't corrupt JSON
- adversarial: sidepanel escapes entry.command, freeze prefix collision

* chore: bump version and changelog (v0.13.1.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: tone down changelog — defense in depth, not catastrophic bugs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
78bc1d19 — Garry Tan 12 days ago
feat: design binary — real UI mockup generation for gstack skills (v0.13.0.0) (#551)

* docs: design tools v1 plan — visual mockup generation for gstack skills

Full design doc covering the `design` binary that wraps OpenAI's GPT Image API
to generate real UI mockups from gstack's design skills. Includes comparison
board UX spec, auth model, 6 CEO expansions (design memory, mockup diffing,
screenshot evolution, design intent verification, responsive variants,
design-to-code prompt), and 9-commit implementation plan.

Reviewed: /office-hours + /plan-eng-review (CLEARED) + /plan-ceo-review
(EXPANSION, 6/6 accepted) + /plan-design-review (2/10 → 8/10).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: design tools prototype validation — GPT Image API works

Prototype script sends 3 design briefs to OpenAI Responses API with
image_generation tool. Results: dashboard (47s, 2.1MB), landing page
(42s, 1.3MB), settings page (37s, 1.3MB) all produce real, implementable
UI mockups with accurate text rendering and clean layouts.

Key finding: Codex OAuth tokens lack image generation scopes. Direct
API key (sk-proj-*) required, stored in ~/.gstack/openai.json.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: design binary core — generate, check, compare commands

Stateless CLI (design/dist/design) wrapping OpenAI Responses API for
UI mockup generation. Three working commands:

- generate: brief -> PNG mockup via gpt-4o + image_generation tool
- check: vision-based quality gate via GPT-4o (text readability, layout
  completeness, visual coherence)
- compare: generates self-contained HTML comparison board with star
  ratings, radio Pick, per-variant feedback, regenerate controls,
  and Submit button that writes structured JSON for agent polling

Auth reads from ~/.gstack/openai.json (0600), falls back to
OPENAI_API_KEY env var. Compiled separately from browse binary
(openai added to devDependencies, not runtime deps).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: design binary variants + iterate commands

variants: generates N style variations with staggered parallel (1.5s
between launches, exponential backoff on 429). 7 built-in style
variations (bold, calm, warm, corporate, dark, playful + default).
Tested: 3/3 variants in 41.6s.

iterate: multi-turn design iteration using previous_response_id for
conversational threading. Falls back to re-generation with accumulated
feedback if threading doesn't retain visual context.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: DESIGN_SETUP + DESIGN_MOCKUP template resolvers

Add generateDesignSetup() and generateDesignMockup() to the existing
design.ts resolver file. Add designDir to HostPaths (claude + codex).
Register DESIGN_SETUP and DESIGN_MOCKUP in the resolver index.

DESIGN_SETUP: $D binary discovery (mirrors $B browse setup pattern).
Falls back to DESIGN_SKETCH if binary not available.

DESIGN_MOCKUP: full visual exploration workflow template — construct
brief from DESIGN.md context, generate 3 variants, open comparison
board in Chrome, poll for user feedback, save approved mockup to
docs/designs/, generate HTML wireframe for implementation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: sync package.json version with VERSION file (0.12.2.0)

Pre-existing mismatch: VERSION was 0.12.2.0 but package.json was
0.12.0.0. Also adds design binary to build script and dev:design
convenience command.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: /office-hours visual design exploration integration

Add {{DESIGN_MOCKUP}} to office-hours template before the existing
{{DESIGN_SKETCH}}. When the design binary is available, /office-hours
generates 3 visual mockup variants, opens a comparison board in Chrome,
and polls for user feedback. Falls back to HTML wireframes if the
design binary isn't built.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: /plan-design-review visual mockup integration

Add {{DESIGN_SETUP}} to pre-review audit and "show me what 10/10
looks like" mockup generation to the 0-10 rating method. When a
design dimension rates below 7/10, the review can generate a mockup
showing the improved version. Falls back to text descriptions if
the design binary isn't available.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: design memory — extract visual language from mockups into DESIGN.md

New `$D extract` command: sends approved mockup to GPT-4o vision,
extracts color palette, typography, spacing, and layout patterns,
writes/updates DESIGN.md with an "Extracted Design Language" section.

Progressive constraint: if DESIGN.md exists, future mockup briefs
include it as style context. If no DESIGN.md, explorations run wide.
readDesignConstraints() reads existing DESIGN.md for brief construction.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: mockup diffing + design intent verification

New commands:
- $D diff --before old.png --after new.png: visual diff using GPT-4o
  vision. Returns differences by area with severity (high/medium/low)
  and a matchScore (0-100).
- $D verify --mockup approved.png --screenshot live.png: compares live
  site screenshot against approved design mockup. Pass if matchScore
  >= 70 and no high-severity differences.

Used by /design-review to close the design loop: design -> implement ->
verify visually.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: screenshot-to-mockup evolution ($D evolve)

New command: $D evolve --screenshot current.png --brief "make it calmer"

Two-step process: first analyzes the screenshot via GPT-4o vision to
produce a detailed description, then generates a new mockup that keeps
the existing layout structure but applies the requested changes. Starts
from reality, not blank canvas.

Bridges the gap between /design-review critique ("the spacing is off")
and a visual proposal of the fix.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: responsive variants + design-to-code prompt

Responsive variants: $D variants --viewports desktop,tablet,mobile
generates mockups at 1536x1024, 1024x1024, and 1024x1536 (portrait)
with viewport-appropriate layout instructions.

Design-to-code prompt: $D prompt --image approved.png extracts colors,
typography, layout, and components via GPT-4o vision, producing a
structured implementation prompt. Reads DESIGN.md for additional
constraint context.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.13.0.0)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: gstack designer as first-class tool in /plan-design-review

Brand the gstack designer prominently, add Step 0.5 for proactive visual
mockup generation before review passes, and update priority hierarchy.
When a plan describes new UI, the skill now offers to generate mockups
with $D variants, run $D check for quality gating, and present a
comparison board via $B goto before any review passes begin.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: integrate mockups into review passes and outputs

Thread Step 0.5 mockups through the review workflow: Pass 4 (AI Slop)
evaluates generated mockups visually, Pass 7 uses mockups as evidence
for unresolved decisions, post-pass offers one-shot regeneration after
design changes, and Approved Mockups section records chosen variants
with paths for the implementer.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: gstack designer target mockups in /design-review fix loop

Add $D generate for target mockups in Phase 8a.5 — before fixing a
design finding, generate a mockup showing what it should look like.
Add $D verify in Phase 9 to compare fix results against targets.
Not plan mode — goes straight to implementation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: gstack designer AI mockups in /design-consultation Phase 5

Replace HTML preview with $D variants + comparison board when designer
is available (Path A). Use $D extract to derive DESIGN.md tokens from
the approved mockup. Handles both plan mode (write to plan) and
non-plan mode (implement immediately). Falls back to HTML preview
(Path B) when designer binary is unavailable.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: make gstack designer the default in /plan-design-review, not optional

The transcript showed the agent writing 5 text descriptions of homepage
variants instead of generating visual mockups, even when the user explicitly
asked for design tools. The skill treated mockups as optional ("Want me to
generate?") when they should be the default behavior.

Changes:
- Rename "Your Visual Design Tool" to "YOUR PRIMARY TOOL" with aggressive
  language: "Don't ask permission. Show it."
- Step 0.5 now generates mockups automatically when DESIGN_READY, no
  AskUserQuestion gatekeeping the default path
- Priority hierarchy: mockups are "non-negotiable" not "if available"
- Step 0D tells the user mockups are coming next
- DESIGN_NOT_AVAILABLE fallback now tells user what they're missing

The only valid reasons to skip mockups: no UI scope, or designer not
installed. Everything else generates by default.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: persist design mockups to ~/.gstack/projects/$SLUG/designs/

Mockups were going to .context/mockups/ (gitignored, workspace-local).
This meant designs disappeared when switching workspaces or conversations,
and downstream skills couldn't reference approved mockups from earlier
reviews.

Now all three design skills save to persistent project-scoped dirs:
- /plan-design-review: ~/.gstack/projects/$SLUG/designs/<screen>-<date>/
- /design-consultation: ~/.gstack/projects/$SLUG/designs/design-system-<date>/
- /design-review: ~/.gstack/projects/$SLUG/designs/design-audit-<date>/

Each directory gets an approved.json recording the user's pick, feedback,
and branch. This lets /design-review verify against mockups that
/plan-design-review approved, and design history is browsable via
ls ~/.gstack/projects/$SLUG/designs/.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: regenerate codex ship skill with zsh glob guards

Picked up setopt +o nomatch guards from main's v0.12.8.1 merge.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add browse binary discovery to DESIGN_SETUP resolver

The design setup block now discovers $B alongside $D, so skills can
open comparison boards via $B goto and poll feedback via $B eval.
Falls back to `open` on macOS when browse binary is unavailable.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: comparison board DOM polling in plan-design-review

After opening the comparison board, the agent now polls
#status via $B eval instead of asking a rigid AskUserQuestion.
Handles submit (read structured JSON feedback), regenerate
(new variants with updated brief), and $B-unavailable fallback
(free-form text response). The user interacts with the real
board UI, not a constrained option picker.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: comparison board feedback loop integration test

16 tests covering the full DOM polling cycle: structure verification,
submit with pick/rating/comment, regenerate flows (totally different,
more like this, custom text), and the agent polling pattern
(empty → submitted → read JSON). Uses real generateCompareHtml()
from design/src/compare.ts, served via HTTP. Runs in <1s.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add $D serve command for HTTP-based comparison board feedback

The comparison board feedback loop was fundamentally broken: browse blocks
file:// URLs (url-validation.ts:71), so $B goto file://board.html always
fails. The fallback open + $B eval polls a different browser instance.

$D serve fixes this by serving the board over HTTP on localhost. The server
is stateful: stays alive across regeneration rounds, exposes /api/progress
for the board to poll, and accepts /api/reload from the agent to swap in
new board HTML. Stdout carries feedback JSON only; stderr carries telemetry.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: dual-mode feedback + post-submit lifecycle in comparison board

When __GSTACK_SERVER_URL is set (injected by $D serve), the board POSTs
feedback to the server instead of only writing to hidden DOM elements.
After submit: disables all inputs, shows "Return to your coding agent."
After regenerate: shows spinner, polls /api/progress, auto-refreshes on
ready. On POST failure: shows copyable JSON fallback. On progress timeout
(5 min): shows error with /design-shotgun prompt. DOM fallback preserved
for headed browser mode and tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: HTTP serve command endpoints and regeneration lifecycle

11 tests covering: HTML serving with injected server URL, /api/progress
state reporting, submit → done lifecycle, regenerate → regenerating state,
remix with remixSpec, malformed JSON rejection, /api/reload HTML swapping,
missing file validation, and full regenerate → reload → submit round-trip.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add DESIGN_SHOTGUN_LOOP resolver + fix design artifact paths

Adds generateDesignShotgunLoop() resolver for the shared comparison board
feedback loop (serve via HTTP, handle regenerate/remix, AskUserQuestion
fallback, feedback confirmation). Registered as {{DESIGN_SHOTGUN_LOOP}}.

Fixes generateDesignMockup() to use ~/.gstack/projects/$SLUG/designs/
instead of /tmp/ and docs/designs/. Replaces broken $B goto file:// +
$B eval polling with $D compare --serve (HTTP-based, stdout feedback).

Adds CRITICAL PATH RULE guardrail to DESIGN_SETUP: design artifacts must
go to ~/.gstack/projects/$SLUG/designs/, never .context/ or /tmp/.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add /design-shotgun standalone design exploration skill

New skill for visual brainstorming: generate AI design variants, open a
comparison board in the user's browser, collect structured feedback, and
iterate. Features: session detection (revisit prior explorations), 5-dimension
context gathering (who, job to be done, what exists, user flow, edge cases),
taste memory (prior approved designs bias new generations), inline variant
preview, configurable variant count, screenshot-to-variants via $D evolve.

Uses {{DESIGN_SHOTGUN_LOOP}} resolver for the feedback loop. Saves all
artifacts to ~/.gstack/projects/$SLUG/designs/.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: regenerate SKILL.md files for design-shotgun + resolver changes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add remix UI to comparison board

Per-variant element selectors (Layout, Colors, Typography, Spacing) with
radio buttons in a grid. Remix button collects selections into a remixSpec
object and sends via the same HTTP POST feedback mechanism. Enabled only
when at least one element is selected. Board shows regenerating spinner
while agent generates the hybrid variant.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add $D gallery command for design history timeline

Generates a self-contained HTML page showing all prior design explorations
for a project: every variant (approved or not), feedback notes, organized
by date (newest first). Images embedded as base64. Handles corrupted
approved.json gracefully (skips, still shows the session). Empty state
shows "No history yet" with /design-shotgun prompt.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: gallery generation — sessions, dates, corruption, empty state

7 tests: empty dir, nonexistent dir, single session with approved variant,
multiple sessions sorted newest-first, corrupted approved.json handled
gracefully, session without approved.json, self-contained HTML (no
external dependencies).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: replace broken file:// polling with {{DESIGN_SHOTGUN_LOOP}}

plan-design-review and design-consultation templates previously used
$B goto file:// + $B eval polling for the comparison board feedback loop.
This was broken (browse blocks file:// URLs). Both templates now use
{{DESIGN_SHOTGUN_LOOP}} which serves via HTTP, handles regeneration in
the same browser tab, and falls back to AskUserQuestion.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add design-shotgun touchfile entries and tier classifications

design-shotgun-path (gate): verify artifacts go to ~/.gstack/, not .context/
design-shotgun-session (gate): verify repeat-run detection + AskUserQuestion
design-shotgun-full (periodic): full round-trip with real design binary

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: regenerate SKILL.md files for template refactor

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: comparison board UI improvements — option headers, pick confirmation, grid view

Three changes to the design comparison board:

1. Pick confirmation: selecting "Pick" on Option A shows "We'll move
   forward with Option A" in green, plus a status line above the submit
   button repeating the choice.

2. Clear option headers: each variant now has "Option A" in bold with a
   subtitle above the image, instead of just the raw image.

3. View toggle: top-right Large/Grid buttons switch between single-column
   (default) and 3-across grid view.

Also restructured the bottom section into a 2-column grid: submit/overall
feedback on the left, regenerate controls on the right.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use 127.0.0.1 instead of localhost for serve URL

Avoids DNS resolution issues on some systems where localhost may resolve
to IPv6 ::1 while Bun listens on IPv4 only.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: write ALL feedback to disk so agent can poll in background mode

The agent backgrounds $D serve (Claude Code can't block on a subprocess
and do other work simultaneously). With stdout-only feedback delivery,
the agent never sees regenerate/remix feedback.

Fix: write feedback-pending.json (regenerate/remix) and feedback.json
(submit) to disk next to the board HTML. Agent polls the filesystem
instead of reading stdout. Both channels (stdout + disk) are always
active so foreground mode still works.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: DESIGN_SHOTGUN_LOOP uses file polling instead of stdout reading

Update the template resolver to instruct the agent to background $D serve
and poll for feedback-pending.json / feedback.json on a 5-second loop.
This matches the real-world pattern where Claude Code / Conductor agents
can't block on subprocess stdout.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: regenerate SKILL.md files for file-polling feedback loop

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: null-safe DOM selectors for post-submit and regenerating states

The user's layout restructure renamed .regenerate-bar → .regen-column,
.submit-bar → .submit-column, and .overall-section → .bottom-section.
The JS still referenced the old class names, causing querySelector to
return null and showPostSubmitState() / showRegeneratingState() to
silently crash. This meant Submit and Regenerate buttons appeared to
work (DOM elements updated, HTTP POST succeeded) but the visual
feedback (disabled inputs, spinner, success message) never appeared.

Fix: use fallback selectors that check both old and new class names,
with null guards so a missing element doesn't crash the function.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: end-to-end feedback roundtrip — browser click to file on disk

The test that proves "changes on the website propagate to Claude Code."
Opens the comparison board in a real headless browser with __GSTACK_SERVER_URL
injected, simulates user clicks (Submit, Regenerate, More Like This), and
verifies that feedback.json / feedback-pending.json land on disk with the
correct structured data.

6 tests covering: submit → feedback.json, post-submit UI lockdown,
regenerate → feedback-pending.json, more-like-this → feedback-pending.json,
regenerate spinner display, and full regen → reload → submit round-trip.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: comprehensive design doc for Design Shotgun feedback loop

Documents the full browser-to-agent feedback architecture: state machine,
file-based polling, port discovery, post-submit lifecycle, and every known
edge case (zombie forms, dead servers, stale spinners, file:// bug,
double-click races, port coordination, sequential generate rule).

Includes ASCII diagrams of the data flow and state transitions, complete
step-by-step walkthrough of happy path and regeneration path, test coverage
map with gaps, and short/medium/long-term improvement ideas.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: plan-design-review agent guardrails for feedback loop

Four fixes to prevent agents from reinventing the feedback loop badly:

1. Sequential generate rule: explicit instruction that $D generate calls
   must run one at a time (API rate-limits concurrent image generation).
2. No-AskUserQuestion-for-feedback rule: agent reads feedback.json instead
   of re-asking what the user picked.
3. Remove file:// references: $B goto file:// was always rejected by
   url-validation.ts. The --serve flag handles everything.
4. Remove $B eval polling reference: no longer needed with HTTP POST.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: design-shotgun Step 3 progressive reveal, silent failure detection, timing estimate

Three production UX bugs fixed:
1. Dead air — now shows timing estimate before generation starts
2. Silent variant drop — replaced $D variants batch with individual $D generate
   calls, each verified for existence and non-zero size with retry
3. No progressive reveal — each variant shown inline via Read tool immediately
   after generation (~60s increments instead of all at ~180s)

Also: /tmp/ then cp as default output pattern (sandbox workaround),
screenshot taken once for evolve path (not per-variant).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: parallel design-shotgun with concept-first confirmation

Step 3 rewritten to concept-first + parallel Agent architecture:
- 3a: generate text concepts (free, instant)
- 3b: AskUserQuestion to confirm/modify before spending API credits
- 3c: launch N Agent subagents in parallel (~60s total regardless of count)
- 3d: show all results, dynamic image list for comparison board

Adds Agent to allowed-tools. Softens plan-design-review sequential
warning to note design-shotgun uses parallel at Tier 2+.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update project documentation for v0.13.0.0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: untrack .agents/skills/ — generated at setup, already gitignored

These files were committed despite .agents/ being in .gitignore.
They regenerate from ./setup --host codex on any machine.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: regenerate design-shotgun SKILL.md for v0.12.12.0 preamble changes

Merge from main brought updated preamble resolver (conditional telemetry,
local JSONL logging) but design-shotgun/SKILL.md wasn't regenerated.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
11695e3a — Garry Tan 13 days ago
fix: security audit compliance — credentials, telemetry, bun pin, untrusted warning (v0.12.12.0) (#574)

* fix: replace hardcoded credentials with env vars in documentation

Addresses Snyk W007 (HIGH). Replaces test@example.com/password123 with
$TEST_EMAIL/$TEST_PASSWORD env vars. Adds credential safety and cookie
safety notes.

* fix: make telemetry binary calls conditional on _TEL and binary existence

Addresses Socket's 14 MEDIUM findings for opaque telemetry binary.
Adds local JSONL fallback (always available, inspectable). Remote
binary only runs if _TEL != "off" and binary exists.

* fix: pin bun install to v1.3.10 with existence check

Addresses Snyk W012 (MEDIUM). Pins BUN_VERSION in browse.ts resolver,
Dockerfile.ci, and setup script error message. Adds command -v check
to skip install if bun already present.

* docs: add data flow documentation to review.ts

Addresses Socket HIGH finding (98% confidence). Documents what data
is sent to external review services and what is NOT sent.

* test: add audit compliance regression tests

6 tests enforce Snyk/Socket fixes stay in place: no hardcoded creds,
conditional telemetry, version-pinned bun, untrusted content warning,
data flow docs, all SKILL.md telemetry conditional.

* refactor: remove 2017 lines of dead code from gen-skill-docs.ts

The Placeholder Resolvers section (lines 77-2092) contained duplicate
functions that were superseded by scripts/resolvers/*.ts. The RESOLVERS
map from resolvers/index.ts is the sole resolution path. Verified: zero
call sites outside self-references.

* chore: regenerate SKILL.md files from updated templates

Reflects: conditional telemetry, version-pinned bun install,
untrusted content warning after Navigation commands.

* chore: bump version and changelog (v0.12.12.0)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
43c078f1 — Garry Tan 13 days ago
feat: skill prefix is now a persistent user choice (v0.12.11.0) (#571)

* feat: make skill prefix a persistent, interactive user setting

- Add --prefix flag alongside --no-prefix
- Read/write skill_prefix from ~/.gstack/config.yaml (true/false)
- Interactive prompt on first setup when no preference saved
- Non-TTY environments default to flat names (no prefix)
- Add cleanup_prefixed_claude_symlinks() for reverse direction
- Fix gstack-config sed portability (mktemp+mv instead of BSD sed -i '')
- Add SKILL_PREFIX to preamble output with namespace-aware instruction

* test: add prefix config tests + README switching instructions

8 structural tests for persistent prefix setting:
config reading, --prefix flag, config persistence, interactive
prompt, TTY fallback, reverse cleanup, cleanup ordering, welcome.

* chore: regenerate SKILL.md files with SKILL_PREFIX preamble

* chore: bump version and changelog (v0.12.11.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: reframe changelog as feature, not mea culpa

* docs: update CONTRIBUTING + CLAUDE.md for prefix-aware vendoring

- CONTRIBUTING: vendoring now includes ./setup step for per-skill symlinks
- CONTRIBUTING: prefix choice documented in contributor workflow + dev diagram
- CONTRIBUTING: switching prefix mode section added
- CLAUDE.md: vendored symlink awareness section covers prefix setting

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
22ad3e5b — Garry Tan 13 days ago
fix: Codex filesystem boundary — prevent skill-file prompt injection (v0.12.10.0) (#570)

* fix: add filesystem boundary to all codex prompts

Codex CLI can read files outside the repo root despite -s read-only.
It discovers ~/.claude/skills/ and ~/.agents/skills/, treats SKILL.md
files as instructions, and executes preamble scripts instead of
reviewing code. Fix: prepend a boundary instruction to all 11 codex
exec/review callsites across codex/SKILL.md.tmpl (3), autoplan/
SKILL.md.tmpl (3), and scripts/resolvers/review.ts (5). Add rabbit-
hole detection rule and 5 regression tests.

* chore: bump version and changelog (v0.12.10.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Next