feat: /design-html works from any starting point (v0.15.1.0) (#734)
* feat: /design-html works from any starting point — not just design-shotgun
Three routing modes: approved mockup (Case A), CEO plan or design variants
without formal approval (Case B), or clean slate with just a description
(Case C). Each mode asks the right questions via AskUserQuestion instead of
blocking with "no approved design found."
* chore: bump version and changelog (v0.15.1.0)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
feat: Session Intelligence Layer — /checkpoint + /health + context recovery (v0.15.0.0) (#733)
* feat: session timeline binaries (gstack-timeline-log + gstack-timeline-read)
New binaries for the Session Intelligence Layer. gstack-timeline-log appends
JSONL events to ~/.gstack/projects/$SLUG/timeline.jsonl. gstack-timeline-read
reads, filters, and formats timeline data for /retro consumption.
Timeline is local-only project intelligence, never sent anywhere. Always-on
regardless of telemetry setting.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: preamble context recovery + timeline events + predictive suggestions
Layers 1-3 of the Session Intelligence Layer:
- Timeline start/complete events injected into every skill via preamble
- Context recovery (tier 2+): lists recent CEO plans, checkpoints, reviews
- Cross-session injection: LAST_SESSION and LATEST_CHECKPOINT for branch
- Predictive skill suggestion from recent timeline patterns
- Welcome back message synthesis
- Routing rules for /checkpoint and /health
Timeline writes are NOT gated by telemetry (local project intelligence).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: /checkpoint + /health skills (Layers 4-5)
/checkpoint: save/resume/list working state snapshots. Supports cross-branch
listing for Conductor workspace handoff. Session duration tracking.
/health: code quality scorekeeper. Wraps project tools (tsc, biome, knip,
shellcheck, tests), computes composite 0-10 score, tracks trends over time.
Auto-detects tools or reads from CLAUDE.md ## Health Stack.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: regenerate SKILL.md files + add timeline tests
9 timeline tests (all passing) mirroring learnings.test.ts pattern.
All 34 SKILL.md files regenerated with new preamble (context recovery,
timeline events, routing rules for /checkpoint and /health).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v0.15.0.0)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: update self-learning roadmap post-Session Intelligence
R1-R3 marked shipped with actual versions. R4 becomes Adaptive Ceremony
(trust as separate policy engine, scope-aware, gradual degradation). R5
becomes /autoship (resumable state machine, not linear chain). R6-R7
unbundled from old R5. Added State Systems reference, Risk Register
(Codex-reviewed), and validation metrics for R4.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: E2E tests for Session Intelligence (timeline, recovery, checkpoint)
3 gate-tier E2E tests:
- timeline-event-flow: binary data flow round-trip (no LLM)
- context-recovery-artifacts: seeded artifacts appear in preamble
- checkpoint-save-resume: checkpoint file created with YAML frontmatter
Also fixes package.json version sync (0.14.6.0 → 0.15.0.0).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
feat: recursive self-improvement — operational learning + full skill wiring (v0.13.8.0) (#647)
* refactor: remove dead contributor mode, replace with operational self-improvement slot
Contributor mode never fired in 18 days of heavy use (required manual opt-in
via gstack-config, gated behind _CONTRIB=true, wrote disconnected markdown).
Removes: generateContributorMode(), _CONTRIB bash var, 2 E2E tests, touchfile
entry, doc references. Cleans up skip-lists in plan-ceo-review, autoplan,
review resolver, and document-release templates.
The operational self-improvement system (next commit) replaces this slot with
automatic learning capture that requires no opt-in.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: operational self-improvement — every skill learns from failures
Adds universal operational learning capture to the preamble completion protocol.
At the end of every skill session, the agent reflects on CLI failures, wrong
approaches, and project quirks, logging them as type "operational" to the
learnings JSONL. Future sessions surface these automatically.
- generateCompletionStatus(ctx) now includes operational capture section
- Preamble bash shows top 3 learnings inline when count > 5
- New "operational" type in generateLearningsLog alongside pattern/pitfall/etc
- Updated unit tests + operational seed entry in learnings E2E
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: wire learnings into all insight-producing skills
Adds LEARNINGS_SEARCH and/or LEARNINGS_LOG to 10 skill templates that
produce reusable insights but were previously disconnected from the
learning system:
- office-hours, plan-ceo-review, plan-eng-review: add LOG (had SEARCH)
- plan-design-review: add both SEARCH + LOG (had neither)
- design-review, design-consultation, cso, qa, qa-only: add both
- retro: add SEARCH (had LOG)
13 skills now fully participate in the learning loop (read + write).
Every review, QA, investigation, and design session both consults prior
learnings and contributes new ones.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: add operational-learning E2E test (gate-tier)
Validates the write path: agent encounters a CLI failure, logs an
operational learning to JSONL via gstack-learnings-log. Replaces the
removed contributor-mode E2E test.
Setup: temp git repo, copy bin scripts, set GSTACK_HOME.
Prompt: simulated npm test failure needing --experimental-vm-modules.
Assert: learnings.jsonl exists with type=operational entry.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: learnings-show E2E slug mismatch — seed at computed slug, not hardcoded
The test seeded learnings at projects/test-project/ but gstack-slug computes
the slug from basename(workDir) when no git remote exists. The agent's search
looked at the wrong path and found nothing.
Fix: compute slug the same way gstack-slug does (basename + sanitize) and
seed the learnings there.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v0.13.8.0)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
feat: session intelligence roadmap + design doc (#727)
* docs: Session Intelligence Layer design doc
Frames gstack as the persistent brain that survives Claude's ephemeral
context window. Architecture diagram, 5-layer feature breakdown, and
research sources from claude-mem, Anthropic engineering blog, CodeScene,
and Claude Code Agent Teams.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add context intelligence, health, swarm, and refactoring to roadmap
9 new TODOS across 4 sections based on Claude Code ecosystem research:
- Context Intelligence (P1): preamble artifact recovery, session timeline,
cross-session injection, /checkpoint skill, vision doc
- Health (P1): /health dashboard with CodeScene MCP integration option,
/health as /ship quality gate
- Swarm (P2): extract Review Army into reusable multi-agent primitive
- Refactoring (P2): /refactor-prep for pre-refactor token hygiene
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fix: ship idempotency + skill prefix name patching (v0.14.3.0) (#693)
* fix: add idempotency guards to /ship Steps 4, 7, 8 (#649)
If git push succeeds but gh pr create fails, re-running /ship would
double-bump VERSION and duplicate CHANGELOG entries. Now:
- Step 4: check if VERSION already differs from base branch
- Step 7: fetch only the specific branch, skip push if already up to date
- Step 8: if PR exists, update body via gh pr edit instead of creating duplicate
No CHANGELOG guard needed — Step 5 is already idempotent by design
("replace existing entries with one unified entry").
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: patch name: in SKILL.md frontmatter for prefix mode (#620, #578)
./setup --prefix creates gstack-* symlinks but SKILL.md still says
name: qa, so Claude Code ignores the prefix. Now:
- New bin/gstack-patch-names shared helper patches name: field via sed
- setup calls it after link_claude_skill_dirs
- gstack-relink calls it after symlink loop
- gen-skill-docs.ts prints warning when skill_prefix is true
Edge cases: gstack-upgrade not double-prefixed, root gstack skill
never prefixed, prefix removal restores original names, SKILL.md
without frontmatter is a safe no-op.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: add name patching + ship idempotency tests (#620, #649)
- 4 unit tests for name: patching in relink.test.ts (prefix on/off,
gstack-upgrade not double-prefixed, no-frontmatter no-op)
- 2 tests for gen-skill-docs prefix warning
- 1 E2E test for ship idempotency (periodic tier)
- Updated setupMockInstall to write SKILL.md with proper frontmatter
- Added ship-idempotency touchfiles + tier classification
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v0.14.3.0)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: PR idempotency checks open state, dedupe touchfiles, sync package.json
- Step 8 PR guard now checks state==OPEN so closed PRs don't prevent
new PR creation (adversarial review finding)
- Remove duplicate ship-idempotency entry in E2E_TOUCHFILES
- Sync package.json version to 0.14.3.0
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: patch name: before creating symlinks to fix --no-prefix ordering bug
gstack-patch-names must run BEFORE link_claude_skill_dirs so symlink
names reflect the correct (patched) name: values. Previously, switching
from --prefix to --no-prefix would read stale gstack-* names from
SKILL.md and create wrong symlinks. (Codex adversarial finding)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
feat: Review Army — parallel specialist reviewers for /review (v0.14.3.0) (#692)
* feat: extend gstack-diff-scope with SCOPE_MIGRATIONS, SCOPE_API, SCOPE_AUTH
Three new scope signals for Review Army specialist activation:
- SCOPE_MIGRATIONS: db/migrate/, prisma/migrations/, alembic/, *.sql
- SCOPE_API: *controller*, *route*, *endpoint*, *.graphql, openapi.*
- SCOPE_AUTH: *auth*, *session*, *jwt*, *oauth*, *permission*, *role*
* feat: add 7 specialist checklist files for Review Army
- testing.md (always-on): coverage gaps, flaky patterns, security enforcement
- maintainability.md (always-on): dead code, DRY, stale comments
- security.md (conditional): OWASP deep analysis, auth bypass, injection
- performance.md (conditional): N+1 queries, bundle impact, complexity
- data-migration.md (conditional): reversibility, lock duration, backfill
- api-contract.md (conditional): breaking changes, versioning, error format
- red-team.md (conditional): adversarial analysis, cross-cutting concerns
All use standard header with JSON output schema and NO FINDINGS fallback.
* feat: Review Army resolver — parallel specialist dispatch + merge
New resolver in review-army.ts generates template prose for:
- Stack detection and specialist selection
- Parallel Agent tool dispatch with learning-informed prompts
- JSON finding collection, fingerprint dedup, consensus highlighting
- PR quality score computation
- Red Team conditional dispatch
Registered as REVIEW_ARMY in resolvers/index.ts.
* refactor: restructure /review template for Review Army
- Replace Steps 4-4.75 with CRITICAL pass + {{REVIEW_ARMY}}
- Remove {{DESIGN_REVIEW_LITE}} and {{TEST_COVERAGE_AUDIT_REVIEW}}
(subsumed into Design and Testing specialists respectively)
- Extract specialist-covered categories from checklist.md
- Keep CRITICAL + uncovered INFORMATIONAL in main agent pass
* test: Review Army — 14 diff-scope tests + 7 E2E tests
- test/diff-scope.test.ts: 14 tests for all 9 scope signals
- test/skill-e2e-review-army.test.ts: 7 E2E tests
Gate: migration safety, N+1 detection, delivery audit,
quality score, JSON findings
Periodic: red team, consensus
- Updated gen-skill-docs tests for new review structure
- Added touchfile entries and tier classifications
* docs: update SELF_LEARNING_V0.md with Release 2 status + Release 2.5
Mark Release 2 (Review Army) as in-progress. Add Release 2.5 for
deferred expansions (E1 adaptive gating, E3 test stubs, E5 cross-review
dedup, E7 specialist tracking).
* chore: bump version and changelog (v0.14.3.0)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
feat: always-on adversarial review + scope drift + plan mode design tools (v0.14.3.0) (#694)
* feat: always-on adversarial review + scope drift resolver + cross-model tension format
Rewrite generateAdversarialStep() to remove LOC-based tier skipping. Every review
now runs both Claude adversarial subagent and Codex adversarial challenge. OLD_CFG
only gates Codex passes, not Claude. Add generateScopeDrift() shared resolver.
Fix cross-model tension AskUserQuestion to include RECOMMENDATION + Completeness.
* feat: add scope drift to /ship, extract from /review template
/ship gets {{SCOPE_DRIFT}} at Step 3.48 + PR body slot. /review replaces
hardcoded scope drift with {{SCOPE_DRIFT}} + {{PLAN_COMPLETION_AUDIT_REVIEW}}.
* feat: plan mode safe operations — browse, design, codex allowed in plan mode
Add preamble section declaring $B, $D, codex, and ~/.gstack/ writes as
plan-mode-safe. Unblocks design skills during planning.
* test: update adversarial + add scope drift assertions
Rename adversarial tests to reflect always-on behavior. Remove tier
threshold assertions. Add scope drift content assertions for both
/review and /ship generated SKILL.md files.
* chore: bump version and changelog (v0.14.3.0)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
feat: sidebar CSS inspector + per-tab agents (v0.13.9.0) (#650)
* feat: CDP inspector module — persistent sessions, CSS cascade, style modification
New browse/src/cdp-inspector.ts with full CDP inspection engine:
- inspectElement() via CSS.getMatchedStylesForNode + DOM.getBoxModel
- modifyStyle() via CSS.setStyleTexts with headless page.evaluate fallback
- Persistent CDP session lifecycle (create, reuse, detach on nav, re-create)
- Specificity sorting, overridden property detection, UA rule filtering
- Modification history with undo support
- formatInspectorResult() for CLI output
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: browse server inspector endpoints + inspect/style/cleanup/prettyscreenshot CLI
Server endpoints: POST /inspector/pick, GET /inspector, POST /inspector/apply,
POST /inspector/reset, GET /inspector/history, GET /inspector/events (SSE).
CLI commands: inspect (CDP cascade), style (live CSS mod), cleanup (page clutter
removal), prettyscreenshot (clean screenshot pipeline).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: sidebar CSS inspector — element picker, box model, rule cascade, quick edit
Extension changes for the visual CSS inspector:
- inspector.js: element picker with hover highlight, CSS selector generation,
basic mode fallback (getComputedStyle + CSSOM), page alteration handlers
- inspector.css: picker overlay styles (blue highlight + tooltip)
- background.js: inspector message routing (picker <-> server <-> sidepanel)
- sidepanel: Inspector tab with box model viz (gstack palette), matched rules
with specificity badges, computed styles, click-to-edit quick edit,
Send to Agent/Code button, empty/loading/error states
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: document inspect, style, cleanup, prettyscreenshot browse commands
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: auto-track user-created tabs and handle tab close
browser-manager.ts changes:
- context.on('page') listener: automatically tracks tabs opened by the user
(Cmd+T, right-click open in new tab, window.open). Previously only
programmatic newTab() was tracked, so user tabs were invisible.
- page.on('close') handler in wirePageEvents: removes closed tabs from the
pages map and switches activeTabId to the last remaining tab.
- syncActiveTabByUrl: match Chrome extension's active tab URL to the correct
Playwright page for accurate tab identity.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: per-tab agent isolation via BROWSE_TAB environment variable
Prevents parallel sidebar agents from interfering with each other's tab context.
Three-layer fix:
- sidebar-agent.ts: passes BROWSE_TAB=<tabId> env var to each claude process,
per-tab processing set allows concurrent agents across tabs
- cli.ts: reads process.env.BROWSE_TAB and includes tabId in command request body
- server.ts: handleCommand() temporarily switches activeTabId when tabId is present,
restores after command completes (safe: Bun event loop is single-threaded)
Also: per-tab agent state (TabAgentState map), per-tab message queuing,
per-tab chat buffers, verbose streaming narration, stop button endpoint.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: sidebar per-tab chat context, tab bar sync, stop button, UX polish
Extension changes:
- sidepanel.js: per-tab chat history (tabChatHistories map), switchChatTab()
swaps entire chat view, browserTabActivated handler for instant tab sync,
stop button wired to /sidebar-agent/stop, pollTabs renders tab bar
- sidepanel.html: updated banner text ("Browser co-pilot"), stop button markup,
input placeholder "Ask about this page..."
- sidepanel.css: tab bar styles, stop button styles, loading state fixes
- background.js: chrome.tabs.onActivated sends browserTabActivated to sidepanel
with tab URL for instant tab switch detection
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: per-tab isolation, BROWSE_TAB pinning, tab tracking, sidebar UX
sidebar-agent.test.ts (new tests):
- BROWSE_TAB env var passed to claude process
- CLI reads BROWSE_TAB and sends tabId in body
- handleCommand accepts tabId, saves/restores activeTabId
- Tab pinning only activates when tabId provided
- Per-tab agent state, queue, concurrency
- processingTabs set for parallel agents
sidebar-ux.test.ts (new tests):
- context.on('page') tracks user-created tabs
- page.on('close') removes tabs from pages map
- Tab isolation uses BROWSE_TAB not system prompt hack
- Per-tab chat context in sidepanel
- Tab bar rendering, stop button, banner text
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: resolve merge conflicts — keep security defenses + per-tab isolation
Merged main's security improvements (XML escaping, prompt injection defense,
allowed commands whitelist, --model opus, Write tool, stderr capture) with
our branch's per-tab isolation (BROWSE_TAB env var, processingTabs set,
no --resume). Updated test expectations for expanded system prompt.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v0.13.9.0)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: add inspector message types to background.js allowlist
Pre-existing bug found by Codex: ALLOWED_TYPES in background.js was missing
all inspector message types (startInspector, stopInspector, elementPicked,
pickerCancelled, applyStyle, toggleClass, injectCSS, resetAll, inspectResult).
Messages were silently rejected, making the inspector broken on ALL pages.
Also: separate executeScript and insertCSS into individual try blocks in
injectInspector(), store inspectorMode for routing, and add content.js
fallback when script injection fails (CSP, chrome:// pages).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: basic element picker in content.js for CSP-restricted pages
When inspector.js can't be injected (CSP, chrome:// pages), content.js
provides a basic picker using getComputedStyle + CSSOM:
- startBasicPicker/stopBasicPicker message handlers
- captureBasicData() with ~30 key CSS properties, box model, matched rules
- Hover highlight with outline save/restore (never leaves artifacts)
- Click uses e.target directly (no re-querying by selector)
- Sends inspectResult with mode:'basic' for sidebar rendering
- Escape key cancels picker and restores outlines
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: cleanup + screenshot buttons in sidebar inspector toolbar
Two action buttons in the inspector toolbar:
- Cleanup (🧹): POSTs cleanup --all to server, shows spinner, chat
notification on success, resets inspector state (element may be removed)
- Screenshot (📸): POSTs screenshot to server, shows spinner, chat
notification with saved file path
Shared infrastructure:
- .inspector-action-btn CSS with loading spinner via ::after pseudo-element
- chat-notification type in addChatEntry() for system messages
- package.json version bump to 0.13.9.0
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: inspector allowlist, CSP fallback, cleanup/screenshot buttons
16 new tests in sidebar-ux.test.ts:
- Inspector message allowlist includes all inspector types
- content.js basic picker (startBasicPicker, captureBasicData, CSSOM,
outline save/restore, inspectResult with mode basic, Escape cleanup)
- background.js CSP fallback (separate try blocks, inspectorMode, fallback)
- Cleanup button (POST /command, inspector reset after success)
- Screenshot button (POST /command, notification rendering)
- Chat notification type and CSS styles
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: update project documentation for v0.13.9.0
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: cleanup + screenshot buttons in chat toolbar (not just inspector)
Quick actions toolbar (🧹 Cleanup, 📸 Screenshot) now appears above the chat
input, always visible. Both inspector and chat buttons share runCleanup() and
runScreenshot() helper functions. Clicking either set shows loading state on
both simultaneously.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: chat toolbar buttons, shared helpers, quick-action-btn styles
Tests that chat toolbar exists (chat-cleanup-btn, chat-screenshot-btn,
quick-actions container), CSS styles (.quick-action-btn, .quick-action-btn.loading),
shared runCleanup/runScreenshot helper functions, and cleanup inspector reset.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: aggressive cleanup heuristics — overlays, scroll unlock, blur removal
Massively expanded CLEANUP_SELECTORS with patterns from uBlock Origin and
Readability.js research:
- ads: 30+ selectors (Google, Amazon, Outbrain, Taboola, Criteo, etc.)
- cookies: OneTrust, Cookiebot, TrustArc, Quantcast + generic patterns
- overlays (NEW): paywalls, newsletter popups, interstitials, push prompts,
app download banners, survey modals
- social: follow prompts, share tools
- Cleanup now defaults to --all when no args (sidebar button fix)
- Uses !important on all display:none (overrides inline styles)
- Unlocks body/html scroll (overflow:hidden from modal lockout)
- Removes blur/filter effects (paywall content blur)
- Removes max-height truncation (article teaser truncation)
- Collapses empty ad placeholder whitespace (empty divs after ad removal)
- Skips gstack-ctrl indicator in sticky removal
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: disable action buttons when disconnected, no error spam
- setActionButtonsEnabled() toggles .disabled class on all cleanup/screenshot
buttons (both chat toolbar and inspector toolbar)
- Called with false in updateConnection when server URL is null
- Called with true when connection established
- runCleanup/runScreenshot silently return when disconnected instead of
showing 'Not connected' error notifications
- CSS .disabled style: pointer-events:none, opacity:0.3, cursor:not-allowed
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: cleanup heuristics, button disabled state, overlay selectors
17 new tests:
- cleanup defaults to --all on empty args
- CLEANUP_SELECTORS overlays category (paywall, newsletter, interstitial)
- Major ad networks in selectors (doubleclick, taboola, criteo, etc.)
- Major consent frameworks (OneTrust, Cookiebot, TrustArc, Quantcast)
- !important override for inline styles
- Scroll unlock (body overflow:hidden)
- Blur removal (paywall content blur)
- Article truncation removal (max-height)
- Empty placeholder collapse
- gstack-ctrl indicator skip in sticky cleanup
- setActionButtonsEnabled function
- Buttons disabled when disconnected
- No error spam from cleanup/screenshot when disconnected
- CSS disabled styles for action buttons
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: LLM-based page cleanup — agent analyzes page semantically
Instead of brittle CSS selectors, the cleanup button now sends a prompt to
the sidebar agent (which IS an LLM). The agent:
1. Runs deterministic $B cleanup --all as a quick first pass
2. Takes a snapshot to see what's left
3. Analyzes the page semantically to identify remaining clutter
4. Removes elements intelligently, preserving site branding
This means cleanup works correctly on any site without site-specific selectors.
The LLM understands that "Your Daily Puzzles" is clutter, "ADVERTISEMENT" is
junk, but the SF Chronicle masthead should stay.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: aggressive cleanup heuristics + preserve top nav bar
Deterministic cleanup improvements (used as first pass before LLM analysis):
- New 'clutter' category: audio players, podcast widgets, sidebar puzzles/games,
recirculation widgets (taboola, outbrain, nativo), cross-promotion banners
- Text-content detection: removes "ADVERTISEMENT", "Article continues below",
"Sponsored", "Paid content" labels and their parent wrappers
- Sticky fix: preserves the topmost full-width element near viewport top (site
nav bar) instead of hiding all sticky/fixed elements. Sorts by vertical
position, preserves the first one that spans >80% viewport width.
Tests: clutter category, ad label removal, nav bar preservation logic.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: LLM-based cleanup architecture, deterministic heuristics, sticky nav
22 new tests covering:
- Cleanup button uses /sidebar-command (agent) not /command (deterministic)
- Cleanup prompt includes deterministic first pass + agent snapshot analysis
- Cleanup prompt lists specific clutter categories for agent guidance
- Cleanup prompt preserves site identity (masthead, headline, body, byline)
- Cleanup prompt instructs scroll unlock and $B eval removal
- Loading state management (async agent, setTimeout)
- Deterministic clutter: audio/podcast, games/puzzles, recirculation
- Ad label text patterns (ADVERTISEMENT, Sponsored, Article continues)
- Ad label parent wrapper hiding for small containers
- Sticky nav preservation (sort by position, first full-width near top)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: prevent repeat chat message rendering on reconnect/replay
Root cause: server persists chat to disk (chat.jsonl) and replays on restart.
Client had no dedup, so every reconnect re-rendered the entire history.
Messages from an old HN session would repeat endlessly on the SF Chronicle tab.
Fix: renderedEntryIds Set tracks which entry IDs have been rendered. addChatEntry
skips entries already in the set. Entries without an id (local notifications)
bypass the check. Clear chat resets the set.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: agent stops when done, no focus stealing, opus for prompt injection safety
Three fixes for sidebar agent UX:
- System prompt: "Be CONCISE. STOP as soon as the task is done. Do NOT keep
exploring or doing bonus work." Prevents agent from endlessly taking
screenshots and highlighting elements after answering the question.
- switchTab(id, opts): new bringToFront option. Internal tab pinning
(BROWSE_TAB) uses bringToFront: false so agent commands never steal
window focus from the user's active app.
- Keep opus model (not sonnet) for prompt injection resistance on untrusted
web pages. Remove Write from allowedTools (agent only needs Bash for $B).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: agent conciseness, focus stealing, opus model, switchTab opts
Tests for the three UX fixes:
- System prompt contains STOP/CONCISE/Do NOT keep exploring
- sidebar agent uses opus (not sonnet) for prompt injection resistance
- switchTab has bringToFront option, defaults to true (opt-out)
- handleCommand tab pinning uses bringToFront: false (no focus steal)
- Updated stale tests: switchTab signature, allowedTools excludes Write,
narration -> conciseness, tab pinning restore calls
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: sidebar CSS interaction E2E — HN comment highlight round-trip
New E2E test (periodic tier, ~$2/run) that exercises the full sidebar
agent pipeline with CSS interaction:
1. Agent navigates to Hacker News
2. Clicks into the top story's comments
3. Reads comments and identifies the most insightful one
4. Highlights it with a 4px solid orange outline via style injection
Tests: navigation, snapshot, text reading, LLM judgment, CSS modification.
Requires real browser + real Claude (ANTHROPIC_API_KEY).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: sidebar CSS E2E test — correct idle timeout (ms not s), pipe stdio
Root cause of test failure: BROWSE_IDLE_TIMEOUT is in milliseconds, not
seconds. '600' = 0.6 seconds, server died immediately after health check.
Fixed to '600000' (10 minutes).
Also: use 'pipe' stdio instead of file descriptors (closing fds kills child
on macOS/bun), catch ConnectionRefused on poll retry, 4 min poll timeout
for the multi-step opus task.
Test passes: agent navigates to HN, reads comments, identifies most
insightful one, highlights it with orange CSS, stops. 114s, $0.00.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fix: force comparison board as default variant chooser (v0.14.1.0) (#658)
* fix: force comparison board as default variant chooser
The comparison board ($D compare --serve) was being skipped in favor of
showing variants inline + AskUserQuestion "which do you prefer?" — a
degraded experience missing rating controls, comments, and remix buttons.
Changes:
- Replace "show inline" instruction with "do NOT show inline, proceed to
comparison board" in plan-design-review/SKILL.md.tmpl
- Add CRITICAL RULE: never use AskUserQuestion as the variant chooser
- Change DESIGN_SHOTGUN_LOOP resolver to AskUserQuestion-first wait with
polling fallback (affects all 3 consumer skills)
- Fix board URL from /design-board.html (404) to / (correct)
- Improve serve-failure fallback to show variants inline via Read tool
* chore: bump version and changelog (v0.14.1.0)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
docs: update README and skill deep dives for all 31 skills (#656)
* docs: update README with /design-html and /learn skills, sync all skill lists
- Added /design-html to sprint table (Pretext-native HTML from approved mockups)
- Added /learn to sprint table (project learnings management)
- Synced all 5 skill list locations (install step 1, step 2, troubleshooting,
sprint table, power tools) to include all 31 skills
- Updated intro count from 20 to 23 specialists
- Updated Codex section skill count from 29 to 31
- Expanded "Design is at the heart" paragraph with full pipeline:
consultation → shotgun → design-html → review
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: add 9 missing skills to deep dives doc
Added table entries + full deep dive sections for:
- /design-shotgun (design exploration with comparison board)
- /design-html (Pretext-native HTML from approved mockups)
- /land-and-deploy (merge + deploy + canary verification)
- /canary (post-deploy monitoring loop)
- /benchmark (performance regression detection)
- /autoplan (auto-review pipeline)
- /learn (project learnings management)
- /connect-chrome (headed Chrome with side panel)
- /setup-deploy (one-time deploy configuration)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
feat: /design-html skill — Pretext-native HTML from approved mockups (v0.14.0.0) (#653)
* feat: /design-html skill — Pretext-native HTML from approved mockups
New skill that takes approved design-shotgun mockups and generates
production-quality HTML with Pretext for computed text layout. Text
reflows on resize, heights adjust to content, zero hardcoded CSS.
Includes vendored Pretext bundle (30KB), smart API routing per design
type, AskUserQuestion refinement loop, framework detection, and
3-viewport verification screenshots.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: integrate /design-html into design skill pipeline
- design-shotgun: Step 6 option B now chains to /design-html
- design-consultation: suggests /design-html after shipping DESIGN.md
(conditional on screen-level output, not tokens-only)
- plan-design-review: expanded chaining to include /design-shotgun
and /design-html alongside review skills
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: update plan-design-review chaining test for design skills
plan-design-review now chains to /design-shotgun and /design-html
in addition to review skills. Update the assertion to match.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: add gstack keyword to design-html description for validation
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v0.14.0.0)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
feat: rotating founder resources in /office-hours closing (v0.13.10.0) (#652)
* feat: rotating founder resources in /office-hours closing
Add Beat 3.5 with 34 curated resources (5 Garry Tan videos, 2 YC Backstory,
9 Lightcone Podcast, 8 Startup School, 10 PG essays) that rotate contextually
each session. Includes dedup log to avoid repeats, analytics logging, and
browser-open offers. Also adds chmod +x safety net to build script.
* chore: bump version and changelog (v0.13.10.0)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
feat: composable skills — INVOKE_SKILL resolver + factoring infrastructure (v0.13.7.0) (#644)
* feat: add parameterized resolver support to gen-skill-docs
Extend the placeholder regex from {{WORD}} to {{WORD:arg1:arg2}},
enabling parameterized resolvers like {{INVOKE_SKILL:plan-ceo-review}}.
- Widen ResolverFn type to accept optional args?: string[]
- Update RESOLVERS record to use ResolverFn type
- Both replacement and unresolved-check regexes updated
- Fully backward compatible: existing {{WORD}} patterns unchanged
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add INVOKE_SKILL resolver for composable skill loading
New composition.ts resolver module that emits prose instructing Claude
to read another skill's SKILL.md and follow it, skipping preamble
sections. Supports optional skip= parameter for additional sections.
Usage: {{INVOKE_SKILL:plan-ceo-review}} or
{{INVOKE_SKILL:plan-ceo-review:skip=Outside Voice}}
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: use frontmatter name: for skill symlinks and Codex paths
Patch all 3 name-derivation paths to read name: from SKILL.md
frontmatter instead of relying solely on directory basenames.
This enables directory names that differ from invocation names
(e.g., run-tests/ directory with name: test).
- setup: link_claude_skill_dirs reads name: via grep, falls back to basename
- gen-skill-docs.ts: codexSkillName uses frontmatter name for Codex output paths
- gen-skill-docs.ts: moved frontmatter extraction before Codex path logic
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: extract CHANGELOG_WORKFLOW resolver from /ship
Move changelog generation logic into a reusable resolver. The resolver
is changelog-only (no version bump per Codex review recommendation).
Adds voice rules inline. /ship Step 5 now uses {{CHANGELOG_WORKFLOW}}.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor: use INVOKE_SKILL resolver for plan-ceo-review office-hours fallback
Replace inline skill loading prose (read file, skip sections) with
{{INVOKE_SKILL:office-hours}} in the mid-session detection path.
The BENEFITS_FROM prerequisite offer is unchanged (separate use case).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor: BENEFITS_FROM resolver delegates to INVOKE_SKILL
Eliminate duplicated skip-list logic by having generateBenefitsFrom
call generateInvokeSkill internally. The wrapper (AskUserQuestion,
design doc re-check) stays in BENEFITS_FROM. The loading instructions
(read file, skip sections, error handling) come from INVOKE_SKILL.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: add resolver tests for INVOKE_SKILL, CHANGELOG_WORKFLOW, parameterized args
12 new tests covering:
- INVOKE_SKILL: template placeholder, default skip list, error handling,
BENEFITS_FROM delegation
- CHANGELOG_WORKFLOW: content, cross-check, voice guidance, format
- Parameterized resolver infra: colon-separated args processing,
no unresolved placeholders across all generated SKILL.md files
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v0.13.7.0)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: journey routing tests — CLAUDE.md routing rules + stronger descriptions
Three journey E2E tests (ideation, ship, debug) were failing because
Claude answered directly instead of invoking the Skill tool. Root cause:
skill descriptions in system-reminder are too weak to override Claude's
default behavior for tasks it can handle natively.
Fix has two parts:
1. CLAUDE.md routing rules in test workdir — Claude weighs project-level
instructions higher than skill description metadata
2. "Proactively invoke" (not "suggest") in office-hours, investigate,
ship descriptions — reinforces the routing signal
10/10 journey tests now pass (was 7/10).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: one-time CLAUDE.md routing injection prompt
Add a preamble section that checks if the project's CLAUDE.md has
skill routing rules. If not (and user hasn't declined), asks once
via AskUserQuestion to inject a "## Skill routing" section.
Root cause: skill descriptions in system-reminder metadata are too
weak to reliably trigger proactive Skill tool invocation. CLAUDE.md
project instructions carry higher weight in Claude's decision making.
- Preamble bash checks for "## Skill routing" in CLAUDE.md
- Stores decline in gstack-config (routing_declined=true)
- Only asks once per project (HAS_ROUTING check + config check)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: annotated config file + routing injection tests
gstack-config now writes a documented header on first config creation
with every supported key explained (proactive, telemetry, auto_upgrade,
skill_prefix, routing_declined, codex_reviews, skip_eng_review, etc.).
Users can edit ~/.gstack/config.yaml directly, anytime.
Also fixes grep to use ^KEY: anchoring so commented header lines don't
shadow real config values.
Tests added:
- 7 new gstack-config tests (annotated header, no duplication, comment
safety, routing_declined get/set/reset)
- 6 new gen-skill-docs tests (preamble routing injection: bash checks,
config reads, AskUserQuestion, decline persistence, routing rules)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: bump to v0.13.9.0, separate CHANGELOG from main's releases
Split our branch's changes into a new 0.13.9.0 entry instead of
jamming them into 0.13.7.0 which already landed on main as
"Community Wave."
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: clarify branch-scoped VERSION/CHANGELOG after merging main
Add explicit rules: merging main doesn't mean adopting main's version.
Branch always gets its own entry on top with a higher version number.
Three-point checklist after every merge.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: put our 0.13.9.0 entry on top of CHANGELOG
Newest version goes on top. Our branch lands next, so our entry
must be above main's 0.13.8.0.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: restore missing 0.13.7.0 Community Wave entry
Accidentally dropped the 0.13.7.0 entry when reordering.
All entries now present: 0.13.9.0 > 0.13.8.0 > 0.13.7.0 > 0.13.6.0.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: add CHANGELOG integrity check rule
After any edit that moves/adds/removes entries, grep for version
headers and verify no gaps or duplicates before committing.
Prevents accidentally dropping entries during reordering.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fix: security audit round 2 (v0.13.4.0) (#640)
* fix: chrome-cdp localhost-only binding
Restrict Chrome CDP to localhost by adding --remote-debugging-address=127.0.0.1
and --remote-allow-origins to prevent network-accessible debugging sessions.
Clears 1 Socket anomaly (Chrome CDP session exposure).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: extension sender validation + message type allowlist
Add sender.id check and ALLOWED_TYPES allowlist to the Chrome extension's
message handler. Defense-in-depth against message spoofing from external
extensions or future externally_connectable changes.
Clears 2 Socket anomalies (extension permissions).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: checksum-verified bun install
Replace unverified curl|bash bun installation with checksum-verified
download-then-execute pattern. The install script is downloaded, sha256
verified against a known hash, then executed. Preserves the Bun-native
install path without adding a Node/npm dependency.
Clears Snyk W012 + 3 Socket anomalies.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: content trust boundary markers in browse output
Wrap page-content commands (text, html, links, forms, accessibility,
console, dialog, snapshot) with --- BEGIN/END UNTRUSTED EXTERNAL CONTENT ---
markers. Covers direct commands (server.ts), chain sub-commands, and
snapshot output (meta-commands.ts).
Adds PAGE_CONTENT_COMMANDS set and wrapUntrustedContent() helper in
commands.ts (single source of truth, DRY). Expands the SKILL.md trust
warning with explicit processing rules for agents.
Clears Snyk W011 (third-party content exposure).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: harden trust boundary markers against escape attacks
- Sanitize URLs in markers (remove newlines, cap at 200 chars) to prevent
marker injection via history.pushState
- Escape marker strings in content (zero-width space) so malicious pages
can't forge the END marker to break out of the untrusted block
- Wrap resume command snapshot with trust boundary markers
- Wrap diff command output with trust boundary markers
- Wrap watch stop last snapshot with trust boundary markers
Found by cross-model adversarial review (Claude + Codex).
* chore: bump version and changelog (v0.13.4.0)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: gitignore .factory/ and remove from tracking
Factory Droid support was removed in this branch. The .factory/ directory
was re-added by merging main (which had v0.13.5.0 Factory support).
Gitignore it so it stays out.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
feat: community wave — 7 fixes, relink, sidebar Write, discoverability (v0.13.5.0) (#641)
* test: add 16 failing tests for 6 community fixes
Tests-first for all fixes in this PR wave:
- #594 discoverability: gstack tag in descriptions, 120-char first line
- #573 feature signals: ship/SKILL.md Step 4 detection
- #510 context warnings: no preemptive warnings in generated files
- #474 Safety Net: no find -delete in generated files
- #467 telemetry: JSONL writes gated by _TEL conditional
- #584 sidebar: Write in allowedTools, stderr capture
- #578 relink: prefixed/flat symlinks, cleanup, error, config hook
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: replace find -delete with find -exec rm for Safety Net (#474)
-delete is a non-POSIX extension that fails on Safety Net environments.
-exec rm {} + is POSIX-compliant and works everywhere.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: gate local JSONL writes by telemetry setting (#467)
When telemetry is off, nothing is written anywhere — not just remote,
but local JSONL too. Clean trust contract: off means off everywhere.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: remove preemptive context warnings from plan-eng-review (#510)
The system handles context compaction automatically. Preemptive warnings
waste tokens and create false urgency. Skills should not warn about
context limits — just describe the compression priority order.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add (gstack) tag to skill descriptions for discoverability (#594)
Every SKILL.md.tmpl description now contains "gstack" on the last line,
making skills findable in Claude Code's command palette. First-line hooks
stay under 120 chars. Split ship description to fix wrapping.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: auto-relink skill symlinks on prefix config change (#578)
New bin/gstack-relink creates prefixed (gstack-*) or flat symlinks
based on skill_prefix config. gstack-config auto-triggers relink
when skill_prefix changes. Setup guards against recursive calls
with GSTACK_SETUP_RUNNING env var.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add feature signal detection to version bump heuristic (#573)
/ship Step 4 now checks for feature signals (new routes, migrations,
test+source pairs, feat/ branches) when deciding version bumps.
PATCH requires no feature signals. MINOR asks the user if any signal
is detected or 500+ lines changed.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: sidebar Write tool, stderr capture, cross-platform URL opener (#584)
Add Write to sidebar allowedTools (both sidebar-agent.ts and server.ts).
Write doesn't expand attack surface beyond what Bash already provides.
Replace empty stderr handler with buffer capture for better error
diagnostics. New bin/gstack-open-url for cross-platform URL opening.
Does NOT include Search Before Building intro flow (deferred).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: update sidebar-security test for Write tool addition
The fallback allowedTools string now includes Write, matching the
sidebar-agent.ts change from commit 68dc957.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v0.13.5.0)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: prevent gstack-relink from double-prefixing gstack-upgrade
gstack-relink now checks if a skill directory is already named gstack-*
before prepending the prefix. Previously, setting skill_prefix=true would
create gstack-gstack-upgrade, breaking the /gstack-upgrade command.
Matches setup script behavior (setup:260) which already has this guard.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: add double-prefix fix to changelog
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: remove .factory/ from git tracking and add to .gitignore
Generated Factory Droid skills are build output, same as .agents/.
They should not be committed to the repo.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
feat: GStack Learns — per-project self-learning infrastructure (v0.13.4.0) (#622)
* feat: learnings + confidence resolvers — cross-skill memory infrastructure
Three new resolvers for the self-learning system:
- LEARNINGS_SEARCH: tells skills to load prior learnings before analysis
- LEARNINGS_LOG: tells skills to capture discoveries after completing work
- CONFIDENCE_CALIBRATION: adds 1-10 confidence scoring to all review findings
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: learnings bin scripts — append-only JSONL read/write
gstack-learnings-log: validates JSON, auto-injects timestamp, appends to
~/.gstack/projects/$SLUG/learnings.jsonl. Append-only (no mutation).
gstack-learnings-search: reads/filters/dedupes learnings with confidence
decay (observed/inferred lose 1pt/30d), cross-project discovery, and
"latest winner" resolution per key+type.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: learnings count in preamble output
Every skill now prints "LEARNINGS: N entries loaded" during preamble,
making the compounding loop visible to the user.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: integrate learnings + confidence into 9 skill templates
Add {{LEARNINGS_SEARCH}}, {{LEARNINGS_LOG}}, and {{CONFIDENCE_CALIBRATION}}
placeholders to review, ship, plan-eng-review, plan-ceo-review, office-hours,
investigate, retro, and cso templates. Regenerated all SKILL.md files.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: /learn skill — manage project learnings
New skill for reviewing, searching, pruning, and exporting what gstack
has learned across sessions. Commands: /learn, /learn search, /learn prune,
/learn export, /learn stats, /learn add.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: self-learning roadmap — 5-release design doc
Covers: R1 GStack Learns (v0.14), R2 Review Army (v0.15), R3 Smart Ceremony
(v0.16), R4 /autoship (v0.17), R5 Studio (v0.18). Inspired by Compound
Engineering, adapted to GStack's architecture.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: learnings bin script unit tests — 13 tests, free
Tests gstack-learnings-log (valid/invalid JSON, timestamp injection,
append-only) and gstack-learnings-search (dedup, type/query/limit filters,
confidence decay, user-stated no-decay, malformed JSONL skip).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v0.13.4.0)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: learnings resolver + bin script edge case tests — 21 new tests, free
Adds gen-skill-docs coverage for LEARNINGS_SEARCH, LEARNINGS_LOG, and
CONFIDENCE_CALIBRATION resolvers. Adds bin script edge cases: timestamp
preservation, special characters, files array, sort order, type grouping,
combined filtering, missing fields, confidence floor at 0.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: sync package.json version with VERSION file (0.13.4.0)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: gitignore .factory/ — generated output, not source
Same pattern as .claude/skills/ and .agents/. These SKILL.md files are
generated from .tmpl templates by gen:skill-docs --host factory.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: /learn E2E — seed 3 learnings, verify agent surfaces them
Seeds N+1 query pattern, stale cache pitfall, and rubocop preference
into learnings.jsonl, then runs /learn and checks that at least 2/3
appear in the agent's output. Gate tier, ~$0.25/run.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
chore: gitignore .factory and remove tracked files (v0.13.5.1) (#642)
* chore: gitignore .factory and remove tracked files
The .factory/ directory contains generated skill definitions for
Factory Droid compatibility. These should not be tracked in git,
same as .claude/skills/ and .agents/.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: bump version and changelog (v0.13.5.1)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
feat: Factory Droid compatibility — works across Claude Code, Codex, and Factory (v0.13.5.0) (#621)
* refactor: extract processExternalHost() shared helper for multi-host generation
Refactor the Codex-specific output routing block in gen-skill-docs.ts into
a shared processExternalHost() function. Both Codex and future external hosts
(Factory Droid) will use this helper for output routing, symlink loop detection,
frontmatter transformation, path rewrites, and metadata generation.
- Rename codexSkillName() to externalSkillName() everywhere
- Extract ExternalHostConfig interface with per-host settings
- Codex output is byte-identical (verified via --dry-run)
- Skip /codex skill for all non-Claude hosts (not just codex)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add Factory Droid host type, preamble, and co-author trailer
- Add 'factory' to Host union type with .factory/skills/gstack paths
- Extend preamble runtime root detection for Factory ($HOME/.factory/)
- Add GSTACK_DESIGN env var to preamble (was missing for Codex too)
- Add Factory Droid co-author trailer for git commits
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: Factory Droid generation, --host all, and host-aware frontmatter
- Add --host factory (alias: --host droid) to gen-skill-docs
- Add --host all: generates for claude, codex, and factory in one invocation
with fault-tolerant per-host error handling (only fails if claude fails)
- Factory frontmatter: name + description + user-invocable: true
- Factory sensitive skills: disable-model-invocation: true (from sensitive: field)
- Claude: strips sensitive: field from output (only Factory uses it)
- Factory tool name translation: Claude tool names → generic phrasing
- Replace chained gen:skill-docs calls with --host all in package.json build
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: sensitive frontmatter for Factory Droid auto-invocation safety
Add sensitive: true to 6 skill templates with side effects that Factory
Droids shouldn't auto-invoke (ship, land-and-deploy, guard, careful,
freeze, unfreeze). The field is:
- Factory: emitted as disable-model-invocation: true
- Claude/Codex: stripped from output by transformFrontmatter()
Also fix Claude host path: call transformFrontmatter() for Claude to
strip the sensitive: field from Claude output.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: gstack-platform-detect binary for multi-host debugging
Bash script that prints a table of installed AI coding agents (Claude,
Codex, Factory Droid, Kiro) with versions, skill paths, and gstack
installation status. Useful for debugging multi-host setups.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: Factory Droid support in setup script
- Add factory to --host values (auto-detected via command -v droid)
- Add .factory/ skill doc generation step alongside .agents/
- Add create_factory_runtime_root() and link_factory_skill_dirs()
helpers mirroring the Codex equivalents
- Factory install section creates ~/.factory/skills/ with symlinks
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: Factory Droid awareness in skill-check and uninstall
- skill-check.ts: add Factory skills validation and freshness check
- gstack-uninstall: add Factory artifact cleanup (~/.factory/skills/gstack*
and per-project .factory/ sidecar)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: Factory Droid generation + --host all test suites
Add 13 new tests:
- Factory output paths, frontmatter (user-invocable, disable-model-invocation)
- Sensitive vs non-sensitive skill classification
- Path rewrites (no .claude/skills/ in Factory output)
- /codex skill exclusion, openai.yaml absence
- Factory keeps Codex integration blocks (for second opinions)
- --host droid alias, --dry-run freshness, preamble paths
- --host all generates for all 3 hosts
- Setup script host validation updated for factory
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: Factory Droid install instructions + CI freshness check
- README: add Factory Droid section with install instructions and
restart note (Factory requires restart to rescan skills)
- CI: add Factory skill doc freshness verification to skill-docs.yml
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: generated Factory Droid skill output (.factory/skills/)
29 skills generated for Factory Droid with:
- user-invocable: true on all skills
- disable-model-invocation: true on 6 sensitive skills
- .factory/skills/ paths (no .claude/skills/ references)
- $GSTACK_ROOT env vars for runtime root detection
- Tool name translation (Claude tool names → generic phrasing)
Committed to git for CI freshness checks and direct consumption.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: add Factory Droid P1 TODO for browse MCP server
Add 3 TODOs under new ## Factory Droid section:
- P1: Browse MCP server (Option B, deeper Factory integration)
- P3: .agent/skills/ dual output for cross-agent compatibility
- P3: Custom Droid definitions alongside skills
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v0.13.5.0)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fix: sidebar prompt injection defense (v0.13.4.0) (#611)
* fix: sidebar prompt injection defense — XML framing, command allowlist, arg plumbing
Three security fixes for the Chrome sidebar:
1. XML-framed prompts with trust boundaries and escape of < > & in user
messages to prevent tag injection attacks.
2. Bash command allowlist in system prompt — only browse binary commands
($B goto, $B click, etc.) allowed. All other bash commands forbidden.
3. Fix sidebar-agent.ts ignoring queued args — server-side --model and
--allowedTools changes were silently dropped because the agent rebuilt
args from scratch instead of using the queue entry.
Also defaults sidebar to Opus (harder to manipulate).
12 new tests covering XML escaping, command allowlist, Opus default,
trust boundary instructions, and arg plumbing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v0.13.4.0)
ML prompt injection defense design doc + P0 TODO for follow-up PR.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: clear stale worktree and claude session on sidebar reconnect
loadSession() was restoring worktreePath and claudeSessionId from prior
crashes. The worktree directory no longer existed (deleted on cleanup)
and --resume with a dead session ID caused claude to fail silently.
Now validates worktree exists on load and clears stale claude session
IDs on every server restart.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fix: 6 critical fixes + community PR guardrails (v0.13.2.0) (#602)
* fix(security): commit bun.lock to pin dependency versions
Remove bun.lock from .gitignore and commit the lockfile. Every bun install
now uses exact pinned versions instead of resolving floating ^ ranges from
npm fresh. Closes the supply-chain vector from #566.
Co-Authored-By: boinger <boinger@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: gstack-slug falls back to dirname/unknown when git context is absent
Add || true to git commands and fallback defaults so gstack-slug works
outside git repos. Prevents unbound variable crash that kills every
review skill when no git context exists.
Co-Authored-By: collinstraka-clov <collinstraka-clov@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: setup auto-selects default after 10s timeout to prevent CI hangs
Add -t 10 to the read command in the skill-prefix prompt. In CI, Docker,
and Conductor workspaces where a TTY exists but nobody is watching, the
prompt now auto-selects short names after 10 seconds instead of blocking
forever.
Co-Authored-By: stedfn <stedfn@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: browse CLI Windows lockfile — use string flag instead of numeric constants
Bun compiled binaries on Windows don't handle numeric fs.constants
correctly. The string flag 'wx' is semantically identical to
O_CREAT | O_EXCL | O_WRONLY per Node docs and works on all platforms.
Fixes #599
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: add ~/.gstack/projects/ to plan file search path
/office-hours writes design docs to ~/.gstack/projects/$SLUG/ but /ship
and /review only searched ~/.claude/plans, ~/.codex/plans, and .gstack/plans.
Add the project-scoped directory as the first search location so plan
validation finds design docs created by the standard workflow.
Fixes #591
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: autoplan dual-voice — sequential foreground execution instead of broken parallel
Background subagents don't inherit tool permissions in Claude Code, so the
Claude subagent in dual-voice mode was silently failing on every invocation.
Every autoplan run was degrading to single-reviewer mode without warning.
Change all three phases (CEO, Design, Eng) from "simultaneously" to
sequential foreground execution: Claude subagent first (Agent tool,
foreground), then Codex (Bash). Both complete before the consensus table.
Fixes #497
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: regenerate SKILL.md files from updated templates
Regenerated from autoplan/SKILL.md.tmpl (dual-voice fix) and
scripts/resolvers/review.ts (plan search path fix).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: add community PR guardrails — protect ETHOS.md and voice
Add explicit CLAUDE.md rule requiring AskUserQuestion before accepting
any community PR that touches ETHOS.md, removes promotional material,
or changes Garry's voice. No exceptions, no auto-merging.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v0.13.2.0)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: gen-skill-docs detects symlink loop, skips codex write that overwrites Claude SKILL.md
When .agents/skills/gstack is symlinked to the repo root (vendored dev mode),
gen-skill-docs --host codex was writing the Codex-transformed SKILL.md through
the symlink, overwriting the Claude version. This caused SKILL.md and
agents/openai.yaml to silently revert to Codex paths after every build.
Now detects when the codex output path resolves to the same real file as the
Claude output and skips the write. Content is still generated for token budget
tracking. The openai.yaml write is also skipped for the same symlink case.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: resolve all 7 test failures — version sync, zsh glob guard, symlink-aware codex tests
1. package.json version synced with VERSION file (0.13.3.0)
2. design-shotgun/SKILL.md.tmpl: added setopt +o nomatch guard to
bash block with variant-*.png glob
3. Codex generation tests: skip skills where .agents/skills/{name}
is a symlink back to repo root (vendored dev mode). These can't
have proper codex content since gen-skill-docs skips the write
to avoid overwriting the Claude SKILL.md.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: boinger <boinger@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: collinstraka-clov <collinstraka-clov@users.noreply.github.com>
Co-authored-by: stedfn <stedfn@users.noreply.github.com>