feat: composable skills — INVOKE_SKILL resolver + factoring infrastructure (v0.13.7.0) (#644)
* feat: add parameterized resolver support to gen-skill-docs
Extend the placeholder regex from {{WORD}} to {{WORD:arg1:arg2}},
enabling parameterized resolvers like {{INVOKE_SKILL:plan-ceo-review}}.
- Widen ResolverFn type to accept optional args?: string[]
- Update RESOLVERS record to use ResolverFn type
- Both replacement and unresolved-check regexes updated
- Fully backward compatible: existing {{WORD}} patterns unchanged
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
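A minimal sketch of how the widened placeholder syntax could be parsed, assuming a single regex handles both forms. The resolver bodies, the exact regex, and the helper names resolvePlaceholders and findUnresolved are illustrative assumptions, not the actual gen-skill-docs code.

```typescript
// Hypothetical sketch: not the real gen-skill-docs.ts implementation.
type ResolverFn = (args?: string[]) => string;

const RESOLVERS: Record<string, ResolverFn> = {
  // Plain resolver: ignores args, so an existing {{PREAMBLE}} keeps working.
  PREAMBLE: () => "(preamble text)",
  // Parameterized resolver: the first arg is the skill name.
  INVOKE_SKILL: (args) =>
    `Read ${args && args[0] ? args[0] : "(missing skill)"}/SKILL.md and follow it.`,
};

// Matches {{WORD}} and {{WORD:arg1:arg2}}; an arg may not contain ':' or '}'.
const PLACEHOLDER = /\{\{([A-Z_]+)((?::[^:}]+)*)\}\}/g;

function resolvePlaceholders(template: string): string {
  return template.replace(PLACEHOLDER, (match: string, name: string, rawArgs: string) => {
    const resolver = RESOLVERS[name];
    if (!resolver) return match; // left in place so the unresolved check can flag it
    const args = rawArgs ? rawArgs.slice(1).split(":") : undefined;
    return resolver(args);
  });
}

// Returns every placeholder that survived resolution.
function findUnresolved(output: string): string[] {
  const found = output.match(PLACEHOLDER);
  return found ? Array.from(found) : [];
}
```

Because the args group is optional, a bare {{WORD}} reaches its resolver with args undefined, which is one way to keep existing templates unchanged.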
* feat: add INVOKE_SKILL resolver for composable skill loading
New composition.ts resolver module that emits prose instructing Claude
to read another skill's SKILL.md and follow it, skipping preamble
sections. Supports optional skip= parameter for additional sections.
Usage: {{INVOKE_SKILL:plan-ceo-review}} or
{{INVOKE_SKILL:plan-ceo-review:skip=Outside Voice}}
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
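One way the skip= argument might be handled, as a sketch only. The function name generateInvokeSkill appears later in this log; the default skip list contents, the comma-separated skip syntax, the file path layout, and the emitted wording are all assumptions.

```typescript
// Hypothetical sketch: DEFAULT_SKIP and the emitted prose are illustrative.
const DEFAULT_SKIP: string[] = ["Preamble"];

function generateInvokeSkill(args: string[] = []): string {
  const skill = args[0];
  if (!skill) throw new Error("INVOKE_SKILL requires a skill name");

  // Collect extra sections from any "skip=A,B" argument (comma syntax is assumed).
  const extra: string[] = [];
  for (const arg of args.slice(1)) {
    if (arg.indexOf("skip=") !== 0) continue;
    for (const section of arg.slice("skip=".length).split(",")) {
      const trimmed = section.trim();
      if (trimmed.length > 0) extra.push(trimmed);
    }
  }

  const skip = DEFAULT_SKIP.concat(extra);
  return [
    `Read ${skill}/SKILL.md and follow its instructions.`,
    `Skip these sections: ${skip.join(", ")}.`,
  ].join("\n");
}
```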
* feat: use frontmatter name: for skill symlinks and Codex paths
Patch all 3 name-derivation paths to read name: from SKILL.md
frontmatter instead of relying solely on directory basenames.
This enables directory names that differ from invocation names
(e.g., run-tests/ directory with name: test).
- setup: link_claude_skill_dirs reads name: via grep, falls back to basename
- gen-skill-docs.ts: codexSkillName uses frontmatter name for Codex output paths
- gen-skill-docs.ts: moved frontmatter extraction before Codex path logic
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
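The fallback rule described above (frontmatter name: first, directory basename otherwise) can be sketched as follows. The helper name skillNameFor and the frontmatter regexes are assumptions for illustration; the setup script itself is described as using grep.

```typescript
// Hypothetical helper: not the actual setup or gen-skill-docs.ts code.
function skillNameFor(dirPath: string, skillMd: string): string {
  // Directory basename, ignoring any trailing slashes.
  const parts = dirPath.replace(/\/+$/, "").split("/");
  const basename = parts[parts.length - 1] || dirPath;

  // Frontmatter is the block between the first two '---' lines.
  const frontmatter = /^---\r?\n([\s\S]*?)\r?\n---/.exec(skillMd);
  if (!frontmatter) return basename;

  // Only spaces and tabs are skipped, so an empty "name:" never swallows the next line.
  const nameLine = /^name:[ \t]*(\S.*?)[ \t]*$/m.exec(frontmatter[1]);
  return nameLine ? nameLine[1] : basename;
}
```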
* feat: extract CHANGELOG_WORKFLOW resolver from /ship
Move changelog generation logic into a reusable resolver. The resolver
is changelog-only (no version bump, per Codex review recommendation).
Adds voice rules inline. /ship Step 5 now uses {{CHANGELOG_WORKFLOW}}.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor: use INVOKE_SKILL resolver for plan-ceo-review office-hours fallback
Replace inline skill loading prose (read file, skip sections) with
{{INVOKE_SKILL:office-hours}} in the mid-session detection path.
The BENEFITS_FROM prerequisite offer is unchanged (separate use case).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor: BENEFITS_FROM resolver delegates to INVOKE_SKILL
Eliminate duplicated skip-list logic by having generateBenefitsFrom
call generateInvokeSkill internally. The wrapper (AskUserQuestion,
design doc re-check) stays in BENEFITS_FROM. The loading instructions
(read file, skip sections, error handling) come from INVOKE_SKILL.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: add resolver tests for INVOKE_SKILL, CHANGELOG_WORKFLOW, parameterized args
12 new tests covering:
- INVOKE_SKILL: template placeholder, default skip list, error handling,
BENEFITS_FROM delegation
- CHANGELOG_WORKFLOW: content, cross-check, voice guidance, format
- Parameterized resolver infra: colon-separated args processing,
no unresolved placeholders across all generated SKILL.md files
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v0.13.7.0)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: journey routing tests — CLAUDE.md routing rules + stronger descriptions
Three journey E2E tests (ideation, ship, debug) were failing because
Claude answered directly instead of invoking the Skill tool. Root cause:
skill descriptions in system-reminder are too weak to override Claude's
default behavior for tasks it can handle natively.
Fix has two parts:
1. CLAUDE.md routing rules in test workdir — Claude weighs project-level
instructions more heavily than skill description metadata
2. "Proactively invoke" (not "suggest") in office-hours, investigate,
ship descriptions — reinforces the routing signal
10/10 journey tests now pass (was 7/10).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: one-time CLAUDE.md routing injection prompt
Add a preamble section that checks if the project's CLAUDE.md has
skill routing rules. If not (and user hasn't declined), asks once
via AskUserQuestion to inject a "## Skill routing" section.
Root cause: skill descriptions in system-reminder metadata are too
weak to reliably trigger proactive Skill tool invocation. CLAUDE.md
project instructions carry higher weight in Claude's decision making.
- Preamble bash checks for "## Skill routing" in CLAUDE.md
- Stores decline in gstack-config (routing_declined=true)
- Only asks once per project (HAS_ROUTING check + config check)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: annotated config file + routing injection tests
gstack-config now writes a documented header on first config creation
with every supported key explained (proactive, telemetry, auto_upgrade,
skill_prefix, routing_declined, codex_reviews, skip_eng_review, etc.).
Users can edit ~/.gstack/config.yaml directly, anytime.
Also fixes grep to use ^KEY: anchoring so commented header lines don't
shadow real config values.
Tests added:
- 7 new gstack-config tests (annotated header, no duplication, comment
safety, routing_declined get/set/reset)
- 6 new gen-skill-docs tests (preamble routing injection: bash checks,
config reads, AskUserQuestion, decline persistence, routing rules)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
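To illustrate why the ^KEY: anchoring matters, here is a sketch of an anchored lookup over config text. It mirrors the grep fix in spirit only; the function name getConfigValue and the first-match behavior are assumptions.

```typescript
// Hypothetical sketch of an anchored key lookup; the commit itself describes a grep fix.
function getConfigValue(configText: string, key: string): string | undefined {
  // Escape regex metacharacters in the key.
  const escaped = key.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // '^' with the 'm' flag anchors at line start, so "# key: ..." header lines never match.
  const match = new RegExp(`^${escaped}:[ \\t]*(.*)$`, "m").exec(configText);
  return match ? match[1].trim() : undefined;
}
```

An unanchored search for "routing_declined:" would hit the commented header line first, which is the shadowing problem the commit describes.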
* chore: bump to v0.13.9.0, separate CHANGELOG from main's releases
Split our branch's changes into a new 0.13.9.0 entry instead of
jamming them into 0.13.7.0, which already landed on main as
"Community Wave."
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: clarify branch-scoped VERSION/CHANGELOG after merging main
Add explicit rules: merging main doesn't mean adopting main's version.
Branch always gets its own entry on top with a higher version number.
Three-point checklist after every merge.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: put our 0.13.9.0 entry on top of CHANGELOG
Newest version goes on top. Our branch lands next, so our entry
must be above main's 0.13.8.0.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: restore missing 0.13.7.0 Community Wave entry
Accidentally dropped the 0.13.7.0 entry when reordering.
All entries now present: 0.13.9.0 > 0.13.8.0 > 0.13.7.0 > 0.13.6.0.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: add CHANGELOG integrity check rule
After any edit that moves/adds/removes entries, grep for version
headers and verify no gaps or duplicates before committing.
Prevents accidentally dropping entries during reordering.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
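The integrity rule above could be automated along these lines. This is a sketch under assumptions: the header format "## [0.13.9.0]" is not confirmed by this log, and the function names are hypothetical. Gaps are detected against a caller-supplied list of versions that must be present.

```typescript
// Hypothetical checker; assumes version headers look like "## [0.13.9.0] ...".
function changelogVersions(changelog: string): string[] {
  const versions: string[] = [];
  const re = /^## \[(\d+(?:\.\d+){3})\]/gm;
  let m: RegExpExecArray | null;
  while ((m = re.exec(changelog)) !== null) versions.push(m[1]);
  return versions;
}

// Numeric, segment-by-segment comparison: negative if a < b.
function compareVersions(a: string, b: string): number {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    const d = (pa[i] || 0) - (pb[i] || 0);
    if (d !== 0) return d;
  }
  return 0;
}

function checkChangelog(changelog: string, expected: string[] = []): string[] {
  const versions = changelogVersions(changelog);
  const problems: string[] = [];
  versions.forEach((v, i) => {
    if (versions.indexOf(v) !== i) problems.push(`duplicate entry ${v}`);
    // Newest goes on top, so each entry must be older than the one above it.
    if (i > 0 && compareVersions(versions[i - 1], v) < 0) {
      problems.push(`${v} appears below older entry ${versions[i - 1]}`);
    }
  });
  expected.forEach((v) => {
    if (versions.indexOf(v) === -1) problems.push(`missing entry ${v}`);
  });
  return problems;
}
```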
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
feat: GStack Learns — per-project self-learning infrastructure (v0.13.4.0) (#622)
* feat: learnings + confidence resolvers — cross-skill memory infrastructure
Three new resolvers for the self-learning system:
- LEARNINGS_SEARCH: tells skills to load prior learnings before analysis
- LEARNINGS_LOG: tells skills to capture discoveries after completing work
- CONFIDENCE_CALIBRATION: adds 1-10 confidence scoring to all review findings
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: learnings bin scripts — append-only JSONL read/write
gstack-learnings-log: validates JSON, auto-injects timestamp, appends to
~/.gstack/projects/$SLUG/learnings.jsonl. Append-only (no mutation).
gstack-learnings-search: reads/filters/dedupes learnings with confidence
decay (observed/inferred lose 1pt/30d), cross-project discovery, and
"latest winner" resolution per key+type.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
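A sketch of the decay and "latest winner" rules described above. The field names (key, type, source, confidence, ts) and the choice to apply decay in whole 30-day steps are assumptions; the commit only states that observed/inferred entries lose 1 point per 30 days, and later commits in this log mention that user-stated entries do not decay and that confidence floors at 0.

```typescript
// Hypothetical entry shape: field names are assumptions, not the real JSONL schema.
interface Learning {
  key: string;
  type: string;
  source: "observed" | "inferred" | "user-stated";
  confidence: number; // 1-10
  ts: string; // ISO timestamp
}

const DAY_MS = 24 * 60 * 60 * 1000;

function decayedConfidence(entry: Learning, now: Date): number {
  if (entry.source === "user-stated") return entry.confidence; // no decay
  const ageDays = (now.getTime() - Date.parse(entry.ts)) / DAY_MS;
  // Assumed stepwise decay: one point per full 30 days, never negative.
  const lost = Math.max(0, Math.floor(ageDays / 30));
  return Math.max(0, entry.confidence - lost);
}

// Keeps only the most recent entry for each key+type pair.
function latestWinners(entries: Learning[]): Learning[] {
  const byKey = new Map<string, Learning>();
  for (const entry of entries) {
    const id = `${entry.key}|${entry.type}`;
    const prev = byKey.get(id);
    if (!prev || Date.parse(entry.ts) >= Date.parse(prev.ts)) byKey.set(id, entry);
  }
  return Array.from(byKey.values());
}
```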
* feat: learnings count in preamble output
Every skill now prints "LEARNINGS: N entries loaded" during preamble,
making the compounding loop visible to the user.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: integrate learnings + confidence into 9 skill templates
Add {{LEARNINGS_SEARCH}}, {{LEARNINGS_LOG}}, and {{CONFIDENCE_CALIBRATION}}
placeholders to review, ship, plan-eng-review, plan-ceo-review, office-hours,
investigate, retro, and cso templates. Regenerated all SKILL.md files.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: /learn skill — manage project learnings
New skill for reviewing, searching, pruning, and exporting what gstack
has learned across sessions. Commands: /learn, /learn search, /learn prune,
/learn export, /learn stats, /learn add.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: self-learning roadmap — 5-release design doc
Covers: R1 GStack Learns (v0.14), R2 Review Army (v0.15), R3 Smart Ceremony
(v0.16), R4 /autoship (v0.17), R5 Studio (v0.18). Inspired by Compound
Engineering, adapted to GStack's architecture.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: learnings bin script unit tests — 13 tests, free
Tests gstack-learnings-log (valid/invalid JSON, timestamp injection,
append-only) and gstack-learnings-search (dedup, type/query/limit filters,
confidence decay, user-stated no-decay, malformed JSONL skip).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v0.13.4.0)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: learnings resolver + bin script edge case tests — 21 new tests, free
Adds gen-skill-docs coverage for LEARNINGS_SEARCH, LEARNINGS_LOG, and
CONFIDENCE_CALIBRATION resolvers. Adds bin script edge cases: timestamp
preservation, special characters, files array, sort order, type grouping,
combined filtering, missing fields, confidence floor at 0.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: sync package.json version with VERSION file (0.13.4.0)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: gitignore .factory/ — generated output, not source
Same pattern as .claude/skills/ and .agents/. These SKILL.md files are
generated from .tmpl templates by gen:skill-docs --host factory.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: /learn E2E — seed 3 learnings, verify agent surfaces them
Seeds N+1 query pattern, stale cache pitfall, and rubocop preference
into learnings.jsonl, then runs /learn and checks that at least 2/3
appear in the agent's output. Gate tier, ~$0.25/run.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
feat: design binary — real UI mockup generation for gstack skills (v0.13.0.0) (#551)
* docs: design tools v1 plan — visual mockup generation for gstack skills
Full design doc covering the `design` binary that wraps OpenAI's GPT Image API
to generate real UI mockups from gstack's design skills. Includes comparison
board UX spec, auth model, 6 CEO expansions (design memory, mockup diffing,
screenshot evolution, design intent verification, responsive variants,
design-to-code prompt), and 9-commit implementation plan.
Reviewed: /office-hours + /plan-eng-review (CLEARED) + /plan-ceo-review
(EXPANSION, 6/6 accepted) + /plan-design-review (2/10 → 8/10).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: design tools prototype validation — GPT Image API works
Prototype script sends 3 design briefs to OpenAI Responses API with
image_generation tool. Results: dashboard (47s, 2.1MB), landing page
(42s, 1.3MB), settings page (37s, 1.3MB) all produce real, implementable
UI mockups with accurate text rendering and clean layouts.
Key finding: Codex OAuth tokens lack image generation scopes. Direct
API key (sk-proj-*) required, stored in ~/.gstack/openai.json.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: design binary core — generate, check, compare commands
Stateless CLI (design/dist/design) wrapping OpenAI Responses API for
UI mockup generation. Three working commands:
- generate: brief -> PNG mockup via gpt-4o + image_generation tool
- check: vision-based quality gate via GPT-4o (text readability, layout
completeness, visual coherence)
- compare: generates self-contained HTML comparison board with star
ratings, radio Pick, per-variant feedback, regenerate controls,
and Submit button that writes structured JSON for agent polling
Auth reads from ~/.gstack/openai.json (0600), falls back to
OPENAI_API_KEY env var. Compiled separately from browse binary
(openai added to devDependencies, not runtime deps).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: design binary variants + iterate commands
variants: generates N style variations with staggered parallel (1.5s
between launches, exponential backoff on 429). 7 built-in style
variations (bold, calm, warm, corporate, dark, playful + default).
Tested: 3/3 variants in 41.6s.
iterate: multi-turn design iteration using previous_response_id for
conversational threading. Falls back to re-generation with accumulated
feedback if threading doesn't retain visual context.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
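The staggered-parallel pattern with backoff on 429 might look like the sketch below. The 1.5s stagger comes from the commit; the base delay, retry limit, error shape (a status field), and all function names are assumptions.

```typescript
// Hypothetical sketch: generateOne stands in for whatever call produces one variant.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Assumes rate-limit errors expose an HTTP status of 429.
function isRateLimited(err: unknown): boolean {
  return typeof err === "object" && err !== null && (err as { status?: number }).status === 429;
}

// Exponential backoff: base, 2x base, 4x base, ...
function backoffMs(attempt: number, baseMs: number = 1000): number {
  return baseMs * Math.pow(2, attempt);
}

async function withBackoff<T>(task: () => Promise<T>, maxRetries: number = 3): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await task();
    } catch (err) {
      // Only rate limits are retried; everything else propagates immediately.
      if (!isRateLimited(err) || attempt >= maxRetries) throw err;
      await sleep(backoffMs(attempt));
    }
  }
}

async function generateVariants<T>(
  styles: string[],
  generateOne: (style: string) => Promise<T>,
  staggerMs: number = 1500,
): Promise<T[]> {
  // All variants run concurrently, but launch i starts i * staggerMs after the first.
  return Promise.all(
    styles.map(async (style, i) => {
      await sleep(i * staggerMs);
      return withBackoff(() => generateOne(style));
    }),
  );
}
```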
* feat: DESIGN_SETUP + DESIGN_MOCKUP template resolvers
Add generateDesignSetup() and generateDesignMockup() to the existing
design.ts resolver file. Add designDir to HostPaths (claude + codex).
Register DESIGN_SETUP and DESIGN_MOCKUP in the resolver index.
DESIGN_SETUP: $D binary discovery (mirrors $B browse setup pattern).
Falls back to DESIGN_SKETCH if binary not available.
DESIGN_MOCKUP: full visual exploration workflow template — construct
brief from DESIGN.md context, generate 3 variants, open comparison
board in Chrome, poll for user feedback, save approved mockup to
docs/designs/, generate HTML wireframe for implementation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: sync package.json version with VERSION file (0.12.2.0)
Pre-existing mismatch: VERSION was 0.12.2.0 but package.json was
0.12.0.0. Also adds design binary to build script and dev:design
convenience command.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: /office-hours visual design exploration integration
Add {{DESIGN_MOCKUP}} to office-hours template before the existing
{{DESIGN_SKETCH}}. When the design binary is available, /office-hours
generates 3 visual mockup variants, opens a comparison board in Chrome,
and polls for user feedback. Falls back to HTML wireframes if the
design binary isn't built.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: /plan-design-review visual mockup integration
Add {{DESIGN_SETUP}} to pre-review audit and "show me what 10/10
looks like" mockup generation to the 0-10 rating method. When a
design dimension rates below 7/10, the review can generate a mockup
showing the improved version. Falls back to text descriptions if
the design binary isn't available.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: design memory — extract visual language from mockups into DESIGN.md
New `$D extract` command: sends approved mockup to GPT-4o vision,
extracts color palette, typography, spacing, and layout patterns,
writes/updates DESIGN.md with an "Extracted Design Language" section.
Progressive constraint: if DESIGN.md exists, future mockup briefs
include it as style context. If no DESIGN.md, explorations run wide.
readDesignConstraints() reads existing DESIGN.md for brief construction.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: mockup diffing + design intent verification
New commands:
- $D diff --before old.png --after new.png: visual diff using GPT-4o
vision. Returns differences by area with severity (high/medium/low)
and a matchScore (0-100).
- $D verify --mockup approved.png --screenshot live.png: compares live
site screenshot against approved design mockup. Pass if matchScore
>= 70 and no high-severity differences.
Used by /design-review to close the design loop: design -> implement ->
verify visually.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
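The pass rule for $D verify, written out as a sketch. Only matchScore, the three severity levels, and the threshold of 70 come from the commit; the result shape and the function name are assumptions.

```typescript
// Hypothetical result shape for a visual diff.
type Severity = "high" | "medium" | "low";

interface DiffResult {
  matchScore: number; // 0-100
  differences: { area: string; severity: Severity }[];
}

// Pass requires both conditions: score at or above 70 and no high-severity difference.
function verifyPasses(result: DiffResult): boolean {
  const hasHigh = result.differences.some((d) => d.severity === "high");
  return result.matchScore >= 70 && !hasHigh;
}
```

Note that a high score alone is not enough: a single high-severity difference fails the check even at matchScore 95.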
* feat: screenshot-to-mockup evolution ($D evolve)
New command: $D evolve --screenshot current.png --brief "make it calmer"
Two-step process: first analyzes the screenshot via GPT-4o vision to
produce a detailed description, then generates a new mockup that keeps
the existing layout structure but applies the requested changes. Starts
from reality, not blank canvas.
Bridges the gap between /design-review critique ("the spacing is off")
and a visual proposal of the fix.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: responsive variants + design-to-code prompt
Responsive variants: $D variants --viewports desktop,tablet,mobile
generates mockups at 1536x1024, 1024x1024, and 1024x1536 (portrait)
with viewport-appropriate layout instructions.
Design-to-code prompt: $D prompt --image approved.png extracts colors,
typography, layout, and components via GPT-4o vision, producing a
structured implementation prompt. Reads DESIGN.md for additional
constraint context.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
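A sketch of how the --viewports flag could map to image sizes. The three sizes come from the commit; pairing them with desktop, tablet, and mobile in that order is an inference from the sentence order, and the names below are hypothetical.

```typescript
// Assumed mapping: sizes listed in the same order as the viewport names.
const VIEWPORT_SIZES: Record<string, string> = {
  desktop: "1536x1024",
  tablet: "1024x1024",
  mobile: "1024x1536", // portrait
};

function parseViewports(flag: string): { name: string; size: string }[] {
  return flag.split(",").map((raw) => {
    const name = raw.trim();
    // hasOwnProperty avoids matching inherited keys such as "constructor".
    if (!Object.prototype.hasOwnProperty.call(VIEWPORT_SIZES, name)) {
      throw new Error(`unknown viewport: ${name}`);
    }
    return { name, size: VIEWPORT_SIZES[name] };
  });
}
```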
* chore: bump version and changelog (v0.13.0.0)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: gstack designer as first-class tool in /plan-design-review
Brand the gstack designer prominently, add Step 0.5 for proactive visual
mockup generation before review passes, and update priority hierarchy.
When a plan describes new UI, the skill now offers to generate mockups
with $D variants, run $D check for quality gating, and present a
comparison board via $B goto before any review passes begin.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: integrate mockups into review passes and outputs
Thread Step 0.5 mockups through the review workflow: Pass 4 (AI Slop)
evaluates generated mockups visually, Pass 7 uses mockups as evidence
for unresolved decisions, post-pass offers one-shot regeneration after
design changes, and Approved Mockups section records chosen variants
with paths for the implementer.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: gstack designer target mockups in /design-review fix loop
Add $D generate for target mockups in Phase 8a.5 — before fixing a
design finding, generate a mockup showing what it should look like.
Add $D verify in Phase 9 to compare fix results against targets.
Not plan mode — goes straight to implementation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: gstack designer AI mockups in /design-consultation Phase 5
Replace HTML preview with $D variants + comparison board when designer
is available (Path A). Use $D extract to derive DESIGN.md tokens from
the approved mockup. Handles both plan mode (write to plan) and
non-plan mode (implement immediately). Falls back to HTML preview
(Path B) when designer binary is unavailable.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: make gstack designer the default in /plan-design-review, not optional
The transcript showed the agent writing 5 text descriptions of homepage
variants instead of generating visual mockups, even when the user explicitly
asked for design tools. The skill treated mockups as optional ("Want me to
generate?") when they should be the default behavior.
Changes:
- Rename "Your Visual Design Tool" to "YOUR PRIMARY TOOL" with aggressive
language: "Don't ask permission. Show it."
- Step 0.5 now generates mockups automatically when DESIGN_READY, no
AskUserQuestion gatekeeping the default path
- Priority hierarchy: mockups are "non-negotiable" not "if available"
- Step 0D tells the user mockups are coming next
- DESIGN_NOT_AVAILABLE fallback now tells user what they're missing
The only valid reasons to skip mockups: no UI scope, or designer not
installed. Everything else generates by default.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: persist design mockups to ~/.gstack/projects/$SLUG/designs/
Mockups were going to .context/mockups/ (gitignored, workspace-local).
This meant designs disappeared when switching workspaces or conversations,
and downstream skills couldn't reference approved mockups from earlier
reviews.
Now all three design skills save to persistent project-scoped dirs:
- /plan-design-review: ~/.gstack/projects/$SLUG/designs/<screen>-<date>/
- /design-consultation: ~/.gstack/projects/$SLUG/designs/design-system-<date>/
- /design-review: ~/.gstack/projects/$SLUG/designs/design-audit-<date>/
Each directory gets an approved.json recording the user's pick, feedback,
and branch. This lets /design-review verify against mockups that
/plan-design-review approved, and design history is browsable via
ls ~/.gstack/projects/$SLUG/designs/.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: regenerate codex ship skill with zsh glob guards
Picked up setopt +o nomatch guards from main's v0.12.8.1 merge.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add browse binary discovery to DESIGN_SETUP resolver
The design setup block now discovers $B alongside $D, so skills can
open comparison boards via $B goto and poll feedback via $B eval.
Falls back to `open` on macOS when browse binary is unavailable.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: comparison board DOM polling in plan-design-review
After opening the comparison board, the agent now polls
#status via $B eval instead of asking a rigid AskUserQuestion.
Handles submit (read structured JSON feedback), regenerate
(new variants with updated brief), and $B-unavailable fallback
(free-form text response). The user interacts with the real
board UI, not a constrained option picker.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: comparison board feedback loop integration test
16 tests covering the full DOM polling cycle: structure verification,
submit with pick/rating/comment, regenerate flows (totally different,
more like this, custom text), and the agent polling pattern
(empty → submitted → read JSON). Uses real generateCompareHtml()
from design/src/compare.ts, served via HTTP. Runs in <1s.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add $D serve command for HTTP-based comparison board feedback
The comparison board feedback loop was fundamentally broken: browse blocks
file:// URLs (url-validation.ts:71), so $B goto file://board.html always
fails. The fallback (`open` + $B eval) polls a different browser instance.
$D serve fixes this by serving the board over HTTP on localhost. The server
is stateful: stays alive across regeneration rounds, exposes /api/progress
for the board to poll, and accepts /api/reload from the agent to swap in
new board HTML. Stdout carries feedback JSON only; stderr carries telemetry.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: dual-mode feedback + post-submit lifecycle in comparison board
When __GSTACK_SERVER_URL is set (injected by $D serve), the board POSTs
feedback to the server instead of only writing to hidden DOM elements.
After submit: disables all inputs, shows "Return to your coding agent."
After regenerate: shows spinner, polls /api/progress, auto-refreshes on
ready. On POST failure: shows copyable JSON fallback. On progress timeout
(5 min): shows error with /design-shotgun prompt. DOM fallback preserved
for headed browser mode and tests.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: HTTP serve command endpoints and regeneration lifecycle
11 tests covering: HTML serving with injected server URL, /api/progress
state reporting, submit → done lifecycle, regenerate → regenerating state,
remix with remixSpec, malformed JSON rejection, /api/reload HTML swapping,
missing file validation, and full regenerate → reload → submit round-trip.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add DESIGN_SHOTGUN_LOOP resolver + fix design artifact paths
Adds generateDesignShotgunLoop() resolver for the shared comparison board
feedback loop (serve via HTTP, handle regenerate/remix, AskUserQuestion
fallback, feedback confirmation). Registered as {{DESIGN_SHOTGUN_LOOP}}.
Fixes generateDesignMockup() to use ~/.gstack/projects/$SLUG/designs/
instead of /tmp/ and docs/designs/. Replaces broken $B goto file:// +
$B eval polling with $D compare --serve (HTTP-based, stdout feedback).
Adds CRITICAL PATH RULE guardrail to DESIGN_SETUP: design artifacts must
go to ~/.gstack/projects/$SLUG/designs/, never .context/ or /tmp/.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add /design-shotgun standalone design exploration skill
New skill for visual brainstorming: generate AI design variants, open a
comparison board in the user's browser, collect structured feedback, and
iterate. Features: session detection (revisit prior explorations), 5-dimension
context gathering (who, job to be done, what exists, user flow, edge cases),
taste memory (prior approved designs bias new generations), inline variant
preview, configurable variant count, screenshot-to-variants via $D evolve.
Uses {{DESIGN_SHOTGUN_LOOP}} resolver for the feedback loop. Saves all
artifacts to ~/.gstack/projects/$SLUG/designs/.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: regenerate SKILL.md files for design-shotgun + resolver changes
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add remix UI to comparison board
Per-variant element selectors (Layout, Colors, Typography, Spacing) with
radio buttons in a grid. Remix button collects selections into a remixSpec
object and sends via the same HTTP POST feedback mechanism. Enabled only
when at least one element is selected. Board shows regenerating spinner
while agent generates the hybrid variant.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add $D gallery command for design history timeline
Generates a self-contained HTML page showing all prior design explorations
for a project: every variant (approved or not), feedback notes, organized
by date (newest first). Images embedded as base64. Handles corrupted
approved.json gracefully (skips, still shows the session). Empty state
shows "No history yet" with /design-shotgun prompt.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: gallery generation — sessions, dates, corruption, empty state
7 tests: empty dir, nonexistent dir, single session with approved variant,
multiple sessions sorted newest-first, corrupted approved.json handled
gracefully, session without approved.json, self-contained HTML (no
external dependencies).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor: replace broken file:// polling with {{DESIGN_SHOTGUN_LOOP}}
plan-design-review and design-consultation templates previously used
$B goto file:// + $B eval polling for the comparison board feedback loop.
This was broken (browse blocks file:// URLs). Both templates now use
{{DESIGN_SHOTGUN_LOOP}} which serves via HTTP, handles regeneration in
the same browser tab, and falls back to AskUserQuestion.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: add design-shotgun touchfile entries and tier classifications
design-shotgun-path (gate): verify artifacts go to ~/.gstack/, not .context/
design-shotgun-session (gate): verify repeat-run detection + AskUserQuestion
design-shotgun-full (periodic): full round-trip with real design binary
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: regenerate SKILL.md files for template refactor
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: comparison board UI improvements — option headers, pick confirmation, grid view
Three changes to the design comparison board:
1. Pick confirmation: selecting "Pick" on Option A shows "We'll move
forward with Option A" in green, plus a status line above the submit
button repeating the choice.
2. Clear option headers: each variant now has "Option A" in bold with a
subtitle above the image, instead of just the raw image.
3. View toggle: top-right Large/Grid buttons switch between single-column
(default) and 3-across grid view.
Also restructured the bottom section into a 2-column grid: submit/overall
feedback on the left, regenerate controls on the right.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: use 127.0.0.1 instead of localhost for serve URL
Avoids DNS resolution issues on some systems where localhost may resolve
to IPv6 ::1 while Bun listens on IPv4 only.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: write ALL feedback to disk so agent can poll in background mode
The agent backgrounds $D serve (Claude Code can't block on a subprocess
and do other work simultaneously). With stdout-only feedback delivery,
the agent never sees regenerate/remix feedback.
Fix: write feedback-pending.json (regenerate/remix) and feedback.json
(submit) to disk next to the board HTML. Agent polls the filesystem
instead of reading stdout. Both channels (stdout + disk) are always
active so foreground mode still works.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: DESIGN_SHOTGUN_LOOP uses file polling instead of stdout reading
Update the template resolver to instruct the agent to background $D serve
and poll for feedback-pending.json / feedback.json on a 5-second loop.
This matches the real-world pattern where Claude Code / Conductor agents
can't block on subprocess stdout.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
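The file-polling loop can be sketched as below. The two file names and the 5-second interval come from the commits; checking feedback.json before feedback-pending.json, the timeout, and the injected readJson helper are assumptions made to keep the sketch self-contained.

```typescript
// Hypothetical sketch of the agent-side poll; not the resolver's actual instructions.
type Feedback =
  | { kind: "submit"; data: unknown }
  | { kind: "pending"; data: unknown } // regenerate or remix request
  | { kind: "none" };

// readJson returns the parsed file contents, or undefined if the file does not exist yet.
type ReadJson = (path: string) => unknown;

function checkFeedback(dir: string, readJson: ReadJson): Feedback {
  const submitted = readJson(`${dir}/feedback.json`);
  if (submitted !== undefined) return { kind: "submit", data: submitted };
  const pending = readJson(`${dir}/feedback-pending.json`);
  if (pending !== undefined) return { kind: "pending", data: pending };
  return { kind: "none" };
}

async function pollFeedback(
  dir: string,
  readJson: ReadJson,
  intervalMs: number = 5000,
  timeoutMs: number = 10 * 60 * 1000, // assumed upper bound
): Promise<Feedback> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const feedback = checkFeedback(dir, readJson);
    if (feedback.kind !== "none") return feedback;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  return { kind: "none" };
}
```

In a real agent loop, a pending result would presumably be consumed (and the file removed) before polling resumes, so the same regenerate request is not handled twice.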
* chore: regenerate SKILL.md files for file-polling feedback loop
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: null-safe DOM selectors for post-submit and regenerating states
The user's layout restructure renamed .regenerate-bar → .regen-column,
.submit-bar → .submit-column, and .overall-section → .bottom-section.
The JS still referenced the old class names, causing querySelector to
return null and showPostSubmitState() / showRegeneratingState() to
silently crash. This meant Submit and Regenerate buttons appeared to
work (DOM elements updated, HTTP POST succeeded) but the visual
feedback (disabled inputs, spinner, success message) never appeared.
Fix: use fallback selectors that check both old and new class names,
with null guards so a missing element doesn't crash the function.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
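The fallback-selector fix could be sketched like this. The class names come from the commit; the helper names and the minimal Queryable interface (used so the sketch runs outside a browser) are assumptions.

```typescript
// Minimal stand-in for document / Element so the sketch does not need a DOM.
interface Queryable<E> {
  querySelector(selector: string): E | null;
}

// Returns the first element matched by any selector, trying them in order.
function firstMatch<E>(root: Queryable<E>, selectors: string[]): E | null {
  for (const selector of selectors) {
    const el = root.querySelector(selector);
    if (el) return el;
  }
  return null;
}

// Hypothetical post-submit step: tries the new class name, then the old one.
function hideSubmitColumn(root: Queryable<{ hidden: boolean }>): boolean {
  const el = firstMatch(root, [".submit-column", ".submit-bar"]);
  if (!el) return false; // missing element: skip instead of throwing
  el.hidden = true;
  return true;
}
```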
* test: end-to-end feedback roundtrip — browser click to file on disk
The test that proves "changes on the website propagate to Claude Code."
Opens the comparison board in a real headless browser with __GSTACK_SERVER_URL
injected, simulates user clicks (Submit, Regenerate, More Like This), and
verifies that feedback.json / feedback-pending.json land on disk with the
correct structured data.
6 tests covering: submit → feedback.json, post-submit UI lockdown,
regenerate → feedback-pending.json, more-like-this → feedback-pending.json,
regenerate spinner display, and full regen → reload → submit round-trip.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: comprehensive design doc for Design Shotgun feedback loop
Documents the full browser-to-agent feedback architecture: state machine,
file-based polling, port discovery, post-submit lifecycle, and every known
edge case (zombie forms, dead servers, stale spinners, file:// bug,
double-click races, port coordination, sequential generate rule).
Includes ASCII diagrams of the data flow and state transitions, complete
step-by-step walkthrough of happy path and regeneration path, test coverage
map with gaps, and short/medium/long-term improvement ideas.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: plan-design-review agent guardrails for feedback loop
Four fixes to prevent agents from reinventing the feedback loop badly:
1. Sequential generate rule: explicit instruction that $D generate calls
must run one at a time (API rate-limits concurrent image generation).
2. No-AskUserQuestion-for-feedback rule: agent reads feedback.json instead
of re-asking what the user picked.
3. Remove file:// references: $B goto file:// was always rejected by
url-validation.ts. The --serve flag handles everything.
4. Remove $B eval polling reference: no longer needed with HTTP POST.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: design-shotgun Step 3 progressive reveal, silent failure detection, timing estimate
Three production UX bugs fixed:
1. Dead air — now shows timing estimate before generation starts
2. Silent variant drop — replaced $D variants batch with individual $D generate
calls, each verified for existence and non-zero size with retry
3. No progressive reveal — each variant shown inline via Read tool immediately
after generation (~60s increments instead of all at ~180s)
Also: write to /tmp/ then cp as the default output pattern (sandbox workaround);
screenshot taken once for the evolve path (not per-variant).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: parallel design-shotgun with concept-first confirmation
Step 3 rewritten to concept-first + parallel Agent architecture:
- 3a: generate text concepts (free, instant)
- 3b: AskUserQuestion to confirm/modify before spending API credits
- 3c: launch N Agent subagents in parallel (~60s total regardless of count)
- 3d: show all results, dynamic image list for comparison board
Adds Agent to allowed-tools. Softens plan-design-review sequential
warning to note design-shotgun uses parallel at Tier 2+.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: update project documentation for v0.13.0.0
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: untrack .agents/skills/ — generated at setup, already gitignored
These files were committed despite .agents/ being in .gitignore.
They regenerate from ./setup --host codex on any machine.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: regenerate design-shotgun SKILL.md for v0.12.12.0 preamble changes
Merge from main brought updated preamble resolver (conditional telemetry,
local JSONL logging) but design-shotgun/SKILL.md wasn't regenerated.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>