# design binary

Generated by /office-hours on 2026-03-26 · Branch: garrytan/agent-design-tools · Repo: gstack · Status: DRAFT · Mode: Intrapreneurship
gstack's design skills (/office-hours, /design-consultation, /plan-design-review, /design-review) all produce text descriptions of design — DESIGN.md files with hex codes, plan docs with pixel specs in prose, ASCII art wireframes. The creator is a designer who hand-designed HelloSign in OmniGraffle and finds this embarrassing.
The unit of value is wrong. Users don't need richer design language — they need an executable visual artifact that changes the conversation from "do you like this spec?" to "is this the screen?"
Design skills describe design in text instead of showing it. The Argus UX overhaul plan is the example: 487 lines of detailed emotional arc specs, typography choices, animation timing — zero visual artifacts. An AI coding agent that "designs" should produce something you can look at and react to viscerally.
The creator/primary user finds the current output embarrassing. Every design skill session ends with prose where a mockup should be. GPT Image API now generates pixel-perfect UI mockups with accurate text rendering — the capability gap that justified text-only output no longer exists.
A compiled TypeScript binary (design/dist/design) that wraps the OpenAI Images/Responses API, callable from skill templates via $D (mirroring the existing $B browse binary pattern). Priority integration order: /office-hours → /plan-design-review → /design-consultation → /design-review.
A `design` binary that any skill can call. Codex independently validated the core thesis: "The failure is not output quality within markdown; it is that the current unit of value is wrong." Key contributions:

- `visual_mockup.ts` utility
- /office-hours + /plan-design-review only
- hero mockup + 2 variants

## design Binary (Approach B)

Shares the browse binary's compilation and distribution pattern (`bun build --compile`, setup script, `$VARIABLE` resolution in skill templates) but is architecturally simpler — no persistent daemon server, no Chromium, no health checks, no token auth. The design binary is a stateless CLI that makes OpenAI API calls and writes PNGs to disk. Session state (for multi-turn iteration) is a JSON file.
New dependency: openai npm package (add to devDependencies, NOT runtime deps). Design binary compiled separately from browse so openai doesn't bloat the browse binary.
```
design/
├── src/
│   ├── cli.ts        # Entry point, command dispatch
│   ├── commands.ts   # Command registry (source of truth for docs + validation)
│   ├── generate.ts   # Generate mockups from structured brief
│   ├── iterate.ts    # Multi-turn iteration on existing mockups
│   ├── variants.ts   # Generate N design variants from brief
│   ├── check.ts      # Vision-based quality gate (GPT-4o)
│   ├── brief.ts      # Structured brief type + assembly helpers
│   └── session.ts    # Session state (response IDs for multi-turn)
├── dist/
│   ├── design        # Compiled binary
│   └── .version      # Git hash
└── test/
    └── design.test.ts  # Integration tests
```
```bash
# Generate a hero mockup from a structured brief
$D generate --brief "Dashboard for a coding assessment tool. Dark theme, cream accents. Shows: builder name, score badge, narrative letter, score cards. Target: technical users." --output /tmp/mockup-hero.png

# Generate 3 design variants
$D variants --brief "..." --count 3 --output-dir /tmp/mockups/

# Iterate on an existing mockup with feedback
$D iterate --session /tmp/design-session.json --feedback "Make the score cards larger, move the narrative above the scores" --output /tmp/mockup-v2.png

# Vision-based quality check (returns PASS/FAIL + issues)
$D check --image /tmp/mockup-hero.png --brief "Dashboard with builder name, score badge, narrative"

# One-shot with quality gate + auto-retry
$D generate --brief "..." --output /tmp/mockup.png --check --retry 1

# Pass a structured brief via JSON file
$D generate --brief-file /tmp/brief.json --output /tmp/mockup.png

# Generate comparison board HTML for user review
$D compare --images /tmp/mockups/variant-*.png --output /tmp/design-board.html

# Guided API key setup + smoke test
$D setup
```
Brief input modes:
- `--brief "plain text"` — free-form text prompt (simple mode)
- `--brief-file path.json` — structured JSON matching the `DesignBrief` interface (rich mode)

All commands are registered in `commands.ts`, including `--check` and `--retry` as flags on `generate`.
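The registry pattern borrowed from browse can be sketched in TypeScript; all names and fields here (`FlagSpec`, `CommandSpec`, `findCommand`) are illustrative assumptions, not the real `commands.ts`:

```typescript
// Hypothetical command registry: single source of truth for dispatch,
// validation, and generated docs (mirrors the browse binary's pattern).
interface FlagSpec {
  name: string;       // e.g. "--brief"
  required: boolean;
  takesValue: boolean;
  doc: string;
}

interface CommandSpec {
  name: string;
  doc: string;
  flags: FlagSpec[];
}

const COMMANDS: CommandSpec[] = [
  {
    name: "generate",
    doc: "Generate a hero mockup from a structured brief",
    flags: [
      { name: "--brief", required: false, takesValue: true, doc: "free-form text brief" },
      { name: "--brief-file", required: false, takesValue: true, doc: "structured JSON brief" },
      { name: "--output", required: true, takesValue: true, doc: "PNG output path" },
      { name: "--check", required: false, takesValue: false, doc: "run vision quality gate" },
      { name: "--retry", required: false, takesValue: true, doc: "retries on FAIL" },
    ],
  },
  { name: "check", doc: "Vision-based quality gate", flags: [] },
];

// Look up the command named by argv[0]; undefined means "unknown command".
function findCommand(argv: string[]): CommandSpec | undefined {
  return COMMANDS.find((c) => c.name === argv[0]);
}
```

Keeping flags in the registry (rather than parsed ad hoc per command) is what lets one table drive both validation and the generated docs.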
The workflow is sequential, not parallel. PNGs are for visual exploration (human-facing), HTML wireframes are for implementation (agent-facing):
1. $D variants --brief "..." --count 3 --output-dir /tmp/mockups/
→ Generates 2-5 PNG mockup variations
2. $D compare --images /tmp/mockups/*.png --output /tmp/design-board.html
→ Generates HTML comparison board (spec below)
3. $B goto file:///tmp/design-board.html
→ User reviews all variants in headed Chrome
4. User picks favorite, rates, comments, clicks [Submit]
Agent polls: $B eval document.getElementById('status').textContent
Agent reads: $B eval document.getElementById('feedback-result').textContent
→ No clipboard, no pasting. Agent reads feedback directly from the page.
5. Claude generates HTML wireframe via DESIGN_SKETCH matching approved direction
→ Agent implements from the inspectable HTML, not the opaque PNG
Classifier: APP UI (task-focused, utility page). No product branding.
Layout: Single column, full-width mockups. Each variant gets the full viewport width for maximum image fidelity. Users scroll vertically through variants.
```
┌─────────────────────────────────────────────────────────────┐
│ HEADER BAR                                                  │
│ "Design Exploration" . project name . "3 variants"          │
│ Mode indicator: [Wide exploration] | [Matching DESIGN.md]   │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│ ┌───────────────────────────────────────────────────────┐   │
│ │ VARIANT A (full width)                                │   │
│ │ [ mockup PNG, max-width: 1200px ]                     │   │
│ ├───────────────────────────────────────────────────────┤   │
│ │ (●) Pick  ★★★★☆  [What do you like/dislike?____]      │   │
│ │ [More like this]                                      │   │
│ └───────────────────────────────────────────────────────┘   │
│                                                             │
│ ┌───────────────────────────────────────────────────────┐   │
│ │ VARIANT B (full width)                                │   │
│ │ [ mockup PNG, max-width: 1200px ]                     │   │
│ ├───────────────────────────────────────────────────────┤   │
│ │ ( ) Pick  ★★★☆☆  [What do you like/dislike?____]      │   │
│ │ [More like this]                                      │   │
│ └───────────────────────────────────────────────────────┘   │
│                                                             │
│ ... (scroll for more variants)                              │
│                                                             │
│ ─── separator ─────────────────────────────────────────     │
│ Overall direction (optional, collapsed by default)          │
│ [textarea, 3 lines, expand on focus]                        │
│                                                             │
│ ─── REGENERATE BAR (#f7f7f7 bg) ───────────────────────     │
│ "Want to explore more?"                                     │
│ [Totally different] [Match my design] [Custom: ______]      │
│ [Regenerate ->]                                             │
│ ─────────────────────────────────────────────────────────   │
│ [ ✓ Submit ]                                                │
└─────────────────────────────────────────────────────────────┘
```
Visual spec:
Interaction states:
Feedback JSON structure (written to hidden #feedback-result element):
```json
{
  "preferred": "A",
  "ratings": { "A": 4, "B": 3, "C": 2 },
  "comments": {
    "A": "Love the spacing, header feels right",
    "B": "Too busy, but good color palette",
    "C": "Wrong mood entirely"
  },
  "overall": "Go with A, make the CTA bigger",
  "regenerated": false
}
```
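A sketch of how the agent side might validate this feedback payload after reading it out of `#feedback-result` (the `parseFeedback` helper is hypothetical):

```typescript
interface DesignFeedback {
  preferred: string;
  ratings: Record<string, number>;
  comments: Record<string, string>;
  overall: string;
  regenerated: boolean;
}

// Parse and minimally validate the JSON the agent reads via
// $B eval document.getElementById('feedback-result').textContent
function parseFeedback(raw: string): DesignFeedback {
  const data = JSON.parse(raw) as DesignFeedback;
  if (typeof data.preferred !== "string" || !(data.preferred in data.ratings)) {
    throw new Error("feedback missing a valid preferred variant");
  }
  for (const [variant, stars] of Object.entries(data.ratings)) {
    if (stars < 1 || stars > 5) throw new Error(`rating out of range for ${variant}`);
  }
  return data;
}
```

Validating here keeps a half-filled or malformed board submission from silently steering the next iteration.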
Accessibility: Star ratings keyboard navigable (arrow keys). Textareas labeled ("Feedback for Variant A"). Submit/Regenerate keyboard accessible with visible focus ring. All text #333+ on white.
Responsive: >1200px: comfortable margins. 768-1200px: tighter margins. <768px: full-width, no horizontal scroll.
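A minimal sketch of what the `compare` command's board generator could emit; the `status` and `feedback-result` element ids come from the workflow above, while the function name and markup details are illustrative:

```typescript
// Generate a bare-bones comparison board. The real board adds star
// ratings, the overall-direction textarea, and the regenerate bar.
function renderBoard(projectName: string, images: string[]): string {
  const cards = images
    .map((img, i) => {
      const label = String.fromCharCode(65 + i); // A, B, C...
      return `<section>
  <h2>Variant ${label}</h2>
  <img src="file://${img}" style="max-width:1200px;width:100%" alt="Variant ${label} mockup">
  <label>Pick <input type="radio" name="pick" value="${label}"></label>
  <textarea aria-label="Feedback for Variant ${label}"></textarea>
</section>`;
    })
    .join("\n");
  return `<!doctype html>
<html><head><title>Design Exploration · ${projectName}</title></head>
<body>
<header>Design Exploration · ${projectName} · ${images.length} variants</header>
${cards}
<button id="submit">Submit</button>
<div id="status">pending</div>
<pre id="feedback-result" hidden></pre>
</body></html>`;
}
```

The hidden `#feedback-result` element is the agent-facing channel: the Submit handler serializes the feedback JSON into it and flips `#status`, which the agent polls via `$B eval`.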
Screenshot consent (first-time only for $D evolve): "This will send a screenshot of your live site to OpenAI for design evolution. [Proceed] [Don't ask again]" Stored in ~/.gstack/config.yaml as design_screenshot_consent.
Why sequential: Codex adversarial review identified that raster PNGs are opaque to agents (no DOM, no states, no diffable structure). HTML wireframes preserve a bridge back to code. The PNG is for the human to say "yes, that's right." The HTML is for the agent to say "I know how to build this."
1. Stateless CLI, not daemon
Browse needs a persistent Chromium instance. Design is just API calls — no reason for a server. Session state for multi-turn iteration is a JSON file written to /tmp/design-session-{id}.json containing previous_response_id.
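A sketch of that session-state file, under the assumption that it holds the `previous_response_id` plus accumulated feedback for the fallback path (the `SessionState` shape and helper names are made up):

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

interface SessionState {
  id: string;
  lastResponseId: string | null; // previous_response_id for the Responses API
  briefText: string;
  feedbackHistory: string[];     // accumulated for the non-multi-turn fallback
}

function sessionPath(id: string): string {
  return path.join(os.tmpdir(), `design-session-${id}.json`);
}

// `generate` creates the session file and prints its path.
function newSession(briefText: string): SessionState {
  const state: SessionState = {
    id: `${process.pid}-${Date.now()}`,
    lastResponseId: null,
    briefText,
    feedbackHistory: [],
  };
  fs.writeFileSync(sessionPath(state.id), JSON.stringify(state, null, 2));
  return state;
}

// `iterate` reads it back via --session.
function loadSession(id: string): SessionState {
  return JSON.parse(fs.readFileSync(sessionPath(id), "utf8"));
}
```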
Session IDs are `${PID}-${timestamp}`, passed via the `--session` flag. The `generate` command creates the session file and prints its path; `iterate` reads it via `--session`.

2. Structured brief input

The brief is the interface between skill prose and image generation. Skills construct it from design context:
```typescript
interface DesignBrief {
  goal: string;          // "Dashboard for coding assessment tool"
  audience: string;      // "Technical users, YC partners"
  style: string;         // "Dark theme, cream accents, minimal"
  elements: string[];    // ["builder name", "score badge", "narrative letter"]
  constraints?: string;  // "Max width 1024px, mobile-first"
  reference?: string;    // Path to existing screenshot or DESIGN.md excerpt
  screenType: string;    // "desktop-dashboard" | "mobile-app" | "landing-page" | etc.
}
```
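The `briefToPrompt` helper referenced in the API sketches could flatten this interface into a single prompt string; a hedged sketch (the prompt wording is illustrative, not a tuned template):

```typescript
interface DesignBrief {
  goal: string;
  audience: string;
  style: string;
  elements: string[];
  constraints?: string;
  reference?: string;   // local path only; never uploaded
  screenType: string;
}

// Flatten the structured brief into one image-generation prompt.
function briefToPrompt(brief: DesignBrief): string {
  const lines = [
    `High-fidelity UI mockup: ${brief.goal}.`,
    `Screen type: ${brief.screenType}. Audience: ${brief.audience}.`,
    `Style: ${brief.style}.`,
    `Must include: ${brief.elements.join(", ")}.`,
  ];
  if (brief.constraints) lines.push(`Constraints: ${brief.constraints}.`);
  return lines.join(" ");
}
```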
3. Default-on in design skills

Skills generate mockups by default. The template includes skip language:
Generating visual mockup of the proposed design... (say "skip" if you don't need visuals)
4. Vision quality gate

After generating, optionally pass the image through GPT-4o vision to check:
5. Output location: explorations in /tmp, approved finals in docs/designs/
- Explorations: `/tmp/gstack-mockups-{session}/` (ephemeral, not committed)
- Approved finals: `docs/designs/` (checked in)
- Output directory overridable via the `design_output_dir` setting
- Filename pattern: `{skill}-{description}-{timestamp}.png`
- Creates `docs/designs/` if it doesn't exist (`mkdir -p`)
- Fallback path: `/tmp/gstack-mockup-{timestamp}.png`

6. Trust boundary acknowledgment

Default-on generation sends design brief text to OpenAI. This is a new external data flow vs. the existing HTML wireframe path, which is entirely local. The brief contains only abstract design descriptions (goal, style, elements), never source code or user data. Screenshots from $B are NOT sent to OpenAI (the `reference` field in `DesignBrief` is a local file path used by the agent, not uploaded to the API). Document this in CLAUDE.md.
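The exploration-vs-approved routing in point 5 could look like this (the helper name and the `approved` flag are assumptions):

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Explorations go to /tmp (ephemeral); approved finals to docs/designs/
// (checked in). Both directories are created on demand (mkdir -p).
function resolveOutputPath(opts: {
  skill: string;
  description: string;
  sessionId: string;
  approved: boolean;
  repoRoot: string;
}): string {
  const stamp = Date.now();
  const filename = `${opts.skill}-${opts.description}-${stamp}.png`;
  if (opts.approved) {
    const dir = path.join(opts.repoRoot, "docs", "designs");
    fs.mkdirSync(dir, { recursive: true });
    return path.join(dir, filename);
  }
  const dir = path.join("/tmp", `gstack-mockups-${opts.sessionId}`);
  fs.mkdirSync(dir, { recursive: true });
  return path.join(dir, filename);
}
```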
7. Rate limit mitigation

Variant generation uses staggered parallelism: start each API call 1 second apart and collect the results via `Promise.allSettled()`. This avoids the 5-7 RPM rate limit on image generation while still being faster than fully serial. If any call returns a 429, retry with exponential backoff (2s, 4s, 8s).
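That staggering-plus-backoff policy can be sketched as follows; `makeCall` stands in for the real image-generation API call, and the 429 detection is illustrative:

```typescript
const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

// Start call i after i * staggerMs; on a 429, retry with exponential
// backoff (2s, 4s, 8s by default). Results come back via allSettled so
// one failed variant doesn't sink the others.
async function staggeredVariants<T>(
  count: number,
  makeCall: (i: number) => Promise<T>,
  staggerMs = 1000,
  backoffsMs = [2000, 4000, 8000],
): Promise<PromiseSettledResult<T>[]> {
  const one = async (i: number): Promise<T> => {
    await sleep(i * staggerMs);
    for (let attempt = 0; ; attempt++) {
      try {
        return await makeCall(i);
      } catch (err) {
        const is429 = err instanceof Error && err.message.includes("429");
        if (!is429 || attempt >= backoffsMs.length) throw err;
        await sleep(backoffsMs[attempt]);
      }
    }
  };
  return Promise.allSettled(Array.from({ length: count }, (_, i) => one(i)));
}
```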
Add to existing resolver: scripts/resolvers/design.ts (NOT a new file)
- `generateDesignSetup()` for the `{{DESIGN_SETUP}}` placeholder (mirrors `generateBrowseSetup()`)
- `generateDesignMockup()` for the `{{DESIGN_MOCKUP}}` placeholder (full exploration workflow)

New `HostPaths` entry in `types.ts`:
```typescript
// claude host:
designDir: '~/.claude/skills/gstack/design/dist'
// codex host:
designDir: '$GSTACK_DESIGN'
```
Note: Codex runtime setup (setup script) must also export GSTACK_DESIGN env var, similar to how GSTACK_BROWSE is set.
$D resolution bash block (generated by {{DESIGN_SETUP}}):
```bash
_ROOT=$(git rev-parse --show-toplevel 2>/dev/null)
D=""
[ -n "$_ROOT" ] && [ -x "$_ROOT/.claude/skills/gstack/design/dist/design" ] && D="$_ROOT/.claude/skills/gstack/design/dist/design"
[ -z "$D" ] && D=~/.claude/skills/gstack/design/dist/design
if [ -x "$D" ]; then
  echo "DESIGN_READY: $D"
else
  echo "DESIGN_NOT_AVAILABLE"
fi
```
If DESIGN_NOT_AVAILABLE: skills fall back to HTML wireframe generation (existing DESIGN_SKETCH pattern). Design mockup is a progressive enhancement, not a hard requirement.
New functions in the existing resolver `scripts/resolvers/design.ts`:

- `generateDesignSetup()` for `{{DESIGN_SETUP}}` — mirrors the `generateBrowseSetup()` pattern
- `generateDesignMockup()` for `{{DESIGN_MOCKUP}}` — the full generate+check+present workflow

Skill integration, in priority order:

1. /office-hours — Replace the Visual Sketch section
2. /plan-design-review — "What better looks like"
3. /design-consultation — Design system preview
4. /design-review — Design intent comparison
| File | Purpose |
|---|---|
| `design/src/cli.ts` | Entry point, command dispatch |
| `design/src/commands.ts` | Command registry |
| `design/src/generate.ts` | GPT Image generation via Responses API |
| `design/src/iterate.ts` | Multi-turn iteration with session state |
| `design/src/variants.ts` | Generate N design variants |
| `design/src/check.ts` | Vision-based quality gate |
| `design/src/brief.ts` | Structured brief types + helpers |
| `design/src/session.ts` | Session state management |
| `design/src/compare.ts` | HTML comparison board generator |
| `design/test/design.test.ts` | Integration tests (mock OpenAI API) |
| (none — add to existing `scripts/resolvers/design.ts`) | `{{DESIGN_SETUP}}` + `{{DESIGN_MOCKUP}}` resolvers |
| File | Change |
|---|---|
| `scripts/resolvers/types.ts` | Add `designDir` to `HostPaths` |
| `scripts/resolvers/index.ts` | Register `DESIGN_SETUP` + `DESIGN_MOCKUP` resolvers |
| `package.json` | Add design build command |
| `setup` | Build design binary alongside browse + Codex/Kiro asset linking |
| `scripts/resolvers/preamble.ts` | Add `GSTACK_DESIGN` env var export for Codex host |
| `test/gen-skill-docs.test.ts` | Update `DESIGN_SKETCH` test suite for new resolvers |
| `office-hours/SKILL.md.tmpl` | Replace Visual Sketch section with `{{DESIGN_MOCKUP}}` |
| `plan-design-review/SKILL.md.tmpl` | Add `{{DESIGN_SETUP}}` + mockup generation for low-scoring dimensions |
| Code | Location | Used For |
|---|---|---|
| Browse CLI pattern | `browse/src/cli.ts` | Command dispatch architecture |
| `commands.ts` registry | `browse/src/commands.ts` | Single source of truth pattern |
| `generateBrowseSetup()` | `scripts/resolvers/browse.ts` | Template for `generateDesignSetup()` |
| `DESIGN_SKETCH` resolver | `scripts/resolvers/design.ts` | Template for `DESIGN_MOCKUP` resolver |
| `HostPaths` system | `scripts/resolvers/types.ts` | Multi-host path resolution |
| Build pipeline | `package.json` build script | `bun build --compile` pattern |
Generate: OpenAI Responses API with image_generation tool
```typescript
const response = await openai.responses.create({
  model: "gpt-4o",
  input: briefToPrompt(brief),
  tools: [{ type: "image_generation", size: "1536x1024", quality: "high" }],
});

// Extract the generated image from the response output items
const imageItem = response.output.find(item => item.type === "image_generation_call");
if (!imageItem?.result) throw new Error("No image in response output");
const base64Data = imageItem.result; // base64-encoded PNG
fs.writeFileSync(outputPath, Buffer.from(base64Data, "base64"));
```
Iterate: Same API with previous_response_id
```typescript
const response = await openai.responses.create({
  model: "gpt-4o",
  input: feedback,
  previous_response_id: session.lastResponseId,
  tools: [{ type: "image_generation" }],
});
```
NOTE: Multi-turn image iteration via previous_response_id is an assumption that needs prototype validation. The Responses API supports conversation threading, but whether it retains visual context of generated images for edit-style iteration is not confirmed in docs. Fallback: if multi-turn doesn't work, iterate falls back to re-generating with the original brief + accumulated feedback in a single prompt.
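The fallback path mentioned in the note could assemble its one-shot re-generation prompt like this (a sketch; `buildFallbackPrompt` is hypothetical):

```typescript
// Re-generate from scratch: original brief plus all feedback so far,
// folded into one prompt (no previous_response_id needed).
function buildFallbackPrompt(originalBrief: string, feedbackHistory: string[]): string {
  if (feedbackHistory.length === 0) return originalBrief;
  const revisions = feedbackHistory
    .map((f, i) => `Revision ${i + 1}: ${f}`)
    .join("\n");
  return `${originalBrief}\n\nApply every revision below, in order:\n${revisions}`;
}
```

Keeping the history ordered matters: later feedback ("actually, revert that") only makes sense if the model sees the revisions in sequence.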
Check: GPT-4o vision
```typescript
const check = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{
    role: "user",
    content: [
      { type: "image_url", image_url: { url: `data:image/png;base64,${imageData}` } },
      { type: "text", text: `Check this UI mockup. Brief: ${brief}. Is text readable? Are all elements present? Does it look like a real UI? Return PASS or FAIL with issues.` }
    ]
  }]
});
```
Cost: ~$0.10-$0.40 per design session (1 hero + 2 variants + 1 quality check + 1 iteration). Negligible next to the LLM costs already in each skill invocation.
Codex OAuth tokens DO NOT work for image generation. Tested 2026-03-26: both the Images API and Responses API reject ~/.codex/auth.json access_token with "Missing scopes: api.model.images.request". Codex CLI also has no native imagegen capability.
Auth resolution order:
1. `~/.gstack/openai.json` → `{ "api_key": "sk-..." }` (file permissions 0600)
2. `OPENAI_API_KEY` environment variable
3. If neither is present, prompt the user to run `$D setup`, which writes `~/.gstack/openai.json` with 0600 permissions

New command: `$D setup` — guided API key setup + smoke test. Can be run anytime to update the key.
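The resolution order might look like this in code; the config filename matches the plan, while the helper name and return convention are assumptions:

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Resolution order: ~/.gstack/openai.json first, then OPENAI_API_KEY.
// Returns null if neither is present (caller should suggest `$D setup`).
function resolveApiKey(homeDir = os.homedir()): string | null {
  const configPath = path.join(homeDir, ".gstack", "openai.json");
  if (fs.existsSync(configPath)) {
    const { api_key } = JSON.parse(fs.readFileSync(configPath, "utf8"));
    if (typeof api_key === "string" && api_key.length > 0) return api_key;
  }
  return process.env.OPENAI_API_KEY ?? null;
}
```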
Open risk: whether `previous_response_id` retains visual context is unproven (see API Details section).

Prototype validation plan: build Commit 1 (core generate + check), run 10 design briefs across different screen types, and evaluate output quality before proceeding to skill integration.
- `$D diff --before old.png --after new.png` generates a visual diff
- `$D evolve --screenshot current.png --brief "make it calmer"`
- `$D variants --brief "..." --viewports desktop,tablet,mobile`
- /office-hours on a UI idea produces actual PNG mockups alongside the design doc
- /plan-design-review shows "what better looks like" as a mockup, not prose

The design binary is compiled and distributed alongside the browse binary:
- `bun build --compile design/src/cli.ts --outfile design/dist/design`
- `./setup` and `bun run build`
- `~/.claude/skills/gstack/` install path
- `design/src/` with `cli.ts`, `commands.ts`, `generate.ts`, `check.ts`, `brief.ts`, `session.ts`, `compare.ts`
- `compare` command generates the HTML comparison board with per-variant feedback textareas
- `package.json` build command (separate `bun build --compile` from browse)
- setup script integration (including Codex + Kiro asset linking)
- `design/src/variants.ts`, `design/src/iterate.ts`
- `generateDesignSetup()` + `generateDesignMockup()` added to existing `scripts/resolvers/design.ts`
- `designDir` added to `HostPaths` in `scripts/resolvers/types.ts`
- `scripts/resolvers/index.ts`
- `scripts/resolvers/preamble.ts` (Codex host)
- `test/gen-skill-docs.test.ts` (`DESIGN_SKETCH` test suite)
- `{{DESIGN_MOCKUP}}`
- `{{DESIGN_SETUP}}` and mockup generation for low-scoring dimensions
- `$D diff` command: takes two PNGs, uses GPT-4o vision to identify differences, generates overlay
- `$D verify` command: screenshots live site via `$B`, diffs against approved mockup from `docs/designs/`
- `$D evolve` command: takes screenshot + brief, generates a "how it should look" mockup
- `--viewports` flag on `$D variants` for multi-size generation

Tell Variant to build an API. As their investor: "I'm building a workflow where AI agents generate visual designs programmatically. GPT Image API works today — but I'd rather use Variant because the multi-variation approach is better for design exploration. Ship an API endpoint: prompt in, React code + preview image out. I'll be your first integration partner."
- `bun run build` compiles the `design/dist/design` binary
- `$D generate --brief "Landing page for a developer tool" --output /tmp/test.png` produces a real PNG
- `$D check --image /tmp/test.png --brief "Landing page"` returns PASS/FAIL
- `$D variants --brief "..." --count 3 --output-dir /tmp/variants/` produces 3 PNGs
- /office-hours on a UI idea produces mockups inline
- `bun test` passes (skill validation, gen-skill-docs)
- `bun run test:evals` passes (E2E tests)

Doc survived 1 round of adversarial review. 11 issues caught and fixed. Quality score: 7/10 → estimated 8.5/10 after fixes.
Issues fixed:
| Review | Trigger | Why | Runs | Status | Findings |
|---|---|---|---|---|---|
| Office Hours | `/office-hours` | Design brainstorm | 1 | DONE | 4 premises, 1 revised (Codex: opt-in -> default-on) |
| CEO Review | `/plan-ceo-review` | Scope & strategy | 1 | CLEAR | EXPANSION: 6 proposed, 6 accepted, 0 deferred |
| Eng Review | `/plan-eng-review` | Architecture & tests (required) | 1 | CLEAR | 7 issues, 0 critical gaps, 4 outside voices |
| Design Review | `/plan-design-review` | UI/UX gaps | 1 | CLEAR | score: 2/10 -> 8/10, 5 decisions made |
| Outside Voice | structured + adversarial | Independent challenge | 4 | DONE | Sequential PNG -> HTML workflow, trust boundary noted |
CEO EXPANSIONS: Design Memory + Exploration Width, Mockup Diffing, Screenshot Evolution, Design Intent Verification, Responsive Variants, Design-to-Code Prompt.

DESIGN DECISIONS: Single-column full-width layout, per-card "More like this", explicit radio Pick, smooth fade regeneration, skeleton loading states.

UNRESOLVED: 0

VERDICT: CEO + ENG + DESIGN CLEARED. Ready to implement. Start with Commit 0 (prototype validation).