# design binary

Generated by /office-hours on 2026-03-26 · Branch: garrytan/agent-design-tools · Repo: gstack · Status: DRAFT · Mode: Intrapreneurship
gstack's design skills (/office-hours, /design-consultation, /plan-design-review, /design-review) all produce text descriptions of design — DESIGN.md files with hex codes, plan docs with pixel specs in prose, ASCII art wireframes. The creator is a designer who hand-designed HelloSign in OmniGraffle and finds this embarrassing.
The unit of value is wrong. Users don't need richer design language — they need an executable visual artifact that changes the conversation from "do you like this spec?" to "is this the screen?"
Design skills describe design in text instead of showing it. The Argus UX overhaul plan is the example: 487 lines of detailed emotional arc specs, typography choices, animation timing — zero visual artifacts. An AI coding agent that "designs" should produce something you can look at and react to viscerally.
The creator/primary user finds the current output embarrassing. Every design skill session ends with prose where a mockup should be. GPT Image API now generates pixel-perfect UI mockups with accurate text rendering — the capability gap that justified text-only output no longer exists.
A compiled TypeScript binary (design/dist/design) that wraps the OpenAI Images/Responses API, callable from skill templates via $D (mirroring the existing $B browse binary pattern). Priority integration order: /office-hours → /plan-design-review → /design-consultation → /design-review.
A `design` binary that any skill can call. Codex independently validated the core thesis: "The failure is not output quality within markdown; it is that the current unit of value is wrong." Key contributions:

- `visual_mockup.ts` utility
- /office-hours + /plan-design-review only
- hero mockup + 2 variants

## design Binary (Approach B)

Shares the browse binary's compilation and distribution pattern (`bun build --compile`, setup script, `$VARIABLE` resolution in skill templates) but is architecturally simpler — no persistent daemon server, no Chromium, no health checks, no token auth. The design binary is a stateless CLI that makes OpenAI API calls and writes PNGs to disk. Session state (for multi-turn iteration) is a JSON file.
New dependency: openai npm package (add to devDependencies, NOT runtime deps). Design binary compiled separately from browse so openai doesn't bloat the browse binary.
```
design/
├── src/
│   ├── cli.ts        # Entry point, command dispatch
│   ├── commands.ts   # Command registry (source of truth for docs + validation)
│   ├── generate.ts   # Generate mockups from structured brief
│   ├── iterate.ts    # Multi-turn iteration on existing mockups
│   ├── variants.ts   # Generate N design variants from brief
│   ├── check.ts      # Vision-based quality gate (GPT-4o)
│   ├── brief.ts      # Structured brief type + assembly helpers
│   └── session.ts    # Session state (response IDs for multi-turn)
├── dist/
│   ├── design        # Compiled binary
│   └── .version      # Git hash
└── test/
    └── design.test.ts  # Integration tests
```
```bash
# Generate a hero mockup from a structured brief
$D generate --brief "Dashboard for a coding assessment tool. Dark theme, cream accents. Shows: builder name, score badge, narrative letter, score cards. Target: technical users." --output /tmp/mockup-hero.png

# Generate 3 design variants
$D variants --brief "..." --count 3 --output-dir /tmp/mockups/

# Iterate on an existing mockup with feedback
$D iterate --session /tmp/design-session.json --feedback "Make the score cards larger, move the narrative above the scores" --output /tmp/mockup-v2.png

# Vision-based quality check (returns PASS/FAIL + issues)
$D check --image /tmp/mockup-hero.png --brief "Dashboard with builder name, score badge, narrative"

# One-shot with quality gate + auto-retry
$D generate --brief "..." --output /tmp/mockup.png --check --retry 1

# Pass a structured brief via JSON file
$D generate --brief-file /tmp/brief.json --output /tmp/mockup.png

# Generate comparison board HTML for user review
$D compare --images /tmp/mockups/variant-*.png --output /tmp/design-board.html

# Guided API key setup + smoke test
$D setup
```
Brief input modes:
- `--brief "plain text"` — free-form text prompt (simple mode)
- `--brief-file path.json` — structured JSON matching the `DesignBrief` interface (rich mode)

All commands are registered in `commands.ts`, including `--check` and `--retry` as flags on `generate`.
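The registry pattern borrowed from browse can be sketched in TypeScript; all names and fields here (`FlagSpec`, `CommandSpec`, `findCommand`) are illustrative assumptions, not the real `commands.ts`:

```typescript
// Hypothetical command registry: single source of truth for dispatch,
// validation, and generated docs (mirrors the browse binary's pattern).
interface FlagSpec {
  name: string;       // e.g. "--brief"
  required: boolean;
  takesValue: boolean;
  doc: string;
}

interface CommandSpec {
  name: string;
  doc: string;
  flags: FlagSpec[];
}

const COMMANDS: CommandSpec[] = [
  {
    name: "generate",
    doc: "Generate a hero mockup from a structured brief",
    flags: [
      { name: "--brief", required: false, takesValue: true, doc: "free-form text brief" },
      { name: "--brief-file", required: false, takesValue: true, doc: "structured JSON brief" },
      { name: "--output", required: true, takesValue: true, doc: "PNG output path" },
      { name: "--check", required: false, takesValue: false, doc: "run vision quality gate" },
      { name: "--retry", required: false, takesValue: true, doc: "retries on FAIL" },
    ],
  },
  { name: "check", doc: "Vision-based quality gate", flags: [] },
];

// Look up the command named by argv[0]; undefined means "unknown command".
function findCommand(argv: string[]): CommandSpec | undefined {
  return COMMANDS.find((c) => c.name === argv[0]);
}
```

Keeping flags in the registry (rather than parsed ad hoc per command) is what lets one table drive both validation and the generated docs.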
The workflow is sequential, not parallel. PNGs are for visual exploration (human-facing), HTML wireframes are for implementation (agent-facing):
1. $D variants --brief "..." --count 3 --output-dir /tmp/mockups/
→ Generates 2-5 PNG mockup variations
2. $D compare --images /tmp/mockups/*.png --output /tmp/design-board.html
→ Generates HTML comparison board (spec below)
3. $B goto file:///tmp/design-board.html
→ User reviews all variants in headed Chrome
4. User picks favorite, rates, comments, clicks [Submit]
Agent polls: $B eval document.getElementById('status').textContent
Agent reads: $B eval document.getElementById('feedback-result').textContent
→ No clipboard, no pasting. Agent reads feedback directly from the page.
5. Claude generates HTML wireframe via DESIGN_SKETCH matching approved direction
→ Agent implements from the inspectable HTML, not the opaque PNG
Classifier: APP UI (task-focused, utility page). No product branding.
Layout: Single column, full-width mockups. Each variant gets the full viewport width for maximum image fidelity. Users scroll vertically through variants.
```
┌─────────────────────────────────────────────────────────────┐
│ HEADER BAR                                                  │
│ "Design Exploration" . project name . "3 variants"          │
│ Mode indicator: [Wide exploration] | [Matching DESIGN.md]   │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│ ┌───────────────────────────────────────────────────────┐   │
│ │ VARIANT A (full width)                                │   │
│ │ [ mockup PNG, max-width: 1200px ]                     │   │
│ ├───────────────────────────────────────────────────────┤   │
│ │ (●) Pick  ★★★★☆  [What do you like/dislike?____]      │   │
│ │ [More like this]                                      │   │
│ └───────────────────────────────────────────────────────┘   │
│                                                             │
│ ┌───────────────────────────────────────────────────────┐   │
│ │ VARIANT B (full width)                                │   │
│ │ [ mockup PNG, max-width: 1200px ]                     │   │
│ ├───────────────────────────────────────────────────────┤   │
│ │ ( ) Pick  ★★★☆☆  [What do you like/dislike?____]      │   │
│ │ [More like this]                                      │   │
│ └───────────────────────────────────────────────────────┘   │
│                                                             │
│ ... (scroll for more variants)                              │
│                                                             │
│ ─── separator ─────────────────────────────────────────     │
│ Overall direction (optional, collapsed by default)          │
│ [textarea, 3 lines, expand on focus]                        │
│                                                             │
│ ─── REGENERATE BAR (#f7f7f7 bg) ───────────────────────     │
│ "Want to explore more?"                                     │
│ [Totally different] [Match my design] [Custom: ______]      │
│ [Regenerate ->]                                             │
│ ─────────────────────────────────────────────────────────   │
│ [ ✓ Submit ]                                                │
└─────────────────────────────────────────────────────────────┘
```
Visual spec:
Interaction states:
Feedback JSON structure (written to hidden #feedback-result element):
```json
{
  "preferred": "A",
  "ratings": { "A": 4, "B": 3, "C": 2 },
  "comments": {
    "A": "Love the spacing, header feels right",
    "B": "Too busy, but good color palette",
    "C": "Wrong mood entirely"
  },
  "overall": "Go with A, make the CTA bigger",
  "regenerated": false
}
```
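A sketch of how the agent side might validate this feedback payload after reading it out of `#feedback-result` (the `parseFeedback` helper is hypothetical):

```typescript
interface DesignFeedback {
  preferred: string;
  ratings: Record<string, number>;
  comments: Record<string, string>;
  overall: string;
  regenerated: boolean;
}

// Parse and minimally validate the JSON the agent reads via
// $B eval document.getElementById('feedback-result').textContent
function parseFeedback(raw: string): DesignFeedback {
  const data = JSON.parse(raw) as DesignFeedback;
  if (typeof data.preferred !== "string" || !(data.preferred in data.ratings)) {
    throw new Error("feedback missing a valid preferred variant");
  }
  for (const [variant, stars] of Object.entries(data.ratings)) {
    if (stars < 1 || stars > 5) throw new Error(`rating out of range for ${variant}`);
  }
  return data;
}
```

Validating here keeps a half-filled or malformed board submission from silently steering the next iteration.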
Accessibility: Star ratings keyboard navigable (arrow keys). Textareas labeled ("Feedback for Variant A"). Submit/Regenerate keyboard accessible with visible focus ring. All text #333+ on white.
Responsive: >1200px: comfortable margins. 768-1200px: tighter margins. <768px: full-width, no horizontal scroll.
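A minimal sketch of what the `compare` command's board generator could emit; the `status` and `feedback-result` element ids come from the workflow above, while the function name and markup details are illustrative:

```typescript
// Generate a bare-bones comparison board. The real board adds star
// ratings, the overall-direction textarea, and the regenerate bar.
function renderBoard(projectName: string, images: string[]): string {
  const cards = images
    .map((img, i) => {
      const label = String.fromCharCode(65 + i); // A, B, C...
      return `<section>
  <h2>Variant ${label}</h2>
  <img src="file://${img}" style="max-width:1200px;width:100%" alt="Variant ${label} mockup">
  <label>Pick <input type="radio" name="pick" value="${label}"></label>
  <textarea aria-label="Feedback for Variant ${label}"></textarea>
</section>`;
    })
    .join("\n");
  return `<!doctype html>
<html><head><title>Design Exploration · ${projectName}</title></head>
<body>
<header>Design Exploration · ${projectName} · ${images.length} variants</header>
${cards}
<button id="submit">Submit</button>
<div id="status">pending</div>
<pre id="feedback-result" hidden></pre>
</body></html>`;
}
```

The hidden `#feedback-result` element is the agent-facing channel: the Submit handler serializes the feedback JSON into it and flips `#status`, which the agent polls via `$B eval`.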
Screenshot consent (first-time only for $D evolve): "This will send a screenshot of your live site to OpenAI for design evolution. [Proceed] [Don't ask again]" Stored in ~/.gstack/config.yaml as design_screenshot_consent.
Why sequential: Codex adversarial review identified that raster PNGs are opaque to agents (no DOM, no states, no diffable structure). HTML wireframes preserve a bridge back to code. The PNG is for the human to say "yes, that's right." The HTML is for the agent to say "I know how to build this."
1. Stateless CLI, not daemon
Browse needs a persistent Chromium instance. Design is just API calls — no reason for a server. Session state for multi-turn iteration is a JSON file written to /tmp/design-session-{id}.json containing previous_response_id.
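A sketch of that session-state file, under the assumption that it holds the `previous_response_id` plus accumulated feedback for the fallback path (the `SessionState` shape and helper names are made up):

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

interface SessionState {
  id: string;
  lastResponseId: string | null; // previous_response_id for the Responses API
  briefText: string;
  feedbackHistory: string[];     // accumulated for the non-multi-turn fallback
}

function sessionPath(id: string): string {
  return path.join(os.tmpdir(), `design-session-${id}.json`);
}

// `generate` creates the session file and prints its path.
function newSession(briefText: string): SessionState {
  const state: SessionState = {
    id: `${process.pid}-${Date.now()}`,
    lastResponseId: null,
    briefText,
    feedbackHistory: [],
  };
  fs.writeFileSync(sessionPath(state.id), JSON.stringify(state, null, 2));
  return state;
}

// `iterate` reads it back via --session.
function loadSession(id: string): SessionState {
  return JSON.parse(fs.readFileSync(sessionPath(id), "utf8"));
}
```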
Session IDs are `${PID}-${timestamp}`, passed via the `--session` flag. The `generate` command creates the session file and prints its path; `iterate` reads it via `--session`.

2. Structured brief input

The brief is the interface between skill prose and image generation. Skills construct it from design context:
```typescript
interface DesignBrief {
  goal: string;          // "Dashboard for coding assessment tool"
  audience: string;      // "Technical users, YC partners"
  style: string;         // "Dark theme, cream accents, minimal"
  elements: string[];    // ["builder name", "score badge", "narrative letter"]
  constraints?: string;  // "Max width 1024px, mobile-first"
  reference?: string;    // Path to existing screenshot or DESIGN.md excerpt
  screenType: string;    // "desktop-dashboard" | "mobile-app" | "landing-page" | etc.
}
```
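The `briefToPrompt` helper referenced in the API sketches could flatten this interface into a single prompt string; a hedged sketch (the prompt wording is illustrative, not a tuned template):

```typescript
interface DesignBrief {
  goal: string;
  audience: string;
  style: string;
  elements: string[];
  constraints?: string;
  reference?: string;   // local path only; never uploaded
  screenType: string;
}

// Flatten the structured brief into one image-generation prompt.
function briefToPrompt(brief: DesignBrief): string {
  const lines = [
    `High-fidelity UI mockup: ${brief.goal}.`,
    `Screen type: ${brief.screenType}. Audience: ${brief.audience}.`,
    `Style: ${brief.style}.`,
    `Must include: ${brief.elements.join(", ")}.`,
  ];
  if (brief.constraints) lines.push(`Constraints: ${brief.constraints}.`);
  return lines.join(" ");
}
```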
3. Default-on in design skills

Skills generate mockups by default. The template includes skip language:
Generating visual mockup of the proposed design... (say "skip" if you don't need visuals)
4. Vision quality gate

After generating, optionally pass the image through GPT-4o vision to check:
5. Output location: explorations in /tmp, approved finals in docs/designs/
- Explorations: `/tmp/gstack-mockups-{session}/` (ephemeral, not committed)
- Approved finals: `docs/designs/` (checked in)
- Output directory overridable via the `design_output_dir` setting
- Filename pattern: `{skill}-{description}-{timestamp}.png`
- Creates `docs/designs/` if it doesn't exist (`mkdir -p`)
- Fallback path: `/tmp/gstack-mockup-{timestamp}.png`

6. Trust boundary acknowledgment

Default-on generation sends design brief text to OpenAI. This is a new external data flow vs. the existing HTML wireframe path, which is entirely local. The brief contains only abstract design descriptions (goal, style, elements), never source code or user data. Screenshots from $B are NOT sent to OpenAI (the `reference` field in `DesignBrief` is a local file path used by the agent, not uploaded to the API). Document this in CLAUDE.md.
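The exploration-vs-approved routing in point 5 could look like this (the helper name and the `approved` flag are assumptions):

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Explorations go to /tmp (ephemeral); approved finals to docs/designs/
// (checked in). Both directories are created on demand (mkdir -p).
function resolveOutputPath(opts: {
  skill: string;
  description: string;
  sessionId: string;
  approved: boolean;
  repoRoot: string;
}): string {
  const stamp = Date.now();
  const filename = `${opts.skill}-${opts.description}-${stamp}.png`;
  if (opts.approved) {
    const dir = path.join(opts.repoRoot, "docs", "designs");
    fs.mkdirSync(dir, { recursive: true });
    return path.join(dir, filename);
  }
  const dir = path.join("/tmp", `gstack-mockups-${opts.sessionId}`);
  fs.mkdirSync(dir, { recursive: true });
  return path.join(dir, filename);
}
```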
7. Rate limit mitigation

Variant generation uses staggered parallelism: start each API call 1 second apart and collect the results via `Promise.allSettled()`. This avoids the 5-7 RPM rate limit on image generation while still being faster than fully serial. If any call returns a 429, retry with exponential backoff (2s, 4s, 8s).
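That staggering-plus-backoff policy can be sketched as follows; `makeCall` stands in for the real image-generation API call, and the 429 detection is illustrative:

```typescript
const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

// Start call i after i * staggerMs; on a 429, retry with exponential
// backoff (2s, 4s, 8s by default). Results come back via allSettled so
// one failed variant doesn't sink the others.
async function staggeredVariants<T>(
  count: number,
  makeCall: (i: number) => Promise<T>,
  staggerMs = 1000,
  backoffsMs = [2000, 4000, 8000],
): Promise<PromiseSettledResult<T>[]> {
  const one = async (i: number): Promise<T> => {
    await sleep(i * staggerMs);
    for (let attempt = 0; ; attempt++) {
      try {
        return await makeCall(i);
      } catch (err) {
        const is429 = err instanceof Error && err.message.includes("429");
        if (!is429 || attempt >= backoffsMs.length) throw err;
        await sleep(backoffsMs[attempt]);
      }
    }
  };
  return Promise.allSettled(Array.from({ length: count }, (_, i) => one(i)));
}
```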
Add to existing resolver: scripts/resolvers/design.ts (NOT a new file)
- `generateDesignSetup()` for the `{{DESIGN_SETUP}}` placeholder (mirrors `generateBrowseSetup()`)
- `generateDesignMockup()` for the `{{DESIGN_MOCKUP}}` placeholder (full exploration workflow)

New `HostPaths` entry in `types.ts`:
```typescript
// claude host:
designDir: '~/.claude/skills/gstack/design/dist'
// codex host:
designDir: '$GSTACK_DESIGN'
```
Note: Codex runtime setup (setup script) must also export GSTACK_DESIGN env var, similar to how GSTACK_BROWSE is set.
$D resolution bash block (generated by {{DESIGN_SETUP}}):
```bash
_ROOT=$(git rev-parse --show-toplevel 2>/dev/null)
D=""
[ -n "$_ROOT" ] && [ -x "$_ROOT/.claude/skills/gstack/design/dist/design" ] && D="$_ROOT/.claude/skills/gstack/design/dist/design"
[ -z "$D" ] && D=~/.claude/skills/gstack/design/dist/design
if [ -x "$D" ]; then
  echo "DESIGN_READY: $D"
else
  echo "DESIGN_NOT_AVAILABLE"
fi
```
If DESIGN_NOT_AVAILABLE: skills fall back to HTML wireframe generation (existing DESIGN_SKETCH pattern). Design mockup is a progressive enhancement, not a hard requirement.
New functions in the existing resolver `scripts/resolvers/design.ts`:

- `generateDesignSetup()` for `{{DESIGN_SETUP}}` — mirrors the `generateBrowseSetup()` pattern
- `generateDesignMockup()` for `{{DESIGN_MOCKUP}}` — the full generate+check+present workflow

Skill integration, in priority order:

1. /office-hours — Replace the Visual Sketch section
2. /plan-design-review — "What better looks like"
3. /design-consultation — Design system preview
4. /design-review — Design intent comparison
| File | Purpose |
|---|---|
| `design/src/cli.ts` | Entry point, command dispatch |
| `design/src/commands.ts` | Command registry |
| `design/src/generate.ts` | GPT Image generation via Responses API |
| `design/src/iterate.ts` | Multi-turn iteration with session state |
| `design/src/variants.ts` | Generate N design variants |
| `design/src/check.ts` | Vision-based quality gate |
| `design/src/brief.ts` | Structured brief types + helpers |
| `design/src/session.ts` | Session state management |
| `design/src/compare.ts` | HTML comparison board generator |
| `design/test/design.test.ts` | Integration tests (mock OpenAI API) |
| (none — add to existing `scripts/resolvers/design.ts`) | `{{DESIGN_SETUP}}` + `{{DESIGN_MOCKUP}}` resolvers |
| File | Change |
|---|---|
| `scripts/resolvers/types.ts` | Add `designDir` to `HostPaths` |
| `scripts/resolvers/index.ts` | Register `DESIGN_SETUP` + `DESIGN_MOCKUP` resolvers |
| `package.json` | Add design build command |
| `setup` | Build design binary alongside browse + Codex/Kiro asset linking |
| `scripts/resolvers/preamble.ts` | Add `GSTACK_DESIGN` env var export for Codex host |
| `test/gen-skill-docs.test.ts` | Update `DESIGN_SKETCH` test suite for new resolvers |
| `office-hours/SKILL.md.tmpl` | Replace Visual Sketch section with `{{DESIGN_MOCKUP}}` |
| `plan-design-review/SKILL.md.tmpl` | Add `{{DESIGN_SETUP}}` + mockup generation for low-scoring dimensions |
| Code | Location | Used For |
|---|---|---|
| Browse CLI pattern | `browse/src/cli.ts` | Command dispatch architecture |
| `commands.ts` registry | `browse/src/commands.ts` | Single source of truth pattern |
| `generateBrowseSetup()` | `scripts/resolvers/browse.ts` | Template for `generateDesignSetup()` |
| `DESIGN_SKETCH` resolver | `scripts/resolvers/design.ts` | Template for `DESIGN_MOCKUP` resolver |
| `HostPaths` system | `scripts/resolvers/types.ts` | Multi-host path resolution |
| Build pipeline | `package.json` build script | `bun build --compile` pattern |
Generate: OpenAI Responses API with image_generation tool
```typescript
const response = await openai.responses.create({
  model: "gpt-4o",
  input: briefToPrompt(brief),
  tools: [{ type: "image_generation", size: "1536x1024", quality: "high" }],
});

// Extract the generated image from the response output items
const imageItem = response.output.find(item => item.type === "image_generation_call");
if (!imageItem?.result) throw new Error("No image in response output");
const base64Data = imageItem.result; // base64-encoded PNG
fs.writeFileSync(outputPath, Buffer.from(base64Data, "base64"));
```
Iterate: Same API with previous_response_id
```typescript
const response = await openai.responses.create({
  model: "gpt-4o",
  input: feedback,
  previous_response_id: session.lastResponseId,
  tools: [{ type: "image_generation" }],
});
```
NOTE: Multi-turn image iteration via previous_response_id is an assumption that needs prototype validation. The Responses API supports conversation threading, but whether it retains visual context of generated images for edit-style iteration is not confirmed in docs. Fallback: if multi-turn doesn't work, iterate falls back to re-generating with the original brief + accumulated feedback in a single prompt.
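The fallback path mentioned in the note could assemble its one-shot re-generation prompt like this (a sketch; `buildFallbackPrompt` is hypothetical):

```typescript
// Re-generate from scratch: original brief plus all feedback so far,
// folded into one prompt (no previous_response_id needed).
function buildFallbackPrompt(originalBrief: string, feedbackHistory: string[]): string {
  if (feedbackHistory.length === 0) return originalBrief;
  const revisions = feedbackHistory
    .map((f, i) => `Revision ${i + 1}: ${f}`)
    .join("\n");
  return `${originalBrief}\n\nApply every revision below, in order:\n${revisions}`;
}
```

Keeping the history ordered matters: later feedback ("actually, revert that") only makes sense if the model sees the revisions in sequence.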
Check: GPT-4o vision
```typescript
const check = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{
    role: "user",
    content: [
      { type: "image_url", image_url: { url: `data:image/png;base64,${imageData}` } },
      { type: "text", text: `Check this UI mockup. Brief: ${brief}. Is text readable? Are all elements present? Does it look like a real UI? Return PASS or FAIL with issues.` }
    ]
  }]
});
```
Cost: ~$0.10-$0.40 per design session (1 hero + 2 variants + 1 quality check + 1 iteration). Negligible next to the LLM costs already in each skill invocation.
Codex OAuth tokens DO NOT work for image generation. Tested 2026-03-26: both the Images API and Responses API reject ~/.codex/auth.json access_token with "Missing scopes: api.model.images.request". Codex CLI also has no native imagegen capability.
Auth resolution order:
1. `~/.gstack/openai.json` → `{ "api_key": "sk-..." }` (file permissions 0600)
2. `OPENAI_API_KEY` environment variable
3. If neither is present, prompt the user to run `$D setup`, which writes `~/.gstack/openai.json` with 0600 permissions

New command: `$D setup` — guided API key setup + smoke test. Can be run anytime to update the key.
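The resolution order might look like this in code; the config filename matches the plan, while the helper name and return convention are assumptions:

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Resolution order: ~/.gstack/openai.json first, then OPENAI_API_KEY.
// Returns null if neither is present (caller should suggest `$D setup`).
function resolveApiKey(homeDir = os.homedir()): string | null {
  const configPath = path.join(homeDir, ".gstack", "openai.json");
  if (fs.existsSync(configPath)) {
    const { api_key } = JSON.parse(fs.readFileSync(configPath, "utf8"));
    if (typeof api_key === "string" && api_key.length > 0) return api_key;
  }
  return process.env.OPENAI_API_KEY ?? null;
}
```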
Open risk: whether `previous_response_id` retains visual context is unproven (see API Details section).

Prototype validation plan: build Commit 1 (core generate + check), run 10 design briefs across different screen types, and evaluate output quality before proceeding to skill integration.
- `$D diff --before old.png --after new.png` generates a visual diff
- `$D evolve --screenshot current.png --brief "make it calmer"`
- `$D variants --brief "..." --viewports desktop,tablet,mobile`
- /office-hours on a UI idea produces actual PNG mockups alongside the design doc
- /plan-design-review shows "what better looks like" as a mockup, not prose

The design binary is compiled and distributed alongside the browse binary:
- `bun build --compile design/src/cli.ts --outfile design/dist/design`
- `./setup` and `bun run build`
- `~/.claude/skills/gstack/` install path
- `design/src/` with `cli.ts`, `commands.ts`, `generate.ts`, `check.ts`, `brief.ts`, `session.ts`, `compare.ts`
- `compare` command generates the HTML comparison board with per-variant feedback textareas
- `package.json` build command (separate `bun build --compile` from browse)
- setup script integration (including Codex + Kiro asset linking)
- `design/src/variants.ts`, `design/src/iterate.ts`
- `generateDesignSetup()` + `generateDesignMockup()` added to existing `scripts/resolvers/design.ts`
- `designDir` added to `HostPaths` in `scripts/resolvers/types.ts`
- `scripts/resolvers/index.ts`
- `scripts/resolvers/preamble.ts` (Codex host)
- `test/gen-skill-docs.test.ts` (`DESIGN_SKETCH` test suite)
- `{{DESIGN_MOCKUP}}`
- `{{DESIGN_SETUP}}` and mockup generation for low-scoring dimensions
- `$D diff` command: takes two PNGs, uses GPT-4o vision to identify differences, generates overlay
- `$D verify` command: screenshots live site via `$B`, diffs against approved mockup from `docs/designs/`
- `$D evolve` command: takes screenshot + brief, generates a "how it should look" mockup
- `--viewports` flag on `$D variants` for multi-size generation

Tell Variant to build an API. As their investor: "I'm building a workflow where AI agents generate visual designs programmatically. GPT Image API works today — but I'd rather use Variant because the multi-variation approach is better for design exploration. Ship an API endpoint: prompt in, React code + preview image out. I'll be your first integration partner."
- `bun run build` compiles the `design/dist/design` binary
- `$D generate --brief "Landing page for a developer tool" --output /tmp/test.png` produces a real PNG
- `$D check --image /tmp/test.png --brief "Landing page"` returns PASS/FAIL
- `$D variants --brief "..." --count 3 --output-dir /tmp/variants/` produces 3 PNGs
- /office-hours on a UI idea produces mockups inline
- `bun test` passes (skill validation, gen-skill-docs)
- `bun run test:evals` passes (E2E tests)

Doc survived 1 round of adversarial review. 11 issues caught and fixed. Quality score: 7/10 → estimated 8.5/10 after fixes.
Issues fixed:
| Review | Trigger | Why | Runs | Status | Findings |
|---|---|---|---|---|---|
| Office Hours | `/office-hours` | Design brainstorm | 1 | DONE | 4 premises, 1 revised (Codex: opt-in -> default-on) |
| CEO Review | `/plan-ceo-review` | Scope & strategy | 1 | CLEAR | EXPANSION: 6 proposed, 6 accepted, 0 deferred |
| Eng Review | `/plan-eng-review` | Architecture & tests (required) | 1 | CLEAR | 7 issues, 0 critical gaps, 4 outside voices |
| Design Review | `/plan-design-review` | UI/UX gaps | 1 | CLEAR | score: 2/10 -> 8/10, 5 decisions made |
| Outside Voice | structured + adversarial | Independent challenge | 4 | DONE | Sequential PNG -> HTML workflow, trust boundary noted |
CEO EXPANSIONS: Design Memory + Exploration Width, Mockup Diffing, Screenshot Evolution, Design Intent Verification, Responsive Variants, Design-to-Code Prompt.

DESIGN DECISIONS: Single-column full-width layout, per-card "More like this", explicit radio Pick, smooth fade regeneration, skeleton loading states.

UNRESOLVED: 0

VERDICT: CEO + ENG + DESIGN CLEARED. Ready to implement. Start with Commit 0 (prototype validation).