M .agents/skills/gstack-office-hours/SKILL.md => .agents/skills/gstack-office-hours/SKILL.md +146 -1
@@ 218,6 218,25 @@ success/error/abort, and `USED_BROWSE` with true/false based on whether `$B` was
If you cannot determine the outcome, use "unknown". This runs in the background and
never blocks the user.
+## SETUP (run this check BEFORE any browse command)
+
+```bash
+_ROOT=$(git rev-parse --show-toplevel 2>/dev/null)
+B=""
+[ -n "$_ROOT" ] && [ -x "$_ROOT/.agents/skills/gstack/browse/dist/browse" ] && B="$_ROOT/.agents/skills/gstack/browse/dist/browse"
+[ -z "$B" ] && B=~/.codex/skills/gstack/browse/dist/browse
+if [ -x "$B" ]; then
+ echo "READY: $B"
+else
+ echo "NEEDS_SETUP"
+fi
+```
+
+If `NEEDS_SETUP`:
+1. Tell the user: "gstack browse needs a one-time build (~10 seconds). OK to proceed?" Then STOP and wait.
+2. Run: `cd <SKILL_DIR> && ./setup`
+3. If `./setup` fails because `bun` is not installed, install it first (`curl -fsSL https://bun.sh/install | bash`), then re-run `./setup`.
+
# YC Office Hours
You are a **YC office hours partner**. Your job is to ensure the problem is understood before solutions are proposed. You adapt to what the user is building — startup founders get the hard questions, builders get an enthusiastic collaborator. This skill produces design docs, not code.
@@ 482,6 501,66 @@ Present via AskUserQuestion. Do NOT proceed without user approval of the approac
---
+## Visual Sketch (UI ideas only)
+
+If the chosen approach involves user-facing UI (screens, pages, forms, dashboards,
+or interactive elements), generate a rough wireframe to help the user visualize it.
+If the idea is backend-only, infrastructure, or has no UI component — skip this
+section silently.
+
+**Step 1: Gather design context**
+
+1. Check if `DESIGN.md` exists in the repo root. If it does, read it for design
+ system constraints (colors, typography, spacing, component patterns). Use these
+ constraints in the wireframe.
+2. Apply core design principles:
+ - **Information hierarchy** — what does the user see first, second, third?
+ - **Interaction states** — loading, empty, error, success, partial
+ - **Edge case paranoia** — what if the name is 47 chars? Zero results? Network fails?
+ - **Subtraction default** — "as little design as possible" (Rams). Every element earns its pixels.
+ - **Design for trust** — every interface element builds or erodes user trust.
+
+**Step 2: Generate wireframe HTML**
+
+Generate a single-page HTML file with these constraints:
+- **Intentionally rough aesthetic** — use system fonts, thin gray borders, no color,
+ hand-drawn-style elements. This is a sketch, not a polished mockup.
+- Self-contained — no external dependencies, no CDN links, inline CSS only
+- Show the core interaction flow (1-3 screens/states max)
+- Include realistic placeholder content (not "Lorem ipsum" — use content that
+ matches the actual use case)
+- Add HTML comments explaining design decisions
+
+Write to a temp file:
+```bash
+SKETCH_FILE="/tmp/gstack-sketch-$(date +%s).html"
+```
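For illustration, here is what generating the sketch file might look like for a hypothetical login-form idea — the content, styling, and use case are assumptions, not requirements; the real wireframe should reflect the user's actual approach:

```shell
# Hypothetical example: write a single rough login screen that follows
# the constraints above (system fonts, thin gray borders, no color,
# realistic placeholder content, comments explaining decisions).
SKETCH_FILE="/tmp/gstack-sketch-$(date +%s).html"
cat > "$SKETCH_FILE" <<'HTML'
<!DOCTYPE html>
<html>
<head><style>
  body  { font-family: system-ui, sans-serif; color: #444; }
  .frame { border: 1px solid #bbb; max-width: 360px; padding: 16px; }
  input, button { border: 1px solid #bbb; width: 100%; padding: 8px;
                  margin-top: 8px; background: #fff; }
</style></head>
<body>
  <!-- Hierarchy: product name first, then the one input, then the one action -->
  <div class="frame">
    <h2>Sign in to Acme Dashboards</h2>
    <input placeholder="ada.lovelace@example.com">
    <button>Send magic link</button>
    <!-- Empty/error states would be sketched as a second .frame below -->
  </div>
</body>
</html>
HTML
echo "wrote $SKETCH_FILE"
```

Note the deliberately flat styling — the rough look signals "this layout is up for debate," which invites better feedback than a polished mockup would.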
+
+**Step 3: Render and capture**
+
+If `$B` is not set (browse binary not built), skip the render step and tell the
+user: "Visual sketch requires the browse binary. Run the setup script to enable it."
+Otherwise render the sketch and capture a screenshot:
+
+```bash
+$B goto "file://$SKETCH_FILE"
+$B screenshot /tmp/gstack-sketch.png
+```
+
+**Step 4: Present and iterate**
+
+Show the screenshot to the user. Ask: "Does this feel right? Want to iterate on the layout?"
+
+If they want changes, regenerate the HTML with their feedback and re-render.
+If they approve or say "good enough," proceed.
+
+**Step 5: Include in design doc**
+
+Reference the wireframe screenshot in the design doc's "Recommended Approach" section.
+The screenshot file at `/tmp/gstack-sketch.png` can be referenced by downstream skills
+(`/plan-design-review`, `/design-review`) to see what was originally envisioned.
+
+---
+
## Phase 4.5: Founder Signal Synthesis
Before writing the design doc, synthesize the founder signals you observed during the session. These will appear in the design doc ("What I noticed") and in the closing conversation (Phase 6).
@@ 618,7 697,73 @@ Supersedes: {prior filename — omit this line if first design on this branch}
{observational, mentor-like reflections referencing specific things the user said during the session. Quote their words back to them — don't characterize their behavior. 2-4 bullets.}
```
-Present the design doc to the user via AskUserQuestion:
+---
+
+## Spec Review Loop
+
+Before presenting the document to the user for approval, run an adversarial review.
+
+**Step 1: Dispatch reviewer subagent**
+
+Use the Agent tool to dispatch an independent reviewer. The reviewer has fresh context
+and cannot see the brainstorming conversation — only the document. This ensures genuine
+adversarial independence.
+
+Prompt the subagent with:
+- The file path of the document just written
+- "Read this document and review it on 5 dimensions. For each dimension, note PASS or
+ list specific issues with suggested fixes. At the end, output a quality score (1-10)
+ across all dimensions."
+
+**Dimensions:**
+1. **Completeness** — Are all requirements addressed? Missing edge cases?
+2. **Consistency** — Do parts of the document agree with each other? Contradictions?
+3. **Clarity** — Could an engineer implement this without asking questions? Ambiguous language?
+4. **Scope** — Does the document creep beyond the original problem? YAGNI violations?
+5. **Feasibility** — Can this actually be built with the stated approach? Hidden complexity?
+
+The subagent should return:
+- A quality score (1-10)
+- PASS if no issues, or a numbered list of issues with dimension, description, and fix
+
+**Step 2: Fix and re-dispatch**
+
+If the reviewer returns issues:
+1. Fix each issue in the document on disk (use Edit tool)
+2. Re-dispatch the reviewer subagent with the updated document
+3. Maximum 3 iterations total
+
+**Convergence guard:** If the reviewer returns the same issues on consecutive iterations
+(the fix didn't resolve them or the reviewer disagrees with the fix), stop the loop
+and persist those issues as "Reviewer Concerns" in the document rather than looping
+further.
+
+If the subagent fails, times out, or is unavailable — skip the review loop entirely.
+Tell the user: "Spec review unavailable — presenting unreviewed doc." The document is
+already written to disk; the review is a quality bonus, not a gate.
+
+**Step 3: Report and persist metrics**
+
+After the loop completes (PASS, max iterations, or convergence guard):
+
+1. Tell the user the result — summary by default:
+ "Your doc survived N rounds of adversarial review. M issues caught and fixed.
+ Quality score: X/10."
+ If they ask "what did the reviewer find?", show the full reviewer output.
+
+2. If issues remain after max iterations or convergence, add a "## Reviewer Concerns"
+ section to the document listing each unresolved issue. Downstream skills will see this.
+
+3. Append metrics:
+```bash
+mkdir -p ~/.gstack/analytics
+echo '{"skill":"office-hours","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","iterations":ITERATIONS,"issues_found":FOUND,"issues_fixed":FIXED,"remaining":REMAINING,"quality_score":SCORE}' >> ~/.gstack/analytics/spec-review.jsonl 2>/dev/null || true
+```
+Replace ITERATIONS, FOUND, FIXED, REMAINING, SCORE with actual values from the review.
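The substituted line should remain valid JSONL. A quick sanity check of the pattern — writing one sample line with illustrative values into a temp file and confirming it parses (assumes `python3` is on PATH):

```shell
# Illustrative values only; a real entry uses the counts from the review.
LOG=$(mktemp)
echo '{"skill":"office-hours","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","iterations":2,"issues_found":3,"issues_fixed":3,"remaining":0,"quality_score":8}' >> "$LOG"
# json.tool exits non-zero if the line is not valid JSON
python3 -m json.tool < "$LOG" > /dev/null && echo "VALID"
```

The single-quote/command-substitution pattern keeps the JSON braces literal while interpolating the timestamp, which is why the placeholders must be replaced with bare numbers, not quoted strings.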
+
+---
+
+Present the reviewed design doc to the user via AskUserQuestion:
- A) Approve — mark Status: APPROVED and proceed to handoff
- B) Revise — specify which sections need changes (loop back to revise those sections)
- C) Start over — return to Phase 2
M .agents/skills/gstack-plan-ceo-review/SKILL.md => .agents/skills/gstack-plan-ceo-review/SKILL.md +95 -0
@@ 324,6 324,37 @@ DESIGN=$(ls -t ~/.gstack/projects/$SLUG/*-$BRANCH-design-*.md 2>/dev/null | head
```
If a design doc exists (from `/office-hours`), read it. Use it as the source of truth for the problem statement, constraints, and chosen approach. If it has a `Supersedes:` field, note that this is a revised design.
+## Prerequisite Skill Offer
+
+When the design doc check above prints "No design doc found," offer the prerequisite
+skill before proceeding.
+
+Say to the user via AskUserQuestion:
+
+> "No design doc found for this branch. `/office-hours` produces a structured problem
+> statement, premise challenge, and explored alternatives — it gives this review much
+> sharper input to work with. Takes about 10 minutes. The design doc is per-feature,
+> not per-product — it captures the thinking behind this specific change."
+
+Options:
+- A) Run /office-hours first (in another window, then come back)
+- B) Skip — proceed with standard review
+
+If they skip: "No worries — standard review. If you ever want sharper input, try
+/office-hours first next time." Then proceed normally. Do not re-offer later in the session.
+
+**Mid-session detection:** During Step 0A (Premise Challenge), if the user can't
+articulate the problem, keeps changing the problem statement, answers with "I'm not
+sure," or is clearly exploring rather than reviewing — offer `/office-hours`:
+
+> "It sounds like you're still figuring out what to build — that's totally fine, but
+> that's what /office-hours is designed for. Want to pause this review and run
+> /office-hours first? It'll help you nail down the problem and approach, then come
+> back here for the strategic review."
+
+Options: A) Yes, run /office-hours first. B) No, keep going.
+If they keep going, proceed normally — no guilt, no re-asking.
+
When reading TODOS.md, specifically:
* Note any TODOs this plan touches, blocks, or unlocks
* Check if deferred work from prior reviews relates to this plan
@@ 467,6 498,70 @@ Repo: {owner/repo}
Derive the feature slug from the plan being reviewed (e.g., "user-dashboard", "auth-refactor"). Use the date in YYYY-MM-DD format.
+After writing the CEO plan, run the spec review loop on it:
+
+## Spec Review Loop
+
+Before presenting the document to the user for approval, run an adversarial review.
+
+**Step 1: Dispatch reviewer subagent**
+
+Use the Agent tool to dispatch an independent reviewer. The reviewer has fresh context
+and cannot see the brainstorming conversation — only the document. This ensures genuine
+adversarial independence.
+
+Prompt the subagent with:
+- The file path of the document just written
+- "Read this document and review it on 5 dimensions. For each dimension, note PASS or
+ list specific issues with suggested fixes. At the end, output a quality score (1-10)
+ across all dimensions."
+
+**Dimensions:**
+1. **Completeness** — Are all requirements addressed? Missing edge cases?
+2. **Consistency** — Do parts of the document agree with each other? Contradictions?
+3. **Clarity** — Could an engineer implement this without asking questions? Ambiguous language?
+4. **Scope** — Does the document creep beyond the original problem? YAGNI violations?
+5. **Feasibility** — Can this actually be built with the stated approach? Hidden complexity?
+
+The subagent should return:
+- A quality score (1-10)
+- PASS if no issues, or a numbered list of issues with dimension, description, and fix
+
+**Step 2: Fix and re-dispatch**
+
+If the reviewer returns issues:
+1. Fix each issue in the document on disk (use Edit tool)
+2. Re-dispatch the reviewer subagent with the updated document
+3. Maximum 3 iterations total
+
+**Convergence guard:** If the reviewer returns the same issues on consecutive iterations
+(the fix didn't resolve them or the reviewer disagrees with the fix), stop the loop
+and persist those issues as "Reviewer Concerns" in the document rather than looping
+further.
+
+If the subagent fails, times out, or is unavailable — skip the review loop entirely.
+Tell the user: "Spec review unavailable — presenting unreviewed doc." The document is
+already written to disk; the review is a quality bonus, not a gate.
+
+**Step 3: Report and persist metrics**
+
+After the loop completes (PASS, max iterations, or convergence guard):
+
+1. Tell the user the result — summary by default:
+ "Your doc survived N rounds of adversarial review. M issues caught and fixed.
+ Quality score: X/10."
+ If they ask "what did the reviewer find?", show the full reviewer output.
+
+2. If issues remain after max iterations or convergence, add a "## Reviewer Concerns"
+ section to the document listing each unresolved issue. Downstream skills will see this.
+
+3. Append metrics:
+```bash
+mkdir -p ~/.gstack/analytics
+echo '{"skill":"plan-ceo-review","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","iterations":ITERATIONS,"issues_found":FOUND,"issues_fixed":FIXED,"remaining":REMAINING,"quality_score":SCORE}' >> ~/.gstack/analytics/spec-review.jsonl 2>/dev/null || true
+```
+Replace ITERATIONS, FOUND, FIXED, REMAINING, SCORE with actual values from the review.
+
### 0E. Temporal Interrogation (EXPANSION, SELECTIVE EXPANSION, and HOLD modes)
Think ahead to implementation: What decisions will need to be made during implementation that should be resolved NOW in the plan?
```
M .agents/skills/gstack-plan-eng-review/SKILL.md => .agents/skills/gstack-plan-eng-review/SKILL.md +19 -0
@@ 269,6 269,25 @@ DESIGN=$(ls -t ~/.gstack/projects/$SLUG/*-$BRANCH-design-*.md 2>/dev/null | head
```
If a design doc exists, read it. Use it as the source of truth for the problem statement, constraints, and chosen approach. If it has a `Supersedes:` field, note that this is a revised design — check the prior version for context on what changed and why.
+## Prerequisite Skill Offer
+
+When the design doc check above prints "No design doc found," offer the prerequisite
+skill before proceeding.
+
+Say to the user via AskUserQuestion:
+
+> "No design doc found for this branch. `/office-hours` produces a structured problem
+> statement, premise challenge, and explored alternatives — it gives this review much
+> sharper input to work with. Takes about 10 minutes. The design doc is per-feature,
+> not per-product — it captures the thinking behind this specific change."
+
+Options:
+- A) Run /office-hours first (in another window, then come back)
+- B) Skip — proceed with standard review
+
+If they skip: "No worries — standard review. If you ever want sharper input, try
+/office-hours first next time." Then proceed normally. Do not re-offer later in the session.
+
### Step 0: Scope Challenge
Before reviewing anything, answer these questions:
1. **What existing code already partially or fully solves each sub-problem?** Can we capture outputs from existing flows rather than building parallel ones?
M CHANGELOG.md => CHANGELOG.md +9 -0
@@ 1,5 1,14 @@
# Changelog
+## [0.9.1.0] - 2026-03-20 — Adversarial Spec Review + Skill Chaining
+
+### Added
+
+- **Your design docs now get stress-tested before you see them.** When you run `/office-hours`, an independent AI reviewer checks your design doc for completeness, consistency, clarity, scope creep, and feasibility — up to 3 rounds. You get a quality score (1-10) and a summary of what was caught and fixed. The doc you approve has already survived adversarial review.
+- **Visual wireframes during brainstorming.** For UI ideas, `/office-hours` now generates a rough HTML wireframe using your project's design system (from DESIGN.md) and screenshots it. You see what you're designing while you're still thinking, not after you've coded it.
+- **Skills help each other now.** `/plan-ceo-review` and `/plan-eng-review` detect when you'd benefit from running `/office-hours` first and offer it — one-tap to switch, one-tap to decline. If you seem lost during a CEO review, it'll gently suggest brainstorming first.
+- **Spec review metrics.** Every adversarial review logs iterations, issues found/fixed, and quality score to `~/.gstack/analytics/spec-review.jsonl`. Over time, you can see if your design docs are getting better.
+
## [0.9.0.1] - 2026-03-19
### Changed
M VERSION => VERSION +1 -1
@@ 1,1 1,1 @@
-0.9.0.1
+0.9.1.0
M office-hours/SKILL.md => office-hours/SKILL.md +146 -1
@@ 227,6 227,25 @@ success/error/abort, and `USED_BROWSE` with true/false based on whether `$B` was
If you cannot determine the outcome, use "unknown". This runs in the background and
never blocks the user.
+## SETUP (run this check BEFORE any browse command)
+
+```bash
+_ROOT=$(git rev-parse --show-toplevel 2>/dev/null)
+B=""
+[ -n "$_ROOT" ] && [ -x "$_ROOT/.claude/skills/gstack/browse/dist/browse" ] && B="$_ROOT/.claude/skills/gstack/browse/dist/browse"
+[ -z "$B" ] && B=~/.claude/skills/gstack/browse/dist/browse
+if [ -x "$B" ]; then
+ echo "READY: $B"
+else
+ echo "NEEDS_SETUP"
+fi
+```
+
+If `NEEDS_SETUP`:
+1. Tell the user: "gstack browse needs a one-time build (~10 seconds). OK to proceed?" Then STOP and wait.
+2. Run: `cd <SKILL_DIR> && ./setup`
+3. If `./setup` fails because `bun` is not installed, install it first (`curl -fsSL https://bun.sh/install | bash`), then re-run `./setup`.
+
# YC Office Hours
You are a **YC office hours partner**. Your job is to ensure the problem is understood before solutions are proposed. You adapt to what the user is building — startup founders get the hard questions, builders get an enthusiastic collaborator. This skill produces design docs, not code.
@@ 491,6 510,66 @@ Present via AskUserQuestion. Do NOT proceed without user approval of the approac
---
+## Visual Sketch (UI ideas only)
+
+If the chosen approach involves user-facing UI (screens, pages, forms, dashboards,
+or interactive elements), generate a rough wireframe to help the user visualize it.
+If the idea is backend-only, infrastructure, or has no UI component — skip this
+section silently.
+
+**Step 1: Gather design context**
+
+1. Check if `DESIGN.md` exists in the repo root. If it does, read it for design
+ system constraints (colors, typography, spacing, component patterns). Use these
+ constraints in the wireframe.
+2. Apply core design principles:
+ - **Information hierarchy** — what does the user see first, second, third?
+ - **Interaction states** — loading, empty, error, success, partial
+ - **Edge case paranoia** — what if the name is 47 chars? Zero results? Network fails?
+ - **Subtraction default** — "as little design as possible" (Rams). Every element earns its pixels.
+ - **Design for trust** — every interface element builds or erodes user trust.
+
+**Step 2: Generate wireframe HTML**
+
+Generate a single-page HTML file with these constraints:
+- **Intentionally rough aesthetic** — use system fonts, thin gray borders, no color,
+ hand-drawn-style elements. This is a sketch, not a polished mockup.
+- Self-contained — no external dependencies, no CDN links, inline CSS only
+- Show the core interaction flow (1-3 screens/states max)
+- Include realistic placeholder content (not "Lorem ipsum" — use content that
+ matches the actual use case)
+- Add HTML comments explaining design decisions
+
+Write to a temp file:
+```bash
+SKETCH_FILE="/tmp/gstack-sketch-$(date +%s).html"
+```
+
+**Step 3: Render and capture**
+
+If `$B` is not set (browse binary not built), skip the render step and tell the
+user: "Visual sketch requires the browse binary. Run the setup script to enable it."
+Otherwise render the sketch and capture a screenshot:
+
+```bash
+$B goto "file://$SKETCH_FILE"
+$B screenshot /tmp/gstack-sketch.png
+```
+
+**Step 4: Present and iterate**
+
+Show the screenshot to the user. Ask: "Does this feel right? Want to iterate on the layout?"
+
+If they want changes, regenerate the HTML with their feedback and re-render.
+If they approve or say "good enough," proceed.
+
+**Step 5: Include in design doc**
+
+Reference the wireframe screenshot in the design doc's "Recommended Approach" section.
+The screenshot file at `/tmp/gstack-sketch.png` can be referenced by downstream skills
+(`/plan-design-review`, `/design-review`) to see what was originally envisioned.
+
+---
+
## Phase 4.5: Founder Signal Synthesis
Before writing the design doc, synthesize the founder signals you observed during the session. These will appear in the design doc ("What I noticed") and in the closing conversation (Phase 6).
@@ 627,7 706,73 @@ Supersedes: {prior filename — omit this line if first design on this branch}
{observational, mentor-like reflections referencing specific things the user said during the session. Quote their words back to them — don't characterize their behavior. 2-4 bullets.}
```
-Present the design doc to the user via AskUserQuestion:
+---
+
+## Spec Review Loop
+
+Before presenting the document to the user for approval, run an adversarial review.
+
+**Step 1: Dispatch reviewer subagent**
+
+Use the Agent tool to dispatch an independent reviewer. The reviewer has fresh context
+and cannot see the brainstorming conversation — only the document. This ensures genuine
+adversarial independence.
+
+Prompt the subagent with:
+- The file path of the document just written
+- "Read this document and review it on 5 dimensions. For each dimension, note PASS or
+ list specific issues with suggested fixes. At the end, output a quality score (1-10)
+ across all dimensions."
+
+**Dimensions:**
+1. **Completeness** — Are all requirements addressed? Missing edge cases?
+2. **Consistency** — Do parts of the document agree with each other? Contradictions?
+3. **Clarity** — Could an engineer implement this without asking questions? Ambiguous language?
+4. **Scope** — Does the document creep beyond the original problem? YAGNI violations?
+5. **Feasibility** — Can this actually be built with the stated approach? Hidden complexity?
+
+The subagent should return:
+- A quality score (1-10)
+- PASS if no issues, or a numbered list of issues with dimension, description, and fix
+
+**Step 2: Fix and re-dispatch**
+
+If the reviewer returns issues:
+1. Fix each issue in the document on disk (use Edit tool)
+2. Re-dispatch the reviewer subagent with the updated document
+3. Maximum 3 iterations total
+
+**Convergence guard:** If the reviewer returns the same issues on consecutive iterations
+(the fix didn't resolve them or the reviewer disagrees with the fix), stop the loop
+and persist those issues as "Reviewer Concerns" in the document rather than looping
+further.
+
+If the subagent fails, times out, or is unavailable — skip the review loop entirely.
+Tell the user: "Spec review unavailable — presenting unreviewed doc." The document is
+already written to disk; the review is a quality bonus, not a gate.
+
+**Step 3: Report and persist metrics**
+
+After the loop completes (PASS, max iterations, or convergence guard):
+
+1. Tell the user the result — summary by default:
+ "Your doc survived N rounds of adversarial review. M issues caught and fixed.
+ Quality score: X/10."
+ If they ask "what did the reviewer find?", show the full reviewer output.
+
+2. If issues remain after max iterations or convergence, add a "## Reviewer Concerns"
+ section to the document listing each unresolved issue. Downstream skills will see this.
+
+3. Append metrics:
+```bash
+mkdir -p ~/.gstack/analytics
+echo '{"skill":"office-hours","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","iterations":ITERATIONS,"issues_found":FOUND,"issues_fixed":FIXED,"remaining":REMAINING,"quality_score":SCORE}' >> ~/.gstack/analytics/spec-review.jsonl 2>/dev/null || true
+```
+Replace ITERATIONS, FOUND, FIXED, REMAINING, SCORE with actual values from the review.
+
+---
+
+Present the reviewed design doc to the user via AskUserQuestion:
- A) Approve — mark Status: APPROVED and proceed to handoff
- B) Revise — specify which sections need changes (loop back to revise those sections)
- C) Start over — return to Phase 2
M office-hours/SKILL.md.tmpl => office-hours/SKILL.md.tmpl +13 -1
@@ 23,6 23,8 @@ allowed-tools:
{{PREAMBLE}}
+{{BROWSE_SETUP}}
+
# YC Office Hours
You are a **YC office hours partner**. Your job is to ensure the problem is understood before solutions are proposed. You adapt to what the user is building — startup founders get the hard questions, builders get an enthusiastic collaborator. This skill produces design docs, not code.
@@ 287,6 289,10 @@ Present via AskUserQuestion. Do NOT proceed without user approval of the approac
---
+{{DESIGN_SKETCH}}
+
+---
+
## Phase 4.5: Founder Signal Synthesis
Before writing the design doc, synthesize the founder signals you observed during the session. These will appear in the design doc ("What I noticed") and in the closing conversation (Phase 6).
@@ 423,7 429,13 @@ Supersedes: {prior filename — omit this line if first design on this branch}
{observational, mentor-like reflections referencing specific things the user said during the session. Quote their words back to them — don't characterize their behavior. 2-4 bullets.}
```
-Present the design doc to the user via AskUserQuestion:
+---
+
+{{SPEC_REVIEW_LOOP}}
+
+---
+
+Present the reviewed design doc to the user via AskUserQuestion:
- A) Approve — mark Status: APPROVED and proceed to handoff
- B) Revise — specify which sections need changes (loop back to revise those sections)
- C) Start over — return to Phase 2
M plan-ceo-review/SKILL.md => plan-ceo-review/SKILL.md +96 -0
@@ 10,6 10,7 @@ description: |
or "is this ambitious enough".
Proactively suggest when the user is questioning scope or ambition of a plan,
or when the plan feels like it could be thinking bigger.
+benefits-from: [office-hours]
allowed-tools:
- Read
- Grep
@@ 331,6 332,37 @@ DESIGN=$(ls -t ~/.gstack/projects/$SLUG/*-$BRANCH-design-*.md 2>/dev/null | head
```
If a design doc exists (from `/office-hours`), read it. Use it as the source of truth for the problem statement, constraints, and chosen approach. If it has a `Supersedes:` field, note that this is a revised design.
+## Prerequisite Skill Offer
+
+When the design doc check above prints "No design doc found," offer the prerequisite
+skill before proceeding.
+
+Say to the user via AskUserQuestion:
+
+> "No design doc found for this branch. `/office-hours` produces a structured problem
+> statement, premise challenge, and explored alternatives — it gives this review much
+> sharper input to work with. Takes about 10 minutes. The design doc is per-feature,
+> not per-product — it captures the thinking behind this specific change."
+
+Options:
+- A) Run /office-hours first (in another window, then come back)
+- B) Skip — proceed with standard review
+
+If they skip: "No worries — standard review. If you ever want sharper input, try
+/office-hours first next time." Then proceed normally. Do not re-offer later in the session.
+
+**Mid-session detection:** During Step 0A (Premise Challenge), if the user can't
+articulate the problem, keeps changing the problem statement, answers with "I'm not
+sure," or is clearly exploring rather than reviewing — offer `/office-hours`:
+
+> "It sounds like you're still figuring out what to build — that's totally fine, but
+> that's what /office-hours is designed for. Want to pause this review and run
+> /office-hours first? It'll help you nail down the problem and approach, then come
+> back here for the strategic review."
+
+Options: A) Yes, run /office-hours first. B) No, keep going.
+If they keep going, proceed normally — no guilt, no re-asking.
+
When reading TODOS.md, specifically:
* Note any TODOs this plan touches, blocks, or unlocks
* Check if deferred work from prior reviews relates to this plan
@@ 474,6 506,70 @@ Repo: {owner/repo}
Derive the feature slug from the plan being reviewed (e.g., "user-dashboard", "auth-refactor"). Use the date in YYYY-MM-DD format.
+After writing the CEO plan, run the spec review loop on it:
+
+## Spec Review Loop
+
+Before presenting the document to the user for approval, run an adversarial review.
+
+**Step 1: Dispatch reviewer subagent**
+
+Use the Agent tool to dispatch an independent reviewer. The reviewer has fresh context
+and cannot see the brainstorming conversation — only the document. This ensures genuine
+adversarial independence.
+
+Prompt the subagent with:
+- The file path of the document just written
+- "Read this document and review it on 5 dimensions. For each dimension, note PASS or
+ list specific issues with suggested fixes. At the end, output a quality score (1-10)
+ across all dimensions."
+
+**Dimensions:**
+1. **Completeness** — Are all requirements addressed? Missing edge cases?
+2. **Consistency** — Do parts of the document agree with each other? Contradictions?
+3. **Clarity** — Could an engineer implement this without asking questions? Ambiguous language?
+4. **Scope** — Does the document creep beyond the original problem? YAGNI violations?
+5. **Feasibility** — Can this actually be built with the stated approach? Hidden complexity?
+
+The subagent should return:
+- A quality score (1-10)
+- PASS if no issues, or a numbered list of issues with dimension, description, and fix
+
+**Step 2: Fix and re-dispatch**
+
+If the reviewer returns issues:
+1. Fix each issue in the document on disk (use Edit tool)
+2. Re-dispatch the reviewer subagent with the updated document
+3. Maximum 3 iterations total
+
+**Convergence guard:** If the reviewer returns the same issues on consecutive iterations
+(the fix didn't resolve them or the reviewer disagrees with the fix), stop the loop
+and persist those issues as "Reviewer Concerns" in the document rather than looping
+further.
+
+If the subagent fails, times out, or is unavailable — skip the review loop entirely.
+Tell the user: "Spec review unavailable — presenting unreviewed doc." The document is
+already written to disk; the review is a quality bonus, not a gate.
+
+**Step 3: Report and persist metrics**
+
+After the loop completes (PASS, max iterations, or convergence guard):
+
+1. Tell the user the result — summary by default:
+ "Your doc survived N rounds of adversarial review. M issues caught and fixed.
+ Quality score: X/10."
+ If they ask "what did the reviewer find?", show the full reviewer output.
+
+2. If issues remain after max iterations or convergence, add a "## Reviewer Concerns"
+ section to the document listing each unresolved issue. Downstream skills will see this.
+
+3. Append metrics:
+```bash
+mkdir -p ~/.gstack/analytics
+echo '{"skill":"plan-ceo-review","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","iterations":ITERATIONS,"issues_found":FOUND,"issues_fixed":FIXED,"remaining":REMAINING,"quality_score":SCORE}' >> ~/.gstack/analytics/spec-review.jsonl 2>/dev/null || true
+```
+Replace ITERATIONS, FOUND, FIXED, REMAINING, SCORE with actual values from the review.
+
### 0E. Temporal Interrogation (EXPANSION, SELECTIVE EXPANSION, and HOLD modes)
Think ahead to implementation: What decisions will need to be made during implementation that should be resolved NOW in the plan?
```
M plan-ceo-review/SKILL.md.tmpl => plan-ceo-review/SKILL.md.tmpl +19 -0
@@ 10,6 10,7 @@ description: |
or "is this ambitious enough".
Proactively suggest when the user is questioning scope or ambition of a plan,
or when the plan feels like it could be thinking bigger.
+benefits-from: [office-hours]
allowed-tools:
- Read
- Grep
@@ 110,6 111,20 @@ DESIGN=$(ls -t ~/.gstack/projects/$SLUG/*-$BRANCH-design-*.md 2>/dev/null | head
```
If a design doc exists (from `/office-hours`), read it. Use it as the source of truth for the problem statement, constraints, and chosen approach. If it has a `Supersedes:` field, note that this is a revised design.
+{{BENEFITS_FROM}}
+
+**Mid-session detection:** During Step 0A (Premise Challenge), if the user can't
+articulate the problem, keeps changing the problem statement, answers with "I'm not
+sure," or is clearly exploring rather than reviewing — offer `/office-hours`:
+
+> "It sounds like you're still figuring out what to build — that's totally fine, but
+> that's what /office-hours is designed for. Want to pause this review and run
+> /office-hours first? It'll help you nail down the problem and approach, then come
+> back here for the strategic review."
+
+Options: A) Yes, run /office-hours first. B) No, keep going.
+If they keep going, proceed normally — no guilt, no re-asking.
+
When reading TODOS.md, specifically:
* Note any TODOs this plan touches, blocks, or unlocks
* Check if deferred work from prior reviews relates to this plan
@@ 253,6 268,10 @@ Repo: {owner/repo}
Derive the feature slug from the plan being reviewed (e.g., "user-dashboard", "auth-refactor"). Use the current date in YYYY-MM-DD format.
+After writing the CEO plan, run the spec review loop on it:
+
+{{SPEC_REVIEW_LOOP}}
+
### 0E. Temporal Interrogation (EXPANSION, SELECTIVE EXPANSION, and HOLD modes)
Think ahead to implementation: What decisions will need to be made during implementation that should be resolved NOW in the plan?
```
M plan-eng-review/SKILL.md => plan-eng-review/SKILL.md +20 -0
@@ 8,6 8,7 @@ description: |
"review the architecture", "engineering review", or "lock in the plan".
Proactively suggest when the user has a plan or design doc and is about to
start coding — to catch architecture issues before implementation.
+benefits-from: [office-hours]
allowed-tools:
- Read
- Write
@@ 277,6 278,25 @@ DESIGN=$(ls -t ~/.gstack/projects/$SLUG/*-$BRANCH-design-*.md 2>/dev/null | head
```
If a design doc exists, read it. Use it as the source of truth for the problem statement, constraints, and chosen approach. If it has a `Supersedes:` field, note that this is a revised design — check the prior version for context on what changed and why.
+## Prerequisite Skill Offer
+
+When the design doc check above prints "No design doc found", offer the prerequisite
+skill before proceeding.
+
+Say to the user via AskUserQuestion:
+
+> "No design doc found for this branch. `/office-hours` produces a structured problem
+> statement, premise challenge, and explored alternatives — it gives this review much
+> sharper input to work with. Takes about 10 minutes. The design doc is per-feature,
+> not per-product — it captures the thinking behind this specific change."
+
+Options:
+- A) Run /office-hours first (in another window, then come back)
+- B) Skip — proceed with standard review
+
+If they skip: "No worries — standard review. If you ever want sharper input, try
+/office-hours first next time." Then proceed normally. Do not re-offer later in the session.
+
### Step 0: Scope Challenge
Before reviewing anything, answer these questions:
1. **What existing code already partially or fully solves each sub-problem?** Can we capture outputs from existing flows rather than building parallel ones?
M plan-eng-review/SKILL.md.tmpl => plan-eng-review/SKILL.md.tmpl +3 -0
@@ 8,6 8,7 @@ description: |
"review the architecture", "engineering review", or "lock in the plan".
Proactively suggest when the user has a plan or design doc and is about to
start coding — to catch architecture issues before implementation.
+benefits-from: [office-hours]
allowed-tools:
- Read
- Write
@@ 73,6 74,8 @@ DESIGN=$(ls -t ~/.gstack/projects/$SLUG/*-$BRANCH-design-*.md 2>/dev/null | head
```
If a design doc exists, read it. Use it as the source of truth for the problem statement, constraints, and chosen approach. If it has a `Supersedes:` field, note that this is a revised design — check the prior version for context on what changed and why.
+{{BENEFITS_FROM}}
+
### Step 0: Scope Challenge
Before reviewing anything, answer these questions:
1. **What existing code already partially or fully solves each sub-problem?** Can we capture outputs from existing flows rather than building parallel ones?
M scripts/gen-skill-docs.ts => scripts/gen-skill-docs.ts +162 -1
@@ 55,6 55,7 @@ const HOST_PATHS: Record<Host, HostPaths> = {
interface TemplateContext {
skillName: string;
tmplPath: string;
+ benefitsFrom?: string[];
host: Host;
paths: HostPaths;
}
@@ 1261,6 1262,156 @@ Only commit if there are changes. Stage all bootstrap files (config, test direct
---`;
}
+function generateSpecReviewLoop(_ctx: TemplateContext): string {
+ return `## Spec Review Loop
+
+Before presenting the document to the user for approval, run an adversarial review.
+
+**Step 1: Dispatch reviewer subagent**
+
+Use the Agent tool to dispatch an independent reviewer. The reviewer has fresh context
+and cannot see the brainstorming conversation — only the document. This ensures genuine
+adversarial independence.
+
+Prompt the subagent with:
+- The file path of the document just written
+- "Read this document and review it on 5 dimensions. For each dimension, note PASS or
+ list specific issues with suggested fixes. At the end, output a quality score (1-10)
+ across all dimensions."
+
+**Dimensions:**
+1. **Completeness** — Are all requirements addressed? Missing edge cases?
+2. **Consistency** — Do parts of the document agree with each other? Contradictions?
+3. **Clarity** — Could an engineer implement this without asking questions? Ambiguous language?
+4. **Scope** — Does the document creep beyond the original problem? YAGNI violations?
+5. **Feasibility** — Can this actually be built with the stated approach? Hidden complexity?
+
+The subagent should return:
+- A quality score (1-10)
+- PASS if no issues, or a numbered list of issues with dimension, description, and fix
+
+**Step 2: Fix and re-dispatch**
+
+If the reviewer returns issues:
+1. Fix each issue in the document on disk (use Edit tool)
+2. Re-dispatch the reviewer subagent with the updated document
+3. Maximum 3 iterations total
+
+**Convergence guard:** If the reviewer returns the same issues on consecutive iterations
+(the fix didn't resolve them or the reviewer disagrees with the fix), stop the loop
+and persist those issues as "Reviewer Concerns" in the document rather than looping
+further.
+
+If the subagent fails, times out, or is unavailable — skip the review loop entirely.
+Tell the user: "Spec review unavailable — presenting unreviewed doc." The document is
+already written to disk; the review is a quality bonus, not a gate.
+
+**Step 3: Report and persist metrics**
+
+After the loop completes (PASS, max iterations, or convergence guard):
+
+1. Tell the user the result — summary by default:
+ "Your doc survived N rounds of adversarial review. M issues caught and fixed.
+ Quality score: X/10."
+ If they ask "what did the reviewer find?", show the full reviewer output.
+
+2. If issues remain after max iterations or convergence, add a "## Reviewer Concerns"
+ section to the document listing each unresolved issue. Downstream skills will see this.
+
+3. Append metrics:
+\`\`\`bash
+mkdir -p ~/.gstack/analytics
+echo '{"skill":"${_ctx.skillName}","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","iterations":ITERATIONS,"issues_found":FOUND,"issues_fixed":FIXED,"remaining":REMAINING,"quality_score":SCORE}' >> ~/.gstack/analytics/spec-review.jsonl 2>/dev/null || true
+\`\`\`
+Replace ITERATIONS, FOUND, FIXED, REMAINING, SCORE with actual values from the review.`;
+}
+
+function generateBenefitsFrom(ctx: TemplateContext): string {
+ if (!ctx.benefitsFrom || ctx.benefitsFrom.length === 0) return '';
+
+ const skillList = ctx.benefitsFrom.map(s => `\`/${s}\``).join(' or ');
+ const first = ctx.benefitsFrom[0];
+
+ return `## Prerequisite Skill Offer
+
+When the design doc check above prints "No design doc found", offer the prerequisite
+skill before proceeding.
+
+Say to the user via AskUserQuestion:
+
+> "No design doc found for this branch. ${skillList} produces a structured problem
+> statement, premise challenge, and explored alternatives — it gives this review much
+> sharper input to work with. Takes about 10 minutes. The design doc is per-feature,
+> not per-product — it captures the thinking behind this specific change."
+
+Options:
+- A) Run /${first} first (in another window, then come back)
+- B) Skip — proceed with standard review
+
+If they skip: "No worries — standard review. If you ever want sharper input, try
+/${first} first next time." Then proceed normally. Do not re-offer later in the session.`;
+}
+
+function generateDesignSketch(_ctx: TemplateContext): string {
+ return `## Visual Sketch (UI ideas only)
+
+If the chosen approach involves user-facing UI (screens, pages, forms, dashboards,
+or interactive elements), generate a rough wireframe to help the user visualize it.
+If the idea is backend-only, infrastructure, or has no UI component — skip this
+section silently.
+
+**Step 1: Gather design context**
+
+1. Check if \`DESIGN.md\` exists in the repo root. If it does, read it for design
+ system constraints (colors, typography, spacing, component patterns). Use these
+ constraints in the wireframe.
+2. Apply core design principles:
+ - **Information hierarchy** — what does the user see first, second, third?
+ - **Interaction states** — loading, empty, error, success, partial
+ - **Edge case paranoia** — what if the name is 47 chars? Zero results? Network fails?
+ - **Subtraction default** — "as little design as possible" (Rams). Every element earns its pixels.
+ - **Design for trust** — every interface element builds or erodes user trust.
+
+**Step 2: Generate wireframe HTML**
+
+Generate a single-page HTML file with these constraints:
+- **Intentionally rough aesthetic** — use system fonts, thin gray borders, no color,
+ hand-drawn-style elements. This is a sketch, not a polished mockup.
+- Self-contained — no external dependencies, no CDN links, inline CSS only
+- Show the core interaction flow (1-3 screens/states max)
+- Include realistic placeholder content (not "Lorem ipsum" — use content that
+ matches the actual use case)
+- Add HTML comments explaining design decisions
+
+Write to a temp file:
+\`\`\`bash
+SKETCH_FILE="/tmp/gstack-sketch-$(date +%s).html"
+\`\`\`
+
+**Step 3: Render and capture**
+
+\`\`\`bash
+$B goto "file://$SKETCH_FILE"
+$B screenshot /tmp/gstack-sketch.png
+\`\`\`
+
+If \`$B\` is not available (browse binary not set up), skip the render step. Tell the
+user: "Visual sketch requires the browse binary. Run the setup script to enable it."
+
+**Step 4: Present and iterate**
+
+Show the screenshot to the user. Ask: "Does this feel right? Want to iterate on the layout?"
+
+If they want changes, regenerate the HTML with their feedback and re-render.
+If they approve or say "good enough," proceed.
+
+**Step 5: Include in design doc**
+
+Reference the wireframe screenshot in the design doc's "Recommended Approach" section.
+The screenshot file at \`/tmp/gstack-sketch.png\` can be referenced by downstream skills
+(\`/plan-design-review\`, \`/design-review\`) to see what was originally envisioned.`;
+}
+
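For reference, Step 2's temp-file write from the template above can be sketched as follows. The HTML body here is an invented placeholder; the real wireframe is generated per feature, and the `$B` render step is omitted since it needs the browse binary:

```shell
# Hypothetical wireframe write-out; the HTML content is a stand-in.
SKETCH_FILE="/tmp/gstack-sketch-$(date +%s).html"
cat > "$SKETCH_FILE" <<'HTML'
<!-- Rough sketch: system fonts, thin gray borders, no color -->
<html><body style="font-family: system-ui">
  <div style="border: 1px solid #999; padding: 8px">Search results (0) - empty state</div>
</body></html>
HTML
echo "Wrote $SKETCH_FILE"
```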
const RESOLVERS: Record<string, (ctx: TemplateContext) => string> = {
COMMAND_REFERENCE: generateCommandReference,
SNAPSHOT_FLAGS: generateSnapshotFlags,
@@ 1272,6 1423,9 @@ const RESOLVERS: Record<string, (ctx: TemplateContext) => string> = {
DESIGN_REVIEW_LITE: generateDesignReviewLite,
REVIEW_DASHBOARD: generateReviewDashboard,
TEST_BOOTSTRAP: generateTestBootstrap,
+ SPEC_REVIEW_LOOP: generateSpecReviewLoop,
+ DESIGN_SKETCH: generateDesignSketch,
+ BENEFITS_FROM: generateBenefitsFrom,
};
// ─── Codex Helpers ───────────────────────────────────────────
@@ 1394,7 1548,14 @@ function processTemplate(tmplPath: string, host: Host = 'claude'): { outputPath:
// Extract skill name from frontmatter for TemplateContext
const nameMatch = tmplContent.match(/^name:\s*(.+)$/m);
const skillName = nameMatch ? nameMatch[1].trim() : path.basename(path.dirname(tmplPath));
- const ctx: TemplateContext = { skillName, tmplPath, host, paths: HOST_PATHS[host] };
+
+ // Extract benefits-from list from frontmatter (inline YAML: benefits-from: [a, b])
+ const benefitsMatch = tmplContent.match(/^benefits-from:\s*\[([^\]]*)\]/m);
+ const benefitsFrom = benefitsMatch
+ ? benefitsMatch[1].split(',').map(s => s.trim()).filter(Boolean)
+ : undefined;
+
+ const ctx: TemplateContext = { skillName, tmplPath, benefitsFrom, host, paths: HOST_PATHS[host] };
// Replace placeholders
let content = tmplContent.replace(/\{\{(\w+)\}\}/g, (match, name) => {
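The two extraction steps above — inline-YAML `benefits-from` parsing and `{{NAME}}` placeholder resolution — can be exercised in isolation. This is a standalone sketch, not the actual module; the sample template content and the minimal resolver are hypothetical:

```typescript
// Sketch of frontmatter parsing and placeholder resolution, mirroring
// the regexes above. Sample content and resolver table are invented.
const tmplContent = [
  'name: plan-eng-review',
  'benefits-from: [office-hours]',
  'Body:',
  '{{BENEFITS_FROM}}',
].join('\n');

// Inline-YAML list extraction (same regex as in processTemplate).
const benefitsMatch = tmplContent.match(/^benefits-from:\s*\[([^\]]*)\]/m);
const benefitsFrom = benefitsMatch
  ? benefitsMatch[1].split(',').map(s => s.trim()).filter(Boolean)
  : undefined;

// Minimal resolver table standing in for RESOLVERS.
const resolvers: Record<string, () => string> = {
  BENEFITS_FROM: () => `Offer /${(benefitsFrom ?? [])[0]} when no design doc exists.`,
};

// Unknown placeholders fall through unchanged (the `match` fallback).
const content = tmplContent.replace(/\{\{(\w+)\}\}/g, (match, name) =>
  resolvers[name] ? resolvers[name]() : match,
);

console.log(content.split('\n').pop());
// → Offer /office-hours when no design doc exists.
```

Because the regex only matches inline arrays (`[a, b]`), a block-style YAML list would yield `undefined` and the resolver would emit nothing — matching the empty-string early return in `generateBenefitsFrom`.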
M test/gen-skill-docs.test.ts => test/gen-skill-docs.test.ts +92 -0
@@ 416,6 416,98 @@ describe('REVIEW_DASHBOARD resolver', () => {
});
});
+// --- {{SPEC_REVIEW_LOOP}} resolver tests ---
+
+describe('SPEC_REVIEW_LOOP resolver', () => {
+ const content = fs.readFileSync(path.join(ROOT, 'office-hours', 'SKILL.md'), 'utf-8');
+
+ test('contains all 5 review dimensions', () => {
+ for (const dim of ['Completeness', 'Consistency', 'Clarity', 'Scope', 'Feasibility']) {
+ expect(content).toContain(dim);
+ }
+ });
+
+ test('references Agent tool for subagent dispatch', () => {
+ expect(content).toMatch(/Agent.*tool/i);
+ });
+
+ test('specifies max 3 iterations', () => {
+ expect(content).toMatch(/3.*iteration|maximum.*3/i);
+ });
+
+ test('includes quality score', () => {
+ expect(content).toContain('quality score');
+ });
+
+ test('includes metrics path', () => {
+ expect(content).toContain('spec-review.jsonl');
+ });
+
+ test('includes convergence guard', () => {
+ expect(content).toMatch(/[Cc]onvergence/);
+ });
+
+ test('includes graceful failure handling', () => {
+ expect(content).toMatch(/skip.*review|unavailable/i);
+ });
+});
+
+// --- {{DESIGN_SKETCH}} resolver tests ---
+
+describe('DESIGN_SKETCH resolver', () => {
+ const content = fs.readFileSync(path.join(ROOT, 'office-hours', 'SKILL.md'), 'utf-8');
+
+ test('references DESIGN.md for design system constraints', () => {
+ expect(content).toContain('DESIGN.md');
+ });
+
+ test('contains wireframe or sketch terminology', () => {
+ expect(content).toMatch(/wireframe|sketch/i);
+ });
+
+ test('references browse binary for rendering', () => {
+ expect(content).toContain('$B goto');
+ });
+
+ test('references screenshot capture', () => {
+ expect(content).toContain('$B screenshot');
+ });
+
+ test('specifies rough aesthetic', () => {
+ expect(content).toMatch(/[Rr]ough|hand-drawn/);
+ });
+
+ test('includes skip conditions', () => {
+ expect(content).toMatch(/no UI component|skip/i);
+ });
+});
+
+// --- {{BENEFITS_FROM}} resolver tests ---
+
+describe('BENEFITS_FROM resolver', () => {
+ const ceoContent = fs.readFileSync(path.join(ROOT, 'plan-ceo-review', 'SKILL.md'), 'utf-8');
+ const engContent = fs.readFileSync(path.join(ROOT, 'plan-eng-review', 'SKILL.md'), 'utf-8');
+
+ test('plan-ceo-review contains prerequisite skill offer', () => {
+ expect(ceoContent).toContain('Prerequisite Skill Offer');
+ expect(ceoContent).toContain('/office-hours');
+ });
+
+ test('plan-eng-review contains prerequisite skill offer', () => {
+ expect(engContent).toContain('Prerequisite Skill Offer');
+ expect(engContent).toContain('/office-hours');
+ });
+
+ test('offer includes graceful decline', () => {
+ expect(ceoContent).toContain('No worries');
+ });
+
+ test('skills without benefits-from do NOT have prerequisite offer', () => {
+ const qaContent = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8');
+ expect(qaContent).not.toContain('Prerequisite Skill Offer');
+ });
+});
+
// ─── Codex Generation Tests ─────────────────────────────────
describe('Codex generation (--host codex)', () => {
M test/helpers/touchfiles.ts => test/helpers/touchfiles.ts +8 -0
@@ 57,9 57,13 @@ export const E2E_TOUCHFILES: Record<string, string[]> = {
'review-base-branch': ['review/**'],
'review-design-lite': ['review/**', 'test/fixtures/review-eval-design-slop.*'],
+ // Office Hours
+ 'office-hours-spec-review': ['office-hours/**', 'scripts/gen-skill-docs.ts'],
+
// Plan reviews
'plan-ceo-review': ['plan-ceo-review/**'],
'plan-ceo-review-selective': ['plan-ceo-review/**'],
+ 'plan-ceo-review-benefits': ['plan-ceo-review/**', 'scripts/gen-skill-docs.ts'],
'plan-eng-review': ['plan-eng-review/**'],
'plan-eng-review-artifact': ['plan-eng-review/**'],
@@ 140,6 144,10 @@ export const LLM_JUDGE_TOUCHFILES: Record<string, string[]> = {
'design-review/SKILL.md fix loop': ['design-review/SKILL.md', 'design-review/SKILL.md.tmpl'],
'design-consultation/SKILL.md research': ['design-consultation/SKILL.md', 'design-consultation/SKILL.md.tmpl'],
+ // Office Hours
+ 'office-hours/SKILL.md spec review': ['office-hours/SKILL.md', 'office-hours/SKILL.md.tmpl', 'scripts/gen-skill-docs.ts'],
+ 'office-hours/SKILL.md design sketch': ['office-hours/SKILL.md', 'office-hours/SKILL.md.tmpl', 'scripts/gen-skill-docs.ts'],
+
// Other skills
'retro/SKILL.md instructions': ['retro/SKILL.md', 'retro/SKILL.md.tmpl'],
'qa-only/SKILL.md workflow': ['qa-only/SKILL.md', 'qa-only/SKILL.md.tmpl'],
M test/skill-e2e.test.ts => test/skill-e2e.test.ts +122 -0
@@ 2911,6 2911,128 @@ Write the full output (including the GATE verdict) to ${codexDir}/codex-output.m
}, 360_000);
});
+// --- Office Hours Spec Review E2E ---
+
+describeIfSelected('Office Hours Spec Review E2E', ['office-hours-spec-review'], () => {
+ let ohDir: string;
+
+ beforeAll(() => {
+ ohDir = fs.mkdtempSync(path.join(os.tmpdir(), 'skill-e2e-oh-spec-'));
+ const run = (cmd: string, args: string[]) =>
+ spawnSync(cmd, args, { cwd: ohDir, stdio: 'pipe', timeout: 5000 });
+
+ run('git', ['init', '-b', 'main']);
+ run('git', ['config', 'user.email', 'test@test.com']);
+ run('git', ['config', 'user.name', 'Test']);
+ fs.writeFileSync(path.join(ohDir, 'README.md'), '# Test Project\n');
+ run('git', ['add', '.']);
+ run('git', ['commit', '-m', 'init']);
+
+ // Copy office-hours skill
+ fs.mkdirSync(path.join(ohDir, 'office-hours'), { recursive: true });
+ fs.copyFileSync(
+ path.join(ROOT, 'office-hours', 'SKILL.md'),
+ path.join(ohDir, 'office-hours', 'SKILL.md'),
+ );
+ });
+
+ afterAll(() => {
+ try { fs.rmSync(ohDir, { recursive: true, force: true }); } catch {}
+ });
+
+ test('/office-hours SKILL.md contains spec review loop', async () => {
+ const result = await runSkillTest({
+ prompt: `Read office-hours/SKILL.md. I want to understand the spec review loop.
+
+Summarize what the "Spec Review Loop" section does — specifically:
+1. How many dimensions does the reviewer check?
+2. What tool is used to dispatch the reviewer?
+3. What's the maximum number of iterations?
+4. What metrics are tracked?
+
+Write your summary to ${ohDir}/spec-review-summary.md`,
+ workingDirectory: ohDir,
+ maxTurns: 8,
+ timeout: 120_000,
+ testName: 'office-hours-spec-review',
+ runId,
+ });
+
+ logCost('/office-hours spec review', result);
+ recordE2E('/office-hours-spec-review', 'Office Hours Spec Review E2E', result);
+ expect(result.exitReason).toBe('success');
+
+ const summaryPath = path.join(ohDir, 'spec-review-summary.md');
+ if (fs.existsSync(summaryPath)) {
+ const summary = fs.readFileSync(summaryPath, 'utf-8').toLowerCase();
+ // Verify the agent understood the key concepts
+ expect(summary).toMatch(/5.*dimension|dimension.*5|completeness|consistency|clarity|scope|feasibility/);
+ expect(summary).toMatch(/agent|subagent/);
+ expect(summary).toMatch(/3.*iteration|iteration.*3|maximum.*3/);
+ }
+ }, 180_000);
+});
+
+// --- Plan CEO Review Benefits-From E2E ---
+
+describeIfSelected('Plan CEO Review Benefits-From E2E', ['plan-ceo-review-benefits'], () => {
+ let benefitsDir: string;
+
+ beforeAll(() => {
+ benefitsDir = fs.mkdtempSync(path.join(os.tmpdir(), 'skill-e2e-benefits-'));
+ const run = (cmd: string, args: string[]) =>
+ spawnSync(cmd, args, { cwd: benefitsDir, stdio: 'pipe', timeout: 5000 });
+
+ run('git', ['init', '-b', 'main']);
+ run('git', ['config', 'user.email', 'test@test.com']);
+ run('git', ['config', 'user.name', 'Test']);
+ fs.writeFileSync(path.join(benefitsDir, 'README.md'), '# Test Project\n');
+ run('git', ['add', '.']);
+ run('git', ['commit', '-m', 'init']);
+
+ // Copy plan-ceo-review skill
+ fs.mkdirSync(path.join(benefitsDir, 'plan-ceo-review'), { recursive: true });
+ fs.copyFileSync(
+ path.join(ROOT, 'plan-ceo-review', 'SKILL.md'),
+ path.join(benefitsDir, 'plan-ceo-review', 'SKILL.md'),
+ );
+ });
+
+ afterAll(() => {
+ try { fs.rmSync(benefitsDir, { recursive: true, force: true }); } catch {}
+ });
+
+ test('/plan-ceo-review SKILL.md contains prerequisite skill offer', async () => {
+ const result = await runSkillTest({
+ prompt: `Read plan-ceo-review/SKILL.md. Search for sections about "Prerequisite" or "office-hours" or "design doc found".
+
+Summarize what happens when no design doc is found — specifically:
+1. Is /office-hours offered as a prerequisite?
+2. What options does the user get?
+3. Is there a mid-session detection for when the user seems lost?
+
+Write your summary to ${benefitsDir}/benefits-summary.md`,
+ workingDirectory: benefitsDir,
+ maxTurns: 8,
+ timeout: 120_000,
+ testName: 'plan-ceo-review-benefits',
+ runId,
+ });
+
+ logCost('/plan-ceo-review benefits-from', result);
+ recordE2E('/plan-ceo-review-benefits', 'Plan CEO Review Benefits-From E2E', result);
+ expect(result.exitReason).toBe('success');
+
+ const summaryPath = path.join(benefitsDir, 'benefits-summary.md');
+ if (fs.existsSync(summaryPath)) {
+ const summary = fs.readFileSync(summaryPath, 'utf-8').toLowerCase();
+ // Verify the agent understood the skill chaining
+ expect(summary).toMatch(/office.hours/);
+ expect(summary).toMatch(/design doc|no design/i);
+ }
+ }, 180_000);
+});
+
// Module-level afterAll — finalize eval collector after all tests complete
afterAll(async () => {
if (evalCollector) {
M test/skill-validation.test.ts => test/skill-validation.test.ts +69 -0
@@ 644,6 644,59 @@ describe('office-hours skill structure', () => {
test('contains builder operating principles', () => {
expect(content).toContain('Delight is the currency');
});
+
+ // Spec Review Loop (Phase 5.5)
+ test('contains spec review loop', () => {
+ expect(content).toContain('Spec Review Loop');
+ });
+
+ test('contains adversarial review dimensions', () => {
+ for (const dim of ['Completeness', 'Consistency', 'Clarity', 'Scope', 'Feasibility']) {
+ expect(content).toContain(dim);
+ }
+ });
+
+ test('contains subagent dispatch instruction', () => {
+ expect(content).toMatch(/Agent.*tool|subagent/i);
+ });
+
+ test('contains max 3 iterations', () => {
+ expect(content).toMatch(/3.*iteration|maximum.*3/i);
+ });
+
+ test('contains quality score', () => {
+ expect(content).toContain('quality score');
+ });
+
+ test('contains spec review metrics path', () => {
+ expect(content).toContain('spec-review.jsonl');
+ });
+
+ test('contains convergence guard', () => {
+ expect(content).toMatch(/convergence/i);
+ });
+
+ // Visual Sketch (Phase 4.5)
+ test('contains visual sketch section', () => {
+ expect(content).toContain('Visual Sketch');
+ });
+
+ test('contains wireframe generation', () => {
+ expect(content).toMatch(/wireframe|sketch/i);
+ });
+
+ test('contains DESIGN.md awareness', () => {
+ expect(content).toContain('DESIGN.md');
+ });
+
+ test('contains browse rendering', () => {
+ expect(content).toContain('$B goto');
+ expect(content).toContain('$B screenshot');
+ });
+
+ test('contains rough aesthetic instruction', () => {
+ expect(content).toMatch(/rough|hand-drawn/i);
+ });
});
describe('investigate skill structure', () => {
@@ 856,6 909,22 @@ describe('CEO review mode validation', () => {
expect(content).toContain('HOLD SCOPE');
expect(content).toContain('REDUCTION');
});
+
+ // Skill chaining (benefits-from)
+ test('contains prerequisite skill offer for office-hours', () => {
+ expect(content).toContain('Prerequisite Skill Offer');
+ expect(content).toContain('/office-hours');
+ });
+
+ test('contains mid-session detection', () => {
+ expect(content).toContain('Mid-session detection');
+ expect(content).toMatch(/still figuring out|seems lost/i);
+ });
+
+ // Spec review on CEO plans
+ test('contains spec review loop for CEO plan documents', () => {
+ expect(content).toContain('Spec Review Loop');
+ });
});
// --- gstack-slug helper ---