~cytrogen/gstack

997f7b1da6a19879fa5bc79c4fe5f71900b8c19f — Garry Tan 14 days ago 1bf888d
fix: review log architecture — close gaps, add attribution (v0.11.21.0) (#512)

* fix: review log architecture — close gaps, fix orphans, add attribution

- Ship Step 3.5 now logs its code review to the review log (via:"ship")
- Remove eng review gate — ship runs its own review in Step 3.5
- Dashboard Outside Voice row mapped to codex-plan-review
- Dashboard shows via source attribution (e.g., "via /autoplan")
- land-and-deploy checks all 8 review skill types (was 5)
- codex-review log gets commit field for staleness detection
- autoplan uses placeholder tokens instead of hardcoded "clean"
- Document autoplan-voices as audit-trail-only in review.ts
- E2E test for dashboard via attribution

* chore: bump version and changelog (v0.11.21.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
M CHANGELOG.md => CHANGELOG.md +11 -0
@@ 1,5 1,16 @@
# Changelog

## [0.11.21.0] - 2026-03-26

### Fixed

- **`/autoplan` reviews now count toward the ship readiness gate.** When `/autoplan` ran full CEO + Design + Eng reviews, `/ship` still showed "0 runs" for Eng Review because autoplan-logged entries weren't being read correctly. Now the dashboard shows source attribution (e.g., "CLEAR (PLAN via /autoplan)") so you can see exactly which tool satisfied each review.
- **`/ship` no longer tells you to "run /review first."** Ship runs its own pre-landing review in Step 3.5 — asking you to run the same review separately was redundant. The gate is removed; ship just does it.
- **`/land-and-deploy` now checks all 8 review types.** Previously missed `review`, `adversarial-review`, and `codex-plan-review` — if you only ran `/review` (not `/plan-eng-review`), land-and-deploy wouldn't see it.
- **Dashboard Outside Voice row now works.** Was showing "0 runs" even after outside voices ran in `/plan-ceo-review` or `/plan-eng-review`. Now correctly maps to `codex-plan-review` entries.
- **`/codex review` now tracks staleness.** Added the `commit` field to codex review log entries so the dashboard can detect when a codex review is outdated.
- **`/autoplan` no longer hardcodes "clean" status.** Review log entries from autoplan used to always record `status:"clean"` even when issues were found. Now uses proper placeholder tokens that Claude substitutes with real values.

## [0.11.20.0] - 2026-03-26

### Added

M VERSION => VERSION +1 -1
@@ 1,1 1,1 @@
0.11.20.0
0.11.21.0

M autoplan/SKILL.md => autoplan/SKILL.md +6 -6
@@ 929,24 929,24 @@ AskUserQuestion options:

## Completion: Write Review Logs

On approval, write 3 separate review log entries so /ship's dashboard recognizes them:
On approval, write 3 separate review log entries so /ship's dashboard recognizes them.
Replace TIMESTAMP, STATUS, and N with actual values from each review phase.
STATUS is "clean" if no unresolved issues, "issues_open" otherwise.

```bash
COMMIT=$(git rev-parse --short HEAD 2>/dev/null)
TIMESTAMP=$(date -u +%Y-%m-%dT%H:%M:%SZ)

~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-ceo-review","timestamp":"'"$TIMESTAMP"'","status":"clean","unresolved":0,"critical_gaps":0,"mode":"SELECTIVE_EXPANSION","via":"autoplan","commit":"'"$COMMIT"'"}'
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-ceo-review","timestamp":"'"$TIMESTAMP"'","status":"STATUS","unresolved":N,"critical_gaps":N,"mode":"SELECTIVE_EXPANSION","via":"autoplan","commit":"'"$COMMIT"'"}'

~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-eng-review","timestamp":"'"$TIMESTAMP"'","status":"clean","unresolved":0,"critical_gaps":0,"issues_found":0,"mode":"FULL_REVIEW","via":"autoplan","commit":"'"$COMMIT"'"}'
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-eng-review","timestamp":"'"$TIMESTAMP"'","status":"STATUS","unresolved":N,"critical_gaps":N,"issues_found":N,"mode":"FULL_REVIEW","via":"autoplan","commit":"'"$COMMIT"'"}'
```

If Phase 2 ran (UI scope):
```bash
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-design-review","timestamp":"'"$TIMESTAMP"'","status":"clean","unresolved":0,"via":"autoplan","commit":"'"$COMMIT"'"}'
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-design-review","timestamp":"'"$TIMESTAMP"'","status":"STATUS","unresolved":N,"via":"autoplan","commit":"'"$COMMIT"'"}'
```

Replace field values with actual counts from the review.

Dual voice logs (one per phase that ran):
```bash
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"autoplan-voices","timestamp":"'"$TIMESTAMP"'","status":"STATUS","source":"SOURCE","phase":"ceo","via":"autoplan","consensus_confirmed":N,"consensus_disagree":N,"commit":"'"$COMMIT"'"}'

M autoplan/SKILL.md.tmpl => autoplan/SKILL.md.tmpl +6 -6
@@ 584,24 584,24 @@ AskUserQuestion options:

## Completion: Write Review Logs

On approval, write 3 separate review log entries so /ship's dashboard recognizes them:
On approval, write 3 separate review log entries so /ship's dashboard recognizes them.
Replace TIMESTAMP, STATUS, and N with actual values from each review phase.
STATUS is "clean" if no unresolved issues, "issues_open" otherwise.

```bash
COMMIT=$(git rev-parse --short HEAD 2>/dev/null)
TIMESTAMP=$(date -u +%Y-%m-%dT%H:%M:%SZ)

~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-ceo-review","timestamp":"'"$TIMESTAMP"'","status":"clean","unresolved":0,"critical_gaps":0,"mode":"SELECTIVE_EXPANSION","via":"autoplan","commit":"'"$COMMIT"'"}'
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-ceo-review","timestamp":"'"$TIMESTAMP"'","status":"STATUS","unresolved":N,"critical_gaps":N,"mode":"SELECTIVE_EXPANSION","via":"autoplan","commit":"'"$COMMIT"'"}'

~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-eng-review","timestamp":"'"$TIMESTAMP"'","status":"clean","unresolved":0,"critical_gaps":0,"issues_found":0,"mode":"FULL_REVIEW","via":"autoplan","commit":"'"$COMMIT"'"}'
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-eng-review","timestamp":"'"$TIMESTAMP"'","status":"STATUS","unresolved":N,"critical_gaps":N,"issues_found":N,"mode":"FULL_REVIEW","via":"autoplan","commit":"'"$COMMIT"'"}'
```

If Phase 2 ran (UI scope):
```bash
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-design-review","timestamp":"'"$TIMESTAMP"'","status":"clean","unresolved":0,"via":"autoplan","commit":"'"$COMMIT"'"}'
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-design-review","timestamp":"'"$TIMESTAMP"'","status":"STATUS","unresolved":N,"via":"autoplan","commit":"'"$COMMIT"'"}'
```

Replace field values with actual counts from the review.

Dual voice logs (one per phase that ran):
```bash
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"autoplan-voices","timestamp":"'"$TIMESTAMP"'","status":"STATUS","source":"SOURCE","phase":"ceo","via":"autoplan","consensus_confirmed":N,"consensus_disagree":N,"commit":"'"$COMMIT"'"}'

M codex/SKILL.md => codex/SKILL.md +1 -1
@@ 423,7 423,7 @@ CROSS-MODEL ANALYSIS:

7. Persist the review result:
```bash
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"codex-review","timestamp":"TIMESTAMP","status":"STATUS","gate":"GATE","findings":N,"findings_fixed":N}'
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"codex-review","timestamp":"TIMESTAMP","status":"STATUS","gate":"GATE","findings":N,"findings_fixed":N,"commit":"'"$(git rev-parse --short HEAD)"'"}'
```

Substitute: TIMESTAMP (ISO 8601), STATUS ("clean" if PASS, "issues_found" if FAIL),

M codex/SKILL.md.tmpl => codex/SKILL.md.tmpl +1 -1
@@ 127,7 127,7 @@ CROSS-MODEL ANALYSIS:

7. Persist the review result:
```bash
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"codex-review","timestamp":"TIMESTAMP","status":"STATUS","gate":"GATE","findings":N,"findings_fixed":N}'
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"codex-review","timestamp":"TIMESTAMP","status":"STATUS","gate":"GATE","findings":N,"findings_fixed":N,"commit":"'"$(git rev-parse --short HEAD)"'"}'
```

Substitute: TIMESTAMP (ISO 8601), STATUS ("clean" if PASS, "issues_found" if FAIL),

M land-and-deploy/SKILL.md => land-and-deploy/SKILL.md +3 -2
@@ 447,7 447,8 @@ Collect evidence for each check below. Track warnings (yellow) and blockers (red
```

Parse the output. For each review skill (plan-eng-review, plan-ceo-review,
plan-design-review, design-review-lite, codex-review):
plan-design-review, design-review-lite, codex-review, review, adversarial-review,
codex-plan-review):

1. Find the most recent entry within the last 7 days.
2. Extract its `commit` field.


@@ 594,7 595,7 @@ Use AskUserQuestion:
- C) Merge anyway — I understand the risks (Completeness: 3/10)

If the user chooses B: **STOP.** List exactly what needs to be done:
- If reviews are stale: "Re-run /plan-eng-review (or /review) to review current code."
- If reviews are stale: "Re-run `/plan-eng-review`, `/review`, or `/autoplan` to review current code."
- If E2E not run: "Run `bun run test:e2e` to verify."
- If docs not updated: "Run /document-release to update documentation."
- If PR body stale: "Update the PR body to reflect current changes."

M land-and-deploy/SKILL.md.tmpl => land-and-deploy/SKILL.md.tmpl +3 -2
@@ 134,7 134,8 @@ Collect evidence for each check below. Track warnings (yellow) and blockers (red
```

Parse the output. For each review skill (plan-eng-review, plan-ceo-review,
plan-design-review, design-review-lite, codex-review):
plan-design-review, design-review-lite, codex-review, review, adversarial-review,
codex-plan-review):

1. Find the most recent entry within the last 7 days.
2. Extract its `commit` field.


@@ 281,7 282,7 @@ Use AskUserQuestion:
- C) Merge anyway — I understand the risks (Completeness: 3/10)

If the user chooses B: **STOP.** List exactly what needs to be done:
- If reviews are stale: "Re-run /plan-eng-review (or /review) to review current code."
- If reviews are stale: "Re-run `/plan-eng-review`, `/review`, or `/autoplan` to review current code."
- If E2E not run: "Run `bun run test:e2e` to verify."
- If docs not updated: "Run /document-release to update documentation."
- If PR body stale: "Update the PR body to reflect current changes."

M plan-ceo-review/SKILL.md => plan-ceo-review/SKILL.md +7 -1
@@ 1262,7 1262,13 @@ After completing the review, read the review log and config to display the dashb
~/.claude/skills/gstack/bin/gstack-review-read
```

Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, review, plan-design-review, design-review-lite, adversarial-review, codex-review, codex-plan-review). Ignore entries with timestamps older than 7 days. For the Eng Review row, show whichever is more recent between `review` (diff-scoped pre-landing review) and `plan-eng-review` (plan-stage architecture review). Append "(DIFF)" or "(PLAN)" to the status to distinguish. For the Adversarial row, show whichever is more recent between `adversarial-review` (new auto-scaled) and `codex-review` (legacy). For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display:
Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, review, plan-design-review, design-review-lite, adversarial-review, codex-review, codex-plan-review). Ignore entries with timestamps older than 7 days. For the Eng Review row, show whichever is more recent between `review` (diff-scoped pre-landing review) and `plan-eng-review` (plan-stage architecture review). Append "(DIFF)" or "(PLAN)" to the status to distinguish. For the Adversarial row, show whichever is more recent between `adversarial-review` (new auto-scaled) and `codex-review` (legacy). For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. For the Outside Voice row, show the most recent `codex-plan-review` entry — this captures outside voices from both /plan-ceo-review and /plan-eng-review.

**Source attribution:** If the most recent entry for a skill has a \`"via"\` field, append it to the status label in parentheses. Examples: `plan-eng-review` with `via:"autoplan"` shows as "CLEAR (PLAN via /autoplan)". `review` with `via:"ship"` shows as "CLEAR (DIFF via /ship)". Entries without a `via` field show as "CLEAR (PLAN)" or "CLEAR (DIFF)" as before.

Note: `autoplan-voices` and `design-outside-voices` entries are audit-trail-only (forensic data for cross-model consensus analysis). They do not appear in the dashboard and are not checked by any consumer.

Display:

```
+====================================================================+

M plan-design-review/SKILL.md => plan-design-review/SKILL.md +7 -1
@@ 768,7 768,13 @@ After completing the review, read the review log and config to display the dashb
~/.claude/skills/gstack/bin/gstack-review-read
```

Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, review, plan-design-review, design-review-lite, adversarial-review, codex-review, codex-plan-review). Ignore entries with timestamps older than 7 days. For the Eng Review row, show whichever is more recent between `review` (diff-scoped pre-landing review) and `plan-eng-review` (plan-stage architecture review). Append "(DIFF)" or "(PLAN)" to the status to distinguish. For the Adversarial row, show whichever is more recent between `adversarial-review` (new auto-scaled) and `codex-review` (legacy). For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display:
Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, review, plan-design-review, design-review-lite, adversarial-review, codex-review, codex-plan-review). Ignore entries with timestamps older than 7 days. For the Eng Review row, show whichever is more recent between `review` (diff-scoped pre-landing review) and `plan-eng-review` (plan-stage architecture review). Append "(DIFF)" or "(PLAN)" to the status to distinguish. For the Adversarial row, show whichever is more recent between `adversarial-review` (new auto-scaled) and `codex-review` (legacy). For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. For the Outside Voice row, show the most recent `codex-plan-review` entry — this captures outside voices from both /plan-ceo-review and /plan-eng-review.

**Source attribution:** If the most recent entry for a skill has a \`"via"\` field, append it to the status label in parentheses. Examples: `plan-eng-review` with `via:"autoplan"` shows as "CLEAR (PLAN via /autoplan)". `review` with `via:"ship"` shows as "CLEAR (DIFF via /ship)". Entries without a `via` field show as "CLEAR (PLAN)" or "CLEAR (DIFF)" as before.

Note: `autoplan-voices` and `design-outside-voices` entries are audit-trail-only (forensic data for cross-model consensus analysis). They do not appear in the dashboard and are not checked by any consumer.

Display:

```
+====================================================================+

M plan-eng-review/SKILL.md => plan-eng-review/SKILL.md +7 -1
@@ 870,7 870,13 @@ After completing the review, read the review log and config to display the dashb
~/.claude/skills/gstack/bin/gstack-review-read
```

Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, review, plan-design-review, design-review-lite, adversarial-review, codex-review, codex-plan-review). Ignore entries with timestamps older than 7 days. For the Eng Review row, show whichever is more recent between `review` (diff-scoped pre-landing review) and `plan-eng-review` (plan-stage architecture review). Append "(DIFF)" or "(PLAN)" to the status to distinguish. For the Adversarial row, show whichever is more recent between `adversarial-review` (new auto-scaled) and `codex-review` (legacy). For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display:
Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, review, plan-design-review, design-review-lite, adversarial-review, codex-review, codex-plan-review). Ignore entries with timestamps older than 7 days. For the Eng Review row, show whichever is more recent between `review` (diff-scoped pre-landing review) and `plan-eng-review` (plan-stage architecture review). Append "(DIFF)" or "(PLAN)" to the status to distinguish. For the Adversarial row, show whichever is more recent between `adversarial-review` (new auto-scaled) and `codex-review` (legacy). For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. For the Outside Voice row, show the most recent `codex-plan-review` entry — this captures outside voices from both /plan-ceo-review and /plan-eng-review.

**Source attribution:** If the most recent entry for a skill has a \`"via"\` field, append it to the status label in parentheses. Examples: `plan-eng-review` with `via:"autoplan"` shows as "CLEAR (PLAN via /autoplan)". `review` with `via:"ship"` shows as "CLEAR (DIFF via /ship)". Entries without a `via` field show as "CLEAR (PLAN)" or "CLEAR (DIFF)" as before.

Note: `autoplan-voices` and `design-outside-voices` entries are audit-trail-only (forensic data for cross-model consensus analysis). They do not appear in the dashboard and are not checked by any consumer.

Display:

```
+====================================================================+

M scripts/resolvers/review.ts => scripts/resolvers/review.ts +7 -1
@@ 9,7 9,13 @@ After completing the review, read the review log and config to display the dashb
~/.claude/skills/gstack/bin/gstack-review-read
\`\`\`

Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, review, plan-design-review, design-review-lite, adversarial-review, codex-review, codex-plan-review). Ignore entries with timestamps older than 7 days. For the Eng Review row, show whichever is more recent between \`review\` (diff-scoped pre-landing review) and \`plan-eng-review\` (plan-stage architecture review). Append "(DIFF)" or "(PLAN)" to the status to distinguish. For the Adversarial row, show whichever is more recent between \`adversarial-review\` (new auto-scaled) and \`codex-review\` (legacy). For Design Review, show whichever is more recent between \`plan-design-review\` (full visual audit) and \`design-review-lite\` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display:
Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, review, plan-design-review, design-review-lite, adversarial-review, codex-review, codex-plan-review). Ignore entries with timestamps older than 7 days. For the Eng Review row, show whichever is more recent between \`review\` (diff-scoped pre-landing review) and \`plan-eng-review\` (plan-stage architecture review). Append "(DIFF)" or "(PLAN)" to the status to distinguish. For the Adversarial row, show whichever is more recent between \`adversarial-review\` (new auto-scaled) and \`codex-review\` (legacy). For Design Review, show whichever is more recent between \`plan-design-review\` (full visual audit) and \`design-review-lite\` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. For the Outside Voice row, show the most recent \`codex-plan-review\` entry — this captures outside voices from both /plan-ceo-review and /plan-eng-review.

**Source attribution:** If the most recent entry for a skill has a \\\`"via"\\\` field, append it to the status label in parentheses. Examples: \`plan-eng-review\` with \`via:"autoplan"\` shows as "CLEAR (PLAN via /autoplan)". \`review\` with \`via:"ship"\` shows as "CLEAR (DIFF via /ship)". Entries without a \`via\` field show as "CLEAR (PLAN)" or "CLEAR (DIFF)" as before.

Note: \`autoplan-voices\` and \`design-outside-voices\` entries are audit-trail-only (forensic data for cross-model consensus analysis). They do not appear in the dashboard and are not checked by any consumer.

Display:

\`\`\`
+====================================================================+

M ship/SKILL.md => ship/SKILL.md +21 -19
@@ 364,7 364,13 @@ After completing the review, read the review log and config to display the dashb
~/.claude/skills/gstack/bin/gstack-review-read
```

Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, review, plan-design-review, design-review-lite, adversarial-review, codex-review, codex-plan-review). Ignore entries with timestamps older than 7 days. For the Eng Review row, show whichever is more recent between `review` (diff-scoped pre-landing review) and `plan-eng-review` (plan-stage architecture review). Append "(DIFF)" or "(PLAN)" to the status to distinguish. For the Adversarial row, show whichever is more recent between `adversarial-review` (new auto-scaled) and `codex-review` (legacy). For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display:
Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, review, plan-design-review, design-review-lite, adversarial-review, codex-review, codex-plan-review). Ignore entries with timestamps older than 7 days. For the Eng Review row, show whichever is more recent between `review` (diff-scoped pre-landing review) and `plan-eng-review` (plan-stage architecture review). Append "(DIFF)" or "(PLAN)" to the status to distinguish. For the Adversarial row, show whichever is more recent between `adversarial-review` (new auto-scaled) and `codex-review` (legacy). For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. For the Outside Voice row, show the most recent `codex-plan-review` entry — this captures outside voices from both /plan-ceo-review and /plan-eng-review.

**Source attribution:** If the most recent entry for a skill has a \`"via"\` field, append it to the status label in parentheses. Examples: `plan-eng-review` with `via:"autoplan"` shows as "CLEAR (PLAN via /autoplan)". `review` with `via:"ship"` shows as "CLEAR (DIFF via /ship)". Entries without a `via` field show as "CLEAR (PLAN)" or "CLEAR (DIFF)" as before.

Note: `autoplan-voices` and `design-outside-voices` entries are audit-trail-only (forensic data for cross-model consensus analysis). They do not appear in the dashboard and are not checked by any consumer.

Display:

```
+====================================================================+


@@ 403,26 409,15 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl

If the Eng Review is NOT "CLEAR":

1. **Check for a prior override on this branch:**
   ```bash
   eval "$(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)"
   grep '"skill":"ship-review-override"' ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl 2>/dev/null || echo "NO_OVERRIDE"
   ```
   If an override exists, display the dashboard and note "Review gate previously accepted — continuing." Do NOT ask again.
Print: "No prior eng review found — ship will run its own pre-landing review in Step 3.5."

2. **If no override exists,** use AskUserQuestion:
   - Show that Eng Review is missing or has open issues
   - RECOMMENDATION: Choose C if the change is obviously trivial (< 20 lines, typo fix, config-only); Choose B for larger changes
   - Options: A) Ship anyway  B) Abort — run /review or /plan-eng-review first  C) Change is too small to need eng review
   - If CEO Review is missing, mention as informational ("CEO Review not run — recommended for product changes") but do NOT block
   - For Design Review: run `source <(~/.claude/skills/gstack/bin/gstack-diff-scope <base> 2>/dev/null)`. If `SCOPE_FRONTEND=true` and no design review (plan-design-review or design-review-lite) exists in the dashboard, mention: "Design Review not run — this PR changes frontend code. The lite design check will run automatically in Step 3.5, but consider running /design-review for a full visual audit post-implementation." Still never block.
Check diff size: `git diff <base>...HEAD --stat | tail -1`. If the diff is >200 lines, add: "Note: This is a large diff. Consider running `/plan-eng-review` or `/autoplan` for architecture-level review before shipping."

3. **If the user chooses A or C,** persist the decision so future `/ship` runs on this branch skip the gate:
   ```bash
   eval "$(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)"
   echo '{"skill":"ship-review-override","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","decision":"USER_CHOICE"}' >> ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl
   ```
   Substitute USER_CHOICE with "ship_anyway" or "not_relevant".
If CEO Review is missing, mention as informational ("CEO Review not run — recommended for product changes") but do NOT block.

For Design Review: run `source <(~/.claude/skills/gstack/bin/gstack-diff-scope <base> 2>/dev/null)`. If `SCOPE_FRONTEND=true` and no design review (plan-design-review or design-review-lite) exists in the dashboard, mention: "Design Review not run — this PR changes frontend code. The lite design check will run automatically in Step 3.5, but consider running /design-review for a full visual audit post-implementation." Still never block.

Continue to Step 1.5 — do NOT block or ask. Ship runs its own review in Step 3.5.

---



@@ 1340,6 1335,13 @@ Present Codex output under a `CODEX (design):` header, merged with the checklist

   If no issues found: `Pre-Landing Review: No issues found.`

9. Persist the review result to the review log:
```bash
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"review","timestamp":"TIMESTAMP","status":"STATUS","issues_found":N,"critical":N,"informational":N,"commit":"'"$(git rev-parse --short HEAD)"'","via":"ship"}'
```
Substitute TIMESTAMP (ISO 8601), STATUS ("clean" if no issues, "issues_found" otherwise),
and N values from the summary counts above. The `via:"ship"` distinguishes from standalone `/review` runs.

Save the review output — it goes into the PR body in Step 8.

---

M ship/SKILL.md.tmpl => ship/SKILL.md.tmpl +14 -18
@@ 64,26 64,15 @@ You are running the `/ship` workflow. This is a **non-interactive, fully automat

If the Eng Review is NOT "CLEAR":

1. **Check for a prior override on this branch:**
   ```bash
   {{SLUG_EVAL}}
   grep '"skill":"ship-review-override"' ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl 2>/dev/null || echo "NO_OVERRIDE"
   ```
   If an override exists, display the dashboard and note "Review gate previously accepted — continuing." Do NOT ask again.
Print: "No prior eng review found — ship will run its own pre-landing review in Step 3.5."

2. **If no override exists,** use AskUserQuestion:
   - Show that Eng Review is missing or has open issues
   - RECOMMENDATION: Choose C if the change is obviously trivial (< 20 lines, typo fix, config-only); Choose B for larger changes
   - Options: A) Ship anyway  B) Abort — run /review or /plan-eng-review first  C) Change is too small to need eng review
   - If CEO Review is missing, mention as informational ("CEO Review not run — recommended for product changes") but do NOT block
   - For Design Review: run `source <(~/.claude/skills/gstack/bin/gstack-diff-scope <base> 2>/dev/null)`. If `SCOPE_FRONTEND=true` and no design review (plan-design-review or design-review-lite) exists in the dashboard, mention: "Design Review not run — this PR changes frontend code. The lite design check will run automatically in Step 3.5, but consider running /design-review for a full visual audit post-implementation." Still never block.
Check diff size: `git diff <base>...HEAD --stat | tail -1`. If the diff is >200 lines, add: "Note: This is a large diff. Consider running `/plan-eng-review` or `/autoplan` for architecture-level review before shipping."

3. **If the user chooses A or C,** persist the decision so future `/ship` runs on this branch skip the gate:
   ```bash
   {{SLUG_EVAL}}
   echo '{"skill":"ship-review-override","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","decision":"USER_CHOICE"}' >> ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl
   ```
   Substitute USER_CHOICE with "ship_anyway" or "not_relevant".
If CEO Review is missing, mention as informational ("CEO Review not run — recommended for product changes") but do NOT block.

For Design Review: run `source <(~/.claude/skills/gstack/bin/gstack-diff-scope <base> 2>/dev/null)`. If `SCOPE_FRONTEND=true` and no design review (plan-design-review or design-review-lite) exists in the dashboard, mention: "Design Review not run — this PR changes frontend code. The lite design check will run automatically in Step 3.5, but consider running /design-review for a full visual audit post-implementation." Still never block.

Continue to Step 1.5 — do NOT block or ask. Ship runs its own review in Step 3.5.

---



@@ 275,6 264,13 @@ Review the diff for structural issues that tests don't catch.

   If no issues found: `Pre-Landing Review: No issues found.`

9. Persist the review result to the review log:
```bash
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"review","timestamp":"TIMESTAMP","status":"STATUS","issues_found":N,"critical":N,"informational":N,"commit":"'"$(git rev-parse --short HEAD)"'","via":"ship"}'
```
Substitute TIMESTAMP (ISO 8601), STATUS ("clean" if no issues, "issues_found" otherwise),
and N values from the summary counts above. The `via:"ship"` distinguishes from standalone `/review` runs.

Save the review output — it goes into the PR body in Step 8.

---

M test/helpers/touchfiles.ts => test/helpers/touchfiles.ts +1 -0
@@ 79,6 79,7 @@ export const E2E_TOUCHFILES: Record<string, string[]> = {
  // Ship
  'ship-base-branch': ['ship/**', 'bin/gstack-repo-mode'],
  'ship-local-workflow': ['ship/**', 'scripts/gen-skill-docs.ts'],
  'review-dashboard-via': ['ship/**', 'scripts/resolvers/review.ts', 'codex/**', 'autoplan/**', 'land-and-deploy/**'],
  'ship-plan-completion': ['ship/**', 'scripts/gen-skill-docs.ts'],
  'ship-plan-verification': ['ship/**', 'scripts/gen-skill-docs.ts'],


M test/skill-e2e-review.test.ts => test/skill-e2e-review.test.ts +113 -0
@@ 529,6 529,119 @@ Analyze the git history and produce the narrative report as described in the SKI
  }, 420_000);
});

// --- Review Dashboard Via Attribution E2E ---

describeIfSelected('Review Dashboard Via Attribution', ['review-dashboard-via'], () => {
  let dashDir: string;

  beforeAll(() => {
    dashDir = fs.mkdtempSync(path.join(os.tmpdir(), 'skill-e2e-dashboard-via-'));
    const run = (cmd: string, args: string[], cwd = dashDir) =>
      spawnSync(cmd, args, { cwd, stdio: 'pipe', timeout: 5000 });

    // Create git repo with feature branch
    run('git', ['init', '-b', 'main']);
    run('git', ['config', 'user.email', 'test@test.com']);
    run('git', ['config', 'user.name', 'Test']);

    fs.writeFileSync(path.join(dashDir, 'app.ts'), 'console.log("v1");\n');
    run('git', ['add', 'app.ts']);
    run('git', ['commit', '-m', 'initial']);

    run('git', ['checkout', '-b', 'feature/dashboard-test']);
    fs.writeFileSync(path.join(dashDir, 'app.ts'), 'console.log("v2");\n');
    run('git', ['add', 'app.ts']);
    run('git', ['commit', '-m', 'feat: update']);

    // Get HEAD commit for review entries
    const headResult = spawnSync('git', ['rev-parse', '--short', 'HEAD'], { cwd: dashDir, stdio: 'pipe' });
    const commit = headResult.stdout.toString().trim();

    // Pre-populate review log with autoplan-sourced entries
    // gstack-review-read reads from ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl
    // For the test, we'll write a mock gstack-review-read script that returns our test data
    const timestamp = new Date().toISOString().replace(/\.\d{3}Z$/, 'Z');
    const reviewData = [
      `{"skill":"plan-eng-review","timestamp":"${timestamp}","status":"clean","unresolved":0,"critical_gaps":0,"issues_found":0,"mode":"FULL_REVIEW","via":"autoplan","commit":"${commit}"}`,
      `{"skill":"plan-ceo-review","timestamp":"${timestamp}","status":"clean","unresolved":0,"critical_gaps":0,"mode":"SELECTIVE_EXPANSION","via":"autoplan","commit":"${commit}"}`,
      `{"skill":"codex-plan-review","timestamp":"${timestamp}","status":"clean","source":"codex","commit":"${commit}"}`,
    ].join('\n');

    // Write a mock gstack-review-read that returns our test data
    const mockBinDir = path.join(dashDir, '.mock-bin');
    fs.mkdirSync(mockBinDir, { recursive: true });
    fs.writeFileSync(path.join(mockBinDir, 'gstack-review-read'), [
      '#!/usr/bin/env bash',
      `echo '${reviewData.split('\n').join("'\necho '")}'`,
      'echo "---CONFIG---"',
      'echo "false"',
      'echo "---HEAD---"',
      `echo "${commit}"`,
    ].join('\n'));
    fs.chmodSync(path.join(mockBinDir, 'gstack-review-read'), 0o755);

    // Copy ship skill
    fs.copyFileSync(path.join(ROOT, 'ship', 'SKILL.md'), path.join(dashDir, 'ship-SKILL.md'));
  });

  afterAll(() => {
    try { fs.rmSync(dashDir, { recursive: true, force: true }); } catch {}
  });

  testConcurrentIfSelected('review-dashboard-via', async () => {
    const mockBinDir = path.join(dashDir, '.mock-bin');

    const result = await runSkillTest({
      prompt: `Read ship-SKILL.md. You only need to run the Review Readiness Dashboard section.

Instead of running ~/.claude/skills/gstack/bin/gstack-review-read, run this mock: ${mockBinDir}/gstack-review-read

Parse the output and display the dashboard table. Pay attention to:
1. The "via" field in entries — show source attribution (e.g., "via /autoplan")
2. The codex-plan-review entry — it should populate the Outside Voice row
3. Since Eng Review IS clear, there should be NO gate blocking — just display the dashboard

Skip the preamble, lake intro, telemetry, and all other ship steps.
Write the dashboard output to ${dashDir}/dashboard-output.md`,
      workingDirectory: dashDir,
      maxTurns: 12,
      timeout: 90_000,
      testName: 'review-dashboard-via',
      runId,
    });

    logCost('/ship dashboard-via', result);
    recordE2E(evalCollector, '/ship review dashboard via attribution', 'Dashboard via field', result);
    expect(result.exitReason).toBe('success');

    // Check dashboard output for via attribution
    const dashPath = path.join(dashDir, 'dashboard-output.md');
    const allOutput = [
      result.output || '',
      ...result.toolCalls.map(tc => tc.output || ''),
    ].join('\n').toLowerCase();

    // Verify via attribution appears somewhere (conversation or file)
    let dashContent = '';
    if (fs.existsSync(dashPath)) {
      dashContent = fs.readFileSync(dashPath, 'utf-8').toLowerCase();
    }
    const combined = allOutput + dashContent;

    // Should mention autoplan attribution
    expect(combined).toMatch(/autoplan/);
    // Should show eng review as CLEAR (it has a clean entry)
    expect(combined).toMatch(/clear/i);
    // Should NOT contain AskUserQuestion gate (no blocking)
    const gateQuestions = result.toolCalls.filter(tc =>
      tc.tool === 'mcp__conductor__AskUserQuestion' ||
      (tc.tool === 'AskUserQuestion')
    );
    // Ship dashboard should not gate when eng review is clear
    expect(gateQuestions).toHaveLength(0);
  }, 120_000);
});

// Module-level afterAll — finalize eval collector after all tests complete
afterAll(async () => {
  await finalizeEvalCollector(evalCollector);