From 997f7b1da6a19879fa5bc79c4fe5f71900b8c19f Mon Sep 17 00:00:00 2001 From: Garry Tan Date: Thu, 26 Mar 2026 08:31:53 -0600 Subject: [PATCH] =?UTF-8?q?fix:=20review=20log=20architecture=20=E2=80=94?= =?UTF-8?q?=20close=20gaps,=20add=20attribution=20(v0.11.21.0)=20(#512)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * fix: review log architecture — close gaps, fix orphans, add attribution - Ship Step 3.5 now logs its code review to the review log (via:"ship") - Remove eng review gate — ship runs its own review in Step 3.5 - Dashboard Outside Voice row mapped to codex-plan-review - Dashboard shows via source attribution (e.g., "via /autoplan") - land-and-deploy checks all 8 review skill types (was 5) - codex-review log gets commit field for staleness detection - autoplan uses placeholder tokens instead of hardcoded "clean" - Document autoplan-voices as audit-trail-only in review.ts - E2E test for dashboard via attribution * chore: bump version and changelog (v0.11.21.0) Co-Authored-By: Claude Opus 4.6 --------- Co-authored-by: Claude Opus 4.6 --- CHANGELOG.md | 11 ++++ VERSION | 2 +- autoplan/SKILL.md | 12 ++-- autoplan/SKILL.md.tmpl | 12 ++-- codex/SKILL.md | 2 +- codex/SKILL.md.tmpl | 2 +- land-and-deploy/SKILL.md | 5 +- land-and-deploy/SKILL.md.tmpl | 5 +- plan-ceo-review/SKILL.md | 8 ++- plan-design-review/SKILL.md | 8 ++- plan-eng-review/SKILL.md | 8 ++- scripts/resolvers/review.ts | 8 ++- ship/SKILL.md | 40 ++++++------ ship/SKILL.md.tmpl | 32 +++++----- test/helpers/touchfiles.ts | 1 + test/skill-e2e-review.test.ts | 113 ++++++++++++++++++++++++++++++++++ 16 files changed, 209 insertions(+), 60 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index acbc55cd751abe537b9dd125ab6110fc23aa9638..68199eb111d73c571a2f251961414e1fdd75e38d 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,16 @@ # Changelog +## [0.11.21.0] - 2026-03-26 + +### Fixed + +- **`/autoplan` reviews now count toward the ship readiness gate.** When `/autoplan` ran full CEO + Design + Eng reviews, `/ship` still showed "0 runs" for Eng Review because autoplan-logged entries weren't being read correctly. Now the dashboard shows source attribution (e.g., "CLEAR (PLAN via /autoplan)") so you can see exactly which tool satisfied each review. +- **`/ship` no longer tells you to "run /review first."** Ship runs its own pre-landing review in Step 3.5 — asking you to run the same review separately was redundant. The gate is removed; ship just does it. +- **`/land-and-deploy` now checks all 8 review types.** Previously missed `review`, `adversarial-review`, and `codex-plan-review` — if you only ran `/review` (not `/plan-eng-review`), land-and-deploy wouldn't see it. +- **Dashboard Outside Voice row now works.** Was showing "0 runs" even after outside voices ran in `/plan-ceo-review` or `/plan-eng-review`. Now correctly maps to `codex-plan-review` entries. +- **`/codex review` now tracks staleness.** Added the `commit` field to codex review log entries so the dashboard can detect when a codex review is outdated. +- **`/autoplan` no longer hardcodes "clean" status.** Review log entries from autoplan used to always record `status:"clean"` even when issues were found. Now uses proper placeholder tokens that Claude substitutes with real values. + ## [0.11.20.0] - 2026-03-26 ### Added diff --git a/VERSION b/VERSION index 508c698a966d502b78ddb1d1051cdcb7f8f83642..5e1d8ddf6488156d939d7a987005ccb8aab59ea5 100644 --- a/VERSION +++ b/VERSION @@ -1 +1 @@ -0.11.20.0 +0.11.21.0 diff --git a/autoplan/SKILL.md b/autoplan/SKILL.md index d69fc285b101980c46ae6d0c36cd80ed0fb42c9f..aee5d372e77f063fde3b802c81db47bda0c215e6 100644 --- a/autoplan/SKILL.md +++ b/autoplan/SKILL.md @@ -929,24 +929,24 @@ AskUserQuestion options: ## Completion: Write Review Logs -On approval, write 3 separate review log entries so /ship's dashboard recognizes them: +On approval, write 3 separate review log entries so /ship's dashboard recognizes them. +Replace TIMESTAMP, STATUS, and N with actual values from each review phase. +STATUS is "clean" if no unresolved issues, "issues_open" otherwise. ```bash COMMIT=$(git rev-parse --short HEAD 2>/dev/null) TIMESTAMP=$(date -u +%Y-%m-%dT%H:%M:%SZ) -~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-ceo-review","timestamp":"'"$TIMESTAMP"'","status":"clean","unresolved":0,"critical_gaps":0,"mode":"SELECTIVE_EXPANSION","via":"autoplan","commit":"'"$COMMIT"'"}' +~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-ceo-review","timestamp":"'"$TIMESTAMP"'","status":"STATUS","unresolved":N,"critical_gaps":N,"mode":"SELECTIVE_EXPANSION","via":"autoplan","commit":"'"$COMMIT"'"}' -~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-eng-review","timestamp":"'"$TIMESTAMP"'","status":"clean","unresolved":0,"critical_gaps":0,"issues_found":0,"mode":"FULL_REVIEW","via":"autoplan","commit":"'"$COMMIT"'"}' +~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-eng-review","timestamp":"'"$TIMESTAMP"'","status":"STATUS","unresolved":N,"critical_gaps":N,"issues_found":N,"mode":"FULL_REVIEW","via":"autoplan","commit":"'"$COMMIT"'"}' ``` If Phase 2 ran (UI scope): ```bash -~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-design-review","timestamp":"'"$TIMESTAMP"'","status":"clean","unresolved":0,"via":"autoplan","commit":"'"$COMMIT"'"}' +~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-design-review","timestamp":"'"$TIMESTAMP"'","status":"STATUS","unresolved":N,"via":"autoplan","commit":"'"$COMMIT"'"}' ``` -Replace field values with actual counts from the review. - Dual voice logs (one per phase that ran): ```bash ~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"autoplan-voices","timestamp":"'"$TIMESTAMP"'","status":"STATUS","source":"SOURCE","phase":"ceo","via":"autoplan","consensus_confirmed":N,"consensus_disagree":N,"commit":"'"$COMMIT"'"}' diff --git a/autoplan/SKILL.md.tmpl b/autoplan/SKILL.md.tmpl index 661e8fb016737a5443946eca19921bf12dc66297..7cf78ced3ef5dbc7e404ffcaf3c5ad3064adbe7f 100644 --- a/autoplan/SKILL.md.tmpl +++ b/autoplan/SKILL.md.tmpl @@ -584,24 +584,24 @@ AskUserQuestion options: ## Completion: Write Review Logs -On approval, write 3 separate review log entries so /ship's dashboard recognizes them: +On approval, write 3 separate review log entries so /ship's dashboard recognizes them. +Replace TIMESTAMP, STATUS, and N with actual values from each review phase. +STATUS is "clean" if no unresolved issues, "issues_open" otherwise. ```bash COMMIT=$(git rev-parse --short HEAD 2>/dev/null) TIMESTAMP=$(date -u +%Y-%m-%dT%H:%M:%SZ) -~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-ceo-review","timestamp":"'"$TIMESTAMP"'","status":"clean","unresolved":0,"critical_gaps":0,"mode":"SELECTIVE_EXPANSION","via":"autoplan","commit":"'"$COMMIT"'"}' +~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-ceo-review","timestamp":"'"$TIMESTAMP"'","status":"STATUS","unresolved":N,"critical_gaps":N,"mode":"SELECTIVE_EXPANSION","via":"autoplan","commit":"'"$COMMIT"'"}' -~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-eng-review","timestamp":"'"$TIMESTAMP"'","status":"clean","unresolved":0,"critical_gaps":0,"issues_found":0,"mode":"FULL_REVIEW","via":"autoplan","commit":"'"$COMMIT"'"}' +~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-eng-review","timestamp":"'"$TIMESTAMP"'","status":"STATUS","unresolved":N,"critical_gaps":N,"issues_found":N,"mode":"FULL_REVIEW","via":"autoplan","commit":"'"$COMMIT"'"}' ``` If Phase 2 ran (UI scope): ```bash -~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-design-review","timestamp":"'"$TIMESTAMP"'","status":"clean","unresolved":0,"via":"autoplan","commit":"'"$COMMIT"'"}' +~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-design-review","timestamp":"'"$TIMESTAMP"'","status":"STATUS","unresolved":N,"via":"autoplan","commit":"'"$COMMIT"'"}' ``` -Replace field values with actual counts from the review. - Dual voice logs (one per phase that ran): ```bash ~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"autoplan-voices","timestamp":"'"$TIMESTAMP"'","status":"STATUS","source":"SOURCE","phase":"ceo","via":"autoplan","consensus_confirmed":N,"consensus_disagree":N,"commit":"'"$COMMIT"'"}' diff --git a/codex/SKILL.md b/codex/SKILL.md index 6e19cd0426a39fbfeabd6e028f3eeb1d48c0164f..ec9eea7c5a170104a2fcfe46d078eeae8e4370c0 100644 --- a/codex/SKILL.md +++ b/codex/SKILL.md @@ -423,7 +423,7 @@ CROSS-MODEL ANALYSIS: 7. Persist the review result: ```bash -~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"codex-review","timestamp":"TIMESTAMP","status":"STATUS","gate":"GATE","findings":N,"findings_fixed":N}' +~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"codex-review","timestamp":"TIMESTAMP","status":"STATUS","gate":"GATE","findings":N,"findings_fixed":N,"commit":"'"$(git rev-parse --short HEAD)"'"}' ``` Substitute: TIMESTAMP (ISO 8601), STATUS ("clean" if PASS, "issues_found" if FAIL), diff --git a/codex/SKILL.md.tmpl b/codex/SKILL.md.tmpl index 338df93b6afc15747e3f927c0969bf379922fb63..77021c8237e7a4ef7cef92d18fd847dd2ee526d0 100644 --- a/codex/SKILL.md.tmpl +++ b/codex/SKILL.md.tmpl @@ -127,7 +127,7 @@ CROSS-MODEL ANALYSIS: 7. Persist the review result: ```bash -~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"codex-review","timestamp":"TIMESTAMP","status":"STATUS","gate":"GATE","findings":N,"findings_fixed":N}' +~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"codex-review","timestamp":"TIMESTAMP","status":"STATUS","gate":"GATE","findings":N,"findings_fixed":N,"commit":"'"$(git rev-parse --short HEAD)"'"}' ``` Substitute: TIMESTAMP (ISO 8601), STATUS ("clean" if PASS, "issues_found" if FAIL), diff --git a/land-and-deploy/SKILL.md b/land-and-deploy/SKILL.md index 131c1f2d35cfb5087ca87278370336fd9abb50e2..d5f2c8d6411a2389634edaa5c5c817235e699fde 100644 --- a/land-and-deploy/SKILL.md +++ b/land-and-deploy/SKILL.md @@ -447,7 +447,8 @@ Collect evidence for each check below. Track warnings (yellow) and blockers (red ``` Parse the output. For each review skill (plan-eng-review, plan-ceo-review, -plan-design-review, design-review-lite, codex-review): +plan-design-review, design-review-lite, codex-review, review, adversarial-review, +codex-plan-review): 1. Find the most recent entry within the last 7 days. 2. Extract its `commit` field. @@ -594,7 +595,7 @@ Use AskUserQuestion: - C) Merge anyway — I understand the risks (Completeness: 3/10) If the user chooses B: **STOP.** List exactly what needs to be done: -- If reviews are stale: "Re-run /plan-eng-review (or /review) to review current code." +- If reviews are stale: "Re-run `/plan-eng-review`, `/review`, or `/autoplan` to review current code." - If E2E not run: "Run `bun run test:e2e` to verify." - If docs not updated: "Run /document-release to update documentation." - If PR body stale: "Update the PR body to reflect current changes." diff --git a/land-and-deploy/SKILL.md.tmpl b/land-and-deploy/SKILL.md.tmpl index 7fcf67975754c1d3b234756e8c65602ebb93bd1c..2af2acba263484054a6140a189f983e193abc53a 100644 --- a/land-and-deploy/SKILL.md.tmpl +++ b/land-and-deploy/SKILL.md.tmpl @@ -134,7 +134,8 @@ Collect evidence for each check below. Track warnings (yellow) and blockers (red ``` Parse the output. For each review skill (plan-eng-review, plan-ceo-review, -plan-design-review, design-review-lite, codex-review): +plan-design-review, design-review-lite, codex-review, review, adversarial-review, +codex-plan-review): 1. Find the most recent entry within the last 7 days. 2. Extract its `commit` field. @@ -281,7 +282,7 @@ Use AskUserQuestion: - C) Merge anyway — I understand the risks (Completeness: 3/10) If the user chooses B: **STOP.** List exactly what needs to be done: -- If reviews are stale: "Re-run /plan-eng-review (or /review) to review current code." +- If reviews are stale: "Re-run `/plan-eng-review`, `/review`, or `/autoplan` to review current code." - If E2E not run: "Run `bun run test:e2e` to verify." - If docs not updated: "Run /document-release to update documentation." - If PR body stale: "Update the PR body to reflect current changes." diff --git a/plan-ceo-review/SKILL.md b/plan-ceo-review/SKILL.md index d05be05f2949d58ddc2aa08d100397a0717817e1..c092ebc15ac503b9769ea10acc7fe9ce60556b1d 100644 --- a/plan-ceo-review/SKILL.md +++ b/plan-ceo-review/SKILL.md @@ -1262,7 +1262,13 @@ After completing the review, read the review log and config to display the dashb ~/.claude/skills/gstack/bin/gstack-review-read ``` -Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, review, plan-design-review, design-review-lite, adversarial-review, codex-review, codex-plan-review). Ignore entries with timestamps older than 7 days. For the Eng Review row, show whichever is more recent between `review` (diff-scoped pre-landing review) and `plan-eng-review` (plan-stage architecture review). Append "(DIFF)" or "(PLAN)" to the status to distinguish. For the Adversarial row, show whichever is more recent between `adversarial-review` (new auto-scaled) and `codex-review` (legacy). For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display: +Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, review, plan-design-review, design-review-lite, adversarial-review, codex-review, codex-plan-review). Ignore entries with timestamps older than 7 days. For the Eng Review row, show whichever is more recent between `review` (diff-scoped pre-landing review) and `plan-eng-review` (plan-stage architecture review). Append "(DIFF)" or "(PLAN)" to the status to distinguish. For the Adversarial row, show whichever is more recent between `adversarial-review` (new auto-scaled) and `codex-review` (legacy). For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. For the Outside Voice row, show the most recent `codex-plan-review` entry — this captures outside voices from both /plan-ceo-review and /plan-eng-review. + +**Source attribution:** If the most recent entry for a skill has a \`"via"\` field, append it to the status label in parentheses. Examples: `plan-eng-review` with `via:"autoplan"` shows as "CLEAR (PLAN via /autoplan)". `review` with `via:"ship"` shows as "CLEAR (DIFF via /ship)". Entries without a `via` field show as "CLEAR (PLAN)" or "CLEAR (DIFF)" as before. + +Note: `autoplan-voices` and `design-outside-voices` entries are audit-trail-only (forensic data for cross-model consensus analysis). They do not appear in the dashboard and are not checked by any consumer. + +Display: ``` +====================================================================+ diff --git a/plan-design-review/SKILL.md b/plan-design-review/SKILL.md index 5960ea18c987403e5c63e4057d8eddc4795ce29f..3ff7d9f87afc9f69df882edc317352dca3386b75 100644 --- a/plan-design-review/SKILL.md +++ b/plan-design-review/SKILL.md @@ -768,7 +768,13 @@ After completing the review, read the review log and config to display the dashb ~/.claude/skills/gstack/bin/gstack-review-read ``` -Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, review, plan-design-review, design-review-lite, adversarial-review, codex-review, codex-plan-review). Ignore entries with timestamps older than 7 days. For the Eng Review row, show whichever is more recent between `review` (diff-scoped pre-landing review) and `plan-eng-review` (plan-stage architecture review). Append "(DIFF)" or "(PLAN)" to the status to distinguish. For the Adversarial row, show whichever is more recent between `adversarial-review` (new auto-scaled) and `codex-review` (legacy). For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display: +Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, review, plan-design-review, design-review-lite, adversarial-review, codex-review, codex-plan-review). Ignore entries with timestamps older than 7 days. For the Eng Review row, show whichever is more recent between `review` (diff-scoped pre-landing review) and `plan-eng-review` (plan-stage architecture review). Append "(DIFF)" or "(PLAN)" to the status to distinguish. For the Adversarial row, show whichever is more recent between `adversarial-review` (new auto-scaled) and `codex-review` (legacy). For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. For the Outside Voice row, show the most recent `codex-plan-review` entry — this captures outside voices from both /plan-ceo-review and /plan-eng-review. + +**Source attribution:** If the most recent entry for a skill has a \`"via"\` field, append it to the status label in parentheses. Examples: `plan-eng-review` with `via:"autoplan"` shows as "CLEAR (PLAN via /autoplan)". `review` with `via:"ship"` shows as "CLEAR (DIFF via /ship)". Entries without a `via` field show as "CLEAR (PLAN)" or "CLEAR (DIFF)" as before. + +Note: `autoplan-voices` and `design-outside-voices` entries are audit-trail-only (forensic data for cross-model consensus analysis). They do not appear in the dashboard and are not checked by any consumer. + +Display: ``` +====================================================================+ diff --git a/plan-eng-review/SKILL.md b/plan-eng-review/SKILL.md index 0b61d5f6683ae616854a37a85d54680f536c0662..5b57c16f66e5774c07ea3e6a70772ecd86adf74c 100644 --- a/plan-eng-review/SKILL.md +++ b/plan-eng-review/SKILL.md @@ -870,7 +870,13 @@ After completing the review, read the review log and config to display the dashb ~/.claude/skills/gstack/bin/gstack-review-read ``` -Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, review, plan-design-review, design-review-lite, adversarial-review, codex-review, codex-plan-review). Ignore entries with timestamps older than 7 days. For the Eng Review row, show whichever is more recent between `review` (diff-scoped pre-landing review) and `plan-eng-review` (plan-stage architecture review). Append "(DIFF)" or "(PLAN)" to the status to distinguish. For the Adversarial row, show whichever is more recent between `adversarial-review` (new auto-scaled) and `codex-review` (legacy). For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display: +Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, review, plan-design-review, design-review-lite, adversarial-review, codex-review, codex-plan-review). Ignore entries with timestamps older than 7 days. For the Eng Review row, show whichever is more recent between `review` (diff-scoped pre-landing review) and `plan-eng-review` (plan-stage architecture review). Append "(DIFF)" or "(PLAN)" to the status to distinguish. For the Adversarial row, show whichever is more recent between `adversarial-review` (new auto-scaled) and `codex-review` (legacy). For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. For the Outside Voice row, show the most recent `codex-plan-review` entry — this captures outside voices from both /plan-ceo-review and /plan-eng-review. + +**Source attribution:** If the most recent entry for a skill has a \`"via"\` field, append it to the status label in parentheses. Examples: `plan-eng-review` with `via:"autoplan"` shows as "CLEAR (PLAN via /autoplan)". `review` with `via:"ship"` shows as "CLEAR (DIFF via /ship)". Entries without a `via` field show as "CLEAR (PLAN)" or "CLEAR (DIFF)" as before. + +Note: `autoplan-voices` and `design-outside-voices` entries are audit-trail-only (forensic data for cross-model consensus analysis). They do not appear in the dashboard and are not checked by any consumer. + +Display: ``` +====================================================================+ diff --git a/scripts/resolvers/review.ts b/scripts/resolvers/review.ts index 2b83f36d40dbc4372af09dfb48316db59e19c639..86da3b860b9ff6632bea73e624a4ace8d5a5f665 100644 --- a/scripts/resolvers/review.ts +++ b/scripts/resolvers/review.ts @@ -9,7 +9,13 @@ After completing the review, read the review log and config to display the dashb ~/.claude/skills/gstack/bin/gstack-review-read \`\`\` -Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, review, plan-design-review, design-review-lite, adversarial-review, codex-review, codex-plan-review). Ignore entries with timestamps older than 7 days. For the Eng Review row, show whichever is more recent between \`review\` (diff-scoped pre-landing review) and \`plan-eng-review\` (plan-stage architecture review). Append "(DIFF)" or "(PLAN)" to the status to distinguish. For the Adversarial row, show whichever is more recent between \`adversarial-review\` (new auto-scaled) and \`codex-review\` (legacy). For Design Review, show whichever is more recent between \`plan-design-review\` (full visual audit) and \`design-review-lite\` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display: +Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, review, plan-design-review, design-review-lite, adversarial-review, codex-review, codex-plan-review). Ignore entries with timestamps older than 7 days. For the Eng Review row, show whichever is more recent between \`review\` (diff-scoped pre-landing review) and \`plan-eng-review\` (plan-stage architecture review). Append "(DIFF)" or "(PLAN)" to the status to distinguish. For the Adversarial row, show whichever is more recent between \`adversarial-review\` (new auto-scaled) and \`codex-review\` (legacy). For Design Review, show whichever is more recent between \`plan-design-review\` (full visual audit) and \`design-review-lite\` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. For the Outside Voice row, show the most recent \`codex-plan-review\` entry — this captures outside voices from both /plan-ceo-review and /plan-eng-review. + +**Source attribution:** If the most recent entry for a skill has a \\\`"via"\\\` field, append it to the status label in parentheses. Examples: \`plan-eng-review\` with \`via:"autoplan"\` shows as "CLEAR (PLAN via /autoplan)". \`review\` with \`via:"ship"\` shows as "CLEAR (DIFF via /ship)". Entries without a \`via\` field show as "CLEAR (PLAN)" or "CLEAR (DIFF)" as before. + +Note: \`autoplan-voices\` and \`design-outside-voices\` entries are audit-trail-only (forensic data for cross-model consensus analysis). They do not appear in the dashboard and are not checked by any consumer. + +Display: \`\`\` +====================================================================+ diff --git a/ship/SKILL.md b/ship/SKILL.md index 8999bf84020eab2d7210b353e7c39a4f59de4bb1..0fbc474fa2353f083a7511cecd9fde353dc5e6ae 100644 --- a/ship/SKILL.md +++ b/ship/SKILL.md @@ -364,7 +364,13 @@ After completing the review, read the review log and config to display the dashb ~/.claude/skills/gstack/bin/gstack-review-read ``` -Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, review, plan-design-review, design-review-lite, adversarial-review, codex-review, codex-plan-review). Ignore entries with timestamps older than 7 days. For the Eng Review row, show whichever is more recent between `review` (diff-scoped pre-landing review) and `plan-eng-review` (plan-stage architecture review). Append "(DIFF)" or "(PLAN)" to the status to distinguish. For the Adversarial row, show whichever is more recent between `adversarial-review` (new auto-scaled) and `codex-review` (legacy). For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display: +Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, review, plan-design-review, design-review-lite, adversarial-review, codex-review, codex-plan-review). Ignore entries with timestamps older than 7 days. For the Eng Review row, show whichever is more recent between `review` (diff-scoped pre-landing review) and `plan-eng-review` (plan-stage architecture review). Append "(DIFF)" or "(PLAN)" to the status to distinguish. For the Adversarial row, show whichever is more recent between `adversarial-review` (new auto-scaled) and `codex-review` (legacy). For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. For the Outside Voice row, show the most recent `codex-plan-review` entry — this captures outside voices from both /plan-ceo-review and /plan-eng-review. + +**Source attribution:** If the most recent entry for a skill has a \`"via"\` field, append it to the status label in parentheses. Examples: `plan-eng-review` with `via:"autoplan"` shows as "CLEAR (PLAN via /autoplan)". `review` with `via:"ship"` shows as "CLEAR (DIFF via /ship)". Entries without a `via` field show as "CLEAR (PLAN)" or "CLEAR (DIFF)" as before. + +Note: `autoplan-voices` and `design-outside-voices` entries are audit-trail-only (forensic data for cross-model consensus analysis). They do not appear in the dashboard and are not checked by any consumer. + +Display: ``` +====================================================================+ @@ -403,26 +409,15 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl If the Eng Review is NOT "CLEAR": -1. **Check for a prior override on this branch:** - ```bash - eval "$(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)" - grep '"skill":"ship-review-override"' ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl 2>/dev/null || echo "NO_OVERRIDE" - ``` - If an override exists, display the dashboard and note "Review gate previously accepted — continuing." Do NOT ask again. +Print: "No prior eng review found — ship will run its own pre-landing review in Step 3.5." -2. **If no override exists,** use AskUserQuestion: - - Show that Eng Review is missing or has open issues - - RECOMMENDATION: Choose C if the change is obviously trivial (< 20 lines, typo fix, config-only); Choose B for larger changes - - Options: A) Ship anyway B) Abort — run /review or /plan-eng-review first C) Change is too small to need eng review - - If CEO Review is missing, mention as informational ("CEO Review not run — recommended for product changes") but do NOT block - - For Design Review: run `source <(~/.claude/skills/gstack/bin/gstack-diff-scope 2>/dev/null)`. If `SCOPE_FRONTEND=true` and no design review (plan-design-review or design-review-lite) exists in the dashboard, mention: "Design Review not run — this PR changes frontend code. The lite design check will run automatically in Step 3.5, but consider running /design-review for a full visual audit post-implementation." Still never block. +Check diff size: `git diff ...HEAD --stat | tail -1`. If the diff is >200 lines, add: "Note: This is a large diff. Consider running `/plan-eng-review` or `/autoplan` for architecture-level review before shipping." -3. **If the user chooses A or C,** persist the decision so future `/ship` runs on this branch skip the gate: - ```bash - eval "$(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)" - echo '{"skill":"ship-review-override","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","decision":"USER_CHOICE"}' >> ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl - ``` - Substitute USER_CHOICE with "ship_anyway" or "not_relevant". +If CEO Review is missing, mention as informational ("CEO Review not run — recommended for product changes") but do NOT block. + +For Design Review: run `source <(~/.claude/skills/gstack/bin/gstack-diff-scope 2>/dev/null)`. If `SCOPE_FRONTEND=true` and no design review (plan-design-review or design-review-lite) exists in the dashboard, mention: "Design Review not run — this PR changes frontend code. The lite design check will run automatically in Step 3.5, but consider running /design-review for a full visual audit post-implementation." Still never block. + +Continue to Step 1.5 — do NOT block or ask. Ship runs its own review in Step 3.5. --- @@ -1340,6 +1335,13 @@ Present Codex output under a `CODEX (design):` header, merged with the checklist If no issues found: `Pre-Landing Review: No issues found.` +9. Persist the review result to the review log: +```bash +~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"review","timestamp":"TIMESTAMP","status":"STATUS","issues_found":N,"critical":N,"informational":N,"commit":"'"$(git rev-parse --short HEAD)"'","via":"ship"}' +``` +Substitute TIMESTAMP (ISO 8601), STATUS ("clean" if no issues, "issues_found" otherwise), +and N values from the summary counts above. The `via:"ship"` distinguishes from standalone `/review` runs. + Save the review output — it goes into the PR body in Step 8. --- diff --git a/ship/SKILL.md.tmpl b/ship/SKILL.md.tmpl index d630e330803cb28c0bf122f2b38acebc22f107b1..7f545cd97a6fe77bc5504329c34934bc10e63393 100644 --- a/ship/SKILL.md.tmpl +++ b/ship/SKILL.md.tmpl @@ -64,26 +64,15 @@ You are running the `/ship` workflow. This is a **non-interactive, fully automat If the Eng Review is NOT "CLEAR": -1. **Check for a prior override on this branch:** - ```bash - {{SLUG_EVAL}} - grep '"skill":"ship-review-override"' ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl 2>/dev/null || echo "NO_OVERRIDE" - ``` - If an override exists, display the dashboard and note "Review gate previously accepted — continuing." Do NOT ask again. +Print: "No prior eng review found — ship will run its own pre-landing review in Step 3.5." -2. **If no override exists,** use AskUserQuestion: - - Show that Eng Review is missing or has open issues - - RECOMMENDATION: Choose C if the change is obviously trivial (< 20 lines, typo fix, config-only); Choose B for larger changes - - Options: A) Ship anyway B) Abort — run /review or /plan-eng-review first C) Change is too small to need eng review - - If CEO Review is missing, mention as informational ("CEO Review not run — recommended for product changes") but do NOT block - - For Design Review: run `source <(~/.claude/skills/gstack/bin/gstack-diff-scope 2>/dev/null)`. If `SCOPE_FRONTEND=true` and no design review (plan-design-review or design-review-lite) exists in the dashboard, mention: "Design Review not run — this PR changes frontend code. The lite design check will run automatically in Step 3.5, but consider running /design-review for a full visual audit post-implementation." Still never block. +Check diff size: `git diff ...HEAD --stat | tail -1`. If the diff is >200 lines, add: "Note: This is a large diff. Consider running `/plan-eng-review` or `/autoplan` for architecture-level review before shipping." -3. **If the user chooses A or C,** persist the decision so future `/ship` runs on this branch skip the gate: - ```bash - {{SLUG_EVAL}} - echo '{"skill":"ship-review-override","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","decision":"USER_CHOICE"}' >> ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl - ``` - Substitute USER_CHOICE with "ship_anyway" or "not_relevant". +If CEO Review is missing, mention as informational ("CEO Review not run — recommended for product changes") but do NOT block. + +For Design Review: run `source <(~/.claude/skills/gstack/bin/gstack-diff-scope 2>/dev/null)`. If `SCOPE_FRONTEND=true` and no design review (plan-design-review or design-review-lite) exists in the dashboard, mention: "Design Review not run — this PR changes frontend code. The lite design check will run automatically in Step 3.5, but consider running /design-review for a full visual audit post-implementation." Still never block. + +Continue to Step 1.5 — do NOT block or ask. Ship runs its own review in Step 3.5. --- @@ -275,6 +264,13 @@ Review the diff for structural issues that tests don't catch. If no issues found: `Pre-Landing Review: No issues found.` +9. Persist the review result to the review log: +```bash +~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"review","timestamp":"TIMESTAMP","status":"STATUS","issues_found":N,"critical":N,"informational":N,"commit":"'"$(git rev-parse --short HEAD)"'","via":"ship"}' +``` +Substitute TIMESTAMP (ISO 8601), STATUS ("clean" if no issues, "issues_found" otherwise), +and N values from the summary counts above. The `via:"ship"` distinguishes from standalone `/review` runs. + Save the review output — it goes into the PR body in Step 8. --- diff --git a/test/helpers/touchfiles.ts b/test/helpers/touchfiles.ts index 585e9dd30c6c69e76858f46ef76c10a7dbd08d23..d1a0fa5701a80384876bed5003895f99e479188e 100644 --- a/test/helpers/touchfiles.ts +++ b/test/helpers/touchfiles.ts @@ -79,6 +79,7 @@ export const E2E_TOUCHFILES: Record = { // Ship 'ship-base-branch': ['ship/**', 'bin/gstack-repo-mode'], 'ship-local-workflow': ['ship/**', 'scripts/gen-skill-docs.ts'], + 'review-dashboard-via': ['ship/**', 'scripts/resolvers/review.ts', 'codex/**', 'autoplan/**', 'land-and-deploy/**'], 'ship-plan-completion': ['ship/**', 'scripts/gen-skill-docs.ts'], 'ship-plan-verification': ['ship/**', 'scripts/gen-skill-docs.ts'], diff --git a/test/skill-e2e-review.test.ts b/test/skill-e2e-review.test.ts index b1d5442df87e6d61b9ad81e28e4316a137bef90f..b5ad501ce668b86c715c76e3da7b9ae132f1a5f6 100644 --- a/test/skill-e2e-review.test.ts +++ b/test/skill-e2e-review.test.ts @@ -529,6 +529,119 @@ Analyze the git history and produce the narrative report as described in the SKI }, 420_000); }); +// --- Review Dashboard Via Attribution E2E --- + +describeIfSelected('Review Dashboard Via Attribution', ['review-dashboard-via'], () => { + let dashDir: string; + + beforeAll(() => { + dashDir = fs.mkdtempSync(path.join(os.tmpdir(), 'skill-e2e-dashboard-via-')); + const run = (cmd: string, args: string[], cwd = dashDir) => + spawnSync(cmd, args, { cwd, stdio: 'pipe', timeout: 5000 }); + + // Create git repo with feature branch + run('git', ['init', '-b', 'main']); + run('git', ['config', 'user.email', 'test@test.com']); + run('git', ['config', 'user.name', 'Test']); + + fs.writeFileSync(path.join(dashDir, 'app.ts'), 'console.log("v1");\n'); + run('git', ['add', 'app.ts']); + run('git', ['commit', '-m', 'initial']); + + run('git', ['checkout', '-b', 'feature/dashboard-test']); + fs.writeFileSync(path.join(dashDir, 'app.ts'), 'console.log("v2");\n'); + run('git', ['add', 'app.ts']); + run('git', ['commit', '-m', 'feat: update']); + + // Get HEAD commit for review entries + const headResult = spawnSync('git', ['rev-parse', '--short', 'HEAD'], { cwd: dashDir, stdio: 'pipe' }); + const commit = headResult.stdout.toString().trim(); + + // Pre-populate review log with autoplan-sourced entries + // gstack-review-read reads from ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl + // For the test, we'll write a mock gstack-review-read script that returns our test data + const timestamp = new Date().toISOString().replace(/\.\d{3}Z$/, 'Z'); + const reviewData = [ + `{"skill":"plan-eng-review","timestamp":"${timestamp}","status":"clean","unresolved":0,"critical_gaps":0,"issues_found":0,"mode":"FULL_REVIEW","via":"autoplan","commit":"${commit}"}`, + `{"skill":"plan-ceo-review","timestamp":"${timestamp}","status":"clean","unresolved":0,"critical_gaps":0,"mode":"SELECTIVE_EXPANSION","via":"autoplan","commit":"${commit}"}`, + `{"skill":"codex-plan-review","timestamp":"${timestamp}","status":"clean","source":"codex","commit":"${commit}"}`, + ].join('\n'); + + // Write a mock gstack-review-read that returns our test data + const mockBinDir = path.join(dashDir, '.mock-bin'); + fs.mkdirSync(mockBinDir, { recursive: true }); + fs.writeFileSync(path.join(mockBinDir, 'gstack-review-read'), [ + '#!/usr/bin/env bash', + `echo '${reviewData.split('\n').join("'\necho '")}'`, + 'echo "---CONFIG---"', + 'echo "false"', + 'echo "---HEAD---"', + `echo "${commit}"`, + ].join('\n')); + fs.chmodSync(path.join(mockBinDir, 'gstack-review-read'), 0o755); + + // Copy ship skill + fs.copyFileSync(path.join(ROOT, 'ship', 'SKILL.md'), path.join(dashDir, 'ship-SKILL.md')); + }); + + afterAll(() => { + try { fs.rmSync(dashDir, { recursive: true, force: true }); } catch {} + }); + + testConcurrentIfSelected('review-dashboard-via', async () => { + const mockBinDir = path.join(dashDir, '.mock-bin'); + + const result = await runSkillTest({ + prompt: `Read ship-SKILL.md. You only need to run the Review Readiness Dashboard section. + +Instead of running ~/.claude/skills/gstack/bin/gstack-review-read, run this mock: ${mockBinDir}/gstack-review-read + +Parse the output and display the dashboard table. Pay attention to: +1. The "via" field in entries — show source attribution (e.g., "via /autoplan") +2. The codex-plan-review entry — it should populate the Outside Voice row +3. Since Eng Review IS clear, there should be NO gate blocking — just display the dashboard + +Skip the preamble, lake intro, telemetry, and all other ship steps. +Write the dashboard output to ${dashDir}/dashboard-output.md`, + workingDirectory: dashDir, + maxTurns: 12, + timeout: 90_000, + testName: 'review-dashboard-via', + runId, + }); + + logCost('/ship dashboard-via', result); + recordE2E(evalCollector, '/ship review dashboard via attribution', 'Dashboard via field', result); + expect(result.exitReason).toBe('success'); + + // Check dashboard output for via attribution + const dashPath = path.join(dashDir, 'dashboard-output.md'); + const allOutput = [ + result.output || '', + ...result.toolCalls.map(tc => tc.output || ''), + ].join('\n').toLowerCase(); + + // Verify via attribution appears somewhere (conversation or file) + let dashContent = ''; + if (fs.existsSync(dashPath)) { + dashContent = fs.readFileSync(dashPath, 'utf-8').toLowerCase(); + } + const combined = allOutput + dashContent; + + // Should mention autoplan attribution + expect(combined).toMatch(/autoplan/); + // Should show eng review as CLEAR (it has a clean entry) + expect(combined).toMatch(/clear/i); + // Should NOT contain AskUserQuestion gate (no blocking) + const gateQuestions = result.toolCalls.filter(tc => + tc.tool === 'mcp__conductor__AskUserQuestion' || + (tc.tool === 'AskUserQuestion') + ); + // Ship dashboard should not gate when eng review is clear + expect(gateQuestions).toHaveLength(0); + }, 120_000); +}); + // Module-level afterAll — finalize eval collector after all tests complete afterAll(async () => { await finalizeEvalCollector(evalCollector);