M CHANGELOG.md => CHANGELOG.md +6 -0
@@ 2,6 2,9 @@
## 0.4.5 — 2026-03-16
+- **Review findings now actually get fixed, not just listed.** `/review` and `/ship` used to print informational findings (dead code, test gaps, N+1 queries) and then ignore them. Now every finding gets action: obvious mechanical fixes are applied automatically, and genuinely ambiguous issues are batched into a single question instead of 8 separate prompts. You see `[AUTO-FIXED] [file:line] Problem → what was done` for each auto-fix.
+- **You control the line between "just fix it" and "ask me first."** Dead code, stale comments, N+1 queries get auto-fixed. Security issues, race conditions, design decisions get surfaced for your call. The classification lives in one place (`review/checklist.md`) so both `/review` and `/ship` stay in sync.
+
### Fixed
- **`$B js "const x = await fetch(...); return x.status"` now works.** The `js` command used to wrap everything as an expression — so `const`, semicolons, and multi-line code all broke. It now detects statements and uses a block wrapper, just like `eval` already did.
@@ 10,6 13,9 @@
### For contributors
+- Gate Classification → Severity Classification rename (severity determines presentation order, not whether you see a prompt).
+- Fix-First Heuristic section added to `review/checklist.md` — the canonical AUTO-FIX vs ASK classification.
+- New validation test: `Fix-First Heuristic exists in checklist and is referenced by review + ship`.
- Extracted `needsBlockWrapper()` and `wrapForEvaluate()` helpers in `read-commands.ts` — shared by both `js` and `eval` commands (DRY).
- Added `getRefRole()` to `BrowserManager` — exposes ARIA role for ref selectors without changing `resolveRef` return type.
- Click handler auto-routes `[role=option]` refs to `selectOption()` via parent `<select>`, with DOM `tagName` check to avoid blocking custom listbox components.
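
The statement-detection fix described above can be sketched as follows. `needsBlockWrapper()` and `wrapForEvaluate()` are the real helper names from `read-commands.ts`, but the keyword test and wrapper shapes here are assumptions for illustration, not the actual implementation:

```typescript
// Sketch of the js/eval wrapping idea, assuming a simple keyword/semicolon
// test. The real helpers in read-commands.ts may differ.
function needsBlockWrapper(code: string): boolean {
  // Declarations, returns, and multi-statement input cannot be evaluated
  // as a bare expression, so they need a block wrapper.
  return /(^|[\s;{])(const|let|var|return|if|for|while|class|function)\b/.test(code)
    || code.includes(';')
    || code.includes('\n');
}

function wrapForEvaluate(code: string): string {
  return needsBlockWrapper(code)
    ? `(async () => { ${code} })()`  // block wrapper: statements allowed
    : `(async () => (${code}))()`;   // expression wrapper: implicit return
}
```

With this split, `const x = await fetch(url); return x.status` takes the block path, while a bare expression like `document.title` keeps the old expression wrapper.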
M review/SKILL.md => review/SKILL.md +47 -8
@@ 157,14 157,53 @@ Follow the output format specified in the checklist. Respect the suppressions
---
-## Step 5: Output findings
+## Step 5: Fix-First Review
-**Always output ALL findings** — both critical and informational. The user must see every issue.
+**Every finding gets action — not just critical ones.**
-- If CRITICAL issues found: output all findings, then for EACH critical issue use a separate AskUserQuestion with the problem, then `RECOMMENDATION: Choose A because [one-line reason]`, then options (A: Fix it now, B: Acknowledge, C: False positive — skip).
- After all critical questions are answered, output a summary of what the user chose for each issue. If the user chose A (fix) on any issue, apply the recommended fixes. If only B/C were chosen, no action needed.
-- If only non-critical issues found: output findings. No further action needed.
-- If no issues found: output `Pre-Landing Review: No issues found.`
+Output a summary header: `Pre-Landing Review: N issues (X critical, Y informational)`
+
+### Step 5a: Classify each finding
+
+For each finding, classify as AUTO-FIX or ASK per the Fix-First Heuristic in
+checklist.md. Critical findings lean toward ASK; informational findings lean
+toward AUTO-FIX.
+
+### Step 5b: Auto-fix all AUTO-FIX items
+
+Apply each fix directly. For each one, output a one-line summary:
+`[AUTO-FIXED] [file:line] Problem → what you did`
+
+### Step 5c: Batch-ask about ASK items
+
+If there are ASK items remaining, present them in ONE AskUserQuestion:
+
+- List each item with a number, the severity label, the problem, and a recommended fix
+- For each item, provide options: A) Fix as recommended, B) Skip
+- Include an overall RECOMMENDATION
+
+Example format:
+```
+I auto-fixed 5 issues. 2 need your input:
+
+1. [CRITICAL] app/models/post.rb:42 — Race condition in status transition
+ Fix: Add `WHERE status = 'draft'` to the UPDATE
+ → A) Fix B) Skip
+
+2. [INFORMATIONAL] app/services/generator.rb:88 — LLM output not type-checked before DB write
+ Fix: Add JSON schema validation
+ → A) Fix B) Skip
+
+RECOMMENDATION: Fix both — #1 is a real race condition, #2 prevents silent data corruption.
+```
+
+If there are 3 or fewer ASK items, you may use individual AskUserQuestion calls instead of batching.
+
+### Step 5d: Apply user-approved fixes
+
+Apply fixes for items where the user chose "Fix." Output what was fixed.
+
+If no ASK items exist (everything was AUTO-FIX), skip the question entirely.
### Greptile comment resolution
@@ 174,7 213,7 @@ After outputting your own findings, if Greptile comments were classified in Step
Before replying to any comment, run the **Escalation Detection** algorithm from greptile-triage.md to determine whether to use Tier 1 (friendly) or Tier 2 (firm) reply templates.
-1. **VALID & ACTIONABLE comments:** These are already included in your CRITICAL findings — they follow the same AskUserQuestion flow (A: Fix it now, B: Acknowledge, C: False positive). If the user chooses A (fix), reply using the **Fix reply template** from greptile-triage.md (include inline diff + explanation). If the user chooses C (false positive), reply using the **False Positive reply template** (include evidence + suggested re-rank), save to both per-project and global greptile-history.
+1. **VALID & ACTIONABLE comments:** These are included in your findings — they follow the Fix-First flow: auto-fixed if mechanical, or batched into the ASK question with options A: Fix it now, B: Acknowledge, C: False positive. If the user chooses A (fix), reply using the **Fix reply template** from greptile-triage.md (include inline diff + explanation). If the user chooses C (false positive), reply using the **False Positive reply template** (include evidence + suggested re-rank) and save it to both per-project and global greptile-history.
2. **FALSE POSITIVE comments:** Present each one via AskUserQuestion:
- Show the Greptile comment: file:line (or [top-level]) + body summary + permalink URL
@@ 223,7 262,7 @@ If no documentation files exist, skip this step silently.
## Important Rules
- **Read the FULL diff before commenting.** Do not flag issues already addressed in the diff.
-- **Read-only by default.** Only modify files if the user explicitly chooses "Fix it now" on a critical issue. Never commit, push, or create PRs.
+- **Fix-first, not read-only.** AUTO-FIX items are applied directly. ASK items are only applied after user approval. Never commit, push, or create PRs — that's /ship's job.
- **Be terse.** One line problem, one line fix. No preamble.
- **Only flag real problems.** Skip anything that's fine.
- **Use Greptile reply templates from greptile-triage.md.** Every reply includes evidence. Never post vague replies.
M review/SKILL.md.tmpl => review/SKILL.md.tmpl +47 -8
@@ 75,14 75,53 @@ Follow the output format specified in the checklist. Respect the suppressions
---
-## Step 5: Output findings
+## Step 5: Fix-First Review
-**Always output ALL findings** — both critical and informational. The user must see every issue.
+**Every finding gets action — not just critical ones.**
-- If CRITICAL issues found: output all findings, then for EACH critical issue use a separate AskUserQuestion with the problem, then `RECOMMENDATION: Choose A because [one-line reason]`, then options (A: Fix it now, B: Acknowledge, C: False positive — skip).
- After all critical questions are answered, output a summary of what the user chose for each issue. If the user chose A (fix) on any issue, apply the recommended fixes. If only B/C were chosen, no action needed.
-- If only non-critical issues found: output findings. No further action needed.
-- If no issues found: output `Pre-Landing Review: No issues found.`
+Output a summary header: `Pre-Landing Review: N issues (X critical, Y informational)`
+
+### Step 5a: Classify each finding
+
+For each finding, classify as AUTO-FIX or ASK per the Fix-First Heuristic in
+checklist.md. Critical findings lean toward ASK; informational findings lean
+toward AUTO-FIX.
+
+### Step 5b: Auto-fix all AUTO-FIX items
+
+Apply each fix directly. For each one, output a one-line summary:
+`[AUTO-FIXED] [file:line] Problem → what you did`
+
+### Step 5c: Batch-ask about ASK items
+
+If there are ASK items remaining, present them in ONE AskUserQuestion:
+
+- List each item with a number, the severity label, the problem, and a recommended fix
+- For each item, provide options: A) Fix as recommended, B) Skip
+- Include an overall RECOMMENDATION
+
+Example format:
+```
+I auto-fixed 5 issues. 2 need your input:
+
+1. [CRITICAL] app/models/post.rb:42 — Race condition in status transition
+ Fix: Add `WHERE status = 'draft'` to the UPDATE
+ → A) Fix B) Skip
+
+2. [INFORMATIONAL] app/services/generator.rb:88 — LLM output not type-checked before DB write
+ Fix: Add JSON schema validation
+ → A) Fix B) Skip
+
+RECOMMENDATION: Fix both — #1 is a real race condition, #2 prevents silent data corruption.
+```
+
+If there are 3 or fewer ASK items, you may use individual AskUserQuestion calls instead of batching.
+
+### Step 5d: Apply user-approved fixes
+
+Apply fixes for items where the user chose "Fix." Output what was fixed.
+
+If no ASK items exist (everything was AUTO-FIX), skip the question entirely.
### Greptile comment resolution
@@ 92,7 131,7 @@ After outputting your own findings, if Greptile comments were classified in Step
Before replying to any comment, run the **Escalation Detection** algorithm from greptile-triage.md to determine whether to use Tier 1 (friendly) or Tier 2 (firm) reply templates.
-1. **VALID & ACTIONABLE comments:** These are already included in your CRITICAL findings — they follow the same AskUserQuestion flow (A: Fix it now, B: Acknowledge, C: False positive). If the user chooses A (fix), reply using the **Fix reply template** from greptile-triage.md (include inline diff + explanation). If the user chooses C (false positive), reply using the **False Positive reply template** (include evidence + suggested re-rank), save to both per-project and global greptile-history.
+1. **VALID & ACTIONABLE comments:** These are included in your findings — they follow the Fix-First flow: auto-fixed if mechanical, or batched into the ASK question with options A: Fix it now, B: Acknowledge, C: False positive. If the user chooses A (fix), reply using the **Fix reply template** from greptile-triage.md (include inline diff + explanation). If the user chooses C (false positive), reply using the **False Positive reply template** (include evidence + suggested re-rank) and save it to both per-project and global greptile-history.
2. **FALSE POSITIVE comments:** Present each one via AskUserQuestion:
- Show the Greptile comment: file:line (or [top-level]) + body summary + permalink URL
@@ 141,7 180,7 @@ If no documentation files exist, skip this step silently.
## Important Rules
- **Read the FULL diff before commenting.** Do not flag issues already addressed in the diff.
-- **Read-only by default.** Only modify files if the user explicitly chooses "Fix it now" on a critical issue. Never commit, push, or create PRs.
+- **Fix-first, not read-only.** AUTO-FIX items are applied directly. ASK items are only applied after user approval. Never commit, push, or create PRs — that's /ship's job.
- **Be terse.** One line problem, one line fix. No preamble.
- **Only flag real problems.** Skip anything that's fine.
- **Use Greptile reply templates from greptile-triage.md.** Every reply includes evidence. Never post vague replies.
M review/checklist.md => review/checklist.md +42 -9
@@ 5,21 5,23 @@
Review the `git diff origin/main` output for the issues listed below. Be specific — cite `file:line` and suggest fixes. Skip anything that's fine. Only flag real problems.
**Two-pass review:**
-- **Pass 1 (CRITICAL):** Run SQL & Data Safety and LLM Output Trust Boundary first. These can block `/ship`.
-- **Pass 2 (INFORMATIONAL):** Run all remaining categories. These are included in the PR body but do not block.
+- **Pass 1 (CRITICAL):** Run SQL & Data Safety and LLM Output Trust Boundary first. Highest severity.
+- **Pass 2 (INFORMATIONAL):** Run all remaining categories. Lower severity but still actioned.
+
+All findings get action via Fix-First Review: obvious mechanical fixes are applied automatically,
+genuinely ambiguous issues are batched into a single user question.
**Output format:**
```
Pre-Landing Review: N issues (X critical, Y informational)
-**CRITICAL** (blocking /ship):
-- [file:line] Problem description
- Fix: suggested fix
+**AUTO-FIXED:**
+- [file:line] Problem → fix applied
-**Issues** (non-blocking):
+**NEEDS INPUT:**
- [file:line] Problem description
- Fix: suggested fix
+ Recommended fix: suggested fix
```
If no issues found: `Pre-Landing Review: No issues found.`
@@ 102,10 104,10 @@ To do this: use Grep to find all references to the sibling values (e.g., grep fo
---
-## Gate Classification
+## Severity Classification
```
-CRITICAL (blocks /ship): INFORMATIONAL (in PR body):
+CRITICAL (highest severity): INFORMATIONAL (lower severity):
├─ SQL & Data Safety ├─ Conditional Side Effects
├─ Race Conditions & Concurrency ├─ Magic Numbers & String Coupling
├─ LLM Output Trust Boundary ├─ Dead Code & Consistency
@@ 115,7 117,38 @@ CRITICAL (blocks /ship): INFORMATIONAL (in PR body):
├─ Time Window Safety
├─ Type Coercion at Boundaries
└─ View/Frontend
+
+All findings are actioned via Fix-First Review. Severity determines
+presentation order and informs the AUTO-FIX vs ASK classification —
+critical findings lean toward ASK (they're riskier), informational
+findings lean toward AUTO-FIX (they're more mechanical).
+```
+
+---
+
+## Fix-First Heuristic
+
+This heuristic is referenced by both `/review` and `/ship`. It determines whether
+the agent auto-fixes a finding or asks the user.
+
```
+AUTO-FIX (agent fixes without asking): ASK (needs human judgment):
+├─ Dead code / unused variables ├─ Security (auth, XSS, injection)
+├─ N+1 queries (missing .includes()) ├─ Race conditions
+├─ Stale comments contradicting code ├─ Design decisions
+├─ Magic numbers → named constants ├─ Large fixes (>20 lines)
+├─ Missing LLM output validation ├─ Enum completeness
+├─ Version/path mismatches ├─ Removing functionality
+├─ Variables assigned but never read └─ Anything changing user-visible
+└─ Inline styles, O(n*m) view lookups behavior
+```
+
+**Rule of thumb:** If the fix is mechanical and a senior engineer would apply it
+without discussion, it's AUTO-FIX. If reasonable engineers could disagree about
+the fix, it's ASK.
+
+**Critical findings default toward ASK** (they're inherently riskier).
+**Informational findings default toward AUTO-FIX** (they're more mechanical).
---
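
For illustration only, the heuristic above can be collapsed into a small decision function. The category slugs and the hard "critical defaults to ASK" rule are assumptions layered on the "lean toward" language, not part of the checklist:

```typescript
// Hypothetical sketch of the Fix-First Heuristic. Category slugs mirror
// the AUTO-FIX column above; everything else follows the rules of thumb.
type Action = 'AUTO-FIX' | 'ASK';

const AUTO_FIX_CATEGORIES = new Set([
  'dead-code', 'unused-variable', 'n-plus-one', 'stale-comment',
  'magic-number', 'llm-output-validation', 'version-mismatch', 'inline-style',
]);

function classifyFinding(
  category: string,
  fixLineCount: number,
  critical: boolean,
): Action {
  if (fixLineCount > 20) return 'ASK'; // large fixes always need a human
  if (critical) return 'ASK';          // riskier: default to asking
  return AUTO_FIX_CATEGORIES.has(category) ? 'AUTO-FIX' : 'ASK';
}
```

A 3-line dead-code removal classifies as AUTO-FIX; the same category with a 30-line fix, or any critical finding, escalates to ASK.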
M ship/SKILL.md => ship/SKILL.md +18 -11
@@ 107,7 107,7 @@ You are running the `/ship` workflow. This is a **non-interactive, fully automat
- On the base branch (abort)
- Merge conflicts that can't be auto-resolved (stop, show conflicts)
- Test failures (stop, show failures)
-- Pre-landing review finds CRITICAL issues and user chooses to fix (not acknowledge or skip)
+- Pre-landing review finds ASK items that need user judgment
- MINOR or MAJOR version bump needed (ask — see Step 4)
- Greptile review comments that need user decision (complex fixes, false positives)
- TODOS.md missing and user wants to create one (ask — see Step 5.5)
@@ 120,6 120,7 @@ You are running the `/ship` workflow. This is a **non-interactive, fully automat
- Commit message approval (auto-commit)
- Multi-file changesets (auto-split into bisectable commits)
- TODOS.md completed-item detection (auto-mark)
+- Auto-fixable review findings (dead code, N+1, stale comments — fixed automatically)
---
@@ 243,19 244,25 @@ Review the diff for structural issues that tests don't catch.
- **Pass 1 (CRITICAL):** SQL & Data Safety, LLM Output Trust Boundary
- **Pass 2 (INFORMATIONAL):** All remaining categories
-4. **Always output ALL findings** — both critical and informational. The user must see every issue found.
+4. **Classify each finding as AUTO-FIX or ASK** per the Fix-First Heuristic in
+ checklist.md. Critical findings lean toward ASK; informational lean toward AUTO-FIX.
-5. Output a summary header: `Pre-Landing Review: N issues (X critical, Y informational)`
+5. **Auto-fix all AUTO-FIX items.** Apply each fix. Output one line per fix:
+ `[AUTO-FIXED] [file:line] Problem → what you did`
-6. **If CRITICAL issues found:** For EACH critical issue, use a separate AskUserQuestion with:
- - The problem (`file:line` + description)
- - `RECOMMENDATION: Choose A because [one-line reason]`
- - Options: A) Fix it now, B) Acknowledge and ship anyway, C) It's a false positive — skip
- After resolving all critical issues: if the user chose A (fix) on any issue, apply the recommended fixes, then commit only the fixed files by name (`git add <fixed-files> && git commit -m "fix: apply pre-landing review fixes"`), then **STOP** and tell the user to run `/ship` again to re-test with the fixes applied. If the user chose only B (acknowledge) or C (false positive) on all issues, continue with Step 4.
+6. **If ASK items remain,** present them in ONE AskUserQuestion:
+ - List each with number, severity, problem, recommended fix
+ - Per-item options: A) Fix B) Skip
+ - Overall RECOMMENDATION
+ - If 3 or fewer ASK items, you may use individual AskUserQuestion calls instead
-7. **If only non-critical issues found:** Output them and continue. They will be included in the PR body at Step 8.
+7. **After all fixes (auto + user-approved):**
+ - If ANY fixes were applied: commit fixed files by name (`git add <fixed-files> && git commit -m "fix: pre-landing review fixes"`), then **STOP** and tell the user to run `/ship` again to re-test.
+ - If no fixes applied (all ASK items skipped, or no issues found): continue to Step 4.
-8. **If no issues found:** Output `Pre-Landing Review: No issues found.` and continue.
+8. Output summary: `Pre-Landing Review: N issues — M auto-fixed, K asked (J fixed, L skipped)`
+
+ If no issues found: `Pre-Landing Review: No issues found.`
Save the review output — it goes into the PR body in Step 8.
@@ 488,7 495,7 @@ EOF
- **Never skip tests.** If tests fail, stop.
- **Never skip the pre-landing review.** If checklist.md is unreadable, stop.
- **Never force push.** Use regular `git push` only.
-- **Never ask for confirmation** except for MINOR/MAJOR version bumps and CRITICAL review findings (one AskUserQuestion per critical issue with fix recommendation).
+- **Never ask for confirmation** except for MINOR/MAJOR version bumps and pre-landing review ASK items (batched into a single AskUserQuestion, or individual questions when there are 3 or fewer).
- **Always use the 4-digit version format** from the VERSION file.
- **Date format in CHANGELOG:** `YYYY-MM-DD`
- **Split commits for bisectability** — each commit = one logical change.
M ship/SKILL.md.tmpl => ship/SKILL.md.tmpl +18 -11
@@ 25,7 25,7 @@ You are running the `/ship` workflow. This is a **non-interactive, fully automat
- On the base branch (abort)
- Merge conflicts that can't be auto-resolved (stop, show conflicts)
- Test failures (stop, show failures)
-- Pre-landing review finds CRITICAL issues and user chooses to fix (not acknowledge or skip)
+- Pre-landing review finds ASK items that need user judgment
- MINOR or MAJOR version bump needed (ask — see Step 4)
- Greptile review comments that need user decision (complex fixes, false positives)
- TODOS.md missing and user wants to create one (ask — see Step 5.5)
@@ 38,6 38,7 @@ You are running the `/ship` workflow. This is a **non-interactive, fully automat
- Commit message approval (auto-commit)
- Multi-file changesets (auto-split into bisectable commits)
- TODOS.md completed-item detection (auto-mark)
+- Auto-fixable review findings (dead code, N+1, stale comments — fixed automatically)
---
@@ 161,19 162,25 @@ Review the diff for structural issues that tests don't catch.
- **Pass 1 (CRITICAL):** SQL & Data Safety, LLM Output Trust Boundary
- **Pass 2 (INFORMATIONAL):** All remaining categories
-4. **Always output ALL findings** — both critical and informational. The user must see every issue found.
+4. **Classify each finding as AUTO-FIX or ASK** per the Fix-First Heuristic in
+ checklist.md. Critical findings lean toward ASK; informational lean toward AUTO-FIX.
-5. Output a summary header: `Pre-Landing Review: N issues (X critical, Y informational)`
+5. **Auto-fix all AUTO-FIX items.** Apply each fix. Output one line per fix:
+ `[AUTO-FIXED] [file:line] Problem → what you did`
-6. **If CRITICAL issues found:** For EACH critical issue, use a separate AskUserQuestion with:
- - The problem (`file:line` + description)
- - `RECOMMENDATION: Choose A because [one-line reason]`
- - Options: A) Fix it now, B) Acknowledge and ship anyway, C) It's a false positive — skip
- After resolving all critical issues: if the user chose A (fix) on any issue, apply the recommended fixes, then commit only the fixed files by name (`git add <fixed-files> && git commit -m "fix: apply pre-landing review fixes"`), then **STOP** and tell the user to run `/ship` again to re-test with the fixes applied. If the user chose only B (acknowledge) or C (false positive) on all issues, continue with Step 4.
+6. **If ASK items remain,** present them in ONE AskUserQuestion:
+ - List each with number, severity, problem, recommended fix
+ - Per-item options: A) Fix B) Skip
+ - Overall RECOMMENDATION
+ - If 3 or fewer ASK items, you may use individual AskUserQuestion calls instead
-7. **If only non-critical issues found:** Output them and continue. They will be included in the PR body at Step 8.
+7. **After all fixes (auto + user-approved):**
+ - If ANY fixes were applied: commit fixed files by name (`git add <fixed-files> && git commit -m "fix: pre-landing review fixes"`), then **STOP** and tell the user to run `/ship` again to re-test.
+ - If no fixes applied (all ASK items skipped, or no issues found): continue to Step 4.
-8. **If no issues found:** Output `Pre-Landing Review: No issues found.` and continue.
+8. Output summary: `Pre-Landing Review: N issues — M auto-fixed, K asked (J fixed, L skipped)`
+
+ If no issues found: `Pre-Landing Review: No issues found.`
Save the review output — it goes into the PR body in Step 8.
@@ 406,7 413,7 @@ EOF
- **Never skip tests.** If tests fail, stop.
- **Never skip the pre-landing review.** If checklist.md is unreadable, stop.
- **Never force push.** Use regular `git push` only.
-- **Never ask for confirmation** except for MINOR/MAJOR version bumps and CRITICAL review findings (one AskUserQuestion per critical issue with fix recommendation).
+- **Never ask for confirmation** except for MINOR/MAJOR version bumps and pre-landing review ASK items (batched into a single AskUserQuestion, or individual questions when there are 3 or fewer).
- **Always use the 4-digit version format** from the VERSION file.
- **Date format in CHANGELOG:** `YYYY-MM-DD`
- **Split commits for bisectability** — each commit = one logical change.
M test/skill-validation.test.ts => test/skill-validation.test.ts +15 -2
@@ 559,8 559,8 @@ describe('Enum & Value Completeness in review checklist', () => {
expect(checklist).toContain('allowlist');
});
- test('Enum & Value Completeness is in the gate classification as CRITICAL', () => {
- const gateSection = checklist.slice(checklist.indexOf('## Gate Classification'));
+ test('Enum & Value Completeness is in the severity classification as CRITICAL', () => {
+ const gateSection = checklist.slice(checklist.indexOf('## Severity Classification'));
// The ASCII art has CRITICAL on the left and INFORMATIONAL on the right
// Enum & Value Completeness should appear on a line with the CRITICAL tree (├─ or └─)
const enumLine = gateSection.split('\n').find(l => l.includes('Enum & Value Completeness'));
@@ 568,6 568,19 @@ describe('Enum & Value Completeness in review checklist', () => {
// It's on the left (CRITICAL) side — starts with ├─ or └─
expect(enumLine!.trimStart().startsWith('├─') || enumLine!.trimStart().startsWith('└─')).toBe(true);
});
+
+ test('Fix-First Heuristic exists in checklist and is referenced by review + ship', () => {
+ expect(checklist).toContain('## Fix-First Heuristic');
+ expect(checklist).toContain('AUTO-FIX');
+ expect(checklist).toContain('ASK');
+
+ const reviewSkill = fs.readFileSync(path.join(ROOT, 'review/SKILL.md'), 'utf-8');
+ const shipSkill = fs.readFileSync(path.join(ROOT, 'ship/SKILL.md'), 'utf-8');
+ expect(reviewSkill).toContain('AUTO-FIX');
+ expect(reviewSkill).toContain('[AUTO-FIXED]');
+ expect(shipSkill).toContain('AUTO-FIX');
+ expect(shipSkill).toContain('[AUTO-FIXED]');
+ });
});
// --- Part 7: Planted-bug fixture validation (A4) ---