M CHANGELOG.md => CHANGELOG.md +6 -0
@@ 2,6 2,9 @@
## 0.4.5 — 2026-03-16
+- **Review findings now actually get fixed, not just listed.** `/review` and `/ship` used to print informational findings (dead code, test gaps, N+1 queries) and then ignore them. Now every finding gets action: obvious mechanical fixes are applied automatically, and genuinely ambiguous issues are batched into a single question instead of 8 separate prompts. You see `[AUTO-FIXED] [file:line] Problem → what was done` for each auto-fix.
+- **You control the line between "just fix it" and "ask me first."** Dead code, stale comments, N+1 queries get auto-fixed. Security issues, race conditions, design decisions get surfaced for your call. The classification lives in one place (`review/checklist.md`) so both `/review` and `/ship` stay in sync.
+
### Fixed
- **`$B js "const x = await fetch(...); return x.status"` now works.** The `js` command used to wrap everything as an expression — so `const`, semicolons, and multi-line code all broke. It now detects statements and uses a block wrapper, just like `eval` already did.
@@ 10,6 13,9 @@
### For contributors
+- Gate Classification → Severity Classification rename (severity determines presentation order, not whether you see a prompt).
+- Fix-First Heuristic section added to `review/checklist.md` — the canonical AUTO-FIX vs ASK classification.
+- New validation test: `Fix-First Heuristic exists in checklist and is referenced by review + ship`.
- Extracted `needsBlockWrapper()` and `wrapForEvaluate()` helpers in `read-commands.ts` — shared by both `js` and `eval` commands (DRY).
- Added `getRefRole()` to `BrowserManager` — exposes ARIA role for ref selectors without changing `resolveRef` return type.
- Click handler auto-routes `[role=option]` refs to `selectOption()` via parent `<select>`, with DOM `tagName` check to avoid blocking custom listbox components.
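
The statement-detection fix described above can be sketched as follows. `needsBlockWrapper()` and `wrapForEvaluate()` are the real helper names from `read-commands.ts`, but the keyword test and wrapper shapes here are assumptions for illustration, not the actual implementation:

```typescript
// Sketch of the js/eval wrapping idea, assuming a simple keyword/semicolon
// test. The real helpers in read-commands.ts may differ.
function needsBlockWrapper(code: string): boolean {
  // Declarations, returns, and multi-statement input cannot be evaluated
  // as a bare expression, so they need a block wrapper.
  return /(^|[\s;{])(const|let|var|return|if|for|while|class|function)\b/.test(code)
    || code.includes(';')
    || code.includes('\n');
}

function wrapForEvaluate(code: string): string {
  return needsBlockWrapper(code)
    ? `(async () => { ${code} })()`  // block wrapper: statements allowed
    : `(async () => (${code}))()`;   // expression wrapper: implicit return
}
```

With this split, `const x = await fetch(url); return x.status` takes the block path, while a bare expression like `document.title` keeps the old expression wrapper.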
M review/SKILL.md => review/SKILL.md +47 -8
@@ 157,14 157,53 @@ Follow the output format specified in the checklist. Respect the suppressions
---
-## Step 5: Output findings
+## Step 5: Fix-First Review
-**Always output ALL findings** — both critical and informational. The user must see every issue.
+**Every finding gets action — not just critical ones.**
-- If CRITICAL issues found: output all findings, then for EACH critical issue use a separate AskUserQuestion with the problem, then `RECOMMENDATION: Choose A because [one-line reason]`, then options (A: Fix it now, B: Acknowledge, C: False positive — skip).
- After all critical questions are answered, output a summary of what the user chose for each issue. If the user chose A (fix) on any issue, apply the recommended fixes. If only B/C were chosen, no action needed.
-- If only non-critical issues found: output findings. No further action needed.
-- If no issues found: output `Pre-Landing Review: No issues found.`
+Output a summary header: `Pre-Landing Review: N issues (X critical, Y informational)`
+
+### Step 5a: Classify each finding
+
+For each finding, classify as AUTO-FIX or ASK per the Fix-First Heuristic in
+checklist.md. Critical findings lean toward ASK; informational findings lean
+toward AUTO-FIX.
+
+### Step 5b: Auto-fix all AUTO-FIX items
+
+Apply each fix directly. For each one, output a one-line summary:
+`[AUTO-FIXED] [file:line] Problem → what you did`
+
+### Step 5c: Batch-ask about ASK items
+
+If there are ASK items remaining, present them in ONE AskUserQuestion:
+
+- List each item with a number, the severity label, the problem, and a recommended fix
+- For each item, provide options: A) Fix as recommended, B) Skip
+- Include an overall RECOMMENDATION
+
+Example format:
+```
+I auto-fixed 5 issues. 2 need your input:
+
+1. [CRITICAL] app/models/post.rb:42 — Race condition in status transition
+ Fix: Add `WHERE status = 'draft'` to the UPDATE
+ → A) Fix B) Skip
+
+2. [INFORMATIONAL] app/services/generator.rb:88 — LLM output not type-checked before DB write
+ Fix: Add JSON schema validation
+ → A) Fix B) Skip
+
+RECOMMENDATION: Fix both — #1 is a real race condition, #2 prevents silent data corruption.
+```
+
+If there are 3 or fewer ASK items, you may use individual AskUserQuestion calls instead of batching.
+
+### Step 5d: Apply user-approved fixes
+
+Apply fixes for items where the user chose "Fix." Output what was fixed.
+
+If no ASK items exist (everything was AUTO-FIX), skip the question entirely.
### Greptile comment resolution
@@ 174,7 213,7 @@ After outputting your own findings, if Greptile comments were classified in Step
Before replying to any comment, run the **Escalation Detection** algorithm from greptile-triage.md to determine whether to use Tier 1 (friendly) or Tier 2 (firm) reply templates.
-1. **VALID & ACTIONABLE comments:** These are already included in your CRITICAL findings — they follow the same AskUserQuestion flow (A: Fix it now, B: Acknowledge, C: False positive). If the user chooses A (fix), reply using the **Fix reply template** from greptile-triage.md (include inline diff + explanation). If the user chooses C (false positive), reply using the **False Positive reply template** (include evidence + suggested re-rank), save to both per-project and global greptile-history.
+1. **VALID & ACTIONABLE comments:** These are included in your findings — they follow the Fix-First flow: auto-fixed if mechanical, or batched into the ASK question with options A: Fix it now, B: Acknowledge, C: False positive. If the user chooses A (fix), reply using the **Fix reply template** from greptile-triage.md (include inline diff + explanation). If the user chooses C (false positive), reply using the **False Positive reply template** (include evidence + suggested re-rank) and save it to both per-project and global greptile-history.
2. **FALSE POSITIVE comments:** Present each one via AskUserQuestion:
- Show the Greptile comment: file:line (or [top-level]) + body summary + permalink URL
@@ 223,7 262,7 @@ If no documentation files exist, skip this step silently.
## Important Rules
- **Read the FULL diff before commenting.** Do not flag issues already addressed in the diff.
-- **Read-only by default.** Only modify files if the user explicitly chooses "Fix it now" on a critical issue. Never commit, push, or create PRs.
+- **Fix-first, not read-only.** AUTO-FIX items are applied directly. ASK items are only applied after user approval. Never commit, push, or create PRs — that's /ship's job.
- **Be terse.** One line problem, one line fix. No preamble.
- **Only flag real problems.** Skip anything that's fine.
- **Use Greptile reply templates from greptile-triage.md.** Every reply includes evidence. Never post vague replies.
M review/SKILL.md.tmpl => review/SKILL.md.tmpl +47 -8
@@ 75,14 75,53 @@ Follow the output format specified in the checklist. Respect the suppressions
---
-## Step 5: Output findings
+## Step 5: Fix-First Review
-**Always output ALL findings** — both critical and informational. The user must see every issue.
+**Every finding gets action — not just critical ones.**
-- If CRITICAL issues found: output all findings, then for EACH critical issue use a separate AskUserQuestion with the problem, then `RECOMMENDATION: Choose A because [one-line reason]`, then options (A: Fix it now, B: Acknowledge, C: False positive — skip).
- After all critical questions are answered, output a summary of what the user chose for each issue. If the user chose A (fix) on any issue, apply the recommended fixes. If only B/C were chosen, no action needed.
-- If only non-critical issues found: output findings. No further action needed.
-- If no issues found: output `Pre-Landing Review: No issues found.`
+Output a summary header: `Pre-Landing Review: N issues (X critical, Y informational)`
+
+### Step 5a: Classify each finding
+
+For each finding, classify as AUTO-FIX or ASK per the Fix-First Heuristic in
+checklist.md. Critical findings lean toward ASK; informational findings lean
+toward AUTO-FIX.
+
+### Step 5b: Auto-fix all AUTO-FIX items
+
+Apply each fix directly. For each one, output a one-line summary:
+`[AUTO-FIXED] [file:line] Problem → what you did`
+
+### Step 5c: Batch-ask about ASK items
+
+If there are ASK items remaining, present them in ONE AskUserQuestion:
+
+- List each item with a number, the severity label, the problem, and a recommended fix
+- For each item, provide options: A) Fix as recommended, B) Skip
+- Include an overall RECOMMENDATION
+
+Example format:
+```
+I auto-fixed 5 issues. 2 need your input:
+
+1. [CRITICAL] app/models/post.rb:42 — Race condition in status transition
+ Fix: Add `WHERE status = 'draft'` to the UPDATE
+ → A) Fix B) Skip
+
+2. [INFORMATIONAL] app/services/generator.rb:88 — LLM output not type-checked before DB write
+ Fix: Add JSON schema validation
+ → A) Fix B) Skip
+
+RECOMMENDATION: Fix both — #1 is a real race condition, #2 prevents silent data corruption.
+```
+
+If there are 3 or fewer ASK items, you may use individual AskUserQuestion calls instead of batching.
+
+### Step 5d: Apply user-approved fixes
+
+Apply fixes for items where the user chose "Fix." Output what was fixed.
+
+If no ASK items exist (everything was AUTO-FIX), skip the question entirely.
### Greptile comment resolution
@@ 92,7 131,7 @@ After outputting your own findings, if Greptile comments were classified in Step
Before replying to any comment, run the **Escalation Detection** algorithm from greptile-triage.md to determine whether to use Tier 1 (friendly) or Tier 2 (firm) reply templates.
-1. **VALID & ACTIONABLE comments:** These are already included in your CRITICAL findings — they follow the same AskUserQuestion flow (A: Fix it now, B: Acknowledge, C: False positive). If the user chooses A (fix), reply using the **Fix reply template** from greptile-triage.md (include inline diff + explanation). If the user chooses C (false positive), reply using the **False Positive reply template** (include evidence + suggested re-rank), save to both per-project and global greptile-history.
+1. **VALID & ACTIONABLE comments:** These are included in your findings — they follow the Fix-First flow: auto-fixed if mechanical, or batched into the ASK question with options A: Fix it now, B: Acknowledge, C: False positive. If the user chooses A (fix), reply using the **Fix reply template** from greptile-triage.md (include inline diff + explanation). If the user chooses C (false positive), reply using the **False Positive reply template** (include evidence + suggested re-rank) and save it to both per-project and global greptile-history.
2. **FALSE POSITIVE comments:** Present each one via AskUserQuestion:
- Show the Greptile comment: file:line (or [top-level]) + body summary + permalink URL
@@ 141,7 180,7 @@ If no documentation files exist, skip this step silently.
## Important Rules
- **Read the FULL diff before commenting.** Do not flag issues already addressed in the diff.
-- **Read-only by default.** Only modify files if the user explicitly chooses "Fix it now" on a critical issue. Never commit, push, or create PRs.
+- **Fix-first, not read-only.** AUTO-FIX items are applied directly. ASK items are only applied after user approval. Never commit, push, or create PRs — that's /ship's job.
- **Be terse.** One line problem, one line fix. No preamble.
- **Only flag real problems.** Skip anything that's fine.
- **Use Greptile reply templates from greptile-triage.md.** Every reply includes evidence. Never post vague replies.
M review/checklist.md => review/checklist.md +42 -9
@@ 5,21 5,23 @@
Review the `git diff origin/main` output for the issues listed below. Be specific — cite `file:line` and suggest fixes. Skip anything that's fine. Only flag real problems.
**Two-pass review:**
-- **Pass 1 (CRITICAL):** Run SQL & Data Safety and LLM Output Trust Boundary first. These can block `/ship`.
-- **Pass 2 (INFORMATIONAL):** Run all remaining categories. These are included in the PR body but do not block.
+- **Pass 1 (CRITICAL):** Run SQL & Data Safety and LLM Output Trust Boundary first. Highest severity.
+- **Pass 2 (INFORMATIONAL):** Run all remaining categories. Lower severity but still actioned.
+
+All findings get action via Fix-First Review: obvious mechanical fixes are applied automatically,
+genuinely ambiguous issues are batched into a single user question.
**Output format:**
```
Pre-Landing Review: N issues (X critical, Y informational)
-**CRITICAL** (blocking /ship):
-- [file:line] Problem description
- Fix: suggested fix
+**AUTO-FIXED:**
+- [file:line] Problem → fix applied
-**Issues** (non-blocking):
+**NEEDS INPUT:**
- [file:line] Problem description
- Fix: suggested fix
+ Recommended fix: suggested fix
```
If no issues found: `Pre-Landing Review: No issues found.`
@@ 102,10 104,10 @@ To do this: use Grep to find all references to the sibling values (e.g., grep fo
---
-## Gate Classification
+## Severity Classification
```
-CRITICAL (blocks /ship): INFORMATIONAL (in PR body):
+CRITICAL (highest severity): INFORMATIONAL (lower severity):
├─ SQL & Data Safety ├─ Conditional Side Effects
├─ Race Conditions & Concurrency ├─ Magic Numbers & String Coupling
├─ LLM Output Trust Boundary ├─ Dead Code & Consistency
@@ 115,7 117,38 @@ CRITICAL (blocks /ship): INFORMATIONAL (in PR body):
├─ Time Window Safety
├─ Type Coercion at Boundaries
└─ View/Frontend
+
+All findings are actioned via Fix-First Review. Severity determines
+presentation order and informs the AUTO-FIX vs ASK classification —
+critical findings lean toward ASK (they're riskier), informational
+findings lean toward AUTO-FIX (they're more mechanical).
+```
+
+---
+
+## Fix-First Heuristic
+
+This heuristic is referenced by both `/review` and `/ship`. It determines whether
+the agent auto-fixes a finding or asks the user.
+
```
+AUTO-FIX (agent fixes without asking): ASK (needs human judgment):
+├─ Dead code / unused variables ├─ Security (auth, XSS, injection)
+├─ N+1 queries (missing .includes()) ├─ Race conditions
+├─ Stale comments contradicting code ├─ Design decisions
+├─ Magic numbers → named constants ├─ Large fixes (>20 lines)
+├─ Missing LLM output validation ├─ Enum completeness
+├─ Version/path mismatches ├─ Removing functionality
+├─ Variables assigned but never read └─ Anything changing user-visible
+└─ Inline styles, O(n*m) view lookups behavior
+```
+
+**Rule of thumb:** If the fix is mechanical and a senior engineer would apply it
+without discussion, it's AUTO-FIX. If reasonable engineers could disagree about
+the fix, it's ASK.
+
+**Critical findings default toward ASK** (they're inherently riskier).
+**Informational findings default toward AUTO-FIX** (they're more mechanical).
---
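
For illustration only, the heuristic above can be collapsed into a small decision function. The category slugs and the hard "critical defaults to ASK" rule are assumptions layered on the "lean toward" language, not part of the checklist:

```typescript
// Hypothetical sketch of the Fix-First Heuristic. Category slugs mirror
// the AUTO-FIX column above; everything else follows the rules of thumb.
type Action = 'AUTO-FIX' | 'ASK';

const AUTO_FIX_CATEGORIES = new Set([
  'dead-code', 'unused-variable', 'n-plus-one', 'stale-comment',
  'magic-number', 'llm-output-validation', 'version-mismatch', 'inline-style',
]);

function classifyFinding(
  category: string,
  fixLineCount: number,
  critical: boolean,
): Action {
  if (fixLineCount > 20) return 'ASK'; // large fixes always need a human
  if (critical) return 'ASK';          // riskier: default to asking
  return AUTO_FIX_CATEGORIES.has(category) ? 'AUTO-FIX' : 'ASK';
}
```

A 3-line dead-code removal classifies as AUTO-FIX; the same category with a 30-line fix, or any critical finding, escalates to ASK.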
M ship/SKILL.md => ship/SKILL.md +18 -11
@@ 107,7 107,7 @@ You are running the `/ship` workflow. This is a **non-interactive, fully automat
- On the base branch (abort)
- Merge conflicts that can't be auto-resolved (stop, show conflicts)
- Test failures (stop, show failures)
-- Pre-landing review finds CRITICAL issues and user chooses to fix (not acknowledge or skip)
+- Pre-landing review finds ASK items that need user judgment
- MINOR or MAJOR version bump needed (ask — see Step 4)
- Greptile review comments that need user decision (complex fixes, false positives)
- TODOS.md missing and user wants to create one (ask — see Step 5.5)
@@ 120,6 120,7 @@ You are running the `/ship` workflow. This is a **non-interactive, fully automat
- Commit message approval (auto-commit)
- Multi-file changesets (auto-split into bisectable commits)
- TODOS.md completed-item detection (auto-mark)
+- Auto-fixable review findings (dead code, N+1, stale comments — fixed automatically)
---
@@ 243,19 244,25 @@ Review the diff for structural issues that tests don't catch.
- **Pass 1 (CRITICAL):** SQL & Data Safety, LLM Output Trust Boundary
- **Pass 2 (INFORMATIONAL):** All remaining categories
-4. **Always output ALL findings** — both critical and informational. The user must see every issue found.
+4. **Classify each finding as AUTO-FIX or ASK** per the Fix-First Heuristic in
+ checklist.md. Critical findings lean toward ASK; informational lean toward AUTO-FIX.
-5. Output a summary header: `Pre-Landing Review: N issues (X critical, Y informational)`
+5. **Auto-fix all AUTO-FIX items.** Apply each fix. Output one line per fix:
+ `[AUTO-FIXED] [file:line] Problem → what you did`
-6. **If CRITICAL issues found:** For EACH critical issue, use a separate AskUserQuestion with:
- - The problem (`file:line` + description)
- - `RECOMMENDATION: Choose A because [one-line reason]`
- - Options: A) Fix it now, B) Acknowledge and ship anyway, C) It's a false positive — skip
- After resolving all critical issues: if the user chose A (fix) on any issue, apply the recommended fixes, then commit only the fixed files by name (`git add <fixed-files> && git commit -m "fix: apply pre-landing review fixes"`), then **STOP** and tell the user to run `/ship` again to re-test with the fixes applied. If the user chose only B (acknowledge) or C (false positive) on all issues, continue with Step 4.
+6. **If ASK items remain,** present them in ONE AskUserQuestion:
+ - List each with number, severity, problem, recommended fix
+ - Per-item options: A) Fix B) Skip
+ - Overall RECOMMENDATION
+ - If 3 or fewer ASK items, you may use individual AskUserQuestion calls instead
-7. **If only non-critical issues found:** Output them and continue. They will be included in the PR body at Step 8.
+7. **After all fixes (auto + user-approved):**
+ - If ANY fixes were applied: commit fixed files by name (`git add <fixed-files> && git commit -m "fix: pre-landing review fixes"`), then **STOP** and tell the user to run `/ship` again to re-test.
+ - If no fixes applied (all ASK items skipped, or no issues found): continue to Step 4.
-8. **If no issues found:** Output `Pre-Landing Review: No issues found.` and continue.
+8. Output summary: `Pre-Landing Review: N issues — M auto-fixed, K asked (J fixed, L skipped)`
+
+ If no issues found: `Pre-Landing Review: No issues found.`
Save the review output — it goes into the PR body in Step 8.
@@ 488,7 495,7 @@ EOF
- **Never skip tests.** If tests fail, stop.
- **Never skip the pre-landing review.** If checklist.md is unreadable, stop.
- **Never force push.** Use regular `git push` only.
-- **Never ask for confirmation** except for MINOR/MAJOR version bumps and CRITICAL review findings (one AskUserQuestion per critical issue with fix recommendation).
+- **Never ask for confirmation** except for MINOR/MAJOR version bumps and pre-landing review ASK items (batched into a single AskUserQuestion, or individual questions when there are 3 or fewer).
- **Always use the 4-digit version format** from the VERSION file.
- **Date format in CHANGELOG:** `YYYY-MM-DD`
- **Split commits for bisectability** — each commit = one logical change.
M ship/SKILL.md.tmpl => ship/SKILL.md.tmpl +18 -11
@@ 25,7 25,7 @@ You are running the `/ship` workflow. This is a **non-interactive, fully automat
- On the base branch (abort)
- Merge conflicts that can't be auto-resolved (stop, show conflicts)
- Test failures (stop, show failures)
-- Pre-landing review finds CRITICAL issues and user chooses to fix (not acknowledge or skip)
+- Pre-landing review finds ASK items that need user judgment
- MINOR or MAJOR version bump needed (ask — see Step 4)
- Greptile review comments that need user decision (complex fixes, false positives)
- TODOS.md missing and user wants to create one (ask — see Step 5.5)
@@ 38,6 38,7 @@ You are running the `/ship` workflow. This is a **non-interactive, fully automat
- Commit message approval (auto-commit)
- Multi-file changesets (auto-split into bisectable commits)
- TODOS.md completed-item detection (auto-mark)
+- Auto-fixable review findings (dead code, N+1, stale comments — fixed automatically)
---
@@ 161,19 162,25 @@ Review the diff for structural issues that tests don't catch.
- **Pass 1 (CRITICAL):** SQL & Data Safety, LLM Output Trust Boundary
- **Pass 2 (INFORMATIONAL):** All remaining categories
-4. **Always output ALL findings** — both critical and informational. The user must see every issue found.
+4. **Classify each finding as AUTO-FIX or ASK** per the Fix-First Heuristic in
+ checklist.md. Critical findings lean toward ASK; informational lean toward AUTO-FIX.
-5. Output a summary header: `Pre-Landing Review: N issues (X critical, Y informational)`
+5. **Auto-fix all AUTO-FIX items.** Apply each fix. Output one line per fix:
+ `[AUTO-FIXED] [file:line] Problem → what you did`
-6. **If CRITICAL issues found:** For EACH critical issue, use a separate AskUserQuestion with:
- - The problem (`file:line` + description)
- - `RECOMMENDATION: Choose A because [one-line reason]`
- - Options: A) Fix it now, B) Acknowledge and ship anyway, C) It's a false positive — skip
- After resolving all critical issues: if the user chose A (fix) on any issue, apply the recommended fixes, then commit only the fixed files by name (`git add <fixed-files> && git commit -m "fix: apply pre-landing review fixes"`), then **STOP** and tell the user to run `/ship` again to re-test with the fixes applied. If the user chose only B (acknowledge) or C (false positive) on all issues, continue with Step 4.
+6. **If ASK items remain,** present them in ONE AskUserQuestion:
+ - List each with number, severity, problem, recommended fix
+ - Per-item options: A) Fix B) Skip
+ - Overall RECOMMENDATION
+ - If 3 or fewer ASK items, you may use individual AskUserQuestion calls instead
-7. **If only non-critical issues found:** Output them and continue. They will be included in the PR body at Step 8.
+7. **After all fixes (auto + user-approved):**
+ - If ANY fixes were applied: commit fixed files by name (`git add <fixed-files> && git commit -m "fix: pre-landing review fixes"`), then **STOP** and tell the user to run `/ship` again to re-test.
+ - If no fixes applied (all ASK items skipped, or no issues found): continue to Step 4.
-8. **If no issues found:** Output `Pre-Landing Review: No issues found.` and continue.
+8. Output summary: `Pre-Landing Review: N issues — M auto-fixed, K asked (J fixed, L skipped)`
+
+ If no issues found: `Pre-Landing Review: No issues found.`
Save the review output — it goes into the PR body in Step 8.
@@ 406,7 413,7 @@ EOF
- **Never skip tests.** If tests fail, stop.
- **Never skip the pre-landing review.** If checklist.md is unreadable, stop.
- **Never force push.** Use regular `git push` only.
-- **Never ask for confirmation** except for MINOR/MAJOR version bumps and CRITICAL review findings (one AskUserQuestion per critical issue with fix recommendation).
+- **Never ask for confirmation** except for MINOR/MAJOR version bumps and pre-landing review ASK items (batched into a single AskUserQuestion, or individual questions when there are 3 or fewer).
- **Always use the 4-digit version format** from the VERSION file.
- **Date format in CHANGELOG:** `YYYY-MM-DD`
- **Split commits for bisectability** — each commit = one logical change.
M test/skill-validation.test.ts => test/skill-validation.test.ts +15 -2
@@ 559,8 559,8 @@ describe('Enum & Value Completeness in review checklist', () => {
expect(checklist).toContain('allowlist');
});
- test('Enum & Value Completeness is in the gate classification as CRITICAL', () => {
- const gateSection = checklist.slice(checklist.indexOf('## Gate Classification'));
+ test('Enum & Value Completeness is in the severity classification as CRITICAL', () => {
+ const gateSection = checklist.slice(checklist.indexOf('## Severity Classification'));
// The ASCII art has CRITICAL on the left and INFORMATIONAL on the right
// Enum & Value Completeness should appear on a line with the CRITICAL tree (├─ or └─)
const enumLine = gateSection.split('\n').find(l => l.includes('Enum & Value Completeness'));
@@ 568,6 568,19 @@ describe('Enum & Value Completeness in review checklist', () => {
// It's on the left (CRITICAL) side — starts with ├─ or └─
expect(enumLine!.trimStart().startsWith('├─') || enumLine!.trimStart().startsWith('└─')).toBe(true);
});
+
+ test('Fix-First Heuristic exists in checklist and is referenced by review + ship', () => {
+ expect(checklist).toContain('## Fix-First Heuristic');
+ expect(checklist).toContain('AUTO-FIX');
+ expect(checklist).toContain('ASK');
+
+ const reviewSkill = fs.readFileSync(path.join(ROOT, 'review/SKILL.md'), 'utf-8');
+ const shipSkill = fs.readFileSync(path.join(ROOT, 'ship/SKILL.md'), 'utf-8');
+ expect(reviewSkill).toContain('AUTO-FIX');
+ expect(reviewSkill).toContain('[AUTO-FIXED]');
+ expect(shipSkill).toContain('AUTO-FIX');
+ expect(shipSkill).toContain('[AUTO-FIXED]');
+ });
});
// --- Part 7: Planted-bug fixture validation (A4) ---