~cytrogen/gstack

7ea6ead9fa88ba439002a6dc1d7409649b45e9f5 — Garry Tan 9 days ago a4a181c
fix: ship idempotency + skill prefix name patching (v0.14.3.0) (#693)

* fix: add idempotency guards to /ship Steps 4, 7, 8 (#649)

If git push succeeds but gh pr create fails, re-running /ship would
double-bump VERSION and duplicate CHANGELOG entries. Now:
- Step 4: check if VERSION already differs from base branch
- Step 7: fetch only the specific branch, skip push if already up to date
- Step 8: if PR exists, update body via gh pr edit instead of creating duplicate

No CHANGELOG guard needed — Step 5 is already idempotent by design
("replace existing entries with one unified entry").

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: patch name: in SKILL.md frontmatter for prefix mode (#620, #578)

./setup --prefix creates gstack-* symlinks but SKILL.md still says
name: qa, so Claude Code ignores the prefix. Now:
- New bin/gstack-patch-names shared helper patches name: field via sed
- setup calls it after link_claude_skill_dirs
- gstack-relink calls it after symlink loop
- gen-skill-docs.ts prints warning when skill_prefix is true

Edge cases: gstack-upgrade not double-prefixed, root gstack skill
never prefixed, prefix removal restores original names, SKILL.md
without frontmatter is a safe no-op.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add name patching + ship idempotency tests (#620, #649)

- 4 unit tests for name: patching in relink.test.ts (prefix on/off,
  gstack-upgrade not double-prefixed, no-frontmatter no-op)
- 2 tests for gen-skill-docs prefix warning
- 1 E2E test for ship idempotency (periodic tier)
- Updated setupMockInstall to write SKILL.md with proper frontmatter
- Added ship-idempotency touchfiles + tier classification

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.14.3.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: PR idempotency checks open state, dedupe touchfiles, sync package.json

- Step 8 PR guard now checks state==OPEN so closed PRs don't prevent
  new PR creation (adversarial review finding)
- Remove duplicate ship-idempotency entry in E2E_TOUCHFILES
- Sync package.json version to 0.14.3.0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: patch name: before creating symlinks to fix --no-prefix ordering bug

gstack-patch-names must run BEFORE link_claude_skill_dirs so symlink
names reflect the correct (patched) name: values. Previously, switching
from --prefix to --no-prefix would read stale gstack-* names from
SKILL.md and create wrong symlinks. (Codex adversarial finding)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
M CHANGELOG.md => CHANGELOG.md +23 -0
@@ 1,5 1,28 @@
# Changelog

## [0.14.5.0] - 2026-03-31 — Ship Idempotency + Skill Prefix Fix

Re-running `/ship` after a failed push or PR creation no longer double-bumps your version or duplicates your CHANGELOG. And if you use `--prefix` mode, your skill names actually work now.

### Fixed

- **`/ship` is now idempotent (#649).** If push succeeds but PR creation fails (API outage, rate limit), re-running `/ship` detects the already-bumped VERSION, skips the push if already up to date, and updates the existing PR body instead of creating a duplicate. The CHANGELOG step was already idempotent by design ("replace with unified entry"), so no guard needed there.
- **Skill prefix actually patches `name:` in SKILL.md (#620, #578).** `./setup --prefix` and `gstack-relink` now patch the `name:` field in each skill's SKILL.md frontmatter to match the prefix setting. Previously, symlinks were prefixed but Claude Code read the unprefixed `name:` field and ignored the prefix entirely. Edge cases handled: `gstack-upgrade` not double-prefixed, root `gstack` skill never prefixed, prefix removal restores original names.
- **`gen-skill-docs` warns when prefix patches need re-applying.** After regenerating SKILL.md files, if `skill_prefix: true` is set in config, a warning reminds you to run `gstack-relink`.
- **PR idempotency checks open state.** The PR guard now verifies the existing PR is `OPEN`, so closed PRs don't block new PR creation.
- **`--no-prefix` ordering bug.** `gstack-patch-names` now runs before `link_claude_skill_dirs` so symlink names reflect the correct patched values.

### Added

- **`bin/gstack-patch-names` shared helper.** DRY extraction of the name-patching logic used by both `setup` and `gstack-relink`. Handles all edge cases (no frontmatter, already-prefixed, inherently-prefixed dirs) with portable `mktemp + mv` sed.

### For contributors

- 4 unit tests for name: patching in `relink.test.ts`
- 2 tests for gen-skill-docs prefix warning
- 1 E2E test for ship idempotency (periodic tier)
- Updated `setupMockInstall` to write SKILL.md with proper frontmatter

## [0.14.4.0] - 2026-03-31 — Review Army: Parallel Specialist Reviewers

Every `/review` now dispatches specialist subagents in parallel. Instead of one agent applying one giant checklist, you get focused reviewers for testing gaps, maintainability, security, performance, data migrations, API contracts, and adversarial red-teaming. Each specialist reads the diff independently with fresh context, outputs structured JSON findings, and the main agent merges, deduplicates, and boosts confidence when multiple specialists flag the same issue. Small diffs (<50 lines) skip specialists entirely for speed. Large diffs (200+ lines) activate the Red Team for adversarial analysis on top.

M VERSION => VERSION +1 -1
@@ 1,1 1,1 @@
0.14.4.0
0.14.5.0

A bin/gstack-patch-names => bin/gstack-patch-names +34 -0
@@ 0,0 1,34 @@
#!/usr/bin/env bash
# gstack-patch-names — patch name: field in SKILL.md frontmatter for prefix mode
# Usage: gstack-patch-names <gstack-dir> <true|false|1|0>
set -euo pipefail

GSTACK_DIR="$1"
DO_PREFIX="$2"

# Normalize prefix arg
case "$DO_PREFIX" in true|1) DO_PREFIX=1 ;; *) DO_PREFIX=0 ;; esac

PATCHED=0
for skill_dir in "$GSTACK_DIR"/*/; do
  [ -f "$skill_dir/SKILL.md" ] || continue
  dir_name="$(basename "$skill_dir")"
  [ "$dir_name" = "node_modules" ] && continue
  cur=$(grep -m1 '^name:' "$skill_dir/SKILL.md" 2>/dev/null | sed 's/^name:[[:space:]]*//' | tr -d '[:space:]' || true)
  [ -z "$cur" ] && continue
  [ "$cur" = "gstack" ] && continue  # never prefix root skill
  if [ "$DO_PREFIX" -eq 1 ]; then
    case "$cur" in gstack-*) continue ;; esac
    new="gstack-$cur"
  else
    case "$cur" in gstack-*) ;; *) continue ;; esac
    [ "$dir_name" = "$cur" ] && continue  # inherently prefixed (gstack-upgrade)
    new="${cur#gstack-}"
  fi
  tmp="$(mktemp "${skill_dir}/SKILL.md.XXXXXX")"
  sed "1,/^---$/s/^name:[[:space:]]*${cur}/name: ${new}/" "$skill_dir/SKILL.md" > "$tmp" && mv "$tmp" "$skill_dir/SKILL.md"
  PATCHED=$((PATCHED + 1))
done
if [ "$PATCHED" -gt 0 ]; then
  echo "  patched name: field in $PATCHED skills"
fi

M bin/gstack-relink => bin/gstack-relink +3 -0
@@ 66,6 66,9 @@ for skill_dir in "$INSTALL_DIR"/*/; do
  SKILL_COUNT=$((SKILL_COUNT + 1))
done

# Patch SKILL.md name: fields to match prefix setting
"$INSTALL_DIR/bin/gstack-patch-names" "$INSTALL_DIR" "$PREFIX"

if [ "$PREFIX" = "true" ]; then
  echo "Relinked $SKILL_COUNT skills as gstack-*"
else

M package.json => package.json +1 -1
@@ 1,6 1,6 @@
{
  "name": "gstack",
  "version": "0.14.4.0",
  "version": "0.14.5.0",
  "description": "Garry's Stack — Claude Code skills + fast headless browser. One repo, one install, entire AI engineering workflow.",
  "license": "MIT",
  "type": "module",

M scripts/gen-skill-docs.ts => scripts/gen-skill-docs.ts +13 -0
@@ 472,3 472,16 @@ if (failures.length > 0 && HOST_ARG_VAL === 'all') {
  if (failures.some(f => f.host === 'claude')) process.exit(1);
}
// Single host dry-run failure already handled above

// After all hosts processed, warn if prefix patches may need re-applying
if (!DRY_RUN) {
  try {
    const configPath = path.join(process.env.HOME || '', '.gstack', 'config.yaml');
    if (fs.existsSync(configPath)) {
      const config = fs.readFileSync(configPath, 'utf-8');
      if (/^skill_prefix:\s*true/m.test(config)) {
        console.log('\nNote: skill_prefix is true. Run gstack-relink to re-apply name: patches.');
      }
    }
  } catch { /* non-fatal */ }
}

M setup => setup +3 -0
@@ 566,6 566,9 @@ if [ "$INSTALL_CLAUDE" -eq 1 ]; then
    else
      cleanup_prefixed_claude_symlinks "$SOURCE_GSTACK_DIR" "$INSTALL_SKILLS_DIR"
    fi
    # Patch name: fields BEFORE creating symlinks so link_claude_skill_dirs
    # reads the correct (patched) name: values for symlink naming
    "$SOURCE_GSTACK_DIR/bin/gstack-patch-names" "$SOURCE_GSTACK_DIR" "$SKILL_PREFIX"
    link_claude_skill_dirs "$SOURCE_GSTACK_DIR" "$INSTALL_SKILLS_DIR"
    if [ "$LOCAL_INSTALL" -eq 1 ]; then
      echo "gstack ready (project-local)."

M ship/SKILL.md => ship/SKILL.md +37 -2
@@ 1791,6 1791,17 @@ already knows. A good test: would this insight save time in a future session? If

## Step 4: Version bump (auto-decide)

**Idempotency check:** Before bumping, compare VERSION against the base branch.

```bash
BASE_VERSION=$(git show origin/<base>:VERSION 2>/dev/null || echo "0.0.0.0")
CURRENT_VERSION=$(cat VERSION 2>/dev/null || echo "0.0.0.0")
echo "BASE: $BASE_VERSION  HEAD: $CURRENT_VERSION"
if [ "$CURRENT_VERSION" != "$BASE_VERSION" ]; then echo "ALREADY_BUMPED"; fi
```

If output shows `ALREADY_BUMPED`, VERSION was already bumped on this branch (prior `/ship` run). Skip the rest of Step 4 and use the current VERSION. Otherwise proceed with the bump.

1. Read the current `VERSION` file (4-digit format: `MAJOR.MINOR.PATCH.MICRO`)

2. **Auto-decide the bump level based on the diff:**


@@ 1970,7 1981,17 @@ Claiming work is complete without verification is dishonesty, not efficiency.

## Step 7: Push

Push to the remote with upstream tracking:
**Idempotency check:** Check if the branch is already pushed and up to date.

```bash
git fetch origin <branch-name> 2>/dev/null
LOCAL=$(git rev-parse HEAD)
REMOTE=$(git rev-parse origin/<branch-name> 2>/dev/null || echo "none")
echo "LOCAL: $LOCAL  REMOTE: $REMOTE"
[ "$LOCAL" = "$REMOTE" ] && echo "ALREADY_PUSHED" || echo "PUSH_NEEDED"
```

If `ALREADY_PUSHED`, skip the push. Otherwise push with upstream tracking:

```bash
git push -u origin <branch-name>


@@ 1980,7 2001,21 @@ git push -u origin <branch-name>

## Step 8: Create PR/MR

Create a pull request (GitHub) or merge request (GitLab) using the platform detected in Step 0.
**Idempotency check:** Check if a PR/MR already exists for this branch.

**If GitHub:**
```bash
gh pr view --json url,number,state -q 'if .state == "OPEN" then "PR #\(.number): \(.url)" else "NO_PR" end' 2>/dev/null || echo "NO_PR"
```

**If GitLab:**
```bash
glab mr view -F json 2>/dev/null | jq -r 'if .state == "opened" then "MR_EXISTS" else "NO_MR" end' 2>/dev/null || echo "NO_MR"
```

If an **open** PR/MR already exists: **update** the PR body with the latest test results, coverage, and review findings using `gh pr edit --body "..."` (GitHub) or `glab mr update -d "..."` (GitLab). Print the existing URL and continue to Step 8.5.

If no PR/MR exists: create a pull request (GitHub) or merge request (GitLab) using the platform detected in Step 0.

The PR/MR body should contain these sections:


M ship/SKILL.md.tmpl => ship/SKILL.md.tmpl +37 -2
@@ 330,6 330,17 @@ For each classified comment:

## Step 4: Version bump (auto-decide)

**Idempotency check:** Before bumping, compare VERSION against the base branch.

```bash
BASE_VERSION=$(git show origin/<base>:VERSION 2>/dev/null || echo "0.0.0.0")
CURRENT_VERSION=$(cat VERSION 2>/dev/null || echo "0.0.0.0")
echo "BASE: $BASE_VERSION  HEAD: $CURRENT_VERSION"
if [ "$CURRENT_VERSION" != "$BASE_VERSION" ]; then echo "ALREADY_BUMPED"; fi
```

If output shows `ALREADY_BUMPED`, VERSION was already bumped on this branch (prior `/ship` run). Skip the rest of Step 4 and use the current VERSION. Otherwise proceed with the bump.

1. Read the current `VERSION` file (4-digit format: `MAJOR.MINOR.PATCH.MICRO`)

2. **Auto-decide the bump level based on the diff:**


@@ 469,7 480,17 @@ Claiming work is complete without verification is dishonesty, not efficiency.

## Step 7: Push

Push to the remote with upstream tracking:
**Idempotency check:** Check if the branch is already pushed and up to date.

```bash
git fetch origin <branch-name> 2>/dev/null
LOCAL=$(git rev-parse HEAD)
REMOTE=$(git rev-parse origin/<branch-name> 2>/dev/null || echo "none")
echo "LOCAL: $LOCAL  REMOTE: $REMOTE"
[ "$LOCAL" = "$REMOTE" ] && echo "ALREADY_PUSHED" || echo "PUSH_NEEDED"
```

If `ALREADY_PUSHED`, skip the push. Otherwise push with upstream tracking:

```bash
git push -u origin <branch-name>


@@ 479,7 500,21 @@ git push -u origin <branch-name>

## Step 8: Create PR/MR

Create a pull request (GitHub) or merge request (GitLab) using the platform detected in Step 0.
**Idempotency check:** Check if a PR/MR already exists for this branch.

**If GitHub:**
```bash
gh pr view --json url,number,state -q 'if .state == "OPEN" then "PR #\(.number): \(.url)" else "NO_PR" end' 2>/dev/null || echo "NO_PR"
```

**If GitLab:**
```bash
glab mr view -F json 2>/dev/null | jq -r 'if .state == "opened" then "MR_EXISTS" else "NO_MR" end' 2>/dev/null || echo "NO_MR"
```

If an **open** PR/MR already exists: **update** the PR body with the latest test results, coverage, and review findings using `gh pr edit --body "..."` (GitHub) or `glab mr update -d "..."` (GitLab). Print the existing URL and continue to Step 8.5.

If no PR/MR exists: create a pull request (GitHub) or merge request (GitLab) using the platform detected in Step 0.

The PR/MR body should contain these sections:


M test/gen-skill-docs.test.ts => test/gen-skill-docs.test.ts +46 -0
@@ 2496,3 2496,49 @@ describe('CONFIDENCE_CALIBRATION resolver', () => {
    }
  });
});

describe('gen-skill-docs prefix warning (#620/#578)', () => {
  const { execSync } = require('child_process');

  test('warns about skill_prefix when config has prefix=true', () => {
    const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'gstack-prefix-warn-'));
    try {
      // Create a fake ~/.gstack/config.yaml with skill_prefix: true
      const fakeHome = tmpDir;
      const fakeGstack = path.join(fakeHome, '.gstack');
      fs.mkdirSync(fakeGstack, { recursive: true });
      fs.writeFileSync(path.join(fakeGstack, 'config.yaml'), 'skill_prefix: true\n');

      const output = execSync('bun run scripts/gen-skill-docs.ts', {
        cwd: ROOT,
        env: { ...process.env, HOME: fakeHome },
        encoding: 'utf-8',
        timeout: 30000,
      });
      expect(output).toContain('skill_prefix is true');
      expect(output).toContain('gstack-relink');
    } finally {
      fs.rmSync(tmpDir, { recursive: true, force: true });
    }
  });

  test('no warning when skill_prefix is false or absent', () => {
    const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'gstack-prefix-warn-'));
    try {
      const fakeHome = tmpDir;
      const fakeGstack = path.join(fakeHome, '.gstack');
      fs.mkdirSync(fakeGstack, { recursive: true });
      fs.writeFileSync(path.join(fakeGstack, 'config.yaml'), 'skill_prefix: false\n');

      const output = execSync('bun run scripts/gen-skill-docs.ts', {
        cwd: ROOT,
        env: { ...process.env, HOME: fakeHome },
        encoding: 'utf-8',
        timeout: 30000,
      });
      expect(output).not.toContain('skill_prefix is true');
    } finally {
      fs.rmSync(tmpDir, { recursive: true, force: true });
    }
  });
});

M test/helpers/touchfiles.ts => test/helpers/touchfiles.ts +2 -0
@@ 131,6 131,7 @@ export const E2E_TOUCHFILES: Record<string, string[]> = {
  // Plan completion audit + verification
  'ship-plan-completion': ['ship/**', 'scripts/gen-skill-docs.ts'],
  'ship-plan-verification': ['ship/**', 'qa-only/**', 'scripts/gen-skill-docs.ts'],
  'ship-idempotency':       ['ship/**', 'scripts/resolvers/utility.ts'],
  'review-plan-completion': ['review/**', 'scripts/gen-skill-docs.ts'],

  // Design


@@ 247,6 248,7 @@ export const E2E_TIERS: Record<string, 'gate' | 'periodic'> = {
  'ship-triage': 'gate',
  'ship-plan-completion': 'gate',
  'ship-plan-verification': 'gate',
  'ship-idempotency': 'periodic',

  // Retro — gate for cheap branch detection, periodic for full Opus retro
  'retro': 'periodic',

M test/relink.test.ts => test/relink.test.ts +79 -2
@@ 42,11 42,18 @@ function setupMockInstall(skills: string[]): void {
    fs.copyFileSync(path.join(BIN, 'gstack-relink'), path.join(mockBin, 'gstack-relink'));
    fs.chmodSync(path.join(mockBin, 'gstack-relink'), 0o755);
  }
  if (fs.existsSync(path.join(BIN, 'gstack-patch-names'))) {
    fs.copyFileSync(path.join(BIN, 'gstack-patch-names'), path.join(mockBin, 'gstack-patch-names'));
    fs.chmodSync(path.join(mockBin, 'gstack-patch-names'), 0o755);
  }

  // Create mock skill directories
  // Create mock skill directories with proper frontmatter
  for (const skill of skills) {
    fs.mkdirSync(path.join(installDir, skill), { recursive: true });
    fs.writeFileSync(path.join(installDir, skill, 'SKILL.md'), `# ${skill}`);
    fs.writeFileSync(
      path.join(installDir, skill, 'SKILL.md'),
      `---\nname: ${skill}\ndescription: test\n---\n# ${skill}`
    );
  }
}



@@ 150,3 157,73 @@ describe('gstack-relink (#578)', () => {
    expect(fs.existsSync(path.join(skillsDir, 'gstack-ship'))).toBe(true);
  });
});

describe('gstack-patch-names (#620/#578)', () => {
  // Helper to read name: from SKILL.md frontmatter
  function readSkillName(skillDir: string): string | null {
    const content = fs.readFileSync(path.join(skillDir, 'SKILL.md'), 'utf-8');
    const match = content.match(/^name:\s*(.+)$/m);
    return match ? match[1].trim() : null;
  }

  test('prefix=true patches name: field in SKILL.md', () => {
    setupMockInstall(['qa', 'ship', 'review']);
    run(`${path.join(installDir, 'bin', 'gstack-config')} set skill_prefix true`);
    run(`${path.join(installDir, 'bin', 'gstack-relink')}`, {
      GSTACK_INSTALL_DIR: installDir,
      GSTACK_SKILLS_DIR: skillsDir,
    });
    // Verify name: field is patched with gstack- prefix
    expect(readSkillName(path.join(installDir, 'qa'))).toBe('gstack-qa');
    expect(readSkillName(path.join(installDir, 'ship'))).toBe('gstack-ship');
    expect(readSkillName(path.join(installDir, 'review'))).toBe('gstack-review');
  });

  test('prefix=false restores name: field in SKILL.md', () => {
    setupMockInstall(['qa', 'ship']);
    // First, prefix them
    run(`${path.join(installDir, 'bin', 'gstack-config')} set skill_prefix true`);
    run(`${path.join(installDir, 'bin', 'gstack-relink')}`, {
      GSTACK_INSTALL_DIR: installDir,
      GSTACK_SKILLS_DIR: skillsDir,
    });
    expect(readSkillName(path.join(installDir, 'qa'))).toBe('gstack-qa');
    // Now switch to flat mode
    run(`${path.join(installDir, 'bin', 'gstack-config')} set skill_prefix false`);
    run(`${path.join(installDir, 'bin', 'gstack-relink')}`, {
      GSTACK_INSTALL_DIR: installDir,
      GSTACK_SKILLS_DIR: skillsDir,
    });
    // Verify name: field is restored to unprefixed
    expect(readSkillName(path.join(installDir, 'qa'))).toBe('qa');
    expect(readSkillName(path.join(installDir, 'ship'))).toBe('ship');
  });

  test('gstack-upgrade name: not double-prefixed', () => {
    setupMockInstall(['qa', 'gstack-upgrade']);
    run(`${path.join(installDir, 'bin', 'gstack-config')} set skill_prefix true`);
    run(`${path.join(installDir, 'bin', 'gstack-relink')}`, {
      GSTACK_INSTALL_DIR: installDir,
      GSTACK_SKILLS_DIR: skillsDir,
    });
    // gstack-upgrade should keep its name, NOT become gstack-gstack-upgrade
    expect(readSkillName(path.join(installDir, 'gstack-upgrade'))).toBe('gstack-upgrade');
    // Regular skill should be prefixed
    expect(readSkillName(path.join(installDir, 'qa'))).toBe('gstack-qa');
  });

  test('SKILL.md without frontmatter is a no-op', () => {
    setupMockInstall(['qa']);
    // Overwrite qa SKILL.md with no frontmatter
    fs.writeFileSync(path.join(installDir, 'qa', 'SKILL.md'), '# qa\nSome content.');
    run(`${path.join(installDir, 'bin', 'gstack-config')} set skill_prefix true`);
    // Should not crash
    run(`${path.join(installDir, 'bin', 'gstack-relink')}`, {
      GSTACK_INSTALL_DIR: installDir,
      GSTACK_SKILLS_DIR: skillsDir,
    });
    // Content should be unchanged (no name: to patch)
    const content = fs.readFileSync(path.join(installDir, 'qa', 'SKILL.md'), 'utf-8');
    expect(content).toBe('# qa\nSome content.');
  });
});

M test/skill-e2e.test.ts => test/skill-e2e.test.ts +96 -0
@@ 3313,6 3313,102 @@ Write your summary to ${benefitsDir}/benefits-summary.md`,
  }, 180_000);
});

// --- Ship idempotency (#649) ---
describeIfSelected('Ship idempotency', ['ship-idempotency'], () => {
  let idempDir: string;
  const gitRun = (args: string[], cwd: string) =>
    spawnSync('git', args, { cwd, stdio: 'pipe', timeout: 5000 });

  beforeAll(() => {
    idempDir = fs.mkdtempSync(path.join(os.tmpdir(), 'skill-e2e-ship-idemp-'));

    // Create git repo with initial commit on main
    gitRun(['init', '-b', 'main'], idempDir);
    gitRun(['config', 'user.email', 'test@test.com'], idempDir);
    gitRun(['config', 'user.name', 'Test'], idempDir);

    fs.writeFileSync(path.join(idempDir, 'app.ts'), 'console.log("v1");\n');
    fs.writeFileSync(path.join(idempDir, 'VERSION'), '0.1.0.0\n');
    fs.writeFileSync(path.join(idempDir, 'CHANGELOG.md'), '# Changelog\n');
    gitRun(['add', '.'], idempDir);
    gitRun(['commit', '-m', 'initial'], idempDir);

    // Create feature branch with changes
    gitRun(['checkout', '-b', 'feat/my-feature'], idempDir);
    fs.writeFileSync(path.join(idempDir, 'app.ts'), 'console.log("v2");\n');
    gitRun(['add', 'app.ts'], idempDir);
    gitRun(['commit', '-m', 'feat: update to v2'], idempDir);

    // Simulate prior /ship run: bump VERSION and write CHANGELOG entry
    fs.writeFileSync(path.join(idempDir, 'VERSION'), '0.2.0.0\n');
    fs.writeFileSync(path.join(idempDir, 'CHANGELOG.md'),
      '# Changelog\n\n## [0.2.0.0] — 2026-03-30\n\n- Updated app to v2\n');
    gitRun(['add', 'VERSION', 'CHANGELOG.md'], idempDir);
    gitRun(['commit', '-m', 'chore: bump version to 0.2.0.0'], idempDir);

    // Extract just the idempotency-relevant sections from ship/SKILL.md
    const full = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
    const step4Start = full.indexOf('## Step 4: Version bump');
    const step4End = full.indexOf('\n---\n', step4Start);
    const step7Start = full.indexOf('## Step 7: Push');
    const step8End = full.indexOf('## Step 8.5');
    const extracted = [
      full.slice(step4Start, step4End > step4Start ? step4End : step4Start + 500),
      full.slice(step7Start, step8End > step7Start ? step8End : step7Start + 500),
    ].join('\n\n---\n\n');
    fs.writeFileSync(path.join(idempDir, 'ship-steps.md'), extracted);
  });

  afterAll(() => {
    try { fs.rmSync(idempDir, { recursive: true, force: true }); } catch {}
  });

  testIfSelected('ship-idempotency', async () => {
    const result = await runSkillTest({
      prompt: `You are in a git repo on branch feat/my-feature. A prior /ship run already:
- Bumped VERSION from 0.1.0.0 to 0.2.0.0
- Wrote a CHANGELOG entry for 0.2.0.0
- But the push/PR step failed

Read ship-steps.md for the idempotency check instructions from the ship workflow.

Run ONLY the idempotency checks described in Steps 4 and 7. Do NOT actually push or create PRs (there is no remote).

After running the checks, write a report to ${idempDir}/idemp-result.md containing:
- Whether VERSION was detected as ALREADY_BUMPED or not
- Whether the push was detected as ALREADY_PUSHED or PUSH_NEEDED
- The current VERSION value (should still be 0.2.0.0)

Do NOT modify VERSION or CHANGELOG. Only run the detection checks and report.`,
      workingDirectory: idempDir,
      maxTurns: 10,
      timeout: 60_000,
      testName: 'ship-idempotency',
      runId,
    });

    logCost('/ship idempotency', result);
    recordE2E('/ship idempotency guard', 'Ship idempotency', result);
    expect(result.exitReason).toBe('success');

    // Verify VERSION was NOT modified
    const version = fs.readFileSync(path.join(idempDir, 'VERSION'), 'utf-8').trim();
    expect(version).toBe('0.2.0.0');

    // Verify CHANGELOG was NOT duplicated
    const changelog = fs.readFileSync(path.join(idempDir, 'CHANGELOG.md'), 'utf-8');
    const versionEntries = (changelog.match(/## \[0\.2\.0\.0\]/g) || []).length;
    expect(versionEntries).toBe(1);

    // Check the result report if it was written
    const reportPath = path.join(idempDir, 'idemp-result.md');
    if (fs.existsSync(reportPath)) {
      const report = fs.readFileSync(reportPath, 'utf-8');
      expect(report.toLowerCase()).toContain('already_bumped');
    }
  }, 120_000);
});

// Module-level afterAll — finalize eval collector after all tests complete
afterAll(async () => {
  if (evalCollector) {