name: ship version: 1.0.0 description: | Ship workflow: merge main, run tests, review diff, bump VERSION, update CHANGELOG, commit, push, create PR. allowed-tools:
_UPD=$(~/.claude/skills/gstack/bin/gstack-update-check 2>/dev/null || .claude/skills/gstack/bin/gstack-update-check 2>/dev/null || true)
[ -n "$_UPD" ] && echo "$_UPD" || true
If output shows UPGRADE_AVAILABLE <old> <new>: read ~/.claude/skills/gstack/gstack-upgrade/SKILL.md and follow the "Inline upgrade flow" (AskUserQuestion → upgrade if yes, touch ~/.gstack/last-update-check if no). If JUST_UPGRADED <from> <to>: tell user "Running gstack v{to} (just updated!)" and continue.
You are running the /ship workflow. This is a non-interactive, fully automated workflow. Do NOT ask for confirmation at any step. The user said /ship which means DO IT. Run straight through and output the PR URL at the end.
Only stop for:
main branch (abort)Never stop for:
Check the current branch. If on main, abort: "You're on main. Ship from a feature branch."
Run git status (never use -uall). Uncommitted changes are always included — no need to ask.
Run git diff main...HEAD --stat and git log main..HEAD --oneline to understand what's being shipped.
Fetch and merge origin/main into the feature branch so tests run against the merged state:
git fetch origin main && git merge origin/main --no-edit
If there are merge conflicts: Try to auto-resolve if they are simple (VERSION, schema.rb, CHANGELOG ordering). If conflicts are complex or ambiguous, STOP and show them.
If already up to date: Continue silently.
Do NOT run RAILS_ENV=test bin/rails db:migrate — bin/test-lane already calls
db:test:prepare internally, which loads the schema into the correct lane database.
Running bare test migrations without INSTANCE hits an orphan DB and corrupts structure.sql.
Run both test suites in parallel:
bin/test-lane 2>&1 | tee /tmp/ship_tests.txt &
npm run test 2>&1 | tee /tmp/ship_vitest.txt &
wait
After both complete, read the output files and check pass/fail.
If any test fails: Show the failures and STOP. Do not proceed.
If all pass: Continue silently — just note the counts briefly.
Evals are mandatory when prompt-related files change. Skip this step entirely if no prompt files are in the diff.
1. Check if the diff touches prompt-related files:
git diff origin/main --name-only
Match against these patterns (from CLAUDE.md):
app/services/*_prompt_builder.rbapp/services/*_generation_service.rb, *_writer_service.rb, *_designer_service.rbapp/services/*_evaluator.rb, *_scorer.rb, *_classifier_service.rb, *_analyzer.rbapp/services/concerns/*voice*.rb, *writing*.rb, *prompt*.rb, *token*.rbapp/services/chat_tools/*.rb, app/services/x_thread_tools/*.rbconfig/system_prompts/*.txttest/evals/**/* (eval infrastructure changes affect all suites)If no matches: Print "No prompt-related files changed — skipping evals." and continue to Step 3.5.
2. Identify affected eval suites:
Each eval runner (test/evals/*_eval_runner.rb) declares PROMPT_SOURCE_FILES listing which source files affect it. Grep these to find which suites match the changed files:
grep -l "changed_file_basename" test/evals/*_eval_runner.rb
Map runner → test file: post_generation_eval_runner.rb → post_generation_eval_test.rb.
Special cases:
test/evals/judges/*.rb, test/evals/support/*.rb, or test/evals/fixtures/ affect ALL suites that use those judges/support files. Check imports in the eval test files to determine which.config/system_prompts/*.txt — grep eval runners for the prompt filename to find affected suites.3. Run affected suites at EVAL_JUDGE_TIER=full:
/ship is a pre-merge gate, so always use full tier (Sonnet structural + Opus persona judges).
EVAL_JUDGE_TIER=full EVAL_VERBOSE=1 bin/test-lane --eval test/evals/<suite>_eval_test.rb 2>&1 | tee /tmp/ship_evals.txt
If multiple suites need to run, run them sequentially (each needs a test lane). If the first suite fails, stop immediately — don't burn API cost on remaining suites.
4. Check results:
5. Save eval output — include eval results and cost dashboard in the PR body (Step 8).
Tier reference (for context — /ship always uses full):
| Tier | When | Speed (cached) | Cost |
|---|---|---|---|
fast (Haiku) |
Dev iteration, smoke tests | ~5s (14x faster) | ~$0.07/run |
standard (Sonnet) |
Default dev, bin/test-lane --eval |
~17s (4x faster) | ~$0.37/run |
full (Opus persona) |
/ship and pre-merge |
~72s (baseline) | ~$1.27/run |
Review the diff for structural issues that tests don't catch.
Read .claude/skills/review/checklist.md. If the file cannot be read, STOP and report the error.
Run git diff origin/main to get the full diff (scoped to feature changes against the freshly-fetched remote main).
Apply the review checklist in two passes:
Always output ALL findings — both critical and informational. The user must see every issue found.
Output a summary header: Pre-Landing Review: N issues (X critical, Y informational)
If CRITICAL issues found: For EACH critical issue, use a separate AskUserQuestion with:
file:line + description)git add <fixed-files> && git commit -m "fix: apply pre-landing review fixes"), then STOP and tell the user to run /ship again to re-test with the fixes applied. If the user chose only B (acknowledge) or C (false positive) on all issues, continue with Step 4.If only non-critical issues found: Output them and continue. They will be included in the PR body at Step 8.
If no issues found: Output Pre-Landing Review: No issues found. and continue.
Save the review output — it goes into the PR body in Step 8.
Read .claude/skills/review/greptile-triage.md and follow the fetch, filter, and classify steps.
If no PR exists, gh fails, API returns an error, or there are zero Greptile comments: Skip this step silently. Continue to Step 4.
If Greptile comments are found:
Include a Greptile summary in your output: + N Greptile comments (X valid, Y fixed, Z FP)
For each classified comment:
VALID & ACTIONABLE: Use AskUserQuestion with:
git add <fixed-files> && git commit -m "fix: address Greptile review — <brief description>"), reply to the comment ("Fixed in <commit-sha>."), and save to both per-project and global greptile-history (see greptile-triage.md for write details, type: fix).VALID BUT ALREADY FIXED: Reply acknowledging the catch — no AskUserQuestion needed:
"Good catch — already fixed in <commit-sha>."FALSE POSITIVE: Use AskUserQuestion:
SUPPRESSED: Skip silently — these are known false positives from previous triage.
After all comments are resolved: If any fixes were applied, the tests from Step 3 are now stale. Re-run tests (Step 3) before continuing to Step 4. If no fixes were applied, continue to Step 4.
Read the current VERSION file (4-digit format: MAJOR.MINOR.PATCH.MICRO)
Auto-decide the bump level based on the diff:
git diff origin/main...HEAD --stat | tail -1)Compute the new version:
0.19.1.0 + PATCH → 0.19.2.0Write the new version to the VERSION file.
Read CHANGELOG.md header to know the format.
Auto-generate the entry from ALL commits on the branch (not just recent ones):
git log main..HEAD --oneline to see every commit being shippedgit diff main...HEAD to see the full diff against main### Added — new features### Changed — changes to existing functionality### Fixed — bug fixes### Removed — removed features## [X.Y.Z.W] - YYYY-MM-DDDo NOT ask the user to describe changes. Infer from the diff and commit history.
Goal: Create small, logical commits that work well with git bisect and help LLMs understand what changed.
Analyze the diff and group changes into logical commits. Each commit should represent one coherent change — not one file, but one logical unit.
Commit ordering (earlier commits first):
Rules for splitting:
Each commit must be independently valid — no broken imports, no references to code that doesn't exist yet. Order commits so dependencies come first.
Compose each commit message:
<type>: <summary> (type = feat/fix/chore/refactor/docs)git commit -m "$(cat <<'EOF'
chore: bump version and changelog (vX.Y.Z.W)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
EOF
)"
Push to the remote with upstream tracking:
git push -u origin <branch-name>
Create a pull request using gh:
gh pr create --title "<type>: <summary>" --body "$(cat <<'EOF'
## Summary
<bullet points from CHANGELOG>
## Pre-Landing Review
<findings from Step 3.5, or "No issues found.">
## Eval Results
<If evals ran: suite names, pass/fail counts, cost dashboard summary. If skipped: "No prompt-related files changed — evals skipped.">
## Greptile Review
<If Greptile comments were found: bullet list with [FIXED] / [FALSE POSITIVE] / [ALREADY FIXED] tag + one-line summary per comment>
<If no Greptile comments found: "No Greptile comments.">
<If no PR existed during Step 3.75: omit this section entirely>
## Test plan
- [x] All Rails tests pass (N runs, 0 failures)
- [x] All Vitest tests pass (N tests)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
EOF
)"
Output the PR URL — this should be the final output the user sees.
git push only.YYYY-MM-DD/ship, next thing they see is the review + PR URL.