feat: Factory Droid compatibility — works across Claude Code, Codex, and Factory (v0.13.5.0) (#621)
* refactor: extract processExternalHost() shared helper for multi-host generation
Refactor the Codex-specific output routing block in gen-skill-docs.ts into
a shared processExternalHost() function. Both Codex and future external hosts
(Factory Droid) will use this helper for output routing, symlink loop detection,
frontmatter transformation, path rewrites, and metadata generation.
- Rename codexSkillName() to externalSkillName() everywhere
- Extract ExternalHostConfig interface with per-host settings
- Codex output is byte-identical (verified via --dry-run)
- Skip /codex skill for all non-Claude hosts (not just codex)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add Factory Droid host type, preamble, and co-author trailer
- Add 'factory' to Host union type with .factory/skills/gstack paths
- Extend preamble runtime root detection for Factory ($HOME/.factory/)
- Add GSTACK_DESIGN env var to preamble (was missing for Codex too)
- Add Factory Droid co-author trailer for git commits
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: Factory Droid generation, --host all, and host-aware frontmatter
- Add --host factory (alias: --host droid) to gen-skill-docs
- Add --host all: generates for claude, codex, and factory in one invocation
with fault-tolerant per-host error handling (only fails if claude fails)
- Factory frontmatter: name + description + user-invocable: true
- Factory sensitive skills: disable-model-invocation: true (from sensitive: field)
- Claude: strips sensitive: field from output (only Factory uses it)
- Factory tool name translation: Claude tool names → generic phrasing
- Replace chained gen:skill-docs calls with --host all in package.json build
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: sensitive frontmatter for Factory Droid auto-invocation safety
Add sensitive: true to 6 skill templates with side effects that Factory
Droids shouldn't auto-invoke (ship, land-and-deploy, guard, careful,
freeze, unfreeze). The field is:
- Factory: emitted as disable-model-invocation: true
- Claude/Codex: stripped from output by transformFrontmatter()
Also fix Claude host path: call transformFrontmatter() for Claude to
strip the sensitive: field from Claude output.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: gstack-platform-detect binary for multi-host debugging
Bash script that prints a table of installed AI coding agents (Claude,
Codex, Factory Droid, Kiro) with versions, skill paths, and gstack
installation status. Useful for debugging multi-host setups.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: Factory Droid support in setup script
- Add factory to --host values (auto-detected via command -v droid)
- Add .factory/ skill doc generation step alongside .agents/
- Add create_factory_runtime_root() and link_factory_skill_dirs()
helpers mirroring the Codex equivalents
- Factory install section creates ~/.factory/skills/ with symlinks
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: Factory Droid awareness in skill-check and uninstall
- skill-check.ts: add Factory skills validation and freshness check
- gstack-uninstall: add Factory artifact cleanup (~/.factory/skills/gstack*
and per-project .factory/ sidecar)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: Factory Droid generation + --host all test suites
Add 13 new tests:
- Factory output paths, frontmatter (user-invocable, disable-model-invocation)
- Sensitive vs non-sensitive skill classification
- Path rewrites (no .claude/skills/ in Factory output)
- /codex skill exclusion, openai.yaml absence
- Factory keeps Codex integration blocks (for second opinions)
- --host droid alias, --dry-run freshness, preamble paths
- --host all generates for all 3 hosts
- Setup script host validation updated for factory
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: Factory Droid install instructions + CI freshness check
- README: add Factory Droid section with install instructions and
restart note (Factory requires restart to rescan skills)
- CI: add Factory skill doc freshness verification to skill-docs.yml
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: generated Factory Droid skill output (.factory/skills/)
29 skills generated for Factory Droid with:
- user-invocable: true on all skills
- disable-model-invocation: true on 6 sensitive skills
- .factory/skills/ paths (no .claude/skills/ references)
- $GSTACK_ROOT env vars for runtime root detection
- Tool name translation (Claude tool names → generic phrasing)
Committed to git for CI freshness checks and direct consumption.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: add Factory Droid P1 TODO for browse MCP server
Add 3 TODOs under new ## Factory Droid section:
- P1: Browse MCP server (Option B, deeper Factory integration)
- P3: .agent/skills/ dual output for cross-agent compatibility
- P3: Custom Droid definitions alongside skills
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v0.13.5.0)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fix: zsh glob compatibility across all skill templates (v0.12.8.1) (#559)
* fix: replace zsh-incompatible raw globs with find-based alternatives and setopt guards
Zsh's NOMATCH option (on by default) causes raw globs like `*.yaml` and
`*deploy*` to throw errors when no files match, instead of silently expanding
to nothing as bash does. The preamble resolver already handled this correctly
with find, but 38 glob instances across 13 templates and 2 resolvers still
used raw shell globs.
Two fix approaches based on complexity:
- find-based replacement for cat/for/ls-with-pipes patterns (.github/workflows/)
- setopt +o nomatch guard for simple ls -t patterns (~/.gstack/, ~/.claude/)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: regenerate SKILL.md files from updated templates
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v0.12.8.1)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* test: add zsh glob safety test + fix 2 missed resolver globs
Adds a test that scans all generated SKILL.md bash blocks for raw glob
patterns and verifies they have either a find-based replacement or a
setopt +o nomatch guard. The test immediately caught 2 unguarded blocks
in review.ts (design doc re-check and plan file discovery).
Also syncs package.json version to 0.12.8.1.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fix: community PRs + security hardening + E2E stability (v0.12.7.0) (#552)
* fix(security): skip hidden directories in skill template discovery
discoverTemplates() scans subdirectories for SKILL.md.tmpl files but
only skips node_modules, .git, and dist. Hidden directories like
.claude/, .agents/, and .codex/ (which contain symlinked skill
installs) were being scanned, allowing a malicious .tmpl in a
symlinked skill to inject into the generation pipeline.
Fix: add !d.name.startsWith('.') to the subdirs() filter. This skips
all dot-prefixed directories, matching the standard convention that
hidden dirs are not source code.
* fix(security): sanitize telemetry JSONL inputs against injection
SKILL, OUTCOME, SESSION_ID, SOURCE, and EVENT_TYPE values go directly
into printf %s for JSONL output. If any contain double quotes,
backslashes, or newlines, the JSON breaks — or worse, injects
arbitrary fields.
Fix: strip quotes, backslashes, and control characters from all
string fields before JSONL construction via json_safe() helper.
* fix(security): validate JSON input in gstack-review-log
gstack-review-log appends its argument directly to a JSONL file with
no validation. Malformed or crafted input could corrupt the review log
or inject arbitrary content.
Fix: validate input is parseable JSON via python3 before appending.
Reject with exit 1 and stderr message if invalid.
* fix: treat relative dot-paths as file paths in screenshot command
Closes #495
* fix: use host-specific co-author trailer in /ship and /document-release
Codex-generated skills hardcoded a Claude co-author trailer in commit
messages. Users running gstack under Codex pushed commits attributed
to the wrong AI assistant.
Add {{CO_AUTHOR_TRAILER}} resolver that emits the correct trailer
based on ctx.host:
- claude: Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- codex: Co-Authored-By: OpenAI Codex <noreply@openai.com>
Replace hardcoded trailers in ship/SKILL.md.tmpl and
document-release/SKILL.md.tmpl with the resolver placeholder.
Fixes #282. Fixes #383.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: auto-upgrade marker no longer masks newer remote versions
When a just-upgraded-from marker persists across sessions, the update
check would write UP_TO_DATE to cache and exit immediately — never
fetching the remote VERSION. Users silently miss updates that landed
after their last upgrade.
Remove the early exit and premature cache write so the script falls
through to the remote check after consuming the marker. This ensures
JUST_UPGRADED is still emitted for the preamble, while also detecting
any newer versions available upstream.
Fixes #515
* fix: decouple doc generation from binary compilation in build script
The build script chains gen:skill-docs and bun build --compile with &&,
so a doc generation failure (e.g. missing Codex host config, template
error) prevents the browse binary from being compiled. Users end up
with a broken install where setup reports the binary is missing.
Replace && with ; for the two gen:skill-docs steps so they run
independently of the compilation chain. Doc generation errors are still
visible in stderr, but no longer block binary compilation.
Fixes #482
* fix: extend security sanitization + add 10 tests for merged community PRs
- Extend json_safe() to ERROR_CLASS and FAILED_STEP fields
- Improve ERROR_MESSAGE escaping to handle backslashes and newlines
- Replace python3 with bun for JSON validation in gstack-review-log
- Add 7 telemetry injection prevention tests
- Add 2 review-log JSON validation tests
- Add 1 discover-skills hidden directory filtering test
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: stabilize flaky E2E tests (browse-basic, ship-base-branch, dashboard-via)
browse-basic: bump maxTurns 5→7 (agent reads PNG per SKILL.md instruction)
ship-base-branch: extract Step 0 only instead of full 1900-line ship/SKILL.md
dashboard-via: extract dashboard section only + increase timeout 90s→180s
Root cause: copying full SKILL.md files into test fixtures caused context bloat,
leading to timeouts and flaky turn limits. Extracting only the relevant section
cut dashboard-via from timing out at 240s to finishing in 38s.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: add E2E fixture extraction rule to CLAUDE.md
Never copy full SKILL.md files into E2E test fixtures. Extract only
the section the test needs. Also: run targeted evals in foreground,
never pkill and restart mid-run.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: stabilize journey-think-bigger routing test
Use exact trigger phrases from plan-ceo-review skill description
("think bigger", "expand scope", "ambitious enough") instead of
the ambiguous "thinking too small". Reduce maxTurns 5→3 to cut
cost per attempt ($0.12 vs $0.25). Test remains periodic tier
since LLM routing is inherently non-deterministic.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* remove: delete journey-think-bigger routing test
Never passed reliably. Tests ambiguous routing ("think bigger" →
plan-ceo-review) but Claude legitimately answers directly instead
of invoking a skill. The other 10 journey tests cover routing
with clear, actionable signals.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v0.12.7.0)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Arun Kumar Thiagarajan <arunkt.bm14@gmail.com>
Co-authored-by: bluzername <bluzer@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Greg Jackson <gregario@users.noreply.github.com>
feat: /land-and-deploy first-run dry run + staging-first + trust ladder (v0.12.2.0) (#518)
* feat: /land-and-deploy first-run dry-run, staging-first, trust ladder
First run shows a dry run — detect deploy infrastructure, validate commands,
show what will happen — then confirm before proceeding. Staging-first option
when staging detected. Config decay: re-triggers dry run if deploy config
changes. Full wordsmithed copy for every user-facing message.
Key changes:
- Step 1.5: first-run dry-run with infrastructure validation table
- Step 3.5a-bis: inline review gate before deploy
- Step 4a/4b: merge queue + CI auto-deploy detection and messaging
- Step 5a: staging-first option with verify-then-promote flow
- Voice & Tone section: narrate-the-journey, teacher mode vs efficient mode
- Config fingerprinting: trust decays when deploy config changes
* chore: bump version and changelog (v0.12.2.0)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
feat: GitLab support for /retro, /ship, and /document-release (v0.11.20.0) (#508)
* feat: multi-platform BASE_BRANCH_DETECT (GitHub + GitLab + GHE + git-native)
Update the shared BASE_BRANCH_DETECT resolver to support GitHub, GitLab,
GitHub Enterprise, self-hosted GitLab, and a git-native fallback chain.
Platform detection uses remote URL matching plus CLI auth status for
custom domains. Add glab issue create alternative in test failure triage.
Add 7 new test assertions covering GitLab CLI presence, git symbolic-ref
fallback, and platform-specific output in retro and ship generated files.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: GitLab support in /retro — use shared BASE_BRANCH_DETECT resolver
Replace retro's custom gh-only default branch detection with the shared
BASE_BRANCH_DETECT resolver (DRY — same as 10 other skills). Update
PR/MR number extraction to match both GitHub #NNN and GitLab !NNN
patterns. Remove hardcoded github.com URL from the personal card footer.
Regenerate all SKILL.md files affected by the resolver update.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: GitLab MR creation in /ship + /document-release
Ship Step 1.5 now checks .gitlab-ci.yml for release workflows alongside
GitHub Actions. Step 8 routes to glab mr create on GitLab repos with
correct flag mapping (-b, -t, -d). Falls back to manual instructions
when no CLI is available. Document-release now reads MR body via
glab mr view -F json and updates via glab mr update on GitLab repos.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: add P2 TODO for land-and-deploy GitLab support
Track the remaining work to support GitLab in /land-and-deploy — MR
merge, CI polling, and deploy workflow detection using glab equivalents.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: adversarial review — GitLab gate, shell safety, MR prefix preservation
Three fixes from adversarial review:
1. land-and-deploy: add GitLab gate after Step 0 — prevents detection/
execution mismatch where agent detects GitLab but all subsequent
steps are GitHub-only
2. document-release: use heredoc for glab mr update body to avoid shell
metacharacter mangling ($, backticks, !) in MR descriptions
3. retro: preserve original #/! prefix in PR/MR number extraction —
GitLab !42 stays as !42, not incorrectly converted to #42
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: resolve merge conflicts — deduplicate gen-skill-docs resolvers
The merge from main created duplicate RESOLVERS records in gen-skill-docs.ts
(inline functions shadowing the imported module versions). Removed the inline
duplicates so the modular resolvers from scripts/resolvers/ are used.
Also added missing E2E_TIERS entries for plan-completion/verification tests.
* chore: bump version and changelog (v0.11.20.0)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
feat: worktree isolation for E2E tests + infrastructure elegance (v0.11.12.0) (#425)
* refactor: extract gen-skill-docs into modular resolver architecture
Break the 3000-line monolith into 10 domain modules under scripts/resolvers/:
types, constants, preamble, utility, browse, design, testing, review,
codex-helpers, and index. Each module owns one domain of template generation.
The preamble module introduces a 4-tier composition system (T1-T4) so skills
only pay for the preamble sections they actually need, reducing token usage
for lightweight skills by ~40%.
Adds a token budget dashboard that prints after every generation run showing
per-skill and total token counts.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: tiered preamble — skills only pay for what they use
Tag all 23 templates with preamble-tier (T1-T4). Lightweight skills
like /browse and /benchmark get a minimal preamble (~40% fewer tokens),
while review skills get the full stack. Regenerate all SKILL.md files.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: migrate eval storage to project-scoped paths
Move eval results and E2E run artifacts from ~/.gstack-dev/evals/ to
~/.gstack/projects/$SLUG/evals/ so each project's eval history lives
alongside its other gstack data. Falls back to legacy path if slug
detection fails.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: sync package.json version with VERSION after merge
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add WorktreeManager for isolated test environments
Reusable platform module (lib/worktree.ts) that creates git worktrees
for test isolation and harvests useful changes as patches. Includes
SHA-256 dedup, original SHA tracking for committed change detection,
and automatic gitignored artifact copying (.agents/, browse/dist/).
12 unit tests covering lifecycle, harvest, dedup, and error handling.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: integrate worktree isolation into E2E test infrastructure
Add createTestWorktree(), harvestAndCleanup(), and describeWithWorktree()
helpers to e2e-helpers.ts. Add harvest field to EvalTestEntry for
eval-store integration. Register lib/worktree.ts as a global touchfile.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: run Gemini and Codex E2E tests in worktrees
Switch both test suites from cwd: ROOT to worktree isolation.
Gemini (--yolo) no longer pollutes the working tree. Codex
(read-only) gets worktree for consistency. Useful changes are
harvested as patches for cherry-picking.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: skip symlinks in copyDirSync to prevent infinite recursion
Adversarial review caught that .claude/skills/gstack may be a symlink
back to the repo root, causing copyDirSync to recurse infinitely
when copying gitignored artifacts into worktrees.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: bump version and changelog (v0.11.12.0)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: relax session-awareness assertion to accept structured options
The LLM consistently presents well-formatted A/B choices with pros/cons
but doesn't always use the exact string "RECOMMENDATION". Accept
case-insensitive "recommend", "option a", "which do you want", or
"which approach" as equivalent signals of a structured recommendation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>