fix: Codex compatibility — 1024-char cap, duplicate skills, repo-local installs, kiro support (v0.11.2.0) (#346)
* fix: cap gstack skill descriptions for codex (#251)
Compresses SKILL.md.tmpl root description to <1024 chars (Codex token limit).
Adds description-length validation test. Includes /autoplan in compressed
skill list (added since PR was branched).
Co-authored-by: cweill <cweill@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: skip sidecar dir in Codex skill linking (#269)
Adds guard to skip .agents/skills/gstack in link_codex_skill_dirs() —
it's a runtime asset sidecar, not a standalone skill. Prevents duplicate
skill discovery and symlink overwriting.
Fixes #261
Co-authored-by: mvanhorn <mvanhorn@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: generate .agents directory at setup time instead of shipping duplicates (#308)
Removes 14K+ lines of committed generated Codex skill files from git.
.agents/ is now gitignored and generated at setup time via
`bun run gen:skill-docs --host codex`. Updates CI workflow to validate
generation instead of checking committed file freshness.
Co-authored-by: cskwork <cskwork@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: avoid duplicate Codex skill discovery (#236)
Adds migrate_direct_codex_install() to move old direct installs from
~/.codex/skills/gstack to ~/.gstack/repos/gstack. Adds
create_codex_runtime_root() to expose only runtime assets (bin/, browse/,
review files) via symlinks instead of symlinking the entire repo.
Fixes #235
Co-authored-by: shichangs <shichangs@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: support repo-local Codex installs (#317)
Changes gen-skill-docs.ts to use dynamic $GSTACK_ROOT/$GSTACK_BIN/$GSTACK_BROWSE
variables in generated Codex preambles instead of hardcoded ~/.codex/ paths.
Renames GSTACK_DIR → SOURCE_GSTACK_DIR/INSTALL_GSTACK_DIR throughout setup for
clarity. Supports both global (~/.codex/skills/) and repo-local (.agents/skills/)
Codex installs.
Co-authored-by: pengwk <pengwk@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add --host kiro support to setup script (#309)
Adds Kiro CLI as a supported agent platform. Setup detects kiro-cli,
copies+sed-rewrites SKILL.md paths from Codex/Claude to Kiro format,
and symlinks runtime assets (bin/, browse/).
Co-authored-by: AnshulDesai <AnshulDesai@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: add sidecar skip, GSTACK_ROOT, and kiro coverage (T1-T3)
Adds 3 tests identified during CEO/Eng review:
- T1: link_codex_skill_dirs() contains sidecar skip guard
- T2: generated Codex preambles use dynamic $GSTACK_ROOT paths
- T3: setup supports --host kiro with INSTALL_KIRO and sed rewrites
Also fixes existing test to expect kiro in --host case statement.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: review fixes — ETHOS.md, runtime root, repo-local guard, kiro assets, upgrade paths
Paranoid 4-pass review found 7 issues, all fixed:
- Add ETHOS.md to create_codex_runtime_root
- Clean old real dirs (not just symlinks) on upgrade
- Skip runtime root for repo-local installs (prevent self-referential symlinks)
- Add review/, ETHOS.md, gstack-upgrade/ to Kiro install
- Update gstack-upgrade to detect ~/.gstack/repos/ and .agents/skills/
- Guard --host without value from silent exit
- Fix Kiro sed patterns + timeout instruction in gen-skill-docs.ts
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v0.11.2.0)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: remove last tracked .agents/ file from git index
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: cweill <cweill@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: mvanhorn <mvanhorn@users.noreply.github.com>
Co-authored-by: cskwork <cskwork@users.noreply.github.com>
Co-authored-by: shichangs <shichangs@users.noreply.github.com>
Co-authored-by: pengwk <pengwk@users.noreply.github.com>
Co-authored-by: AnshulDesai <AnshulDesai@users.noreply.github.com>
fix: community security + stability fixes (wave 1) (#325)
* feat: add /cso skill — OWASP Top 10 + STRIDE security audit
* fix: harden gstack-slug against shell injection via eval
Whitelist safe characters (a-zA-Z0-9._-) in SLUG and BRANCH output
to prevent shell metacharacter injection when used with eval.
Only affects self-hosted git servers with lax naming rules — GitHub
and GitLab enforce safe characters already. Defense-in-depth.
* fix(security): sanitize gstack-slug output against shell injection
The gstack-slug script is consumed via eval $(gstack-slug) throughout
skill templates. If a git remote URL contains shell metacharacters
like $(), backticks, or semicolons, they would be executed by eval.
Fix: strip all characters except [a-zA-Z0-9._-] from both SLUG and
BRANCH before output. This preserves normal values while neutralizing
any injection payload in malicious remote URLs.
Before: eval $(gstack-slug) with remote "foo/bar$(rm -rf /)" → executes rm
After: eval $(gstack-slug) with remote "foo/bar$(rm -rf /)" → SLUG=foo-barrm-rf-
* fix(security): redact sensitive values in storage command output
The browse `storage` command dumps all localStorage and sessionStorage
as JSON. This can expose tokens, API keys, JWTs, and session credentials
in QA reports and agent transcripts.
Fix: redact values where the key matches sensitive patterns (token,
secret, key, password, auth, jwt, csrf) or the value starts with known
credential prefixes (eyJ for JWT, sk- for Stripe, ghp_ for GitHub, etc.).
Redacted values show length to aid debugging: [REDACTED — 128 chars]
* fix(browse): kill old server before restart to prevent orphaned chromium processes
When the health check fails or the server connection drops, `ensureServer()`
and `sendCommand()` would call `startServer()` without first killing the
previous server process. This left orphaned `chrome-headless-shell` renderer
processes running at ~120% CPU each.
After several reconnect cycles (e.g. pages that crash during hydration or
trigger hard navigations via `window.location.href`), dozens of zombie
chromium processes accumulate and exhaust system resources.
Fix: call `killServer()` on the stale PID before spawning a new server in
both the `ensureServer()` unhealthy path and the `sendCommand()` connection-
lost retry path.
Fixes #294
* Fix YAML linter error: nested mapping in compact sequence entries
Having "Run: bun" inside a plain scalar is not allowed per YAML spec which states: Plain scalars must never contain the “: ” and “ #” character combinations.
This simple fix switches to block scalars (|) to eliminate the ambiguity without changing runtime behavior.
* fix(security): add Azure metadata endpoint to SSRF blocklist
Add metadata.azure.internal to BLOCKED_METADATA_HOSTS alongside the
existing AWS/GCP endpoints. Closes the coverage gap identified in #125.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: add coverage for storage redaction
Test key-based redaction (auth_token, api_key), value-based redaction
(JWT prefix, GitHub PAT prefix), pass-through for normal keys, and
length preservation in redacted output.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: add community PR triage process to CONTRIBUTING.md
Document the wave-based PR triage pattern used for batching community
contributions. References PR #205 (v0.8.3) as the original example.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: adjust test key names to avoid redaction pattern collision
Rename testKey→testData and normalKey→displayName in storage tests
to avoid triggering #238's SENSITIVE_KEY regex (which matches 'key').
Also generate Codex variant of /cso skill.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: update project documentation for v0.9.10.0
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: zero-noise /cso security audits with FP filtering (v0.11.0.0)
Absorb Anthropic's security-review false positive filtering into /cso:
- 17 hard exclusions (DOS, test files, log spoofing, SSRF path-only,
regex injection, race conditions unless concrete, etc.)
- 9 precedents (React XSS-safe, env vars trusted, client-side code
doesn't need auth, shell scripts need concrete untrusted input path)
- 8/10 confidence gate — below threshold = don't report
- Independent sub-agent verification for each finding
- Exploit scenario requirement per finding
- Framework-aware analysis (Rails CSRF, React escaping, Angular sanitization)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: consolidate CHANGELOG — merge /cso launch + community wave into v0.11.0.0
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: rewrite README — lead with Karpathy quote, cut LinkedIn phrases, add /cso
Opens with the revolution (Karpathy, Steinberger/OpenClaw), keeps credentials
and LOC numbers, cuts filler phrases, adds hater bait, restores hiring block,
removes bloated "What's new" section, adds /cso to skills table and install.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(cso): adversarial review fixes — FP filtering, prompt injection, language coverage
- Exclusion #10: test files must verify not imported by non-test code
- Exclusion #13: distinguish user-message AI input from system-prompt injection
- Exclusion #14: ReDoS in user-input regex IS a real CVE class, don't exclude
- Add anti-manipulation rule: ignore audit-influencing instructions in codebase
- Fix confidence gate: remove contradictory 7-8 tier, hard cutoff at 8
- Fix verifier anchoring: send only file+line, not category/description
- Add Go, PHP, Java, C#, Kotlin to grep patterns (was 4 languages, now 8)
- Add GraphQL, gRPC, WebSocket endpoint detection to attack surface mapping
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(docs): correct skill counts, add /autoplan to README tables
Skill count was wrong in 3 places (said 19+7=26, said 25, actual is 28).
Added /autoplan to specialist table. Fixed troubleshooting skills list
to include all skills added since v0.7.0.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(browse): DNS rebinding protection for SSRF blocklist
validateNavigationUrl is now async — resolves hostname to IP and checks
against blocked metadata IPs. Prevents DNS rebinding where evil.com
initially resolves to a safe IP, then switches to 169.254.169.254.
All callers updated to await. Tests updated for async assertions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(browse): lockfile prevents concurrent server start races
Adds exclusive lockfile (O_CREAT|O_EXCL) around ensureServer to prevent
TOCTOU race where two CLI invocations could both kill the old server and
start new ones, leaving an orphaned chromium process. Second caller now
waits for the first to finish starting.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(browse): improve storage redaction — word-boundary keys + more value prefixes
Key regex: use underscore/dot/hyphen boundaries instead of \b (which treats
_ as word char). Now correctly redacts auth_token, session_token while
skipping keyboardShortcuts, monkeyPatch, primaryKey.
Value regex: add AWS (AKIA), Stripe (sk_live_, pk_live_), Anthropic (sk-ant-),
Google (AIza), Sendgrid (SG.), Supabase (sbp_) prefixes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: migrate all remaining eval callers to source, fix stale CHANGELOG claim
5 templates and 2 bin scripts still used eval $(gstack-slug). All now use
source <(gstack-slug). Updated gstack-slug comment to match. Fixed v0.8.3
CHANGELOG entry that falsely claimed eval was fully eliminated — it was
the output sanitization that made it safe, not a calling convention change.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(docs): add /autoplan to install instructions, regen skill docs
The install instruction blocks and troubleshooting section were missing
/autoplan. All three skill list locations now include the complete 28-skill
set. Regenerated codex/agents SKILL.md files to match template changes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: update project documentation for v0.11.0.0
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs(cso): add disclaimer — not a substitute for professional security audits
LLMs can miss subtle vulns and produce false negatives. For production
systems with sensitive data, hire a real firm. /cso is a first pass,
not your only line of defense. Disclaimer appended to every report.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Arun Kumar Thiagarajan <arunkt.bm14@gmail.com>
Co-authored-by: Tyrone Robb <tyrone.robb@icloud.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Orkun Duman <orkun1675@gmail.com>
feat: multi-agent support — gstack works on Codex, Gemini CLI, and Cursor (v0.9.0) (#226)
* refactor: host-aware gen-skill-docs + --host codex generation
Refactor gen-skill-docs.ts for multi-agent support:
- Add Host type, HostPaths interface, HOST_PATHS config
- Decompose generatePreamble() into 7 composable sub-functions
- Replace all hardcoded .claude/skills/gstack paths with ctx.paths
- Replace static findTemplates() list with dynamic filesystem scan
- Add --host codex|agents flag (aliases, same output)
- Add processTemplate host routing to .agents/skills/gstack-*/
- Add codexSkillName() with double-prefix prevention
- Add transformFrontmatter() — keeps only name + description for Codex
- Add extractHookSafetyProse() — converts hooks to inline advisory
- Add body text path rewriting for remaining hardcoded paths
- Exclude /codex skill from Codex generation (self-referential)
Claude output is unchanged (verified via --dry-run).
SKILL.md is an open standard: .agents/skills/ works on Codex, Gemini CLI, and Cursor.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: generate Codex/Gemini/Cursor skills into .agents/skills/
Generated 21 skill files for the open SKILL.md standard:
- Output: .agents/skills/gstack-*/SKILL.md (one per skill)
- Frontmatter: name + description only (no allowed-tools/version)
- No .claude/skills/ paths in any generated file
- /codex skill excluded (Claude wrapper, self-referential on Codex)
- Hook skills (careful/freeze/guard) get inline safety prose
- Build script generates both hosts: bun run build
Supported agents (all read .agents/skills/):
- Codex CLI
- Gemini CLI
- Cursor
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: dual-host setup + find-browse for Codex/Gemini/Cursor
- setup: add --host codex|claude|auto flag, install to ~/.codex/skills/
when targeting Codex, auto-detect installed agents
- find-browse: priority chain .codex > .agents > .claude (both
workspace-local and global)
- dev-setup/teardown: create .agents/skills/gstack symlinks for dev mode
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: Codex generation tests + CI + docs for multi-agent support
Tests (28 new):
- Codex output path routing, frontmatter validation (name+description only)
- No .claude/skills/ path leaks in Codex output (regression guard)
- /codex skill exclusion, hook→prose conversion, multiline YAML
- --host agents alias, dynamic template discovery
- Codex skill validation + $B command validation
- find-browse priority chain verification
- Replace static ALL_SKILLS list with dynamic filesystem scan
CI:
- Add Codex freshness check to skill-docs workflow
Docs:
- AGENTS.md: Codex-facing project instructions
- README: multi-agent installation section
- CONTRIBUTING: dual-host development workflow
- CHANGELOG: v0.9.0 multi-agent support entry
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: Codex E2E test harness — verify skills work on Codex CLI
New test infrastructure:
- CodexSessionRunner: spawns codex exec, parses JSONL stream, returns
structured results (output, reasoning, toolCalls, tokens)
- JSONL parser ported from Python (codex/SKILL.md.tmpl) to TypeScript
- Temp HOME skill installation for Codex discovery testing
E2E tests (gated behind EVALS=1 + codex + OPENAI_API_KEY):
- codex-discover-skill: installs skill, verifies Codex finds it
- codex-review-findings: runs gstack-review via Codex, validates output
Integrates with existing eval infrastructure:
- Diff-based test selection via touchfiles
- Eval persistence via EvalCollector
- bun run test:codex / test:codex:all convenience scripts
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: bump VERSION to 0.9.0 to match CHANGELOG
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: Codex sidecar paths + setup installs generated skills
Two bugs found by Codex adversarial review:
1. Sidecar path mismatch: generated Codex skills referenced
.agents/skills/gstack-review/checklist.md but setup creates
sidecars at .agents/skills/gstack/review/. Fixed path rewriter
to emit .agents/skills/gstack/review/ (matching setup layout).
2. Setup installed Claude-format source dirs for Codex global
install instead of the generated Codex-format skills. Split
link_skill_dirs into link_claude_skill_dirs (source dirs for
Claude) and link_codex_skill_dirs (generated .agents/skills/
gstack-* dirs for Codex).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: comprehensive Codex path rewriting + setup install tests
17 new tests covering:
- Sidecar path rewriting: .claude/skills/review → .agents/skills/gstack/review/
(catches the bug where checklist.md was unreachable at gstack-review/)
- All 4 path rewrite rules tested individually across all skills
- Greptile triage sidecar path correctness
- Ship skill sidecar paths for pre-landing review
- Claude output regression guard: zero Codex paths in any Claude skill
- Setup script validation: separate link functions for Claude vs Codex,
link_codex_skill_dirs reads from .agents/skills/, create_agents_sidecar
links runtime assets (bin, browse, review, qa)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: regenerate Codex skills after investigate rename merge
Remove stale gstack-debug, add gstack-investigate, regenerate all
Codex skills to pick up changes merged from main (investigate rename,
platform-agnostic templates, review helpers).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: Codex E2E uses ~/.codex/ auth, not OPENAI_API_KEY
- Remove OPENAI_API_KEY gate from test prerequisites
- Copy real ~/.codex/ auth config into temp HOME so codex can authenticate
- Increase review test timeout to 540s (codex does thorough 60+ tool call reviews)
- Document in CLAUDE.md that Codex uses its own auth config
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
feat: SKILL.md template system, 3-tier testing, DX tools (v0.3.3) (#41)
* refactor: extract command registry to commands.ts, add SNAPSHOT_FLAGS metadata
- NEW: browse/src/commands.ts — command sets + COMMAND_DESCRIPTIONS + load-time validation (zero side effects)
- server.ts imports from commands.ts instead of declaring sets inline
- snapshot.ts: SNAPSHOT_FLAGS array drives parseSnapshotArgs (metadata-driven, no duplication)
- All 186 existing tests pass
* feat: SKILL.md template system with auto-generated command references
- SKILL.md.tmpl + browse/SKILL.md.tmpl with {{COMMAND_REFERENCE}} and {{SNAPSHOT_FLAGS}} placeholders
- scripts/gen-skill-docs.ts generates SKILL.md from templates (supports --dry-run)
- Build pipeline runs gen:skill-docs before binary compilation
- Generated files have AUTO-GENERATED header, committed to git
* test: Tier 1 static validation — 34 tests for SKILL.md command correctness
- test/helpers/skill-parser.ts: extracts $B commands from code blocks, validates against registry
- test/skill-parser.test.ts: 13 parser/validator unit tests
- test/skill-validation.test.ts: 13 tests validating all SKILL.md files + registry consistency
- test/gen-skill-docs.test.ts: 8 generator tests (categories, sorting, freshness)
* feat: DX tools (skill:check, dev:skill) + Tier 2 E2E test scaffolding
- scripts/skill-check.ts: health summary for all SKILL.md files (commands, templates, freshness)
- scripts/dev-skill.ts: watch mode for template development
- test/helpers/session-runner.ts: Agent SDK wrapper for E2E skill tests
- test/skill-e2e.test.ts: 2 E2E tests + 3 stubs (auto-skip inside Claude Code sessions)
- E2E tests must run from plain terminal: SKILL_E2E=1 bun test test/skill-e2e.test.ts
* ci: SKILL.md freshness check on push/PR + TODO updates
- .github/workflows/skill-docs.yml: fails if generated SKILL.md files are stale
- TODO.md: add E2E cost tracking and model pinning to future ideas
* fix: restore rich descriptions lost in auto-generation
- Snapshot flags: add back value hints (-d <N>, -s <sel>, -o <path>)
- Snapshot flags: restore parenthetical context (@e refs, @c refs, etc.)
- Commands: is → includes valid states enum
- Commands: console → notes --errors filter behavior
- Commands: press → lists common keys (Enter, Tab, Escape)
- Commands: cookie-import-browser → describes picker UI
- Commands: dialog-accept → specifies alert/confirm/prompt
- Tips: restore → arrow (was downgraded to ->)
* test: quality evals for generated SKILL.md descriptions
Catches the exact regressions we shipped and caught in review:
- Snapshot flags must include value hints (-d <N>, -s <sel>, -o <path>)
- is command must list all valid states (visible/hidden/enabled/...)
- press command must list example keys (Enter, Tab, Escape)
- console command must describe --errors behavior
- Snapshot -i must mention @e refs, -C must mention @c refs
- All descriptions must be >= 8 chars (no empty stubs)
- Tips section must use → not ->
* feat: LLM-as-judge evals for SKILL.md documentation quality
4 eval tests using Anthropic API (claude-haiku, ~$0.01-0.03/run):
- Command reference table: clarity/completeness/actionability >= 4/5
- Snapshot flags section: same thresholds
- browse/SKILL.md overall quality
- Regression: generated version must score >= hand-maintained baseline
Requires ANTHROPIC_API_KEY. Auto-skips without it.
Run: bun run test:eval (or ANTHROPIC_API_KEY=sk-... bun test test/skill-llm-eval.test.ts)
* chore: bump version to 0.3.3, update changelog
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: add ARCHITECTURE.md, update CLAUDE.md and CONTRIBUTING.md
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: conductor.json lifecycle hooks + .env propagation across worktrees
bin/dev-setup now copies .env from main worktree so API keys carry
over to Conductor workspaces automatically. conductor.json wires up
setup and archive hooks.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: complete CHANGELOG for v0.3.3 (architecture, conductor, .env)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>